Using Bigeye data lineage for actionable root cause and impact analysis
Bigeye gives data teams comprehensive visibility into the health of their data pipelines so they can find and resolve issues faster. In this post, we’ll explore how data engineers use Bigeye’s built-in data lineage features to see where data problems are impacting critical downstream tables and identify the upstream root cause for faster, easier resolution.
Lineage-driven insight into which problems are impacting your business—and how to fix them.
In a Data Operations context, lineage is the path data takes from creation, through various databases and transformation jobs, all the way down to an analytics dashboard, machine learning model, or application. Visualizing lineage is useful for any DataOps task that benefits from knowing where data is flowing to (like understanding the impact of a data quality problem) or where it’s flowing from (like understanding where a data quality issue originated).
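Conceptually, a lineage graph is a directed graph whose edges point from where data comes from to where it flows. The minimal Python sketch below uses hypothetical table names, not anything Bigeye exposes, to show how the two directions map to two traversals: following edges forward answers the impact question, and following them in reverse answers the origin question.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (upstream, downstream) pairs; names are illustrative only.
EDGES = [
    ("raw.signups", "staging.users"),
    ("staging.users", "analytics.user_daily"),
    ("analytics.user_daily", "tableau.exec_dashboard"),
    ("raw.orders", "staging.orders"),
    ("staging.orders", "analytics.user_daily"),
]

def build_adjacency(edges):
    """Return forward (downstream) and reverse (upstream) adjacency maps."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream

def reachable(start, adjacency):
    """Breadth-first search: every node reachable from `start` by following edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream, upstream = build_adjacency(EDGES)
# Impact: what does a problem in staging.users flow into?
print(sorted(reachable("staging.users", downstream)))
# → ['analytics.user_daily', 'tableau.exec_dashboard']
# Origin: where could a problem in analytics.user_daily have come from?
print(sorted(reachable("analytics.user_daily", upstream)))
# → ['raw.orders', 'raw.signups', 'staging.orders', 'staging.users']
```

The same graph serves both questions; only the direction of traversal changes.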
At Bigeye, we use data lineage to give our customers a clear, complete view into how data issues are impacting their environment. As soon as you connect Bigeye to your data warehouse, it automatically begins parsing your query history to build your lineage graph. The graph is available from the catalog as well as from the issue management workflow, so users can see exactly which downstream tables, BI tools, and applications a particular data issue is impacting, and which upstream data sources it may have originated from.
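Bigeye’s query parser isn’t described in this post, so the following is only a rough illustration of the general idea of deriving table-level lineage from query history. It assumes a hypothetical list of SQL strings and uses two regular expressions to pull out the write target and the FROM/JOIN sources. A production parser would also need to handle CTEs, subqueries, quoted identifiers, and dialect differences, which these regexes do not.

```python
import re

# Hypothetical query-history entries; a real warehouse exposes these through its own system views.
QUERY_HISTORY = [
    "INSERT INTO staging.users SELECT id, email FROM raw.signups",
    "CREATE TABLE analytics.user_daily AS SELECT u.id FROM staging.users u "
    "JOIN staging.orders o ON u.id = o.user_id",
    "SELECT count(*) FROM analytics.user_daily",  # read-only query: produces no edge
]

# Table being written to (toy pattern, not a full SQL grammar).
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+([\w.]+)", re.IGNORECASE
)
# Tables being read from.
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def lineage_edges(sql):
    """Return (source, target) edges for one statement; empty if it writes nothing."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    dst = target.group(1)
    # dict.fromkeys removes duplicate sources while keeping their original order.
    sources = dict.fromkeys(SOURCE_RE.findall(sql))
    return [(src, dst) for src in sources if src != dst]

edges = [edge for query in QUERY_HISTORY for edge in lineage_edges(query)]
print(edges)
# → [('raw.signups', 'staging.users'), ('staging.users', 'analytics.user_daily'),
#    ('staging.orders', 'analytics.user_daily')]
```

Each (source, target) pair becomes one edge in the lineage graph; statements that only read data contribute nothing.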
Armed with this knowledge, Bigeye users can prioritize an issue based on its impact radius, take steps to mitigate it and alert business users, locate the root cause, and use built-in investigation and debugging tools to triage and fix it quickly.
A data lineage use case
Let's take a look at a day in the life of Juan, a data engineer. A few months ago, before his company implemented Bigeye, Juan found out about a data issue only after an executive noticed an error in her BI report and notified Juan’s boss. This kicked off a painful, time-consuming investigation and resolution process. Now that Juan has Bigeye, let’s see how he can use lineage-driven impact and root cause analysis to solve issues more effectively and improve the quality of his work.
Prioritizing issue response with impact analysis
One afternoon, Juan receives two alerts from Bigeye in Slack. Without impact analysis, Juan might have tackled the two issues in the order they arrived, or focused on whichever alert was further outside its threshold. Now, however, he can quickly review the lineage graph and see that the first issue isn’t directly impacting any downstream tables, dashboards, or applications. He sets the issue priority to “low” and moves on to the second notification.
Juan filters the second alert’s downstream lineage path to Tableau dashboards and immediately discovers that this issue is directly impacting a critical executive dashboard. Within seconds, Juan has the context and insight he needs to take action. He sets the Bigeye issue priority to “high”, updates the status to “investigating”, and alerts the downstream dashboard users.
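The prioritization step in Juan’s story can be approximated as a reachability check: collect everything downstream of the alerting table, keep the dashboards, and raise the priority if any of them is marked critical. The graph, the node types, the `CRITICAL` set, and the low/medium/high rule below are all hypothetical; this is a sketch of the idea, not Bigeye’s scoring logic.

```python
from collections import deque

# Hypothetical lineage: node -> set of direct downstream nodes.
DOWNSTREAM = {
    "staging.sessions": {"analytics.session_stats"},
    "analytics.session_stats": set(),
    "staging.users": {"analytics.user_daily"},
    "analytics.user_daily": {"tableau.exec_revenue"},
    "tableau.exec_revenue": set(),
}
# Hypothetical node types; anything not listed is treated as a table.
NODE_TYPE = {"tableau.exec_revenue": "tableau_dashboard"}
# Hypothetical set of assets the business has marked as critical.
CRITICAL = {"tableau.exec_revenue"}

def impacted(table):
    """Every node reachable downstream of `table` (breadth-first)."""
    seen, queue = set(), deque([table])
    while queue:
        for nxt in DOWNSTREAM.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def suggest_priority(table):
    """Hypothetical rule: high if a critical dashboard is downstream, low if nothing is."""
    downstream = impacted(table)
    dashboards = sorted(n for n in downstream if NODE_TYPE.get(n) == "tableau_dashboard")
    if CRITICAL.intersection(dashboards):
        return "high", dashboards
    return ("medium" if downstream else "low"), dashboards

# Two alerts arrive; rank them by impact rather than by arrival order.
for alert_table in ["analytics.session_stats", "staging.users"]:
    print(alert_table, suggest_priority(alert_table))
# → analytics.session_stats ('low', [])
# → staging.users ('high', ['tableau.exec_revenue'])
```

Ranking by downstream reach rather than by threshold distance is what lets the second alert jump ahead of the first.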
Investigating issue source with root cause analysis
From the same graph, Juan can now switch his view from downstream to upstream and investigate where this problem originated. He finds the furthest upstream table with issues that may be related to the one he’s investigating. By drilling into that table, Juan sees that the cardinality of the ‘user_email’ column has dropped dramatically, and he suspects that someone may have broken the ingestion of the source list. Having identified the pattern, he mutes all downstream alerts from the source table, allowing him to focus on resolving the issue without extra noise.
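One way to express “find the furthest upstream table with related issues” is to compute each ancestor’s hop distance with a breadth-first search over upstream edges and pick the flagged ancestor with the largest distance. This is a hypothetical heuristic with made-up table names and issue flags. Lineage shows where a problem could have come from, not proof that it did, so the result is a starting point for investigation rather than a verdict.

```python
from collections import deque

# Hypothetical lineage: node -> set of direct upstream nodes.
UPSTREAM = {
    "tableau.exec_revenue": {"analytics.user_daily"},
    "analytics.user_daily": {"staging.users", "staging.orders"},
    "staging.users": {"raw.signups"},
    "staging.orders": {"raw.orders"},
    "raw.signups": set(),
    "raw.orders": set(),
}
# Hypothetical set of tables that currently have open issues.
TABLES_WITH_ISSUES = {"analytics.user_daily", "staging.users", "raw.signups"}

def upstream_distances(table):
    """BFS over upstream edges; returns {ancestor: hops from `table`}."""
    dist, queue = {}, deque([(table, 0)])
    while queue:
        node, d = queue.popleft()
        for parent in UPSTREAM.get(node, ()):
            if parent not in dist:
                dist[parent] = d + 1
                queue.append((parent, d + 1))
    return dist

def root_cause_candidate(table):
    """Furthest-upstream ancestor that also has an open issue, or None."""
    dist = upstream_distances(table)
    flagged = [t for t in dist if t in TABLES_WITH_ISSUES]
    # Ties on distance are broken by name only so the result is deterministic.
    return max(flagged, key=lambda t: (dist[t], t), default=None)

print(root_cause_candidate("analytics.user_daily"))  # → raw.signups
```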
Triaging with row-level anomaly debugging
With the pattern confirmed, Juan can use Bigeye’s automatically generated debug query to investigate affected rows in the source table without leaving Bigeye. This gives him the context he needs to quickly fix the ingestion job that caused the issue.
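Bigeye generates its debug queries automatically, and their exact SQL isn’t shown in this post. As a hypothetical illustration, the sketch below loads synthetic rows into an in-memory SQLite table named `raw_signups`, computes distinct `user_email` values per load date, flags a day whose cardinality fell below an assumed 50% of the previous day’s, and then pulls the rows behind the drop. The table, columns, dates, and threshold are all assumptions made for the example.

```python
import sqlite3

# Synthetic data: day 1 has five distinct emails; on day 2 a (hypothetically)
# broken ingestion job writes the same placeholder email for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_signups (id INTEGER PRIMARY KEY, load_date TEXT, user_email TEXT)")
rows = [(i, "2024-01-01", f"user{i}@example.com") for i in range(1, 6)]
rows += [(i, "2024-01-02", "placeholder@example.com") for i in range(6, 11)]
conn.executemany("INSERT INTO raw_signups VALUES (?, ?, ?)", rows)

# Step 1: the metric that dropped, distinct user_email per load date.
cardinality = conn.execute(
    "SELECT load_date, COUNT(DISTINCT user_email) FROM raw_signups "
    "GROUP BY load_date ORDER BY load_date"
).fetchall()

# Step 2: flag days whose cardinality fell below DROP_RATIO of the previous day (assumed threshold).
DROP_RATIO = 0.5
anomalous = [
    day
    for (_, prev_n), (day, n) in zip(cardinality, cardinality[1:])
    if n < prev_n * DROP_RATIO
]

# Step 3: debug query, the rows on the anomalous day whose email value repeats.
DEBUG_SQL = (
    "SELECT id, load_date, user_email FROM raw_signups "
    "WHERE load_date = ? AND user_email IN ("
    "  SELECT user_email FROM raw_signups WHERE load_date = ? "
    "  GROUP BY user_email HAVING COUNT(*) > 1) "
    "ORDER BY id"
)
affected = conn.execute(DEBUG_SQL, (anomalous[0], anomalous[0])).fetchall() if anomalous else []

print(cardinality)    # → [('2024-01-01', 5), ('2024-01-02', 1)]
print(anomalous)      # → ['2024-01-02']
print(len(affected))  # → 5
```

Seeing five rows that all share one placeholder value points straight at the ingestion step rather than at any downstream transformation.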
Once the issue is resolved on the source table, Juan can use bulk actions to close the seven related issues on downstream tables in a single click.
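The bulk close can be thought of as scoping a batch update by lineage: every open issue on a table downstream of the fixed source gets closed together, while issues elsewhere are left alone. The `Issue` dataclass, the graph, and `bulk_close_downstream` below are hypothetical stand-ins for illustration, not Bigeye’s API.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Issue:
    """Hypothetical issue record: the table it was raised on and its status."""
    issue_id: int
    table: str
    status: str = "open"

# Hypothetical lineage: node -> set of direct downstream nodes.
DOWNSTREAM = {
    "raw.signups": {"staging.users"},
    "staging.users": {"analytics.user_daily"},
    "analytics.user_daily": {"tableau.exec_revenue"},
}

def downstream_of(table):
    """All nodes reachable by following downstream edges from `table`."""
    seen, queue = set(), deque([table])
    while queue:
        for nxt in DOWNSTREAM.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bulk_close_downstream(issues, source_table):
    """Close every open issue on a table downstream of `source_table`; return their ids."""
    scope = downstream_of(source_table)
    closed = []
    for issue in issues:
        if issue.table in scope and issue.status == "open":
            issue.status = "closed"
            closed.append(issue.issue_id)
    return closed

issues = [
    Issue(1, "raw.signups", status="resolved"),  # the source issue, already fixed
    Issue(2, "staging.users"),
    Issue(3, "analytics.user_daily"),
    Issue(4, "tableau.exec_revenue"),
    Issue(5, "raw.orders"),  # unrelated table: stays open
]
print(bulk_close_downstream(issues, "raw.signups"))  # → [2, 3, 4]
```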
Conclusion
Data engineering teams don’t have the time or resources to babysit data pipelines or write and maintain tests for every potential data quality issue in their environment. Even if they find a problem through a basic test, complexity and interdependence make it hard to properly prioritize the issue and resolve it. Bigeye gives data teams the tools they need to get clear insights into which issues are impacting the business, and how to fix them.