Company
September 14, 2023

Enabling self-serve data quality with Bigeye

Is "self-serve" data quality possible? Sure it is. Take a spin through the ways Bigeye enables data teams to self-serve the data they need.

Liz Elfman

Many data teams aim for "self-service", with analysts who have direct access to data and tools. But in practice, access alone doesn't always enable frictionless self-service.

Without proper data quality monitoring and governance, "self-service" wastes time and muddies the clarity that analytics are supposed to bring. Enabling comprehensive data quality capabilities for analysts is easier said than done.

Doing so requires an advanced, streamlined system for constantly tuning alerts, debugging root causes, and resolving issues. Without one, that work gets kicked back to the data platform and engineering teams, creating bottlenecks. So how can organizations make self-service data quality attainable?

The key is using technology to remove the manual toil from quality management and accelerate issue resolution. This is where Bigeye comes in.

How Bigeye enables self-serve data quality

Bigeye was designed to make it easy for data teams to monitor their data quality, set SLAs, and get notified of any issues before they turn into downstream problems. Some key capabilities that enable self-service data quality:

Metadata metrics

Bigeye's Metadata Metrics provide instant observability across your entire data warehouse by automatically tracking key operational metrics. They require zero manual configuration and are enabled as soon as you connect your data source. They work by scanning existing query logs in your data warehouse to monitor metrics like:

  • Time since the last table refresh
  • Rows inserted per day
  • Number of queries per day

This allows you to quickly detect common data pipeline issues like stale data, irregular data loads, and changes in table usage, without any additional load on your warehouse.
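To make that concrete, here is roughly the kind of metadata query such checks boil down to. This is an illustrative sketch against Snowflake's INFORMATION_SCHEMA, not Bigeye's internal implementation:

```sql
-- Illustrative only (not Bigeye's implementation): freshness and volume
-- signals pulled from warehouse metadata, here Snowflake's INFORMATION_SCHEMA.
SELECT
  table_schema,
  table_name,
  row_count,
  last_altered,
  DATEDIFF('hour', last_altered, CURRENT_TIMESTAMP()) AS hours_since_refresh
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
ORDER BY hours_since_refresh DESC;
```

Because queries like this read only metadata, they add essentially no load to the warehouse itself.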

Metadata metrics are a key part of Bigeye's "T-shaped monitoring" approach:

  • Broad coverage across all data via Metadata Metrics
  • In-depth monitoring on critical tables using suggested and custom metrics

Bigconfig templates

Bigconfig is Bigeye’s configuration-as-code tool that allows easy setup of comprehensive data monitoring across warehouses. It uses a simple YAML syntax to define tags, metrics, and monitoring collections.

Bigconfig empowers self-service data quality in two key ways:

Efficient monitoring setup

Analysts can instantiate monitoring for common data types like IDs, amounts, and timestamps with just a few lines of configuration. Tags identify field patterns across tables. Metrics apply checks like uniqueness and distributions to tagged columns. This removes the manual work of configuring every single column.
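As a sketch of what that configuration can look like (based on Bigconfig's documented YAML structure; the warehouse, schema, and tag names here are placeholders, so check Bigeye's docs for the exact schema):

```yaml
# Placeholder names throughout: tag every *_id column in a schema,
# then deploy null and duplicate checks to everything that matches.
type: BIGCONFIG_FILE
tag_definitions:
  - tag_id: ID_COLUMNS
    column_selectors:
      - name: my_warehouse.analytics.*.*_id
tag_deployments:
  - collection:
      name: Core Data Quality
    deployments:
      - tag_id: ID_COLUMNS
        metrics:
          - metric_type:
              predefined_metric: COUNT_NULL
          - metric_type:
              predefined_metric: COUNT_DUPLICATES
```

A dozen lines of YAML like this can cover every ID column the selector matches, which is the point: coverage scales with patterns, not with per-column effort.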

Custom business logic

Bigconfig makes it easy to define custom metrics using SQL snippets. Analysts can build specialized checks for business data without coding entire scripts. For example, they can validate expected values in JSON data or flag payment amounts that fall outside expected ranges.
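The SQL itself can be as simple as a predicate over the rows you care about. A hypothetical range check (table name and thresholds invented for illustration) might look like:

```sql
-- Hypothetical range check: count payments outside the expected range.
SELECT COUNT(*)
FROM payments
WHERE amount <= 0 OR amount > 100000;
```

Bigeye then tracks the result over time like any other metric, so a spike in out-of-range rows follows the same alerting path as a built-in check.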

Other benefits include:

  • Organizing metrics into collections for clear ownership
  • Adjusting monitoring as needs change by updating the config file
  • Deploying monitoring across environments using Infrastructure-as-Code tools
  • Letting metrics adapt intelligently over time through anomaly detection

With Bigconfig, analysts can set up and evolve data quality monitoring themselves. By coding metrics instead of rules, Bigeye removes friction from self-service governance. This means faster observability and accelerated delivery of analytics.

Autothresholds

A key challenge with manual threshold-based alerts is that they require constant tuning as data patterns change. With autothresholds, Bigeye removes this friction by analyzing historical data to calculate dynamic thresholds that adapt as trends evolve. Analysts don't have to manually define rigid thresholds that go stale.
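To illustrate the idea (this is a deliberately simplified stand-in, not Bigeye's actual model, which uses more sophisticated statistical techniques):

```sql
-- Simplified stand-in for an autothreshold: flag today's value if it
-- falls outside the trailing 30-day mean +/- 3 standard deviations.
WITH stats AS (
  SELECT
    AVG(metric_value)    AS mean_value,
    STDDEV(metric_value) AS std_value
  FROM daily_row_counts                 -- hypothetical metric history table
  WHERE metric_date >= CURRENT_DATE - 30
)
SELECT
  d.metric_date,
  d.metric_value,
  (d.metric_value < s.mean_value - 3 * s.std_value
   OR d.metric_value > s.mean_value + 3 * s.std_value) AS out_of_bounds
FROM daily_row_counts d
CROSS JOIN stats s
WHERE d.metric_date = CURRENT_DATE;
```

Because the bounds are recomputed from recent history, they move with the data instead of going stale.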

Anomaly detection

Bigeye uses advanced anomaly detection techniques to identify unexpected changes or unusual patterns in data. This is a key capability for enabling effective self-service data quality.

Simple threshold-based alerts require manual setup and tuning. With potentially thousands of metrics to monitor, this doesn't scale. Bigeye's anomaly detection is automated, adapting to changing data patterns over time.

Additionally, accurate anomaly detection finds subtle issues missed by basic methods. It understands trends and seasonality in data. Bigeye uses multiple statistical models to minimize false positives and false negatives.

The system also learns from user feedback, improving over time, and anomalies are excluded from baselines so that noisy data doesn't skew future detection.

Easy root cause detection

When anomalies are found, Bigeye provides insights for rapid investigation:

  • Root cause analysis shows upstream data lineage and related issues.
  • Impact analysis reveals how downstream data may be affected.
  • Timelines, graphs, and sample queries assist with debugging.

This means analysts can easily diagnose and resolve many data issues without engineering support. If needed, they can clearly describe the problem to route to the right team.

BI integrations

Bigeye integrates directly with Tableau to provide visibility into report usage and data lineage. This empowers analysts to quickly trace data issues impacting business intelligence. For example, tables used in Tableau are mapped to backend warehouse sources. Analysts can trace report failures back to underlying data issues through interactive lineage graphs.

Bigeye also displays popularity metrics for Tableau reports and dashboards. Analysts get visibility into the most accessed visualizations to prioritize critical data flows.

With these capabilities, Bigeye provides the baseline data quality and trust needed for advanced self-service analytics. Users have the flexibility to define rules and get alerts tailored to their specific needs, while also benefiting from organization-wide standards, governance, and reliability. Let's walk through some examples.

Financial data

Many companies rely on Stripe transaction data to make key business decisions around revenue, refunds, fraud, and more. Bigeye makes it easy to monitor critical aspects of Stripe data quality and integrity. Its Stripe Bigconfig template provides out-of-the-box monitoring collections for the following (one of these checks is sketched in SQL after the list):

General data integrity metrics

  • Unique transaction IDs across tables
  • Non-null ID fields
  • Proper join integrity between transaction tables

Balance transaction integrity

  • Valid currency codes
  • Reasonable transaction amounts
  • Matching amounts for charges and refunds

Accurate sales data metrics

  • Number of successful sales and refunds
  • Chargebacks and disputed transactions
  • Invoice totals match charges
  • Fee amounts in expected ranges

Custom business metrics

  • Revenue totals by currency, product, geo
  • Refund rates by product line
  • Average subscription fee over time
  • Expected JSON fields and values
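As a concrete example of the "matching amounts" check above, the underlying SQL might look like this (the table and column names are hypothetical, not Stripe's actual schema):

```sql
-- Hypothetical consistency check: refunds larger than the charge they reference.
SELECT
  r.refund_id,
  r.charge_id,
  r.amount AS refund_amount,
  c.amount AS charge_amount
FROM refunds r
JOIN charges c ON c.charge_id = r.charge_id
WHERE r.amount > c.amount;
```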

CRM data

Another common data source that teams want to monitor is CRM data, e.g. from HubSpot. Bigeye’s HubSpot Bigconfig template makes it easy to monitor things like the following (an example check is sketched after the list):

  • Validating that primary keys like company IDs and contact IDs are unique and not null, and that relationships stay intact across tables (e.g., contacts to companies).
  • Validating that key fields like industry, lead source, conversion events, tech stack, segments, and deal stage/type are the expected values and in the expected ranges.
  • If integrating external enrichment data, validating that these fields stay in sync.
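For the "expected values" checks, a categorical validation can be a one-liner. For example (table and stage names invented for illustration):

```sql
-- Hypothetical categorical check: count deals in an unrecognized stage.
SELECT COUNT(*)
FROM deals
WHERE deal_stage NOT IN ('prospect', 'qualified', 'proposal', 'closed_won', 'closed_lost');
```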

Final thoughts

High-performance self-service analytics relies on trusted, high-quality data. Otherwise, analysts spend their time fighting data issues rather than driving insights. Bigeye provides a flexible, collaborative platform that empowers users to take control of their data quality. The result is happier analysts, faster insights, and analytics that scale across the business.

Try out Bigeye's self-service approach to data quality by requesting a demo today.

