The Data Letter

When Your Tests Pass But Your Data Fails

Testing code isn't enough. Test your data too.

Hodman Murad
Dec 04, 2025

Most data pipeline failures aren’t code failures. They’re data failures.

Here’s the testing framework that catches them.

Every Data Team Hits This Testing Gap

We write unit tests for our Python functions. We write integration tests for our API endpoints. We’re good engineers. So why do data disasters still happen?

Because traditional software testing assumes a stable environment. Your code runs the same way every time, but data pipelines operate in unstable environments:

Data failures.

Schema drift.

Freshness decay.

Volume anomalies.

Here’s what I’ve learned after debugging too many of these: Code tests verify your logic. Data quality tests verify reality hasn’t changed unexpectedly.

Four Testing Patterns Your Data Pipeline Needs

After years of building pipelines that became production systems, I’ve converged on four testing patterns. Each catches different failure modes:

Business Logic Tests test that your transformations implement business rules correctly. These are your traditional unit tests, but focused on domain logic rather than language features.
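As a minimal sketch of what a business logic test can look like, the example below checks a made-up discount rule. The function `net_order_total`, the "gold" tier, and the 10% threshold rule are all hypothetical, invented here for illustration; the point is only that the assertions encode the business rule and its boundaries.

```python
from decimal import Decimal


# Hypothetical business rule (not from the article): gold-tier customers
# get 10% off orders of 100.00 or more; everyone else pays full price.
def net_order_total(gross: Decimal, tier: str) -> Decimal:
    if tier == "gold" and gross >= Decimal("100.00"):
        return (gross * Decimal("0.90")).quantize(Decimal("0.01"))
    return gross


def test_gold_discount_applies_at_threshold():
    # The boundary value itself is where off-by-one rule errors hide.
    assert net_order_total(Decimal("100.00"), "gold") == Decimal("90.00")


def test_no_discount_just_below_threshold():
    assert net_order_total(Decimal("99.99"), "gold") == Decimal("99.99")


def test_no_discount_for_other_tiers():
    assert net_order_total(Decimal("250.00"), "silver") == Decimal("250.00")
```

The tests are written as plain functions so a runner such as pytest can collect them, but they can also be called directly.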

Data Quality Tests validate that your actual data meets expectations for freshness, volume, and schema stability. These catch environmental changes your code tests can’t see.
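One way to picture freshness, volume, and schema checks is a single function that inspects a batch and reports what is wrong with it. This is a sketch using only the standard library; `check_data_quality`, `EXPECTED_SCHEMA`, the column names, and the default thresholds are assumptions made up for the example, not part of any particular framework.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an "orders" batch: column name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "loaded_at": datetime}


def check_data_quality(rows, now, max_age=timedelta(hours=24), min_rows=1,
                       schema=EXPECTED_SCHEMA):
    """Return a list of failure messages; an empty list means the batch passed."""
    failures = []

    # Volume: an empty or tiny batch often means an upstream job failed quietly.
    if len(rows) < min_rows:
        failures.append(f"volume: expected at least {min_rows} rows, got {len(rows)}")

    # Schema: every row must have exactly the expected columns and types.
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            failures.append(f"schema: row {i} has columns {sorted(row)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                failures.append(
                    f"schema: row {i} column {column!r} is "
                    f"{type(row[column]).__name__}, expected {expected_type.__name__}"
                )

    # Freshness: the newest record must be recent enough.
    timestamps = [r["loaded_at"] for r in rows
                  if isinstance(r.get("loaded_at"), datetime)]
    if timestamps and now - max(timestamps) > max_age:
        failures.append(f"freshness: newest row is {now - max(timestamps)} old")

    return failures


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    batch = [{"order_id": 1, "amount": 19.99, "loaded_at": now - timedelta(hours=2)}]
    print(check_data_quality(batch, now))
```

Returning a list of messages rather than raising on the first problem is a design choice: one run then reports every violated expectation at once. The sketch assumes all timestamps are timezone-aware.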

Statistical Tests detect gradual drift in your data distributions. Where volume tests catch sudden spikes, statistical tests catch the slow degradation that precedes disaster.
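To make the idea of drift detection concrete, here is one simple option among many: compare the mean of the current batch against a trusted baseline and flag the batch when the shift is large relative to the baseline's spread. The helper names and the threshold of 3 are assumptions chosen for illustration; other techniques (for example, distribution-distance measures) may fit real data better.

```python
import math
from statistics import mean, stdev


def mean_shift_z(baseline, current):
    """z-score of the current batch mean, measured against the baseline's spread."""
    standard_error = stdev(baseline) / math.sqrt(len(current))
    shift = mean(current) - mean(baseline)
    if standard_error == 0:
        # A constant baseline: any shift at all is infinitely surprising.
        return 0.0 if shift == 0 else math.copysign(math.inf, shift)
    return shift / standard_error


def has_drifted(baseline, current, threshold=3.0):
    """Flag the batch when its mean sits more than `threshold` standard errors away."""
    return abs(mean_shift_z(baseline, current)) > threshold


# Illustrative numbers only: every value in `current` falls inside the range
# seen in `baseline`, so a per-row range check would pass, yet the average
# has crept upward.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
current = [11, 12, 11, 12, 11, 12, 11, 12, 11, 12]
drift_detected = has_drifted(baseline, current)
```

With these numbers the baseline mean is 10 and the current mean is 11.5, which works out to a z-score of roughly 4.1, so `drift_detected` is `True` even though no single value looks abnormal.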

Golden Dataset Tests verify your entire pipeline works end-to-end using small, known datasets with expected outputs. Integration testing without the pain of production data dependencies.
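A golden dataset test can be sketched as a tiny hand-built input, the output you expect from it, and one assertion that runs the whole pipeline. The `run_pipeline` function below is a hypothetical stand-in (drop invalid rows, de-duplicate, total revenue per customer), and the records are invented for the example; in practice the golden input would cover the edge cases your own pipeline has to survive.

```python
def run_pipeline(raw_orders):
    """Hypothetical pipeline: drop invalid rows, de-duplicate, sum cents per customer."""
    seen_order_ids = set()
    totals = {}
    for order in raw_orders:
        if order["amount_cents"] <= 0:
            continue  # invalid amount
        if order["order_id"] in seen_order_ids:
            continue  # duplicate delivery of the same order
        seen_order_ids.add(order["order_id"])
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount_cents"]
    return totals


# Small, known input that deliberately includes the awkward cases.
GOLDEN_INPUT = [
    {"order_id": 1, "customer": "a", "amount_cents": 1000},
    {"order_id": 1, "customer": "a", "amount_cents": 1000},  # duplicate
    {"order_id": 2, "customer": "a", "amount_cents": 250},
    {"order_id": 3, "customer": "b", "amount_cents": -500},  # invalid
    {"order_id": 4, "customer": "b", "amount_cents": 700},
]

# Expected output, worked out by hand.
GOLDEN_EXPECTED = {"a": 1250, "b": 700}


def test_pipeline_matches_golden_output():
    assert run_pipeline(GOLDEN_INPUT) == GOLDEN_EXPECTED
```

Because the input is small enough to verify by hand, a failing assertion points at a behavior change in the pipeline rather than at a change in production data.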

You need all four. Each catches failures the others miss.

Most teams start with business logic tests because that’s what we know from software engineering. Then they hit their first silent failure and realize they need data quality tests. Then they catch drift too late and add statistical monitoring. Then they struggle with integration testing and discover golden datasets.

Here are all four patterns with working code examples. Each pattern addresses a specific failure mode your current tests miss.
