When Your Tests Pass But Your Data Fails
Testing code isn't enough. Test your data too.
Most data pipeline failures aren’t code failures. They’re data failures.
Here’s the testing framework that catches them.
Every Data Team Hits This Testing Gap
We write unit tests for our Python functions. We write integration tests for our API endpoints. We’re good engineers. So why do data disasters still happen?
Because traditional software testing assumes a stable environment. Your code runs the same way every time, but data pipelines operate in unstable environments, where the inputs themselves can change from one run to the next.
Here’s what I’ve learned after debugging too many of these: code tests verify your logic, while data quality tests verify that reality hasn’t changed unexpectedly.
Four Testing Patterns Your Data Pipeline Needs
After years of building pipelines that became production systems, I’ve converged on four testing patterns. Each catches different failure modes:
Business Logic Tests verify that your transformations implement business rules correctly. These are traditional unit tests, but focused on domain logic rather than language features.
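A minimal sketch of a business logic test, written with plain `assert` statements so it runs under pytest or as a script. The `Order` record and the rule that refunded orders don’t count toward revenue are hypothetical examples, not the author’s code:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int
    status: str  # hypothetical values: "paid" or "refunded"


def net_revenue_cents(orders: list[Order]) -> int:
    """Hypothetical business rule: refunded orders never count toward revenue."""
    return sum(o.amount_cents for o in orders if o.status != "refunded")


# These tests pin the business rule itself, not the behavior of Python's sum().
def test_refunded_orders_are_excluded() -> None:
    orders = [Order("a", 1000, "paid"), Order("b", 500, "refunded")]
    assert net_revenue_cents(orders) == 1000


def test_empty_input_yields_zero() -> None:
    assert net_revenue_cents([]) == 0


if __name__ == "__main__":
    test_refunded_orders_are_excluded()
    test_empty_input_yields_zero()
    print("business logic tests passed")
```

The test names describe the rule in domain terms, so a failure reads as "refunds leaked into revenue" rather than "sum returned the wrong number."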
Data Quality Tests validate that your actual data meets expectations such as freshness, volume, and schema stability. These catch environmental changes your code tests can’t see.
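The three checks named above can be sketched with the standard library alone. The row format, the `EXPECTED_SCHEMA` mapping, and the thresholds (at most six hours stale, between 1 and 1,000,000 rows) are hypothetical placeholders; real values would come from your own expectations for each table:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an orders table: column name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": str, "amount_cents": int, "created_at": datetime}


def check_freshness(rows: list, max_age: timedelta, now: datetime) -> bool:
    """The newest record must be no older than max_age."""
    newest = max(r["created_at"] for r in rows)
    return now - newest <= max_age


def check_volume(rows: list, min_rows: int, max_rows: int) -> bool:
    """Row count must fall inside the expected range."""
    return min_rows <= len(rows) <= max_rows


def check_schema(rows: list, schema: dict) -> bool:
    """Every row has exactly the expected columns, each with the expected type."""
    return all(
        set(r) == set(schema) and all(isinstance(r[c], t) for c, t in schema.items())
        for r in rows
    )


def run_data_quality_checks(rows: list, now: datetime) -> dict:
    """Return check name -> pass/fail; a caller would halt the pipeline on any False."""
    return {
        # An empty batch cannot be fresh, so short-circuit before calling max().
        "freshness": bool(rows) and check_freshness(rows, timedelta(hours=6), now),
        "volume": check_volume(rows, min_rows=1, max_rows=1_000_000),
        "schema": check_schema(rows, EXPECTED_SCHEMA),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
    batch = [{"order_id": "a", "amount_cents": 1000, "created_at": now - timedelta(hours=1)}]
    print(run_data_quality_checks(batch, now))
```

Note that none of these checks looks at the transformation code. A batch can fail every one of them while all the unit tests still pass.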
Statistical Tests detect gradual drift in your data distributions. Volume tests catch sudden spikes; statistical tests catch the slow degradation that precedes disaster.
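One common way to quantify gradual drift is the Population Stability Index (PSI), which compares the share of values per bin in a baseline sample against a current sample. The source doesn’t say which statistic it relies on, so treat this as one option among several (a Kolmogorov-Smirnov test is another). The bin edges, the sample data, and the 0.2 alert threshold are assumptions; 0.2 is a widely used rule of thumb, not a universal constant:

```python
from __future__ import annotations

import math
from bisect import bisect_right


def histogram_shares(values: list[float], edges: list[float]) -> list[float]:
    """Fraction of values per bin; sorted interior `edges` define len(edges) + 1 bins."""
    if not values:
        raise ValueError("cannot compute bin shares of an empty sample")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    baseline: list[float], current: list[float], edges: list[float], eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (cur - base) * ln(cur / base).

    `eps` floors empty bins so the logarithm stays finite.
    """
    base = histogram_shares(baseline, edges)
    cur = histogram_shares(current, edges)
    psi = 0.0
    for b, c in zip(base, cur):
        b, c = max(b, eps), max(c, eps)
        psi += (c - b) * math.log(c / b)
    return psi


PSI_ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff for "significant" drift

if __name__ == "__main__":
    baseline = list(range(100))             # hypothetical: last month's values
    current = [v + 40 for v in range(100)]  # same metric after a slow upward shift
    edges = [25, 50, 75]                    # quartile edges of the baseline
    psi = population_stability_index(baseline, current, edges)
    print(f"PSI = {psi:.3f}, drift = {psi > PSI_ALERT_THRESHOLD}")
```

Both samples contain exactly 100 rows, so a volume check would pass. PSI flags the shift because it compares the shape of the distribution, not the row count.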
Golden Dataset Tests verify your entire pipeline works end-to-end using small, known datasets with expected outputs. This gives you integration testing without the pain of production data dependencies.
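A golden dataset test might look like this. `run_pipeline` is a hypothetical stand-in for a real pipeline entry point, and the records are invented for illustration. In practice, the golden input and expected output would more likely live as small fixture files under version control than as inline literals:

```python
from __future__ import annotations

from collections import defaultdict


def run_pipeline(raw_rows: list[dict]) -> list[dict]:
    """Hypothetical pipeline: normalize status, keep paid orders, sum revenue per day."""
    cleaned = [{**r, "status": r["status"].strip().lower()} for r in raw_rows]
    paid = [r for r in cleaned if r["status"] == "paid"]
    daily: dict[str, int] = defaultdict(int)
    for r in paid:
        daily[r["day"]] += r["amount_cents"]
    return [{"day": d, "revenue_cents": daily[d]} for d in sorted(daily)]


# Small, hand-checked input covering the edge cases we care about:
# inconsistent status strings and a refund that must be dropped.
GOLDEN_INPUT = [
    {"day": "2024-01-01", "amount_cents": 1000, "status": "paid"},
    {"day": "2024-01-01", "amount_cents": 500, "status": " Refunded "},
    {"day": "2024-01-02", "amount_cents": 250, "status": "PAID"},
]

# The output a human verified by hand for GOLDEN_INPUT.
GOLDEN_EXPECTED = [
    {"day": "2024-01-01", "revenue_cents": 1000},
    {"day": "2024-01-02", "revenue_cents": 250},
]


def test_pipeline_matches_golden_output() -> None:
    assert run_pipeline(GOLDEN_INPUT) == GOLDEN_EXPECTED


if __name__ == "__main__":
    test_pipeline_matches_golden_output()
    print("golden dataset test passed")
```

Because the input is tiny and fully known, any difference from the expected output points at a change in pipeline behavior rather than a change in production data.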
You need all four. Each catches failures the others miss.
Most teams start with business logic tests because that’s what we know from software engineering. Then they hit their first silent failure and realize they need data quality tests. Then they catch drift too late and add statistical monitoring. Then they struggle with integration testing and discover golden datasets.
Here are all four patterns with working code examples. Each pattern addresses a specific failure mode your current tests miss.
Subscribe to The Data Letter to keep reading this post and get 7 days of free access to the full post archives.

