‘It Works on My Laptop’ Is Not a Product Strategy
A Tale of Two Disciplines in Building Data Products
Organizations hire data scientists to discover what’s possible and engineers to keep systems running, then act surprised when these groups struggle to hand off work. The collision stems from asking two professions with opposing optimization criteria to share artifacts without designing how that exchange should work.
Understanding this as an interface design problem rather than a coordination failure opens the path to systematic solutions.
Welcome back to The Data Letter. 👋🏿👋🏿👋🏿 I’m Hodman! Here are some recent articles you may have missed:
How to Detect Model Drift When You Can’t Measure Performance - A four-pillar framework for monitoring ML models in production when ground truth labels arrive too late to prevent damage.
The Machine Learning Reality Gap - A practical maturity assessment that reveals where your MLOps capabilities actually stand and provides concrete actions to progress to the next stage.
When Your Tests Pass But Your Data Fails - The four testing patterns data pipelines need beyond unit tests: freshness monitoring, volume checks, statistical drift detection, and golden dataset validation.
A Tale of Two Disciplines
To understand why this handoff breaks down so predictably, we need to examine what each discipline actually optimizes for.
Data Science and Analytics: Discovery and Validation
Data scientists, ML researchers, and analytics professionals work to prove what’s possible. Their core deliverable is insight or validated capability. Interactive development environments (whether Jupyter notebooks, R Markdown, or equivalent tooling) enable the rapid iteration, visual feedback loops, and exploratory interrogation that drive their work forward. They’re done when the hypothesis holds, the model performs, or the analysis produces something actionable. Projects have clear beginnings and ends.
This mirrors academic and scientific methodology, where the goal is to produce a correct answer to a well-defined problem. You validate under controlled conditions, measure against benchmarks (AUC, RMSE, statistical significance), and move to the next question once you’ve proven your point.
Data Engineering and Platform: Reliability and Scale
Data engineers, ML engineers, and platform teams operate under different paradigms. They build for continuity. What they ship today must still function next quarter when context has faded, and the original author has moved teams. Their artifacts (services, pipelines, packaged components) are designed for observation and recovery, not just initial correctness.
Software engineering principles dominate here: build systems that degrade gracefully, instrument everything, assume the unexpected will happen. Measuring success means tracking uptime, monitoring latency distributions, and minimizing time to recovery when things break. Correctness matters, but sustained operation under real-world conditions matters more.
The Core Conflict
Both capabilities are necessary for data product development. The structural problem emerges when organizations treat the primary artifact of data science (the exploratory notebook) as a suitable input for engineering workflows without deliberate translation. A notebook demonstrating that a model achieves target performance is categorically different from a deployable service. Most of the well-documented friction in getting models to production lives in this translation zone.
Where the Translation Fails
You’ll recognize these patterns if you’ve ever moved a model from a notebook into production.
Reproducibility breaks down first, because exploratory development and production deployment operate under opposing assumptions: in exploration, the environment itself becomes an artifact. The model that works on your laptop depends not just on the code you wrote, but on months of accumulated state that shaped how that code runs. Handing this off means explaining what you built and reconstructing everything it relied on to function. This is where ‘it runs in my environment’ becomes ‘it won’t run in yours.’
Transferring this to production means archaeological reconstruction: tracing which dependencies actually matter, determining which versions interact safely, and uncovering what configuration the model requires. Production systems demand reproducible builds where containerized environments, locked dependencies, and externalized configuration eliminate ambiguity.
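As a small illustration of what externalized configuration can look like, here is a minimal Python sketch, assuming hypothetical MODEL_PATH and SCORE_THRESHOLD settings: runtime values come from the environment rather than from whatever happened to be defined in a notebook session.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Runtime settings read from the environment, not from notebook globals."""
    model_path: str
    score_threshold: float


def load_config() -> ServiceConfig:
    # Fail fast if a required setting is missing, instead of silently falling
    # back to whatever state accumulated on the original author's laptop.
    return ServiceConfig(
        model_path=os.environ["MODEL_PATH"],  # hypothetical setting name
        score_threshold=float(os.environ.get("SCORE_THRESHOLD", "0.5")),
    )
```

The point is not the specific mechanism (environment variables, a config service, or mounted files all work); it is that configuration becomes explicit and reconstructible instead of implicit in someone’s local setup.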
Integration costs compound next. Notebook code runs top-to-bottom with variables and data living in shared memory throughout the session. This makes experimentation fast because you can tweak one section and rerun without rebuilding everything from scratch. Deploying this requires transformation into modular packages with explicit boundaries, comprehensive tests, and programmatic entry points that schedulers can invoke. The refactoring burden often exceeds the original development effort.
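To make the shape of that transformation concrete, here is a hedged sketch; the module name, column names, and CLI flags are illustrative, not taken from any particular project. The notebook logic becomes a pure function plus a programmatic entry point that a scheduler such as Airflow or cron can invoke.

```python
# scoring/job.py — notebook logic refactored into an importable, schedulable unit.
import argparse

import pandas as pd


def score_customers(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Pure transformation: explicit inputs and outputs, no hidden notebook state."""
    scored = df.copy()
    scored["high_value"] = scored["predicted_spend"] > threshold  # illustrative column
    return scored


def main() -> None:
    # Programmatic entry point an orchestrator can call: `python -m scoring.job ...`
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--threshold", type=float, default=100.0)
    args = parser.parse_args()

    df = pd.read_csv(args.input)
    score_customers(df, args.threshold).to_csv(args.output, index=False)


if __name__ == "__main__":
    main()
```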
Ownership remains undefined throughout. Data science considers a project complete when the model’s performance meets thresholds. Engineering inherits perpetual operational responsibility: serving infrastructure, monitoring dashboards, retraining pipelines, and drift detection. Without explicit handoff protocols, accountability becomes clear only when production incidents demand answers about who fixes what.
Architecting the Interface: From Conflict to Contract
The solution requires formalizing the handoff between disciplines through explicit specification rather than attempting to homogenize their working styles.
Define the Handoff Contract
Co-author a minimal production-readiness specification with your engineering counterparts. This document replaces arguments about whether code is ‘production ready enough’ with concrete, measurable requirements. Include:
Packaged code following consistent project structure conventions
Dependency declarations through pyproject.toml or requirements.txt with pinned versions
Input and output schemas using Pydantic models or equivalent type systems (see the sketch after this list)
Performance baselines and acceptance thresholds
Test coverage for core logic paths
Documentation of resource requirements and expected failure modes
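As one way to make the schema requirement concrete, here is a minimal Pydantic sketch; the field names and bounds are illustrative assumptions, not a prescribed contract.

```python
from pydantic import BaseModel, Field


class ScoringRequest(BaseModel):
    """Input contract: the shape the service promises to accept."""
    customer_id: str
    features: list[float]


class ScoringResponse(BaseModel):
    """Output contract: what downstream consumers can rely on."""
    customer_id: str
    score: float = Field(..., ge=0.0, le=1.0)  # bounded score, illustrative constraint
    model_version: str
```

Because the schemas are code rather than prose, CI can validate them and engineering can build serving layers against them without reverse-engineering the notebook.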
With a clear contract, handoff decisions become binary. Does the model satisfy the specification? Then it’s ready. Does it fall short? Then specific gaps must be resolved before deployment proceeds.
Build the Paved Road
Engineering teams create infrastructure where compliance becomes the path of least resistance. This means:
Standardized project scaffolding through Cookiecutter or Copier templates that implement production patterns from initialization
Automated CI pipelines that validate contract requirements on every commit without manual review (a sketch follows this list)
Staging environments mirroring production configuration, enabling integration testing before deployment
Reusable abstractions for recurring patterns: model serving frameworks, structured logging, observability integrations
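To show what ‘validate contract requirements’ could look like as a CI step, here is a hypothetical gate that reads a metrics file produced during evaluation and fails the build when an agreed threshold is missed. The file name, metric keys, and threshold values are assumptions, not a standard.

```python
# check_contract.py — a hypothetical CI step that enforces the performance baseline.
import json
import sys

# Acceptance thresholds agreed in the handoff contract (illustrative values).
THRESHOLDS = {"auc": 0.85, "precision_at_10": 0.60}


def main() -> int:
    with open("metrics.json") as f:  # written by the training/evaluation step
        metrics = json.load(f)

    failures = [
        f"{name}: {metrics.get(name)} < required {minimum}"
        for name, minimum in THRESHOLDS.items()
        if metrics.get(name, float("-inf")) < minimum
    ]

    if failures:
        print("Contract check failed:\n" + "\n".join(failures))
        return 1  # non-zero exit fails the CI job
    print("All contract thresholds met.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```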
When infrastructure makes the right approach the easy approach, data science teams concentrate on discovery and validation while production tooling handles operational concerns.
Building Products That Ship
For experienced teams, the next competitive edge comes from how deliberately they architect their handoffs. Technical sophistication matters less than interface design quality.
Your container orchestration platform and feature store might not be your most valuable infrastructure. Often, it’s the shared contract language and standardized handoff process between data science and engineering that determines whether products ship reliably. Teams that architect this interface deliberately, with clear roles, explicit contracts, and infrastructure bridging the gap, build data products that deliver both innovation and operational stability.