dbt vs. Dataform: Which Should You Choose in 2026?
Three months ago, I sat in a conference room watching a data team explain why their feature engineering pipeline was broken. They’d built everything in dbt, carefully orchestrating transformations that fed their fraud detection models. Then a consultant convinced leadership that Dataform was the future because it was native to BigQuery and free. The migration took two months. The ML models stopped retraining correctly for another three weeks after that because nobody realized that Dataform’s JavaScript templating worked differently from dbt’s Jinja. The engineering time and the fraud that slipped through while models went stale cost far more than any licensing fees. Understanding which tool actually solves your problems matters enormously when you’re building production ML systems, where the wrong infrastructure choice can lead to broken pipelines and missed model retraining windows.
Hey there! 👋🏿 I’m Hodman, and I help teams build reliable data infrastructure.
Both tools solve the same fundamental problem, but their approaches differ in ways that matter for your infrastructure.
Core Capabilities
dbt is an open-source tool that lets data analysts and engineers transform data in their warehouse by defining transformations as SQL models and managing the dependencies between them. dbt compiles and runs your analytics code against your data platform, giving your team a single source of truth for metrics, insights, and business definitions. It has become what many consider the industry standard for data transformation, with a massive community providing packages, integrations, and proven patterns.
Dataform takes a different philosophical approach. Dataform is an ELT service, fully integrated into BigQuery, that allows analytics teams to develop, test, and schedule complex SQL workflows, serving as the orchestrator for data transformation happening in BigQuery via SQL. While dbt pairs SQL with Jinja templating, Dataform pairs SQL with JavaScript (via its SQLX file format) running on top of BigQuery. The JavaScript approach offers more powerful scripting for dynamic model generation, but it also commits you to the Google Cloud Platform ecosystem.
The deployment models differ in ways that matter for both your budget and your infrastructure. dbt Cloud is a fully managed service for developing, testing, scheduling, and investigating data models through a web-based UI, and it typically costs around $100 per user per month. Dataform is provided as part of Google Cloud Platform, where you can trigger SQL workflows manually or schedule them via Cloud Composer, Workflows, BigQuery Studio’s data pipelines, or third-party services. Dataform itself has no licensing cost, though you pay for the BigQuery compute that executes your queries. The “free” label trips up teams constantly because they forget that compute costs scale with query complexity and frequency.
dbt supports data quality tests and unit tests for SQL logic, enabling you to validate the accuracy and reliability of your transformations in a structured, automated way. It offers core capabilities such as modular code development, automated testing, integrated CI/CD pipelines, built-in documentation, and clear data lineage tracking. With Dataform, data teams manage their SQL code and data asset definitions using software engineering best practices, such as version control, environments, testing, and documentation. Both tools handle dependency management automatically, but the ecosystem maturity around each tool determines how well they integrate into your broader data platform.
Data Engineering: Testing, Orchestration, and CI/CD
Production data pipelines require robust testing, clear documentation, and integration with orchestration systems that can handle complex dependencies and scheduling requirements.
dbt empowers analysts to build and manage data pipelines using SQL, reducing the need for dedicated data engineers. It uses an ELT approach that pushes transformations directly to the data warehouse, improving speed, scalability, and maintainability. dbt champions software engineering practices such as testing, documentation, and reusable macros, making it ideal for transformations that live primarily in the data warehouse. The testing framework is mature and extensible. You can write custom tests, leverage community packages for common patterns like data quality checks, and build sophisticated validation that runs as part of your CI/CD pipeline.
dbt tracks dependencies between models, ensuring that when you update one model, dependent models automatically re-run, keeping your transformations up to date. Every change to a dbt project is tracked, allowing you to roll back changes or view the complete history of transformations and data model updates. dbt automatically generates documentation that describes the models, sources, and relationships within your data pipeline, which can be viewed in a web-based UI, providing a visual map of your entire data transformation process. This lineage tracking becomes critical when you’re debugging why a downstream model produces unexpected results.
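dbt derives this execution order from the `ref()` calls in each model: parsing them yields a DAG, and models run in an order where every dependency finishes before its dependents. A minimal sketch of that idea in Python, with hypothetical model names:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each model maps to the models it ref()s.
# dbt builds an equivalent DAG by parsing ref() calls in your SQL.
models = {
    "stg_payments": set(),
    "stg_orders": set(),
    "fct_orders": {"stg_orders", "stg_payments"},
    "customer_features": {"fct_orders"},
}

# A topological sort gives a valid run order: staging models first,
# then fct_orders, then anything built on top of it.
run_order = list(TopologicalSorter(models).static_order())
print(run_order)
```

This is also why an update to `stg_orders` triggers re-runs of everything downstream: the tool just walks the DAG forward from the changed node.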
For teams using multiple data warehouses, dbt’s adapter architecture provides flexibility: adapters exist for warehouses across AWS, Azure, and GCP, including Redshift, Snowflake, Databricks, and BigQuery, making it easy to integrate with modern data infrastructure.
Dataform offers direct integration with BigQuery and handles the operational infrastructure to update your tables based on dependencies between your tables and the latest version of your code, with lineage and data information tracked seamlessly through Dataform integrations. Dataform serves as an orchestrator, with all heavy lifting handled by BigQuery, and provides scheduling capabilities to run ML pipelines at regular cadence.
Both tools work with Git repositories for version control. With Dataform, you can connect your repository to third-party providers such as GitHub and GitLab, and commit changes and push code for review from your web browser. dbt has more mature CI/CD workflows documented in the community due to its longer history and broader adoption across different cloud platforms.
ML and Data Science: Feature Pipelines and Model Training
This is where the comparison becomes critical for teams building production ML systems. Your transformation tool is the foundation of your feature engineering pipeline, and the architectural choices you make here determine whether you can ship models in weeks or get stuck in months of infrastructure work.
The fundamental challenge in ML feature engineering is consistency between training and serving. Data science teams often end up performing their own feature engineering, duplicating much of the data engineering team’s work and maintaining their own pipelines, because data scientists primarily work in Python. It doesn’t make sense to force them to use SQL or Java/Scala just to align with the data engineering team. This separation creates the risk of training-serving skew, where feature logic differs between model training and production inference, causing models that perform well in development to fail in production.
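The fix for training-serving skew is structural, not procedural: define the feature logic exactly once and have both paths call it. A toy sketch, with all names hypothetical:

```python
# One version-controlled definition of feature logic, consumed by both the
# training pipeline and the serving path. In a warehouse-centric setup this
# role is played by a dbt or Dataform model; here it's a plain function.
def transaction_features(txn: dict) -> dict:
    """Derive model features from a raw transaction record."""
    return {
        "is_foreign": int(txn["merchant_country"] != txn["card_country"]),
        "is_night": int(txn["hour"] < 6 or txn["hour"] >= 22),
    }

raw = {"merchant_country": "DE", "card_country": "US", "hour": 23}

# Training and serving call the same code, so the logic cannot drift apart.
train_row = transaction_features(raw)
serve_row = transaction_features(raw)
```

When the logic instead lives in two codebases (SQL for the warehouse, Python for serving), every edit must be made twice, and skew is only a matter of time.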
dbt is a powerful tool for feature engineering in data warehouses, enabling teams to build features modularly, collaborate effectively, and maintain data pipelines over time. Feature selection and feature engineering are crucial steps in any data science project: raw variables are turned into meaningful, relevant features for machine learning models. But businesses often deal with extensive, complex data, and ad hoc feature logic quickly becomes a maintenance nightmare. Using dbt as the feature transformation layer means you write feature logic once as version-controlled SQL models, and both training and inference pipelines consume the same transformations.
The practical workflow looks like this: dbt can orchestrate BigQuery ML workflows that handle feature engineering, model training, and predictions, with the dbt execution engine running steps in the correct order and enabling scheduling of model training and serving using the same infrastructure that drives analytics. You create dbt models that generate feature tables with proper timestamps and entity keys. When training a model, data scientists query these feature tables to build training datasets. When serving predictions, inference pipelines query the same tables, filtering for recent data. If macros are defined in the training pipeline to do feature engineering, they can be reused to transform data into the same shape for serving predictions.
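The shape of such a feature table, and the two ways it gets queried, can be sketched with in-memory rows (entity keys, timestamps, and values are all hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical feature table as dbt would materialize it:
# one row per (entity key, feature timestamp).
feature_table = [
    {"customer_id": 1, "feature_ts": datetime(2026, 1, 1), "txn_count_30d": 4},
    {"customer_id": 1, "feature_ts": datetime(2026, 2, 1), "txn_count_30d": 9},
    {"customer_id": 2, "feature_ts": datetime(2026, 2, 1), "txn_count_30d": 2},
]

# Training reads the full history to assemble a training dataset.
training_rows = feature_table

# Serving reads the same table but filters for recent snapshots only.
now = datetime(2026, 2, 15)
recent = [r for r in feature_table if r["feature_ts"] > now - timedelta(days=30)]
```

Because both queries hit the same materialized table, the feature values a model was trained on and the values it sees at inference come from one pipeline.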
Feature store integration is where dbt becomes particularly powerful for ML teams. Teams that have already built data pipelines in dbt can continue to use them with the Snowflake Feature Store, enabling data scientists and engineers to collaborate on features using both Python and SQL. This combination allows teams to share and reuse features while improving model time-to-value. Using dbt for data transformation pipelines ensures the quality and consistency of data products, which is critical for ensuring successful AI/ML efforts. The workflow works like this: you build feature tables in dbt, register them with your feature store, and the feature store handles point-in-time-consistent retrieval for training and low-latency serving for inference.
Point-in-time correctness is non-negotiable in ML systems. Feast joins tables with robust logic that ensures point-in-time correctness, preventing future feature values from leaking into models. Without a point-in-time, accurate view of the data, models are trained on datasets that are not representative of what is found in production, leading to a drop in accuracy. While dbt doesn’t handle point-in-time joins natively, it prepares the underlying feature tables with proper timestamps that feature stores can then join correctly. Feature stores like Feast generate point-in-time feature sets to prevent data leakage by ensuring that future feature values do not leak into models during training. This separation of concerns matters: dbt computes and materializes features, and feature stores handle the complex temporal joins.
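The essence of a point-in-time join is small enough to show directly. This is a simplified pure-Python sketch of the temporal logic that feature stores like Feast implement at scale, not their actual implementation:

```python
from datetime import datetime

# Feature history for one entity: (timestamp, value) pairs, sorted by time.
feature_history = [
    (datetime(2026, 1, 1), 0.10),
    (datetime(2026, 2, 1), 0.35),
    (datetime(2026, 3, 1), 0.80),
]

def as_of(history, event_ts):
    """Return the latest feature value known at or before event_ts.

    Values recorded after the label's timestamp are never visible, so the
    future cannot leak into the training set.
    """
    candidates = [v for ts, v in history if ts <= event_ts]
    return candidates[-1] if candidates else None

# A label observed on Feb 15 sees the Feb 1 value, not the later Mar 1 value.
feb_value = as_of(feature_history, datetime(2026, 2, 15))   # 0.35
early_value = as_of(feature_history, datetime(2025, 12, 1))  # None: no history yet
```

dbt's job in this division of labor is to make sure `feature_history` exists with correct, trustworthy timestamps; the feature store's job is to run this join correctly at training-set scale.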
Snowflake Feature Store retrieves point-in-time features using ASOF joins, removing the significant complexity of generating the correct feature value for a given time period, whether for training or batch prediction. The integration between dbt and the Snowflake Feature Store demonstrates this pattern clearly: the source data is managed in a Snowflake database, the feature pipelines are orchestrated and executed using dbt, and the resulting feature tables are stored in Snowflake and registered as Feature Views to build ML datasets. This ensures that features stay up to date with each dbt run.
For teams using BigQuery ML, both dbt and Dataform can orchestrate end-to-end ML pipelines directly in the warehouse. dbt can orchestrate BigQuery ML workflows in which transformations such as feature engineering, model training, and predictions occur in sequence, using centralized ML features that are well-documented and tested. With Dataform and BigQuery ML, you can build end-to-end machine learning pipelines that extract data, prepare it for training, train models, evaluate model quality based on predefined criteria like accuracy or RMSE, and trigger inference on new data. Dataform sends SQL queries to BigQuery to retrieve relevant ML model metrics and, based on predefined criteria, asserts the model’s quality, stopping execution if model performance is unacceptable; otherwise, continuing to the prediction stage.
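The quality-gate step in that Dataform pipeline is just a threshold check on evaluation metrics. A hedged sketch of the control flow (the metric names and thresholds are illustrative, not Dataform API):

```python
# Hypothetical quality gate mirroring the Dataform pattern: fetch the model's
# evaluation metrics, assert them against predefined criteria, and only
# continue to the prediction stage if the gate passes.
def quality_gate(metrics: dict, max_rmse: float = 5.0, min_accuracy: float = 0.9) -> bool:
    if "rmse" in metrics and metrics["rmse"] > max_rmse:
        return False
    if "accuracy" in metrics and metrics["accuracy"] < min_accuracy:
        return False
    return True

# In the real pipeline these metrics would come from a SQL query against
# BigQuery ML's ML.EVALUATE output; here they are hard-coded.
metrics = {"rmse": 3.2}

if quality_gate(metrics):
    stage = "predict"  # continue to the inference step
else:
    stage = "halt"     # stop the run; keep serving the previous model
```

The important design choice is that a failed gate halts execution rather than silently pushing a degraded model into production.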
The main difference between the tools for ML workflows lies in their templating capabilities. For data scientists, what’s missing in dbt is a quick hook into the feature store and the ML workflow, though integrations like those offered by platforms such as Continual aim to bridge this gap. Dataform’s JavaScript templating becomes valuable when you need to generate many similar feature transformations programmatically. One of the most common use cases for the JavaScript API is to perform a similar action repeatedly, such as creating rolling-window features across multiple time periods. For example, you can define arrays for organizational structures and time periods, then use JavaScript’s map or forEach functions to generate feature tables for each combination dynamically. In dbt, you’d need to use macros and Jinja templating to achieve similar results.
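Dataform expresses this pattern with JavaScript's `map`/`forEach` over arrays of parameters; the same generate-many-models-from-a-grid idea is sketched here in Python for illustration. Table, column, and model names are hypothetical:

```python
# Generate one rolling-window feature model per (entity, window) combination,
# the way Dataform's JavaScript API (or dbt's Jinja loops) would.
windows_days = [7, 30, 90]
entities = ["customer", "merchant"]

def rolling_feature_sql(entity: str, window: int) -> str:
    """Render the SQL for one rolling-window feature table."""
    return (
        f"SELECT {entity}_id, "
        f"COUNT(*) AS txn_count_{window}d "
        f"FROM transactions "
        f"WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {window} DAY) "
        f"GROUP BY {entity}_id"
    )

# Six generated models from two short parameter lists.
generated = {
    f"{entity}_features_{window}d": rolling_feature_sql(entity, window)
    for entity in entities
    for window in windows_days
}
```

Adding a new window (say, 180 days) becomes a one-element change to a list rather than a new hand-written model file.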
For data quality in ML systems, the testing capabilities matter enormously. When paired with business logic validations, such as threshold-based tests for feature drift or freshness checks, dbt provides a feature pipeline that’s both trustworthy and auditable. dbt lets you write data quality tests quickly and easily on the underlying data, since many analytic errors stem from edge cases. Testing helps analysts handle those edge cases. These same testing patterns apply to feature data, allowing you to catch data quality issues before they corrupt model training or predictions.
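The freshness and drift checks mentioned above reduce to simple threshold assertions. A sketch of the logic such dbt tests encode, with made-up thresholds and values:

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical threshold-based checks of the kind dbt tests express in SQL:
# fail the run before stale or drifting features reach the model.

def check_freshness(latest_ts: datetime, now: datetime, max_age_hours: int = 24) -> bool:
    """Pass if the newest feature row is within the allowed age."""
    return now - latest_ts <= timedelta(hours=max_age_hours)

def check_drift(recent_values, baseline_mean: float, tolerance: float = 0.25) -> bool:
    """Pass if the recent mean stays within tolerance of the baseline mean."""
    return abs(mean(recent_values) - baseline_mean) <= tolerance * abs(baseline_mean)

now = datetime(2026, 2, 15, 12, 0)
fresh = check_freshness(datetime(2026, 2, 15, 3, 0), now)       # 9h old: passes
drifted = not check_drift([5.1, 5.3, 4.9], baseline_mean=2.0)   # mean ~5.1 vs 2.0: fails
```

Wired into CI, a failing check blocks the pipeline run, so bad feature data is caught before it corrupts a training set or a batch of predictions.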
The question for ML teams is whether you’re building features for batch or real-time inference. Both dbt and Dataform excel at batch feature engineering, where you refresh features on a schedule and serve predictions from precomputed feature tables. dbt’s incremental models dramatically reduce run times by processing only the data that has arrived since the last run, which matters when you’re materializing large feature tables. For real-time features requiring sub-second latency, neither tool provides a complete solution: you’d need streaming feature pipelines built on Kafka or Flink, with dbt or Dataform handling batch features and the streaming infrastructure handling real-time features, all flowing into your feature store.
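The incremental pattern itself is a high-water-mark filter. A minimal sketch of what an incremental run does, with in-memory rows standing in for warehouse tables:

```python
from datetime import datetime

# Sketch of an incremental load: on each run, only source rows newer than
# the target table's high-water mark are processed and appended.
target = [
    {"event_ts": datetime(2026, 2, 1), "value": 10},
]
source = [
    {"event_ts": datetime(2026, 2, 1), "value": 10},
    {"event_ts": datetime(2026, 2, 2), "value": 12},
    {"event_ts": datetime(2026, 2, 3), "value": 7},
]

# In dbt this is roughly: WHERE event_ts > (SELECT max(event_ts) FROM {{ this }})
high_water = max(r["event_ts"] for r in target)
new_rows = [r for r in source if r["event_ts"] > high_water]
target.extend(new_rows)
```

Only two of the three source rows are touched on this run; the cost of a refresh scales with new data, not with the full history of the table.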
Decision Framework
After working with dozens of teams making this decision, the pattern is consistent. The architecture of your data platform, not feature checklists, should drive your choice.
Choose dbt if you’re running a multi-warehouse environment, even if BigQuery is your primary platform today. Choose dbt if you have a mature data engineering team that wants to leverage the extensive package ecosystem and documented community patterns. The dbt Slack community and its wealth of best practices mean you’ll spend less time solving problems others have already solved. Choose dbt if you’re building ML systems that need to integrate with multiple feature stores.
Choose Dataform if you’re fully committed to Google Cloud Platform and BigQuery is your data warehouse. Dataform’s main advantage is that it’s a Google service natively integrated into BigQuery, and Google is actively developing the product with new features and integrations. The zero licensing cost becomes an advantage for teams where $100/user/month for dbt Cloud is a budget consideration, though you need to factor in BigQuery compute costs. Choose Dataform if you need JavaScript templating for dynamic model generation. Choose Dataform if you’re building primarily on BigQuery ML.
Conclusion
Both tools excel at what they do. The deciding factor is how each fits the rest of your stack: your feature store, orchestration layer, model registry, and serving infrastructure all need to work together coherently with your transformation layer.
Choosing Dataform purely because it’s free, without accounting for the commitment to Google Cloud Platform and the compute costs that scale with usage, is a mistake. Choosing dbt because it’s “industry standard” without considering whether your team can effectively leverage its community ecosystem is equally problematic. And building elaborate feature engineering pipelines in either tool before you’ve proven that ML is driving business value wastes engineering time.
If you’re a data engineer focused on analytics, either tool will serve you well as long as it fits your warehouse strategy. If you’re a data scientist building production ML systems, your transformation tool needs to integrate cleanly with your feature store and serving infrastructure. Test both tools against your actual use cases before committing. Build a proof-of-concept feature pipeline, combine it with your intended feature store, run it through a complete training and serving cycle, and see which workflow works for your team and infrastructure.
- Hodman Murad
Visit The Data Letter on Gumroad
References
Hopsworks. “Feature Engineering with DBT for Data Warehouses.” December 7, 2023. https://www.hopsworks.ai/post/feature-engineering-with-dbt-for-data-warehouses
dbt Developer Hub. “What is dbt?” https://docs.getdbt.com/docs/introduction
dbt Labs. “Deliver trusted data with dbt.”
https://www.getdbt.com/
DataCamp. “What is dbt? A Hands-On Introduction for Data Engineers.” October 30, 2024. https://www.datacamp.com/tutorial/what-is-dbt
Analytics8. “dbt (Data Build Tool) Overview: What is dbt and What Can It Do for My Data Pipeline?” September 1, 2025. https://www.analytics8.com/blog/dbt-overview-what-is-dbt-and-what-can-it-do-for-my-data-pipeline/
Jordan Volz (Medium). “Feature Engineering on the Modern Data Stack.” March 3, 2022. https://medium.com/@jordan_volz/feature-engineering-on-the-modern-data-stack-86387a001b41
Start Data Engineering. “dbt(Data Build Tool) Tutorial.” https://www.startdataengineering.com/post/dbt-data-build-tool-tutorial/
Airbyte. “What is dbt in Data Engineering, and How to Use It?” September 3, 2025. https://airbyte.com/data-engineering-resources/what-is-dbt-in-data-engineering
STX Next. “Intro to dbt in Data Engineering: Transforming Data.” April 16, 2025. https://www.stxnext.com/blog/introduction-to-dbt-in-data-engineering
Towards Data Science. “Agile Machine Learning with dbt and BigQuery ML.” March 5, 2025. https://towardsdatascience.com/agile-machine-learning-with-dbt-and-bigquery-ml-c067431ef7a9/
Google Cloud. “ML pipelines overview | BigQuery.” https://docs.cloud.google.com/bigquery/docs/ml-pipelines-overview
Mirko Gilioli (Medium/Google Cloud Community). “MLOps made easy with Dataform & BigQuery ML— Part 1.” October 15, 2024. https://medium.com/google-cloud/mlops-made-easy-with-dataform-bigquery-ml-part-1-22d74d14a2a2
Google Cloud. “Dataform.” https://cloud.google.com/dataform
Google Cloud. “Introduction to AI and ML in BigQuery.” https://cloud.google.com/bigquery/docs/bqml-introduction
Google Cloud. “Create a machine learning model in BigQuery ML by using SQL.” https://cloud.google.com/bigquery/docs/create-machine-learning-model
Dataform. “Tutorial: Building a Bigquery ML pipeline.” https://dataform.co/blog/bq-ml-pipeline
GCP Weekly. “Google Cloud Platform Resources Dataform.” https://www.gcpweekly.com/gcp-resources/tag/dataform/
Dataform. “Google BigQuery.” https://docs.dataform.co/warehouses/bigquery
Google Cloud. “Dataform documentation.” https://docs.cloud.google.com/dataform/docs
Google Codelabs. “Getting Started with BigQuery ML.” https://codelabs.developers.google.com/codelabs/bqml-intro
Feast. “Quickstart | Feast: the Open Source Feature Store.” https://docs.feast.dev/getting-started/quickstart
Kedion (Medium). “Creating a Feature Store with Feast.” April 28, 2022. https://kedion.medium.com/creating-a-feature-store-with-feast-part-1-37c380223e2f
Kubeflow. “Introduction to Feast.” July 31, 2025. https://www.kubeflow.org/docs/external-add-ons/feast/introduction/
Ong Xuan Hong (Medium). “MLOps 03: Feast Feature Store — An In-depth Overview Experimentation and Application in Tabular data.” June 28, 2024. https://medium.com/@ongxuanhong/mlops-03-feast-feature-store-an-in-depth-overview-experimentation-and-application-in-tabular-b9d1c5376483
Kubeflow. “Introduction | Kubeflow.” July 31, 2025. https://www.kubeflow.org/docs/external-add-ons/feature-store/overview/
feast-dev/feast (GitHub). “feast/docs/getting-started/quickstart.md at master.” https://github.com/feast-dev/feast/blob/master/docs/getting-started/quickstart.md
Made With ML. “Feature Store - Made With ML by Anyscale.” https://madewithml.com/courses/mlops/feature-store/
Feast. “Feast Python API Documentation.” http://rtd.feast.dev/en/v0.10.7/
Feast. “Feature Store - Feast Python API Documentation.” https://rtd.feast.dev/en/v0.10.5/
Red Hat. “Feast: The open source feature store for AI.” May 16, 2025. https://www.redhat.com/en/blog/feast-open-source-feature-store-ai
Snowflake. “Getting Started with Snowflake Feature Store and dbt.” https://quickstarts.snowflake.com/guide/getting-started-with-feature-store-and-dbt/index.html
dbt Developer Blog. “Snowflake feature store and dbt: A bridge between data pipelines and ML.” October 8, 2024. https://docs.getdbt.com/blog/snowflake-feature-store
Snowflake-Labs (GitHub). “sfguide-how-to-manage-features-in-dbt-with-snowflake-feature-store.” https://github.com/Snowflake-Labs/sfguide-how-to-manage-features-in-dbt-with-snowflake-feature-store
Snowflake-Labs (GitHub). “snowflake-demo-notebooks/Manage features in DBT with Feature Store.” https://github.com/Snowflake-Labs/snowflake-demo-notebooks/blob/main/Manage features in DBT with Feature Store/Manage features in DBT with Feature Store.ipynb
Snowflake Documentation. “Snowflake Feature Store.” https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/overview
Snowflake. “Getting Started with Snowflake Feature Store and dbt.” https://www.snowflake.com/en/developers/guides/getting-started-with-feature-store-and-dbt/
IntellaNOVA (Medium). “Unleashing Machine Learning Potential with Snowflake: Feature Store Explained.” November 19, 2024. https://medium.com/@ibbyrahmani/unleash-machine-learning-potential-with-snowflake-feature-store-explained-f7ca67e852aa
IntellaNOVA (Medium). “Snowflake Feature Store: Transform Machine Learning with Scalable, Reusable Features.” November 14, 2024. https://medium.com/@ibbyrahmani/snowflake-feature-store-transform-machine-learning-with-scalable-reusable-features-d3b1cb615f4b
Snowflake Documentation. “Working with feature views.” https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/feature-views