<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Data Letter]]></title><description><![CDATA[The Data Letter prevents data disasters that end careers. Built from a decade fixing Fortune 500 systems: PII exposures, cost explosions, pipeline failures, broken models. Weekly toolkits for data practitioners and leaders. Get the free Audit Kit.]]></description><link>https://www.thedataletter.com</link><image><url>https://substackcdn.com/image/fetch/$s_!q9bB!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87106c62-c084-4b01-b694-ac5d6a824442_500x500.png</url><title>The Data Letter</title><link>https://www.thedataletter.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 15 Apr 2026 17:12:01 GMT</lastBuildDate><atom:link href="https://www.thedataletter.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Hodman Murad]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[hodmanmurad@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[hodmanmurad@substack.com]]></itunes:email><itunes:name><![CDATA[Hodman Murad]]></itunes:name></itunes:owner><itunes:author><![CDATA[Hodman Murad]]></itunes:author><googleplay:owner><![CDATA[hodmanmurad@substack.com]]></googleplay:owner><googleplay:email><![CDATA[hodmanmurad@substack.com]]></googleplay:email><googleplay:author><![CDATA[Hodman Murad]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[IT Keeps Blocking Database Access. Here’s the Strategy That Gets It Approved.]]></title><description><![CDATA[Stop Getting Locked Out of Your Data. 
Here&#8217;s How to Get Approved.]]></description><link>https://www.thedataletter.com/p/it-keeps-blocking-database-access</link><guid isPermaLink="false">https://www.thedataletter.com/p/it-keeps-blocking-database-access</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 12 Apr 2026 20:12:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2ba1d15d-db99-4ff7-9ae0-a5f24d522273_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Seven years of consulting will teach you something fast: the company hired us to solve their data problems, but IT didn&#8217;t always get the memo. I&#8217;d walk into an engagement with a clear mandate, and spend the first two weeks in a standoff with an IT department that had every reason to say no and almost no incentive to say yes.</p><p>I learned early that winning that standoff had nothing to do with being right. It had everything to do with speaking the right language. IT is optimizing for zero-surprise bills and zero data leaks. If you want access, you have to make their job easier, not harder. 
This article is the playbook I refined over seven years of doing exactly that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7YLI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7YLI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7YLI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7YLI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!7YLI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3e145a-6a53-4ba9-a586-1e54d8d19628_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>IT Always Defaults to No</strong></h2><p>Before you write a request that gets approved, understand why no is the default. It&#8217;s risk aversion driven by three factors: compliance, cost, and liability.</p><p><strong>Cloud cost exposure.</strong> A single analyst running <code>SELECT * FROM large_table</code> on a dashboard can double a cloud data warehouse bill overnight. A virtual warehouse consumes Snowflake credits while it runs. Credit limits don&#8217;t cut off spending instantly. When a threshold is reached, the assigned warehouse may continue running for a period before it is suspended, consuming additional credits in the process. This is true even when the most aggressive suspension setting is used.  
IT&#8217;s budget is on the line, and even well-intentioned queries can cause cost spikes when analysts don&#8217;t understand consumption-based pricing.</p><p><strong>Identity and Access Management attack surface.</strong> Every additional person with a direct database connection string is a potential entry point. Unauthorized disclosure of credentials is among the leading causes of data incidents across enterprise environments. The principle of least privilege, which governs how IT controls access, means database access should be granted only to the degree necessary to accomplish a specific task. Access is meant to be the exception, not the baseline.</p><p><strong>Liability and regulatory exposure.</strong> If you export sensitive data to an unencrypted Excel file on your laptop and that laptop goes missing, regulators can fine the company. Cybersecurity frameworks treat the loss of control over sensitive data as one of the most serious risks a company can face, and often one that must be disclosed. Storing sensitive data on laptops in unencrypted Excel files, without any record of who accessed it or when, is exactly the kind of control failure that creates regulatory and legal exposure. Excel sprawl creates audit nightmares that governed database access prevents.</p><p><strong>What read-only access means to IT.</strong> Analysts say read-only and think &#8216;I can&#8217;t delete rows.&#8217; IT hears read-only and thinks about all the things that become possible with that access: pulling down the entire customer database, running queries so heavy they slow down core business systems, or viewing compensation data and confidential deal information without authorization.</p><p><strong>When to accept no versus push back.</strong> Accept no if IT offers a curated view, reporting replica, or sandbox environment. These signal a mature data organization that balances security with analyst access. 
If IT&#8217;s proposed alternative is to manually pull data into Excel, push back. Spreadsheets saved on local computers with no access controls, no version history, and no audit trail are a far bigger security and compliance problem than a governed database connection. That conversation needs to go up the chain.</p><div><hr></div><p>If I&#8217;ve gotten your attention so far, what follows is the part that took seven years to learn. The templates, the compliance documentation, the risk mitigation proposals, the escalation scripts, and the three case studies that show how this all plays out. It&#8217;s all below. </p>
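<p>For readers who want to see what IT&#8217;s cost guardrails actually look like, here is a minimal sketch of the Snowflake credit-limit mechanics described in the cost exposure section above. The monitor name, warehouse name, and 50-credit quota are all hypothetical:</p>

```sql
-- Hypothetical illustration: cap an analyst warehouse at 50 credits per month.
-- The names analyst_monitor and analyst_wh are invented for this example.
CREATE RESOURCE MONITOR analyst_monitor
  WITH CREDIT_QUOTA = 50
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
       TRIGGERS ON 80 PERCENT DO NOTIFY              -- warn before the quota is hit
                ON 100 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries at quota

ALTER WAREHOUSE analyst_wh SET RESOURCE_MONITOR = analyst_monitor;
```

<p>Note that even SUSPEND_IMMEDIATE is not instant: the monitor checks consumption periodically, so some credits can still be burned past the threshold. That gap is exactly what IT is budgeting against.</p>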
      <p>
          <a href="https://www.thedataletter.com/p/it-keeps-blocking-database-access">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[5 BigQuery Features That Changed How I Write SQL]]></title><description><![CDATA[Modern BigQuery SQL Features for Senior Engineers]]></description><link>https://www.thedataletter.com/p/5-bigquery-features-that-changed</link><guid isPermaLink="false">https://www.thedataletter.com/p/5-bigquery-features-that-changed</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 05 Apr 2026 16:16:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e27f52ea-4802-413d-8974-cf89c20a9e11_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Happy Sunday! I hope you&#8217;ve all been well! I haven&#8217;t been keeping up with The Data Letter as much as I should have these past few weeks because I&#8217;ve been busy building and launching the beta for Asaura AI. For those of you who are new here, Asaura AI is a tool designed to help people who struggle with the high energy cost of starting a task. ADHD brains. Neurodivergent brains. Just plain tired brains. It&#8217;s built to give you a simple entry point so you can stop negotiating with yourself and just move.</p><p>The beta is finally ready for macOS and Linux, and I&#8217;d love for you to test the architecture. If you&#8217;ve ever felt paralyzed by a project that felt too big, your feedback is exactly what I need to refine the system. You can try it out and share your experience through our survey at the links below. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://asauraai.com/beta&quot;,&quot;text&quot;:&quot;DOWNLOAD THE ASAURA AI BETA&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://asauraai.com/beta"><span>DOWNLOAD THE ASAURA AI BETA</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://forms.gle/UgTmgHd1RBCiGzmZ6&quot;,&quot;text&quot;:&quot;TAKE THE ANONYMOUS USER SURVEY&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://forms.gle/UgTmgHd1RBCiGzmZ6"><span>TAKE THE ANONYMOUS USER SURVEY</span></a></p><p>I&#8217;ve been documenting the build in a 100 Days of Building AI video marathon. You can check that out on Substack Notes or LinkedIn. <a href="https://substack.com/@hodmanmurad/note/c-238611411?utm_source=notes-share-action&amp;r=7b8hg">Today is Day 91</a>. </p><p>I&#8217;ve also been writing technical deep dives about the Asaura AI build on my other Substack, <a href="https://betweenthinkingdoing.substack.com/p/the-seven-invisible-forces-that-sabotage">Between Thinking and Doing</a>. Feel free to subscribe to that as well, or <a href="https://www.linkedin.com/in/hodmanmurad/">connect with me on LinkedIn</a>. </p><p>Now that the main build is stable, I&#8217;ve got the space to get back into the rhythm of writing here every week. </p><div><hr></div><p>Today&#8217;s article covers BigQuery features I&#8217;ve found myself teaching constantly over the past year. I share at least three of these with every data team I meet. These features solve problems people have worked around for years.</p><p>Part of what makes them so useful is that SQL has a structural quirk that we all know too well: you write clauses in a fixed order that doesn&#8217;t match the order they actually execute. BigQuery&#8217;s GoogleSQL dialect has been shipping fixes for this. 
The five features below are the ones I keep coming back to. </p>
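<p>To make the clause-order quirk concrete: one widely used GoogleSQL fix (shown here for illustration, and not necessarily one of the five below) is the QUALIFY clause, which filters on a window function without wrapping the query in a subquery. The table and column names are hypothetical:</p>

```sql
-- Keep only the latest event per user. Without QUALIFY, rn would have to be
-- computed in a subquery before it could be filtered, because WHERE runs
-- before window functions in SQL's execution order.
SELECT
  user_id,
  event_ts,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS rn
FROM events
WHERE TRUE  -- BigQuery has required a WHERE, GROUP BY, or HAVING clause alongside QUALIFY
QUALIFY rn = 1;
```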
      <p>
          <a href="https://www.thedataletter.com/p/5-bigquery-features-that-changed">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Prompt Engineering.]]></title><description><![CDATA[Treating LLM Prompts as Software Assets]]></description><link>https://www.thedataletter.com/p/prompt-engineering</link><guid isPermaLink="false">https://www.thedataletter.com/p/prompt-engineering</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 22 Mar 2026 15:26:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c0756479-4fc4-4798-94a8-54a1230c48d7_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ad-hoc prompting has a shelf life. It works fine when one person owns a single prompt and checks it regularly. It breaks down when three engineers are touching the same prompt string in different files, nobody&#8217;s tracking what changed, and malformed outputs are the first sign that something went wrong. At that point, the prompting is the least of your concerns. What&#8217;s missing is basic software management.</p><div><hr></div><h3><strong>Why Ad-Hoc Prompting Breaks at Scale</strong></h3><p>Ad-hoc prompting works fine for prototypes. You&#8217;re iterating fast, the model&#8217;s responses are good enough, and the cost is trivial. When a prompt is serving real users under real SLAs with real billing attached, the economics look very different from a prototype.</p><p>A prompt is a dependency, just like a library version or a database schema. When it changes, outputs change. When outputs change, downstream parsers, classifiers, and user interfaces can break. <a href="https://www.getmaxim.ai/articles/prompt-versioning-best-practices-for-ai-engineering-teams/">Teams that don&#8217;t version control prompts</a> lose institutional knowledge, can&#8217;t reproduce bugs, and have no rollback path.</p><p>The compounding factor is model updates. When using AI via third-party APIs, the LLMs behind those APIs can unexpectedly change. 
Like traditional ML models, LLMs can be refreshed or tuned without a significant version bump, <a href="https://reintech.io/blog/implement-prompt-versioning-management-production">meaning the model&#8217;s performance on your set of prompts can change without any notice</a>. Without a versioning system, you can&#8217;t determine whether a degradation came from a prompt change or a model change, and that ambiguity is expensive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p3-6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p3-6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!p3-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p3-6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!p3-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30337d5e-a240-4eeb-95ad-547cf0506abb_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3><strong>Prompts Drift Like Code</strong></h3><p>Software engineers have spent decades building systems to manage code drift: version control, branching strategies, code review, CI/CD. 
Prompts need the same discipline, and most teams apply almost none of it.</p><p>There&#8217;s an unavoidable tension between keeping prompts close to the code <a href="https://launchdarkly.com/blog/prompt-versioning-and-management/">versus an environment that non-technical stakeholders can access</a>. Leaving it unresolved gets more expensive as the team grows.</p><p>Prompt management tools from observability platforms like Arize, Braintrust, and LangSmith offer Git-like version control for prompts and allow rollback if changes reduce quality, but tools alone won&#8217;t help if the discipline isn&#8217;t there. Prompt management tools are inherently limiting because they can&#8217;t easily execute your application&#8217;s code. Even when they can, there&#8217;s often significant indirection involved, making it difficult to test prompts with your system&#8217;s full capabilities, including tools, RAG, and agents. Keeping prompts in the same repository as application code, with the same review gates, removes that indirection.</p><p>A prompt change is a logic change. It controls what your system does and how it responds, and it deserves the same review process as any other change to your codebase. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Testing Prompts vs. Testing Code</strong></h3><p>Testing code is relatively well understood. You write unit tests, integration tests, and end-to-end tests. 
Give a traditional function the same input twice, and you&#8217;ll get the same output twice, which makes it straightforward to write tests that check for exact results.</p><p>Prompt engineering is about conditioning a probabilistic model to generate a desired output. <a href="https://dev.to/kuldeep_paul/mastering-prompt-versioning-best-practices-for-scalable-llm-development-2mgm">Each additional instruction or piece of context</a> steers the model&#8217;s generation in a particular direction. That probabilistic nature means you can&#8217;t assert exact outputs. Instead of checking for an exact answer, you check whether the response has the right shape: whether it&#8217;s valid JSON, whether the label the model chose is one you&#8217;d accept, and whether the thing it extracted actually appeared in the source text.</p><p>Structured evaluations tell you whether a prompt change made things better or worse before it reaches users. Without them, you&#8217;re iterating on feel, which works fine until something goes wrong in a way you didn&#8217;t anticipate and can&#8217;t reproduce.</p><p>Eyeballing is useful as a final check, <a href="https://dextralabs.com/blog/prompt-engineering-for-llm/">but it doesn&#8217;t hold up at volume or across teams</a>.</p><div><hr></div><h3><strong>When Prompt Engineering Pays Off (and When It Doesn&#8217;t)</strong></h3><p>Before committing engineering resources to a prompt infrastructure, it&#8217;s worth being honest about ROI.</p><p>Prompt engineering pays off when a model&#8217;s output goes directly into another system without a human reviewing it first, when the same prompt runs thousands of times a day, when each call carries a meaningful cost, or when output quality has a direct and measurable effect on what users experience. 
Without reliable evaluation, you can&#8217;t iterate, and without iteration, you can&#8217;t improve. Projects stall here more than anywhere else in the LLM lifecycle.</p><p>For tasks reviewed by humans before use, or where call volume is low enough that failures surface quickly, a lightweight log of prompt versions in a shared document may be sufficient. For anything in the first, higher-stakes category, treating prompts as software assets is worth the investment.</p><div><hr></div><p>In the next section, we get into the practical architecture: how to structure and version prompts, how to run A/B tests without exposing users to untested changes, how to build a test suite that runs automatically when a prompt changes, how to organize a prompt library across a growing team, how to track cost at the prompt level, how to detect when a deployed prompt starts degrading, and two failure patterns that illustrate what happens when these systems aren&#8217;t in place. </p>
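<p>The shape checks described above can be sketched in a few lines. This is an illustrative example only; the label set and the JSON fields (label, quote) are hypothetical stand-ins for whatever schema your prompt asks the model to produce:</p>

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical label set

def check_shape(raw_response: str, source_text: str) -> list[str]:
    """Return a list of shape violations; an empty list means the response passes.

    Checks the three properties from the article: valid JSON, an acceptable
    label, and an extracted quote that actually appears in the source text.
    """
    errors = []
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if data.get("label") not in ALLOWED_LABELS:
        errors.append(f"unexpected label: {data.get('label')!r}")
    extracted = data.get("quote", "")
    if extracted not in source_text:
        errors.append("extracted quote not found in source text")
    return errors

# Hypothetical model outputs:
source = "The battery life is excellent, but the screen scratches easily."
good = '{"label": "positive", "quote": "The battery life is excellent"}'
bad = '{"label": "glowing", "quote": "Best phone ever"}'
print(check_shape(good, source))  # []
print(check_shape(bad, source))   # two violations: bad label, missing quote
```

<p>Checks like these assert the shape of the output rather than its exact text, which is what makes them stable enough to run automatically on every prompt change.</p>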
      <p>
          <a href="https://www.thedataletter.com/p/prompt-engineering">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[LLM Inference Costs]]></title><description><![CDATA[The 60-80% Problem Nobody Talks About]]></description><link>https://www.thedataletter.com/p/llm-inference-costs</link><guid isPermaLink="false">https://www.thedataletter.com/p/llm-inference-costs</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 08 Mar 2026 10:02:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/49b3f0fa-66f3-44d1-8990-5a1a5b3acea9_1424x752.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Training a model costs between $2K and $50K. You pay it once and move on. Inference is different. Every time your model responds to a user, you&#8217;re billed again. There&#8217;s no finish line.</p><p>Inference is a subscription you didn&#8217;t fully price. Once a model hits production, inference consistently accounts for 60-80% of <a href="https://aipmbriefs.substack.com/p/why-llm-inference-costs-more-than">total LLM operational spend across a 12-month horizon</a>. 
For teams shipping to real users at scale, <a href="https://introl.com/blog/inference-unit-economics-true-cost-per-million-tokens-guide">that number trends toward the high end</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oVG1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oVG1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oVG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oVG1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!oVG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad430c-fbeb-411f-88fa-f57dc66a4a08_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A team that spends $5K on training will often<a href="https://aipmbriefs.substack.com/p/why-llm-inference-costs-more-than"> spend $40K-$50K running that model in year one</a>. This happens consistently across production teams.</p><div><hr></div><h2><strong>Inference Costs Catch Production Teams Off Guard</strong></h2><p>Training has a natural end state. You&#8217;re optimizing toward a metric, you hit it, and spending stops. Inference has no finish line. Every user request is a billing event, and billing events compound.</p><p>Three mechanisms drive the bulk of that spend:</p><p><strong>KV cache memory consumption.</strong> When your model reads a prompt, it saves a record of every word it processes so it doesn&#8217;t have to re-read them for each new word it generates. <a href="https://developer.nvidia.com/blog/accelerate-large-scale-llm-inference-and-kv-cache-offload-with-cpu-gpu-memory-sharing/">That saved record is called the KV cache, and it sits in GPU memory</a>. 
The longer the prompt, <a href="https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8">the more memory it occupies</a>. For long prompts, <a href="https://arxiv.org/abs/2309.06180">40-60% of your GPU&#8217;s available memory can go to KV cache alone</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rJcr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rJcr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rJcr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rJcr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!rJcr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98b95166-fe60-42e5-9692-02899fb0df67_1376x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Idle GPU time.</strong> <a href="https://developer.nvidia.com/blog/llm-inference-benchmarking-how-much-does-your-llm-inference-cost/">The GPUs powering your model rent for $2-$8/hour</a>, whether they&#8217;re processing requests or sitting idle. User traffic isn&#8217;t steady; it spikes during business hours and drops overnight, but you can&#8217;t spin up a new GPU instantly. Bringing one online takes between one and three minutes, which is too slow to respond to a traffic spike. So teams keep extra GPUs running at all times just in case, and pay for them around the clock.</p><p><strong>Per-token economics.</strong> Every word sent to the model and every word it sends back costs money. Tokens are just the unit the model uses to measure text, roughly three-quarters of a word each. What teams often miss is how quickly prompt length adds up. 
A request that includes 2,000 words of retrieved documents, a 500-word system instruction, and a 200-word question costs three to four times as much as a tightly written 700-word equivalent. Bloated instructions, uncompressed retrieved text, and redundant examples all push costs up faster than usage alone would.</p><div><hr></div><h2><strong>Standard Optimizations Don&#8217;t Finish the Job</strong></h2><p>The first instinct is usually to optimize at the model level: quantize to INT8, swap to a smaller model, apply distillation. These are legitimate tools, but they&#8217;re not a complete strategy.</p><p>Quantization can yield significant savings on its own (<a href="https://aimultiple.com/llm-quantization">up to 50% at INT8 and higher at INT4</a>), but model-routing or batching-only approaches typically plateau at 20-30%, and even quantization&#8217;s gains erode if retrieval context and serving efficiency aren&#8217;t addressed alongside it. Smaller models help when the workload is simple, but they degrade on complex instructions. Teams frequently apply one of these levers in isolation, see diminishing returns, and assume they&#8217;ve hit a ceiling.</p><p>They haven&#8217;t. 
The ceiling is generally much higher.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aEGA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aEGA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 424w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 848w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aEGA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aEGA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 424w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 848w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!aEGA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21dab47f-3d4e-4a22-9a83-92fa0e1c1269_2032x1062.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Inference cost reduction isn&#8217;t a one-dimensional problem. It spans caching strategy, batching behavior, model selection, serving architecture, and hardware configuration simultaneously. Pulling a single lever and treating it as a complete solution is why most optimization efforts plateau at 20-30% savings, while a systematic approach can reach 60-70%.</p><div><hr></div><h2><strong>How to Know Which Lever to Pull First</strong></h2><p>The rest of this article walks through five specific optimization levers and the signals that tell you which to prioritize for your workload type.</p><p>I also cover a full case study from my most recent client: they were spending $50K/year on inference, and we reduced that to $15K without affecting model quality. Every tradeoff gets named.</p><h1><strong>LLM Inference Optimization: Cutting Costs by 70% Without Sacrificing Quality</strong></h1><p><em><strong>This content is for paid subscribers. 
If you&#8217;re reading this and haven&#8217;t upgraded yet, you can do so below.</strong> </em></p>
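<p>The KV cache figure cited above is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes a 7B-class model shape (32 layers, 32 KV heads, head dimension 128, FP16 weights); these dimensions are illustrative assumptions, not figures from the case study:</p>

```python
# Rough KV-cache sizing: two tensors (K and V) are stored per layer,
# per token. The model shape here is an assumption (a 7B-class
# transformer), not a measurement from the article.
def kv_cache_bytes(seq_len, layers=32, kv_heads=32, head_dim=128, dtype_bytes=2):
    # 2 = one key tensor + one value tensor per layer per token
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

gib = 1024 ** 3
for tokens in (2_000, 16_000, 32_000):
    print(f"{tokens:>6} tokens -> {kv_cache_bytes(tokens) / gib:.1f} GiB")
```

<p>Under these assumptions the cache costs about 0.5 MB per token, so a 32K-token context alone needs roughly 15.6 GiB. On a 24 GiB card whose FP16 weights already occupy ~13-14 GiB, that puts the KV cache in the same range as the 40-60% of available memory the article cites.</p>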
      <p>
          <a href="https://www.thedataletter.com/p/llm-inference-costs">
              Read more
          </a>
      </p>
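<p>The per-token arithmetic from the free portion above can also be sketched directly. The price and the words-to-tokens ratio below are illustrative assumptions, not any provider&#8217;s actual rates:</p>

```python
# Illustrative cost comparison between a bloated prompt and a tight one.
# The price and the words->tokens ratio are assumptions for this sketch,
# not quotes from a specific provider.
TOKENS_PER_WORD = 1 / 0.75       # a token is roughly three-quarters of a word
PRICE_PER_1K_INPUT = 0.003       # assumed $ per 1K input tokens

def prompt_cost(words):
    tokens = words * TOKENS_PER_WORD
    return tokens / 1000 * PRICE_PER_1K_INPUT

# 2,000 words of retrieved docs + 500-word system prompt + 200-word question
bloated = prompt_cost(2000 + 500 + 200)
tight = prompt_cost(700)         # tightly written equivalent
print(f"bloated: ${bloated:.4f}  tight: ${tight:.4f}  ratio: {bloated / tight:.1f}x")
```

<p>The ratio works out to about 3.9x, which is where the &#8220;three to four times as much&#8221; figure comes from: the multiplier depends only on word counts, so it holds at any per-token price.</p>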
   ]]></content:encoded></item><item><title><![CDATA[Your Dashboard Looks Great and Changes Nothing]]></title><description><![CDATA[Most dashboards get opened once and ignored for every decision that matters.]]></description><link>https://www.thedataletter.com/p/your-dashboard-looks-great-and-changes</link><guid isPermaLink="false">https://www.thedataletter.com/p/your-dashboard-looks-great-and-changes</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 01 Mar 2026 10:01:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2b610533-a220-4ede-87f9-7f84aed7b4fc_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most dashboards get opened once and ignored for every decision that matters. Here&#8217;s how to tell if yours is one of them.</p><p>This problem is everywhere. Data teams build beautiful visualizations that stakeholders glance at once, nod appreciatively, then ignore for every decision that matters. You see the same confession across analytics forums, Slack channels, and team retrospectives. The analytics industry has mastered the art of building. 
We&#8217;ve failed at measuring whether it changes anything.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6fi-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6fi-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6fi-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6fi-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!6fi-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80c611d2-bf73-4680-9793-a2bcd4d66e63_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h3>Welcome back to The Data Letter! 
Here are some recent articles you may have missed:</h3><ul><li><p><strong><a href="https://www.thedataletter.com/p/from-manual-export-to-automated-pipeline">From Manual Export to Automated Pipeline: A Practical Playbook</a></strong></p></li><li><p><a href="https://www.thedataletter.com/p/llm-fine-tuning-on-a-budget">LLM Fine-Tuning on a Budget</a></p></li><li><p><a href="https://www.thedataletter.com/p/deepseek-for-data-analysis">Deepseek for Data Analysis</a> </p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>Vanity Metrics vs Decision Velocity</strong></h2><p>There&#8217;s a difference between dashboards that look right and dashboards that change behavior. Most teams optimize for the first. They measure views, shares, and executive compliments. These are vanity metrics dressed up as success signals.</p><p>What matters is decision velocity: how quickly can someone move from seeing data to taking action? If your stakeholder opens your dashboard, scrolls through twelve trend lines, then schedules a meeting to &#8216;discuss next steps,&#8217; you&#8217;ve built decoration, not infrastructure. </p><div><hr></div><p>Back to diagnosing the problem&#8230;</p><h2><strong>Kill This Dashboard Test</strong></h2><p>Here's how you diagnose it. 
Ask your stakeholders these questions:</p><ol><li><p><strong>What decision did you make because of this dashboard in the last two weeks?</strong></p></li></ol><p>If they pause or pivot to talking about how useful it &#8216;could be,&#8217; you&#8217;re watching theater.</p><ol start="2"><li><p><strong>If I removed this chart tomorrow, what would break?</strong></p></li></ol><p>If nothing breaks, nothing mattered.</p><ol start="3"><li><p><strong>When this metric moves, what do you do differently?</strong></p></li></ol><p>If they can&#8217;t name a specific action, the metric is decorative.</p><p>Signs you&#8217;re building something nobody uses:</p><ul><li><p>Stakeholders request more charts instead of fewer</p></li><li><p>They want &#8216;visibility&#8217; but can&#8217;t define what changes when they have it</p></li><li><p>They compliment your color palette more than your insight clarity</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lk2P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lk2P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!lk2P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!lk2P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!lk2P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lk2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lk2P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!lk2P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!lk2P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!lk2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa899e593-3557-4623-8a27-b5d6e50c6b79_1376x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Building for Impact, Not Applause</strong></h2><p>Start with the decision, not the data. Before you touch Tableau, ask: &#8216;What&#8217;s the one thing this person needs to do differently?&#8217; Then build backward from that action.</p><p>Strip everything else. Your stakeholder doesn&#8217;t need context, trends, and breakdowns. They need the number that triggers the action, and they need it unmissable.</p><p>Make the action as easy as the view. If someone sees &#8216;conversion rate dropped 12%&#8217; and then opens Jira to create a ticket, you&#8217;ve failed. The button to create that ticket should be right there.</p><p>Test by removing features, not adding them. Show version A with ten charts. Show version B with two. 
Measure which one changes behavior faster.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0kHY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0kHY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0kHY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0kHY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!0kHY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e8a1ee2-d368-4ecd-98e2-bd5d5a1fb111_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>How to Build Dashboards That Drive Action</strong></h2><p>Ask yourself whether what you&#8217;re building will change anything.</p><p>Close the dashboard and watch what happens. If nobody&#8217;s behavior shifts, if no tickets get created, if no priorities change, you&#8217;ve answered your own question. Strip out everything that doesn&#8217;t trigger an action. Give stakeholders fewer charts and clearer next steps. 
Measure whether anyone does something different after seeing your work, not whether they say they like it.</p><p>Next up: a decision-first framework for building dashboards that stakeholders use to take action, complete with templates and a stakeholder questionnaire. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Data Letter is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[From Manual Export to Automated Pipeline: A Practical Playbook]]></title><description><![CDATA[Two working automation patterns that require no database access, no paid tools, and no IT approval]]></description><link>https://www.thedataletter.com/p/from-manual-export-to-automated-pipeline</link><guid isPermaLink="false">https://www.thedataletter.com/p/from-manual-export-to-automated-pipeline</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Thu, 26 Feb 2026 10:01:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/336b3c1d-d652-475d-9f42-ab049d204b67_1340x684.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most data professionals have at least one workflow that should have been automated years ago. A weekly email with an attachment that gets manually copied somewhere. 
A folder of CSV exports that gets combined by hand every Monday. Work that doesn&#8217;t require judgment, just repetition, and repetition is exactly what code is for.</p><p>The first pattern uses Google Apps Script to automatically pull CSV attachments from Gmail and load them into a Google Sheet. The second uses Python and pandas to consolidate multiple CSV exports into a single clean output file on a schedule. Both have been tested and work.</p><p>The methods are practical enough for analysts automating their first workflow and straightforward enough for engineers who want a low-overhead solution they can hand off to a less technical teammate. Each section includes the code, setup instructions, and scheduling guidance.</p><p>All three files referenced in this article (both scripts and one sample CSV) are available in a secret gist for paid subscribers. The link is at the end.</p><div><hr></div>
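The second pattern described above can be sketched in a few lines of pandas. This is a minimal illustration, not the gist's script: the `exports` folder name, `combined.csv` output path, and the `source_file` lineage column are all assumptions for the example.

```python
from pathlib import Path
import pandas as pd

EXPORT_DIR = Path("exports")        # hypothetical folder where weekly CSVs land
OUTPUT_FILE = Path("combined.csv")  # hypothetical consolidated output

def consolidate(export_dir: Path = EXPORT_DIR, output_file: Path = OUTPUT_FILE) -> pd.DataFrame:
    frames = []
    for csv_path in sorted(export_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)
        df["source_file"] = csv_path.name  # keep lineage so rows can be audited
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True).drop_duplicates()
    combined.to_csv(output_file, index=False)
    return combined
```

Dropped into a cron job or Task Scheduler entry, a script like this replaces the Monday copy-paste ritual entirely; the paid article covers the scheduling details.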
      <p>
          <a href="https://www.thedataletter.com/p/from-manual-export-to-automated-pipeline">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Trapped in the Export Loop]]></title><description><![CDATA[So, you&#8217;re a human data pipeline.]]></description><link>https://www.thedataletter.com/p/trapped-in-the-export-loop</link><guid isPermaLink="false">https://www.thedataletter.com/p/trapped-in-the-export-loop</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 22 Feb 2026 10:41:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4f120fac-9475-470a-802c-9073cb45899e_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You were hired to analyze data. What you actually do is move it. You click export, rename the file, drop it into a shared folder, and email the link. Then you do it again tomorrow. And the day after. At that point, the job title and the job description have fully parted ways.</p><p>This is the analyst-as-middleware problem, and it&#8217;s more common than most organizations admit. Someone with Python skills and SQL fluency spends their days downloading CSVs and managing attachment threads. No queries. No transformations. No insight generation. 
Just scheduled, manual data transport dressed up with an analytical job title.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kf5H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kf5H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 424w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 848w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 1272w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kf5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png" width="1340" height="892" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1340,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kf5H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 424w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 848w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 1272w, https://substackcdn.com/image/fetch/$s_!kf5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa916a3bb-4310-4a08-8025-ebefbaeb5113_1340x892.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Why Organizations Create Data Dead Ends</strong></h2><p>This doesn&#8217;t happen because companies are malicious. It happens because systems calcify. IT locks down database access for security reasons that made sense in 2011 and never got revisited. Legacy platforms don&#8217;t expose APIs or connect cleanly to modern tooling. Business units developed their own manual workflows years ago, and &#8216;how we&#8217;ve always done it&#8217; carries more institutional weight than efficiency ever could.</p><p>The result is an informal architecture built from browser downloads, shared drives, and email attachments. Staying sharp is possible without system access. What it requires is intentional practice with the materials you already have. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>How Skill Atrophy Happens to Good Analysts</strong></h2><p>The cognitive cost isn&#8217;t obvious at first. You&#8217;re still technically working with data, but there&#8217;s a meaningful difference between analyzing data and transporting it.</p><p>When your job doesn&#8217;t require you to write queries, you stop writing them. Python loops you wrote from memory are starting to require documentation lookups. SQL syntax blurs at the edges. The joins you knew cold become joins you have to think about. Skills erode in proportion to how rarely they&#8217;re exercised, and manual export workflows exercise almost none of the skills that define analytical competence.</p><p>Six months in, you&#8217;re slower. A year in, you&#8217;re rusty. Two years in, you may start to believe the rusty version is the real one.</p><div><hr></div><p>If your daily work has drifted away from the technical side, the previous issue is worth your time. <a href="https://www.thedataletter.com/p/llm-fine-tuning-on-a-budget">LLM Fine-Tuning on a Budget</a> walks through dataset preparation, how LoRA works, and how to evaluate a model when you don&#8217;t have labeled test data. It&#8217;s structured, technical, and built for people who want to stay capable.</p><div><hr></div><h2><strong>Red Flags That Confirm You&#8217;re Trapped in Manual Export Work</strong></h2><p>Watch for these patterns in your own workflow. Your browser history is dominated by downloads, report portals, and file transfers. Shared drives function as your team&#8217;s data architecture, with folders named by date and maintained by convention rather than design. 
Email is your ETL pipeline, literally, attachments moving data between people because systems won&#8217;t talk to each other. You haven&#8217;t written a query against a live database in months. Your &#8216;analysis&#8217; begins after someone else pulls the data and ends before anyone asks how it was prepared.</p><p>If more than two of those describe your week, you&#8217;re not doing data analysis. You&#8217;re doing data custody.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3I8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3I8T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 424w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 848w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 1272w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3I8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png" width="1354" height="522" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3I8T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 424w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 848w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 1272w, https://substackcdn.com/image/fetch/$s_!3I8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ff0bdd2-6f0a-4469-b529-be1a1818c661_1354x522.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>How to Preserve Your Skills Without Changing Your Job</strong></h2><p>Getting out of this situation runs through your relationship with the data you already handle, not through a job search.</p><p>Local sandboxing means running every export you receive through a script before you manually touch it. Parse it with Python. Load it into a local database. 
Write the query that answers the question, even if you&#8217;re just going to paste the answer into a report anyway. The analysis muscle stays active because you&#8217;re actually using it.</p><p>Shadow analysis means building things no one asked for. You have the data. Model something. Build a dataset that answers a question your team hasn&#8217;t thought to ask yet. Don&#8217;t send it anywhere. Just build it. The practice is the point.</p><p>Document the waste. Track, in hours, how much time you spend manually transporting data each week. This isn&#8217;t venting. It&#8217;s evidence. Quantified inefficiency is the clearest argument for change. </p><div><hr></div><h2><strong>Turning Manual Exports into Automated Pipelines</strong></h2><p>Escaping manual export workflows doesn&#8217;t always mean landing somewhere better. More often, it means automating the situation you&#8217;re already in.</p><p>&#8216;Automation Playbook for Trapped Analysts&#8217; gets into that next. Specific tools and workflows. No direct database access required. Just the CSV in your inbox and a decision to stop moving it by hand.</p><p>Happy Sunday! See you next week.</p><div><hr></div><p>Paid subscribers get the full archive. That includes the vector database implementation guide covering Pinecone, Weaviate, and Chroma, with cost formulas and query optimization patterns; the advanced model drift detection guide with diagnostic workflows and threshold tuning; and the fine-tuning issue guide with dataset templates and a cost-tracking spreadsheet. The Automation Playbook for Trapped Analysts will be paid. If automating your way out of manual export work is relevant to you, it&#8217;s worth being there for it. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[LLM Fine-Tuning on a Budget]]></title><description><![CDATA[Dataset templates, cost strategies, and a conceptual walkthrough for teams without enterprise budgets]]></description><link>https://www.thedataletter.com/p/llm-fine-tuning-on-a-budget</link><guid isPermaLink="false">https://www.thedataletter.com/p/llm-fine-tuning-on-a-budget</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Fri, 20 Feb 2026 12:19:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fc509d26-0f1b-4805-ade6-6e6cd8d3abf4_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most fine-tuning problems start before training begins. You&#8217;ll waste time and money if your dataset&#8217;s messy, your cost tracking&#8217;s nonexistent, or you don&#8217;t understand the mechanics of how fine-tuning works.</p><p>This guide gives you the preparation toolkit needed for successful fine-tuning. You&#8217;ll get dataset format examples, an explanation of how LoRA works, platform research to compare your options, a cost tracking spreadsheet template, and evaluation methods for models without labeled test data. This guide focuses on preparation and planning rather than training code. The planning infrastructure covered here determines whether your fine-tuning project succeeds or wastes resources. </p>
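The guide's dataset templates aren't reproduced here, but the "messy dataset" failure mode is easy to guard against before spending on training. As a hedged sketch, here is a pre-flight check for a chat-style JSONL training file; the `messages`/`role`/`content` layout is an assumed format, and `validate_jsonl` is a hypothetical helper, not part of any platform's tooling.

```python
import json
from pathlib import Path

REQUIRED_ROLES = {"user", "assistant"}  # minimal turn pair every example needs

def validate_jsonl(path: Path) -> list:
    """Return (line_number, problem) tuples; an empty list means the file passed."""
    problems = []
    for i, line in enumerate(path.read_text().splitlines(), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((i, "not valid JSON"))
            continue
        messages = record.get("messages", [])
        roles = {m.get("role") for m in messages}
        if not REQUIRED_ROLES.issubset(roles):
            problems.append((i, "missing user/assistant turn"))
        if any(not m.get("content", "").strip() for m in messages):
            problems.append((i, "empty message content"))
    return problems
```

Running a check like this before every training job costs seconds and catches the malformed rows that otherwise surface as a failed (but still billed) run.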
      <p>
          <a href="https://www.thedataletter.com/p/llm-fine-tuning-on-a-budget">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Choosing Between Fine-Tuning, RAG, and Prompt Engineering: A $10K Decision Guide]]></title><description><![CDATA[Three questions to ask before building your LLM solution]]></description><link>https://www.thedataletter.com/p/choosing-between-fine-tuning-rag</link><guid isPermaLink="false">https://www.thedataletter.com/p/choosing-between-fine-tuning-rag</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 15 Feb 2026 11:48:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2ef4cfad-68d5-459c-bfcf-cccf764631bf_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before you spin up training runs or build retrieval pipelines, answer these three questions. Most teams can&#8217;t, which is why they end up with overengineered solutions that underperform simpler alternatives.</p><p>Choosing between prompt engineering, RAG, and fine-tuning means matching your requirements to the right level of investment. 
Get this wrong, and you&#8217;ll spend weeks building something you could&#8217;ve solved in an afternoon.</p><p>Let&#8217;s fix that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X-Kn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X-Kn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X-Kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X-Kn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!X-Kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa429523e-c5eb-438d-a8db-dd3a80f6d060_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>3-Layer Stack: What You&#8217;re Actually Choosing Between</strong></h2><p><strong>Prompt engineering</strong> means crafting better instructions for a base model. You&#8217;re working entirely at the inference level, modifying inputs to get better outputs. The model stays the same, and you&#8217;re not pulling in external data.</p><p><strong>RAG (Retrieval-Augmented Generation)</strong> adds a knowledge layer. You&#8217;re pulling relevant context from your documents or databases and injecting it into prompts at runtime. The model stays unchanged, but now it has access to your specific information.</p><p><strong>Fine-tuning</strong> means retraining the model itself on your data. You&#8217;re modifying the model&#8217;s weights to specialize its behavior, teaching it patterns, formats, or domain knowledge that weren&#8217;t emphasized in the base training. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Three Questions That Determine Your Approach</strong></h2><p><strong>Does the model already know how to do this task? </strong>If you&#8217;re asking GPT-5 to write marketing copy or summarize documents, the base capability exists, even if you need to refine the outputs through better prompts. Begin with prompt engineering to dial in the style and format. If you&#8217;re asking it to generate outputs in a proprietary format or follow domain-specific conventions it&#8217;s never seen, you might need fine-tuning.</p><p><strong>Does the task require knowledge the model doesn&#8217;t have?</strong> When you need the LLM to reference your internal documentation, product specifications, or company policies, that&#8217;s a RAG problem. The model can reason about information, but it can&#8217;t memorize your entire knowledge base. Don&#8217;t fine-tune when you just need to feed in context.</p><p><strong>Is consistent behavior across thousands of examples more important than flexibility? </strong>Fine-tuning shines when you need the model to reliably produce outputs in a specific style, follow particular reasoning patterns, or handle specialized scenarios that would require increasingly complex prompts. 
If you&#8217;re solving one-off tasks or experimenting with different approaches, the rigidity of a fine-tuned model works against you.</p><div><hr></div><p>This week&#8217;s paid article:</p><p><a href="https://www.thedataletter.com/p/deepseek-for-data-analysis">A senior data scientist&#8217;s framework for using DeepSeek effectively, including three workflow patterns for messy CSVs, slow SQL, and legacy code.</a> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>UPGRADE</span></a></p><div><hr></div><p>Once you&#8217;ve answered these questions, consider the cost implications of each approach.</p><h2><strong>Cost Comparison at a Glance</strong></h2><p>Prompt engineering costs you time upfront, but keeps inference costs standard. You&#8217;re paying for the experimentation phase, not the deployment.</p><p>RAG adds infrastructure overhead. You need vector databases, embedding models, and retrieval logic. Your per-query cost increases because you&#8217;re making multiple API calls and processing additional tokens.</p><p>Fine-tuning flips the cost structure. High upfront investment in compute, data preparation, and evaluation. Lower per-query costs if you&#8217;re using a smaller model, higher costs if you&#8217;re fine-tuning large models and still using them at scale.</p><div><hr></div><h2><strong>Use Case Examples</strong></h2><p>Here&#8217;s how these questions play out in real projects:</p><ol><li><p>You&#8217;re building a customer support chatbot that needs to reference your FAQ documents and troubleshooting guides. That&#8217;s RAG. The model can already reason with the information you provide. 
RAG lets you surface the right documents at query time.</p></li><li><p>You&#8217;re generating SQL queries from natural language, and your database schema uses unconventional naming patterns that confuse base models. That&#8217;s a fine-tuning candidate. You want the model to internalize your specific patterns.</p></li><li><p>You&#8217;re summarizing meeting transcripts in a specific format with particular sections and bullet styles. Start with prompt engineering. Most models can follow formatting instructions if you&#8217;re clear enough. Only escalate to fine-tuning if prompt complexity becomes unmanageable.</p></li><li><p>You&#8217;re extracting structured data from unstructured medical notes, where the model needs to understand clinical abbreviations and context-dependent terminology. This might need fine-tuning if your domain vocabulary differs significantly from what&#8217;s in the training data. </p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p></li></ol><h2><strong>&#8216;Is This Overkill?&#8217; Checklist</strong></h2><p>Before you commit to RAG or fine-tuning, run through these checks:</p><ul><li><p><strong>Can you solve this with 10 examples in your prompt instead of 10,000 examples in a training set?</strong> (If yes, you don&#8217;t need fine-tuning.)</p></li><li><p><strong>Does your use case actually require real-time access to changing information? 
</strong> (If yes, RAG. If no, maybe just better prompts.)</p></li><li><p><strong>Will you be running this model hundreds of thousands of times?</strong> (If no, the cost optimization from fine-tuning probably doesn&#8217;t matter.)</p></li><li><p><strong>Do you have clean, representative training data that teaches the model something it doesn&#8217;t know?</strong> (If no, don&#8217;t fine-tune.)</p></li><li><p><strong>Can you describe your desired behavior in words? </strong>(If yes, try prompt engineering first.) </p><div><hr></div></li></ul><h2><strong>Bottom Line</strong></h2><p>Most teams skip straight to the expensive solution because it feels more sophisticated. But prompt engineering handles more use cases than people expect, RAG solves the knowledge problem without model changes, and fine-tuning should be your last resort, not your first instinct.</p><p>Start simple; escalate when you have evidence that simpler approaches won&#8217;t work.</p><div><hr></div><p><em>This Wednesday, for paid subscribers, I&#8217;m sharing a complete fine-tuning guide, including dataset preparation templates, a conceptual LoRA walkthrough, platform research comparisons, and a cost-tracking spreadsheet you can adapt for your own projects.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Become A Paid Subscriber&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Become A Paid Subscriber</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[DeepSeek for Data Analysis]]></title><description><![CDATA[Prompts, Validation, and When to Just Type]]></description><link>https://www.thedataletter.com/p/deepseek-for-data-analysis</link><guid isPermaLink="false">https://www.thedataletter.com/p/deepseek-for-data-analysis</guid><dc:creator><![CDATA[Hodman 
Murad]]></dc:creator><pubDate>Thu, 12 Feb 2026 10:02:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4bf00f5-a32e-4fa6-8951-2cdf84c0f600_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><strong>A decision framework and three reusable workflows</strong></p></blockquote><p>I don&#8217;t use ChatGPT for my data work. For public datasets and personal projects, like the examples in this article, I use DeepSeek.com. It has a 1 million token context window, it&#8217;s free, and it supports complex reasoning without losing track during a multi-table join.</p><p>When I&#8217;m working with sensitive data (which is often the case for my work), I run DeepSeek models locally using Ollama + Continue.dev. The local versions support 128k tokens. 128k tokens sounds abstract, but in practice, it means I can paste the entire schema of a 50-table data warehouse and still have room to ask follow-up questions. I&#8217;ve never hit the limit during normal analysis.</p><p>This article gives you a decision framework for when to use DeepSeek and when to code manually, a quality control checklist for catching AI mistakes before they cost you hours, and reusable workflow patterns you can adapt to your own analysis tasks. You&#8217;ll walk away knowing exactly when to delegate work to an LLM and when to just type the damn code yourself.</p><h2><strong>Why DeepSeek? And Addressing Privacy Concerns</strong></h2><p>There are two camps. One group won&#8217;t touch DeepSeek because of data residency concerns. The other is excited about publicly available models that match or beat proprietary options on reasoning benchmarks. 
I&#8217;m in the latter camp for my work, unless a client I&#8217;m contracting with already has an enterprise account with OpenAI or Anthropic that they prefer I use (which happens most of the time).</p><p>My position is that DeepSeek.com is fine for public datasets and the kind of examples I&#8217;m sharing here. For sensitive work, the local setup I mentioned earlier is straightforward: install Ollama, pull the DeepSeek model, and point Continue.dev at your local instance. You get the reasoning power without sending anything to external servers.</p><p>This isn&#8217;t a setup guide, so I won&#8217;t walk through every step. The point is that local deployment is viable if privacy is important to your use case. For everything else, the web interface works perfectly well.</p><h2><strong>Decision Framework: When to Use DeepSeek vs. Coding Manually</strong></h2><p>DeepSeek handles tasks that require understanding context, reasoning through approaches, or generating code you&#8217;d need to look up anyway. 
You should code manually when the task is faster to type than to prompt, when you&#8217;re learning something new, or when you need absolute certainty.</p><p>Here&#8217;s how I decide:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xl2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xl2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 424w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 848w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 1272w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xl2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png" width="1040" height="846" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:846,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xl2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 424w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 848w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 1272w, https://substackcdn.com/image/fetch/$s_!Xl2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e29870-3c07-4596-b6f3-baedbf261c0a_1040x846.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If the task requires understanding why something works or what approach to take, use DeepSeek. If you already know what to write, just type it.</p><p>Here&#8217;s how that plays out across three common scenarios: </p>
      <p>
          <a href="https://www.thedataletter.com/p/deepseek-for-data-analysis">
              Read more
          </a>
      </p>
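One habit from the quality-control side applies to any AI-generated data code, whichever model wrote it: never accept a generated join without diagnostics. A minimal pandas sketch (<code>checked_merge</code> is an illustrative helper, not code from the article):

```python
import pandas as pd

def checked_merge(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Merge with diagnostics instead of trusting generated code blindly.

    validate= makes pandas raise if the right side has duplicate keys (the
    classic silent row-explosion bug); indicator= surfaces unmatched rows.
    """
    merged = left.merge(right, on=key, how="left",
                        validate="many_to_one", indicator=True)
    unmatched = int((merged["_merge"] == "left_only").sum())
    if unmatched:
        print(f"warning: {unmatched} left rows found no match on '{key}'")
    return merged.drop(columns="_merge")
```

The `validate="many_to_one"` check raises a `MergeError` the moment a duplicate key would multiply rows, which is exactly the class of mistake that is cheap to catch in review and expensive to catch in production.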
   ]]></content:encoded></item><item><title><![CDATA[“Stop Learning SQL and Python”]]></title><description><![CDATA[Why Senior Analysts Are Right (And Wrong)]]></description><link>https://www.thedataletter.com/p/stop-learning-sql-and-python</link><guid isPermaLink="false">https://www.thedataletter.com/p/stop-learning-sql-and-python</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 08 Feb 2026 09:02:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-2tg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4><strong>AI has changed the game. The question now is: what are the new rules?</strong></h4><p>&#8220;I hate leet coding interviews.&#8221;</p><p>My friend, a senior analyst, told me that over coffee last month. Five years in, crushing it at her job, but dreading the thought of ever having to switch roles because of the interview gauntlet. And honestly? I get it. When you&#8217;re spending your days designing data strategies and managing stakeholders, being asked to write a flawless binary search on a whiteboard feels absurd. Especially when Copilot&#8217;s sitting right there in VS Code, ready to handle that instantly.</p><p>So yeah, the &#8220;stop learning SQL and Python&#8221; crowd has a point. Just not the one they think they&#8217;re making.</p><div><hr></div><p>&#128204; Quick note: This is a free article on concepts. Paid subscribers get implementation guides midweek, every week. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Upgrade</span></a></p><div><hr></div><h3><strong>Why Everyone&#8217;s Fed Up</strong></h3><p>Two things are happening at once. First, as you get more senior, your job changes. You&#8217;re not writing queries all day anymore. You&#8217;re figuring out what questions to ask, which systems to build, and how to structure teams around data. Second, AI assistants have gotten really good at syntax. I don&#8217;t keep the exact parameters for window functions memorized anymore, and I don&#8217;t need to. That&#8217;s what documentation is for, and now GitHub Copilot handles it instantly.</p><p>These tools aren&#8217;t the problem. Most interview processes are still stuck in 2015, testing whether you&#8217;ve memorized edge cases instead of whether you can actually solve problems. 
That&#8217;s what&#8217;s driving people up the wall.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-2tg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-2tg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 424w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 848w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 1272w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-2tg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png" width="1358" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1358,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-2tg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 424w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 848w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 1272w, https://substackcdn.com/image/fetch/$s_!-2tg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52c6d0d-cf13-40e2-be96-675013a3e6d6_1358x874.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>But Here&#8217;s Where It Gets Tricky</strong></h3><p>There&#8217;s a huge difference between &#8220;I don&#8217;t need to memorize syntax&#8221; and &#8220;I don&#8217;t need to understand how this works.&#8221; Some people are conflating the two, and that&#8217;s dangerous.</p><p>AI is incredible at generating code. It&#8217;s also incredibly bad at knowing whether that code is right for your specific situation. I&#8217;ve watched Copilot suggest a JOIN that would absolutely wreck performance on our data model. If I didn&#8217;t understand what was happening under the hood, I&#8217;d have shipped it and spent the next week debugging production issues.</p><p>What hasn&#8217;t changed is that you still need to debug things. You still need to spot when something&#8217;s wrong. And if you&#8217;re senior, you&#8217;re definitely mentoring people and making architectural calls. You can&#8217;t do any of that without understanding the underlying concepts. 
The tool that writes the code might be new, but the requirement to know what good code looks like isn&#8217;t going anywhere.</p><div><hr></div><p><strong>&#128274; This Week&#8217;s Paid Guide</strong></p><p><strong><a href="https://www.thedataletter.com/p/implementation-blueprint-for-vector">A technical guide to building production RAG systems that covers vector database selection, AI embedding model costs, chunking strategies, and optimization patterns that control your LLM infrastructure spend.</a> </strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Upgrade</span></a></p><div><hr></div><h3><strong>What Matters Now</strong></h3><p>So if mindless memorization of code snippets is out, what&#8217;s in? I think about it as four core things.</p><p><strong>First:</strong> problem framing. Can you take a messy stakeholder request and turn it into something concrete? That&#8217;s half the battle right there.</p><p><strong>Second:</strong> critical thinking. When your model spits out results, can you tell if they make sense? Can you reason about uncertainty, edge cases, and what might be missing?</p><p><strong>Third:</strong> working effectively with AI. Writing good prompts, integrating outputs, and knowing when to trust the suggestion and when to override it. This is a skill now.</p><p><strong>Fourth:</strong> technical judgment. Whether a human or an LLM wrote it, can you review the code and evaluate whether it&#8217;s efficient, maintainable, and correct? Can you explain why one approach is better than another?</p><p>These aren&#8217;t easier skills than syntax memorization. They&#8217;re harder, but they&#8217;re the ones that actually matter. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Where This Is Heading</strong></h3><p>The good news is that some teams are figuring this out. I&#8217;ve seen interviews shift from &#8220;write this algorithm from scratch&#8221; to &#8220;here&#8217;s an ambiguous business problem, walk me through how you&#8217;d approach it.&#8221; You can use whatever tools you want. The interviewer wants to see how you think, how you break down problems, and what trade-offs you consider.</p><p>That&#8217;s not a lower bar. It&#8217;s a higher one. It&#8217;s just measuring the right things.</p><p>And honestly, that&#8217;s the direction all of us need to be moving into, whether we&#8217;re hiring or getting hired. Let AI handle the mechanical stuff. That&#8217;s what it&#8217;s good at. Focus your energy on the conceptual work, the strategic thinking, the judgment calls. 
That&#8217;s where humans still have the advantage, and probably will for a while.</p><div><hr></div><p><strong>Here&#8217;s what paid subscribers got this past month:</strong></p><p><strong><a href="https://www.thedataletter.com/p/advanced-model-drift-detection">Advanced Model Drift Detection</a></strong> - A technical deep dive into statistical methods (PSI, Wasserstein, KL divergence) for catching data distribution shifts before they tank your models, plus a diagnostic framework that takes you from alert to root cause without manual investigation.</p><p><strong><a href="https://www.thedataletter.com/p/mlops-on-a-50-monthly-budget">MLOps on a $50 Monthly Budget</a></strong> - The complete architecture for running production ML inference at founder-friendly costs using serverless GPUs, DuckDB querying S3, and automated shutdown scripts that prevent runaway cloud bills.</p><p><strong><a href="https://www.thedataletter.com/p/ai-agent-starter-kit">AI Agent Starter Kit</a></strong> - Production-grade agent infrastructure with built-in cost tracking, compliance engines for EU AI Act/GDPR, and structured logging that makes debugging agents forensic analysis instead of guesswork.</p><p><strong><a href="https://www.thedataletter.com/p/diy-data-catalog-template">DIY Data Catalog Template</a></strong> - A working metadata management system you can test locally in under 2 hours, with validation scripts, tier-based governance, and a Google Sheets backend that proves the pattern before you invest in vendor platforms.</p><p><strong>Not a paid subscriber yet?</strong> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE NOW&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>UPGRADE NOW</span></a></p>]]></content:encoded></item><item><title><![CDATA[Implementation 
Blueprint for Vector Database: Pinecone vs Weaviate vs Chroma]]></title><description><![CDATA[Production RAG Systems]]></description><link>https://www.thedataletter.com/p/implementation-blueprint-for-vector</link><guid isPermaLink="false">https://www.thedataletter.com/p/implementation-blueprint-for-vector</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Thu, 05 Feb 2026 18:30:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/64d2299b-7c86-4781-8be4-9001adc735e2_6868x4808.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://substack.com/home/post/p-186482910">You&#8217;ve accepted that production RAG requires a vector database.</a> Now you&#8217;re facing implementation decisions with real cost implications: which vendor, which embedding model, how to chunk your documents, and how to control the budget. This guide provides the frameworks and decision matrices for production implementation decisions.</p>
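Chunking is the decision with the least vendor lock-in and the most retrieval impact, so it is worth making concrete. Below is a minimal character-window chunker; the sizes are illustrative defaults, and production chunkers usually count tokens rather than characters and prefer splitting on sentence or heading boundaries.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly,
    so a sentence cut at one boundary still appears whole in a neighbor.

    chunk_size and overlap are illustrative; tune them against your
    retrieval recall, not against a blog post's defaults.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With 1,200 characters and the defaults, this yields three chunks of 500, 500, and 300 characters, where the first 50 characters of each chunk repeat the last 50 of the previous one. That overlap is the knob trading storage and embedding cost against the risk of splitting an answer across two chunks.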
      <p>
          <a href="https://www.thedataletter.com/p/implementation-blueprint-for-vector">
              Read more
          </a>
      </p>
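To make the budget question concrete before diving in, a back-of-envelope sketch of recurring RAG inference spend. All rates here are hypothetical placeholders, not quotes for any specific provider:

```python
def monthly_rag_cost(queries: int, context_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Recurring inference spend for a RAG system: every query pays for the
    retrieved context stuffed into the prompt, plus the generated answer."""
    input_cost = queries * context_tokens / 1e6 * input_price_per_m
    output_cost = queries * output_tokens / 1e6 * output_price_per_m
    return input_cost + output_cost

# 10k queries/month, 4k retrieved context tokens and 500 output tokens each,
# at hypothetical rates of $1 / $2 per million input / output tokens
estimate = monthly_rag_cost(10_000, 4_000, 500, 1.0, 2.0)
```

That example works out to $40 of input plus $10 of output, $50 a month, before embedding and vector-store costs. Notice that retrieved context dominates: shrinking chunks or retrieving fewer of them moves the bill more than anything on the output side.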
   ]]></content:encoded></item><item><title><![CDATA[Vector Database Guide]]></title><description><![CDATA[Scalable Retrieval for Production RAG and LLM Systems]]></description><link>https://www.thedataletter.com/p/vector-database-guide</link><guid isPermaLink="false">https://www.thedataletter.com/p/vector-database-guide</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 01 Feb 2026 09:02:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5bd616ce-4f1d-4db9-8c8f-dc705673b128_6400x4000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Moving a RAG pipeline from a local notebook to a production environment introduces immediate infrastructure challenges. While a flat-file index <strong>can manage small datasets, it cannot handle</strong> concurrent requests or the large document volumes required by a live application. </p><p>Vector databases provide the necessary architecture to maintain low-latency retrieval as your data scales, solving the engineering bottlenecks that cause simpler systems to fail under load. </p><div><hr></div><h4><strong>Vector Indexing for Semantic Retrieval</strong></h4><p>Production AI systems require a shift in how we handle data retrieval. While standard database operations rely on rigid filters and specific keys, AI applications depend on the relationship between concepts. A vector database specializes in these relationships by treating data points as coordinates in space rather than entries in a list. It serves as a high-performance engine for approximate nearest neighbor (ANN) search over dense vector embeddings, enabling the system to navigate concepts rather than keywords.</p><p>To put it simply, instead of scanning for matching words, the system organizes information based on how closely related the ideas are. It functions like a digital library that has already grouped similar topics together across an expansive, high-dimensional map. 
This allows the software to jump straight to the most relevant section and find the best answers instantly, rather than checking every single page one by one. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h4><strong>Retrieval Engine for Production RAG</strong></h4><p>In a live AI application, the speed and quality of the information the system finds are vital. If the search takes too long, the user waits. If it encounters incorrect information, the AI gives the wrong answer.</p><p>A vector database acts as a high-speed search engine, finding the most useful facts in a fraction of a second. By feeding the AI only the most relevant data, the system remains accurate while reducing the costs of running large models.</p><div><hr></div><h4><strong>Implementation Trade-offs: Build, Buy, or Hybrid</strong></h4><p>Choosing the right path depends on your team&#8217;s bandwidth and your project&#8217;s specific constraints.</p><p><strong>Managed Services (e.g., Pinecone)</strong></p><ul><li><p><strong>Pros:</strong> Zero ops overhead, fast integration for rapid deployment, and guaranteed availability via SLAs.</p></li><li><p><strong>Cons:</strong> Recurring monthly costs that scale with data volume, less control over underlying infrastructure, and potential data egress fees.</p></li></ul><p><strong>Cloud-Hosted Open Source (e.g., Weaviate Cloud)</strong></p><ul><li><p><strong>Pros:</strong> Flexibility of an open source core combined with managed scaling and often granular, usage-based pricing models.</p></li><li><p><strong>Cons:</strong> You still own some configuration logic, and you&#8217;re still paying a cloud premium.</p></li></ul><p><strong>Self-Hosted (e.g., 
pgvector, Milvus)</strong></p><ul><li><p><strong>Pros:</strong> Maximum architectural control, cost-effective at high volume, and integrates directly with existing data infrastructure (like PostgreSQL).</p></li><li><p><strong>Cons:</strong> Significant engineering burden for initial deployment, ongoing monitoring, and manual performance optimization.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><p><strong>Use these three questions to evaluate your current trajectory:</strong></p><ol><li><p>Is your project transitioning from a proof-of-concept to a deployed application with actual user traffic?</p></li><li><p>Does your retrieval latency or recall accuracy degrade when your document volume exceeds 10,000 chunks?</p></li><li><p>What&#8217;s your team&#8217;s higher priority: developer velocity or long-term infrastructure cost control?</p></li></ol><p>If you answered &#8216;yes&#8217; to the first two questions, you&#8217;ve reached the point where a dedicated vector database is a requirement. Your answer to the third question will dictate whether you should lean toward a managed service or a self-hosted solution.</p><div><hr></div><p>Vector databases are the industry standard for a specific engineering requirement: scalable similarity search in AI applications. By offloading the complexity of high-dimensional indexing to a dedicated system, you ensure your RAG pipelines remain responsive as your data grows.</p><p>Building these systems at scale involves nuanced architectural decisions that go far beyond basic implementation. 
For exclusive access to deep-dive technical reports and advanced strategies for engineering production AI, consider upgrading to a paid subscription to <strong>The Data Letter</strong>. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Paid&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Upgrade to Paid</span></a></p>]]></content:encoded></item><item><title><![CDATA[Advanced Model Drift Detection]]></title><description><![CDATA[Moving Beyond Scheduled Retraining]]></description><link>https://www.thedataletter.com/p/advanced-model-drift-detection</link><guid isPermaLink="false">https://www.thedataletter.com/p/advanced-model-drift-detection</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Thu, 29 Jan 2026 09:01:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/288eb9bf-ff5b-4e87-8667-66587d0569a7_1364x592.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Statistical methods, threshold tuning, and diagnostic workflows for production ML systems</strong></h2><p>Most production ML systems monitor drift by running statistical tests on a schedule, comparing recent data to a baseline, and triggering alerts when thresholds are crossed. This generates two problems: gradual, meaningful drift gets missed, while natural variance creates false alarms. Models can pass scheduled retrain validations with strong offline metrics while business outcomes quietly degrade because the monitoring approach lacks the sophistication to catch what matters.</p><p>Data scientists have access to sophisticated statistical methods for detecting distribution shifts. 
Most monitoring failures stem from applying these tools without understanding what they measure, when they&#8217;re appropriate, or how to tune them for specific systems. Teams often implement drift detection as a checkbox compliance task rather than a diagnostic system. They pick a metric (often PSI because it&#8217;s popular), set an arbitrary threshold (0.1 or 0.2 because that&#8217;s what a blog post suggested), and wait for alerts. When those alerts arrive, there&#8217;s no clear path from &#8220;PSI exceeded threshold&#8221; to &#8220;here&#8217;s what&#8217;s wrong and what to do about it.&#8221;</p><p>Effective production monitoring starts with that understanding. More critically, it requires a diagnostic framework that takes you from an alert to a root cause without manual data spelunking every time. </p><div><hr></div><p>&#128075;&#127999;&#128075;&#127999;&#128075;&#127999; Welcome to all new TDL subscribers this week! Here are some recent popular articles you might be interested in as well: </p><p><a href="https://hodmanmurad.substack.com/p/mlops-on-a-50-monthly-budget">MLOps on a $50 Monthly Budget</a></p><p><a href="https://hodmanmurad.substack.com/p/data-catalog-implementation">Data Catalog Implementation</a></p><p><a href="https://hodmanmurad.substack.com/p/diy-data-catalog-template">DIY Data Catalog</a></p><p><a href="https://hodmanmurad.substack.com/p/production-hell-of-ai-agents">Production Hell of AI Agents</a></p><p><a href="https://hodmanmurad.substack.com/p/ai-agent-starter-kit">AI Agent Starter Kit</a> </p><p>Paid subscribers get full access to all technical deep dives, implementation guides, and operational playbooks. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Paid&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Upgrade to Paid</span></a></p><div><hr></div><p>Before you can diagnose drift effectively, you need to understand which statistical methods detect which types of changes.</p><h2><strong>Five Statistical Tools for Drift Detection</strong></h2><p>Think of drift detection methods as specialized instruments, each designed to answer a specific question about your data. Using the wrong tool or misinterpreting its output leads to either missed drift or alert fatigue.</p><p><strong>Population Stability Index (PSI) asks: </strong>Has the proportion of data falling into predefined buckets changed? It divides your feature values into categories (bins) and compares the proportions of data in each category between your baseline and current data. Think of it like dividing ages into ranges (18-25, 26-35, 36-45) and checking whether the percentages in each range have shifted. PSI is particularly useful for monitoring discrete or naturally bucketed features, such as credit score ranges or price tiers. Its strength is interpretability: you can immediately see which bins shifted. Its weakness is sensitivity to how you define those bins.</p><p><strong>Wasserstein Distance</strong> (also called Earth Mover&#8217;s Distance) asks: How much work would it take to reshape one distribution into another? It measures the minimum cost to transport probability mass from one distribution to match another. This makes it well-suited for continuous features where you care about the magnitude of shifts, not just their presence. A small Wasserstein distance indicates that the distributions are similar. A large one indicates they&#8217;ve moved apart. 
It handles heavy-tailed distributions better than variance-based metrics.</p><p><strong>Kullback-Leibler Divergence</strong> asks: How much information would I lose if I used my old distribution to model the new one? KL divergence quantifies the difference between two probability distributions by measuring how one distribution diverges from the other. The calculation works differently depending on which distribution you treat as the reference (KL(P||Q) &#8800; KL(Q||P)). It&#8217;s also highly sensitive to changes in rare events or edge cases. This sensitivity makes it powerful for detecting subtle shifts in uncommon scenarios, but creates problems when certain values appear in new data that never appeared in your baseline (the math breaks down when probabilities hit zero).</p><p><strong>Maximum Mean Discrepancy (MMD)</strong> asks: Do these two high-dimensional distributions differ in a statistically significant way? It compares distributions by transforming them into a special mathematical space (using a kernel function) and measuring how far apart they are in that transformed view. Think of it like comparing two cities not by their street layouts, but by converting each into a set of aggregate statistics (population density patterns, commercial vs. residential ratios) and comparing those. MMD works well for complex data types such as text embeddings, image representations, and other multi-dimensional features where simpler metrics struggle. It requires more computation but detects complex, multivariate shifts that single-variable methods miss.</p><p><strong>The Page-Hinkley Test asks: Has this metric started showing a consistent pattern of deviation from its typical level</strong>? Unlike previous methods that compare two static distributions, Page-Hinkley tracks a cumulative metric over time and signals a change point when it crosses a threshold. 
This makes it ideal for monitoring prediction metrics (like average predicted probability) where you want to detect the moment drift begins, not just that it occurred.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DI9Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6062d48e-23bc-44c8-8a63-d23a2fe0beb9_1024x1024.png"><img src="https://substackcdn.com/image/fetch/$s_!DI9Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6062d48e-23bc-44c8-8a63-d23a2fe0beb9_1024x1024.png" width="1024" height="1024" alt="" loading="lazy"></a></figure></div><p>Tool selection is just the starting point. Operational challenges intensify when you need to set thresholds that separate meaningful changes from random fluctuations, and when an alert triggers but you can&#8217;t determine whether it indicates a data pipeline bug, legitimate distribution shift, or actual concept drift. Most drift detection implementations fail at these two points. </p>
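<p>To make the first of these instruments concrete, here is a minimal NumPy sketch of the PSI calculation described above. The specific choices (ten bins derived from the baseline, clipping out-of-range values into the outer bins, a small epsilon to keep the logarithm finite) are illustrative assumptions, not a production monitoring recipe.</p>

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population Stability Index between two 1-D samples (illustrative sketch).

    Bin edges come from the baseline sample; current values outside the
    baseline range are clipped into the outer bins, and a small epsilon
    keeps log() finite when a bin is empty.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    current = np.clip(current, edges[0], edges[-1])
    eps = 1e-6
    p = np.histogram(baseline, bins=edges)[0] / len(baseline) + eps
    q = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((q - p) * np.log(q / p)))

rng = np.random.default_rng(42)
baseline = rng.normal(0, 1, 50_000)
stable = rng.normal(0, 1, 50_000)   # same distribution, new sample
shifted = rng.normal(1, 1, 50_000)  # mean shifted by one standard deviation

print(psi(baseline, stable))   # small: sampling noise only
print(psi(baseline, shifted))  # large: a genuine distribution shift
```

<p>On synthetic samples like these, the unshifted comparison stays far below the conventional 0.1 warning level, while the one-standard-deviation shift lands well above the 0.25 alert level, which illustrates why the threshold only means something once you know how your feature&#8217;s binning and sample size behave.</p>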
      <p>
          <a href="https://www.thedataletter.com/p/advanced-model-drift-detection">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLOps on a $50 Monthly Budget]]></title><description><![CDATA[A Solo Founder&#8217;s Survival Guide]]></description><link>https://www.thedataletter.com/p/mlops-on-a-50-monthly-budget</link><guid isPermaLink="false">https://www.thedataletter.com/p/mlops-on-a-50-monthly-budget</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 25 Jan 2026 16:05:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/90eb32f9-475d-487c-b5c6-aa40d43bae48_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cloud bills surprise founders more than they should. A SageMaker endpoint running 24/7 costs $2,000 per month, whether it serves 3 requests or 3,000. An A100 instance left on over the weekend burns $800. Standard MLOps tutorials assume you have infrastructure budget to spare. Most solo founders don&#8217;t.</p><p>As a first-time founder building my own product, I&#8217;m prioritizing cost control from day one. I&#8217;m determined to reach product-market fit without infrastructure costs consuming my runway. This article documents the stack I&#8217;m building (designed to start under $50 monthly and scale only when usage justifies the cost).</p><p>The alternative requires a different mental model. Orchestrate poverty, don&#8217;t provision wealth. Build systems that wake up on demand and sleep when idle. Design for right-sized compute, not always-on excess. Automate ruthlessly so frugality compounds without manual intervention. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Foundations of Frugal MLOps: Rethinking Defaults</strong></h2><p>Most MLOps tutorials push you toward the enterprise playbook (managed feature stores, always-on inference endpoints, dedicated GPU instances). This works when you have venture capital. For solo founders and indie hackers bootstrapping their way to revenue, these costs can consume months of runway before you serve a single paying customer.</p><p>Consider the typical quick start path. Spin up a SageMaker inference endpoint (<a href="https://aws.amazon.com/ec2/instance-types/g4/">starting at $0.526/hour for g4dn.xlarge with a T4 GPU</a>), provision a managed feature store, and maybe add DataDog for monitoring (<a href="https://www.vantage.sh/blog/datadog-vs-grafana-cost">starting at $15 per host monthly</a>). You&#8217;re immediately committed to $379+ monthly before serving a single prediction (one g4dn.xlarge instance running 24/7 at $0.526/hour = $379 monthly). Scale that across development, staging, and production environments, and you&#8217;re burning $500-1,000 monthly on infrastructure alone.</p><p>Treat compute as a scarce resource that must justify its existence every second. Default to stateless, ephemeral architecture. Embrace serverless patterns that bill you for actual usage, not provisioned capacity. Replace managed services with open-source tools running on minimal infrastructure.</p><p>The constraint forces better architecture while preserving runway until revenue justifies higher spending (stateless services, efficient batching, aggressive caching, and clear separation between hot and cold paths). 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Blueprint: A $50 Monthly MLOps Stack</strong></h2><p>Here&#8217;s the architecture designed for production ML inference at under $50 per month. </p>
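<p>The budget math above is worth writing down explicitly. In this sketch, the always-on figure uses the g4dn.xlarge rate cited earlier; the per-request serverless price and the traffic level are hypothetical placeholders for illustration, not any provider&#8217;s quote.</p>

```python
HOURS_PER_MONTH = 24 * 30  # ~720 billable hours in a month

# Always-on endpoint: billed for provisioned capacity, not usage.
g4dn_xlarge_hourly = 0.526  # AWS rate cited in the article
always_on_monthly = g4dn_xlarge_hourly * HOURS_PER_MONTH
print(f"always-on: ${always_on_monthly:.0f}/month")  # ~$379, whether you serve 3 requests or 3,000

# Scale-to-zero serverless: billed per invocation.
# The per-request price below is a hypothetical placeholder, not a quote.
cost_per_request = 0.0002
requests_per_month = 50_000
serverless_monthly = cost_per_request * requests_per_month
print(f"serverless: ${serverless_monthly:.0f}/month")

# Traffic level at which always-on provisioning starts to pay off.
break_even_requests = always_on_monthly / cost_per_request
print(f"break-even: {break_even_requests:,.0f} requests/month")
```

<p>The point of the break-even line is the design rule it encodes: until traffic approaches that volume, paying per invocation preserves runway, and the always-on endpoint is pure burn.</p>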
      <p>
          <a href="https://www.thedataletter.com/p/mlops-on-a-50-monthly-budget">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[AI Infrastructure Engineering: Driving JPMorgan’s $2 Billion Return (Part 2)]]></title><description><![CDATA[MLOps and Data Architecture: Scaling 450 Production Models]]></description><link>https://www.thedataletter.com/p/ai-infrastructure-engineering-driving</link><guid isPermaLink="false">https://www.thedataletter.com/p/ai-infrastructure-engineering-driving</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Wed, 21 Jan 2026 15:25:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/41d62fbc-56d0-4018-b9be-7e1ba5648543_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>While headlines tout the transformative power of AI, sustainable value is engineered, not discovered. This deep dive into JPMorgan Chase's technical architecture reveals the operational foundations that turn algorithms into billions in annual return.</p><div><hr></div><p>A little about my guest today:</p><p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;John Brewton&quot;,&quot;id&quot;:250536583,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!6MjO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d302f0-b038-41be-abb5-55b23b8582aa_1200x1198.jpeg&quot;,&quot;uuid&quot;:&quot;ef6e5d50-41bb-4526-98ab-d6179119ef6e&quot;}" data-component-name="MentionToDOM"></span> documents the history and future of operating companies at <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Operating by John 
Brewton&quot;,&quot;id&quot;:2417823,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/johnbrewton&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8178186-b7d3-4cb4-8aa1-c17369bd2128_600x600.png&quot;,&quot;uuid&quot;:&quot;72b242ac-c03c-40bb-a96e-f2f0cbf95e78&quot;}" data-component-name="MentionToDOM"></span>, where he examines how elite organizations build competitive advantage through operational excellence. A Harvard graduate who began his career as a PhD student in economics at the University of Chicago, John brings both academic rigor and practical experience to his analysis. He sold his family&#8217;s B2B industrial distribution company in 2021 before founding 6A East Partners, a research and advisory firm exploring the fundamental question: What is the future of companies?</p><p>His writing spans <strong>Operating Economics</strong> (dissecting how companies like Amazon and, of course, JPMorgan build optimization engines), <strong>Operating History</strong> (connecting contemporary strategy to legends like Andy Grove and Alfred Sloan), and the <strong>AI Upskilling Playbook</strong> (research-backed frameworks for career resilience in the AI era). With popular pieces like <a href="https://www.operatingbyjohnbrewton.com/p/the-ai-skills-crisis-isnt-what-you">The AI Skills Crisis Isn&#8217;t What You Think</a> and <a href="https://www.operatingbyjohnbrewton.com/p/operating-economics-building-antifragile">Operating Economics: Building Antifragile Companies</a>, John has built a community of operators, founders, and builders seeking to understand not just what companies do, but how they actually work.</p><p>He helps business owners, founders, and investors optimize their operations, translating insights from Fortune 500 transformations into actionable frameworks for companies at any scale. 
John creates content daily (despite the occasional protests from his beloved wife, Fabiola) and still cringes at his early LinkedIn posts.</p><div><hr></div><p><a href="https://www.operatingbyjohnbrewton.com/p/operating-leaders-how-jp-morgan-chase">Part 1 examined JPMorgan&#8217;s financial performance and strategic decisions</a> from 2021-2026: $2 billion in annual AI investment generating $2 billion in returns, 450 production models scaling to 1,000 by year-end, and a workforce transformation that included a 10% reduction in operations staff. </p><div><hr></div><blockquote><p>JPMorgan Chase generates $2 billion in annual business value from artificial intelligence by treating model deployment as an industrial engineering challenge. While most organizations prioritize algorithms or talent, JPMorgan focuses on the operational plumbing required to scale.</p></blockquote><p>Success derives from data pipelines processing $10 trillion in daily transactions with zero downtime tolerance, feature stores eliminating training-serving skew across 450 production models, and deployment infrastructure pushing updates to 250,000 employees every eight weeks while maintaining regulatory compliance across 120 jurisdictions.</p><p>JPMorgan&#8217;s AI returns reflect a decade of data engineering that preceded the current AI wave. While competitors scrambled to hire data scientists after ChatGPT&#8217;s November 2022 launch, JPMorgan already operated 400+ production use cases spanning fraud detection, credit decisioning, and algorithmic trading. The bank employs<a href="https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/events/2023/jpmc-investor-day-2023/JPM-Investor-Day-2023-Final-Transcript_Global-Technology.pdf"> 900 data scientists, 600 ML engineers, and a 200-person AI research team</a>. 
This infrastructure provides a structural advantage that most enterprises cannot replicate, regardless of budget.</p><p>This article examines the technical systems generating those returns. These include cloud migration architecture moving petabytes off mainframes, ML lifecycle management serving 1,700 AI specialists, an LLM gateway processing millions of employee queries, and real-time fraud detection operating at millisecond latency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LRxf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1270e9d-8a92-4421-992b-3847d791b6e2_1408x768.png"><img src="https://substackcdn.com/image/fetch/$s_!LRxf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1270e9d-8a92-4421-992b-3847d791b6e2_1408x768.png" width="1408" height="768" alt="" loading="lazy"></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Cloud Migration Strategy: High-Volume Data Systems Transformation</strong></h2><p>JPMorgan processes nearly $10 trillion in payments daily across 120 currencies. 
Moving that volume from mainframe systems to cloud infrastructure presents technical challenges most engineers never encounter: dual-write architectures maintaining consistency across legacy and modern systems, schema translation for decades-old data models, and testing strategies that verify correctness without access to production traffic patterns.</p><p>The bank increased cloud application deployment from 38% in 2022 to 65% by 2025, with approximately 80% now running on modern infrastructure. That three-year timeline reflects the actual complexity of financial systems migration, not the &#8220;lift and shift in six months&#8221; narrative common in cloud vendor marketing.</p><p>The migration challenge requires running old and new systems simultaneously. JPMorgan uses change data capture technology that copies every transaction from mainframe computers to cloud databases in real time, like carbon paper creating a duplicate. Both systems process the same transactions for months while engineers verify that the cloud version produces identical results.</p><p>Old mainframe systems store data differently from modern cloud databases. Records from the 1980s use outdated formatting that doesn&#8217;t translate directly to today&#8217;s standards. A single customer&#8217;s information might be scattered across 15 different mainframe files that need to be combined into a single modern database record.</p><p>Testing presents unique constraints. Banks cannot experiment with real customer transactions due to regulatory rules. Instead, JPMorgan runs shadow tests where cloud systems process copies of live transactions without affecting actual payments. Engineers compare outputs between old and new systems. Only after weeks of matching results with total precision does JPMorgan shift real traffic to cloud infrastructure.</p><p>The bank invested over $2 billion in building cloud data centers with computing power that mainframes cannot provide. 
Modern fraud detection requires specialized processors that analyze transactions instantly, achieving the 300x speed improvement documented in Part 1, where older systems took 24 hours running overnight batch processes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YwTm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e443dbe-6b24-4900-a9e2-42c88a8825cc_1408x768.png"><img src="https://substackcdn.com/image/fetch/$s_!YwTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e443dbe-6b24-4900-a9e2-42c88a8825cc_1408x768.png" width="1408" height="768" alt="" loading="lazy"></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>OmniAI Platform: ML Lifecycle Management for 1,700 Specialists</strong></h2><p>JPMorgan operates OmniAI, an internal ML platform serving 1,700 AI specialists across 450+ production models. The platform addresses problems invisible to organizations running 5-20 models but essential at scale: feature definition inconsistencies that cause training-serving skew, model versioning across teams that independently update shared features, and deployment pipelines that balance engineering velocity and regulatory audit requirements.</p><p>Feature stores solve a coordination problem. When different teams build models independently, they often define the same metric differently. 
One team calculates &#8216;customer transaction velocity&#8217; over 7 days, another over 30 days. Models trained on one definition but deployed using another produce incorrect predictions.</p><p>JPMorgan&#8217;s feature store provides a single definition that all teams share. Engineers define each metric once, and the platform ensures every model uses the same calculation for both training and production. This prevents errors where models work in testing but fail in real-world deployment.</p><p>The system maintains two stores: an offline store holding historical data for model training, and a fast-access online store delivering metrics in milliseconds for real-time predictions. Some metrics update nightly (such as customer credit history), while others update instantly (such as current transaction counts).</p><p>Deployment follows a careful progression. New models first run alongside existing models without affecting actual decisions (&#8216;shadow deployment&#8217;). Engineers compare predictions between old and new versions. Next, new models handle 1-5% of real transactions (&#8216;canary deployment&#8217;), a share that is gradually increased if performance remains stable. Automated systems monitor for problems and can reverse deployments if metrics degrade.</p><p>Monitoring 450+ models requires tracking whether model predictions remain accurate as real-world patterns change. Each model logs its decisions to a central system.
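</p><p>A minimal, stdlib-only sketch of central decision logging plus the kind of nightly distribution check it enables; the threshold, score distributions, and all names below are illustrative, not JPMorgan&#8217;s implementation:</p>

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index: how far today's logged prediction
    scores have drifted from the training-time score distribution."""
    cuts = sorted(expected)
    edges = [cuts[int(len(cuts) * i / bins)] for i in range(1, bins)]
    def fractions(scores):
        counts = [0] * bins
        for s in scores:
            counts[sum(s >= e for e in edges)] += 1  # bucket by decile edge
        return [max(c / len(scores), 1e-6) for c in counts]  # avoid log(0)
    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

random.seed(0)
baseline = [random.gauss(0.30, 0.10) for _ in range(5000)]  # training-time scores
today = [random.gauss(0.45, 0.10) for _ in range(5000)]     # scores logged overnight
if psi(baseline, today) > 0.2:  # a common rule-of-thumb alert threshold
    print("drift detected: queue model for retraining")
```

<p>In practice the threshold, the binning, and which logged signals to track would come from a team&#8217;s own validation work, not these toy numbers.</p><p>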
Nightly analysis checks whether current predictions match patterns from training data, alerting engineers when models need retraining.</p><p>This platform infrastructure enables JPMorgan&#8217;s distributed AI organization: 900 data scientists, 600 ML engineers, and 1,000 data management specialists working across lines of business while maintaining centralized governance. Without OmniAI, coordinating 1,700 specialists across 450+ models would create competing definitions, inconsistent deployments, and regulatory compliance failures. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>LLM Suite Gateway: Powering 250,000 Employees</strong></h2><p>The platform manages multiple AI providers (OpenAI and Anthropic) through a single interface. Employees request AI assistance without choosing specific vendors; the system routes requests based on cost and capabilities. 
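</p><p>A toy sketch of that kind of vendor-agnostic routing; the provider names, prices, and capability flag are all hypothetical:</p>

```python
# Hypothetical sketch: route a request to whichever provider satisfies
# the task's capability needs, without the caller ever naming a vendor.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float   # USD, illustrative numbers
    supports_long_context: bool

PROVIDERS = [
    Provider("vendor_a", 0.03, True),
    Provider("vendor_b", 0.01, False),
]

def route(needs_long_context: bool) -> Provider:
    candidates = [p for p in PROVIDERS
                  if p.supports_long_context or not needs_long_context]
    return min(candidates, key=lambda p: p.cost_per_1k_tokens)  # cheapest capable

print(route(needs_long_context=False).name)  # vendor_b: cheapest capable provider
```

<p>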
This prevents dependence on any single vendor.</p><p>Cost tracking monitors AI usage by person, department, and application. The system records how many &#8216;tokens&#8217; (roughly equivalent to words) each request consumes, which AI model processed it, and when it was processed. Finance teams receive monthly reports showing AI costs by business unit, enabling budgeting and cost allocation.</p><p>The system connects AI models to JPMorgan&#8217;s internal documents using retrieval-augmented generation (RAG). When employees ask questions, the system searches millions of internal documents for relevant information, then provides that context to the AI before generating answers. This grounds responses in JPMorgan&#8217;s actual policies and data rather than the AI&#8217;s general training.</p><p>Vector databases enable semantic search across internal documents. Unlike keyword search, which matches exact words, semantic search understands meaning. A search for &#8216;loan approval process&#8217; finds relevant documents even if they use different terminology, such as &#8216;credit decisioning workflow.&#8217;</p><p>Access controls ensure employees only receive information they&#8217;re permitted to see. Investment bankers cannot access retail customer data; retail employees cannot access merger documents. The system filters search results based on employee permissions before providing context to the AI.</p><p>The eight-week update cadence manages LLM Suite platform changes: evaluation &#8594; testing &#8594; limited rollout &#8594; full deployment. Evaluation assesses new models against existing models using golden datasets (curated question-answer pairs covering common use cases). Limited rollout serves 5-10% of production traffic and monitors for quality degradation. 
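</p><p>The evaluation step can be pictured as a simple gate over a golden dataset; the questions, answers, and exact-match scoring below are illustrative only (a production gate would use richer metrics than string equality):</p>

```python
# Illustrative gate: a candidate model enters limited rollout only if it
# matches or beats the incumbent on curated question-answer pairs.
GOLDEN = [
    ("What is the wire transfer cutoff time?", "5pm et"),
    ("Who approves credit overrides?", "the credit officer"),
]

def accuracy(model, golden):
    hits = sum(model(q).strip().lower() == a for q, a in golden)
    return hits / len(golden)

def promote(candidate, incumbent, golden=GOLDEN):
    return accuracy(candidate, golden) >= accuracy(incumbent, golden)

incumbent = lambda q: "5pm ET" if "wire" in q else "unknown"
candidate = lambda q: "5pm ET" if "wire" in q else "the credit officer"
print(promote(candidate, incumbent))  # True: candidate matches both answers
```

<p>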
Full deployment occurs only after validating metrics across a limited rollout.</p><p>The platform enables measured productivity gains: investment bankers automating 40% of research tasks, portfolio managers cutting research time by up to 83%, and wealth advisors finding information 95% faster. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Real-Time Fraud Detection at $10 Trillion Daily Scale</strong></h2><p>JPMorgan&#8217;s fraud detection operates at millisecond latency while processing $10 trillion in daily transactions. The system achieves 40% higher accuracy than traditional rule-based approaches, enabling real-time blocking of fraudulent transactions before they are completed.</p><p>The system must analyze each transaction in milliseconds before completing payment. At $10 trillion in daily volume across 8 billion transactions, JPMorgan processes roughly 93,000 transactions per second on average, peaking above 200,000 during busy periods.</p><p>Fraud detection examines dozens of signals: transaction amount, merchant type, location, time of day, account age, typical spending patterns, recent transaction frequency, and connections to known fraudulent accounts. The system computes these signals instantly by maintaining pre-calculated summaries that update with each transaction rather than recalculating from scratch.</p><p>The architecture uses streaming technologies (Kafka and Flink) to continuously process transactions. As transactions occur, the system computes risk signals, scores fraud probability, and returns decisions before payments are complete.</p><p>Reducing false positives represents the primary challenge. 
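</p><p>The pre-calculated summaries idea (update a running aggregate on every transaction instead of re-scanning history at scoring time) can be sketched in a few lines; this is a toy model, not the production system:</p>

```python
from collections import defaultdict, deque

WINDOW = 3600  # seconds: a one-hour transaction-velocity window (illustrative)
events = defaultdict(deque)  # account_id -> recent transaction timestamps

def record(account_id, ts):
    """Amortized O(1) update on every incoming transaction."""
    q = events[account_id]
    q.append(ts)
    while q and q[0] <= ts - WINDOW:  # evict timestamps outside the window
        q.popleft()

def tx_velocity(account_id):
    """O(1) read at scoring time: no history scan."""
    return len(events[account_id])

for t in (0, 1000, 2000, 3000, 4000):
    record("acct-1", t)
print(tx_velocity("acct-1"))  # 4: the t=0 event fell outside the hour window
```

<p>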
Traditional systems block many legitimate transactions, frustrating customers. JPMorgan&#8217;s 40% improvement in accuracy means the system correctly identifies actual fraud while allowing more legitimate transactions through.</p><p>The system continuously re-trains on recent fraud patterns. Fraudsters constantly evolve their tactics, making yesterday&#8217;s model less effective today. JPMorgan deploys updated models through the same careful progression used for other AI systems (shadow testing, then gradual rollout).</p><p>Fraud attacks grow 12% annually. Without AI, fraud losses would escalate exponentially. By holding fraud costs flat despite surging attack volumes, the system avoids hundreds of millions in potential losses, contributing significantly to the $2 billion in annual business value detailed in Part 1. This represents defensive value that&#8217;s easily overlooked: preventing losses rather than generating new revenue. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Strategic Maturity: Structural Barriers to AI Replication</strong></h2><p>Data infrastructure maturity distinguishes successful AI initiatives from failed attempts.</p><p>JPMorgan spent a decade building foundations before deploying AI at scale, unifying customer information across 30+ disconnected legacy systems. This unification ensures every department accesses identical customer data. The bank built pipelines moving data from mainframe computers to cloud databases instantly. In contrast, many organizations rely on inconsistent customer records, daily batch updates, and manual error tracking.</p><p>Testing requirements reflect banking regulations. 
JPMorgan must explain every credit decision to regulators, prove fraud models treat all customers fairly, and demonstrate AI systems meet safety standards. Most companies deploy models after basic accuracy testing without rigorous validation or explanation capabilities. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><p>Legacy system integration determines what&#8217;s possible. JPMorgan spent years building connections, allowing modern AI systems to interact with decades-old mainframe applications. Companies with similar old systems but without this integration cannot deploy AI that requires instant access to core data.</p><p>The $2 billion in returns comes more from data infrastructure, testing frameworks, and operational systems than from sophisticated algorithms. Any company can buy the same cloud services, use the same open-source software, and hire comparable talent. What cannot be replicated quickly is the decade of data engineering that laid the foundations for deployment.</p><p>Organizations should assess their readiness through honest questions: Do you have consistent customer information across all systems, or does marketing see different data than sales? Can you calculate customer metrics instantly, or does data processing take hours? Do you have shared tools preventing teams from defining the same metric differently? Can you test new models safely without affecting customers? Do you monitor whether models stay accurate as conditions change? Can you explain to regulators why models make specific decisions?</p><p>Honest answers reveal gaps. Closing them requires multi-year infrastructure investment, producing no immediate revenue. 
Executives must approve spending millions on data quality tools and deployment automation before the first AI model delivers business value. This patience separates JPMorgan from competitors seeking immediate AI returns.</p><p>Part 1 documented JPMorgan&#8217;s competitive positioning: the bank ranks #1 on Evident AI&#8217;s Index for AI maturity, commands a valuation premium of 2.68x book value versus peers, and outperformed bank indices by 35% in 2025. That market recognition reflects investor confidence that competitors cannot quickly replicate this infrastructure advantage. Wells Fargo, Citigroup, Goldman Sachs, and Bank of America all acknowledge AI investments, but none disclose production use case counts or specific financial returns approaching JPMorgan&#8217;s transparency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KIUG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KIUG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!KIUG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!KIUG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KIUG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KIUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KIUG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!KIUG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!KIUG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KIUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd20d2029-a198-4ce2-bef5-b0441a14362e_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Data 
Infrastructure Foundations: Prerequisites for AI Success</strong></h2><p>JPMorgan&#8217;s $2 billion annual return stems from a technical infrastructure built to handle 450 production models scaling toward 1,000. These results include 40% higher fraud detection accuracy and wealth advisors finding information 95% faster. Such outcomes originate from a decade of foundational engineering rather than the recent adoption of popular algorithms.</p><p>Many organizations struggle with AI because they lack foundational data architecture. Without consistent identifiers, real-time pipelines, and automated drift monitoring, data scientists spend 80% of their time on infrastructure maintenance. JPMorgan&#8217;s success proves that shifting this ratio through centralized platforms creates a structural lead over competitors.</p><p>Technical practitioners should recognize that AI returns require mature ML operations, which take years to build. Organizations starting AI initiatives today must invest in data infrastructure, feature engineering capabilities, deployment automation, and monitoring systems before expecting $2 billion returns. Machine learning models represent a widely available commodity. The infrastructure surrounding those models determines success.</p><p>JPMorgan&#8217;s AI leadership persists because competitors cannot quickly replicate the data engineering foundation. Regional banks can hire data scientists, but cannot build equivalent data pipelines without a multi-year investment. Fintech startups begin with modern infrastructure but lack transaction data at JPMorgan&#8217;s scale.</p><p>For executives considering AI investment, JPMorgan&#8217;s experience teaches patience. The bank spent years building data infrastructure, growing from 300 use cases and $100 million in value (2022) to 450+ use cases and $2 billion in value (2025). 
Expecting immediate ROI from AI initiatives without addressing foundational data quality, infrastructure, and governance challenges leads to failed projects and abandoned strategies.</p><p>The future of enterprise AI belongs to organizations willing to invest in infrastructure first and algorithms second. JPMorgan proves the thesis: $2 billion in annual returns from infrastructure that enables 1,700 specialists to operate 450+ models, delivering measurable business value. That infrastructure includes cloud migration, feature stores, deployment pipelines, and monitoring systems. </p><div><hr></div><p><em>A special thanks to John Brewton for collaborating on this analysis!</em></p><p><br><strong>The Data Letter</strong> delivers breakdowns like this on the systems and strategies behind the world's most important companies. For deep dives into data infrastructure, MLOps, and the operating models that create billion-dollar advantages, <strong>become a paying subscriber. </strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Paid&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Upgrade to Paid</span></a></p>]]></content:encoded></item><item><title><![CDATA[A Small Data Manifesto]]></title><description><![CDATA[How to choose simple solutions over complex data infrastructure]]></description><link>https://www.thedataletter.com/p/a-small-data-manifesto</link><guid isPermaLink="false">https://www.thedataletter.com/p/a-small-data-manifesto</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 18 Jan 2026 11:01:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/bebe154a-73fe-49c1-9ed9-b501bb3727b0_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve built data systems 
across startups and enterprises, and I keep seeing the same mistake: teams build for problems they don&#8217;t have. The worst example I&#8217;ve seen is a startup with 50,000 daily users running a Kafka cluster capable of handling 10 million events per second. They spent $15,000 per month on infrastructure and another 40 hours per week on operational maintenance. Their actual data volume was about 2GB per day. A managed Postgres database with a cron job would have cost $60 per month and required minimal maintenance.</p><p>This happens everywhere. Teams choose Spark when DuckDB would suffice. They deploy Airflow when cron would work. They architect for scale they&#8217;ll never reach, then spend years paying the operational cost. This pattern repeats so consistently that I&#8217;ve developed a checklist to separate genuine requirements from resume-driven development. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Two diseases that plague modern data teams</strong></h2><p>The first disease is the future-proofing fallacy. Engineers convince themselves they&#8217;re building for tomorrow&#8217;s scale, but tomorrow never arrives. I once worked with a team that implemented a complex real-time streaming pipeline because &#8220;we might need low-latency analytics eventually.&#8221; Two years later, they were still running batch reports overnight. The streaming infrastructure sat there, consuming cloud credits and engineering time, solving a problem that existed only in planning documents.</p><p>The second disease is tool selection by conference talk. 
An engineer attends a presentation about how Netflix uses Flink, then returns to their 50-person company convinced they need the same architecture. I call this conference-driven development. The engineer forgets that Netflix processes petabytes of data daily, with hundreds of engineers. Their company processes gigabytes with three engineers. The scale mismatch renders the comparison meaningless.</p><p>Both diseases stem from the same root cause: engineers optimize for perceived sophistication rather than actual requirements. The cure is to ask better questions before writing any code. </p><div><hr></div><p>&#128075;&#127999;&#128075;&#127999;&#128075;&#127999; Welcome back to The Data Letter! Here are some recent articles you may have missed:</p><p><a href="https://hodmanmurad.substack.com/p/production-hell-of-ai-agents">Production Hell of AI Agents</a></p><p><a href="https://hodmanmurad.substack.com/p/ai-agent-starter-kit">AI Agent Starter Kit</a></p><p><a href="https://hodmanmurad.substack.com/p/diy-data-catalog-template">DIY Data Catalog Template</a> </p><div><hr></div><h2><strong>Five diagnostic questions for architecture decisions</strong></h2><p>Before choosing any data tool, I run through these five questions. They&#8217;ve saved me from countless overengineering disasters.</p><p><strong>Question 1: What problem am I solving right now, not hypothetically?</strong> Write down the specific pain point. If you can&#8217;t articulate it in one sentence without using &#8220;might&#8221; or &#8220;could,&#8221; you don&#8217;t have a clear requirement. Real problems are concrete: &#8220;The daily report takes 6 hours to run&#8221; or &#8220;Users wait 30 seconds for search results.&#8221; Hypothetical problems sound like: &#8220;We might need to scale to millions of users.&#8221;</p><p><strong>Question 2: What&#8217;s the actual data volume today, and what will it realistically be in 12 months?</strong> Measure in concrete units. 
If you&#8217;re processing 50GB daily now, honest growth projections rarely exceed 200GB in a year unless you&#8217;re experiencing explosive user acquisition. I&#8217;ve never seen a team accurately predict they&#8217;d 10x their data volume. Most teams 2x or 3x over years, not months.</p><p><strong>Question 3: Can I solve this with tools I already understand?</strong> Familiarity has enormous value. A tool you know well will always outperform a superior tool you barely understand. I&#8217;d rather maintain a slightly awkward Postgres solution than a theoretically elegant Kafka setup that requires constant Stack Overflow searches.</p><p><strong>Question 4: What&#8217;s the operational burden of this choice?</strong> Count the hours weekly. Will someone need to monitor dashboards? Investigate failures? Tune performance? Upgrade versions? If the answer exceeds 5 hours per week, you need strong justification, as operational costs compound while delivering zero feature value to users.</p><p><strong>Question 5: What happens if I&#8217;m wrong and need to migrate later?</strong> Most engineers fear this scenario excessively. Migration projects are common in data engineering. I&#8217;ve migrated from Postgres to distributed databases, from cron to Airflow, from batch to streaming. None took more than three months. Compare that to the years you&#8217;ll spend maintaining infrastructure you don&#8217;t need. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Choosing between dull and complex tools</strong></h2><p>Here&#8217;s how I evaluate tool pairs for common scenarios:</p><p><strong>Postgres vs. 
Kafka:</strong> Use Postgres for anything under 100GB daily or where hour-level latency is acceptable. Use Kafka only when you need sub-second latency across multiple consumers, or when the daily data volume exceeds 500 GB. The operational complexity difference between them is substantial. Postgres requires basic SQL knowledge. Kafka requires understanding partitions, consumer groups, retention policies, and cluster management.</p><p><strong>DuckDB vs. Spark:</strong> Use DuckDB for anything that fits in memory on a single machine, roughly 100GB of source data. Use Spark only when your data genuinely spans terabytes and requires distributed processing. I&#8217;ve seen DuckDB outperform badly configured Spark clusters on datasets under 500GB. The performance gap narrows as data grows, but the operational burden stays disproportionately high.</p><p><strong>Cron + SQL vs. workflow orchestration engines:</strong> Use cron for dependency graphs with fewer than 20 steps or when failures can wait hours for manual intervention. Use Airflow or Prefect when you have complex dependencies, need sophisticated retry logic, or require detailed execution history. I&#8217;ve maintained cron-based pipelines for years with zero issues. I&#8217;ve also maintained Airflow deployments that required weekly troubleshooting.</p><p>The pattern is consistent: simple tools excel when you operate below certain thresholds. 
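</p><p>These rules of thumb are concrete enough to write down as code. Here is a sketch of the thresholds above as an explicit, reviewable decision function; adjust the numbers to your own context:</p>

```python
def pick_store(daily_gb: float, needs_subsecond: bool, consumers: int) -> str:
    """Postgres vs. Kafka, using the thresholds above."""
    if (needs_subsecond and consumers > 1) or daily_gb > 500:
        return "kafka"
    return "postgres"

def pick_engine(source_gb: float) -> str:
    """DuckDB vs. Spark: stay on one machine until data outgrows it."""
    return "duckdb" if source_gb <= 100 else "spark"

def pick_orchestrator(steps: int, failures_can_wait: bool) -> str:
    """Cron vs. Airflow/Prefect."""
    return "cron" if steps < 20 or failures_can_wait else "airflow"

print(pick_store(daily_gb=2, needs_subsecond=False, consumers=1))  # postgres
```

<p>The payoff is that a new-tool proposal now has to argue with a diff to this function, not with a conference talk.</p><p>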
Complex tools become justified only at specific scale or complexity levels.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Mental model for tool selection</strong></h2><p>I visualize tool selection as a decision tree that starts with one question: Can this wait hours or days? If yes, proceed to batch processing territory. If no, advance to real-time considerations.</p><p>For batch processing, the next question becomes: Does this fit on one machine? If your data compresses to under 100GB, stay with single-machine tools like DuckDB or Postgres. If it exceeds that, advance to distributed processing tools.</p><p>For real-time requirements, ask: Do multiple systems need this data simultaneously? If no, consider whether a faster database query or materialized view solves the problem. If yes, you&#8217;ve arrived at legitimate streaming territory.</p><p>At each decision point, the answer &#8220;I&#8217;m not sure&#8221; defaults to the simpler option. You can always migrate up in complexity. Migrating down is harder but still possible.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>When complex tools are actually justified</strong></h2><p>I&#8217;m not arguing against sophisticated data infrastructure entirely. I can think of three scenarios that genuinely require complex tools:</p><p>First, when the data volume exceeds what one server can handle. 
If you&#8217;re processing multiple terabytes of data daily and a single Postgres instance can&#8217;t handle the load, even with optimization, distributed systems become necessary. But verify you&#8217;ve actually hit this limit through measurement, not assumption.</p><p>Second, when latency requirements demand it. If users expect sub-second responses on queries that scan billions of rows, or if downstream systems require millisecond-level data freshness, streaming architectures and specialized databases earn their complexity. Ensure these requirements come from user needs, not engineering preferences.</p><p>Third, when regulatory or business requirements mandate specific capabilities. Some industries require audit trails, point-in-time recovery, or cross-region replication that simpler tools can&#8217;t provide. These are external constraints that override technical preferences.</p><p>Notice that all three scenarios involve measurable thresholds or external requirements, not hypothetical future needs or architectural aesthetics.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Action plan for escaping overengineering</strong></h2><p>Start by auditing your current data infrastructure. For each component, answer the five diagnostic questions above. If you can&#8217;t justify a tool with concrete measurements, add it to your migration backlog.</p><p>Next, establish volume thresholds for tool selection. Write them down and share them with your team. Mine are: under 100GB daily on Postgres, under 1TB daily on single-machine processing, and under an hour of latency in batch territory.
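</p><p>Written as code, those thresholds look roughly like this. A minimal sketch: the function name and exact cutoffs are my illustration of the decision tree, not a universal rule.</p>

```python
# Sketch of the decision tree: latency question first, then volume.
# Cutoffs (100GB, 1TB, one hour) mirror the thresholds stated above;
# tune them to your team and infrastructure.
def pick_tool(daily_gb: float, latency_ok_hours: float, consumers: int) -> str:
    if latency_ok_hours < 1:                     # real-time territory
        if consumers > 1:
            return "streaming (e.g. Kafka)"      # many simultaneous readers
        return "fast query / materialized view"  # one reader: no streaming needed
    if daily_gb < 100:
        return "Postgres"                        # single machine, boring, fine
    if daily_gb < 1000:
        return "DuckDB (single machine)"         # still fits on one box
    return "distributed (e.g. Spark)"            # genuinely big data

print(pick_tool(daily_gb=40, latency_ok_hours=24, consumers=1))  # Postgres
```

<p>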
Your thresholds might differ based on team expertise and existing infrastructure.</p><p>Then implement a complexity budget. Each new tool requires approval from the full engineering team. The proposer must demonstrate that simpler alternatives are inadequate using actual measurements, not projections. I&#8217;ve seen this single practice eliminate 80% of overengineering proposals.</p><p>Finally, celebrate boring solutions. When someone solves a problem with cron and SQL instead of a new framework, recognize it explicitly. Engineering culture often rewards complexity over simplicity. Reversing this incentive structure takes deliberate effort but pays enormous dividends.</p><p>Most data problems are small data problems wearing big data costumes. Strip away the costume, and you&#8217;ll find that simple tools, wielded competently, solve the vast majority of real requirements.</p><div><hr></div><p>I&#8217;m currently documenting the build of Asaura AI, an AI personal assistant for people with ADHD and executive dysfunction. If you&#8217;re interested in watching a product get built from scratch (complete with user research, design decisions, and technical choices), I&#8217;m writing about it on my Asaura substack.</p><p>Three articles so far:</p><p><a href="https://asauraai.substack.com/p/executive-dysfunction-laziness">Executive Dysfunction &#8800; Laziness</a></p><p><a href="https://asauraai.substack.com/p/patterns-from-user-research">Patterns From User Research</a></p><p><a href="https://asauraai.substack.com/p/asauras-training-manual">Asaura&#8217;s Training Manual</a> </p><div><hr></div><p><strong>Support The Data Letter</strong></p><p>If you found the small data manifesto useful, consider becoming a paid subscriber. You&#8217;ll get access to deeper technical breakdowns, case studies from real projects, and the occasional rant about data engineering decisions that keep me up at night. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Become a Paid Subscriber&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Become a Paid Subscriber</span></a></p>]]></content:encoded></item><item><title><![CDATA[AI Agent Starter Kit]]></title><description><![CDATA[Reliable AI agents with built-in cost tracking and debugging infrastructure]]></description><link>https://www.thedataletter.com/p/ai-agent-starter-kit</link><guid isPermaLink="false">https://www.thedataletter.com/p/ai-agent-starter-kit</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Thu, 15 Jan 2026 13:19:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/de9f0ff8-25cd-495d-b2a6-41f2da6e5431_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Production AI agents encounter four critical failure modes: unpredictable costs that spiral out of control without warning, debugging nightmares where problems leave no trail, brittleness that turns minor errors into total breakdowns, and integration complexity that multiplies risk across every external API. These obstacles turn promising prototypes into operational liabilities.</p><p>Teams need an engineered response: a starter kit that embeds solutions for cost control, observability, and resilience directly into an agent&#8217;s architectural foundation. The power of this approach lies in deliberate structure and explicit data flow. By organizing an agent system around interconnected design patterns, each addressing a specific failure mode, we transform debugging from archeology to engineering.</p><p>The AI Agent Starter Kit creates an agent architecture where every decision, action, and resource consumption leaves a structured, queryable trace. 
When something goes wrong, and it will, the system provides the precise diagnostic data needed to identify root causes and implement fixes.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Design Patterns for Agent Resilience</strong></h2><p>Four foundational principles guide this architecture, each targeting a specific production failure mode.</p><p><strong>Cost-Awareness by Default</strong>: Every interaction with a language model must emit structured cost data before execution proceeds. This principle addresses the cost spiral problem by making resource consumption a first-class concern, tracked and logged at the same level as functional outputs. The architecture treats cost as an intrinsic state, never an afterthought.</p><p><strong>Observability-First Design</strong>: The system records detailed, machine-readable logs for every decision, API call, and action the agent takes. This requirement applies to every component without exception. This counters the debugging nightmare by ensuring diagnostic data is available before problems occur. Observability becomes part of the contract each component must fulfill.</p><p><strong>Explicit State Management</strong>: All task context, execution history, and intermediate results reside in a single, inspectable state object that flows through the system. This addresses brittleness by eliminating implicit assumptions and hidden dependencies. 
When the agent fails, its complete context remains available for analysis.</p><p><strong>Defensive Tooling Patterns</strong>: Every external integration point includes automatic input validation, output verification, and error logging before results reach the agent&#8217;s reasoning layer. This mitigates integration complexity by standardizing how the agent interacts with external systems and ensuring failures at API boundaries generate diagnostic data rather than cascading into the agent&#8217;s logic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NFGJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NFGJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NFGJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NFGJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!NFGJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bb977b-6b8b-402e-ad2c-a5903b69399f_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These four principles shape every component in the system. Understanding how they translate into concrete architectural patterns requires examining each layer in detail.</p>
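<p>A minimal Python sketch of how the four patterns might combine in a single tool-call wrapper. The field names, state shape, and cost figures here are illustrative assumptions, not a prescribed schema from the kit.</p>

```python
import time

def call_tool(state: dict, tool_name: str, tool_fn, payload: dict,
              est_cost_usd: float) -> dict:
    """Wrap an external call so it always leaves a structured, queryable trace."""
    record = {"ts": time.time(), "tool": tool_name,
              "payload": payload, "est_cost_usd": est_cost_usd}
    state.setdefault("trace", []).append(record)                     # observability-first
    state["spend_usd"] = state.get("spend_usd", 0.0) + est_cost_usd  # cost-aware by default
    try:
        record["result"] = tool_fn(payload)                          # external boundary
        record["ok"] = True
    except Exception as exc:                                         # defensive tooling:
        record["ok"], record["error"] = False, repr(exc)             # log, don't cascade
        record["result"] = None
    state["last_result"] = record["result"]                          # explicit state
    return state

state = call_tool({}, "lookup", lambda p: p["q"].upper(), {"q": "order-42"}, 0.002)
```

<p>Every call, successful or not, appends one record to <code>state["trace"]</code>, so a post-mortem becomes a query over data rather than a dig through free-text logs.</p>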
      <p>
          <a href="https://www.thedataletter.com/p/ai-agent-starter-kit">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Production Hell of AI Agents]]></title><description><![CDATA[Welcome to the Grind]]></description><link>https://www.thedataletter.com/p/production-hell-of-ai-agents</link><guid isPermaLink="false">https://www.thedataletter.com/p/production-hell-of-ai-agents</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Sun, 11 Jan 2026 10:58:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2d14d590-5383-4b8a-9f7b-8ababf135d2a_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every promising technology goes through a phase where the gap between prototype and production becomes brutally apparent. For agentic AI, that moment arrived in 2025. The models work. The reasoning capabilities exist. The demos prove feasibility.</p><p>What doesn&#8217;t exist yet is the operational discipline to make agents reliable at scale. Teams are discovering that building an agent is straightforward. Making it survive contact with real users, real data, and real business constraints is an entirely different engineering challenge.</p><p>Welcome to production hell.</p><p>This isn't a failure of ambition or technology. It exposes the chasm between demo-stage potential and production-stage engineering requirements. Every team building agentic AI systems encounters this descent: a series of compounding technical, operational, and economic challenges that transform a promising prototype into a reliability nightmare. According to recent surveys, 57% of organizations have agents in production, yet quality remains the top barrier, with 32% citing it as their primary challenge.</p><p>Survival requires building a new discipline: agent operations, not waiting for the next model release. 
Teams will build it through pragmatic design, rigorous observability, and the wisdom to know when a deterministic workflow beats an autonomous agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EghP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EghP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!EghP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!EghP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!EghP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EghP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png" width="1408" height="768"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EghP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!EghP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!EghP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!EghP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b3e63c-3236-425c-a024-8d7125aa4d4e_1408x768.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Descending Through Agent Hell: Where Production Dreams Die</strong></h2><p>Building production-ready AI agents involves overcoming different layers of difficulty, with each layer building on the previous one. Each demands engineering solutions that don&#8217;t yet exist in standardized form.</p><h3><strong>Spiraling AI Agent Costs: When LLM Expenses Devour Your Budget</strong></h3><p>A single GPT-4 call costs fractions of a penny. Then you chain six calls together: planning, tool selection, execution, validation, error recovery, and final synthesis. Now you&#8217;re at several cents per task attempt. 
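</p><p>Back-of-the-envelope, with illustrative per-step token counts (my assumptions, not measured figures):</p>

```python
# Hypothetical token budget for one task attempt across the six-call chain.
PRICE_PER_1K_TOKENS = 0.01   # USD; GPT-4-Turbo-class input pricing, approximate
CHAIN_TOKENS = {
    "planning": 1200, "tool_selection": 600, "execution": 900,
    "validation": 500, "error_recovery": 700, "synthesis": 1100,
}
cost_per_attempt = sum(CHAIN_TOKENS.values()) / 1000 * PRICE_PER_1K_TOKENS
print(f"${cost_per_attempt:.2f} per attempt")  # $0.05: "several cents"
```

<p>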
Multiply by failed attempts. Multiply by the debugging runs that never make it to production.</p><p>The economics collapse fast. Production agents orchestrate complex workflows: state management, validation logic, retry mechanisms, and logging systems. Each piece adds cost. Most of these pieces serve as operational overhead to make LLMs semi-reliable rather than LLM calls themselves.</p><p>According to industry data, GPT-4 Turbo costs $0.01&#8211;$0.03 per 1,000 tokens, and complex agents can burn 5&#8211;10 million tokens monthly. For mid-sized deployments, LLM operational costs range from $1,000 to $5,000 per month. Long context windows and chained reasoning multiply costs exponentially. One e-commerce brand saw token usage spike 300% after enabling order-tracking workflows, pushing monthly costs from $1,200 to $4,800.</p><p>Research from 2025 shows that 80% of enterprises underestimate their AI infrastructure costs by more than 25%. This is why agent projects stall: the unit economics don&#8217;t work until reliability crosses 90%, and that threshold remains elusive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tc4f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tc4f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!tc4f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!tc4f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tc4f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tc4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tc4f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!tc4f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!tc4f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!tc4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda3476-f714-4a24-8e41-938a2f449f51_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Debugging AI Agents: Tracing Failures Through Stochastic Fog</strong></h3><p>Traditional software fails the same way every time. Same input, same bug, every time. You set a breakpoint, step through the code, find the bad line, and fix it.</p><p>Agents behave differently.</p><p>Run the same prompt twice, and get two different reasoning paths. One succeeds; the other halts with a malformed JSON response. Which step broke? Was it the prompt? The model&#8217;s sampling randomness? An edge case in your validation logic? A transient API timeout that corrupted the state three steps ago?</p><p>You have no breakpoints. You have logs: thousands of lines of LLM outputs that read like stream-of-consciousness reasoning, peppered with tool calls and intermediate results. Somewhere in that morass, the agent decided to skip a necessary validation step. Why? The model doesn&#8217;t tell you. The logs show it happened, not why.</p><p>The community&#8217;s recent enthusiasm for models like DeepSeek R1 reflects this pain. DeepSeek R1&#8217;s explicit reasoning traces provide unprecedented transparency, exposing the model&#8217;s decision process through structured sequences that show problem definition, iterative refinement, and final solution articulation. Engineers want visibility into <em>why</em> the agent chose path A over path B, <em>why</em> it hallucinated a parameter value, and <em>why</em> it decided the task was complete when it clearly wasn&#8217;t.</p><p>Better reasoning traces help diagnosis, yet they don&#8217;t solve the core operational problem. Even with complete visibility, you&#8217;re still debugging a stochastic system. 
The fix for one failure pattern might introduce a new one. Prompt engineering becomes a game of whack-a-mole: patch the issue where the agent misinterprets datetime formats, only to discover it now fails on timezone conversions.</p><p>Research shows that DeepSeek R1&#8217;s transparent reasoning makes models vulnerable to safety threats and jailbreak attacks precisely because detailed traces expose the full decision-making process. The transparency that aids debugging also increases the attack surface.</p><p>You can&#8217;t write unit tests for &#8220;the agent should reason correctly.&#8221; You can write integration tests for specific scenarios, watch them pass, and hope the model generalizes. When it doesn&#8217;t, you&#8217;re back in the logs, searching for patterns in randomness.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sULm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sULm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!sULm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!sULm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sULm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sULm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sULm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!sULm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!sULm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sULm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c32360-e3b8-4745-82a0-7f9ce2d2c089_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe 
now</span></a></p><h3><strong>Hitting AI Agent Reliability Ceilings: When Brittleness Wins</strong></h3><p>You&#8217;ve tuned your prompts. You&#8217;ve added validation. You&#8217;ve implemented retry logic. Your agent now succeeds 75% of the time. Then you plateau.</p><p>No amount of prompt engineering gets you past 80%. The remaining failures are edge cases: ambiguous inputs, unexpected API responses, scenarios that require contextual judgment the model simply doesn&#8217;t have. You&#8217;re stuck in the reliability gap: good enough to be tantalizing, too unreliable to ship.</p><p>In demos, teams rarely mention that a simple decision tree or state machine often beats an autonomous agent on reliability, cost, debuggability, and predictability.</p><p>A decision tree with 15 branches costs nothing to run and succeeds 99% of the time. Your clever agent costs $0.30 per attempt and succeeds 78% of the time. Which one ships?</p><p>Real-world data confirms this pattern. UC Berkeley found that reliability remains the top development challenge for production agents. Rather than develop technical innovations, developers dial down their ambitions and adopt simpler methods. Most use off-the-shelf models with no fine-tuning and hand-tuned prompts. Agents have short run-times, with 68% executing fewer than 10 steps before requiring human intervention.</p><p>Testing of autonomous agents like Devin revealed stark limitations: while they excel at isolated tasks like API integrations, they achieve only 3 successes out of 20 end-to-end tasks. Simpler, developer-driven workflows using tools like Cursor avoid many issues encountered with autonomous agents.</p><p>A 5% error rate might be acceptable for a chatbot, but it becomes a massive problem for agents that place orders, update databases, or make automated decisions. One corrupted database entry can shut down operations. 
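The arithmetic here is worth doing explicitly. A back-of-envelope sketch of cost per successful task, using the illustrative numbers above and assuming failed attempts are simply retried:

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per successful task when failed attempts are retried.

    With independent attempts, the expected number of attempts per success
    is 1 / success_rate, so the expected cost is cost_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# Illustrative numbers from this section: a 15-branch decision tree vs. an agent.
tree = cost_per_success(0.0, 0.99)    # effectively free to run, 99% reliable
agent = cost_per_success(0.30, 0.78)  # $0.30 per attempt, 78% reliable

print(f"decision tree: ${tree:.2f} per success")
print(f"agent:         ${agent:.2f} per success")
```

At $0.30 per attempt and a 78% success rate, every completed task effectively costs about $0.38, before counting the engineering time spent debugging the other 22%.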
The brittleness ceiling forces a brutal reckoning: maybe autonomous agents aren&#8217;t the right architecture for this problem. Maybe hybrid approaches combining deterministic workflows with LLM-powered flexibility at specific decision points deliver better outcomes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YJKJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YJKJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YJKJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YJKJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YJKJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497ae542-b584-4867-9d05-e6d3f93864fc_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Integration Labyrinth: Connecting AI Agents to Real-World Systems</strong></h3><p>While you&#8217;ve been wrestling with prompt reliability, your agent still needs to talk to the outside world. Every real production agent requires integration with existing systems: CRMs, databases, payment processors, internal APIs, and third party services. 
This is where demo magic dies in a pile of authentication flows and schema mismatches.</p><p>Your demo agent &#8220;sends an email.&#8221; In production, that means:</p><ul><li><p>OAuth 2.0 flow with token refresh logic</p></li><li><p>Rate limiting that varies by provider</p></li><li><p>Handling bounces, spam filters, and delivery failures</p></li><li><p>Parsing varied response formats (JSON, XML, GraphQL)</p></li><li><p>Managing API versioning when providers change schemas</p></li><li><p>Timeouts that corrupt the multi-step agent state</p></li><li><p>Webhook verification for async confirmations</p></li><li><p>Error codes that mean different things across different APIs</p></li></ul><p>Each integration multiplies the failure modes. Your agent might execute flawless reasoning yet hit errors because the calendar API returned a 429 rate limit, the CRM schema changed and broke your field mapping, or the payment processor requires 3D Secure authentication that your agent can&#8217;t handle.</p><p>Industry practitioners confirm this is the hardest part of production deployment. One analysis found that the most challenging part of building a production-ready agent is the &#8220;body&#8221; layer: secure authentication with third-party applications, credential management for thousands of users, and reliable execution via well-formed API calls. This is the time-consuming plumbing that separates a clever prototype from a scalable product.</p><p>Authentication complexity alone creates massive overhead. AI agents require robust, automated, and cryptographically secure authentication rather than traditional human-centric methods. Best practices include using short-lived certificates from trusted PKIs, using hardware security modules (HSMs) to store keys, and workload identity federation to tie agent identities directly to organizational infrastructure. 
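To make the credential piece concrete, here is a minimal sketch of automatic rotation: refresh before expiry rather than on failure. `fetch_token` stands in for your identity provider's client, and both the names and the five-minute margin are illustrative, not from any particular SDK.

```python
import time

class TokenManager:
    """Keep a short-lived credential fresh by refreshing it before it expires.

    fetch_token must return (token, expires_at_unix). The refresh margin
    avoids handing out a token that will expire mid-request.
    """

    def __init__(self, fetch_token, refresh_margin_s: float = 300.0):
        self._fetch = fetch_token
        self._margin = refresh_margin_s
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh lazily: on first use, or when inside the expiry margin.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Every API client in the integration layer then calls `tm.get()` instead of caching a token itself, which is what makes the 24-72 hour rotation window enforceable in one place.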
Token rotation must happen automatically every 24-72 hours to maintain security.</p><p>Engineers spend weeks building resilient connectors, implementing exponential backoff, and writing parsers for inconsistent responses. None of this shows up in demos. All of it determines whether your agent survives contact with production.</p><p>The integration layer becomes your agent&#8217;s largest codebase, with more lines of defensive error handling than actual agent logic. You realize you&#8217;re building a traditional distributed system that happens to have an LLM in the middle, with all the operational complexity that entails.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Voices from the Production Trenches: Community Sentiment on AI Agent Challenges</strong></h2><p>The disillusionment is widespread and well-documented. Across engineering communities, two narratives emerge repeatedly:</p><p><strong>Engineers Pursuing Solutions:</strong> 89% of organizations have implemented some form of observability for their agents, and 62% have detailed tracing that allows them to inspect individual agent steps. Without visibility into how an agent reasons and acts, teams can&#8217;t reliably debug failures, optimize performance, or build trust with stakeholders.</p><p>The enthusiasm for transparent reasoning models like DeepSeek R1 reveals how starved teams are for debuggability. Engineers know the problems are solvable; they need the right instrumentation.</p><p><strong>Business Anxiety</strong>: Organizations face significant obstacles in translating agentic pilots into deployable solutions. Even among enterprises with agents in production, maturity remains low. 
Only 5% of engineering leaders cited accurate tool calling as a major challenge, suggesting most production systems focus on surface-level behavior rather than deeper reasoning.</p><p>The gap between promised business value and delivered reliability creates existential pressure. According to MMC Ventures data, 42% of organizations deployed at least some agents in Q3 2025, up from 11% two quarters prior, yet 68% of employees interact with agents in fewer than half their workflows. Higher accuracy correlates with lower production autonomy. Healthcare founders report 90% accuracy rates but admit this isn&#8217;t sufficient to remove human oversight.</p><p>Teams are frustrated, not defeated. They&#8217;re sharing workarounds, comparing architectures, and debating hybrid approaches. Production hell is painful, but it&#8217;s where real engineering disciplines are forged.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Standardizing Agent Infrastructure: AAIF&#8217;s Emerging Role</strong></h2><p>While teams struggle with production challenges, the industry is standardizing the infrastructure layer that could ease the pain. In December 2025, the Linux Foundation announced the Agentic AI Foundation (AAIF), bringing together Anthropic&#8217;s Model Context Protocol (MCP), Block&#8217;s goose, and OpenAI&#8217;s AGENTS.md under neutral governance.</p><p>This represents a strategic recognition: agent execution infrastructure, not model capabilities, currently blocks progress. MCP alone went from an internal Anthropic project to an industry standard in 12 months, with over 10,000 published servers now connecting agents to tools, data, and applications. 
The protocol has been adopted by Claude, Cursor, Microsoft Copilot, Gemini, VS Code, and ChatGPT.</p><p>The formation of AAIF, backed by platinum members including AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI, signals that the unstructured experimentation phase is coming to an end.</p><p>For engineers in production hell, this standardization offers a path forward. Rather than building custom integration layers for every external system, teams can build on MCP&#8217;s universal connection standard. Rather than inventing project-specific agent guidance, they can adopt AGENTS.md&#8217;s Markdown convention, already used by over 60,000 open-source projects.</p><p>Teams now face a different question: &#8220;How do we make this work reliably within established protocols?&#8221; That&#8217;s still hard, but it&#8217;s engineering hard, not research hard.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mBEO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mBEO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mBEO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mBEO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 
1272w, https://substackcdn.com/image/fetch/$s_!mBEO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mBEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mBEO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mBEO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mBEO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mBEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4142a421-ded8-46ed-8561-b8ada2777299_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe 
now</span></a></p><h2><strong>Survival Guide: Engineering Your Way Out of AI Agent Production Hell</strong></h2><p>Production hell ends when you stop fighting the fundamental nature of LLMs and start building systems that account for their limitations. Here&#8217;s how teams are escaping:</p><h3><strong>Prioritize Deterministic Workflows Over Autonomous AI Agents</strong></h3><p>Start with the hardest question: Does this problem actually need an agent? Map your workflow. Identify which steps truly require flexible reasoning. You&#8217;ll often find 80% of your logic is deterministic and well-suited for state machines or decision trees.</p><p>Build that deterministic skeleton first. Use LLMs only for genuinely ambiguous decision points: parsing unstructured user input, choosing between semantically similar options, and generating natural language responses. This hybrid architecture delivers:</p><ul><li><p>Predictable behavior for most cases</p></li><li><p>Debuggable logic flows</p></li><li><p>Cost control (LLM calls only where needed)</p></li><li><p>Clear failure boundaries</p></li></ul><p>Production data validates this approach. Analysis shows that prompt and sequence lengths are steadily growing for programming use cases, while all other categories remain stagnant. Outside of coding, agent builders are keeping their agents simple and short to achieve reliability. 
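The deterministic-skeleton idea can be sketched in a few lines: a plain state machine owns every transition, and the model is consulted at exactly one genuinely ambiguous step. `classify_intent` is a stand-in for a real model call; all names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    text: str
    state: str = "received"
    history: list = field(default_factory=list)

def classify_intent(text: str) -> str:
    """The one LLM decision point: map free text to a known intent.

    Stubbed here; in a real system this would call your model provider,
    constrained to a fixed label set.
    """
    return "refund" if "refund" in text.lower() else "question"

def advance(ticket: Ticket) -> Ticket:
    """Deterministic skeleton: every transition except intent parsing is fixed."""
    transitions = {
        "received":   lambda t: ("classified", classify_intent(t.text)),
        "classified": lambda t: ("routed", f"queue:{t.history[-1]}"),
        "routed":     lambda t: ("done", "handed to queue worker"),
    }
    if ticket.state not in transitions:
        raise ValueError(f"no transition from state {ticket.state!r}")
    new_state, note = transitions[ticket.state](ticket)
    ticket.state = new_state
    ticket.history.append(note)
    return ticket
```

Everything except `classify_intent` is debuggable with ordinary tools, costs nothing per run, and fails loudly at a known boundary instead of drifting.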
These simpler agents are the ones reaching production.</p><p>An agent that&#8217;s 40% state machine and 60% LLM-powered flexibility often outperforms a 100% autonomous agent on every production metric.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Design AI Agents for Human Oversight from Day One</strong></h3><p>Autonomous agents are a goal, not a starting point. Systems built for production assume human intervention and design for it gracefully. Research confirms that 92.5% of in-production agents deliver their output to humans, not to other software or agents. Chatbot UX dominates because it keeps a human in the loop.</p><p>Key strategies include:</p><ul><li><p><strong>Confidence thresholds</strong>: When the agent&#8217;s certainty drops below a threshold, escalate to human review before proceeding</p></li><li><p><strong>Preview and confirm</strong>: Show users the agent&#8217;s planned actions before execution, especially for high-stakes operations</p></li><li><p><strong>Intervention points</strong>: Build explicit handoff mechanisms where humans can step in, correct course, and hand back to the agent</p></li><li><p><strong>Audit trails</strong>: Log every decision and action in a human-readable format for post-hoc review</p></li></ul><p>Companies emphasize &#8220;co-pilot positioning&#8221; even when full autonomy is technically possible. They discovered employees either overrely or underrely on outputs, never achieving optimal collaboration. 
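A confidence threshold with a high-stakes override is only a few lines once the agent reports a score alongside each proposed action. A minimal sketch; the 0.9 threshold and field names are illustrative choices, not a standard:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    confidence: float  # model's self-reported or calibrated score, 0..1

def execute_or_escalate(decision: Decision, *, threshold: float = 0.9,
                        high_stakes: bool = False) -> dict:
    """Route a proposed action: auto-execute, or escalate to a human.

    High-stakes operations always get a human preview-and-confirm step,
    regardless of how confident the model claims to be.
    """
    if high_stakes or decision.confidence < threshold:
        return {"route": "human_review", "action": decision.action,
                "reason": "high_stakes" if high_stakes else "low_confidence"}
    return {"route": "auto_execute", "action": decision.action}
```

The returned record doubles as an audit-trail entry: every routing decision is explicit and logged, which is what makes the "mean time to human intervention" metric measurable at all.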
Winners aren&#8217;t building autonomous systems; they&#8217;re building narrow, high-frequency task executors with human oversight.</p><p>Measure &#8220;mean time to human intervention&#8221; as a key metric. You want this number trending down over time, but you never want it to reach zero. That signals you&#8217;ve eliminated necessary oversight.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Build AI Agent Observability Before Scaling Logic</strong></h3><p>You can&#8217;t fix what you can&#8217;t see. Before adding more sophisticated agent behaviors, instrument everything. The importance of this cannot be overstated: 89% of organizations with agents in production have implemented observability, and adoption is even higher (94%) among those with mature deployments.</p><p>Essential observability components:</p><ul><li><p><strong>Structured logging</strong>: Every tool call, every reasoning step, every state transition gets logged with context</p></li><li><p><strong>Distributed tracing</strong>: Track requests across the entire agent execution path, including external API calls</p></li><li><p><strong>Evaluation frameworks</strong>: Build automated tests that run your agent against a growing suite of real-world scenarios, not just happy paths</p></li><li><p><strong>Failure classification</strong>: Tag and categorize every failure mode so you can identify patterns</p></li></ul><p>For production agents, teams typically track a mix of quality and performance metrics, including accuracy, task completion rate, latency, error rate, and resource usage. 
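As a sketch of the structured-logging component: wrap every tool call so each step emits one JSON record carrying a shared trace id, whether the call succeeded or failed. The JSON-lines shape and field names are one reasonable choice, not a standard:

```python
import json
import time
import uuid

def logged_tool_call(log: list, trace_id: str, tool_name: str, fn, **kwargs):
    """Run one tool call, appending a structured JSON record of it to `log`."""
    record = {"trace_id": trace_id, "tool": tool_name, "args": kwargs,
              "ts": time.time()}
    try:
        record["result"] = fn(**kwargs)
        record["status"] = "ok"
        return record["result"]
    except Exception as exc:
        record["status"] = "error"
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise  # surface the failure; the record is still written below
    finally:
        # One JSON object per line: easy to grep, tail, and ship to a log store.
        log.append(json.dumps(record))

# Usage: one trace id ties together every step of a single agent run.
log: list = []
trace = str(uuid.uuid4())
total = logged_tool_call(log, trace, "add", lambda a, b: a + b, a=2, b=3)
```

Failure classification then becomes a query over the `status` and `error` fields rather than an archaeology project through free-form print statements.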
For many customer-facing flows, teams aim for high-90s accuracy on core tasks, completion rates above 90%, sub-second response times for simple interactions, and low single-digit error rates.</p><p>Teams that build observability first can iterate rapidly. Teams that bolt it on later spend months retrofitting logging into complex agent logic.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Measure AI Agent Operational Metrics, Not Just Model Performance</strong></h3><p>Forget about vibes. Track metrics that connect to business outcomes:</p><ul><li><p><strong>Cost per successful task</strong>: Total spend (LLM + infrastructure) divided by tasks completed successfully</p></li><li><p><strong>Success rate by task complexity</strong>: Segment simple vs. complex tasks to understand where brittleness lives</p></li><li><p><strong>Retry rate</strong>: How often does the agent need multiple attempts to complete a task?</p></li><li><p><strong>Human intervention rate</strong>: What percentage of tasks require human oversight or correction?</p></li><li><p><strong>Time to resolution</strong>: How long does the full task take, including retries and interventions?</p></li></ul><p>These metrics tell you whether your agent is getting more reliable and cost-effective over time. They also give you the data to make architectural decisions. Research shows that enterprises face $50,000 to $200,000 in integration costs and take 3-6 months to deploy production agents. 
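All five metrics fall out of one pass over per-task records. A sketch, with illustrative field names:

```python
def operational_metrics(tasks: list) -> dict:
    """Aggregate per-task records into the operational metrics listed above.

    Each task is a dict like: {"success": bool, "cost": float, "attempts": int,
    "human_intervened": bool, "seconds": float}. Field names are illustrative.
    """
    n = len(tasks)
    successes = [t for t in tasks if t["success"]]
    total_cost = sum(t["cost"] for t in tasks)
    return {
        "success_rate": len(successes) / n,
        # Failed attempts still cost money, so divide total spend by successes.
        "cost_per_successful_task": total_cost / max(len(successes), 1),
        "retry_rate": sum(1 for t in tasks if t["attempts"] > 1) / n,
        "human_intervention_rate": sum(t["human_intervened"] for t in tasks) / n,
        "time_to_resolution_mean_s": sum(t["seconds"] for t in tasks) / n,
    }
```

Segmenting the same aggregation by task complexity (a `complexity` field on each record) is what reveals where the brittleness actually lives.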
Without clear operational metrics, it&#8217;s impossible to determine if this investment is worthwhile.</p><p>One analysis found that startups claiming broad autonomy inevitably pivot to narrow, high-accuracy verticals. Infrastructure costs force model selection trade-offs. High-regulation industries lock into 90% accuracy, 40% autonomy configurations. Others optimize for 70/70 configurations with lower costs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PgWI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PgWI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PgWI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PgWI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!PgWI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49cc61c3-3b64-47b0-94fd-74c82fa06c46_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Production Hell Forges Better Engineering Practices for AI Agents</strong></h2><p>Production hell feels like failure. Demos worked. Prototypes impressed stakeholders. Then reality intervened.</p><p>But this phase matters more than the demo ever did.</p><p>Right now, engineers worldwide are learning which agent architectures survive contact with production. 
They&#8217;re discovering that hybrid workflows beat pure autonomy. They&#8217;re building observability tools that make stochastic systems debuggable. They&#8217;re developing operational metrics that connect AI capabilities to business value. They&#8217;re establishing patterns for human-agent collaboration that acknowledge the limitations of current models while leveraging their strengths.</p><p>This is how new engineering disciplines are born. Not from triumphant product launches, but from the grinding work of making unstable technology stable. The teams enduring production hell today are writing the playbooks others will follow tomorrow.</p><p>The defining story of 2025 was not which models topped benchmarks, but which organizations successfully moved from experimentation to scaled production. Three barriers consistently prevented pilots from reaching production: reliability requirements (a 5% error rate becomes massive for agents making automated decisions), integration complexity (integrating with Oracle, Salesforce, legacy databases, security protocols, and compliance requirements often exceeded expected value), and cost-benefit analysis (proving ROI at scale).</p><p>The next generation of agentic applications won&#8217;t emerge from better models alone, though better models will help. They&#8217;ll emerge from the operational practices being forged right now: knowing when to use agents and when to use state machines, building for human oversight rather than pure autonomy, and measuring operational reliability rather than demo polish.</p><p>Production hell reveals that polished demos and deployable systems require fundamentally different engineering approaches.</p><p>Your agent will fail. The question is whether you&#8217;ve built systems to understand why, fix it, and prevent it from happening again. 
That capability, not model performance, determines whether you ship or stall.</p><div><hr></div><p><em><strong>Stay ahead of the shift in agent infrastructure.</strong></em></p><p><em>Subscribe for in-depth analysis of production AI systems, protocol standards, and the teams building reliable agents at scale.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thedataletter.com/subscribe&quot;,&quot;text&quot;:&quot;Become a paid subscriber&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thedataletter.com/subscribe"><span>Become a paid subscriber</span></a></p><div><hr></div><h2><strong>References</strong></h2><p>LangChain. &#8220;State of AI Agent Engineering 2025.&#8221; November-December 2025.<a href="https://www.langchain.com/state-of-agent-engineering"> https://www.langchain.com/state-of-agent-engineering</a></p><p>AgentiveAIQ. &#8220;AI Agent Cost Per Month 2025: Real Pricing Revealed.&#8221; August 2025.<a href="https://agentiveaiq.com/blog/how-much-does-ai-cost-per-month-real-pricing-revealed"> https://agentiveaiq.com/blog/how-much-does-ai-cost-per-month-real-pricing-revealed</a></p><p>Greenice. &#8220;AI Agent Development Cost 2025: Expectation vs Reality.&#8221; October 2025.<a href="https://greenice.net/ai-agent-development-cost/"> https://greenice.net/ai-agent-development-cost/</a></p><p>Emergent Mind. &#8220;DeepSeek-R1 Reasoning Traces.&#8221; 2025.<a href="https://www.emergentmind.com/topics/deepseek-r1-reasoning-traces"> https://www.emergentmind.com/topics/deepseek-r1-reasoning-traces</a></p><p>Drew Breunig. &#8220;Enterprise Agents Have a Reliability Problem.&#8221; December 2025. Analysis of UC Berkeley&#8217;s MAP research and other 2025 agent studies.<a href="https://www.dbreunig.com/2025/12/06/the-state-of-agents.html"> https://www.dbreunig.com/2025/12/06/the-state-of-agents.html</a></p><p>Carl Rannaberg. 
&#8220;State of AI Agents in 2025: A Technical Analysis.&#8221; Medium, January 2025.<a href="https://carlrannaberg.medium.com/state-of-ai-agents-in-2025-5f11444a5c78"> https://carlrannaberg.medium.com/state-of-ai-agents-in-2025-5f11444a5c78</a></p><p>Arion Research. &#8220;The State of Agentic AI in 2025: A Year-End Reality Check.&#8221; December 2025.<a href="https://www.arionresearch.com/blog/the-state-of-agentic-ai-in-2025-a-year-end-reality-check"> https://www.arionresearch.com/blog/the-state-of-agentic-ai-in-2025-a-year-end-reality-check</a></p><p>Composio. &#8220;The 2026 Guide to AI Agent Builders (And Why They All Need an Action Layer).&#8221; December 2025.<a href="https://composio.dev/blog/best-ai-agent-builders-and-integrations"> https://composio.dev/blog/best-ai-agent-builders-and-integrations</a></p><p>Obsidian Security. &#8220;Security for AI Agents: Protecting Intelligent Systems in 2025.&#8221; November 2025.<a href="https://www.obsidiansecurity.com/blog/security-for-ai-agents"> https://www.obsidiansecurity.com/blog/security-for-ai-agents</a></p><p>Obsidian Security. &#8220;The 2025 AI Agent Security Landscape: Players, Trends, and Risks.&#8221; November 2025.<a href="https://www.obsidiansecurity.com/blog/ai-agent-market-landscape"> https://www.obsidiansecurity.com/blog/ai-agent-market-landscape</a></p><p>Local AI Zone. &#8220;DeepSeek AI Models 2025: Revolutionary Reasoning AI for Education &amp; Research.&#8221; October 2025.<a href="https://local-ai-zone.github.io/brands/deepseek-ai-coding-expert-guide-2025.html"> https://local-ai-zone.github.io/brands/deepseek-ai-coding-expert-guide-2025.html</a></p><p>Machine Learning Mastery. &#8220;7 Agentic AI Trends to Watch in 2026.&#8221; January 2026.<a href="https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026/"> https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026/</a></p><p>Cleanlab. 
&#8220;AI Agents in Production 2025: Enterprise Trends and Best Practices.&#8221; August 2025.<a href="https://cleanlab.ai/ai-agents-in-production-2025/"> https://cleanlab.ai/ai-agents-in-production-2025/</a></p><p>TechUpkeep. &#8220;The Uncomfortable Truth About AI Agents: 90% Claim Victory While 10% Achieve Adoption.&#8221; Analysis of MMC Ventures data. November 2025.<a href="https://www.techupkeep.dev/blog/state-of-agentic-ai-2025"> https://www.techupkeep.dev/blog/state-of-agentic-ai-2025</a></p><p>Maxim AI. &#8220;Understanding AI Agent Reliability: Best Practices for Preventing Drift in Production Systems.&#8221; November 2025.<a href="https://www.getmaxim.ai/articles/understanding-ai-agent-reliability-best-practices-for-preventing-drift-in-production-systems/"> https://www.getmaxim.ai/articles/understanding-ai-agent-reliability-best-practices-for-preventing-drift-in-production-systems/</a></p><p>Thammineni, Prasad. &#8220;The Complete Guide to AI Agent Pricing Models in 2025.&#8221; Agentman, Medium. January 2025.<a href="https://medium.com/agentman/the-complete-guide-to-ai-agent-pricing-models-in-2025-ff65501b2802"> https://medium.com/agentman/the-complete-guide-to-ai-agent-pricing-models-in-2025-ff65501b2802</a></p><p>The Linux Foundation. &#8220;Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF), Anchored by New Project Contributions Including Model Context Protocol (MCP), goose and AGENTS.md.&#8221; December 9, 2025. 
<a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation</a></p>]]></content:encoded></item><item><title><![CDATA[DIY Data Catalog Template]]></title><description><![CDATA[Implementing Scalable Metadata Management Without Vendor Lock-In]]></description><link>https://www.thedataletter.com/p/diy-data-catalog-template</link><guid isPermaLink="false">https://www.thedataletter.com/p/diy-data-catalog-template</guid><dc:creator><![CDATA[Hodman Murad]]></dc:creator><pubDate>Wed, 07 Jan 2026 09:02:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c82c390e-4bc4-4946-8fef-f9a314d9d5a3_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The three-pillar framework needs validation at a manageable scale before teams invest in industrial-strength infrastructure.</p><p>I&#8217;ve encountered the same story with enterprise data catalogs multiple times. Organizations invest six figures in vendor platforms, assign implementation teams, conduct training sessions, and watch adoption collapse within months. Engineers return to Slack channels and tribal knowledge because the catalog provides worse answers than asking a colleague.</p><p>The failure pattern stems from misdiagnosis. Teams treat catalog implementation as a technology problem that requires vendor solutions, when metadata management fundamentally requires process change and behavioral integration. 
Technology alone cannot fix missing workflows, undefined ownership, or unclear documentation standards.</p><p>The three-pillar framework introduced in my <a href="https://www.thedataletter.com/p/data-catalog-implementation">Data Catalog Implementation</a> article addresses root causes: Decoupled Architecture separates concerns that scale differently, Enforced Contracts make metadata non-optional through CI/CD integration, and Ruthless Curation focuses effort where business value concentrates. That framework explains why traditional approaches fail at 60,000+ tables.</p><p>Experienced practitioners choose DIY approaches for specific reasons. They need metadata validation integrated into existing CI/CD pipelines, not separate catalog interfaces. They require flexibility in the storage layer as infrastructure evolves. They want ownership of metadata schemas without vendor upgrade cycles forcing breaking changes. Most importantly, they recognize metadata management as a team capability issue that requires cultural change, not a technology gap that calls for vendor solutions.</p><p>The implementation pattern described here embodies all three pillars of the framework. Decoupled architecture separates metadata ingestion from storage and presentation. Enforced contracts make validation mandatory in deployment workflows. Ruthless curation implements tier-based documentation requirements that scale to production environments.</p><p>Below you&#8217;ll find complete, downloadable components for a working implementation: template structures, contract schemas, validation patterns, and ready-to-use automation scripts. I&#8217;ve tested these components locally; the full setup takes under two hours.</p><p>After the paywall: I&#8217;ll show you how to test everything locally, set up manual Google Sheets integration, and discuss when and why to consider migration paths or advanced patterns. These are production-tested components that work today.</p>
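<p><em>As one hedged illustration of the Enforced Contracts pillar, a CI step can refuse to deploy when a table&#8217;s metadata is missing its tier&#8217;s required fields. The tier names and required fields below are assumptions for the sketch, not the article&#8217;s actual schema.</em></p>

```python
import json
import sys

# Tier-based requirements: ruthless curation means the documentation
# burden shrinks as table value drops. (Illustrative field names.)
REQUIRED_FIELDS = {
    "tier1": {"owner", "description", "refresh_schedule", "pii_flag"},
    "tier2": {"owner", "description"},
    "tier3": {"owner"},
}

def validate_catalog(entries):
    """Return one error string per contract violation; empty list = pass."""
    errors = []
    for entry in entries:
        tier = entry.get("tier", "tier3")
        missing = REQUIRED_FIELDS.get(tier, set()) - entry.keys()
        for field in sorted(missing):
            errors.append(f"{entry.get('table', '<unnamed>')}: "
                          f"missing '{field}' (required for {tier})")
    return errors

# In CI: run against the exported catalog JSON; a non-zero exit blocks deploy.
if __name__ == "__main__" and len(sys.argv) > 1:
    problems = validate_catalog(json.load(open(sys.argv[1])))
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

<p><em>Wiring a check like this into the deployment pipeline is what makes the contract non-optional: a missing owner fails the build instead of accumulating silently as tribal knowledge.</em></p>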
      <p>
          <a href="https://www.thedataletter.com/p/diy-data-catalog-template">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>