How To Fine-Tune and Deploy an LLM Without an ML Engineer

Your weights stay on your machine, your data never leaves it, and the vendor’s release calendar is no longer your problem. Here’s the build, start to finish.

Jun 11, 2026

∙ Paid

Yesterday, I walked through how to read AI pricing so you know what you’re committing to before you sign up for a model. The piece underneath all of it, the one I’ve been building toward, is this one: you don’t have to commit to anyone’s model at all.

Every build I’ve published over the past few weeks has given you one piece of this. The local agent build put a private model on your laptop with n8n, Ollama, and Docker. No API keys necessary. The Monday metrics build taught the agent to answer from wherever you’re storing your data instead of inventing numbers, and to say ‘I don’t have that’ when the answer wasn’t there. The RAG build in NotebookLM showed how retrieval lets a model answer from your documents. The prompt engineering piece made the case for treating your system prompt as an asset, versioned and tested, not a throwaway string.

Today, those pieces become one thing you own: a support assistant that runs entirely on your own machine, answers from your team’s documents, speaks in your team’s voice, and refuses to make things up. This is the same system I built for Asaura AI. You’ll build your version of it in the next half hour, for free, and when you’re done, the model is yours. No vendor update can take it out from under you, because no vendor is in the loop.

First, the word ‘fine-tune’

Real fine-tuning means retraining a model’s weights on your own examples. It changes the model itself, and it needs either a GPU or a paid training service plus a labeled dataset. For most teams, it’s the wrong tool for what they actually want.

What teams want is a model that answers from their own knowledge, in their own voice, and admits when it doesn’t know. You get that by adapting an open model at the moment you run it: you give it a system prompt that sets its role and rules, you put your own documents in front of it, and you show it a few examples of the answers you expect. The proper name for this is in-context learning, and it gets you the outcome people mean when they say fine-tune, without training anything, without a GPU, and without a bill.

That’s what we’re building today. The model stays a general open model. What makes it yours is everything you wrap around it.

Below the paywall is the complete build, every step from empty workflow to a working assistant you’ve tested and trust, plus the downloadable files ready to paste in.

Keep reading with a 7-day free trial

Subscribe to The Data Letter to keep reading this post and get 7 days of free access to the full post archives.