the ai playbook part 2 - set up a private personal ai toolkit
“The AI Playbook” is a mini-book about integrating AI within your software engineering organization. This is part 2, on how to set up your own local/private AI toolkit. I can’t be sure how long it will be between parts, as this isn’t completely written yet. But timeliness is of the essence in this space (things are changing rapidly), so rather than hoard it all until the whole thing is finished, I’m publishing as I go.
Set Up a Private Personal AI Toolkit
We all know that Large Language Models (LLMs) are growing ever more capable. Right this second they are transforming how we work. In the future they will be taken for granted the way we take laptops, iPhones, and tablets for granted — the way we take the internet for granted, even when traveling at 30,000 feet across the ocean.
LLMs offer the promise of enhanced productivity, improved creativity and idea generation, better data analysis and therefore decision-making, clearer and more effective communication, opportunities for continuous personal growth and learning, and overall time efficiency that allows professionals to focus on high-value tasks.
There is a sea of available web tools and apps (free and subscription based) out there, but as usual, I wanted something a little bit... different.
The Goal
I had a goal: utilize LLMs to make myself more effective as a professional.
The constraints I set for this project: I want this toolkit to work 100% offline, be 100% private, and run 100% locally.
By installing our LLMs locally, we avoid concerns about privacy, compliance, and giving data to external services that often arise in a corporate environment. This means that any future employer is much less likely to take this tool away from us. A major positive.
It also comes with another important benefit: it’s always available. If our computer works, so does the AI toolkit. If the internet goes out on that flight to London, our AI helper won’t. AND, IT'S 100% FREE! No subscriptions, API keys, or tokens to worry about.
The Tools
This is where this article will likely get out of date really quickly. Alas. Hopefully this is still the good stuff when you are reading this.
We’re going to use Ollama to run local language models, Open Web UI to provide a graphical interface for interacting with those models, and Continue, a VS Code plugin that integrates AI directly into the code editor à la Copilot (but free), which is super handy for coding tasks.
Setting this stuff up was all crazy simple. I wrote no code, spent no time needing to futz with configurations, nor did I even really read any of the installation instructions to get this up and running. I copy/pasted some commands, made no changes, and hit enter in my terminal. It all just works.
For the most part, the installation instructions here are meant to give you a peek at how simple setup is, not to be comprehensive. I expect that info to get stale, and therefore don’t want you to rely on it as anything other than a roadmap to the current procedures on the official websites.
Please hunt down the latest instructions for your operating system if you have any troubles.
I will get into some configuration details you can look at after you’re up and running, but I want to stress that there’s nothing here that requires an engineering degree to pull off.
Performance Concerns
What you will need is a fairly beefy computer. There are smaller versions of models that can run on lighter-spec machines, but, obviously, the larger the model you can run, the better. I’m using an M3 Max MacBook Pro with 48GB of RAM and run deepseek-r1 (32b parameters). While this is an expensive machine, I don’t think it’s outrageously wild considering my primary audience (tech professionals).
Don’t let this discourage you, however. Many people are completely happy using 8b models, or quantizing larger models to run better on lighter-spec machines.
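To make that concrete, many models in the Ollama library publish pre-quantized tags. The tag below is only an illustration; browse a model’s page in the library to see which tags actually exist:

# a 4-bit quantized variant needs a fraction of the memory of the full-precision model
# (tag names vary per model; check the model's Ollama library page)
ollama run llama3.1:8b-instruct-q4_K_M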
Installation
Ollama: Local Large Language Models Made Easy
Installation: Download Ollama from its official website and install it following the simple instructions provided.
Model Setup: After installation, you can download various models (Search Available Models Here) directly from Ollama. These models run locally, allowing you to generate text, answer questions, or even code without internet access.
Example: to run deepseek-r1 (14b), enter the following in your terminal. It will download and then start the model:
ollama run deepseek-r1:14b
Now that you’ve done that, you have your very own LLM running on your laptop, in your terminal!! The next time you run the same model you won’t need to download it again, or you can run a new one and it’ll download that. Pretty cool stuff.
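While you’re in the terminal, a few other handy commands from the standard ollama CLI:

ollama list          # show the models you've downloaded
ollama ps            # show which models are currently loaded into memory
ollama pull <model>  # download a model without starting a chat
ollama rm <model>    # delete a downloaded model to free disk space

(Typing /bye exits an interactive chat session.)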
Ask it some questions, have it outline a blog post for you, ask it some gotcha questions like “which number is larger, 9.11 or 9.9?” - really try it out!
We’ve got an AI in our laps, now!
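Under the hood, Ollama also serves a local HTTP API (on port 11434 by default); it’s what the GUI tools below talk to. A quick sketch, reusing the model we downloaded above:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Which number is larger, 9.11 or 9.9?",
  "stream": false
}'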
Now what?
Figure out how big of a model you can run on your machine by trying several out.
Test out several different models, see what works well for you with which types of tasks.
Go another step forward and play with quantized models, which let you run even larger-parameter models with less memory usage.
Try Ollama’s new context quantization feature, which gives similar memory savings, but this time by quantizing the session context; a minimal sketch follows this list.
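Here’s that context (K/V cache) quantization sketch, assuming a recent Ollama version; flash attention has to be enabled for it to take effect:

# set these before the ollama server starts (e.g. in your shell profile)
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0   # options are f16 (the default), q8_0, or q4_0
ollama serve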
Open Web UI: A User-Friendly LLM Interface
Getting Started: Install Open Web UI by cloning its repository or by using pip.
Launch a Server in Docker: Follow the latest instructions for running the application. The current default command for starting it up (with Ollama on your local machine) using Docker is:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Pretty slick… And, so easy, too!
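Note that with the Docker command above, the UI ends up at http://localhost:3000 (that’s what the -p 3000:8080 flag maps). If you’d rather skip Docker, the pip route mentioned earlier works too; a sketch, assuming a compatible Python install (check the official docs for the currently supported version):

pip install open-webui
open-webui serve
# the pip version serves the UI at http://localhost:8080 by default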
Now what?
You’ve now extended your LLM’s capabilities quite a bit! Open Web UI comes with a lot of very cool tools to make your experience better. There are tons of resources and videos online that dive into how to use them all, including:
There are some light RAG (Retrieval-Augmented Generation) abilities built in, allowing you to search your own documents and get information back, and you can even connect your Google Drive to it.
Web Search can be enabled, letting it search the web for information and process those results for you.
You can save sessions to JSON, store that JSON in a database, and have it available for your AI’s future recall.
There’s an easy-to-use GUI to change your LLM’s parameters. A good one to edit is the context size.
The default in Ollama is only 2k tokens, despite what your model can theoretically handle. They do this to help reduce load on our machines, but we can override this.
Just click the “Controls” button in the upper right of Open Web UI’s browser application and set the “Context Size” to something appropriate to the model you’re using. For instance, deepseek-r1 32b can handle 131,072 tokens.
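If you’d rather bake a bigger context window into the model itself, so every client gets it (not just Open Web UI), Ollama’s Modelfiles make that simple. A sketch, reusing the model and context size from above; the deepseek-r1-bigctx name is just an example, and remember that a bigger context means more memory use:

# create a variant of the model with a larger context window
cat > Modelfile <<'EOF'
FROM deepseek-r1:32b
PARAMETER num_ctx 131072
EOF
ollama create deepseek-r1-bigctx -f Modelfile
ollama run deepseek-r1-bigctx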
Continue Plugin: AI-Assisted Coding in VS Code
Installation: Install the Continue plugin from the Visual Studio Marketplace.
Open VS Code.
Go to the Extensions view by clicking the Extensions icon in the Activity Bar or pressing Ctrl+Shift+X.
Search for “Continue” in the Extensions marketplace.
Click the Install button to install the plugin.
Integration: Once installed, Continue integrates seamlessly into VS Code, offering features like code completion suggestions, debugging assistance, and context-aware help directly within your editor.
After the WYSIWYG setup, click on the settings cog at the top of the plugin view to open your config file. This configures the model options you can select within the plugin. Feel free to delete any non-Ollama model providers that require API keys, like I did… we don’t need ’em!
"models": [ { "title": "Llama 3.1 8B", "provider": "ollama", "model": "llama3.1:8b" }, { "title": "deepseek coder v2", "provider": "ollama", "model": "deepseek-coder-v2:latest" }, { "model": "AUTODETECT", "title": "Autodetect", "provider": "ollama" } ], "tabAutocompleteModel": { "title": "Qwen2.5-Coder 1.5B", "provider": "ollama", "model": "qwen2.5-coder:1.5b"
And, just like that, you have your own pair-programming assistant that lives in your computer and shares no data with MSFT!
Conclusion: Pretty Neat Stuff
Using a Large Language Model (LLM) can offer many benefits that enhance your effectiveness across various domains. After mastering your personal setup, you will have massive creative power available to use completely free, any time, anywhere, even if you don’t have the internet!
What you use it to do is mostly limited by your imagination and patience. Use it as a brainstorming partner, a writer, an editor, a programmer, a data analyst, and an easily accessible data warehouse. Need multilingual assistance? Done. Tailored career advice? Easy. Locations you can rent out for a wedding in Lake Tahoe? Why not?
Thanks for reading! In the following chapters we’ll dial this personal setup in with automated agents and RAG (Retrieval-Augmented Generation), look at larger, organization-level implementations, and more.