As I delve deeper into AI, I've grown tired of tools like Jan and LM Studio.
I have a handful of goals in mind, but of course to start it's important to get a local stack running. Thankfully the tool suites have exploded, including ready-made Docker Compose setups.
My goal after a prototype is RAG support. Anything like integrating Whisper, Stable Diffusion, or vision models is icing on the cake to start, but I'll take it, of course.
So after doing some Googling I found Cole Medin's 'Local-AI-Packaged' GitHub repo.
To grab the overview from the repo itself:
Self-hosted AI Package is an open, docker compose template that quickly bootstraps a fully featured Local AI and Low Code development environment including Ollama for your local LLMs, Open WebUI for an interface to chat with your N8N agents, and Supabase for your database, vector store, and authentication.
This is Cole’s version with a couple of improvements and the addition of Supabase, Open WebUI, Flowise, Neo4j, Langfuse, SearXNG, and Caddy! Also, the local RAG AI Agent workflows from the video will be automatically in your n8n instance if you use this setup instead of the base one provided by n8n!
I'm just going to include the information and steps I needed outside of what's covered in the GitHub repo. Let's jump in.
Since it's a prototype I'm building, I'm running it within WSL (Ubuntu 22.04), so to utilize my Nvidia GPU I had to ensure that WSL can access it.
Make sure that the graphics drivers (including CUDA) are updated on Windows, then enter the WSL distro you're going to use to run the stack and confirm your GPU is returned when running the following command:
nvidia-smi

Now, after installing Docker in WSL, make sure that Docker can access the GPU as well:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
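If that command fails with a GPU-related error, the usual culprit is a missing NVIDIA Container Toolkit inside WSL. A minimal sketch of installing it on Ubuntu, assuming you've already added NVIDIA's apt repository per their docs:

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker   # or restart Docker Desktop if that's what backs your WSL docker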

At this point you can clone the repo, copy the .env.example to .env and so on.
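For reference, the setup looked roughly like this on my machine (double-check the repo URL against Cole's GitHub):

git clone https://github.com/coleam00/local-ai-packaged.git
cd local-ai-packaged
cp .env.example .env
# edit .env and set the required passwords/secrets per the repo README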
Don’t do ‘docker compose up -d’ or it won’t initialize correctly. (Don’t ask me how I know).
Use the start-services.py script and it’ll work properly.
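In my clone the script was actually named start_services.py and accepted a --profile flag for GPU selection, so the invocation looked something like this (check the repo README for the exact name and available profiles):

python3 start_services.py --profile gpu-nvidia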
Now enter the Ollama container:
docker exec -it ollama bash
and pull some models:
ollama pull llama3.2
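Depending on which workflow you import, you may also want an embedding model for the vector store. The model name is whatever your workflow's embedding node expects; nomic-embed-text is just a common choice:

ollama pull nomic-embed-text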
Now navigate to n8n at http://localhost:5678/
Create the local account. I then chose to just set up the 'V3 Local Agentic RAG AI Agent' workflow.
From the n8n dashboard, hit the + button on the top left and add some credentials.
Here are the ones I set up in the first chunk:

The OpenAI credential should be pointed at Ollama's OpenAI-compatible API root:
http://ollama:11434/v1
The rest use the values from the .env file or the defaults.
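If the OpenAI credential won't validate, it can help to sanity-check Ollama's OpenAI-compatible endpoint directly from the host (assuming the default published port of 11434):

curl http://localhost:11434/v1/models

In my case any placeholder API key worked, since Ollama ignores the key on its OpenAI-compatible endpoint.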
Use the chat in n8n to ask the model a question it shouldn’t know anything about:

Now follow the instructions and manually copy a file, such as a PDF, into the watched folder once the local file trigger is waiting. You can then ask again:

Boom!
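For reference, 'manually copy a file' on my setup just meant dropping it into the repo's shared folder, which the compose stack mounts into the n8n container. The paths and container name below are assumptions; check the volume mounts in your docker-compose.yml:

# from the repo root inside WSL
cp ~/Documents/sample.pdf ./shared/
# or copy straight into the running container if you prefer
docker cp ~/Documents/sample.pdf n8n:/data/shared/sample.pdf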
Now toggle the Active switch at the top of the workflow, then double-click the webhook within the yellow box of the workflow to grab its production URL.
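Before wiring it into Open WebUI, you can sanity-check the production webhook with curl. The path and payload keys below are assumptions, so match them to whatever your Webhook node and workflow actually expect:

curl -X POST "http://localhost:5678/webhook/<your-webhook-path>" \
  -H "Content-Type: application/json" \
  -d '{"chatInput": "What does the uploaded PDF cover?", "sessionId": "test-123"}'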
Now navigate to the Open WebUI web app.
Within the Admin panel, select Functions and then Add. Paste in the n8n-pipe.py code and then set the N8N Url value to the webhook production URL:

Now create a new chat within Open WebUI and select the 'n8npipe' model.

Now ask it something to do with your document(s):

Boom!
In the next post I'll continue covering how to configure the rest of the stack.
END TRANSMISSION
