The quickest and simplest guide to spinning up a powerful local AI stack. Part 5 – Open-WebUI To Crawl4AI – Local Files

This part ended up taking longer than I would’ve liked.

There isn’t much to go over. I ended up spinning up a Crawl4AI container and adding a function to Open-WebUI that parses the URL from user input and sends it to an n8n workflow. The workflow uses Crawl4AI’s API to crawl the URL, parses Crawl4AI’s output, and saves the information into a bunch of different files. For now the function only reports success; next up is a function that’ll let Open-WebUI summarize and respond with what the URL is/says.

Then it’s back to the main goal, which is implementing a RAG Agent that utilizes the files from this workflow. After that, I’ll clean everything up, including removing what I’m not currently using from the docker-compose, etc. Then I’ll throw my actual docker-compose, and some more stuff, onto my Local-AI-Stack GitHub Repo found HERE.

So let’s set up Crawl4AI.

First, I made a directory for it to live in for now:

mkdir ~/crawl4ai

Then, of course, the docker-compose.yml file:

version: "3.9"
services:
  crawl4ai:
    image: unclecode/crawl4ai:latest
    container_name: crawl4ai
    ports:
      - "8860:11235"
    volumes:
      - ./crawl_data:/data
    shm_size: "1g"
    restart: unless-stopped

Make sure you create the ~/crawl4ai/crawl_data directory as well.
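If you want both directories in one go, a one-liner like this works (paths match the compose file above):

```shell
# Create the project directory plus the crawl_data folder that the
# compose file bind-mounts into the container at /data.
mkdir -p ~/crawl4ai/crawl_data
```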

Now spin it up:

docker compose up -d 

You can confirm it’s running by hitting http://localhost:8860 in a browser, or by using curl like:

curl -X POST http://localhost:8860/html \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://gainsec.com"
}'

You should see the HTML of the URL wrapped in JSON as the response:
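If you’d rather save that HTML straight to a file instead of eyeballing JSON, here’s a quick sketch. It assumes the /html endpoint returns the page under an "html" key (check your Crawl4AI version’s response shape if it differs), and the page.html filename is just an example:

```shell
# Crawl a URL through the local Crawl4AI container and save only the
# HTML portion of the JSON response to a file under crawl_data/.
URL="https://gainsec.com"
OUT=~/crawl4ai/crawl_data/page.html
mkdir -p ~/crawl4ai/crawl_data

# curl POSTs the URL; python3 pulls the "html" field out of the JSON
# (printing nothing if the service isn't reachable and stdin is empty).
curl -s -X POST http://localhost:8860/html \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"$URL\"}" \
  | python3 -c 'import json,sys; d=sys.stdin.read(); print(json.loads(d).get("html","") if d.strip() else "")' \
  > "$OUT"
```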

Sick! Now you should import the Open-WebUI function found HERE. See instructions in the previous posts in this series for more information about this process.

And import the n8n workflow HERE. Activate it, etc. See instructions in the previous posts in this series for more information about this process.

Now open a new chat in Open-WebUI and select the n8ncrawl ‘model.’

Ask it something like:

What does this article say? https://gainsec.com/
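Conceptually, the first thing the function does with a message like that is just pull the URL out of it. A rough sketch of that step (this is an illustration, not the actual function code):

```shell
# Grab the first http(s) URL out of a chat message with grep -- roughly
# the URL-parsing step the Open-WebUI function performs before handing
# the URL off to the n8n workflow.
MSG="What does this article say? https://gainsec.com/"
URL=$(printf '%s\n' "$MSG" | grep -oE 'https?://[^[:space:]]+' | head -n 1)
echo "$URL"   # -> https://gainsec.com/
```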

Note that you should see the n8n workflow run, and, depending on which version of the function you grabbed, a status update in Open-WebUI:

or similar.

Now, I’m not going to walk through the workflow here, but it’s very simple.

Ultimately, you should look in the ~/local-ai-packaged/shared/ directory for the output, or, if you have a newer version, in ~/local-ai-packaged/shared/crawl/*

Awesome! As a reminder, next up is adding functionality for Open-WebUI to take the same input we used here and, on top of running this flow, use Crawl4AI or similar to hit the website and return a summary in the chat.

Then it’ll be implementing a RAG Agent for this workflow.

After that, I’ll go back, polish things up, and release the proper configurations/files, and the ‘base’ GainSec AI Local Stack will be more or less g2g.

My plans for after that will be shared at a later date!

END TRANSMISSION
