The quickest and simplest guide to spinning up a powerful local AI stack. Part 4 – Transcription via Whisper

It turns out that Open WebUI ships with a form of Whisper integration out of the box. So I'm going to cover the changes that make it better, and then as a bonus I'll include a docker-compose for a standalone whisper-server plus an n8n test workflow for it.

To keep it simple, I enabled audio transcription, changed the default Whisper model to large, and set the compute type to float16 (half precision, which is faster on CUDA GPUs) since I'm using my 4080 SUPER to assist with it.

This is done via the environment entries within the docker-compose:

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - ENABLE_AUDIO_TRANSCRIPTION=true
      - WHISPER_MODEL=large
      - CUSTOM_COMPUTE_TYPE=float16   # half precision on the GPU
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
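
Recreate the container so the new environment variables take effect (run this from the directory that holds your docker-compose.yml):

docker compose up -d open-webui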

Once you spin Open WebUI back up, navigate to the Admin Panel -> Settings -> Audio.

There you should see the STT engine set to Whisper (Local), and the STT Model should now show large.

Now open a new chat and upload an MP3. It doesn't matter what the recording says; click the source file in the response and you should now see the transcription!

Cool. Now, if you want a standalone whisper-server you can integrate into other flows, including with n8n:

Create a new directory in your user's home directory; I named mine whisper-standalone.

Create a directory within it called models.
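
In a shell, both steps (assuming the whisper-standalone name) look like:

mkdir -p ~/whisper-standalone/models
cd ~/whisper-standalone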

In the whisper-standalone directory, create a new docker-compose.yml:

services:
  whisper-server:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: whisper-server
    ports:
      - "9002:9000"   # the webservice listens on 9000 inside the container
    environment:
      - ASR_MODEL=large
      - ASR_ENGINE=openai_whisper
    volumes:
      - ./models:/root/.cache   # persist downloaded models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia   # give the container access to the GPU
              count: 1
              capabilities: [gpu]
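
Spin it up from the whisper-standalone directory. Expect the first start to take a while, since the large model gets downloaded into ./models:

cd ~/whisper-standalone
docker compose up -d
docker logs -f whisper-server   # follow the model download on first start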

Once it's running, you can use curl to test that it's working properly:

curl -X POST http://localhost:9002/asr \
  -H "accept: application/json" \
  -F "audio_file=@AudioFile.mp3"

You should get the transcription in response!

Now let's create an n8n workflow with a local file trigger watching for any MP3s uploaded to /data/shared/uploads/*.mp3. It reads the file, sends it to the whisper server, receives the transcription, and writes it out to /data/shared/transcripts/filename.txt.
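
If you want to sanity-check that same pipeline outside n8n first, here's a rough shell equivalent (just a sketch; it assumes the same /data/shared paths exist on the host and that inotify-tools is installed):

#!/usr/bin/env bash
# Watch the uploads directory, transcribe each new MP3, write a .txt transcript.
UPLOADS=/data/shared/uploads
TRANSCRIPTS=/data/shared/transcripts

inotifywait -m -e close_write --format '%f' "$UPLOADS" | while read -r file; do
  [[ "$file" == *.mp3 ]] || continue
  curl -s -X POST http://localhost:9002/asr \
    -H "accept: application/json" \
    -F "audio_file=@$UPLOADS/$file" \
    > "$TRANSCRIPTS/${file%.mp3}.txt"
done

One gotcha for the n8n version: inside the n8n container, localhost refers to the container itself, so point the HTTP Request node at the whisper server via host.docker.internal:9002 (or via the service name if both containers share a Docker network).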

In the end it should look like this: [screenshot of the finished n8n workflow]

You can download my n8n workflow HERE.

Next upload is likely going to be a web crawler/parser? Idk, we’ll see.

END TRANSMISSION
