So it turns out that OpenWeb-UI comes with Whisper integration out of the box. I'm going to cover the changes you should make to improve it, and as a bonus I'll include a docker-compose for a standalone whisper-server plus an n8n test workflow that uses it.
To keep it simple, I enabled audio transcription, changed the default Whisper model to large, and set the compute type to float16, since I'm using my 4080 SUPER to handle it.
This is done via the environment entries within the docker-compose:
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  restart: unless-stopped
  container_name: open-webui
  ports:
    - "3000:8080"
  environment:
    - ENABLE_AUDIO_TRANSCRIPTION=true
    - WHISPER_MODEL=large
    - CUSTOM_COMPUTE_TYPE=float16
  extra_hosts:
    - "host.docker.internal:host-gateway"
  volumes:
    - open-webui:/app/backend/data
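A quick note while we're in the compose file: the :main image is the CPU build, so if transcription isn't actually hitting your GPU, the usual fix is the :cuda image tag plus a device reservation. A minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host:

open-webui:
  image: ghcr.io/open-webui/open-webui:cuda   # CUDA build of the same image
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia                    # hand the container one GPU
            count: 1
            capabilities: [gpu]

(The standalone whisper-server below uses the same deploy block.)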
Once you spin OpenWeb-UI back up, navigate to Admin Panel -> Settings -> Audio.
You should see the STT engine set to Whisper (Local), and the STT Model should now show large.

Now open a new chat and upload an MP3. Say whatever you want in the prompt, it doesn't really matter; click the source file in the response and you should now see the transcription!

Cool. Now, if you want a standalone whisper-server you can integrate into other flows, including with n8n:
Create a new directory in your user's home directory; I named mine whisper-standalone.
Create a directory within it called models
In the whisper-standalone directory, create a new docker-compose.yml:
services:
  whisper-server:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: whisper-server
    ports:
      - "9002:9000"
    environment:
      - ASR_MODEL=large
      - ASR_ENGINE=openai_whisper
    volumes:
      - ./models:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
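Spinning it up is the usual compose routine. Heads up: on first start the large model gets downloaded into ./models, so give it a few minutes (and the -gpu image assumes the NVIDIA Container Toolkit is installed on the host):

docker compose up -d
docker logs -f whisper-server   # watch the model download and server startup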
Once it's up, boom, you can use curl to test that it's working properly:
curl -X POST http://localhost:9002/asr -H "accept: application/json" -F "audio_file=@AudioFile.mp3"
You should get the transcription in response!
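By default you get the transcription as plain text. The webservice also takes a few query parameters on /asr, like output (srt, vtt, json, etc.) and language; I'm going from memory of its docs here, so check the Swagger UI at http://localhost:9002/docs if these don't line up:

# Hedged variant: ask for SRT subtitles and pin the language to English
curl -X POST "http://localhost:9002/asr?output=srt&language=en" -H "accept: application/json" -F "audio_file=@AudioFile.mp3"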

Now let's create an n8n workflow that uses a local file trigger to watch for any MP3 files uploaded to /data/shared/uploads/*.mp3, reads each file, sends it to the whisper-server, receives the transcription, and then writes it to /data/shared/transcripts/filename.txt.
In the end it should look like:

You can download my n8n workflow HERE.
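If you'd rather sanity-check the whole loop before wiring up n8n, here's a rough bash equivalent of what that workflow does. The paths and whisper-server URL match the ones above; the skip-if-transcript-exists check is my own addition, since the n8n trigger handles "new files only" differently:

#!/usr/bin/env bash
# Rough bash equivalent of the n8n workflow: transcribe any new MP3s
# in the uploads dir and write the text into transcripts/.
UPLOADS=/data/shared/uploads
TRANSCRIPTS=/data/shared/transcripts
mkdir -p "$TRANSCRIPTS"

for f in "$UPLOADS"/*.mp3; do
  [ -e "$f" ] || continue                          # no MP3s present
  out="$TRANSCRIPTS/$(basename "${f%.mp3}").txt"
  [ -e "$out" ] && continue                        # already transcribed
  curl -s -X POST "http://localhost:9002/asr" \
    -H "accept: application/json" \
    -F "audio_file=@$f" > "$out"
  echo "Transcribed $f -> $out"
done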
Next upload is likely going to be a web crawler/parser? Idk, we’ll see.
END TRANSMISSION
