Add MQTT-based STT worker for VAD segments

Kai
2026-02-13 17:49:26 +01:00
parent fd1bfb4786
commit 5294c24b08
7 changed files with 292 additions and 2 deletions


@@ -33,3 +33,14 @@ VAD_POSTROLL_MS=1000
VAD_START_THRESHOLD=900
VAD_STOP_THRESHOLD=600
VAD_MIN_SPEECH_MS=300
# STT worker settings (faster-whisper)
MQTT_VAD_TOPIC=evs/+/vad_segment
MQTT_TRANSCRIPT_TOPIC_TEMPLATE=evs/{device_id}/transcript
MQTT_STT_ERROR_TOPIC_TEMPLATE=evs/{device_id}/stt_error
STT_MODEL=small
STT_DEVICE=cpu
STT_COMPUTE_TYPE=int8
STT_BEAM_SIZE=1
STT_LANGUAGE=de
STT_CONDITION_ON_PREV_TEXT=false


@@ -8,6 +8,7 @@ It provides:
- MQTT playback input (`evs/<device_id>/play_pcm16le`)
- Optional Home Assistant webhook callbacks (`connected`, `start`, `stop`, `disconnected`)
- VAD auto-segmentation (`vad_segment`) with pre-roll/post-roll
- Optional STT worker (`vad_segment` -> `transcript`) via MQTT
## 1) Start the bridge
@@ -53,7 +54,7 @@ Then upload firmware.
1. Flash ESP32
2. Open serial monitor
3. Send `s` (stream mode)
3. Wait for WS connect (client switches to stream mode automatically)
4. In bridge logs, you should see the device connection
5. If `ECHO_ENABLED=true`, incoming audio is returned to ESP32 speaker
@@ -63,7 +64,8 @@ Then upload firmware.
- `evs/<device_id>/status` (connection/start/stop/disconnect)
- `evs/<device_id>/mic_level` (mic telemetry)
- `evs/<device_id>/vad_segment` (finalized speech segments)
- reserved for next steps: `evs/<device_id>/transcript`, `evs/<device_id>/stt_error`
- `evs/<device_id>/transcript` (text from stt-worker)
- `evs/<device_id>/stt_error` (stt-worker errors)
- Playback input to device:
- `evs/<device_id>/play_pcm16le`
- payload options:
@@ -101,6 +103,20 @@ You can build automations on these events (for STT/TTS pipelines or Node-RED han
- `VAD_KEEP_FILES=200` limits number of stored VAD WAV files
- `VAD_MAX_AGE_DAYS=7` deletes VAD WAV files older than 7 days
- MQTT is recommended for control/events, WebSocket for streaming audio
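- STT worker:
  - subscribes to `evs/+/vad_segment` (that is, `evs/<device_id>/vad_segment` for every device)
  - reads `wav_path` from the event JSON
  - transcribes the file with `faster-whisper`
  - publishes the transcript to `evs/<device_id>/transcript`
  - publishes errors to `evs/<device_id>/stt_error`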
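
The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the worker's actual code: the helper names `device_id_from_topic` and `route_segment` and the shape of the returned messages are assumptions. Only the topic names and the `wav_path` field come from this README. Transcription and the MQTT client are left out on purpose.

```python
import json
from typing import Optional, Tuple

# Values taken from the .env defaults in this commit.
VAD_TOPIC_FILTER = "evs/+/vad_segment"
TRANSCRIPT_TEMPLATE = "evs/{device_id}/transcript"
ERROR_TEMPLATE = "evs/{device_id}/stt_error"


def device_id_from_topic(topic: str, topic_filter: str = VAD_TOPIC_FILTER) -> Optional[str]:
    """Return the level matched by the single '+' wildcard, or None if the topic does not match."""
    topic_parts = topic.split("/")
    filter_parts = topic_filter.split("/")
    if len(topic_parts) != len(filter_parts):
        return None
    device_id = None
    for got, want in zip(topic_parts, filter_parts):
        if want == "+":
            if not got:
                return None  # an empty device id is not useful here
            device_id = got
        elif got != want:
            return None
    return device_id


def route_segment(topic: str, payload: bytes) -> Tuple[str, dict]:
    """Pick the output topic for one vad_segment event (hypothetical helper).

    Returns (output_topic, message). A real worker would transcribe
    message["wav_path"] before publishing to the transcript topic.
    """
    device_id = device_id_from_topic(topic)
    if device_id is None:
        raise ValueError(f"unexpected topic: {topic}")
    try:
        event = json.loads(payload)
        wav_path = event["wav_path"]
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or a missing field goes to the per-device error topic.
        message = {"error": f"bad vad_segment event: {exc!r}"}
        return ERROR_TEMPLATE.format(device_id=device_id), message
    return TRANSCRIPT_TEMPLATE.format(device_id=device_id), {"wav_path": wav_path}
```

Deriving the device id from the topic, rather than from the payload, keeps the reply topic correct even when the payload cannot be parsed.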
## 6.1) STT Worker Config
Set these environment variables (in `.env` or in Portainer):
- `STT_MODEL` (`tiny`, `base`, `small`, `medium`, `large-v3`)
- `STT_DEVICE` (`cpu` or `cuda`)
- `STT_COMPUTE_TYPE` (`int8`, `float16`, ...)
- `STT_BEAM_SIZE` (decoding beam size, `1` in the example config)
- `STT_LANGUAGE` (`de` or empty for auto-detect)
- `STT_CONDITION_ON_PREV_TEXT` (`true` or `false`)
- `MQTT_VAD_TOPIC`, `MQTT_TRANSCRIPT_TOPIC_TEMPLATE`, `MQTT_STT_ERROR_TOPIC_TEMPLATE`
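
As a rough sketch of how these variables could be consumed, the Python below parses them into a typed config and passes them to `faster-whisper`. The `SttConfig` class and the `transcribe` helper are hypothetical names, and the mapping of each variable to a `WhisperModel` or `transcribe()` argument is an assumption, since the worker source is not shown here. The defaults mirror the example `.env`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SttConfig:
    model: str
    device: str
    compute_type: str
    beam_size: int
    language: Optional[str]  # None means auto-detect
    condition_on_prev_text: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "SttConfig":
        language = env.get("STT_LANGUAGE", "de").strip()
        return cls(
            model=env.get("STT_MODEL", "small"),
            device=env.get("STT_DEVICE", "cpu"),
            compute_type=env.get("STT_COMPUTE_TYPE", "int8"),
            beam_size=int(env.get("STT_BEAM_SIZE", "1")),
            language=language or None,  # empty string -> auto-detect
            condition_on_prev_text=_as_bool(env.get("STT_CONDITION_ON_PREV_TEXT", "false")),
        )


def transcribe(wav_path: str, cfg: SttConfig) -> str:
    """Transcribe one WAV file (assumed mapping of config to faster-whisper arguments)."""
    # Imported lazily so the config code works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # A long-running worker would load the model once and reuse it.
    model = WhisperModel(cfg.model, device=cfg.device, compute_type=cfg.compute_type)
    segments, _info = model.transcribe(
        wav_path,
        beam_size=cfg.beam_size,
        language=cfg.language,
        condition_on_previous_text=cfg.condition_on_prev_text,
    )
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `SttConfig.from_env()` with no arguments reads the process environment; passing a plain dict is handy for testing.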
## 7) Build and push to Gitea registry
@@ -161,6 +177,29 @@ services:
    volumes:
      - evs_bridge_data:/data
  evs-stt-worker:
    image: git.khnm-zimmerling.de/kai/evs-stt-worker:latest
    container_name: evs-stt-worker
    restart: unless-stopped
    environment:
      LOG_LEVEL: "INFO"
      MQTT_HOST: "10.100.3.247"
      MQTT_PORT: "1883"
      MQTT_USER: ""
      MQTT_PASSWORD: ""
      MQTT_BASE_TOPIC: "evs"
      MQTT_VAD_TOPIC: "evs/+/vad_segment"
      MQTT_TRANSCRIPT_TOPIC_TEMPLATE: "evs/{device_id}/transcript"
      MQTT_STT_ERROR_TOPIC_TEMPLATE: "evs/{device_id}/stt_error"
      STT_MODEL: "small"
      STT_DEVICE: "cpu"
      STT_COMPUTE_TYPE: "int8"
      STT_BEAM_SIZE: "1"
      STT_LANGUAGE: "de"
      STT_CONDITION_ON_PREV_TEXT: "false"
    volumes:
      - evs_bridge_data:/data
volumes:
  evs_bridge_data:
```


@@ -9,3 +9,26 @@ services:
      - "${WS_PORT:-8765}:${WS_PORT:-8765}"
    volumes:
      - ./data:/data
  evs-stt-worker:
    build: ../stt-worker
    container_name: evs-stt-worker
    restart: unless-stopped
    environment:
      LOG_LEVEL: "INFO"
      MQTT_HOST: "${MQTT_HOST:-localhost}"
      MQTT_PORT: "${MQTT_PORT:-1883}"
      MQTT_USER: "${MQTT_USER:-}"
      MQTT_PASSWORD: "${MQTT_PASSWORD:-}"
      MQTT_BASE_TOPIC: "${MQTT_BASE_TOPIC:-evs}"
      MQTT_VAD_TOPIC: "${MQTT_VAD_TOPIC:-evs/+/vad_segment}"
      MQTT_TRANSCRIPT_TOPIC_TEMPLATE: "${MQTT_TRANSCRIPT_TOPIC_TEMPLATE:-evs/{device_id}/transcript}"
      MQTT_STT_ERROR_TOPIC_TEMPLATE: "${MQTT_STT_ERROR_TOPIC_TEMPLATE:-evs/{device_id}/stt_error}"
      STT_MODEL: "${STT_MODEL:-small}"
      STT_DEVICE: "${STT_DEVICE:-cpu}"
      STT_COMPUTE_TYPE: "${STT_COMPUTE_TYPE:-int8}"
      STT_BEAM_SIZE: "${STT_BEAM_SIZE:-1}"
      STT_LANGUAGE: "${STT_LANGUAGE:-de}"
      STT_CONDITION_ON_PREV_TEXT: "${STT_CONDITION_ON_PREV_TEXT:-false}"
    volumes:
      - ./data:/data