realtime AI voice Agent with mediasoup

Daanish2003 · March 4, 2025, 10:06am

Hi, first of all, thank you for building a great library.

I have been building a realtime AI voice agent with mediasoup. I have encountered some problems and would like your suggestions.

How my app works:

Media goes from the client to the server over a WebRTC transport.
The server consumes the client media with a direct transport.
The media is passed to SileroVAD for speech detection, then to Deepgram STT via WebSocket for speech-to-text, then to an LLM with LangChain, and then to Deepgram TTS for speech synthesis. The synthesized speech is then given an RTP header and SSRC and sent back to the client over the WebRTC transport.
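For readers following the last step above (adding an RTP header and SSRC to the synthesized audio), here is a minimal sketch of packing the 12-byte fixed RTP header from RFC 3550 in front of an encoded payload. The function name buildRtpPacket and the values in the usage note are hypothetical, not taken from my app. It assumes no CSRC list, no header extension and no padding.

```typescript
// Hypothetical helper: prepend a 12-byte RTP fixed header (RFC 3550) to a payload.
// Assumes version 2, no padding, no extension, no CSRC entries.
function buildRtpPacket(
  payload: Uint8Array,
  payloadType: number, // must match the payload type negotiated for the producer
  sequenceNumber: number,
  timestamp: number,
  ssrc: number,
  marker: boolean = false
): Uint8Array {
  const packet = new Uint8Array(12 + payload.length);
  const view = new DataView(packet.buffer);

  // Byte 0: V=2 (bits 7-6), P=0, X=0, CC=0.
  view.setUint8(0, 0x80);
  // Byte 1: marker bit plus 7-bit payload type.
  view.setUint8(1, (marker ? 0x80 : 0x00) | (payloadType & 0x7f));
  // Bytes 2-3: 16-bit sequence number, big-endian, wraps at 65536.
  view.setUint16(2, sequenceNumber & 0xffff);
  // Bytes 4-7: 32-bit timestamp in media clock units, big-endian.
  view.setUint32(4, timestamp >>> 0);
  // Bytes 8-11: 32-bit SSRC, big-endian.
  view.setUint32(8, ssrc >>> 0);

  // The encoded audio frame follows the header.
  packet.set(payload, 12);
  return packet;
}
```

As a usage note: with Opus on a 48 kHz RTP clock and 20 ms frames, each packet would advance the sequence number by 1 and the timestamp by 960. The payload type and SSRC have to agree with the RTP parameters the producer was created with.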

The problem:

SileroVAD detects speech on mono PCM audio, so I need to decode the Opus media to 16-bit PCM, convert it to a Float32Array, and downmix it to mono before passing the array to the detector.
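To make the conversion concrete, here is a rough sketch of the steps that come after the Opus decoder. It assumes the decoder (not shown) already returns interleaved 16-bit PCM. The function names are hypothetical. The downsampler is a naive block average, shown only to illustrate going from 48 kHz to 16 kHz (factor 3); a proper resampler with a low-pass filter would likely give better results.

```typescript
// Hypothetical helper: interleaved 16-bit PCM -> mono Float32 in roughly [-1, 1).
function pcm16ToMonoFloat32(pcm: Int16Array, channels: number): Float32Array {
  const frames = Math.floor(pcm.length / channels);
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    // Average all channels of this frame to downmix to mono.
    for (let c = 0; c < channels; c++) {
      sum += pcm[i * channels + c];
    }
    // Scale from the int16 range to floating point.
    out[i] = sum / channels / 32768;
  }
  return out;
}

// Hypothetical helper: naive integer-factor downsampling by block averaging.
// Example assumption: factor 3 turns 48 kHz input into 16 kHz output.
function downsampleByAveraging(input: Float32Array, factor: number): Float32Array {
  const outLength = Math.floor(input.length / factor);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += input[i * factor + j];
    }
    out[i] = sum / factor;
  }
  return out;
}
```

A 20 ms stereo frame decoded at 48 kHz holds 1920 interleaved samples; after these two steps it becomes 320 mono float samples at 16 kHz, which can then be buffered into whatever window size the VAD model expects.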

Question:

Should I use a plainTransport to send the media to another server (Server B) that also runs mediasoup, where I can decode it, resample it, and pass it to SileroVAD and the other AI processes?
Or should I use WebSocket or gRPC only to decode, resample, and downmix to mono on Server B, and then pass the media back to Server A, which detects the speech and hands it to the other processes?

Note: SileroVAD uses onnxruntime-node and Deepgram uses websockets

Bdayz · March 4, 2025, 12:19pm

Using a mediasoup plainTransport should be fine. It provides efficient RTP transport with lower latency than WebSocket/gRPC, and it lets you do the decoding/resampling (via ffmpeg, …) elsewhere, with less CPU load on the WebRTC server.

namnm · March 7, 2025, 11:38am

I made something similar using mediasoup:
