I am interested in doing audio processing on the input streams from the participants. Ideally, I’d like a single stream of audio (all participants’ input streams merged into 1). From that input, I’d like to output a processed stream to all participants.
Any ideas on where I could integrate this? I’m very new to mediasoup so any pointers would be greatly appreciated!
Ideally we’d like to do it on the server side if we can get access to the streams there: we need to guarantee the performance of our audio processing to keep latency low, and some lower-end phones would struggle to do the processing on the client side.
mediasoup is an SFU, not a mixer/MCU. If you want to mix audio tracks on the server side you need to consume those audio tracks in the server using another RTP stack (FFmpeg, GStreamer, etc.), mix them, send the mixed result back to the mediasoup Router as a single track, and then tell mediasoup to relay it to the (WebRTC?) clients.
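For the "RTP out" half, a rough sketch (untested, against the mediasoup v3 `PlainTransport` API; the helper name, loopback IP and ports are placeholders for illustration) could look like this:

```ts
import { types as mediasoupTypes } from 'mediasoup';

// Hypothetical helper: pipe one participant's audio Producer out of the
// Router as plain RTP so an external process (FFmpeg, GStreamer, ...) can
// receive and mix it. IP and ports are placeholders.
async function pipeAudioToMixer(
  router: mediasoupTypes.Router,
  producer: mediasoupTypes.Producer,
  mixerRtpPort: number
): Promise<mediasoupTypes.Consumer>
{
  // Plain RTP transport (no ICE/DTLS), loopback only.
  const transport = await router.createPlainTransport({
    listenIp : '127.0.0.1',
    rtcpMux  : false,
    comedia  : false
  });

  // Tell mediasoup where the external mixer is listening.
  await transport.connect({
    ip       : '127.0.0.1',
    port     : mixerRtpPort,
    rtcpPort : mixerRtpPort + 1
  });

  // Consume the participant's audio on this transport;
  // RTP then flows to the mixer process.
  const consumer = await transport.consume({
    producerId      : producer.id,
    rtpCapabilities : router.rtpCapabilities,
    paused          : true // start paused, resume once the mixer listens
  });

  await consumer.resume();

  return consumer;
}
```

On the mixer side you would describe that incoming RTP stream to FFmpeg/GStreamer (e.g. via an SDP file for FFmpeg), and repeat this per participant you want in the mix.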
Basically mediasoup is about producers (RTP in) and consumers (RTP out). That does not mean that every producer must be relayed directly to the final endpoints. You can consume those producers on the server side, do things with them (out of the scope of mediasoup and this forum), produce the result back into the mediasoup Router, and have the final endpoints consume it.
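And for the "RTP in" half, once the external process has a mixed track, something like this (again an untested sketch; the Opus payload type and SSRC values are placeholders that must match whatever your mixer actually emits):

```ts
import { types as mediasoupTypes } from 'mediasoup';

// Hypothetical helper: inject the mixed audio (sent as plain RTP by the
// external mixer) back into the Router as a new Producer.
async function produceMixedAudio(
  router: mediasoupTypes.Router
): Promise<mediasoupTypes.Producer>
{
  // comedia: true — mediasoup learns the mixer's address from the first
  // RTP packet it receives, so we just read the local port and point the
  // mixer at it.
  const transport = await router.createPlainTransport({
    listenIp : '127.0.0.1',
    rtcpMux  : false,
    comedia  : true
  });

  console.log('send mixed RTP to 127.0.0.1:%d', transport.tuple.localPort);

  // Describe the incoming stream so the Router can route it.
  const producer = await transport.produce({
    kind          : 'audio',
    rtpParameters : {
      codecs : [{
        mimeType     : 'audio/opus',
        payloadType  : 101,   // must match the mixer's payload type
        clockRate    : 48000,
        channels     : 2,
        rtcpFeedback : []
      }],
      encodings : [{ ssrc: 22222222 }] // must match the mixer's SSRC
    }
  });

  return producer;
}
```

Every participant then consumes this new Producer over their regular WebRTC transports, just like any other producer in the Router.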