MediaSoup for audio-only - merging all streams?

Hello, I just found MediaSoup and I’ve been wondering if it’s a good fit for audio-only applications. Can video capabilities be disabled to allow for minimal server CPU usage?

I’d like to have audio sessions with up to 30 users. Obviously not everybody will be speaking at once (some people will be muted or using push-to-talk), but does MediaSoup optimize for the number of producers and consumers automatically? 30 × 30 streams sounds scary. :slight_smile:

On the same note - can MediaSoup merge audio streams on the server side so that clients deal with only a single incoming stream?

> Hello, I just found MediaSoup and I’ve been wondering if it’s a good fit for audio-only applications. Can video capabilities be disabled to allow for minimal server CPU usage?

I use it for an audio-only application and it works great. It’s sort of a clubhouse-style mobile app for language learning.

> I’d like to have audio sessions with up to 30 users. Obviously not everybody will be speaking at once (some people will be muted or using push-to-talk), but does MediaSoup optimize for the number of producers and consumers automatically? 30 × 30 streams sounds scary. :slight_smile:

You can put producers into a “paused” state when the user is muted. In that state, they won’t consume as much CPU. If that’s what you mean by optimized, then yes. But it’s not automatic.
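To put numbers on why pausing helps: in an SFU room, each unmuted speaker has one producer and one consumer per other participant, so the forwarding cost scales with the number of unpaused consumers. A quick sketch in plain TypeScript (my own helper, not a mediasoup API; `producer.pause()` mentioned in the comment is the real mediasoup server-side call):

```typescript
// In an SFU room with n participants, each unmuted speaker has one
// producer and (n - 1) consumers (one per other participant).
// Calling producer.pause() on a muted participant's producer takes
// their consumers out of the forwarding hot path.
function activeConsumerCount(participants: number, muted: number): number {
  const speakers = participants - muted;
  return speakers * (participants - 1);
}
```

For example, 30 participants all unmuted means 30 × 29 = 870 active consumers, but with 25 of them muted (paused) it drops to 5 × 29 = 145, which is far less scary.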

> On the same note - can MediaSoup merge audio streams on the server side so that clients deal with only a single incoming stream?

No. Mediasoup just forwards RTP packets; it doesn’t decode the media streams at all. If you want mixing, you need to do it externally in ffmpeg or GStreamer. However, that is quite expensive CPU-wise and adds noticeable latency to the conversations. This is probably not the route you want to go.
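To illustrate why server-side merging is expensive: a mixer has to decode every Opus stream to PCM, sum the samples, and re-encode the result for each listener. The summing step itself is trivial (hypothetical sketch below, ignoring jitter buffers, clock sync, and codecs); the decode/encode passes around it are where the CPU and latency go:

```typescript
// Mix several decoded PCM buffers (float samples in [-1, 1]) into one
// output buffer by summation, clamping to avoid overflow. A real mixer
// would also need per-stream jitter buffers, resampling, and an Opus
// encode per listener -- that surrounding work is the expensive part.
function mixPcm(streams: Float32Array[], frameSize: number): Float32Array {
  const out = new Float32Array(frameSize);
  for (const s of streams) {
    for (let i = 0; i < frameSize; i++) {
      out[i] += s[i] ?? 0; // treat short buffers as silence
    }
  }
  for (let i = 0; i < frameSize; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i])); // hard clamp
  }
  return out;
}
```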

As Jonathan said, what you’re describing is an MCU, whereas mediasoup is an SFU. You can still build this kind of application with an SFU, though.

https://antmedia.io/webrtc-servers/

Thank you very much! Do you mind sharing how many active (connected) users you can handle on a single CPU core?

I’m using AWS Fargate with 1 vCPU tasks. I estimate each task can handle up to around 400 active (unpaused) consumers. Right now that number stays around 40-50, though, so I haven’t pushed it.

I am wondering if this is really true. For example, the API has a function for this (mediasoup :: API) - does that mean mediasoup can analyze volumes just from the RTP streams?

I think I can answer myself - mediasoup/AudioLevelObserver.cpp at 0d7a50169eba3c9ac27a071ac5b2eb2099a34115 · versatica/mediasoup · GitHub

Looks like it’s pretty efficient!

Yup. WebRTC adds these volumes to an RTP header extension (RFC 6464).
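That explains the efficiency: the server never touches the audio payload, it only reads one extra byte per packet. The RFC 6464 extension payload is a single byte whose high bit is an optional voice-activity flag and whose low 7 bits are the level in -dBov (0 = loudest, 127 = silence). A minimal decoder for that byte (my own sketch; mediasoup’s AudioLevelObserver does this parsing internally in C++):

```typescript
// Decoded RFC 6464 "ssrc-audio-level" header extension payload.
interface AudioLevel {
  voice: boolean; // sender's voice-activity flag (if negotiated)
  dBov: number;   // level as a negative dBov value, 0 .. -127
}

// Parse the one-byte extension payload: bit 7 is the V (voice) flag,
// bits 0-6 are the level in -dBov.
function parseAudioLevel(byte: number): AudioLevel {
  return {
    voice: (byte & 0x80) !== 0,
    dBov: -(byte & 0x7f),
  };
}
```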

FYI - I was able to implement a voice-chat-only server with MediaSoup. It wasn’t too hard given the abundance of example implementations. If anybody needs help, please reach out! Thank you for the help given.


Why waste CPU on server-side audio analysis? If you used a talking-stick model (only one active speaker at a time), you could really keep the queries per second to a minimum.

Above is a client-side library you can attach to your videos with ease; it takes no time to set up, and for fun I have it light up borders around video/audio sources according to the dB level detected.

As for CPU usage: audio-only means single transports, so you can host more units than if you were serving both audio and video. I generally quote figures per core (since quality is about the same across many CPU generations): you’ll get roughly 100-150 active transports going at once per core. How you distribute that load is up to you. I try to avoid load-balancing based on CPU, because CPU usage is unstable as the RTP traffic fluctuates.
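Taking the rough 100-150 active transports per core figure above, a back-of-envelope sizing helper (my own sketch, not a mediasoup API; pick the budget that matches your measurements, and double the transport count if you split send and receive transports per user):

```typescript
// Estimate cores needed for a given number of active transports,
// given a per-core transport budget (conservative default: 100).
function coresNeeded(activeTransports: number, perCore: number = 100): number {
  return Math.ceil(activeTransports / perCore);
}
```

For example, 450 active transports needs 5 cores at the conservative budget of 100 per core, or 3 cores at the optimistic 150.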

Enjoy.

Thanks for the advice. I actually wrote my own version of “hark” for voice-level activation. In general, I think audio volume detection on the server side and on the client side have very different uses:

- Client-side is useful for voice-level activation, as it needs to be near real-time.
- Server-side RTP monitoring is useful for monitoring other clients and knowing when they are actually speaking.
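For the client-side case, a hark-style detector essentially computes the level of each analyser frame and fires a “speaking” event when it crosses a threshold for a few consecutive frames. The core computation is just RMS converted to dBFS (a library-independent sketch; the samples would come from a Web Audio `AnalyserNode` in the browser):

```typescript
// Compute the RMS level of a frame of float samples ([-1, 1]) and
// convert it to dBFS. A hark-style detector treats the frame as
// "speaking" when this exceeds a threshold such as -50 dBFS.
// Returns -Infinity for a completely silent frame.
function frameLevelDb(samples: Float32Array): number {
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSq += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSq / samples.length);
  return 20 * Math.log10(rms);
}
```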

I will soon deploy my solution to production and will report back how much CPU usage I’m seeing. I gave it its own dedicated server so the data should be fairly pure.