MediaSoup for audio-only - merging all streams?

Hello, I just found MediaSoup and I’ve been wondering if it’s a good fit for audio-only applications. Can video capabilities be disabled to allow for minimal server CPU usage?

I’d like to have audio sessions with up to 30 users. Obviously not everybody will be speaking at once (some people will be muted or using push-to-talk), but does MediaSoup optimize for the number of producers and consumers automatically? 30 × 30 streams sounds scary. :slight_smile:

On the same note - can MediaSoup merge audio streams on the server side so that clients deal with only a single incoming stream?

> Hello, I just found MediaSoup and I’ve been wondering if it’s a good fit for audio-only applications. Can video capabilities be disabled to allow for minimal server CPU usage?

I use it for an audio-only application and it works great. It’s sort of a clubhouse-style mobile app for language learning.

> I’d like to have audio sessions with up to 30 users. Obviously not everybody will be speaking at once (some people will be muted or using push-to-talk), but does MediaSoup optimize for the number of producers and consumers automatically? 30 × 30 streams sounds scary. :slight_smile:

You can put producers into a “paused” state when the user is muted. In that state, they won’t consume as much CPU. If that’s what you mean by optimized, then yes. But it’s not automatic.
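To put numbers on why pausing helps: in an SFU room, each unmuted speaker has one producer and one consumer per other participant, so the forwarding cost scales with the number of unpaused consumers. A quick sketch in plain TypeScript (my own helper, not a mediasoup API; `producer.pause()` mentioned in the comment is the real mediasoup server-side call):

```typescript
// In an SFU room with n participants, each unmuted speaker has one
// producer and (n - 1) consumers (one per other participant).
// Calling producer.pause() on a muted participant's producer takes
// their consumers out of the forwarding hot path.
function activeConsumerCount(participants: number, muted: number): number {
  const speakers = participants - muted;
  return speakers * (participants - 1);
}
```

For example, 30 participants all unmuted means 30 × 29 = 870 active consumers, but with 25 of them muted (paused) it drops to 5 × 29 = 145, which is far less scary.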

> On the same note - can MediaSoup merge audio streams on the server side so that clients deal with only a single incoming stream?

No. Mediasoup just forwards RTP packets; it doesn’t decode the media streams at all. If you want mixing, you need to do it externally in ffmpeg or GStreamer. However, that is quite expensive CPU-wise and adds noticeable latency to the conversations. This is probably not the route you want to go.
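To illustrate why server-side merging is expensive: a mixer has to decode every Opus stream to PCM, sum the samples, and re-encode the result for each listener. The summing step itself is trivial (hypothetical sketch below, ignoring jitter buffers, clock sync, and codecs); the decode/encode passes around it are where the CPU and latency go:

```typescript
// Mix several decoded PCM buffers (float samples in [-1, 1]) into one
// output buffer by summation, clamping to avoid overflow. A real mixer
// would also need per-stream jitter buffers, resampling, and an Opus
// encode per listener -- that surrounding work is the expensive part.
function mixPcm(streams: Float32Array[], frameSize: number): Float32Array {
  const out = new Float32Array(frameSize);
  for (const s of streams) {
    for (let i = 0; i < frameSize; i++) {
      out[i] += s[i] ?? 0; // treat short buffers as silence
    }
  }
  for (let i = 0; i < frameSize; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i])); // hard clamp
  }
  return out;
}
```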

As Jonathan said, what you’re describing is an MCU, whereas mediasoup is an SFU. You can still build this kind of application with an SFU, though.

https://antmedia.io/webrtc-servers/

Thank you very much! Do you mind sharing how many active (connected) users you can handle on a single CPU core?

I’m using AWS Fargate with 1 vCPU tasks. I estimate each task can handle up to around 400 active (unpaused) consumers. Right now that number stays around 40-50, though, so I haven’t pushed it.

I am wondering if this is really true. For example, the API has a function for this (mediasoup :: API) - does that mean mediasoup can analyze volumes just from the RTP streams?

I think I can answer myself - mediasoup/AudioLevelObserver.cpp at 0d7a50169eba3c9ac27a071ac5b2eb2099a34115 · versatica/mediasoup · GitHub

Looks like it’s pretty efficient!

Yup. WebRTC adds these volumes to an RTP header extension (RFC 6464).
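That explains the efficiency: the server never touches the audio payload, it only reads one extra byte per packet. The RFC 6464 extension payload is a single byte whose high bit is an optional voice-activity flag and whose low 7 bits are the level in -dBov (0 = loudest, 127 = silence). A minimal decoder for that byte (my own sketch; mediasoup’s AudioLevelObserver does this parsing internally in C++):

```typescript
// Decoded RFC 6464 "ssrc-audio-level" header extension payload.
interface AudioLevel {
  voice: boolean; // sender's voice-activity flag (if negotiated)
  dBov: number;   // level as a negative dBov value, 0 .. -127
}

// Parse the one-byte extension payload: bit 7 is the V (voice) flag,
// bits 0-6 are the level in -dBov.
function parseAudioLevel(byte: number): AudioLevel {
  return {
    voice: (byte & 0x80) !== 0,
    dBov: -(byte & 0x7f),
  };
}
```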

FYI - I was able to implement a voice-chat-only server with MediaSoup. It wasn’t too hard given the abundance of example implementations. If anybody needs help, please reach out! Thank you for the help given.


Why waste CPU on server-side audio analysis? If you used a talking-stick model (only one active speaker at a time), you could really keep the queries per second to a minimum.

Above is a client-side library you can attach to your videos with ease; it takes no time to set up, and for fun I have it light up borders around video/audio sources according to the dB level detected.

As for CPU usage: audio-only means single transports, so you can host more units than if you were serving both audio and video. I generally quote figures per core (since quality is about the same across many CPU generations): you’ll get roughly 100-150 active transports going at once per core. How you distribute that load is up to you. I try to avoid load-balancing based on CPU, because CPU usage is unstable as the RTP traffic fluctuates.
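Taking the rough 100-150 active transports per core figure above, a back-of-envelope sizing helper (my own sketch, not a mediasoup API; pick the budget that matches your measurements, and double the transport count if you split send and receive transports per user):

```typescript
// Estimate cores needed for a given number of active transports,
// given a per-core transport budget (conservative default: 100).
function coresNeeded(activeTransports: number, perCore: number = 100): number {
  return Math.ceil(activeTransports / perCore);
}
```

For example, 450 active transports needs 5 cores at the conservative budget of 100 per core, or 3 cores at the optimistic 150.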

Enjoy.

Thanks for the advice. I actually wrote my own version of “hark” for voice-level activation. In general, I think audio volume detection on the server side and on the client side have very different uses:

- Client-side is useful for voice-level activation, as it needs to be near real-time.
- Server-side RTP monitoring is useful for monitoring other clients and knowing when they are actually speaking.
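For the client-side case, a hark-style detector essentially computes the level of each analyser frame and fires a “speaking” event when it crosses a threshold for a few consecutive frames. The core computation is just RMS converted to dBFS (a library-independent sketch; the samples would come from a Web Audio `AnalyserNode` in the browser):

```typescript
// Compute the RMS level of a frame of float samples ([-1, 1]) and
// convert it to dBFS. A hark-style detector treats the frame as
// "speaking" when this exceeds a threshold such as -50 dBFS.
// Returns -Infinity for a completely silent frame.
function frameLevelDb(samples: Float32Array): number {
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSq += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSq / samples.length);
  return 20 * Math.log10(rms);
}
```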

I will soon deploy my solution to production and will report back how much CPU usage I’m seeing. I gave it its own dedicated server so the data should be fairly pure.