single worker scaling and CPU limitations

The scalability documentation states that “depending on the host CPU capabilities, a mediasoup C++ subprocess can typically handle over ~500 consumers in total”. I am curious how this value was derived and what options there are for improving single-worker performance.

We recently had upwards of 100 users in a single (audio only) “conference room” and mediasoup performed admirably well, but we did end up maxing out the CPU of our EC2 instance (c5n.large) as we neared 110 users. If my math is right, 100 users should correspond to n*(n-1) = 100 * 99 = 9900 consumers. Pretty amazing!
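For reference, the n*(n-1) count can be written as a small helper. This is only a sketch: `consumerCount` is a hypothetical name, and it assumes an audio-only room where every user has one producer and consumes every other user's producer.

```javascript
// Total consumers in a full-mesh audio room: each of the n users
// consumes the single audio producer of each of the other n - 1 users.
function consumerCount(users) {
  return users * (users - 1);
}

console.log(consumerCount(100)); // 9900
console.log(consumerCount(200)); // 39800
```

The quadratic growth is why going from 100 to 200 users roughly quadruples the number of consumers.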

I am just wondering what possible options exist to go further here. Sharding the room across routers (and sub-processes) does not appear to give any advantage since as you pipe producers into new routers, you still end up creating the same number of consumers on them.

The obvious solution is to throw bigger CPUs at the problem. We could adjust our instances to focus on fewer but more powerful CPU cores.

I am also curious about how the paused state of producers/consumers affects their CPU cost. The vast majority of the participating users were muted, so this may be why we were able to reach such a high number. One option we considered is to destroy (or not initially create) all the consumers of a paused (muted) producer, but we are concerned that the user experience would suffer: re-creating the necessary consumers when a user unmutes may add too much overhead, and peers could miss some audio.

Going forward we are hoping to be able to support up to 200 users in a single room. Do you think this is at all feasible? Is there any further room for optimization in the C++ library itself that we could look into and contribute back to the project? Is there any way to achieve this within the current implementation/architecture that I am possibly missing?

Thanks


From here: https://mediasoup.org/documentation/publications/

But note that it’s a study with video and it uses mediasoup v2, but v3 should not be worse.

If you interconnect N Routers of different N Workers then you use N CPUs, so yes, it increases the number of suitable Consumers.

Note however that for video this is not a super valid solution. Read the red note in the “Scalability” section about this.
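Interconnecting routers as described above could look roughly like the sketch below. `router.pipeToRouter({ producerId, router })` is the mediasoup v3 call for this; the helper name `pipeProducers` and the way producer ids are collected are assumptions made for illustration.

```javascript
// Pipe every listed producer from one router into another so that
// users attached to the target router can consume them.
// `sourceRouter` and `targetRouter` are expected to be mediasoup Routers,
// ideally created on different Workers so they run in separate processes.
async function pipeProducers(sourceRouter, targetRouter, producerIds) {
  const pipedProducers = [];
  for (const producerId of producerIds) {
    const { pipeProducer } = await sourceRouter.pipeToRouter({
      producerId,
      router: targetRouter,
    });
    pipedProducers.push(pipeProducer);
  }
  return pipedProducers;
}
```

Consumers for remote users are then created on the target router against the piped producers, as they would be for local ones.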

If you call producer.pause() in mediasoup-client it will just send silent audio or black video frames at a much lower bitrate. If you call producer.pause() on the mediasoup server, that producer will not relay any RTP packets to its Consumers.

Don’t do that (destroying the consumers of a paused producer); it gives no benefit over pausing the producer.
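A minimal sketch of handling mute/unmute by pausing the server-side producer rather than tearing down consumers. `producer.pause()`, `producer.resume()` and `producer.paused` are part of the mediasoup server Producer API; `setMuted` is a hypothetical helper name, and how the mute request reaches the server is left to the application's signaling.

```javascript
// Pause or resume a server-side producer to reflect the user's mute state.
// All existing consumers stay in place, so unmuting needs no re-creation.
async function setMuted(producer, muted) {
  if (muted && !producer.paused) {
    await producer.pause();
  } else if (!muted && producer.paused) {
    await producer.resume();
  }
  return producer.paused;
}
```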

It depends on whether all those users are sending audio and video at the same time. I cannot give exact numbers. Anyway, I’m pretty sure you don’t want to have a conference of 200 users talking all together and sending audio all together at the same time.

If there is room for C++ optimization, don’t hesitate to contribute it. As far as we know, the current code is as efficient as we can make it.

You are absolutely correct. 90-95% of the users are muted and their producers are correspondingly paused on the server. We just want to ensure that at any time, someone can quickly unmute themselves and speak up if necessary. I am guessing this fact, along with being audio only, is why we achieved such a high consumer count.

I see now that you are right. For some reason my initial math was making me think of this in the wrong way.

Router 1 (on Worker1) has 50 connected users. So initially 50 * 49 = 2450 consumers.
Router 2 (on Worker2) has 50 connected users. 50 * 49 = 2450 consumers.
Piping 50 producers from Router 2 to Router 1 creates another 50 * 50 = 2500 consumers on Router 1.
Piping 50 producers from Router 1 to Router 2 creates 50 * 50 = 2500 consumers on Router 2.

So we still reach the same 9900 consumers, but each router only has 4950.
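The arithmetic above generalizes to any split of users across routers. The sketch below uses a hypothetical helper, `consumersPerRouter`, and counts only the consumers facing end users, as the figures above do; it ignores the pipe consumers created by the piping itself.

```javascript
// Given the number of users attached to each router, return how many
// user-facing consumers each router ends up hosting when every producer
// is piped to every other router.
function consumersPerRouter(usersPerRouter) {
  const totalUsers = usersPerRouter.reduce((sum, n) => sum + n, 0);
  return usersPerRouter.map((localUsers) => {
    // Local users consuming each other's producers.
    const localConsumers = localUsers * (localUsers - 1);
    // Local users consuming producers piped in from the other routers.
    const pipedConsumers = (totalUsers - localUsers) * localUsers;
    return localConsumers + pipedConsumers;
  });
}

console.log(consumersPerRouter([100]));    // [ 9900 ]
console.log(consumersPerRouter([50, 50])); // [ 4950, 4950 ]
```

The total stays at 9900 either way; splitting the room only divides that load between workers.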

Fantastic… this is definitely something we should be able to achieve and I am glad you straightened out my thinking on this!

Thanks so much

The tip here is that creating Producers and Consumers is “free”. The limiting thing is how many of those Producers are really sending RTP to the server and how many Consumers are consuming it.
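To put rough numbers on that, the sketch below counts only the consumers that actually receive RTP. `activeConsumerCount` is a hypothetical helper, and it assumes audio only, one producer per user, and that paused (muted) producers relay nothing.

```javascript
// Consumers that are actually forwarding RTP: each unmuted user's
// producer is relayed to the other (users - 1) participants.
function activeConsumerCount(users, unmutedUsers) {
  return unmutedUsers * (users - 1);
}

console.log(activeConsumerCount(100, 10)); // 990
console.log(activeConsumerCount(100, 5));  // 495
```

With 90-95% of a 100-user room muted, only a few hundred of the 9900 consumers carry traffic at any moment.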

Thanks, that makes sense, and it is good to know with certainty. One last question for you: a worker maps to a C++ sub-process, which runs on 1 CPU core. Does mediasoup or libuv do anything to distribute these sub-processes to unique cores/CPUs, or is this left to the OS to manage? From reading the code I don’t see anything like this, so I am wondering if it is possible for 2 workers to be running on a single core, even though we only start a pool of workers equal to the number of CPUs (using Object.keys(os.cpus()).length).

Absolutely no idea about that. I assume the OS decides and manages where to run every new process.
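For context, a worker pool sized to the CPU count, as mentioned in the question, might look like the sketch below. `createWorkerPool` and `makeRoundRobin` are hypothetical helpers; the worker factory is passed in, and in a real application it would typically be `() => mediasoup.createWorker()`. Nothing here pins a worker to a core; placement is still left to the OS scheduler.

```javascript
const os = require('os');

// Create one worker per CPU core (os.cpus().length is equivalent to
// Object.keys(os.cpus()).length). `createWorker` is any async factory.
async function createWorkerPool(createWorker, size = os.cpus().length) {
  const workers = [];
  for (let i = 0; i < size; i++) {
    workers.push(await createWorker());
  }
  return workers;
}

// Return a function that hands out workers in round-robin order,
// e.g. to decide which worker hosts the next router.
function makeRoundRobin(workers) {
  let next = 0;
  return () => {
    const worker = workers[next];
    next = (next + 1) % workers.length;
    return worker;
  };
}
```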

It seems that the OS is the one that manages this. In the following test I’m running a single router with lots of consumers. As you can see, the process runs on one CPU core at a time, but it does not stick to a specific core; the core it runs on changes from time to time.

image

image
