Unable to explain mediasoup-worker behavior - unexpectedly low CPU utilization

That’s absolutely what came to my mind too. I think there is a way of turning those off, and other forms of feedback too. And yes, you should definitely check CPU utilization on the producer’s end.

Quoting from: https://mediasoup.org/documentation/v3/scalability/#multiple-and-separate-mediasoup-routers

When broadcasting a video stream to many viewers (hundreds or thousands of consumers) it's important to be aware of how video RTP transmission typically works:

A viewer may eventually lose video packets and would then request packet retransmission from mediasoup. Retransmissions are handled per transport (they do not reach the broadcaster endpoint) so there is no limitation here.
A viewer may connect or reconnect, or may change its preferred spatial layer, or may just lose too many packets. Any of those circumstances would imply a video key frame request by means of an RTCP PLI or FIR that reaches the broadcaster endpoint.
Upon receipt of a video PLI or FIR, the encoder in the broadcaster endpoint generates a video key frame which is a video packet much bigger than the usual ones.
If the encoder receives many PLIs or FIRs (although mediasoup protects the producer endpoint by preventing it from receiving more than one PLI or FIR per second) the sending bitrate of the broadcaster endpoint would increase by 2x or 3x. This may be a problem for the producer endpoint and also for viewers that will receive much more bits per second.
NOTE: This may be mitigated by increasing the default keyFrameRequestDelay value, although that would cause longer “black-video” periods.
And that is the problem.
In those scenarios, a “re-encoder” on the server side is required. That is, an endpoint that consumes the streams of the broadcaster endpoint, re-encodes those streams and re-produces them into a set of mediasoup routers with hundreds or thousands of consumers in total. Since such a “re-encoder” typically runs in the backend network, it's not limited by available bandwidth.

In the end, those scenarios require a proper architecture with distribution of viewers across multiple mediasoup routers (in the same or different hosts) and special “re-encoder” endpoints in the backend that can absorb PLIs and FIRs generated by a subset of those viewers.

mediasoup comes with libmediasoupclient which, among other things, can be used as a re-encoder (wink, wink).
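
To make the "no more than one PLI or FIR per second" point from the quote concrete, here is a toy model of that kind of rate limit. It is only a sketch: the function name and the simple "forward the first request, drop the rest until the interval has passed" rule are my own assumptions for illustration, not mediasoup's actual implementation, which may defer requests rather than drop them.

```javascript
// Toy model of a rate limit on key frame requests (PLI/FIR).
// Hypothetical helper, NOT mediasoup's implementation.
// requestTimesMs: times (ms) at which viewers ask for a key frame.
// minIntervalMs:  minimum gap between requests forwarded to the broadcaster.
function forwardedKeyFrameRequests(requestTimesMs, minIntervalMs) {
  const forwarded = [];
  let lastForwarded = -Infinity;
  // Work on a sorted copy so the caller's array is left untouched.
  for (const t of [...requestTimesMs].sort((a, b) => a - b)) {
    if (t - lastForwarded >= minIntervalMs) {
      forwarded.push(t);
      lastForwarded = t;
    }
    // Otherwise the request is absorbed and never reaches the broadcaster.
  }
  return forwarded;
}

// 1000 viewers each send one PLI, spread evenly over 10 seconds.
const requests = Array.from({ length: 1000 }, (_, i) => i * 10);
const upstream = forwardedKeyFrameRequests(requests, 1000);
console.log(`${requests.length} viewer requests -> ${upstream.length} reach the broadcaster`);
// prints "1000 viewer requests -> 10 reach the broadcaster"
```

In this model, doubling the interval to 2000 ms halves the number of forwarded requests, which is the same trade-off the NOTE describes for keyFrameRequestDelay: fewer key frames from the encoder, but viewers wait longer for one.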

One thing comes to mind that hasn’t been mentioned: does your cloud host allow your instances to burst or scale CPU somehow? It could be that the hypervisor at the cloud host is throttling the instance, or preventing accurate CPU reporting. Perhaps write a quick Node app that runs an endless loop and monitor how the CPU behaves.
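
Something along these lines would do as the quick Node app. It is a minimal sketch: the file name burn.js and the function name are made up, and it only uses the built-in process.cpuUsage() and process.hrtime.bigint(). It spins on one core in one-second slices and prints how much CPU time the process was actually credited with versus wall-clock time.

```javascript
// Busy-loop probe: burn one core for a fixed wall-clock window and compare
// the CPU time credited to the process against the elapsed wall time.
function burnCpu(durationMs) {
  const cpuStart = process.cpuUsage();        // { user, system } in microseconds
  const wallStart = process.hrtime.bigint();  // nanoseconds
  const deadline = Date.now() + durationMs;

  let spins = 0;
  while (Date.now() < deadline) {
    spins++; // pure busy work, never yields to the event loop
  }

  const cpu = process.cpuUsage(cpuStart);     // delta since cpuStart
  const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6;
  const cpuMs = (cpu.user + cpu.system) / 1000;
  return { wallMs, cpuMs, utilization: cpuMs / wallMs, spins };
}

// Usage: node burn.js <seconds>   (does nothing if no number is given)
const seconds = Number(process.argv[2] || 0);
for (let i = 0; i < seconds; i++) {
  const r = burnCpu(1000);
  console.log(
    `wall=${r.wallMs.toFixed(0)}ms cpu=${r.cpuMs.toFixed(0)}ms ` +
    `util=${(r.utilization * 100).toFixed(1)}% spins=${r.spins}`
  );
}
```

On an unthrottled core I would expect the utilization figure to sit close to 100% and the spin count to stay roughly constant from slice to slice. If either drops noticeably after running for a while, that points at the host rather than at mediasoup. Compare it with what top or your cloud provider's metrics report during the same period.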

If not, one other possibility is that you’re hitting some other resource limit before mediasoup’s limit. In particular, double-check your thread usage; maybe something is causing starvation.