We’re using mediasoup for a room-based video chat application, with the mediasoup code roughly based on the demo app. We use H264 and simulcast to try to keep CPU and bandwidth requirements low, and all clients typically run Chrome 83+ on a variety of desktop OSes. Clients usually encode two simulcast streams, at 360p and 180p, at roughly 600 kb/s and 100 kb/s respectively.
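For concreteness, here’s a minimal sketch of what that two-layer setup looks like with mediasoup-client (the helper name and the assumption that the camera track is captured at 360p are illustrative, not our actual code):

```ts
import { types as msc } from 'mediasoup-client';

// Publish the camera track with two simulcast layers: ~180p at ~100 kb/s and
// ~360p at ~600 kb/s. Assumes the track is captured at 360p, so
// scaleResolutionDownBy: 2 yields the 180p layer. Lowest layer first, as in
// the mediasoup demo app.
export async function produceSimulcastVideo(
  sendTransport: msc.Transport,
  videoTrack: MediaStreamTrack
): Promise<msc.Producer> {
  return sendTransport.produce({
    track: videoTrack,
    encodings: [
      { scaleResolutionDownBy: 2, maxBitrate: 100_000 }, // ~180p
      { scaleResolutionDownBy: 1, maxBitrate: 600_000 }  // ~360p
    ],
    codecOptions: {
      // Start the encoder reasonably high instead of ramping up from the default.
      videoGoogleStartBitrate: 600
    }
  });
}
```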
We see a few problems with the WebRTC bandwidth estimation (BWE) and have been using the 'bwe' trace events to try to monitor what’s going on. We’re mostly testing on links of 100 Mb/s or more, with the server hosted at a local data centre, so we’re not really expecting bandwidth issues.
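For anyone wanting to watch the same thing, this is roughly how we tap the 'bwe' trace events on the server side (a sketch; the field names availableBitrate / effectiveDesiredBitrate follow the mediasoup v3 docs, so double-check them against your version):

```ts
import { types as mediasoup } from 'mediasoup';

// Enable 'bwe' trace events on a WebRtcTransport and warn whenever the
// estimated available bitrate drops below what the consumers want.
export async function logBweTrace(transport: mediasoup.WebRtcTransport): Promise<void> {
  await transport.enableTraceEvent([ 'bwe' ]);

  transport.on('trace', (trace: mediasoup.TransportTraceEventData) => {
    if (trace.type !== 'bwe')
      return;

    // trace.info is untyped; these field names come from the docs.
    const { availableBitrate, effectiveDesiredBitrate } = trace.info;

    if (availableBitrate < effectiveDesiredBitrate) {
      console.warn(
        'bwe: available %d bps < desired %d bps on transport %s',
        availableBitrate, effectiveDesiredBitrate, transport.id);
    }
  });
}
```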
The most annoying problem is that, intermittently, a client seems to get locked into a low-available-bandwidth state, with the mediasoup server flipping between sending no spatial layer (null) and spatial layer 0. Simply rejoining the room, and hence recreating all transports, typically fixes the issue.
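To spot this state without staring at logs, something like the following watchdog can flag consumers that sit at spatial layer null/0 for too long (a sketch; the 30 s threshold and the onStuck callback that would kick off the rejoin workaround are ours):

```ts
import { types as mediasoup } from 'mediasoup';

// Watch a simulcast consumer's 'layerschange' events and call onStuck() if it
// stays at spatial layer null or 0 for longer than thresholdMs.
export function watchForStuckConsumer(
  consumer: mediasoup.Consumer,
  onStuck: () => void,
  thresholdMs = 30_000
): void {
  let stuckSince: number | null = null;

  consumer.on('layerschange', (layers) => {
    const low = !layers || layers.spatialLayer === 0;

    if (low && stuckSince === null)
      stuckSince = Date.now();
    else if (!low)
      stuckSince = null;
  });

  const timer = setInterval(() => {
    if (stuckSince !== null && Date.now() - stuckSince > thresholdMs) {
      stuckSince = null;
      onStuck();
    }
  }, 5_000);

  consumer.observer.on('close', () => clearInterval(timer));
}
```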
We’ve recently uncommented the following line in RTC/TransportCongestionController.cpp:50; anecdotally this seems to have improved this particular situation.
More generally, we often see from the 'bwe' trace events that the available bitrate is below the desired bitrate even on supposedly reliable, high-bandwidth, low-latency links, and as a result low-bandwidth streams are forwarded when there should be plenty of bandwidth for the high-quality one. Before investigating further, I’m wondering whether there’s any value in trying to update the libwebrtc code that seems to be used for some of this bandwidth estimation. As far as I can see it’s taken from somewhere around m78, and the libwebrtc code gets reorganised a bit after that point, so I’m not sure how straightforward updating it would be.
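As a cheap cross-check of that observation (before touching libwebrtc at all), the consumers’ preferredLayers can be compared periodically with their currentLayers to confirm that lower layers really are being forwarded. A rough sketch, with an arbitrary 10 s interval:

```ts
import { types as mediasoup } from 'mediasoup';

// Periodically log consumers whose forwarded spatial layer is below the one
// they prefer. currentLayers/preferredLayers are standard mediasoup v3
// Consumer getters; currentLayers is unset until media actually flows.
export function reportLayerDeficit(
  consumers: Map<string, mediasoup.Consumer>
): NodeJS.Timeout {
  return setInterval(() => {
    for (const consumer of consumers.values()) {
      const current = consumer.currentLayers;
      const preferred = consumer.preferredLayers;

      if (!preferred)
        continue;

      if (!current || current.spatialLayer < preferred.spatialLayer) {
        console.warn(
          'consumer %s forwarding spatial layer %s, preferred %d',
          consumer.id,
          current ? current.spatialLayer : 'none',
          preferred.spatialLayer);
      }
    }
  }, 10_000);
}
```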
I’m also wondering whether anybody else sees similar issues, or whether I’m barking up the wrong tree looking at the BWE code at all.
It is unfortunate that we are using libwebrtc code for BWE: on the one hand we are not certain that we are using the code 100% properly, and on the other hand we are not certain that it actually works 100% as it should. And we don’t have full control over it.
We’ve experienced this kind of behaviour too, so yes, you are likely barking up the right tree.
We would like to remove it and write our own implementation, but that is not in the near-term plans and we can’t provide any ETA for it.
Any PR for updating the libwebrtc code in the meantime would be appreciated though.
I’ll have a look at updating libwebrtc, but I’m not very familiar with the Chromium codebase, so I’m a little worried that even if I get it working I might break something subtle. Any tips?
No, we did many tuning tests at the time. The comment is very unfortunate.
> We’ve recently uncommented the following line in RTC/TransportCongestionController.cpp:50; anecdotally this seems to have improved this particular situation.
Simply that the video stream seems to recover more consistently to a high-quality layer, where previously it would sometimes get stuck oscillating between spatial layers null and 0.
I tried this because the BWE seemed to recover much more quickly in situations where this line appeared in the logs:
webrtc::ProbeController::RequestProbe() | detected big bandwidth drop, start probing
In the situation where we were getting trapped in the null/0 oscillation we weren’t seeing this line in the logs.
I think the periodic probe may be helping it break out of some kind of corner case by resetting the bandwidth estimate, but I’m really guessing at this point.
After moving to version 3.6.14 (from 3.6.12), we see endless printouts like the ones below, switching back and forth between spatial layers 0 and 1. Could this be the cause?
mediasoup:Channel [pid:34432] RTC::SimulcastConsumer::RTC::SimulcastConsumer::UpdateTargetLayers() | target layers changed [spatial:0, temporal:1, consumerId:38ee785b-a77f-469c-9286-c86ccd2a7df6] +66ms
mediasoup:Channel [pid:34432] RTC::SimulcastConsumer::RTC::SimulcastConsumer::UpdateTargetLayers() | target layers changed [spatial:1, temporal:1, consumerId:38ee785b-a77f-469c-9286-c86ccd2a7df6]