What it is all about:
Hello everyone, we are actively working on updating the integrated libwebrtc library to improve congestion control and bandwidth estimation. As you may know, mediasoup has a hard fork of the libwebrtc module that is responsible for BW estimation and CC. The fork has not been updated for a long time. Besides a few bugs fixed here and there, the most important part of this work is to backport the new loss estimator, which is a big step forward compared to how the google-cc algorithm dealt with loss before. Prior to it, loss adjustments to the BW estimate were calculated here: mediasoup/send_side_bandwidth_estimation.cc at bwe_backport · sarumjanuch/mediasoup · GitHub, using very simple and rudimentary formulas. This is how the chart looked prior to Loss v2 when link capacity is loss limited:
This is because loss is not differentiated. The main assumption of Loss v2 is that loss can be separated into two categories: instant loss and inherent loss. Instant loss is basically sporadic bursts, and inherent loss is the loss that is inherent to a given link over a certain period of time. Both have limits, 5% by default: loss above the limit means we should decrease, and loss near the limit means we should hold. Loss v2 uses a maximum likelihood estimation approach inside. It also has a notion of candidates and their factors to pick the BW estimate from. It is integrated with the delay estimator, the trendline estimator, and the probe estimator (not yet in this branch), and it produces the minimum of three values. The Loss v2 estimator is currently an active experiment in Chrome; you can check it by taking a look at your field trials. It is still evolving. Unfortunately it is not described in any papers, nor does it have documentation, so we are kind of in the wild west here. With the Loss v2 estimator, loss-limited downlink estimation looks like:
And here is the mix of loss and delay estimates. In the chart you can clearly see straight lines where loss is present. Delay-based estimation is still far from perfect.
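To make the hold/decrease rule and the "minimum of three values" idea a bit more concrete, here is a minimal sketch. It is not the Loss v2 algorithm (which uses maximum likelihood estimation internally); the function names, the `holdMargin` parameter and its value are assumptions made up for illustration. Only the 5% default bound comes from the description above.

```typescript
type LossAction = "increase" | "hold" | "decrease";

// Hypothetical helper, not part of libwebrtc or mediasoup.
// upperBound: the loss limit (5% by default, as described above).
// holdMargin: assumed width of the "near the limit" zone, purely illustrative.
function lossAction(
  lossRatio: number,
  upperBound: number = 0.05,
  holdMargin: number = 0.01
): LossAction {
  if (lossRatio > upperBound) {
    return "decrease"; // loss is above the limit
  }
  if (lossRatio >= upperBound - holdMargin) {
    return "hold"; // loss is close to the limit
  }
  return "increase"; // loss is comfortably below the limit
}

// Hypothetical helper: the final estimate is the smallest of the
// candidate estimates (for example loss based, delay based, probe based).
function combineEstimates(estimatesBps: number[]): number {
  if (estimatesBps.length === 0) {
    throw new Error("at least one estimate is required");
  }
  return Math.min(...estimatesBps);
}

const action = lossAction(0.08);
const finalBps = combineEstimates([2000000, 1500000, 3000000]);
console.log(action, finalBps);
```

The point of the sketch is only the shape of the decision: a bound, a zone near the bound where the estimate is held, and a final minimum across estimators.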
How we test:
Initially we started testing with artificial congestion created by traffic shapers, but they are mostly useless because they do not reflect how real network queues behave. The Google CC algorithm relies on two main characteristics: packet loss and the linear regression slope of delay gradient values. It is very hard to simulate those with traffic shapers, as the main assumption is that either delay or packet loss starts increasing near the top of our link capacity and is therefore turned into a signal to hold or decrease the BW estimate. Traffic shapers create sudden bursts of loss at the top of the link capacity, because mostly they are simple leaky bucket implementations. GCC will just produce a sawtooth signal in such circumstances, because it would be limited only by burst loss. The only useful scenario in such cases is to observe how GCC behaves in a random loss environment. That said, our main goal in testing GCC is to create real congestion and see how it behaves under these circumstances. To easily reproduce this scenario, Jose created a special branch: GitHub - versatica/mediasoup-demo at jmillan/consumerReplicas, which allows you to run a single page with a producer and then run a consumer page, where you can specify enough consumer replicas to create congestion in your downlink. I personally have a separate server on the wild internet where I run the mentioned mediasoup demo branch, and I connect to it from different networks all over the place. For example, in my home network BW is limited by loss: near the top of link capacity a small loss noise starts emerging, and the loss v2 estimator catches it very beautifully. On the other network that I am testing, BW is delay limited, and that is where I am having issues right now that I am trying to fix.
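As a side note on the delay signal mentioned above, this is roughly what "linear regression slope of delay gradient values" means. The sketch below is a plain least-squares slope over hypothetical `(timeMs, delayMs)` samples; the type and function names are made up for illustration, and the real estimator in libwebrtc does more than this.

```typescript
// Hypothetical sample type: arrival time and accumulated queuing delay.
interface DelaySample {
  timeMs: number;
  delayMs: number;
}

// Ordinary least-squares slope of delay over time.
// A clearly positive slope suggests queues are building up (congestion),
// a slope near zero suggests the link is keeping up.
function regressionSlope(samples: DelaySample[]): number {
  const n = samples.length;
  if (n < 2) {
    return 0; // not enough points to fit a line
  }
  let sumT = 0;
  let sumD = 0;
  for (const s of samples) {
    sumT += s.timeMs;
    sumD += s.delayMs;
  }
  const meanT = sumT / n;
  const meanD = sumD / n;

  let numerator = 0;
  let denominator = 0;
  for (const s of samples) {
    numerator += (s.timeMs - meanT) * (s.delayMs - meanD);
    denominator += (s.timeMs - meanT) * (s.timeMs - meanT);
  }
  // All samples at the same time: the slope is undefined, report 0.
  return denominator === 0 ? 0 : numerator / denominator;
}

const growing: DelaySample[] = [
  { timeMs: 0, delayMs: 0 },
  { timeMs: 100, delayMs: 50 },
  { timeMs: 200, delayMs: 100 },
  { timeMs: 300, delayMs: 150 },
];
console.log(regressionSlope(growing));
```

A traffic shaper that drops packets in bursts gives this kind of signal very little to work with, which is part of why real congestion is needed for testing.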
So in short, in my case I run two browser tabs on separate machines, where one is a pure isolated producer and the other is a pure isolated consumer. Both are connected to the external server and send traffic over the real network. Then I pick a consumerReplicas value big enough for GCC to be continuously congested, and start sampling values for my charts.
So the URL params for the two tabs might look like this:
Consumer: https://YOUR_SERVER/?roomId=test&produce=false&consumerReplicas=19
Producer: https://YOUR_SERVER/?roomId=test&consume=false&consumerReplicas=19
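If you switch between servers and replica counts a lot, a tiny helper can build these URLs for you. This is just a convenience sketch; `buildDemoUrl` is a made-up name and not part of mediasoup-demo, and `YOUR_SERVER` is the same placeholder as above.

```typescript
type QueryValue = string | number | boolean;

// Hypothetical helper: builds a demo page URL from a host and query params.
function buildDemoUrl(server: string, params: Record<string, QueryValue>): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  return `https://${server}/?${query}`;
}

// Consumer tab: does not produce, and asks for 19 consumer replicas.
const consumerUrl = buildDemoUrl("YOUR_SERVER", {
  roomId: "test",
  produce: false,
  consumerReplicas: 19,
});
console.log(consumerUrl);
```

The producer URL can be built the same way by swapping the params.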
Additionally, we understand that the default loss bounds may not be suitable for everyone, as network characteristics vary all over the world, so we have added the ability to change the default libwebrtc field trial string when creating a worker. This allows you to control the params of different modules. For the majority of users the two most useful params will be InstantUpperBoundLossOffset and InherentLossUpperBoundOffset. They let you prevent CC collapse when you know there is a certain amount of loss in your network that you can't do anything about and that is higher than 5%.
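As a rough illustration of what such a field trial string could look like, here is a small sketch that assembles one. Several things here are assumptions and should be checked against the branch: the trial name `WebRTC-Bwe-LossBasedBweV2`, the `Enabled` key, the `Key:value,Key:value` layout, and the 0.08 values, which are purely illustrative. `buildFieldTrial` is a made-up helper. Only the two offset parameter names come from the text above.

```typescript
type TrialValue = string | number | boolean;

// Hypothetical helper: formats one trial as "Name/Key:value,Key:value/".
function buildFieldTrial(trialName: string, params: Record<string, TrialValue>): string {
  const body = Object.keys(params)
    .map((key) => `${key}:${String(params[key])}`)
    .join(",");
  return `${trialName}/${body}/`;
}

// Assumed trial name and illustrative values: raise both loss bounds to 8%
// for a network with known, unavoidable loss above the 5% default.
const fieldTrials = buildFieldTrial("WebRTC-Bwe-LossBasedBweV2", {
  Enabled: true,
  InstantUpperBoundLossOffset: 0.08,
  InherentLossUpperBoundOffset: 0.08,
});
console.log(fieldTrials);

// The resulting string would then be passed when creating the worker.
// The exact option name is not given in this post, so it is left out here.
```

Start with small changes to the bounds and watch the charts; setting them far above the real loss in your network would likely make the estimator ignore genuine congestion.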
The work is still ongoing, and we are aware of a few issues here and there. Nevertheless, we encourage everyone interested to start testing and share their results, so we can form a first solid RC. Please use this topic for all discussions regarding this. I will be here trying to answer any questions that arise and to share updates.