(First some context, to help future readers that might encounter this same problem; actual questions to be found below)
Lately, I’ve been delving into the technical minutiae of Simulcast (I’m just several years late to the party, right? Oh well), and I’m seeing with horror that the WebRTC people left the SDP description badly under-specified. In particular, SDP doesn’t declare any temporal layers, even when the client will be generating them.
I’m working on a very experimental implementation of a mediasoup SDP Bridge. Using a mixture of mediasoup and mediasoup-client utility functions, each media section of an SDP Offer can be parsed into an RtpSendParameters object, which is then used to create a Producer; this has been working extremely well (for my needs, at least).
An SDP like this…
```
a=rid:r0 send
a=rid:r1 send
a=rid:r2 send
a=simulcast:send r0;r1;r2
```
…translates into this with the mediasoup utilities…
```javascript
{
  encodings: [
    { rid: 'r0', dtx: false },
    { rid: 'r1', dtx: false },
    { rid: 'r2', dtx: false }
  ]
}
```
…and mediasoup server becomes ready to handle 3 RTP streams.
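To make the mapping concrete, here is a minimal sketch of the idea (a hypothetical helper of my own, not the actual mediasoup / mediasoup-client utilities): pull the `send` rids out of a media section and turn each one into an encoding entry.

```javascript
// Hypothetical sketch, NOT the real mediasoup parsing code: extract the
// send encodings from the "a=rid" lines of one SDP media section.
function ridsToEncodings(sdpMediaSection) {
  const encodings = [];
  for (const line of sdpMediaSection.split('\r\n')) {
    // Matches e.g. "a=rid:r0 send".
    const match = line.match(/^a=rid:(\S+) send/);
    if (match) {
      encodings.push({ rid: match[1], dtx: false });
    }
  }
  return encodings;
}

const section = [
  'a=rid:r0 send',
  'a=rid:r1 send',
  'a=rid:r2 send',
  'a=simulcast:send r0;r1;r2'
].join('\r\n');

console.log(ridsToEncodings(section));
// → [ { rid: 'r0', dtx: false }, { rid: 'r1', dtx: false }, { rid: 'r2', dtx: false } ]
```

Note how nothing in the media section carries any temporal-layer information for the parser to pick up, which is exactly the problem described next.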
However, if the sender also encodes temporal layers into each of those streams, they don’t get signaled in the SDP, so mediasoup doesn’t know about them. This leaves the mediasoup Consumer permanently stuck at temporal layer 0 (always the lowest quality one, which in this case means the lowest possible framerate).
-
Q1: Is this as of today still the situation? Or are there any recent advancements on this topic that I might have skipped?
-
Q2: Would it be possible at all for mediasoup to “detect” that there are multiple temporal layers on the stream? Or is that exclusively part of the binary stream data, thus mediasoup doesn’t have access to it?
The only solution I’ve found so far is to be explicit when telling mediasoup about the encodings. This means that generating an RtpSendParameters from SDP is no longer enough; adding scalabilityMode by hand is required:
```javascript
{
  encodings: [
    { rid: 'r0', dtx: false, scalabilityMode: 'L1T3' },
    { rid: 'r1', dtx: false, scalabilityMode: 'L1T3' },
    { rid: 'r2', dtx: false, scalabilityMode: 'L1T3' }
  ]
}
```
However, the next task becomes knowing what string to put there. Again I’m finding that web browsers give you no clue about this, and the corresponding APIs are mostly unimplemented even today (it seems I’m not _that_ late to the party, after all!).
In Chrome, calling RTCRtpTransceiver.sender.getParameters() returns an encodings array where scalabilityMode is nowhere to be seen.
Googling a bit, one finds forum posts where it seems to be common knowledge that Chrome is hardcoded to add 3 temporal layers to its simulcast streams. However, I could find no official WebRTC stance or documentation about this. One can only hope that all browsers behave the same!
Even mediasoup-client itself assumes 3 as the magic number e.g. for Chrome, Firefox, and Safari.
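For readers unfamiliar with the notation: a scalabilityMode string such as 'L1T3' packs the spatial and temporal layer counts together, and SFU code typically decomposes it with a regex. A self-contained sketch of that decomposition (my own simplified version, not mediasoup's actual parser, which also handles suffixes like `_KEY`):

```javascript
// Sketch: split a scalabilityMode string like 'L1T3' or 'S3T3' into
// spatial and temporal layer counts. Falls back to 1 layer of each when
// the string doesn't match the expected shape.
function parseScalabilityMode(scalabilityMode) {
  const match = (scalabilityMode || '').match(/^[LS](\d+)T(\d+)/);
  return {
    spatialLayers: match ? parseInt(match[1], 10) : 1,
    temporalLayers: match ? parseInt(match[2], 10) : 1
  };
}

console.log(parseScalabilityMode('L1T3')); // { spatialLayers: 1, temporalLayers: 3 }
console.log(parseScalabilityMode('S3T3')); // { spatialLayers: 3, temporalLayers: 3 }
```

So hardcoding 'L1T3' for each simulcast encoding is really just asserting "one spatial layer per RTP stream, and trust that the browser produced 3 temporal layers".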
-
Q3: As of today, is that still the real situation? (Or, again, have I unknowingly skipped some recent developments in this space?) Is the WebRTC SFU community as a whole just hardcoding T3 in their scalabilityMode values and hoping that’s the correct value, fingers crossed?