Testing simulcast level switching

Hi,
I’m experimenting with various settings for simulcast. I found that in some cases the spatial layer selected with setPreferredLayers is ignored. This always happens when the video source is a video clip (not a real webcam source) sent as a fake system webcam, or when using plain RTP transports.
I prepared a modified GStreamer broadcaster script (https://gist.github.com/vpalmisano/5716e8b52b258386fdd5c4fb93c97f11). You can test it using the well-known Big Buck Bunny sequence (https://peach.blender.org/download/). In my local setup I found that level=0 is never sent, even when it is manually selected using setPreferredLayers.

Does level=0 mean spatial 0 or temporal 0?

BTW the proper way to diagnose this is by checking the server-side Producer’s stats to see what the producer is currently sending. You’ll see an array of stats, one for each ongoing stream (if simulcast and if the sender is really sending N streams).
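
For reference, a minimal server-side check could look like this (a sketch, assuming you already have a reference to the mediasoup Producer; variable names are illustrative):

// With simulcast you should see one "inbound-rtp" entry per stream
// that the sender is actually transmitting.
const stats = await producer.getStats();

for (const stat of stats) {
  console.log('ssrc=%d bitrate=%d score=%d', stat.ssrc, stat.bitrate, stat.score);
}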

Spatial. I’m using only one temporal layer in the GStreamer script.

producer.getStats():
[
  {
    "bitrate": 53859,
    "byteCount": 1448769,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 6497,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 0,
    "score": 10,
    "ssrc": 2222,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  },
  {
    "bitrate": 139811,
    "byteCount": 3926095,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 6897,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 0,
    "score": 10,
    "ssrc": 2223,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  },
  {
    "bitrate": 467568,
    "byteCount": 12412844,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 12025,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 1,
    "score": 10,
    "ssrc": 2224,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  }
]
producer.dump():
{
  "id": "f4b3e65a-45f5-4093-8c1e-4aa39629263c",
  "kind": "video",
  "paused": false,
  "rtpMapping": {
    "codecs": [
      {
        "mappedPayloadType": 101,
        "payloadType": 101
      }
    ],
    "encodings": [
      {
        "mappedSsrc": 225588197,
        "rid": null,
        "ssrc": 2222
      },
      {
        "mappedSsrc": 225588198,
        "rid": null,
        "ssrc": 2223
      },
      {
        "mappedSsrc": 225588199,
        "rid": null,
        "ssrc": 2224
      }
    ]
  },
  "rtpParameters": {
    "codecs": [
      {
        "clockRate": 90000,
        "mimeType": "video/VP8",
        "parameters": {},
        "payloadType": 101,
        "rtcpFeedback": [
          {
            "parameter": "fir",
            "type": "ccm"
          },
          {
            "type": "nack"
          },
          {
            "parameter": "pli",
            "type": "nack"
          }
        ]
      }
    ],
    "encodings": [
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 512000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2222,
        "temporalLayers": 1
      },
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 1024000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2223,
        "temporalLayers": 1
      },
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 2048000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2224,
        "temporalLayers": 1
      }
    ],
    "headerExtensions": [],
    "rtcp": {
      "cname": "5ab6b651",
      "reducedSize": true
    }
  },
  "rtpStreams": [
    {
      "params": {
        "clockRate": 90000,
        "cname": "5ab6b651",
        "mimeType": "video/VP8",
        "payloadType": 101,
        "spatialLayers": 1,
        "ssrc": 2223,
        "temporalLayers": 1,
        "useDtx": false,
        "useFir": true,
        "useInBandFec": false,
        "useNack": true,
        "usePli": true
      },
      "score": 10
    },
    {
      "params": {
        "clockRate": 90000,
        "cname": "5ab6b651",
        "mimeType": "video/VP8",
        "payloadType": 101,
        "spatialLayers": 1,
        "ssrc": 2224,
        "temporalLayers": 1,
        "useDtx": false,
        "useFir": true,
        "useInBandFec": false,
        "useNack": true,
        "usePli": true
      },
      "score": 10
    }
  ],
  "traceEventTypes": "",
  "type": "simulcast"
}

consumer.setPreferredLayers() will request a keyframe for the selected spatial layer (a simulcast stream in this case) and becomes effective once such a keyframe is received in mediasoup, so I think that’s not happening.
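
For reference, a minimal sketch of the server-side call (assuming a mediasoup Consumer instance named consumer; the values are illustrative):

// Ask mediasoup to forward the lowest simulcast stream. This only takes
// effect once a keyframe for that stream reaches mediasoup.
await consumer.setPreferredLayers({ spatialLayer: 0, temporalLayer: 0 });

// 'layerschange' fires when the layers actually being forwarded change
// (or with null if no layer is being forwarded at all).
consumer.on('layerschange', (layers) => {
  console.log('now forwarding:', layers);
});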

You can use producer.enableTrace() to enable 'keyframe' trace events on the producer and check it. This is documented in the Debugging section of the mediasoup documentation.
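
For example, something along these lines (a sketch on a server-side Producer instance):

// Emit a 'trace' event for every keyframe received from the sender.
await producer.enableTrace([ 'keyframe' ]);

producer.on('trace', (trace) => {
  if (trace.type === 'keyframe') {
    console.log('keyframe received:', trace.info);
  }
});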

I modified the GStreamer script to produce 1 keyframe per second, but nothing changed.

I will try now.

Using the browser as a client and a fake webcam video source, I found that only 2 (out of a total of 3) pli trace events are captured when I try to change levels. With a regular webcam source I get 3 pli events, and the layer is always changed. This happens only when the video is generated, so maybe it’s a browser problem?

This is the script I’m using to start a fake video source (on Linux): v4l2loopback_script · GitHub

Edit:
Using VP9 (with SVC), I always get 1 pli event, but the spatial layer is never changed as requested.

mediasoup sends a PLI for the ssrc of the stream for which it needs a keyframe. This has been working fine for years. You can also use the trace event to log the PLI requests sent by mediasoup to the sender.
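
For example (again a sketch on the server-side Producer):

// Log the PLI requests that mediasoup sends towards the sender,
// one per simulcast ssrc for which it needs a keyframe.
await producer.enableTrace([ 'pli' ]);

producer.on('trace', (trace) => {
  if (trace.type === 'pli' && trace.direction === 'out') {
    console.log('PLI sent:', trace.info);
  }
});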

In VP9 there is a single SSRC so a single stream.

Found the problem: the fake webcam source resolution must be at least 1280x720 in order to make simulcast work in a 3-level configuration with [4, 2, 1] scaling. The same may happen if you have a real webcam with a resolution below 720p.
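
For reference, a typical 3-layer client-side setup with mediasoup-client looks roughly like this (a sketch; the getUserMedia constraints are illustrative and the bitrates simply mirror the dump above):

// With scaleResolutionDownBy [4, 2, 1] the captured track needs enough
// pixels (here 1280x720) for libwebrtc to actually encode all three
// streams; with a smaller source one of them is never sent, as observed above.
const stream = await navigator.mediaDevices.getUserMedia({ video: { width: 1280, height: 720 } });
const track = stream.getVideoTracks()[0];

const producer = await sendTransport.produce({
  track,
  encodings: [
    { scaleResolutionDownBy: 4, maxBitrate: 512000 },
    { scaleResolutionDownBy: 2, maxBitrate: 1024000 },
    { scaleResolutionDownBy: 1, maxBitrate: 2048000 }
  ]
});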

Nice. But the missing spatial layer (the one you could not receive) was 2, not 0.

Level = 0 for me was the one with lowest resolution.

Exactly, so that one is always sent, but you said it wasn’t. BTW, the lowest one must be the first entry in the encodings array given to the server-side Producer.

I think that when the fake webcam source sends a resolution that is less than the resolution requested by getUserMedia, the lowest level is not sent at all…

A question: do you know if there are any max/min limits on the configurable simulcast parameters? For instance, can I set scaleResolutionDownBy=16, or maxBitrate=10?

There are ugly issues in libwebrtc if you specify encodings with too low a maxBitrate, such as getting zero RTP as a result. I reported one of them to libwebrtc. So be cautious.

No, I don’t know those value limits.
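
Given that, one defensive option is to clamp whatever values the application passes before producing. This is a purely hypothetical helper; the limits below are arbitrary guesses, not documented libwebrtc limits:

// Hypothetical sanity check for encodings passed to produce().
function sanitizeEncodings(encodings) {
  return encodings.map((enc) => ({
    ...enc,
    scaleResolutionDownBy: Math.min(enc.scaleResolutionDownBy ?? 1, 8),
    maxBitrate: Math.max(enc.maxBitrate ?? 100000, 30000)
  }));
}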

According to media/engine/simulcast.cc - external/webrtc - Git at Google, what it takes into account (at least in Chrome) is that the number of pixels (not the resolution itself) is equal to or higher than certain standard resolution levels. Interestingly, according to the same file, the minimum needed to send 3 spatial layers is 960x540 (wide NTSC), not 1280x720 (HD).
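
To make the pixel-count logic concrete, a rough sketch of that check (based only on the 960x540 threshold mentioned above, not on the full Chromium table):

// 960 x 540 = 518400 pixels, the minimum mentioned above for 3 simulcast streams.
function canSend3SimulcastStreams(width, height) {
  return width * height >= 960 * 540;
}

console.log(canSend3SimulcastStreams(1280, 720)); // true
console.log(canSend3SimulcastStreams(640, 480));  // false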

@vpalmisano, which browser were you using, Chrome or Firefox? Maybe Chrome changed the format limits, or Firefox uses different ones?

I don’t know if Firefox applies the same configuration. If you comply with the limits specified in the Chromium source code, you always get exactly the number of spatial levels you set.

So, can we now be sure that the 3-layer limit is 960x540, not 1280x720?

I’m interested in this because I’m using the actual on-screen sizes to adjust the layer being used, so if I have a small rendered video tag on screen, I can ask for a lower-resolution layer and reduce client-side bandwidth and CPU usage :)
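
A sketch of that idea (client side; the breakpoints are illustrative, and the signaling step that makes the server call consumer.setPreferredLayers() is left out):

// Pick a preferred spatial layer from the size a <video> element is
// actually rendered at, so small thumbnails only consume the lowest stream.
function preferredSpatialLayer(videoElement) {
  const rect = videoElement.getBoundingClientRect();

  if (rect.width >= 640) return 2; // large view: highest stream
  if (rect.width >= 320) return 1; // medium view
  return 0;                        // thumbnail: lowest stream
}

// The result is then signaled to the server, which calls
// consumer.setPreferredLayers({ spatialLayer }) on the mediasoup Consumer.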

Yes, but this is strictly related to the Chromium implementation.

Ok, I’ve searched for the Firefox implementation, and I think it’s located at VideoStreamFactory.cpp - mozsearch, or at least it’s a similar table. It doesn’t define the number of spatial or temporal layers to use by default; instead, the number can vary based mostly on the bandwidth needed for the desired resolution (not the resolution itself), the provided config, and the current bandwidth constraints. So, if I understood it correctly, it seems Firefox could potentially generate up to 7 layers with a UHD video, while Chrome limits it to a maximum of 3. Can somebody confirm this?

Firefox also tries to fulfill the requested number of layers, but a layer will be ignored if its scaled resolution is so low that it disappears, or if the scaled resolution doesn’t respect the aspect ratio. The current bandwidth / resolution limits are:

// XXX Populate this based on a pref (which we should consider sorting because
// people won't assume they need to).
static VideoStreamFactory::ResolutionAndBitrateLimits
    kResolutionAndBitrateLimits[] = {
        // clang-format off
  {MB_OF(1920, 1200), KBPS(1500), KBPS(2000), KBPS(10000)}, // >HD (3K, 4K, etc)
  {MB_OF(1280, 720), KBPS(1200), KBPS(1500), KBPS(5000)}, // HD ~1080-1200
  {MB_OF(800, 480), KBPS(200), KBPS(800), KBPS(2500)}, // HD ~720
  {MB_OF(480, 270), KBPS(150), KBPS(500), KBPS(2000)}, // WVGA
  {tl::Max<MB_OF(400, 240), MB_OF(352, 288)>::value, KBPS(125), KBPS(300), KBPS(1300)}, // VGA
  {MB_OF(176, 144), KBPS(100), KBPS(150), KBPS(500)}, // WQVGA, CIF
  {0 , KBPS(40), KBPS(80), KBPS(250)} // QCIF and below
        // clang-format on
};

The first columns are the minimum resolution / bandwidth for each “level”. The values are different from the ones used in Chromium, except 1280x720 (HD) and 480x270 (WVGA), but in both cases there are 7 entries, and in the case of Chrome, 480x270 is also the limit between one and two layers. I would need to check at least one more implementation to see whether it makes sense to have a common layer split, to do it per browser with a default one, or whether a default split would be senseless at all. What do you think?
