Testing simulcast level switching

Hi,
I’m experimenting with various settings for simulcast. I found that in some cases the spatial layer selected with setPreferredLayers is ignored. This always happens when the video source is a video clip (not a real webcam source) sent as a fake system webcam, or when using plain RTP transports.
I prepared a modified GStreamer broadcaster script (https://gist.github.com/vpalmisano/5716e8b52b258386fdd5c4fb93c97f11). You can test it using the well-known Big Buck Bunny sequence (https://peach.blender.org/download/). In my local setup I found that level=0 is never sent, even when it is manually selected using setPreferredLayers.

Does level=0 mean spatial 0 or temporal 0?

BTW the proper way to diagnose this is by checking the server-side Producer’s stats to see what the producer is currently sending. You’ll see an array of stats, one for each ongoing stream (if simulcast and if the sender is really sending N streams).
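
For reference, a minimal server-side check could look like this (a sketch, assuming you already have a reference to the mediasoup Producer; variable names are illustrative):

// With simulcast you should see one "inbound-rtp" entry per stream
// that the sender is actually transmitting.
const stats = await producer.getStats();

for (const stat of stats) {
  console.log('ssrc=%d bitrate=%d score=%d', stat.ssrc, stat.bitrate, stat.score);
}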

Spatial. I’m using only one temporal layer in the GStreamer script.

producer.getStats():
[
  {
    "bitrate": 53859,
    "byteCount": 1448769,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 6497,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 0,
    "score": 10,
    "ssrc": 2222,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  },
  {
    "bitrate": 139811,
    "byteCount": 3926095,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 6897,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 0,
    "score": 10,
    "ssrc": 2223,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  },
  {
    "bitrate": 467568,
    "byteCount": 12412844,
    "firCount": 0,
    "fractionLost": 0,
    "jitter": 0,
    "kind": "video",
    "mimeType": "video/VP8",
    "nackCount": 0,
    "nackPacketCount": 0,
    "packetCount": 12025,
    "packetsDiscarded": 0,
    "packetsLost": 0,
    "packetsRepaired": 0,
    "packetsRetransmitted": 0,
    "pliCount": 1,
    "score": 10,
    "ssrc": 2224,
    "timestamp": 1627638221,
    "type": "inbound-rtp"
  }
]
producer.dump():
{
  "id": "f4b3e65a-45f5-4093-8c1e-4aa39629263c",
  "kind": "video",
  "paused": false,
  "rtpMapping": {
    "codecs": [
      {
        "mappedPayloadType": 101,
        "payloadType": 101
      }
    ],
    "encodings": [
      {
        "mappedSsrc": 225588197,
        "rid": null,
        "ssrc": 2222
      },
      {
        "mappedSsrc": 225588198,
        "rid": null,
        "ssrc": 2223
      },
      {
        "mappedSsrc": 225588199,
        "rid": null,
        "ssrc": 2224
      }
    ]
  },
  "rtpParameters": {
    "codecs": [
      {
        "clockRate": 90000,
        "mimeType": "video/VP8",
        "parameters": {},
        "payloadType": 101,
        "rtcpFeedback": [
          {
            "parameter": "fir",
            "type": "ccm"
          },
          {
            "type": "nack"
          },
          {
            "parameter": "pli",
            "type": "nack"
          }
        ]
      }
    ],
    "encodings": [
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 512000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2222,
        "temporalLayers": 1
      },
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 1024000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2223,
        "temporalLayers": 1
      },
      {
        "codecPayloadType": 101,
        "ksvc": false,
        "maxBitrate": 2048000,
        "scalabilityMode": "S1T1",
        "spatialLayers": 1,
        "ssrc": 2224,
        "temporalLayers": 1
      }
    ],
    "headerExtensions": [],
    "rtcp": {
      "cname": "5ab6b651",
      "reducedSize": true
    }
  },
  "rtpStreams": [
    {
      "params": {
        "clockRate": 90000,
        "cname": "5ab6b651",
        "mimeType": "video/VP8",
        "payloadType": 101,
        "spatialLayers": 1,
        "ssrc": 2223,
        "temporalLayers": 1,
        "useDtx": false,
        "useFir": true,
        "useInBandFec": false,
        "useNack": true,
        "usePli": true
      },
      "score": 10
    },
    {
      "params": {
        "clockRate": 90000,
        "cname": "5ab6b651",
        "mimeType": "video/VP8",
        "payloadType": 101,
        "spatialLayers": 1,
        "ssrc": 2224,
        "temporalLayers": 1,
        "useDtx": false,
        "useFir": true,
        "useInBandFec": false,
        "useNack": true,
        "usePli": true
      },
      "score": 10
    }
  ],
  "traceEventTypes": "",
  "type": "simulcast"
}

consumer.setPreferredLayers() will request a keyframe for the selected spatial layer (a simulcast stream in this case) and becomes effective once such a keyframe is received in mediasoup, so I think that’s not happening.
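
For reference, a minimal sketch of the server-side call (assuming a mediasoup Consumer instance named consumer; the values are illustrative):

// Ask mediasoup to forward the lowest simulcast stream. This only takes
// effect once a keyframe for that stream reaches mediasoup.
await consumer.setPreferredLayers({ spatialLayer: 0, temporalLayer: 0 });

// 'layerschange' fires when the layers actually being forwarded change
// (or with null if no layer is being forwarded at all).
consumer.on('layerschange', (layers) => {
  console.log('now forwarding:', layers);
});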

You can use producer.enableTrace() to enable 'keyframe' trace events on the producer and check it. This is documented in the Debugging section of the mediasoup documentation.
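
For example, something along these lines (a sketch on a server-side Producer instance):

// Emit a 'trace' event for every keyframe received from the sender.
await producer.enableTrace([ 'keyframe' ]);

producer.on('trace', (trace) => {
  if (trace.type === 'keyframe') {
    console.log('keyframe received:', trace.info);
  }
});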

I modified the GStreamer script to produce 1 keyframe per second, but nothing changed.

I will try now.

Using the browser as a client and a fake webcam video source, I found that only 2 (out of a total of 3) pli trace events are captured when I try to change levels. With a regular webcam source I get 3 pli events, and the layer is always changed. This happens only when the video is generated, so maybe it’s a browser problem?

This is the script I’m using to start a fake video source (on Linux): v4l2loopback_script · GitHub

Edit:
Using VP9 (with SVC), I always get 1 pli event, but the spatial layer is never changed as requested.

mediasoup sends a PLI for the ssrc of the stream for which it needs a keyframe. This has been working fine for years. You can also use the trace event to log the PLI requests sent by mediasoup to the sender.
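
For example (again a sketch on the server-side Producer):

// Log the PLI requests that mediasoup sends towards the sender,
// one per simulcast ssrc for which it needs a keyframe.
await producer.enableTrace([ 'pli' ]);

producer.on('trace', (trace) => {
  if (trace.type === 'pli' && trace.direction === 'out') {
    console.log('PLI sent:', trace.info);
  }
});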

In VP9 there is a single SSRC so a single stream.

Found the problem: the fake webcam source resolution must be at least 1280x720 in order to make simulcast work in a 3-level configuration with [4, 2, 1] scaling. The same may happen if you have a real webcam with a resolution below 720p.
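
For reference, a typical 3-layer client-side setup with mediasoup-client looks roughly like this (a sketch; the getUserMedia constraints are illustrative and the bitrates simply mirror the dump above):

// With scaleResolutionDownBy [4, 2, 1] the captured track needs enough
// pixels (here 1280x720) for libwebrtc to actually encode all three
// streams; with a smaller source one of them is never sent, as observed above.
const stream = await navigator.mediaDevices.getUserMedia({ video: { width: 1280, height: 720 } });
const track = stream.getVideoTracks()[0];

const producer = await sendTransport.produce({
  track,
  encodings: [
    { scaleResolutionDownBy: 4, maxBitrate: 512000 },
    { scaleResolutionDownBy: 2, maxBitrate: 1024000 },
    { scaleResolutionDownBy: 1, maxBitrate: 2048000 }
  ]
});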

Nice. But the missing spatial layer (the one you could not receive) was 2, not 0.

Level = 0 for me was the one with lowest resolution.

Exactly, so that one is always sent, but you said it wasn’t. BTW, the lowest one must be the first entry in the encodings array given to the server-side Producer.

I think that when the fake webcam source sends a resolution that is less than the resolution requested by getUserMedia, the lowest level is not sent at all…

A question: do you know if there are any max/min limits on the configurable simulcast parameters? For instance, can I set scaleResolutionDownBy=16, or maxBitrate=10?

There are ugly issues in libwebrtc if you specify encodings with too low a maxBitrate, such as getting zero RTP as a result. I reported one of them to libwebrtc. So be cautious.

No, I don’t know those value limits.
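
Given that, one defensive option is to clamp whatever values the application passes before producing. This is a purely hypothetical helper; the limits below are arbitrary guesses, not documented libwebrtc limits:

// Hypothetical sanity check for encodings passed to produce().
function sanitizeEncodings(encodings) {
  return encodings.map((enc) => ({
    ...enc,
    scaleResolutionDownBy: Math.min(enc.scaleResolutionDownBy ?? 1, 8),
    maxBitrate: Math.max(enc.maxBitrate ?? 100000, 30000)
  }));
}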

According to media/engine/simulcast.cc - external/webrtc - Git at Google, what it takes into account (at least in Chrome) is that the number of pixels (not the resolution itself) is equal to or higher than certain standard resolution levels. Interestingly, according to the same file, the minimum needed to send 3 spatial layers is 960x540 (wide NTSC), not 1280x720 (HD).
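
To make the pixel-count logic concrete, a rough sketch of that check (based only on the 960x540 threshold mentioned above, not on the full Chromium table):

// 960 x 540 = 518400 pixels, the minimum mentioned above for 3 simulcast streams.
function canSend3SimulcastStreams(width, height) {
  return width * height >= 960 * 540;
}

console.log(canSend3SimulcastStreams(1280, 720)); // true
console.log(canSend3SimulcastStreams(640, 480));  // false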

@vpalmisano, which browser were you using, Chrome or Firefox? Maybe Chrome changed the format limits, or Firefox uses different ones?

I don’t know if Firefox applies the same configuration. If you comply with the limits specified in the Chromium source code, you always get exactly the number of spatial levels you set.

So, can we now be sure that the 3-layer limit is 960x540, not 1280x720?

I’m interested in this because I’m using the actual on-screen sizes to adjust the layer being used, so if I have a small rendered video tag on screen, I can ask for a lower-resolution layer and reduce client-side bandwidth and CPU usage :)
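
A sketch of that idea (client side; the breakpoints are illustrative, and the signaling step that makes the server call consumer.setPreferredLayers() is left out):

// Pick a preferred spatial layer from the size a <video> element is
// actually rendered at, so small thumbnails only consume the lowest stream.
function preferredSpatialLayer(videoElement) {
  const rect = videoElement.getBoundingClientRect();

  if (rect.width >= 640) return 2; // large view: highest stream
  if (rect.width >= 320) return 1; // medium view
  return 0;                        // thumbnail: lowest stream
}

// The result is then signaled to the server, which calls
// consumer.setPreferredLayers({ spatialLayer }) on the mediasoup Consumer.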

Yes, but this is strictly related to the Chromium implementation.

Ok, I’ve searched for the Firefox implementation, and I think it’s located at VideoStreamFactory.cpp - mozsearch, or at least it’s a similar table. It doesn’t define the number of spatial or temporal layers to use by default; instead, the number can vary based mostly on the bandwidth needed for the desired resolution (not the resolution itself), the provided config, and the current bandwidth constraints. So, if I understood it correctly, it seems Firefox could potentially generate up to 7 layers with a UHD video, while Chrome limits it to a maximum of 3. Can somebody confirm this?

Firefox also tries to fulfill the requested number of layers, but a layer will be ignored if its scaled resolution is so low that it disappears, or if the scaled resolution doesn’t respect the aspect ratio. The current bandwidth / resolution limits are:

// XXX Populate this based on a pref (which we should consider sorting because
// people won't assume they need to).
static VideoStreamFactory::ResolutionAndBitrateLimits
    kResolutionAndBitrateLimits[] = {
        // clang-format off
  {MB_OF(1920, 1200), KBPS(1500), KBPS(2000), KBPS(10000)}, // >HD (3K, 4K, etc)
  {MB_OF(1280, 720), KBPS(1200), KBPS(1500), KBPS(5000)}, // HD ~1080-1200
  {MB_OF(800, 480), KBPS(200), KBPS(800), KBPS(2500)}, // HD ~720
  {MB_OF(480, 270), KBPS(150), KBPS(500), KBPS(2000)}, // WVGA
  {tl::Max<MB_OF(400, 240), MB_OF(352, 288)>::value, KBPS(125), KBPS(300), KBPS(1300)}, // VGA
  {MB_OF(176, 144), KBPS(100), KBPS(150), KBPS(500)}, // WQVGA, CIF
  {0 , KBPS(40), KBPS(80), KBPS(250)} // QCIF and below
        // clang-format on
};

The first columns are the minimum resolution / bandwidth for each “level”. The values are different from the ones used in Chromium, except 1280x720 (HD) and 480x270 (WVGA), but in both cases there are 7 entries, and in the case of Chrome, 480x270 is also the limit between one and two layers. I would need to check at least one more implementation to see whether it makes sense to have a common layer split, to do it per browser with a default one, or whether a default split would be senseless at all. What do you think?
