VP9 temporal scalability without SVC?

I think I managed to confuse mediasoup by using an S1T3 encoding with VP9: mediasoup then creates an SvcConsumer, and switching the preferred temporal layer has no effect (I always receive the full 30 fps).

Is this supposed to work at all, or is VP9 currently hardcoded to use SVC only?

It is also possible that I messed up parsing/generating the SDP, since I’m working with an endpoint that doesn’t use mediasoup-client.

Any help is appreciated.

Remote client offer:

v=0
o=- 3155737008791722389 0 IN IP4 0.0.0.0
s=-
t=0 0
a=ice-options:trickle
a=group:BUNDLE audio0 video1
m=audio 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 0.0.0.0
a=setup:actpass
a=ice-ufrag:rmPuC4e/GRT6Wb2A+NWJFAbtdHTUFlIh
a=ice-pwd:joXJONEnCJr/eYBiZ/ShuU6bonMv/a5C
a=rtcp-mux
a=rtcp-rsize
a=sendonly
a=rtpmap:96 OPUS/48000/2
a=rtcp-fb:96 nack pli
a=fmtp:96 sprop-maxcapturerate=48000;sprop-stereo=1
a=ssrc:3777738827 msid:user2123010147@host-73af8147 webrtctransceiver2
a=ssrc:3777738827 cname:user2123010147@host-73af8147
a=mid:audio0
a=fingerprint:sha-256 BA:C8:33:45:1B:CF:73:FE:EB:A9:CC:58:4C:A3:CE:7E:3E:16:C3:41:8F:11:87:92:A3:CF:9F:F7:3D:1B:83:B9
m=video 0 UDP/TLS/RTP/SAVPF 97 98
c=IN IP4 0.0.0.0
a=setup:actpass
a=ice-ufrag:rmPuC4e/GRT6Wb2A+NWJFAbtdHTUFlIh
a=ice-pwd:joXJONEnCJr/eYBiZ/ShuU6bonMv/a5C
a=bundle-only
a=rtcp-mux
a=rtcp-rsize
a=sendonly
a=rtpmap:97 VP9/90000
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=framerate:30
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=97
a=ssrc-group:FID 2830271215 3679539510
a=ssrc:2830271215 msid:user2123010147@host-73af8147 webrtctransceiver3
a=ssrc:2830271215 cname:user2123010147@host-73af8147
a=ssrc:3679539510 msid:user2123010147@host-73af8147 webrtctransceiver3
a=ssrc:3679539510 cname:user2123010147@host-73af8147
a=mid:video1
a=fingerprint:sha-256 BA:C8:33:45:1B:CF:73:FE:EB:A9:CC:58:4C:A3:CE:7E:3E:16:C3:41:8F:11:87:92:A3:CF:9F:F7:3D:1B:83:B9

Mediasoup answer:

v=0
o=- 10000 1 IN IP4 0.0.0.0
s=-
t=0 0
a=ice-lite
a=fingerprint:sha-512 30:BB:6E:92:E4:7E:6E:AC:90:D9:AE:96:5C:C3:CC:45:E7:B3:BC:44:04:9C:9C:09:84:87:AB:C5:E3:A7:1C:82:A4:CB:C1:68:A8:6B:C4:B9:43:2C:28:DD:D0:7A:66:07:19:4B:C8:35:68:EE:1C:E1:4F:1C:2A:23:63:05:24:2C
a=msid-semantic: WMS *
a=group:BUNDLE audio0 video1
m=audio 7 UDP/TLS/RTP/SAVPF 96
c=IN IP4 127.0.0.1
a=rtpmap:96 OPUS/48000/2
a=fmtp:96 sprop-maxcapturerate=48000;sprop-stereo=1
a=rtcp-fb:96 nack pli
a=setup:passive
a=mid:audio0
a=msid:user2123010147@host-73af8147 e0cbe51a-41d8-4201-bf82-3e996d429490
a=recvonly
a=ice-ufrag:536tzgqompur43fz
a=ice-pwd:uknaktrd3azspooc71831msugz8d2pdh
a=candidate:udpcandidate 1 udp 1076302079 127.0.0.1 58263 typ host
a=candidate:udpcandidate 1 udp 1076276479 172.21.0.4 59707 typ host
a=end-of-candidates
a=ice-options:renomination
a=ssrc:3777738827 cname:user2123010147@host-73af8147
a=rtcp-mux
a=rtcp-rsize
m=video 7 UDP/TLS/RTP/SAVPF 97 98
c=IN IP4 127.0.0.1
a=rtpmap:97 VP9/90000
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=97
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=setup:passive
a=mid:video1
a=msid:user2123010147@host-73af8147 54404f2a-fc3e-4b6e-acc2-34895de73209
a=recvonly
a=ice-ufrag:536tzgqompur43fz
a=ice-pwd:uknaktrd3azspooc71831msugz8d2pdh
a=candidate:udpcandidate 1 udp 1076302079 127.0.0.1 58263 typ host
a=candidate:udpcandidate 1 udp 1076276479 172.21.0.4 59707 typ host
a=end-of-candidates
a=ice-options:renomination
a=ssrc:2830271215 cname:user2123010147@host-73af8147
a=ssrc:3679539510 cname:user2123010147@host-73af8147
a=ssrc-group:FID 2830271215 3679539510
a=rtcp-mux
a=rtcp-rsize

Video producer options:

{
  "kind": "video",
  "rtpParameters": {
    "codecs": [
      {
        "clockRate": 90000,
        "mimeType": "video/VP9",
        "parameters": {},
        "payloadType": 97,
        "rtcpFeedback": [
          {
            "type": "nack",
            "parameter": ""
          },
          {
            "type": "nack",
            "parameter": "pli"
          }
        ]
      },
      {
        "clockRate": 90000,
        "mimeType": "video/rtx",
        "parameters": {
          "apt": 97
        },
        "payloadType": 98,
        "rtcpFeedback": []
      }
    ],
    "encodings": [
      {
        "codecPayloadType": 97,
        "rtx": {
          "ssrc": 3679539510
        },
        "scalabilityMode": "S1T3",
        "ssrc": 2830271215
      }
    ],
    "headerExtensions": [],
    "mid": "video1",
    "rtcp": {
      "cname": "user2123010147@host-73af8147",
      "mux": true,
      "reducedSize": true
    }
  }
}

From what I saw in the sources, the consumer type is derived from the number of encodings and the number of spatial and temporal layers. With exactly one encoding, the choice is only between an SVC consumer (if more than one spatial or temporal layer is specified) and a simple consumer; simulcast requires more than one encoding. So with your producer options the router thinks the stream is SVC, while what comes from the client must not be SVC at all.

Yes, the client sends just one stream (a single spatial layer), but with 3 temporal layers inside it.
So it is not simulcast, but I’m not sure it is proper SVC either.

Actually, this is codec dependent, since VP8 also enables temporal layers when doing simulcast.

VP9 behavior is hardcoded in libwebrtc: by default it generates 3 spatial and 3 temporal layers. There is a Chromium flag to change the number of spatial and temporal layers in VP9, but no JS API yet. There is, however, an experimental fork of libwebrtc that accepts a scalabilityMode option in each encoding, which lets you choose the number of spatial and temporal layers.

In mediasoup-client we use a terrible hack to enable VP9 SVC; look at the latest Chrome handler.

That’s SVC actually.

That hack, however, is also turned on when more than one spatial layer is specified in the scalabilityMode.

Well, I’m using GStreamer to produce VP9, and it has no SVC-related options yet.
My suspicion is that this mode is similar or identical to what VP8 does (but with a single S1T3 encoding instead of the usual 2-3), and that VP9 SVC is a separate thing (this is speculation, though; I didn’t dig deep enough).

What I’m doing should be similar to this ffmpeg command:

ffmpeg -i INPUT -c:v libvpx -ts-parameters ts_number_layers=3:\
ts_target_bitrate=250000,500000,1000000:ts_rate_decimator=4,2,1:\
ts_periodicity=4:ts_layer_id=0,2,1,2 OUTPUT
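As a sanity check on those ts_* values, the per-layer frame rates implied by ts_layer_id=0,2,1,2 match the ones implied by ts_rate_decimator=4,2,1. Here is a small TypeScript sketch of that arithmetic, assuming a 30 fps input (as in the a=framerate:30 line of the offer); the function names are made up for illustration and are not part of ffmpeg or mediasoup:

```typescript
// Cumulative frame rate of each temporal layer, derived from one period of
// the ts_layer_id pattern (one entry per frame in the period).
function cumulativeFrameRates(layerIds: number[], inputFps: number): number[] {
  const numLayers = Math.max(...layerIds) + 1;
  const rates: number[] = [];
  for (let t = 0; t < numLayers; t++) {
    // Frames a receiver keeps when it decodes layers 0..t.
    const kept = layerIds.filter((id) => id <= t).length;
    rates.push((inputFps * kept) / layerIds.length);
  }
  return rates;
}

// Same rates from ts_rate_decimator: layers 0..t run at inputFps / decimator[t].
function ratesFromDecimators(decimators: number[], inputFps: number): number[] {
  return decimators.map((d) => inputFps / d);
}

console.log(cumulativeFrameRates([0, 2, 1, 2], 30)); // [ 7.5, 15, 30 ]
console.log(ratesFromDecimators([4, 2, 1], 30)); // [ 7.5, 15, 30 ]
```

So a receiver limited to temporal layer 0 should get 7.5 fps, layer 1 should give 15 fps, and all three layers the full 30 fps.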

Why do you assume that GStreamer is producing an SVC-encoded stream with 1 spatial layer and 3 temporal layers? I’d say it produces a single stream with no layers at all.

I’m saying it is not VP9 SVC but rather similar to what VP8 does; mediasoup, however, creates an SvcConsumer and treats it as SVC.
The ts_* parameters describe how many temporal layers there will be inside the single stream, and their bitrates, frame rates and order (by layer_id).

So what is being produced is a single VP9 stream with frames that belong to temporal layers [0][2][1][2][0][2][1][2] and so on.
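To illustrate what switching the preferred temporal layer means for such a stream, here is a hypothetical TypeScript sketch (not mediasoup code) of which frames a forwarder would keep for a given target layer, assuming it can tell each frame’s temporal layer id (here it simply follows the 0,2,1,2 pattern):

```typescript
// Assumed repeating temporal layer pattern, one entry per frame.
const PATTERN = [0, 2, 1, 2];

function temporalLayerOf(frameIndex: number): number {
  return PATTERN[frameIndex % PATTERN.length];
}

// Indices of the frames forwarded when the receiver wants layers 0..targetLayer.
function forwardedFrames(frameCount: number, targetLayer: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < frameCount; i++) {
    // Lower temporal layers never reference higher ones, so frames above
    // the target can be dropped without breaking decoding.
    if (temporalLayerOf(i) <= targetLayer) out.push(i);
  }
  return out;
}

console.log(forwardedFrames(8, 0)); // [ 0, 4 ]
console.log(forwardedFrames(8, 1)); // [ 0, 2, 4, 6 ]
```

At target layer 0 only every 4th frame is forwarded (7.5 of 30 fps), at layer 1 every 2nd frame (15 fps). A real SFU has to learn each frame’s layer id from the packets themselves rather than from a known pattern.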

mediasoup will create an SvcConsumer if the Producer has a single encoding with scalabilityMode set to something other than S1T1.
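A rough TypeScript model of that rule, together with the VP9 simulcast restriction mentioned in this thread. This is not mediasoup’s source code, only a sketch of the described behavior; the function names are hypothetical, and the scalabilityMode parsing assumes the usual SxTy / LxTy naming:

```typescript
type ConsumerType = "simple" | "simulcast" | "svc";

interface Encoding {
  scalabilityMode?: string;
}

// Parses modes like "S1T3" or "L3T3" into layer counts; no mode means 1x1.
function layerCounts(mode?: string): { spatial: number; temporal: number } {
  const match = mode ? /^[LS](\d+)T(\d+)/.exec(mode) : null;
  if (!match) return { spatial: 1, temporal: 1 };
  return { spatial: Number(match[1]), temporal: Number(match[2]) };
}

function consumerTypeFor(mimeType: string, encodings: Encoding[]): ConsumerType {
  if (encodings.length > 1) {
    // Multiple encodings means simulcast, which is rejected for VP9.
    if (mimeType.toLowerCase() === "video/vp9") {
      throw new TypeError(`${mimeType} codec not supported for simulcast`);
    }
    return "simulcast";
  }
  // Single encoding: SVC if more than one spatial or temporal layer is declared.
  const { spatial, temporal } = layerCounts(encodings[0]?.scalabilityMode);
  return spatial > 1 || temporal > 1 ? "svc" : "simple";
}

console.log(consumerTypeFor("video/VP9", [{ scalabilityMode: "S1T3" }])); // svc
console.log(consumerTypeFor("video/VP9", [{}])); // simple
```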

I would try adding a fake encoding to the RtpParameters passed to the server-side transport.produce() to force the worker to create a simulcast consumer, and see what it does with the single stream.

mediasoup does not accept simulcast for VP9 since the codec (or existing encoders) do not support it.

Already tried that, but got this:

mediasoup:ERROR:Channel [pid:159 RTC::Producer::Producer() | throwing MediaSoupTypeError: video/VP9 codec not supported for simulcast +0ms
mediasoup:WARN:Channel request failed [method:transport.produce, id:14]: video/VP9 codec not supported for simulcast +30s

So simulcast is not going to work.

I don’t know what the problem is, but as long as you set a proper scalabilityMode in the Producer and the client sends that number of SVC layers, things will work. Check the Producer stats to see how many spatial/temporal layers the client is sending. Try passing S3T3 to tell mediasoup to be ready for 3 spatial and 3 temporal layers; the stats will then show how many the client is really sending:

It never sees more than one temporal layer:

[
  {
    bitrate: 3187738,
    bitrateByLayer: { '0.0': 3187738, '0.1': 0, '0.2': 0 },
    byteCount: 1940171,
    firCount: 0,
    fractionLost: 0,
    jitter: 0,
    kind: 'video',
    mimeType: 'video/VP9',
    nackCount: 0,
    nackPacketCount: 0,
    packetCount: 1468,
    packetsDiscarded: 0,
    packetsLost: 0,
    packetsRepaired: 0,
    packetsRetransmitted: 0,
    pliCount: 3,
    score: 10,
    ssrc: 646385938,
    timestamp: 3326382870,
    type: 'inbound-rtp'
  }
]

Or with S3T3:

[
  {
    bitrate: 4369437,
    bitrateByLayer: {
      '0.0': 4369437,
      '0.1': 0,
      '0.2': 0,
      '1.0': 0,
      '1.1': 0,
      '1.2': 0,
      '2.0': 0,
      '2.1': 0,
      '2.2': 0
    },
    byteCount: 4395350,
    firCount: 0,
    fractionLost: 0,
    jitter: 0,
    kind: 'video',
    mimeType: 'video/VP9',
    nackCount: 0,
    nackPacketCount: 0,
    packetCount: 3280,
    packetsDiscarded: 0,
    packetsLost: 0,
    packetsRepaired: 0,
    packetsRetransmitted: 0,
    pliCount: 3,
    score: 10,
    ssrc: 2030448534,
    timestamp: 3326842168,
    type: 'inbound-rtp'
  }
]
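A small hypothetical helper for reading such stats: it treats any non-zero entry in bitrateByLayer (keys are "spatialIndex.temporalIndex") as an active layer, so it gives the same answer whether or not the per-layer values are cumulative. Applied to both dumps above, it reports a single spatial and a single temporal layer:

```typescript
// Counts the layers that actually carry data in a bitrateByLayer map.
// Keys are assumed to look like "0.1" (spatial index "." temporal index).
function activeLayers(bitrateByLayer: Record<string, number>): {
  spatial: number;
  temporal: number;
} {
  let maxSpatial = -1;
  let maxTemporal = -1;
  for (const [key, bitrate] of Object.entries(bitrateByLayer)) {
    if (bitrate <= 0) continue; // layer declared but nothing received
    const [s, t] = key.split(".").map(Number);
    maxSpatial = Math.max(maxSpatial, s);
    maxTemporal = Math.max(maxTemporal, t);
  }
  // Counts are "highest active index + 1"; 0 means nothing was received.
  return { spatial: maxSpatial + 1, temporal: maxTemporal + 1 };
}

console.log(activeLayers({ "0.0": 4369437, "0.1": 0, "0.2": 0, "1.0": 0 }));
// { spatial: 1, temporal: 1 }
```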

Well, so the client is just sending a basic stream without layers, that’s all. So don’t signal any scalabilityMode at all, and mediasoup will create a SimpleConsumer.

Well, I do want layers, so I’ll tinker some more with the client that generates VP9. Thanks!

From what I know, some information about the SVC setup should be carried in the RTP packets (the VP9 payload descriptor, see modules/rtp_rtcp/source/rtp_format_vp9.cc in libwebrtc) and then decoded by mediasoup (see VP9.cpp in the mediasoup v3 branch).
@nazar-pc, are you using rtpvp9pay for payloading?
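For reference, this is roughly what the receiving side looks for. The following TypeScript sketch reads the layer indices (TID, U, SID, D) from the start of a VP9 RTP payload, following the layout of the VP9 RTP payload format (RFC 9628, formerly draft-ietf-payload-vp9). The function is hypothetical and is not a port of mediasoup’s VP9.cpp; it only skips the optional picture ID and stops at the layer indices byte:

```typescript
interface Vp9LayerInfo {
  temporalId: number; // TID
  switchingUp: boolean; // U
  spatialId: number; // SID
  interLayerDependency: boolean; // D
}

function readVp9LayerInfo(payload: Uint8Array): Vp9LayerInfo | null {
  if (payload.length < 1) return null;
  const b0 = payload[0]; // I|P|L|F|B|E|V|Z
  const hasPictureId = (b0 & 0x80) !== 0; // I bit
  const hasLayerIndices = (b0 & 0x20) !== 0; // L bit
  if (!hasLayerIndices) return null; // nothing to base layer switching on
  let offset = 1;
  if (hasPictureId) {
    if (payload.length < offset + 1) return null;
    // M bit set: 15-bit picture ID (2 bytes), otherwise 7-bit (1 byte).
    offset += (payload[offset] & 0x80) !== 0 ? 2 : 1;
  }
  if (payload.length < offset + 1) return null;
  const l = payload[offset]; // TID(3) | U(1) | SID(3) | D(1)
  return {
    temporalId: l >> 5,
    switchingUp: ((l >> 4) & 0x01) !== 0,
    spatialId: (l >> 1) & 0x07,
    interLayerDependency: (l & 0x01) !== 0,
  };
}

// I + L bits, 15-bit picture ID, layer byte (TID=2, U=1, SID=0, D=0), TL0PICIDX.
const sample = new Uint8Array([0xa0, 0x81, 0x23, 0x50, 0x07]);
console.log(readVp9LayerInfo(sample)); // temporalId 2, switchingUp true, spatialId 0
```

If the first byte does not have the L bit set, there are no layer indices at all, which matches the stats above showing everything in layer 0.0.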

Yes, rtpvp9pay, and yes, so the RTP part is what’s missing here.
The video itself has multiple layers, but mediasoup has no way to know about them.

I think the problem is the missing support for SVC headers in the rtpvp9pay GStreamer element.
You should report the issue to the GStreamer developers.
In the meantime, you could mangle the RTP packets created by rtpvp9pay to add the required fields.
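A hypothetical TypeScript sketch of that mangling, assuming the VP9 payload descriptor starts at byte 0 of each RTP payload and rtpvp9pay did not set the L bit. The caller is assumed to know each frame’s temporal layer id (for example from the encoder’s ts_layer_id pattern), to increment tl0PicIdx on every TL0 frame, and to use the same values for all packets of a frame. The scalability structure (V bit) is not touched, and whether to set U is left to the caller:

```typescript
// Returns a copy of the payload with the L bit set and the layer indices
// byte inserted right after the picture ID (plus TL0PICIDX in non-flexible mode).
function addVp9LayerIndices(
  payload: Uint8Array,
  temporalId: number,
  switchingUp: boolean,
  tl0PicIdx: number,
): Uint8Array {
  if (payload.length === 0) return payload;
  const b0 = payload[0];
  if ((b0 & 0x20) !== 0) return payload; // L bit already present
  let insertAt = 1;
  if ((b0 & 0x80) !== 0) {
    // I bit: skip the 7-bit or (M bit set) 15-bit picture ID.
    insertAt += (payload[1] & 0x80) !== 0 ? 2 : 1;
  }
  const flexible = (b0 & 0x10) !== 0; // F bit
  // TID | U | SID | D, with SID = 0 and D = 0 for a single spatial layer.
  const layerByte = ((temporalId & 0x07) << 5) | (switchingUp ? 0x10 : 0);
  // Non-flexible mode carries TL0PICIDX right after the layer indices byte.
  const extra = flexible ? [layerByte] : [layerByte, tl0PicIdx & 0xff];
  const out = new Uint8Array(payload.length + extra.length);
  out.set(payload.subarray(0, insertAt), 0);
  out.set(extra, insertAt);
  out.set(payload.subarray(insertAt), insertAt + extra.length);
  out[0] = b0 | 0x20; // set the L bit
  return out;
}

// Non-flexible payload with a 15-bit picture ID, tagged as TID 2, TL0PICIDX 7.
const mangled = addVp9LayerIndices(new Uint8Array([0x80, 0x81, 0x23, 0xaa, 0xbb]), 2, true, 7);
console.log(Array.from(mangled, (b) => b.toString(16)));
```

The RTP header itself (sequence numbers, timestamp, marker bit) stays as rtpvp9pay wrote it; only the payload grows by one or two bytes.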