PlainTransport Recovery from RTP interruption

We’re uncertain if this is a demo app issue or a mediasoup library characteristic…we suspect it’s in the C code.

We’re broadcasting a PlainTransport simulcast stream over RTP/UDP + RTCP/UDP into a room. Everything works as expected, and we’ve been working with this scenario for many weeks.

Recently we began testing what happens when our inbound UDP stream pauses, and we’ve encountered a situation that we cannot seem to configure our way out of…

If the inbound RTP/UDP stream is paused momentarily (even for just a second, and even with dtx set to true), the stream never recovers within mediasoup.

What we see is a server side (mediasoup) announcement that RTP inactivity has been detected, and all scores for the various SSRC streams are set to 0 (which is expected), but it doesn’t kick back in.

Watching the REMOTE video stats, we can see that before the intentional break in the UDP traffic, the SSRC stream is tracked, but it is removed after the break, even though the stream continues.

Our ultimate goal is to allow a broadcast to be “in the room” all the time…meaning, we’d like active, open inbound UDP ports for RTP and RTCP for video and audio, regardless of the status of the streams, so if we pause for several seconds (or even minutes), and then resume the stream, the broadcast would continue.

We’ve tested this same scenario using gstreamer point to point, using the same pathway, causing the same break in RTP/UDP traffic and then restoring it, and although we get artifacts (expected), it continues as expected.

We would assume the open ports would remain available for inbound RTP traffic.

With mediasoup, it seems to place the broadcast in some kind of state where it won’t recover even though traffic is present (we’re watching it in wireshark).

Here’s before…

and here’s after…(well, we can’t add more than 1 pic, but trust me, the second block referencing ssrc 2221 is gone)

We’d love to figure out how to prevent the broadcast from stopping - the actual on-screen display of the producer is there on the demo app, but it doesn’t recognize an inbound SSRC after a brief pause in the traffic.

mediasoup does not handle simulcast streams differently depending on whether they use a WebRtcTransport or a PlainTransport. Simulcast handling obviously works fine for WebRTC clients (browsers), and hence I’m pretty sure you are having a problem in the way you are producing simulcast into a PlainTransport. It may be related to wrong sequence numbers in the RTP packets after you “resume” them, or maybe the RTCP Sender Reports you are sending to mediasoup do not have the same synchronization source/timing, so it’s not possible for the SimulcastConsumer to switch between them because those streams are not properly correlated.

Tip here: sending simulcast is not just about sending 3 separate streams.

So here are some things that IMHO you should check before assuming there is a bug in mediasoup (there is not, I promise):

  1. Just produce simulcast via plain transport. Don’t even consume it yet.
  2. Enable rtp and rtcp log tags in the Worker and check them when you “pause” and “resume” your simulcast streams over a plain transport.
  3. Check the server side producer.getStats() before pausing and after resuming. You should see a proper increasing bitrate in all the simulcast streams.
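
One way to make the comparison in step 3 less error-prone is to diff two stats snapshots per SSRC. The sketch below is only an illustration: `compareProducerStats` is a hypothetical helper name, and it assumes each entry returned by producer.getStats() carries `ssrc`, `byteCount`, `bitrate` and `score` fields.

```javascript
// Compare two snapshots returned by `await producer.getStats()`:
// one taken before the pause and one taken after the resume.
// Assumption: every stats entry has ssrc, byteCount, bitrate and score.
function compareProducerStats(before, after) {
  const previous = new Map(before.map((entry) => [entry.ssrc, entry]));
  const seenAfter = new Set(after.map((entry) => entry.ssrc));

  const streams = after.map((entry) => {
    const old = previous.get(entry.ssrc);
    // Bytes received between the two snapshots (whole count if the SSRC is new).
    const byteDelta = old ? entry.byteCount - old.byteCount : entry.byteCount;
    return {
      ssrc: entry.ssrc,
      score: entry.score,
      bitrate: entry.bitrate,
      byteDelta,
      flowing: byteDelta > 0,
    };
  });

  // SSRCs that were present before the pause but are gone afterwards.
  const missing = before
    .map((entry) => entry.ssrc)
    .filter((ssrc) => !seenAfter.has(ssrc));

  return { streams, missing };
}
```

Typical use would be to call `await producer.getStats()` once before the interruption and once a few seconds after resuming, then pass both arrays in. Any SSRC listed in `missing`, or any stream with `flowing: false`, is the one to look for in the rtp and rtcp logs.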

If everything is good (that is, if stats are ok and you do NOT see any rtp or rtcp debugging error in the logs while pausing/resuming), repeat the scenario by creating a Consumer to consume it:

  1. Ensure the Consumer is initially consuming the highest simulcast stream via consumer.currentLayers.
  2. Call consumer.setPreferredLayers({ spatialLayer: 0 }) to just receive the lowest stream. Does it work? Does the consumer side properly receive it?
  3. Call consumer.setPreferredLayers({ spatialLayer: 3 or max value }) to receive the highest stream (assuming the producer is sending it). Does it work? Does the consumer side properly receive it?
  4. If bullets 2 or 3 do not work, enable simulcast and bwe log tags in the Worker and check the logs.
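
The layer checks in steps 2 and 3 can be scripted so that each switch either succeeds or fails visibly. This is a rough sketch, not part of mediasoup: `requestSpatialLayer` and its timeout are hypothetical, and it relies on the Consumer exposing `currentLayers`, `setPreferredLayers()` and a "layerschange" event.

```javascript
// Ask a server-side Consumer for one spatial layer and wait until
// mediasoup reports (via "layerschange") that it is actually sending it.
function requestSpatialLayer(consumer, spatialLayer, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const current = consumer.currentLayers;
    if (current && current.spatialLayer === spatialLayer) {
      resolve(current); // already on the requested layer
      return;
    }

    const cleanup = () => {
      clearTimeout(timer);
      consumer.off('layerschange', onLayersChange);
    };

    const onLayersChange = (layers) => {
      if (layers && layers.spatialLayer === spatialLayer) {
        cleanup();
        resolve(layers);
      }
    };

    // Give up if the switch never happens (for example, no key frame arrives).
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`no switch to spatial layer ${spatialLayer} within ${timeoutMs} ms`));
    }, timeoutMs);

    consumer.on('layerschange', onLayersChange);
    consumer.setPreferredLayers({ spatialLayer }).catch((error) => {
      cleanup();
      reject(error);
    });
  });
}
```

With four encodings, as in this thread, the lowest layer would be requested with `await requestSpatialLayer(consumer, 0)` and the highest with `await requestSpatialLayer(consumer, 3)`. A timeout on either call is the cue to enable the simulcast and bwe log tags.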

Some other questions:

  • Is your plain RTP sender properly handling RTCP PLI or FIR requests from mediasoup to generate key frames? This is needed for SimulcastConsumer to be able to switch from one stream to another.
  • Does your RTP sender really implement PLI or FIR and are you announcing it properly into the rtpParameters given to plainTransport.produce()?
  • Does it work if you DO NOT produce simulcast but a single stream and you pause and resume the server side Consumer?
  • Without pausing the RTP sender and with simulcast enabled on it: does it work if you call consumer.pause() and then consumer.resume() on the server side?
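
For reference, announcing PLI and FIR support in the rtpParameters given to plainTransport.produce() could look roughly like the sketch below. The helper name `buildSimulcastVideoRtpParameters` is hypothetical, and the VP8 codec is purely an assumption (the thread does not say which video codec is in use); the SSRCs, payload type 96 and DTX flag mirror what is discussed here.

```javascript
// Build video rtpParameters for SSRC-based simulcast over a PlainTransport.
// Assumption: VP8 at 90 kHz; swap in the codec the sender really uses.
function buildSimulcastVideoRtpParameters(ssrcs, payloadType = 96) {
  return {
    codecs: [
      {
        mimeType: 'video/VP8', // assumption, not stated in the thread
        payloadType,
        clockRate: 90000,
        // Only announce feedback the sender actually implements.
        rtcpFeedback: [
          { type: 'nack', parameter: 'pli' }, // Picture Loss Indication
          { type: 'ccm', parameter: 'fir' },  // Full Intra Request
        ],
      },
    ],
    // One encoding per simulcast stream, lowest quality first.
    encodings: ssrcs.map((ssrc) => ({ ssrc, dtx: true })),
  };
}

const rtpParameters = buildSimulcastVideoRtpParameters([2220, 2221, 2222, 2223]);
```

The resulting object would then be passed as `plainTransport.produce({ kind: 'video', rtpParameters })`. If the sender cannot produce a key frame on request, announcing these feedback types will not help, so it is worth checking both sides.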

I appreciate the debugging support.

Let me preface this by saying that we’ve been sending simulcast RTP + RTCP streams into mediasoup successfully for weeks. We can see all of the spatial layers, the consumer can adjust them on the fly, and the system detects delays or packet loss and selects the appropriate layer automatically.

That all works. Kudos!!

Our “broadcast” is performed by first opening up a room using the demo app, so we have a room already established, then we go through the handshake over REST to set up the broadcast ports and send it along for the ride.

Our broadcast is generated using gstreamer as an outside source, very similar to your gstreamer.sh example, but a bit more robust.

This too all works flawlessly.

Finally, before I provide this information based on your debugging advice (which is a work in progress), I want to be clear we’re not saying there’s a bug within mediasoup. There may simply be a behavior that works one particular way, and we’re trying to understand it in the event we want to tune it to our specific needs.

By setting the logLevel and logTags accordingly, we did see some interesting information.

I do believe the information is enlightening however…

For our test, we’re only interrupting one of the streams - we send 4 of them…

A multiplexed Video RTP stream (0), with each stream having a separate SSRC.
A Video RTCP stream (1), also multiplexed for each SSRC.
A non-multiplexed Audio RTP stream (2)
A companion Audio RTCP stream (3).

So we use 4 ports that are typical in this setup.

Again, these all work exactly as expected…until we sever the connection…for this experiment, we only broke the connection for stream 0 - which means the Video RTCP stream remained intact along with the Audio streams.

On initial transmission, we see the following…

mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | setting RTP tuple (comedia mode enabled) +2s
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | [encodingIdx:1, ssrc:2221, rid:, payloadType:96] +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | DTX enabled +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | [encodingIdx:0, ssrc:2220, rid:, payloadType:96] +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | DTX enabled +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | [encodingIdx:2, ssrc:2222, rid:, payloadType:96] +1ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | DTX enabled +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | [encodingIdx:3, ssrc:2223, rid:, payloadType:96] +0ms
mediasoup:Channel [pid:27541] RTC::Producer::CreateRtpStream() | DTX enabled +0ms

Looks good.

On intentional break of stream 0…

mediasoup-demo-server:Room transport "trace" event [transportId:d9b5a7cf-8a68-402b-b09f-df2c575abd91, trace.type:bwe, trace:{ direction: 'out', info: { availableBitrate: 5400000, desiredBitrate: 4000000, effectiveDesiredBitrate: 4000000, maxBitrate: 5400000, maxPaddingBitrate: 4590000, minBitrate: 30000, startBitrate: 5400000, type: 'transport-cc' }, timestamp: 2782941264, type: 'bwe' }] +184ms
mediasoup:WARN:Channel [pid:27541] RTC::RtpStreamRecv::OnTimer() | RTP inactivity detected, resetting score to 0 [ssrc:2222] +2m
mediasoup:WARN:Channel [pid:27541] RTC::RtpStreamRecv::OnTimer() | RTP inactivity detected, resetting score to 0 [ssrc:2223] +1ms
mediasoup:WARN:Channel [pid:27541] RTC::RtpStreamRecv::OnTimer() | RTP inactivity detected, resetting score to 0 [ssrc:2221] +0ms
mediasoup:WARN:Channel [pid:27541] RTC::RtpStreamRecv::OnTimer() | RTP inactivity detected, resetting score to 0 [ssrc:2220] +0ms
mediasoup-demo-server:Room protoo Peer "request" event [method:getTransportStats, peerId:wr1eqafd] +2s

As expected…

Now, resuming the stream is not starting the stream over, it’s simply us allowing the data to flow again - the source streams have not stopped…so any timing or sequence numbers in the RTP and RTCP packets should be aligned.

After resuming stream 0 (allowing it to flow to the port again), we see this output flying up the screen. It literally overtakes the debug output and continues indefinitely until the room is destroyed…

mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms
mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms
mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms
mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms
mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms
mediasoup:Channel [pid:27541] RTC::PlainTransport::OnRtpDataReceived() | ignoring RTP packet from unknown IP:port +0ms

That’s really interesting, as we don’t use the data channel at all, and we’ve not renegotiated anything - we’ve just interrupted the Video RTP output going to the inbound mediasoup port (in this case 46565) for about 8 seconds. All the other streams were flowing.

Incidentally, if we turn dtx off, and we stop the stream for just 1 second, and restart, the same thing happens.

So I think I may know what’s causing this, so let me explain and maybe you can help me understand if there are any configuration options we should examine…

In the evolution of our system, we’ve introduced a middleware server that handles some stream negotiations…

Our source stream is at location A, which streams into our middleware, and then the middleware takes the stream and sends it to B (mediasoup).

In our “disconnect” operation, we sever the B side. The path goes from gstreamer into A, then A to B, then B to mediasoup, and we temporarily shut down B (although we’ve also done this test shutting down the A side).

So for example…

Let’s say we took stream 0, and we piped that into port 39000, which sends it to our middleware on port 3900, and then our middleware takes the stream and sends it to Port X on mediasoup.

My guess is if we shut down our A side or B side, they renegotiate, and by doing so, the connection on the B side to Port X is a different source port. It’s the same IP, it’s the same destination (Port X), but because A reconnects to B, we don’t force our “source port” a certain way, so when the stream continues to Port X our IP:source port combination is different.

That would explain why it doesn’t pick up the stream. It stops it in its tracks.

Sounds like to resolve this issue, we need to fix our source port side on the B side.

It seems that, after resuming, gstreamer is sending RTP from a new source port, which mediasoup treats as invalid. Even when comedia mode is set in the plain transport, it only accepts RTP from the IP:port from which it received the first RTP packet.
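
To make that behavior concrete, here is a toy model of the latching described above. It is not mediasoup’s actual worker code; the class name `TupleLatch` and the addresses and ports are made up for illustration.

```javascript
// Toy model of comedia-style latching: the first sender tuple wins,
// and packets from any other IP:port are ignored afterwards.
class TupleLatch {
  constructor() {
    this.tuple = null; // remote IP:port learned from the first packet
  }

  // Returns true if a packet from address:port would be accepted.
  accept(address, port) {
    if (this.tuple === null) {
      this.tuple = { address, port }; // "setting RTP tuple"
      return true;
    }
    // Anything else is "ignoring RTP packet from unknown IP:port".
    return this.tuple.address === address && this.tuple.port === port;
  }
}

const latch = new TupleLatch();
const firstPacket = latch.accept('10.0.0.5', 50000);    // latches the tuple
const sameSource = latch.accept('10.0.0.5', 50000);     // same source, accepted
const afterReconnect = latch.accept('10.0.0.5', 50001); // same IP, new port, rejected
```

In this model a sender that reconnects from the same IP but a different source port is rejected in the same way as a completely unknown host, which matches the log lines shown earlier in the thread.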

Correct. I was typing my response above when you responded!

Right on!

I’m going to see if we can modify our relay code to bind our source port on the B side - if so, then I would think the stream can resume.

This makes sense now, because we saw the same issue on the Audio side, and it would have nothing to do with simulcast or PLI requests.

I’ll try that and report back.

Thanks so much ibc!!

Just learning how to debug this thing is priceless.

That worked.

We modified how we bind on the output side, so when we resume it maintains the same original source port and the system just picked back up where it left off.
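
For anyone hitting the same problem, a fixed source port on the sending side can be sketched as follows. This assumes a Node.js relay, which may not match the relay used here; `createFixedPortSender` and `forward` are hypothetical helper names.

```javascript
const dgram = require('node:dgram');

// Create a UDP socket bound to a fixed local (source) port, so every
// packet sent through it leaves from the same IP:port, even after the
// relay is stopped and restarted.
function createFixedPortSender(localPort, localAddress = '0.0.0.0') {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4');
    socket.once('error', reject); // e.g. EADDRINUSE if the port is taken
    socket.bind(localPort, localAddress, () => {
      socket.removeListener('error', reject);
      resolve(socket);
    });
  });
}

// Forward one RTP packet (a Buffer) to the mediasoup PlainTransport port.
function forward(socket, packet, remotePort, remoteAddress) {
  socket.send(packet, remotePort, remoteAddress);
}
```

The important part is that the relay explicitly binds before sending, rather than letting the operating system pick a new ephemeral port on every reconnect. With gstreamer itself, the equivalent would be to pin the source port on the sending element instead of leaving it automatic.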

Many thanks ibc.

Nice :)