Track if Producers or Consumers are alive

Hi all,
In some cases (e.g. to clean up “stale” transports when a client has stopped sending or receiving RTP packets for longer than a timeout), it would be useful to track whether a Producer or a Consumer is receiving/sending packets. I tried enableTraceEvent, but I think it could degrade server performance.
How about introducing a lastRtpProcessedTime property (or similar) for each Producer and Consumer? Or, even better, an rtpInactive event?
Thanks


I’m trying to use transport.getStats() and track the bytesSent and bytesReceived values.
I’m seeing that, even if the browser that created the transport is closed (without closing the transport on the server), the bytesSent value keeps increasing periodically (RTP probation packets?).

They may be RTP probation packets or regular RTP packets. If the server side transport and/or its consumers were not closed, it’s expected that mediasoup will keep sending RTP to the remote endpoint.

This would be as inefficient as using the trace event, since we would have to send a message from C++ to Node for every received/sent RTP packet just to report such a lastRtpProcessedTime.

Sounds better, but this is not so easy. What about simulcast streams? What if no RTP is sent, just SCTP messages? Should it be a per-Producer event or a per-Transport event?

Could you please describe your scenario and use case in a bit more detail, and why you need this?

I need a mechanism to cleanup the transports that are not sending or receiving data from the corresponding client endpoint. For the moment, I’m using this code that tracks the bytesReceived value:

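    // Poll transport stats and close the transport after ~10 s without inbound bytes
    transport.appData.stats = {
      bytesSent: 0,
      bytesReceived: 0,
      lastActivity: Date.now(), // start the inactivity window now, not at epoch 0
    };
    transport.appData._getStats_t = setInterval(async () => {
      let stats;
      try {
        stats = await transport.getStats();
      } catch (error) {
        return; // transport was closed while the request was in flight
      }
      const now = Date.now();
      transport.appData.stats.bytesSent = stats[0].bytesSent;
      if (stats[0].bytesReceived > transport.appData.stats.bytesReceived) {
        // new inbound data since the last poll
        transport.appData.stats.bytesReceived = stats[0].bytesReceived;
        transport.appData.stats.lastActivity = now;
      } else if (now - transport.appData.stats.lastActivity > 10 * 1000) {
        // inactivity detected, close the transport (close() is synchronous)
        transport.close();
      }
    }, 5 * 1000);
    // stop polling once the transport is closed, whatever the reason
    transport.observer.on('close', () => clearInterval(transport.appData._getStats_t));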

I also believe such a mechanism would be beneficial. I use the signaling mechanism to detect if a peer has disconnected, which works but it’s not ideal.

Please open a feature request on GitHub and we’ll consider it when possible (we’re terribly busy right now).


BTW, isn’t receiving a score of 0 in producer.on("score", fn) and consumer.on("score", fn) an indication that those peers are not sending/receiving anything, and thus a possible mechanism for cleaning up?

@copiltembel: I don’t think it’s ideal: in my tests, when I stop a consumer (closing the browser window without sending any message to the signaling service), I get no score events on the server side.

@admins: the problem could be solved by implementing something like the AudioLevelObserver (https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/AudioLevelObserver.cpp).

In my opinion, an AudioLevelObserver-like approach might not be better, since it would cost some CPU. But you can do something along the lines of what mediasoup-demo does: it uses a websocket for every connected participant, and when the websocket disconnects (e.g. the tab is closed), the server releases all of that participant’s resources, both consumers and producers.
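A minimal sketch along the lines of that approach, tying a participant’s transports to its signaling socket. It assumes a socket object that emits `close` (as a `ws` WebSocket does) and a hypothetical per-participant `peer` record; `createPeer`, `addTransport`, `closePeer` and `bindPeerToSocket` are illustrative names, not mediasoup-demo’s actual code. In mediasoup v3, closing a transport also closes the producers and consumers created on it, so closing the transports is enough.

```javascript
// Hypothetical per-participant record holding its mediasoup transports.
function createPeer(id) {
  return { id, transports: new Map(), closed: false };
}

// Register a transport (anything with `id` and `close()`) for this peer.
function addTransport(peer, transport) {
  peer.transports.set(transport.id, transport);
}

// Release everything the peer owns; safe to call more than once.
function closePeer(peer) {
  if (peer.closed) return;
  peer.closed = true;
  // Closing a transport also closes its producers and consumers.
  for (const transport of peer.transports.values()) {
    transport.close();
  }
  peer.transports.clear();
}

// Tie the peer's lifetime to its signaling socket (assumed to emit 'close').
function bindPeerToSocket(peer, socket) {
  socket.on("close", () => closePeer(peer));
}
```

The weak point raised just below still applies: this only works as well as the signaling layer’s ability to notice that the socket is gone.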

Yes, this is the ideal case. But what if the websocket server fails to handle the client disconnection?

My opinion is that your effort should go into making the client disconnection handling reliable in that case.

Looking at RTP reception to decide whether an endpoint is alive is not meaningful unless you control the nature of your endpoints and your mediasoup usage. If you are certain that N (let’s say 10) seconds of RTP inactivity can only mean that the endpoint is not alive, then there you are. Of course, you must also take probation, etc., into account if it’s being used.

For more generic and reliable situations I would not rely on RTP reception for considering an endpoint not alive, but use another means for checking periodically whether the endpoint is alive.

Then you have a bigger problem, since without signaling your app will lose notifications about new consumers, etc.

We cannot do magic with RTP activity. RTP can drop to 0 if DTX (Discontinuous Transmission) is used. Nothing says RTP must be continuous. We cannot assume on the server side that a temporary lack of RTP activity means an error. For instance, if you mute your mic or share static video content, it’s perfectly fine for your endpoint not to send any new RTP packets until something changes.

If you want to monitor from server side whether the remote endpoint is alive or not (at WebRTC / ICE level), you can do that with some kind of ping/pong messages over DataChannel.
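A minimal, transport-agnostic sketch of such a ping/pong check. It assumes the client answers every `{"type":"ping"}` message with `{"type":"pong"}`; the message format, `HeartbeatMonitor`, `send`, `onDead` and the 10 s default are hypothetical. On the mediasoup side, `send` could for example wrap `dataProducer.send()` on a DataProducer created on a DirectTransport, with pongs arriving as text through a DataConsumer’s `message` event, but any channel (including the signaling socket) works the same way.

```javascript
// Server-side liveness check based on ping/pong messages.
class HeartbeatMonitor {
  constructor({ send, onDead, timeoutMs = 10000, now = Date.now }) {
    this.send = send; // function(string): delivers a message to the endpoint
    this.onDead = onDead; // called once when the endpoint is considered dead
    this.timeoutMs = timeoutMs;
    this.now = now; // injectable clock, in milliseconds
    this.lastPong = now(); // give the endpoint a full window after creation
    this.dead = false;
  }

  // Call periodically (e.g. from setInterval): checks the last pong,
  // then sends a new ping if the endpoint is still considered alive.
  tick() {
    if (this.dead) return;
    if (this.now() - this.lastPong > this.timeoutMs) {
      this.dead = true;
      this.onDead();
      return;
    }
    this.send(JSON.stringify({ type: "ping", ts: this.now() }));
  }

  // Call for every text message received from the endpoint.
  handleMessage(raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch {
      return; // not a heartbeat message
    }
    if (msg && msg.type === "pong") this.lastPong = this.now();
  }
}
```

Unlike RTP inactivity, a missing pong is unaffected by DTX, muting or static content, since the client replies regardless of what media it is sending.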

Maybe the issue here is assuming a strong coupling between an actual fact (no RTP packets have been seen in N seconds) and its particular semantics (has the remote endpoint abruptly stopped sending data, and should it be considered dead? That’s just one possible interpretation…).

Just to give another perspective on how this could work: in Kurento we have a flow detection probe connected to the audio and video input/output pads of the WebRtcEndpoints. These probes simply trigger an event when data actually flows in (or out) through these Endpoint pads, to/from the rest of the internal pipeline, and the event represents one of two state changes:

  • From Not Flowing to Flowing: if the last state had been “Not Flowing” but now a data buffer has passed through the pad.
  • From Flowing to Not Flowing: if the last state had been “Flowing” but now 2 seconds have elapsed without any other buffer passing through the pad.

Now, it is up to the application to decide what to do with these events, but at least having them allows for fine-grained analysis and lets the app detect unexpected situations.
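The two transitions above can be sketched as a small edge-triggered state machine. The class name, the explicit `check()` polling and the injectable clock are assumptions for illustration; this is neither Kurento’s nor mediasoup’s API. Because it reports only state changes, something like this running next to the media path would avoid the per-packet messaging cost discussed earlier in the thread.

```javascript
// Edge-triggered flow detector: reports only Flowing/NotFlowing changes.
class FlowDetector {
  constructor({ onChange, idleMs = 2000, now = Date.now }) {
    this.onChange = onChange; // called with the new state name
    this.idleMs = idleMs; // silence needed to switch to NotFlowing
    this.now = now; // injectable clock, in milliseconds
    this.state = "NotFlowing";
    this.lastSeen = 0;
  }

  // Call whenever a buffer/packet passes through the monitored point.
  onData() {
    this.lastSeen = this.now();
    if (this.state === "NotFlowing") {
      this.state = "Flowing";
      this.onChange(this.state);
    }
  }

  // Call periodically (e.g. every few hundred ms) to detect idleness.
  check() {
    if (this.state === "Flowing" && this.now() - this.lastSeen >= this.idleMs) {
      this.state = "NotFlowing";
      this.onChange(this.state);
    }
  }
}
```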

Having these events has proven very useful for providing applications with confirmation coming directly from the media plane (in combination with app-specific logic from the signaling plane). Of course, trivial logic such as assuming “Not Flowing == Disconnected” is too simplistic. But more detailed logic can be programmed to check the facts and decide what to do.

(e.g. for people using this event on the main speaker’s (teacher’s) endpoint in a teacher-student app, it is typically adequate to assume that this flow should not suddenly halt in the middle of the session, so if a NotFlowing event occurs, the endpoint gets reconnected. However, this doesn’t necessarily apply to a screenshare, because NotFlowing events can occur when the desktop is static.)
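The same NotFlowing fact can map to different decisions per stream, as in the teacher/screenshare example. A sketch with hypothetical stream kinds and action names; the microphone case reflects that DTX or muting can legitimately stop RTP.

```javascript
// Map a flow state change to an application action, per stream kind.
function decideOnFlowChange(stream, state) {
  if (state === "Flowing") return "none";
  switch (stream.kind) {
    case "main-speaker":
      // Expected to flow continuously for the whole session.
      return "reconnect";
    case "screenshare":
      // A static desktop can legitimately stop producing frames.
      return "none";
    case "microphone": // may be muted or using DTX
    default:
      // Not enough evidence to act; just record it.
      return "log";
  }
}
```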

I agree with what ibc wrote. I implement this using a ping/pong over the websocket and time it out after a few seconds of inactivity on both the client and server end. I also hook in a reconnection attempt with the timeout, websocket close/error, and reconnection timeout events and it works surprisingly well. When a client is a mobile device and transitions from cellular to wifi, for example, the reconnection happens within seconds and there’s minimal downtime. Works better than popular video conferencing apps in my own biased opinion.

Unfortunately this method gets pretty janky on unstable connections (lots of unnecessary reconnections/interruptions because I have the timeout set pretty low). However, given the poor signal quality I’m sure the video would be garbage anyway even if I wasn’t forcibly closing it.

I use a similar method with socket.io timeouts (set to 30 seconds). Also, if the socket has a ping timeout or a transport error, I allow 10 seconds for the client to reconnect on another socket and reclaim all its transports. If the socket disconnects gracefully, I clean up all the transports immediately.
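A sketch of that grace-period logic, assuming socket.io’s server-side `disconnect` reason strings (`"ping timeout"`, `"transport error"`, etc.) and some stable session id the client presents again when it reconnects; `SessionRegistry`, `cleanup` and the session id scheme are hypothetical. socket.io also reports `"transport close"` when, for instance, the network changes, so adding it to the grace set may be worth considering for mobile clients.

```javascript
// Disconnect reasons that get a grace period instead of immediate cleanup.
const GRACE_REASONS = new Set(["ping timeout", "transport error"]);

class SessionRegistry {
  constructor({
    cleanup, // function(sessionId): closes all transports of the session
    graceMs = 10000,
    timers = {
      setTimeout: (fn, ms) => setTimeout(fn, ms),
      clearTimeout: (handle) => clearTimeout(handle),
    },
  }) {
    this.cleanup = cleanup;
    this.graceMs = graceMs;
    this.timers = timers; // injectable for testing
    this.pending = new Map(); // sessionId -> grace timer handle
  }

  onDisconnect(sessionId, reason) {
    // Drop any grace timer left over from an earlier disconnect.
    this.cancelPending(sessionId);
    if (!GRACE_REASONS.has(reason)) {
      // Graceful (or unexpected) disconnect: release resources right away.
      this.cleanup(sessionId);
      return;
    }
    const handle = this.timers.setTimeout(() => {
      this.pending.delete(sessionId);
      this.cleanup(sessionId);
    }, this.graceMs);
    this.pending.set(sessionId, handle);
  }

  // A new socket presented the same session id: keep the old transports.
  reclaim(sessionId) {
    return this.cancelPending(sessionId);
  }

  cancelPending(sessionId) {
    const handle = this.pending.get(sessionId);
    if (handle === undefined) return false;
    this.timers.clearTimeout(handle);
    this.pending.delete(sessionId);
    return true;
  }
}
```

It would be wired as `socket.on("disconnect", (reason) => registry.onDisconnect(sessionId, reason))` on the server, with `registry.reclaim(sessionId)` called when the client reconnects.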

All of my clients are on mobile devices so changing networks and IP addresses is pretty common. This method helps with the “walk-out-the-door” problem.

That’s a cool way of doing it. When you reclaim existing transports, do you check whether the client’s IP is identical, to make sure they have the correct transport endpoint? I can see a mobile client dropping off wifi, getting a cellular connection right away and reconnecting, while incoming streams are still being sent to the old endpoint.

I think if your server detects the client IP address has changed, you’re supposed to restart ICE.
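One way to observe such address changes on the server is to watch the selected ICE tuple. A sketch assuming mediasoup v3’s `WebRtcTransport` `iceselectedtuplechange` event, whose tuple carries `remoteIp` and `remotePort`; `watchRemoteAddress` and `notifyClient` are hypothetical names. If the application then decides an ICE restart is needed, mediasoup’s `transport.restartIce()` resolves to new ICE parameters that the client can apply with mediasoup-client’s `transport.restartIce({ iceParameters })`.

```javascript
// Report changes of the remote IP in a WebRtcTransport's selected tuple.
// `notifyClient` is a hypothetical helper (log, signaling message, ...).
function watchRemoteAddress(transport, notifyClient) {
  let lastIp = null;
  transport.on("iceselectedtuplechange", (tuple) => {
    // The first selection only records the address.
    if (lastIp !== null && tuple.remoteIp !== lastIp) {
      notifyClient({
        type: "remote-ip-changed",
        transportId: transport.id,
        from: lastIp,
        to: tuple.remoteIp,
        port: tuple.remotePort,
      });
    }
    lastIp = tuple.remoteIp;
  });
}
```

Logging these events is also a cheap way to check what actually happens to the selected tuple when a mobile client switches networks.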

Alternatively, if you’re using the native libwebrtc library (not through the browser), you can set an option called “continual gathering”. This isn’t a standardized part of WebRTC and I’m not sure it would even work with mediasoup, but it’s supposed to make switching networks more graceful.
https://webrtc.googlesource.com/src/+/refs/heads/master/sdk/objc/api/peerconnection/RTCConfiguration.h#53

That being said, I don’t do any of these and everything still seems to work. It’s on my TODO list to dig into why. Maybe mediasoup doesn’t care if the source IP address on the selected tuple changes mid-stream? I dunno. If anyone else on this thread has some insight on this, I’d love to hear it.