Transport connectionstate never changes to 'failed', stuck on 'disconnected'

I am working on reconnection logic to bring a user back into a call if the peer connection fails because of network conditions.

The issue: during a call, when I turn off the network, the transport connection state changes to 'disconnected' but never moves on to 'failed'.

I have checked the underlying peer connection using chrome://webrtc-internals; it correctly goes to the 'failed' state after 'disconnected'.

Can you please guide me on what the issue might be?

Also, I was able to access the original RTCPeerConnection object under transport > handler > _pc, and I can add a connectionstatechange listener on it. Is this reliable? Will it be available in all cases?

Just my two cents:

When a user requests a transport, there should ideally be a timeout set up to safely close the transport and alert the user with an "audio/video could not be connected at this time; try again" error. If the transport is not connected within several seconds, fire this timeout.

With each transport required to be either connected once or closed, you can watch the connectionstate at the client level. If it hits 'disconnected', wait several seconds for recovery (unless the user closed the broadcast, etc.); if recovery doesn't occur and the stream is still hosted, you can request ICE every so often until the connection returns to solid.

If the user disconnects entirely from the websocket, you should completely destroy any transports they were using.
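A minimal sketch of that cleanup, assuming a `Transports` map of room → handle → `{ transport }` as in the server snippet further down (the function name and map shape are mine, not mediasoup API):

```javascript
// Hypothetical cleanup when a user's websocket drops entirely.
// `Transports` is assumed to be Map<room, Map<handle, { transport }>>.
function destroyUserTransports(Transports, room, handle) {
    const roomMap = Transports.get(room);
    if (!roomMap) return 0;
    const entry = roomMap.get(handle);
    if (!entry) return 0;
    entry.transport.close();            // mediasoup's transport.close() is synchronous
    roomMap.delete(handle);
    if (roomMap.size === 0) Transports.delete(room);
    return 1;                           // number of transports destroyed
}
```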


I was thinking of this one; I think it should do the job, as the connectionstate usually goes to 'failed' about 7 seconds after the 'disconnected' state. But why doesn't the transport change its connection state to 'failed' while the underlying peer connection has gone to 'failed'?

You are talking about transport.restartIce()? So in case of failure on the transport, I should restart ICE on both server and client, it will renegotiate the peer connection, and the streams will start to flow again, right? And I don't need to manually close and reopen the transport to make the streams flow again, right?
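For context, my understanding of that flow is roughly the following: the server-side WebRtcTransport.restartIce() returns fresh iceParameters, which the client feeds into its own transport.restartIce(). The `signal` helper here is hypothetical (whatever request/response mechanism your websocket layer provides):

```javascript
// Sketch of a client-driven ICE restart. `signal(method, data)` is assumed to
// send a request to the server and resolve with its response; on the server,
// the handler would call `await serverTransport.restartIce()` and return the
// resulting iceParameters.
async function recoverTransport(clientTransport, signal) {
    // Ask the server for fresh iceParameters from its side of the transport.
    const iceParameters = await signal("restartIce", { transportId: clientTransport.id });
    // Apply them locally; the transport itself stays open, no close/recreate.
    await clientTransport.restartIce({ iceParameters });
    return iceParameters;
}
```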

Is this a reliable way of accessing the base RTCPeerConnection? I couldn't find anything in the docs about this.

Here’s an example:

    // Close the transport if the user never connects it within 10 seconds.
    transport.appData.idletimer = setTimeout((wss, room, handle) => {
        console.log("User had disconnected before transport connected. Alert them to request again.");
        if (Transports.has(room)) {
            Transports.get(room).delete(handle);
            if (Transports.get(room).size === 0) Transports.delete(room);
        }
        transport.close(); // mediasoup's transport.close() is synchronous
        wss.send(JSON.stringify({
            "chat": "publish",
            "type": "closed",
            "room": room,
            "handle": handle
        }));
    }, 10000, WSS, transport.appData.room, transport.appData.handle);

    // Once DTLS connects, cancel the idle timer and notify the client.
    transport.on("dtlsstatechange", (dtlsState) => {
        if (dtlsState === "connected") {
            clearTimeout(transport.appData.idletimer);
            WSS.send(JSON.stringify({
                "chat": "publish",
                "type": "connected",
                "room": transport.appData.room,
                "handle": transport.appData.handle
            }));
        }
    });

    Transports.get(Received.room).set(Received.handle, {
        "transport": transport
    });

The media server does not need to monitor for anything more than 'connected' on the transports to ensure there are no issues; if the timeout goes off, we throw an error and let the user decide what to do from there (or code automatic handling).

Once the state reaches 'connected', we can tell the user they're connected and start monitoring the transport for 'disconnected'/'connected' states on the client side. Something like this:

    Transport.on("connectionstatechange", async (connectionState) => {
        switch (connectionState) {
            case "connected":
                // Cancel the delayed recovery mechanism; we're connected again.
                break;
            case "disconnected":
                // Start a delayed mechanism to repeatedly send ICE if no connection is made.
                break;
        }
    });

When we reach 'disconnected' and there wasn't a request to close the transport (publish/subscribe), we can schedule ICE to be sent out shortly in case self-recovery does not work. I generally repeat the request every 10 seconds, and when we hit 'connected' I stop and destroy any timeouts, etc.
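That watcher could be sketched like so; `requestIce` is a hypothetical callback that asks the server for an ICE restart, and the 10-second default matches the cadence above:

```javascript
// Watch a transport's connectionstatechange events; while 'disconnected',
// call requestIce() every intervalMs until the state returns to 'connected'.
function watchTransport(transport, requestIce, intervalMs = 10000) {
    let retryTimer = null;
    transport.on("connectionstatechange", (state) => {
        if (state === "connected" && retryTimer) {
            clearInterval(retryTimer);   // recovered: stop retrying
            retryTimer = null;
        } else if (state === "disconnected" && !retryTimer) {
            retryTimer = setInterval(requestIce, intervalMs);
        }
    });
    // Return a cleanup function for when the transport is closed on purpose.
    return () => { if (retryTimer) clearInterval(retryTimer); };
}
```

Remember to call the returned cleanup function when the user intentionally closes the transport, so a deliberate close is never mistaken for a failure.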


With all that said, you need to be careful with states. It's a bit confusing, but imagine you listened for 'disconnected' at the media-server level on the transport. What if the user left the room, or closed the broadcast, or did neither of those and is just lagging really hard? In the first two cases the chat server would have been prompted by an action; if the user is lagging, they'll be with us in a moment, unless they get kicked out by the chat server, in which case you completely destroy their session.

Thanks that makes sense.

The initial connection problem is not an issue for me; I am actually trying to handle the case where the user is connected and then disconnected mid-call, e.g. their network goes off.

I will use this to access the underlying peer connection, and when the connectionstate is 'failed' I will close the transport altogether and make a new one when the user is back online, to keep things simple on the server side. Hopefully transport > handler > _pc is available in all cases.

What do you think of this?

I told you what I thought; this applies to any state you're monitoring. If it's watched for at the wrong end of the service, reliability is tarnished.


I covered all use cases, take the sauce and run.

Ok, thanks for your time

Mediasoup's device handlers listen on the 'iceconnectionstatechange' event of the RTCPeerConnection and pass it to the transport (via an internal event), and the transport changes its state accordingly, unless it is closed. By the way, the state is 'failed'. Are you sure you didn't misspell it in the code?

Yes, I am sure the state is not 'failed' while it should be. The event is not even displayed in the console logs; if I enable debugging, it only shows the 'disconnected' state and nothing after that, but the actual RTCPeerConnection is in the 'failed' state if I dig deep to check it out.

No idea what the root cause of the issue is, but I was able to get the correct connectionstate this way.

And in case you add your own listener for ‘iceconnectionstatechange’ event on the _pc, does it receive an event with the ‘failed’ state?

Yes, it correctly receives it.

Then add a listener for the ‘@connectionstatechange’ internal event on handler, just for a check. Does it receive the ‘failed’ state?

Can’t listen to that event.

handler is an instance of EventEmitter from the events module, similar to the built-in one in Node.js. The methods for adding listeners are 'on' or its alias 'addListener' (not addEventListener as in EventTarget).

I checked with 'on' and 'addListener'; they don't work, but it surely works with handler._pc, so I am using that one for my use case.

This is how I did it using transport > handler > _pc:
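Roughly, something like this (note that `handler` and `_pc` are undocumented mediasoup-client internals, so this may break between versions and should be guarded):

```javascript
// Workaround sketch: reach into mediasoup-client internals to watch the raw
// RTCPeerConnection. `handler` and `_pc` are private fields, not public API.
function watchUnderlyingPc(transport, onFailed) {
    const pc = transport && transport.handler && transport.handler._pc;
    if (!pc) return false;               // internals unavailable in this version
    pc.addEventListener("connectionstatechange", () => {
        if (pc.connectionState === "failed") onFailed();
    });
    return true;
}
```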

Also see the detailed comment mentioned here: