Frequent transport ICE state and tuple changes in production

When testing in production, I’m seeing my server-side video producer transport frequently change ICE state and selected tuple. Listening to the transport’s icestatechange and iceselectedtuplechange events, I often see behavior like this in the logs:

2020-08-18T15:06:34.371-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE selected tuple changed. localIp: 0.0.0.0, localPort: 44409, remoteIp: 65.101.136.158, remotePort: 55948, protocol: tcp
2020-08-18T15:06:34.371-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE state changed to connected
2020-08-18T15:06:34.420-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE state changed to completed
2020-08-18T15:06:35.917-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE selected tuple changed. localIp: 0.0.0.0, localPort: 49063, remoteIp: 65.101.136.158, remotePort: 64882, protocol: udp
2020-08-18T15:06:35.922-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE selected tuple changed. localIp: 0.0.0.0, localPort: 44409, remoteIp: 65.101.136.158, remotePort: 55948, protocol: tcp
2020-08-18T15:06:36.009-07:00 Client f7113813-6462-422f-bf30-37f2987d02d2 producer transport ICE selected tuple changed. localIp: 0.0.0.0, localPort: 49063, remoteIp: 65.101.136.158, remotePort: 64882, protocol: udp

The ICE selected tuple alternates between tcp and udp. Most of the time the client then reports being permanently disconnected in its transport connectionstatechange listener.
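For anyone wanting to reproduce this kind of log, here is a minimal sketch of the listeners. It assumes a mediasoup v3 WebRtcTransport, whose icestatechange and iceselectedtuplechange events are the ones named above; `watchIce`, `clientId`, and the `stats` counters are hypothetical names added for illustration, not part of mediasoup.

```javascript
// Sketch only: attach ICE listeners to a mediasoup WebRtcTransport and
// count how often the selected tuple flips between protocols.
// `watchIce`, `clientId` and `stats` are illustrative names, not mediasoup API.
function watchIce(transport, clientId, log = console.log) {
  const stats = { tupleChanges: 0, protocolFlips: 0, lastProtocol: null };

  transport.on('icestatechange', (iceState) => {
    log(`Client ${clientId} producer transport ICE state changed to ${iceState}`);
  });

  transport.on('iceselectedtuplechange', (tuple) => {
    stats.tupleChanges += 1;
    // A flip is a change of protocol (tcp <-> udp) relative to the previous tuple.
    if (stats.lastProtocol !== null && stats.lastProtocol !== tuple.protocol) {
      stats.protocolFlips += 1;
    }
    stats.lastProtocol = tuple.protocol;
    log(
      `Client ${clientId} producer transport ICE selected tuple changed. ` +
      `localIp: ${tuple.localIp}, localPort: ${tuple.localPort}, ` +
      `remoteIp: ${tuple.remoteIp}, remotePort: ${tuple.remotePort}, ` +
      `protocol: ${tuple.protocol}`
    );
  });

  return stats;
}
```

A steadily growing `protocolFlips` count is a quick way to confirm the tcp/udp oscillation without reading the raw log lines.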

Note that I can occasionally get it to work, but the transport connection is very flaky.

Is this normal behavior? And if so, when should I be calling restartIce on the transport? Every time the ICE tuple changes on the server? Or when the client transport state gets stuck in disconnected for some defined timeout?

I’m testing in Chrome, and my server side transport initialization looks like:

const transport = await this.router.createWebRtcTransport({
  listenIps: [
    {
      ip: MEDIASOUP_LISTEN_IP,
      announcedIp: MEDIASOUP_ANNOUNCED_IP,
    },
  ],
  enableUdp: true,
  enableTcp: true,
  preferUdp: true,
  initialAvailableOutgoingBitrate: 1000000,
});

We’re deployed on AWS Fargate with Amazon ECS. Ports 40000 - 49999 are opened for both UDP and TCP.

Only network issues come to mind (given that it only happens in your deployment). Try with just UDP, then with just TCP, and see how it behaves.
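One way to run that comparison is to build the transport options from a single switch. This is only a sketch: `buildTransportOptions` and its `mode` argument are hypothetical helpers, while the option names (listenIps, enableUdp, enableTcp, preferUdp, initialAvailableOutgoingBitrate) are the ones from the original post.

```javascript
// Sketch only: produce createWebRtcTransport() options for a UDP-only,
// TCP-only, or combined test. `buildTransportOptions` and `mode` are
// illustrative names, not part of mediasoup.
function buildTransportOptions(mode, listenIp, announcedIp) {
  if (!['udp', 'tcp', 'both'].includes(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return {
    listenIps: [{ ip: listenIp, announcedIp }],
    enableUdp: mode !== 'tcp',
    enableTcp: mode !== 'udp',
    // Only meaningful when UDP is enabled.
    preferUdp: mode !== 'tcp',
    initialAvailableOutgoingBitrate: 1000000,
  };
}

// Possible usage inside the server code from the original post:
//   const transport = await this.router.createWebRtcTransport(
//     buildTransportOptions('udp', MEDIASOUP_LISTEN_IP, MEDIASOUP_ANNOUNCED_IP));
```

If the transport connects in one mode and not the other, that points at the network path for the failing protocol rather than at mediasoup itself.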

It’s not normal behavior.

No, you don’t need to call restartIce in either of those cases. restartIce() is only for when your client network changes (e.g. WiFi to 4G), so that you need to tell the PeerConnection to restart the entire connection.
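For the network-change case, the flow would look roughly like the sketch below. It assumes the server-side WebRtcTransport's restartIce() resolves with new ICE parameters and that the mediasoup-client Transport's restartIce({ iceParameters }) applies them; the two wrapper function names are hypothetical, and how the parameters travel between server and client depends on your own signaling.

```javascript
// Sketch only: the two halves of an ICE restart. The wrapper names are
// illustrative; the signaling between them is application specific.

// Server side: restart ICE on the mediasoup WebRtcTransport and return the
// new ICE parameters so they can be sent to the client.
async function restartIceOnServer(serverTransport) {
  return serverTransport.restartIce();
}

// Client side: apply the received parameters to the mediasoup-client Transport.
async function applyIceRestartOnClient(clientTransport, iceParameters) {
  await clientTransport.restartIce({ iceParameters });
}
```

The client would trigger this after it detects that its own network has changed, not in response to tuple changes seen on the server.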

Thanks for the reply.

OK, trying with just UDP fails entirely. The transport never fully connects; it just stays in connecting. Trying with just TCP works flawlessly. So it sounds like I need to flip another switch in our server network to make UDP work. We do have UDP ports 40000 - 49999 open, but I will poke around.

I’d say that your server receives UDP packets but cannot send UDP replies. Firewall stuff probably.

This was ultimately determined to be caused by the same issue described here: Using Chromium in a Docker container to connect to mediasoup in another container fails (with mediasoup >=3.5.8).

I was having the same problem as ghempton and whather. ICE selected tuples would oscillate back and forth between udp and tcp. I’m also running on ECS/Fargate/Alpine Linux.

Switching from node:14-alpine to just node:14 seemed to resolve the issue for me.