Weird frontend(?) bug: freezing videos in dockerized browser

Long story short

We open a mediasoup-based web app (either our own or the mediasoup demo) in a browser running inside a Docker container, and after some time (20-30 minutes) videos start to freeze one by one. There are no errors in the browser console and nothing suspicious in Firefox's about:webrtc.

Long story long

The challenge

We need to transmit mediasoup-based multi-user webinars to large audiences (thousands of people). At the same time there are only a few active users (speakers), so to save on traffic and server CPU, and also to support webinar recording, we're using HLS as the main delivery mechanism for passive participants.

The solution

We came up with the following idea: instead of dealing with streams and muxing them manually on some separate server, we launch a real browser instance, capture its screen and audio, and transmit the result to our existing media server for delivery. All of this runs inside a Docker container (I've published our implementation on my GitHub: dockerized-browser-streamer); in production we launch it as an AWS ECS task. While it consumes a lot of resources, it allows us to easily experiment with layouts, etc. We call this component The Streamer.

The problem

Although everything works perfectly in real users' browsers, the Streamer has an annoying issue: after 20-30 minutes it starts to lose some participants' media streams (usually camera or screen share, sometimes audio), which in the UI looks like a frozen picture, and they don't reconnect (e.g. if a participant switches their camera off and on, or re-shares their screen or another window, the Streamer displays the same old frozen picture again). It usually happens after people start or stop a screen share, or even switch their mic on or off (so if a frozen participant unmutes themselves, their video may unfreeze and continue to play, probably because that adds an audio consumer to the same transport as the video).

We tried both the latest Firefox and Chrome, and tried TCP-only ICE candidates, but with no luck yet.

There is nothing in the browser's devtools console, and nothing suspicious that we can see on Firefox's about:webrtc page (though we don't understand much there).

This bug also reproduces on the mediasoup demo application (again after it has been used for 20-30 minutes by several people).

I believe this is either some kind of weird frontend bug, or a common logical or concurrency error in the client application (since it exists in both our app and the mediasoup demo). Or some weird network issue (but why after 20 minutes?).

We're stuck and are looking for help (including paid help, see this message in Job Opportunities), or any advice on where to start digging.

The question

The main question is: how do we detect the source of the problem? The browser console logs don't show anything.

Maybe someone could suggest other ways of localizing the problem? Or has anyone seen something like this before?

So if I understand correctly, the problem is that the dockerized Chrome stops receiving RTP eventually, right?

Note that, in case this is due to some network error (ports closed for whatever reason), you won't see anything wrong in about:webrtc, since not receiving RTP is not an error from the browser's point of view; it's perfectly valid.

So I’d go to server side:

If nothing is wrong there, and ICE and DTLS remain connected (no disconnections), then let's go to the Consumer on the server side and check:
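A minimal sketch of the server-side state logging these checks call for — the helper names are ours, but the events (`icestatechange`, `dtlsstatechange`, `score`, `layerschange`) are the real mediasoup v3 API:

```javascript
// Hedged sketch: diagnostics for a mediasoup v3 WebRtcTransport and Consumer.
// `attachTransportDiagnostics` / `attachConsumerDiagnostics` are our own
// helper names; the objects and events are the mediasoup v3 server API.

function attachTransportDiagnostics(transport) {
  // A transport silently going to 'disconnected' here would explain
  // frozen video with no browser-side errors.
  transport.on('icestatechange', (iceState) =>
    console.log('transport %s ICE state: %s', transport.id, iceState));
  transport.on('dtlsstatechange', (dtlsState) =>
    console.log('transport %s DTLS state: %s', transport.id, dtlsState));
}

function attachConsumerDiagnostics(consumer) {
  // The consumer score drops to 0 when no RTP flows for this consumer.
  consumer.on('score', (score) =>
    console.log('consumer %s score: %o', consumer.id, score));
  // With simulcast, a null spatial layer means nothing is being forwarded.
  consumer.on('layerschange', (layers) =>
    console.log('consumer %s layers: %o', consumer.id, layers));
}
```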


Are you using simulcast for video?

If so, can you test without it, just to rule out a possible relation with simulcast?


Yes, we're using simulcast with the following options:

  { scaleResolutionDownBy: 4, maxBitrate: 500000 },
  { scaleResolutionDownBy: 2, maxBitrate: 1000000 },
  { scaleResolutionDownBy: 1, maxBitrate: 5000000 },
  { dtx: true, maxBitrate: 1500000 },
  { dtx: true, maxBitrate: 6000000 },

We're also going to try decreasing the bitrates significantly and removing the dtx option, since there was an issue with dtx (although in our version of mediasoup it was already fixed).
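For the "test without simulcast" suggestion, here is roughly what we plan to try. This is a sketch, not our exact code: `produceVideo` is a hypothetical wrapper name, while `sendTransport.produce({ track, encodings })` is the real mediasoup-client API (omitting `encodings` yields a single RTP stream, i.e. no simulcast):

```javascript
// Hedged sketch: toggling simulcast off for testing. `sendTransport` is an
// existing mediasoup-client send transport; `produceVideo` is our own name.
async function produceVideo(sendTransport, track, { simulcast = true } = {}) {
  const encodings = simulcast
    ? [
        { scaleResolutionDownBy: 4, maxBitrate: 500000 },
        { scaleResolutionDownBy: 2, maxBitrate: 1000000 },
        { scaleResolutionDownBy: 1, maxBitrate: 5000000 },
      ]
    : undefined; // omit `encodings` entirely -> single stream, no simulcast
  return sendTransport.produce({ track, encodings });
}
```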

OK, we've fixed some of the freezing-video errors just by adding one single await that we'd forgotten initially and that wasn't caught in code review.

The bug was this:

  1. When we need to resume some consumer on the client, we ask the server to do so via signaling and wait for its confirmation
  2. The server resumes its consumer and sends the confirmation without waiting for the resume to complete
  3. The client receives the confirmation and resumes the local consumer too early, and as a result never gets the stream (with no errors in the browser)
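In code, the intended client-side ordering looks roughly like this — the event name and helper are illustrative, not our exact code; `consumer.resume()` is the real mediasoup-client call:

```javascript
// Hedged sketch of the client-side resume flow from the steps above.
// `socket` is a socket.io-style client; `consumer` is a mediasoup-client
// Consumer. The 'resumeConsumer' event name is illustrative.
async function resumeRemoteConsumer(socket, consumer) {
  // Steps 1-2: ask the server to resume its Consumer and wait for its ack.
  // The server must only ack after its own resume has completed.
  await new Promise((resolve) =>
    socket.emit('resumeConsumer', consumer.id, resolve));
  // Step 3: only now resume the local consumer. If the server acks too
  // early (the missing server-side `await`), this runs before RTP can
  // flow and the stream never starts, with no browser-side errors.
  consumer.resume();
}
```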

It works on our machines because the application is deployed in the client's region, far away from us (two-thirds of the way around the Earth, given the network topology), so the server has usually finished resuming its consumer before the confirmation packet arrives at a developer's machine. But since the Docker container with the Streamer launches very close to the server (most probably in the same datacenter), video freezes in it much more frequently.

The fix:

     this.#socket.on(events.resumeConsumer, async (consumerId, callback) => {
-      this.#sfu.resumeConsumer(consumerId);
+      await this.#sfu.resumeConsumer(consumerId);
       callback(); // confirmation is sent only after the resume has completed
     });

There is a reference to this in the documentation here, but we just missed the fact that this await was missing.

However, we're still not sure that we've fixed the exact bug from the initial message; it's hard to reproduce reliably, so we're still testing.
