Feedback on Multi-region architecture to reduce jitter

I’d like to get some feedback on an architecture I’m considering.

We host audio chat rooms, similar to Clubhouse. Each room is hosted inside of a single mediasoup worker, either on Digital Ocean or Vultr. It’s a very simple architecture.

One common complaint from users is hearing a “robot voice” from producers. Our users are spread out globally, so sometimes there is a high amount of jitter, which I think is the cause of packet loss and the “robot voice” sound.

To combat this, I’ve placed mediasoup servers in multiple geographic regions where we have high concentrations of users. This becomes less effective though, when a single room has multiple users in different geographic regions.

I’m considering re-architecting our rooms to support producers in multiple mediasoup workers in different regions. Each producer can choose which server to publish to, based on ping time or some geocoding algorithm.
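To make the ping-based choice concrete, here is a minimal sketch of how a client might pick a region. It assumes the client has already collected a few round-trip-time samples per region; the region names and numbers are made up, and `pick_region` is a hypothetical helper, not part of mediasoup.

```python
from statistics import median


def pick_region(rtt_samples_ms):
    """Return the region with the lowest median RTT.

    rtt_samples_ms maps a region name to a list of round-trip times (ms)
    measured by the client. Regions without samples are skipped.
    """
    candidates = {
        region: median(samples)
        for region, samples in rtt_samples_ms.items()
        if samples
    }
    if not candidates:
        raise ValueError("no RTT samples for any region")
    return min(candidates, key=candidates.get)


# Hypothetical measurements from one client (region names are made up).
samples = {
    "fra": [32.0, 35.5, 31.2, 120.0],  # one outlier, absorbed by the median
    "nyc": [95.1, 97.3, 94.8, 96.0],
    "sgp": [210.4, 205.9, 215.0, 208.2],
}
best = pick_region(samples)
```

Using the median rather than the mean is a design choice here: a single slow ping should not push a producer to a farther region.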

PROS:
This should minimize overall latency in a room, and hopefully also jitter.

CONS:
This is a much more complex architecture. Consumers would need to establish multiple receive transports, depending on where the producers in the room are located. It would also require a substantial amount of refactoring of our signaling system to support this.

Has anyone else tried a multi-region architecture for a single room? Is this a reasonable approach to minimizing latency/jitter? Is there another approach to combatting jitter that I’m missing?

Any feedback welcome! Thanks!

This is a reasonable architecture when users are far away from each other. But it is also more complex. There are opportunities to significantly improve latency between servers though, which is not really possible with users whose networks you don’t control.

I have built such an architecture in the past and it worked reasonably well.

Thanks nazar! This is helpful.

Do you mean you have built multi-region architectures? Or low-latency server-to-server architectures?

Just rooms that span multiple regions depending on where users are located. Didn’t get to optimizing for latency between servers a lot, but something like https://subspace.com/ on the backend might be helpful there.

We host audio chat rooms, similar to Clubhouse. Each room is hosted inside of a single mediasoup worker, either on Digital Ocean or Vultr. It’s a very simple architecture.

Having many rooms share a single worker is fine, but there is a limit to the resources you can use before another worker is needed and scaling gets involved.

One common complaint from users is hearing a “robot voice” from producers. Our users are spread out globally, so sometimes there is a high amount of jitter, which I think is the cause of packet loss and the “robot voice” sound.

With that said, I think your robotic voice sound is caused by the CPU hitting 100% for a short period of time. I’d confirm this, however!

In my tests I ran a core at 50% load (blank-app), and then started loading up another app instance (mediasoup). Sure enough, once mediasoup reached 50% (100% CPU usage in total), the robotic voice occurred. I don’t really consider your host’s network to be the issue at all!

I think latency is important, but let’s be real: I can ping any server in the world in 20-250ms. That’s not even a second of delay, so I don’t think this is strictly needed. You’d help laggy producers, but what about consumers? You can’t possibly re-route and spend all those resources just for their connections.

I run servers all over the world and I’d say randomly assigning which server you connect to is A-OK. Just my two cents.

That’s interesting. I hadn’t considered CPU usage, as our CPU rarely goes over 20%. But it’s possible there are momentary spikes that don’t show up in the stats.
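As a quick illustration of how an averaged stat can hide a short spike, here is a small sketch. The per-second samples are invented, and `summarize_cpu` is a hypothetical helper, not output from any real monitoring tool.

```python
def summarize_cpu(samples_pct, window):
    """Return (average, peak) CPU percentage for each consecutive window."""
    summary = []
    for start in range(0, len(samples_pct), window):
        chunk = samples_pct[start:start + window]
        summary.append((sum(chunk) / len(chunk), max(chunk)))
    return summary


# Hypothetical one-second samples over one minute:
# 15% for 57 seconds, then a 3-second burst at 100%.
samples = [15.0] * 57 + [100.0] * 3
avg, peak = summarize_cpu(samples, 60)[0]
```

With these numbers the one-minute average stays under 20% even though the core was fully pegged for three seconds, so it seems worth logging the per-window peak alongside the average.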

It’s not necessarily latency, but jitter. If the latency unexpectedly spikes from 50ms to 150ms, the jitter buffer can’t keep up and the user hears a “robot voice”.
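To put rough numbers on that, here is a sketch of the running interarrival jitter estimate from RFC 3550 (section 6.4.1), plus a deliberately simplified fixed-size jitter buffer check. The transit times are invented, and real WebRTC audio jitter buffers are adaptive, so `late_packets` is only an assumption-laden approximation.

```python
def interarrival_jitter(transit_ms):
    """Running jitter estimate: J += (|D| - J) / 16, as in RFC 3550.

    transit_ms holds the per-packet transit time in milliseconds.
    Returns the jitter estimate after each packet.
    """
    jitter = 0.0
    prev = None
    estimates = []
    for transit in transit_ms:
        if prev is not None:
            delta = abs(transit - prev)
            jitter += (delta - jitter) / 16.0
        estimates.append(jitter)
        prev = transit
    return estimates


def late_packets(transit_ms, buffer_ms):
    """Count packets that miss playout with a FIXED buffer (a simplification).

    A packet is late when its transit time exceeds the first packet's
    transit time by more than buffer_ms.
    """
    baseline = transit_ms[0]
    return sum(1 for t in transit_ms if t - baseline > buffer_ms)


# Hypothetical traces: constant 50ms, and a step from 50ms to 150ms.
steady = [50.0] * 20
spiky = [50.0] * 10 + [150.0] * 10

steady_jitter = interarrival_jitter(steady)
spiky_jitter = interarrival_jitter(spiky)
late_with_60ms = late_packets(spiky, 60.0)
late_with_120ms = late_packets(spiky, 120.0)
```

In this toy trace a 60ms buffer misses every packet after the step, while a 120ms buffer absorbs it, which is the trade-off between playout delay and glitches.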

If the jitter is experienced by all users at the same time, it would indicate a server-related issue. However, a single instance of this error doesn’t indicate much beyond potentially poor network quality. The audio bitrate could also be set too high, so users just can’t keep up.

If it’s pointing back to the servers, though, check for packet loss on the network and check per-process CPU usage. You may be running snapd or some bogus background process that kills the CPU for a few minutes.

Thanks Cosmosis. You’ve given me some more to think about.

Before I do this re-architecture, I plan to do some A/B testing to determine if users with nearby mediasoup instances have less end-to-end packet loss.

I suppose pegged CPUs are just another contributor to latency / jitter. So if the CPU is the culprit, then my A/B tests should show a negligible difference between regions.
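One way such an A/B comparison could be evaluated is a two-proportion z-test on the share of sessions with noticeable packet loss in each group. This is only a sketch: the session counts are invented, the threshold for a "bad" session is left to whoever runs the test, and `two_proportion_z` is a hypothetical helper.

```python
from math import erfc, sqrt


def two_proportion_z(bad_a, n_a, bad_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    bad_a / n_a: sessions with noticeable loss / total sessions in group A.
    bad_b / n_b: the same for group B.
    Returns (z, p_value).
    """
    p_a = bad_a / n_a
    p_b = bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all-good or all-bad: no measurable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability
    return z, p_value


# Hypothetical counts: A = nearby server, B = distant server.
z, p = two_proportion_z(200, 10000, 300, 10000)
```

A small p-value here would suggest the region matters. A negligible difference would point back toward the server itself, such as CPU.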

You can’t always worry about an end-point client. Their network could be really bad and unsolvable by you.

With threading you can find two apps sharing a core, and I think that’s the big hitter: we don’t tell a core to free itself for us, so we’re still sharing it. With that said, maybe treat all servers as just an access point and determine if CPU is an issue AT ALL. If not, consider your network quality: run checks for packet loss and report the results to a log or something.

How will it help in case of poor connection between a consumer and a regional server?

It won’t.

My goal is to mimic the network path of a P2P connection as much as possible by placing the server close to one of the endpoints. Since there are many consumers and one producer, I’ll choose the producer.

But if the underlying network conditions aren’t good between the two endpoints, I don’t think there’s anything I can do.