Feedback on Multi-region architecture to reduce jitter

jbaudanza · January 14, 2022, 9:27pm

I’d like to get some feedback on an architecture I’m considering.

We host audio chat rooms, similar to Clubhouse. Each room is hosted inside a single mediasoup worker, either on DigitalOcean or Vultr. It’s a very simple architecture.

One common complaint from users is hearing a “robot voice” from producers. Our users are spread out globally, so sometimes there is a high amount of jitter, which I think is the cause of packet loss and the “robot voice” sound.

To combat this, I’ve placed mediasoup servers in multiple geographic regions where we have high concentrations of users. This becomes less effective though, when a single room has multiple users in different geographic regions.

I’m considering re-architecting our rooms to support producers in multiple mediasoup workers in different regions. Each producer can choose which server to publish to, based on ping time or some geocoding algorithm.
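A minimal sketch of the ping-based choice, assuming each client has already measured a few round-trip times against every candidate region. The `RegionProbe` shape and the `pickRegion` name are hypothetical and not part of mediasoup. The median is used so that a single slow probe does not decide the result.

```typescript
// Hypothetical shape: RTT samples (in ms) a client measured against one region.
type RegionProbe = { region: string; rttMs: number[] };

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns the region with the lowest median RTT, or undefined if no region
// produced any samples (for example, every probe timed out).
function pickRegion(probes: RegionProbe[]): string | undefined {
  let best: { region: string; rtt: number } | undefined;
  for (const probe of probes) {
    if (probe.rttMs.length === 0) continue; // no usable samples for this region
    const rtt = median(probe.rttMs);
    if (best === undefined || rtt < best.rtt) {
      best = { region: probe.region, rtt };
    }
  }
  return best?.region;
}
```

The producer would run this once before creating its send transport and then ask signaling for a worker in the chosen region.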

PROS:
This should minimize overall latency in a room, and hopefully also jitter.

CONS:
This is a much more complex architecture. Consumers would need to establish multiple receive transports, depending on where the producers in the room are located. It would also require a substantial amount of refactoring of our signaling system to support this.

Has anyone else tried a multi-region architecture for a single room? Is this a reasonable approach to minimizing latency/jitter? Is there another approach to combatting jitter that I’m missing?

Any feedback welcome! Thanks!

nazar-pc · January 15, 2022, 5:19am

This is a reasonable architecture when users are far away from each other. But it is also more complex. There are opportunities to significantly improve latency between servers though, which is not really possible with users whose networks you don’t control.

I have built such an architecture in the past and it worked reasonably well.

jbaudanza · January 15, 2022, 5:12pm

Thanks nazar! This is helpful.

Do you mean you have built multi-region architectures? Or low-latency server-to-server architectures?

nazar-pc · January 15, 2022, 5:18pm

Just rooms that span multiple regions depending on where users are located. Didn’t get to optimizing for latency between servers a lot, but something like https://subspace.com/ on the backend might be helpful there.

BronzedBroth · January 15, 2022, 10:39pm

> We host audio chat rooms, similar to Clubhouse. Each room is hosted inside a single mediasoup worker, either on DigitalOcean or Vultr. It’s a very simple architecture.

Having many rooms share a single worker is fine, but a worker only has so many resources. Past that limit you need another worker, and that is where scaling comes in.

> One common complaint from users is hearing a “robot voice” from producers. Our users are spread out globally, so sometimes there is a high amount of jitter, which I think is the cause of packet loss and the “robot voice” sound.

That said, I think your robotic voice is caused by the CPU hitting 100% for short periods of time. I’d confirm this, though!

In my tests I ran a core at 50% load with a blank app, and then started loading up a mediasoup instance on the same core. Sure enough, once mediasoup reached 50% (100% total CPU usage), the robotic voice occurred. I don’t really consider your host’s network to be the issue at all!

I think latency is important, but let’s be real: I can ping any server in the world in 20-250ms. That is not even a second of delay, so I don’t think this is necessarily needed. You’d help laggy producers, but what about consumers? You can’t possibly re-route and spend all those resources just for their connection.

I run servers all over the world, and I’d say randomly assigning which server you connect to is A-OK. Just my two cents.

jbaudanza · January 15, 2022, 10:48pm

That’s interesting. I hadn’t considered CPU usage, as our CPU rarely goes over 20%. But it’s possible there are momentary spikes that don’t show up in the stats.

It’s not necessarily latency, but jitter. If the latency unexpectedly spikes from 50ms to 150ms, the jitter buffer can’t keep up and the user hears a “robot voice”.
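To make the distinction concrete, here is a sketch of the interarrival jitter estimate defined in RFC 3550 (the running value RTP receivers report), applied to a 50ms to 150ms spike like the one above. The `PacketTiming` shape and function name are assumptions for illustration, and times are in milliseconds rather than RTP timestamp units to keep it readable.

```typescript
// Hypothetical shape: send and arrival time of one packet, in milliseconds.
// The two clocks do not need to be synchronized, because only differences
// between consecutive packets are used and a constant offset cancels out.
type PacketTiming = { sentMs: number; receivedMs: number };

// RFC 3550 estimator: J = J + (|D| - J) / 16, where D is the change in
// transit time between two consecutive packets.
function interarrivalJitter(packets: PacketTiming[]): number {
  let jitter = 0;
  for (let i = 1; i < packets.length; i++) {
    const prev = packets[i - 1];
    const curr = packets[i];
    const d =
      (curr.receivedMs - prev.receivedMs) - (curr.sentMs - prev.sentMs);
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}

// Constant 50ms delay: high latency would look the same, jitter stays at 0.
const steady: PacketTiming[] = [
  { sentMs: 0, receivedMs: 50 },
  { sentMs: 20, receivedMs: 70 },
  { sentMs: 40, receivedMs: 90 },
];

// Same stream, but the middle packet takes 150ms instead of 50ms.
const spiky: PacketTiming[] = [
  { sentMs: 0, receivedMs: 50 },
  { sentMs: 20, receivedMs: 170 },
  { sentMs: 40, receivedMs: 90 },
];

console.log(interarrivalJitter(steady), interarrivalJitter(spiky));
```

A stream with a constant 250ms delay scores zero here, while a stream that averages 80ms but swings between 50ms and 150ms does not, which is why ping time alone does not predict the “robot voice”.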

BronzedBroth · January 15, 2022, 10:52pm

If the jitter is experienced by all users at the same time, it’d indicate a server-related issue. However, a single instance of this error doesn’t indicate much beyond potentially poor network quality. The audio bitrate could be set too high and users just can’t keep up.

If it’s pointing back to the servers, though, check for packet loss on the network and check per-process CPU usage. You may be running snapd or some bogus background process that kills the CPU for a few minutes.

jbaudanza · January 15, 2022, 11:34pm

Thanks Cosmosis. You’ve given me some more to think about.

Before I do this re-architecture, I plan to do some A/B testing to determine if users with nearby mediasoup instances have less end-to-end packet loss.

I suppose pegged CPUs are just another contributor to latency / jitter. So if the CPU is the culprit, then my A/B tests should show a negligible difference between regions.
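A rough sketch of how the two groups could be compared, assuming each client periodically reports its cumulative receive counters along with the group it was assigned to. The `Sample` shape and the group labels are hypothetical. The counter names mirror the `packetsReceived` and `packetsLost` fields of WebRTC inbound RTP stats, but how they get collected is left out.

```typescript
type Group = "nearby" | "far";

// Hypothetical report from one consumer: which A/B group it is in and its
// cumulative inbound audio counters at the end of the session.
type Sample = { group: Group; packetsReceived: number; packetsLost: number };

// Aggregate loss fraction per group: lost / (lost + received).
// Groups with no packets at all report 0.
function lossByGroup(samples: Sample[]): Record<Group, number> {
  const totals: Record<Group, { lost: number; expected: number }> = {
    nearby: { lost: 0, expected: 0 },
    far: { lost: 0, expected: 0 },
  };
  for (const s of samples) {
    totals[s.group].lost += s.packetsLost;
    totals[s.group].expected += s.packetsLost + s.packetsReceived;
  }
  const ratio = (t: { lost: number; expected: number }): number =>
    t.expected === 0 ? 0 : t.lost / t.expected;
  return { nearby: ratio(totals.nearby), far: ratio(totals.far) };
}
```

If the two fractions come out roughly equal, that would point away from server distance and toward something like CPU contention or the clients’ own networks.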

BronzedBroth · January 16, 2022, 1:51am

You can’t always worry about an end-point client. Their network could be really bad and unsolvable by you.

With threading, two apps can end up sharing a core, and I think that’s the big hitter: we don’t tell a core to free itself for us, so we are still sharing it. With that said, maybe treat all servers as just an access point and first determine whether CPU is an issue AT ALL. If it isn’t, look at your network quality: run checks for packet loss and report the results back to a log or something.

snnz · January 16, 2022, 8:58am

How will it help in the case of a poor connection between a consumer and a regional server?

jbaudanza · January 16, 2022, 5:11pm

It won’t.

My goal is to mimic the network path of a P2P connection as much as possible by placing the server close to one of the endpoints. Since there are many consumers and one producer, I’ll choose the producer.

But if the underlying network conditions aren’t good between the two endpoints, I don’t think there’s anything I can do.

alexciarlillo · January 18, 2022, 5:02am

How are you playing the audio on the client side? If you are passing it through WebAudio, depending on the details of your implementation, this could be another cause.

jbaudanza · January 18, 2022, 5:11am

I’m using react-native-webrtc, with just the defaults for producing and consuming. No WebAudio available, since this isn’t the browser.

I’m going to do some experiments with some bare metal servers this week. Since I’m using DigitalOcean / Vultr, it’s possible I’m competing with some noisy neighbors.
