I’d like to get some feedback on an architecture I’m considering.
We host audio chat rooms, similar to Clubhouse. Each room is hosted inside a single mediasoup worker, either on DigitalOcean or Vultr. It’s a very simple architecture.
One common complaint from users is hearing a “robot voice” from producers. Our users are spread out globally, so sometimes there is a high amount of jitter, which I think causes packets to arrive too late or get lost, producing the “robot voice” sound.
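For context on what “jitter” means here: RTP receivers usually track interarrival jitter as defined in RFC 3550 (section 6.4.1), and that is the figure RTCP receiver reports carry. Below is a minimal sketch of that estimator. It assumes send and arrival times have already been converted to milliseconds; real RTP timestamps are in clock-rate units (48 kHz for Opus). The `PacketTiming` shape is hypothetical.

```typescript
// Hypothetical per-packet timing record; both values in milliseconds.
interface PacketTiming {
  sentAt: number;    // sender timestamp (RTP timestamp converted to ms)
  arrivedAt: number; // local receive time
}

// Running interarrival-jitter estimate following RFC 3550, section 6.4.1.
function interarrivalJitter(packets: PacketTiming[]): number {
  let jitter = 0;
  for (let i = 1; i < packets.length; i++) {
    const prev = packets[i - 1];
    const cur = packets[i];
    // D(i-1, i): how much the transit time changed between consecutive packets.
    const d = (cur.arrivedAt - prev.arrivedAt) - (cur.sentAt - prev.sentAt);
    // Exponential smoothing with gain 1/16, as the RFC specifies.
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}
```

With 20 ms audio frames arriving exactly 20 ms apart, the estimate stays at 0. Any variation in arrival spacing pushes it up. Once the variation exceeds what the receiver’s jitter buffer absorbs, late packets are effectively lost, which is where the “robot voice” artifacts come from.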
To combat this, I’ve placed mediasoup servers in multiple geographic regions where we have high concentrations of users. This becomes less effective, though, when a single room has users in several different geographic regions.
I’m considering re-architecting our rooms to support producers in multiple mediasoup workers in different regions. Each producer would choose which server to publish to, based on ping time or some geolocation algorithm.
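The ping-based option could look something like this minimal sketch. It assumes the client has already collected a few RTT samples per candidate region, for example by timing a small request to each server. The region names and the `pickRegion` and `median` helpers are hypothetical.

```typescript
// Median of a non-empty list; less sensitive to a single ping spike than the mean.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Hypothetical helper: pick the publish region with the lowest median RTT.
// `samples` maps a region name to the RTTs (ms) measured against that region's server.
function pickRegion(samples: Record<string, number[]>): string | undefined {
  let best: string | undefined;
  let bestRtt = Infinity;
  for (const [region, rtts] of Object.entries(samples)) {
    if (rtts.length === 0) continue; // not probed yet or unreachable
    const rtt = median(rtts);
    if (rtt < bestRtt) {
      bestRtt = rtt;
      best = region;
    }
  }
  return best;
}
```

For example, `pickRegion({ nyc: [90, 85, 300], fra: [30, 32, 31], sgp: [] })` picks `"fra"`. The 300 ms outlier does not distort the `nyc` estimate because the median is used instead of the mean.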
This should minimize overall latency in a room, and hopefully also jitter.
This is a much more complex architecture. Consumers would need to establish multiple receive transports, depending on where the producers in the room are located. It would also require a substantial amount of refactoring of our signaling system to support this.
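On the consumer side, the bookkeeping amounts to grouping the room’s producers by the region that hosts them, with one receive transport per region. Here is a minimal sketch, assuming the signaling layer tells each client which region every producer published to. The `RemoteProducer` shape and `planRecvTransports` are hypothetical.

```typescript
// Hypothetical room state sent by signaling: where each producer is hosted.
interface RemoteProducer {
  producerId: string;
  region: string; // region of the mediasoup server the producer published to
}

// Group producers by region: each key needs one receive transport to that
// region's server, and every producer ID listed under it is consumed over it.
function planRecvTransports(producers: RemoteProducer[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const { producerId, region } of producers) {
    const ids = plan.get(region);
    if (ids) {
      ids.push(producerId);
    } else {
      plan.set(region, [producerId]);
    }
  }
  return plan;
}
```

In mediasoup-client, each key would roughly correspond to one `device.createRecvTransport(...)` call using parameters from that region’s server. Signaling would then have to create a transport when the first producer appears in a new region and close it when the last one leaves, which is a large part of the refactoring mentioned above.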
Has anyone else tried a multi-region architecture for a single room? Is this a reasonable approach to minimizing latency/jitter? Is there another approach to combatting jitter that I’m missing?
Any feedback welcome! Thanks!