Spanning "rooms" across routers (many-to-many scaling)

This is a general question about whether the following idea is feasible, or even reasonable, as a way to achieve two goals: pure horizontal scaling and near-zero-downtime deployments.

We are considering spanning rooms across routers and also across server instances. The goal is for a client connection not to care which server it is connected to, while the application layer hooks up the correct producers and consumers for the “room participants” by piping between routers as necessary. A single router could even handle connections for several rooms; all that matters is that each client can establish the correct consumers. My concern is that, although this makes client connections more seamless, it will increase bandwidth load across all participating servers, because we will need to create many more pipe transports between all participating routers.
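To put rough numbers on that bandwidth concern, here is a back-of-the-envelope model, not a mediasoup API. It assumes that every participant consumes every other participant's streams, that each pair of routers hosting the room shares one pipe-transport pair, and that every stream copied across a router boundary costs its full bitrate. `estimatePipeCost` and the 500 kbps figure are hypothetical.

```typescript
// Hypothetical cost model for a room spread across several routers.
// Assumptions: full-mesh room (everyone consumes everyone), one pipe-transport
// pair per router pair, and each piped stream costs its full bitrate.

interface PipeCost {
  routers: number;
  pipeTransportPairs: number; // one pair per pair of routers hosting the room
  pipedStreams: number;       // producer streams copied across router boundaries
  interRouterKbps: number;    // extra bandwidth spent on those copies
}

function estimatePipeCost(
  routerOf: Map<string, string>, // participantId -> routerId
  streamsPerParticipant: number, // e.g. 2 for one audio + one video producer
  kbpsPerStream: number,
): PipeCost {
  const routers = new Set(routerOf.values()).size;
  const participants = routerOf.size;
  // In a full-mesh room, each stream must reach every router other than the
  // one it was produced on, because each of those routers hosts a consumer.
  const pipedStreams = participants * streamsPerParticipant * (routers - 1);
  return {
    routers,
    pipeTransportPairs: (routers * (routers - 1)) / 2,
    pipedStreams,
    interRouterKbps: pipedStreams * kbpsPerStream,
  };
}

// Ten participants: first all on one router ("router == room"), then spread over four.
const oneRouter = new Map<string, string>();
const fourRouters = new Map<string, string>();
for (let i = 0; i < 10; i++) {
  oneRouter.set(`p${i}`, "r0");
  fourRouters.set(`p${i}`, `r${i % 4}`);
}
console.log(estimatePipeCost(oneRouter, 2, 500));   // 0 pipes, 0 piped streams
console.log(estimatePipeCost(fourRouters, 2, 500)); // 6 pipe pairs, 60 piped streams
```

Under these assumptions the "router == room" layout pipes nothing, while spreading the same ten participants over four routers copies 60 streams (about 30 Mbps at 500 kbps each) between routers, and the copy count grows with both room size and the number of routers involved.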

The other benefit is that, when we deploy new code versions to our servers, we could trickle users off old servers and onto new ones without ever completely taking down a “room”. Individual users would still experience a brief interruption while they establish producers on the new router on the new server instance, but much less than when an entire router is migrated to a new server instance at once, which would require every user to renegotiate at the same time.
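A minimal sketch of the trickle idea, assuming users are moved in fixed-size batches and each batch finishes renegotiating before the next starts. `planMigration` and `peakRenegotiations` are hypothetical helpers; the point is only that the peak number of simultaneous renegotiations drops from the room size to the batch size.

```typescript
// Hypothetical helpers: split a room's participants into migration batches.
function planMigration<T>(participants: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < participants.length; i += batchSize) {
    batches.push(participants.slice(i, i + batchSize));
  }
  return batches;
}

// Peak number of clients renegotiating at the same moment, assuming each
// batch completes before the next one begins.
function peakRenegotiations<T>(batches: T[][]): number {
  return batches.reduce((max, b) => Math.max(max, b.length), 0);
}

const room = Array.from({ length: 12 }, (_, i) => `user${i}`);
const trickle = planMigration(room, 2);             // move two users at a time
const allAtOnce = planMigration(room, room.length); // whole-router migration
console.log(trickle.length, peakRenegotiations(trickle));     // 6 2
console.log(allAtOnce.length, peakRenegotiations(allAtOnce)); // 1 12
```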

Again, I am not asking how to implement this, as that is mostly up to application logic and signaling between servers to set up the appropriate pipes. It is more a question of whether this is even reasonable, and whether such an architecture would truly help us achieve our goals. I am hesitant to move forward because it is a huge increase in complexity over the standard “router == room” model, possibly for no real reward.

I don’t have a strong opinion on this. However, I consider what you describe, letting clients connect to whichever router, to be super complex. I’d just make clients connect to the appropriate router and delay the server redeploy until all rooms have finished (and refuse new rooms while the redeploy status is enabled).
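The drain-then-redeploy approach can be sketched as a small piece of application state. `RoomRegistry` and its methods are hypothetical application code, not part of mediasoup: once draining starts, new rooms are refused (to be created on another server), existing rooms keep running, and the server reports when it is safe to redeploy.

```typescript
// Hypothetical per-server registry implementing a "drain" mode.
class RoomRegistry {
  private readonly rooms = new Set<string>();
  private draining = false;

  // Returns false when the room should be created on another server instead.
  createRoom(roomId: string): boolean {
    if (this.draining) return false;
    this.rooms.add(roomId);
    return true;
  }

  closeRoom(roomId: string): void {
    this.rooms.delete(roomId);
  }

  // Enable the "redeploy status": stop accepting new rooms.
  startDrain(): void {
    this.draining = true;
  }

  // Safe only once draining has begun and every existing room has ended.
  safeToRedeploy(): boolean {
    return this.draining && this.rooms.size === 0;
  }
}

const registry = new RoomRegistry();
registry.createRoom("standup");
registry.startDrain();
console.log(registry.createRoom("retro")); // false: new rooms go elsewhere
console.log(registry.safeToRedeploy());    // false: "standup" is still running
registry.closeRoom("standup");
console.log(registry.safeToRedeploy());    // true
```

The trade-off is that a long-running room can hold a redeploy back indefinitely, which is the cost of keeping the simple “router == room” model.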
