How to manage rooms, workers, etc. when using multiple servers / Node clustering?

I am using Node's cluster module to divide load across all CPU cores, and I'm doing the same for Socket.IO using the Redis adapter (with ioredis) to distribute sockets between cores.

A socket can end up connected to any of the cores. The issue I'm having is that a user's socket gets connected to core 1 while the room lives on core 2, and the workers, producers, etc. (all the object references) are on core 2 as well. The socket therefore can't access any of that state for the room the user wants to join, or perform any other action, because it is in a separate process.

I can use Redis pub/sub for inter-process communication. Is there anything else that can be done to tackle this?

What would be the best possible solution?

You're on the right track.

You have to notify the different processes when there are changes, so it's best to design your app to be as stateless as possible.

Redis is a perfect start; however, you may eventually want a custom solution to improve scaling.

There's no single best solution, as app requirements vary; you could be perfectly fine with Redis.
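
For reference, the Redis pub/sub IPC you mentioned can stay very small. A minimal sketch, assuming ioredis; the channel name and message shape are placeholders, not anything prescribed:

```ts
import Redis from "ioredis";

// One connection for publishing and a dedicated one for subscribing
// (a subscribed ioredis connection cannot issue regular commands).
const pub = new Redis();
const sub = new Redis();

// Hypothetical per-room channel; the process that owns the room's
// mediasoup objects subscribes to it.
const ROOM_CHANNEL = "room:1234:commands";

await sub.subscribe(ROOM_CHANNEL);
sub.on("message", (channel, raw) => {
  const msg = JSON.parse(raw);
  // The owning process looks up its local worker/transport/producer
  // references here and performs the requested action.
  console.log(`received ${msg.action} on ${channel}`);
});

// Any other process (wherever the user's socket landed) just publishes.
await pub.publish(
  ROOM_CHANNEL,
  JSON.stringify({ action: "createTransport", userId: "u1" })
);
```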

Thanks. My rooms and media server were previously tightly coupled with Socket.IO; I have now removed that coupling from the rooms, workers, etc. Previously, Socket.IO plus my media server were part of the same Node server running in non-cluster mode, so everything lived on one core. At some point I had to change that, and the time has come.

This is what I have done now:
Socket.IO and the media server are both still part of the same Node server, but I have enabled Node clustering so it uses all the server's cores, and the media server code runs on one of the cores (for access-related reasons). A user's socket connects to one of the Node processes on one of the cores, and I then talk to the media server part via Redis to perform the action. This all works.
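
Roughly, the layout looks like this. A sketch only; the ROLE flag and the two start functions are placeholders for my actual bootstrap code:

```ts
import cluster from "node:cluster";
import { cpus } from "node:os";

if (cluster.isPrimary) {
  // Fork one worker per core, designating exactly one of them to host
  // the media server so all worker/transport/producer references live
  // in a single process.
  cpus().forEach((_, i) => {
    cluster.fork({ ROLE: i === 0 ? "media" : "signaling" });
  });
} else if (process.env.ROLE === "media") {
  // startMediaServer(); // hypothetical: creates mediasoup workers and
  //                     // subscribes to the Redis command channels
} else {
  // startSignalingServer(); // hypothetical: Socket.IO + Redis adapter
}
```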

This is what I will do in a couple of days:
I will separate the signaling code and the media server code onto separate servers, and they will communicate via Redis throughout. There will be one or two (maybe more) signaling servers, but there can be n media servers, so this should be the best way. I will be working on it.

Is this all OK? What do you say?

The reason for having it all on one core is that I will have access to all the worker, transport, producer, and consumer objects, since they will be in one process. If I distributed my rooms across separate cores, I would not have direct access to another core's rooms, as it is a separate process, and I would again have to use Redis etc. for IPC. But I do believe they should be in separate processes or servers for scalability.

What motivated me to keep them on one core for the time being is that the actual work happens in the mediasoup workers, which are already separate processes, so my Node instance that manages the rooms is doing nothing special except calling methods on the workers etc., and it shouldn't consume much CPU. Is this assumption right?
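
For context, this is roughly how I spin the workers up. A sketch; the port ranges and round-robin assignment are just illustrative, not my exact scheme:

```ts
import * as mediasoup from "mediasoup";
import { cpus } from "node:os";

// Each mediasoup Worker is a separate subprocess; the Node side only
// holds lightweight references and sends it commands, which is why the
// managing process itself stays cheap.
const workers: mediasoup.types.Worker[] = [];

for (let i = 0; i < cpus().length; i++) {
  workers.push(
    await mediasoup.createWorker({
      rtcMinPort: 40000 + i * 1000, // illustrative port ranges
      rtcMaxPort: 40999 + i * 1000,
    })
  );
}

// Simple round-robin room -> worker assignment.
let next = 0;
function workerForNewRoom(): mediasoup.types.Worker {
  const w = workers[next];
  next = (next + 1) % workers.length;
  return w;
}
```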

What do you say? How do you tackle this?

So your n media servers each run on their own machine, and the signaling server runs on a separate machine. Is there then another broker server, or do you call the signaling server the broker server?

When you say the broker server knows the state of the media servers it's holding, what do you mean by holding? Aren't the media servers running separately on their own machines? And when you say it knows their state, do you mean the broker server can access the workers, transports, etc. of the other media servers, or that it knows which media server to contact, communicates with that media server, asks it to perform some action, gets the response, and sends it back to the user?

There's a lot to take in here; I'll try to rip through it efficiently.

Making as much of your process stateless as possible is the way to go; however, we also want to utilize our servers to the fullest, or close to it, and with that as a factor it can be tricky. A pooling mechanism is the best we have for cost-effectiveness, and it can maintain a user base of a few thousand users without a problem.

How you design it is up to you; we have dedicated scenarios where a single room gets its own servers, or users share what's available within the pool.

My media servers connect to a broker server via WebSocket. This mechanism can handle thousands of cores without a problem.
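
What that link could look like, as a minimal sketch assuming the ws package; the URL, IDs, and message shapes are all illustrative:

```ts
import WebSocket from "ws";

// A media server announcing itself to the broker and waiting for
// commands; the broker routes rooms/users based on reported capacity.
const broker = new WebSocket("wss://broker.example.com");

broker.on("open", () => {
  broker.send(
    JSON.stringify({ type: "register", serverId: "media-1", slots: 60 })
  );
});

broker.on("message", (raw) => {
  const cmd = JSON.parse(raw.toString());
  // e.g. { id, type: "createRoom", roomId } -- perform the action
  // locally, then reply with the result.
  broker.send(JSON.stringify({ type: "ack", id: cmd.id }));
});
```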


You need to hop outside a single worker (core) to be able to provide bigger rooms, so adjust your code accordingly.
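
Within one machine, mediasoup's pipeToRouter is the piece that lets a room span workers. A sketch, assuming routerA and routerB already exist on different workers with compatible media codecs:

```ts
import * as mediasoup from "mediasoup";

// Re-exposes a producer from routerA on routerB (a pipe transport pair
// is created under the hood), so consumers on routerB's worker can
// consume it locally.
async function pipeProducerAcrossCores(
  producerId: string,
  routerA: mediasoup.types.Router,
  routerB: mediasoup.types.Router
): Promise<void> {
  await routerA.pipeToRouter({ producerId, router: routerB });
}
```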

All media servers are independent in their processing abilities, so they are just fed commands and relay media. I run many signaling servers to perform this task; routing may take time, but once routed it's super fast.

Thanks for the information. I am confused about the broker server concept. Why do we need a broker server? Couldn't our signaling server talk to the media servers directly and keep track of them itself?

Within a single media server, how do you manage things? Let's say the server has 4 cores: do you start Node without clustering, so it runs on 1 core, while the workers run in their own processes?

You sort of answered your own question there. Having a server route/control many servers is key/ideal; just do it optimally.

Okay, so you would fork each process, and in theory each process could take 25% of the overall CPU (i.e. a single core). If you exceed CPU capability, you need to account for that and write your code differently, or dynamically adjust the weighting/handling.

I would run 4 processes and dynamically adjust the weight of producing users. If producers become heavy I move them, but the idea is that I pre-calculate each core's ability and allot a slot count, etc.

Think of it this way: a single-core server could handle 30-60 broadcasts, each going out at least once, if it were piped to another server.

If a broadcast became huge, as in it were to be consumed 1,000 times, we might want that broadcasting server to handle 1-5 people instead of 30+ users, to allow for piping and expansion.
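
As a rough illustration of the slot idea with the numbers above; the bookkeeping itself is hypothetical:

```ts
// Slot accounting for one core: ~60 light broadcasts fit, while a
// heavily-consumed broadcast takes many slots so only 1-5 fit.
interface Slot {
  producerId: string;
  weight: number; // 1 = normal broadcast, higher = heavy broadcast
}

const CORE_CAPACITY = 60; // pre-calculated ability of one core
const slots: Slot[] = [];

const usedCapacity = () => slots.reduce((sum, s) => sum + s.weight, 0);

function tryPlaceProducer(producerId: string, consumers: number): boolean {
  // e.g. 1,000 consumers -> weight 50, so roughly one such broadcast
  // per core; the divisor is an assumed tuning knob.
  const weight = Math.max(1, Math.ceil(consumers / 20));
  if (usedCapacity() + weight > CORE_CAPACITY) return false; // move elsewhere
  slots.push({ producerId, weight });
  return true;
}
```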

So the setup is: signaling servers, a broker server, and media servers. Users connect to a signaling server; the signaling server talks to the broker server; the broker passes the request to the appropriate media server; the media server sends its response to the broker, which sends it back to the signaling server, which notifies the user. Right?

When you say fork a process, which process are you talking about? The mediasoup workers? Actually, this is my main confusion. What I have in mind is that I start a Node server, and that Node server starts n mediasoup workers; the Node server keeps all the object references to rooms, workers, and transports.

The broker is a signaling server; all chat servers and media servers connect to it, and routing is done through the broker. The idea is that the broker server knows all the servers and their state.

If you exceed a broker's ability, we have a world server (single core) that routes us to a broker that fits our needs. It's all signaling, but we build servers to make this process more efficient.
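
A sketch of what "knowing all servers and their state" might look like on the broker side; the shapes and the selection policy are assumptions:

```ts
// One entry per connected server, updated from registration messages
// and heartbeats over the servers' WebSocket connections.
interface ServerState {
  kind: "chat" | "media";
  load: number;     // e.g. used slots
  capacity: number; // e.g. total slots
}

const servers = new Map<string, ServerState>();

// Route a new room to the media server with the most free capacity.
function pickMediaServer(): string | undefined {
  let best: string | undefined;
  let bestFree = 0;
  for (const [id, s] of servers) {
    const free = s.capacity - s.load;
    if (s.kind === "media" && free > bestFree) {
      best = id;
      bestFree = free;
    }
  }
  return best; // undefined -> everything is full, create or defer
}
```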

Clear now. And do users connect to this same broker server from the apps?

The HTTPS server will always ask/tell the broker server whenever a request to join etc. is made. The broker is responsible for keeping track of the state of a specific zone.

So, to sum it up, the broker knows about everything. A broker can handle x many users, and I run more brokers to cover more users.

Thanks. One last point of confusion about the broker server: I am assuming this is the socket server that users, say from the web, will connect to?

Give me a moment: the broker server is not the socket server. The chat server is the socket server; users connect to it, let's say via the web, and then this chat server talks to the broker server. I think this is right?

The broker can be a WebSocket server; I'd probably prefer this, since if a media server disconnected I could, under that protocol, quickly destroy and reset its users.

Consider the broker server to be the all-seeing eye: if anything happens, the broker knows about it, so an educated decision can be made. Say the service gets a surge of traffic; the broker can adjust.

"Broker" may not be a valid term, but it's the term I'll work with.

OK, and what is the chat server doing in all of this?

The chat servers handle messaging and user state. To keep this scalable, users are routed per room across a maximum of 4 chat servers, and these chat servers get a chat broker that routes the messages like pub/sub, but in a more controlled way.
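
For instance, the per-room routing can be as simple as a stable hash; a sketch, with the server names purely illustrative:

```ts
// Route every user in a room to the same one of (up to) 4 chat
// servers, so a room's messages stay on a single server.
const CHAT_SERVERS = ["chat-0", "chat-1", "chat-2", "chat-3"];

function chatServerForRoom(roomId: string): string {
  let hash = 0;
  for (const ch of roomId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return CHAT_SERVERS[hash % CHAT_SERVERS.length];
}
```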

The broker is split up to handle different things, e.g. login, chat, media, etc. I have a world broker to sort my brokers across the grid.

The idea is: I have one server keeping eyes on the other servers, and I scale this up. I let single cores control other units that have more cores, and when that single core's limit is reached, I determine how to stretch the process. With brokers you can literally learn to create or destroy them.

For this discussion we can refer to a broker as a handling server (it still signals, but it handles the I/O).