Mediasoup horizontal scaling using pipeToRouter

Hello all,
So I made a decent attempt at a horizontal scaling implementation using pipeToRouter. I would like to share some observations and doubts.

Mediasoup server system: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-70-generic x86_64), 48 CPU(s)
1 CPU : 1 mediasoup-worker : 1 mediasoup-router
So 48 routers in total.

I have marked the first 24 as producing routers (creating only send transports) and the next 24 as consuming routers (creating only receive transports), following the mediasoup demo pipe branch as a reference.

  1. When a peer joins the room, a send transport is created on router id 0 and a receive transport on router id 24.
  2. Producers are piped from router id 0 to router id 24.
  3. These steps repeat until a consuming router reaches a threshold (the threshold is calculated by iterating over all transports created on the router and summing the numbers of consumers and data consumers; the current value of 500 is experimental and depends on CPU capabilities).
  4. When the threshold is reached for a consuming router, receive transports are created on the next router (switching the consuming router from router id 24 to router id 25), and producers for that peer are piped into both consuming router id 24 (pushing the consumer/data consumer count of that router above the threshold) and router id 25.
  5. This also requires piping all existing producers from existing peers into the new consuming router id 25, which increases the number of consumers and data consumers on the current producing router id 0.
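The threshold calculation in step 3 can be sketched as plain app-level bookkeeping. This is only a sketch under assumptions: mediasoup does not expose a per-router consumer count, so the app must track its transports itself, and the `routerState` shape here is hypothetical.

```javascript
// Experimental threshold from step 3; tune per CPU.
const CONSUMER_THRESHOLD = 500;

// Load = total consumers + data consumers across a router's transports.
// `routerState` is an app-side record: { id, transports: [{ consumers, dataConsumers }] }
function routerLoad(routerState) {
  return routerState.transports.reduce(
    (sum, t) => sum + t.consumers.length + t.dataConsumers.length,
    0
  );
}

// Pick the lowest-id consuming router still under the threshold,
// or null when all are full (time to switch to the next router).
function pickConsumingRouter(consumingRouters) {
  return consumingRouters.find((r) => routerLoad(r) < CONSUMER_THRESHOLD) || null;
}
```

The real objects would be mediasoup `Transport`s that the app records at creation time; the selection logic stays the same.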

So with this logic, it seems that every producer created on a producing router eventually has to be piped to all other consuming routers in use, which pushes the limits of that router, producing or consuming, well above the threshold as the number of peers increases.

I have tested this with more than 100 users, each with both audio and video producers. The general observation is that after a certain peer count, say 40-50, consuming router id 24 not only exceeds the threshold but, with the overhead, peaks at 100% CPU usage on its core, causing problems. Yet it is still necessary to pipe all producers to all consuming routers.
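The fan-out above can be put into numbers. Each producer on a producing router needs one pipe consumer per in-use consuming router, so the pipe load on a producing router grows multiplicatively. A back-of-envelope sketch (all figures illustrative):

```javascript
// Each producer must be piped once to every in-use consuming router,
// so a producing router carries producers * consumingRouters pipe consumers.
function pipeConsumersPerProducingRouter(producersOnRouter, consumingRoutersInUse) {
  return producersOnRouter * consumingRoutersInUse;
}

// e.g. 50 peers on producing router 0, each with audio + video
// (100 producers), piped to 4 consuming routers:
const load = pipeConsumersPerProducingRouter(50 * 2, 4); // → 400
```

This is why the first producing and consuming routers blow past the threshold as peers are added, independent of how carefully new receive transports are spread.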

So I was wondering: is this the limit we can achieve with such an approach, after which we should switch to scaling between servers, or does the approach reflect some serious misunderstanding of how mediasoup piping between routers works?
Please feel free to provide any comments, suggestions or feedback.

(to provide some numbers:
tested with 100 users, all genuine users with both audio and video, 3 simulcast layers;
works fine up to 30-35 users with no issues at all)

When my producer servers go online, my signalling server tells the consumer servers, if any are online, to create pipe transports and connect them.

In my case I just want producers connecting to consumers, not consumers connecting to consumers, and I make a single request per server, so a consumer won't connect to a producer more than once.

In this state they’re just chilling,

Now here’s the catch and you need to really watch this haha

The producer server will be hit little; your biggest hit will be the multiplicative increase on the consumer side.

12 video broadcasts consumed by 24 viewers is 12 * 24 - 24 = 264 consumers, and if there's audio, way more.
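The arithmetic above, parameterized. This follows the original figure, which subtracts one consumer per viewer (12 * 24 - 24 = 264); adding audio doubles the count, since each broadcast then contributes two consumers per viewer.

```javascript
// Video consumers for a room, following the post's formula:
// one consumer per broadcast per viewer, minus one per viewer.
function videoConsumers(broadcasts, viewers) {
  return broadcasts * viewers - viewers;
}

// With audio alongside video, each broadcast carries two streams.
function avConsumers(broadcasts, viewers) {
  return 2 * videoConsumers(broadcasts, viewers);
}
```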

If you plan many-to-many, and properly, I'd maybe devote more cores to consumers and leave some additional space on your cores so the pipe transports can re-consume/re-produce. You can safely use a single re-consume/re-produce on a consumer server multiple times, so there's some gain there; just close it when the last viewer is done.
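The "reuse one pipe for many viewers, close it when the last viewer is done" idea is essentially reference counting. A minimal sketch, where `closePipe` is a stand-in for whatever tears down the pipe transport/consumer pair in your app:

```javascript
// Track how many viewers share one piped producer; release the pipe
// exactly once, when the last viewer leaves.
class PipedProducerRef {
  constructor(closePipe) {
    this.viewers = 0;
    this.closePipe = closePipe; // assumed app-provided teardown callback
    this.closed = false;
  }
  addViewer() {
    this.viewers += 1;
  }
  removeViewer() {
    this.viewers -= 1;
    if (this.viewers === 0 && !this.closed) {
      this.closed = true;
      this.closePipe(); // last viewer gone: tear down the pipe
    }
  }
}
```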

If you execute this right you can go insane with that many cores.


Okay. Thanks a lot for the suggestion and feedback.
I will give this approach a shot.
I have all the testing architecture ready, so I will provide details and numbers for the total number of peers that can be supported, along with overall stats for the mediasoup server. :grinning:

If you really want to aim for top-tier efficiency, in my opinion, you could detect when a producer server cannot re-produce/re-consume any further and have a consumer server with spare space send the stream onward, expanding the number of consumers that can get the source.

If this makes sense, good luck; it drove me mad for a bit. haha

Do post results if you wish, but I don't think you can go wrong with this approach, and if done remotely you just multiply your power: the idea of single cores adding up to thousands of cores working real hard. :smiley:

If you do this, at first I would suggest running all the servers on a private LAN for the piping to save on outbound bandwidth (keep them in the same server region).

Thanks for the suggestions again.
But the point is, I still don't understand how this approach solves the overhead on certain producing and consuming routers as the number of producers increases.
Surely, as new consuming routers are assigned, producers need to be piped from all existing producing routers to all in-use consuming routers.
So in this process the very first producing or consuming routers will quickly pass the threshold value for max consumers, and CPU will also peak at 100%, causing problems.

If you plan many-to-many, and properly, I'd maybe devote more cores to consumers and leave some additional space on your cores so the pipe transports can re-consume/re-produce. You can safely use a single re-consume/re-produce on a consumer server multiple times, so there's some gain there; just close it when the last viewer is done.

The point here is that even the additional space on my cores starts filling up, due to the requirement to pipe producers to all consuming routers for all consumers, which in turn strains the producing routers as well.

I am not sure I follow what you mean by producer servers / consumer servers (are you talking about server-to-server scaling?)
Right now I am focused on achieving maximum capacity on a single server unit.

But the point is, I still don't understand how this approach solves the overhead on certain producing and consuming routers as the number of producers increases.
That part you’d build your own little load-balancer, so keep track of who enters/leaves the server and possibly how many pipes you have online.

Surely, as new consuming routers are assigned, producers need to be piped from all existing producing routers to all in-use consuming routers.
Yes, or however you intend to make these servers aware of each other (generally a consumer connects once to each producer, or worker to worker, but try drawing out your plans both ways).

So in this process the very first producing or consuming routers will quickly pass the threshold value for max consumers, and CPU will also peak at 100%, causing problems.
Not at all the case: if you keep track of the limits of each server/core (or worker), you can tell the signaller at any time that this server isn't an option and to use the next one available.
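That signaller-side tracking can be sketched as a tiny capacity table. Names and the per-server limit here are illustrative, not part of mediasoup:

```javascript
// Signaller-side bookkeeping: one consumer count per server (or worker),
// and a hard limit; hand out the next server with room, or null if full.
class ServerPicker {
  constructor(limit) {
    this.limit = limit;
    this.counts = new Map(); // serverId -> current consumer count
  }
  register(serverId) {
    this.counts.set(serverId, 0);
  }
  add(serverId, n = 1) {
    this.counts.set(serverId, this.counts.get(serverId) + n);
  }
  remove(serverId, n = 1) {
    this.counts.set(serverId, this.counts.get(serverId) - n);
  }
  next() {
    for (const [id, count] of this.counts) {
      if (count < this.limit) return id; // first server with headroom
    }
    return null; // everything full: refuse, or spin up a new server
  }
}
```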

The trick here is that a broadcast (a produced item) will be piped and consumed on a number of consumer servers, so if the broadcast can't be piped further because the producer server is maxed out, just know that a consumer server already has it and might be able to send it further. This is advanced and would likely require piping consumer to consumer and keeping track of these additional states of sharing.
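The cascading decision above can be sketched as a source picker: pipe from the producer server while it has capacity, otherwise relay from a consumer server that already holds the stream. The `broadcast` shape here is an assumption about your own bookkeeping, not a mediasoup structure:

```javascript
// broadcast: { producerServer: { id, pipes, maxPipes },
//              consumerServers: [{ id, pipes, maxPipes }, ...] }
// Returns the id of the server to pipe from, or null if nothing has room.
function pickPipeSource(broadcast) {
  const { producerServer, consumerServers } = broadcast;
  if (producerServer.pipes < producerServer.maxPipes) {
    return producerServer.id; // origin still has pipe capacity
  }
  // Origin is maxed out: relay from a consumer server that has the stream.
  const relay = consumerServers.find((s) => s.pipes < s.maxPipes);
  return relay ? relay.id : null;
}
```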

I am not sure I follow what you mean by producer servers / consumer servers (are you talking about server-to-server scaling?)
Sorry, my lingo is still new here; my servers (workers) are programmed to handle strictly producers (broadcasts) on the producer server and viewers (consumers) on the consumer server.

I am talking about server-to-server scaling, but this can very much be applied at the local level. Also, this is just information to maybe help; you're welcome to build however you want. This is what I've gotten to test, and it'll allow me the scalability I need for thousands. And if truly worried about overload, I can run producer servers at 20-30 members and let them go hard!


Thanks @CosmosisT for the detailed explanation.
Will pick up on this and report back on the topic with all the findings I can get.


Awesome, I just found mostly two facts:
Producers don't kill resources, but you will need to send them out to many consumers.
Consumers will kill your resources, so you will want to handle that. And if you get a wicked idea going that clearly destroys this, hell yeah! :smiley: