I made a decent attempt at implementing horizontal scaling using pipeToRouter, and I would like to share some observations and doubts.
Mediasoup server system: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-70-generic x86_64), 48 CPUs.
1 CPU : 1 mediasoup-worker : 1 mediasoup-router
So 48 routers in total.
I have marked the first 24 as producing routers (creating only send transports) and the next 24 as consuming routers (creating only receive transports) (mediasoup demo pipe branch, for reference).
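A minimal sketch of that role split, written as plain TypeScript so it runs without mediasoup installed. The names `RouterRole` and `assignRoles` are hypothetical; in a real setup each index would correspond to one worker from `mediasoup.createWorker()` and one router from `worker.createRouter()`.

```typescript
// Hypothetical helper: the first half of the routers only host send
// transports, the second half only host receive transports.
type RouterRole = "producing" | "consuming";

function assignRoles(routerCount: number): RouterRole[] {
  const half = Math.floor(routerCount / 2);
  return Array.from({ length: routerCount }, (_, id): RouterRole =>
    id < half ? "producing" : "consuming"
  );
}

// 48 routers: ids 0..23 are producing, ids 24..47 are consuming.
const roles: RouterRole[] = assignRoles(48);
```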
- When a peer joins the room, a send transport is created on router id 0 and a receive transport on router id 24.
- The peer's producers are piped from router id 0 to router id 24.
- These steps are repeated until the consuming router reaches a threshold. (The load is calculated by iterating over all transports created on the router and summing the number of consumers and data consumers. The threshold is currently 500, an experimental value based on CPU capability.)
- Once the threshold is reached for a consuming router, receive transports are created on the next router (switching the consuming router from router id 24 to router id 25). Producers of the new peer are piped into both router id 24 (which pushes its number of consumers and data consumers above the threshold) and router id 25.
- This also requires piping all existing producers of existing peers into the new consuming router (router id 25), which increases the number of consumers and data consumers on the current producing router (router id 0).
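The load calculation and router switch described in the steps above can be sketched roughly as follows. This is not mediasoup API: `TransportStats`, `ConsumingRouter`, `routerLoad`, `pickConsumingRouter` and `routersNeedingPipe` are hypothetical names for bookkeeping the application keeps itself, and falling back to the least loaded router when every router is above the threshold is an assumption, not something stated in the post.

```typescript
// Hypothetical per-transport counters maintained by the application.
interface TransportStats {
  consumers: number;
  dataConsumers: number;
}

interface ConsumingRouter {
  id: number;                   // e.g. 24..47 in the setup described above
  transports: TransportStats[]; // receive transports created on this router
}

const THRESHOLD = 500; // experimental value from the post

// Load = sum of consumers and data consumers over all transports on the router.
function routerLoad(router: ConsumingRouter): number {
  return router.transports.reduce(
    (sum, t) => sum + t.consumers + t.dataConsumers,
    0
  );
}

// Pick the first consuming router (in id order) that is still below the threshold.
function pickConsumingRouter(
  routers: ConsumingRouter[],
  threshold: number = THRESHOLD
): ConsumingRouter {
  if (routers.length === 0) throw new Error("no consuming routers available");
  const free = routers.find((r) => routerLoad(r) < threshold);
  if (free) return free;
  // Assumption: if every router is at or above the threshold, use the least loaded one.
  return routers.reduce((a, b) => (routerLoad(b) < routerLoad(a) ? b : a));
}

// A new producer has to be piped to every consuming router that already
// hosts at least one receive transport.
function routersNeedingPipe(routers: ConsumingRouter[]): number[] {
  return routers.filter((r) => r.transports.length > 0).map((r) => r.id);
}
```

In a real server the pipe itself would be done with `router.pipeToRouter({ producerId, router })` once per target router returned by `routersNeedingPipe`.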
With this logic, it seems that every producer created on a producing router eventually has to be piped to every consuming router in use. As the number of peers increases, this pushes the load of each router, whether producing or consuming, well above the threshold.
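To illustrate why a router keeps growing after it has been "closed" at the threshold, here is a simplified model. It assumes every peer consumes every producer of every other peer, counts only regular consumers (no data consumers, no pipe consumers), and applies the switch rule described above. The function `simulate` and its parameters are hypothetical and only model the arithmetic; they do not measure real CPU load.

```typescript
interface SimResult {
  peersPerRouter: number[];     // peers whose receive transport lives on each consuming router
  consumersPerRouter: number[]; // consumers on each consuming router once everyone has joined
}

function simulate(
  totalPeers: number,
  producersPerPeer: number,
  threshold: number
): SimResult {
  const peersPerRouter: number[] = [0];
  let current = 0;

  for (let joined = 0; joined < totalPeers; joined++) {
    // Load on the current router before this peer joins: each of its peers
    // consumes the producers of the other (joined - 1) peers in the room.
    const load =
      peersPerRouter[current] * Math.max(joined - 1, 0) * producersPerPeer;
    if (load >= threshold) {
      // Threshold reached: new receive transports go to the next router.
      peersPerRouter.push(0);
      current++;
    }
    peersPerRouter[current]++;
  }

  // Peers already on a router still consume every producer that joins later,
  // so the final load depends on the total room size, not on the threshold.
  const consumersPerRouter = peersPerRouter.map(
    (n) => n * (totalPeers - 1) * producersPerPeer
  );
  return { peersPerRouter, consumersPerRouter };
}

const result = simulate(100, 2, 500);
```

Under these assumptions, with 100 peers, 2 producers each (audio and video) and a threshold of 500, the first consuming router stops accepting new peers at 17 peers but ends up with 17 × 99 × 2 = 3366 consumers, because its existing peers keep consuming the producers of everyone who joins afterwards.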
I have tested this with more than 100 users, each with both audio and video producers. The general observation is that beyond a certain peer count, say 40-50, consuming router id 24 not only exceeds the threshold but, with the overhead, peaks at 100% CPU usage on that specific core, which causes problems. However, it is still necessary to pipe all producers to all consuming routers.
So I was wondering: is this the limit we can achieve with such an approach, after which we should switch to scaling between servers, or does the approach rest on some serious misunderstanding of how mediasoup piping between routers works?
Please feel free to provide any comments, suggestions or feedback.
(To provide some numbers:
tested: 100 users, all genuine users with both audio and video, with 3 simulcast layers
works fine up to 30-35 users with no issues at all)