Horizontal scaling


tl;dr: Could someone provide a template project that implements scaling across machines correctly using any signaling mechanism? Is it really just the content of router.pipeToRouter that needs to be implemented using some signaling mechanism?

I’ve already seen a few topics and related code-snippets on this subject, but I’m having a lot of trouble implementing my own version of pipeToRouter across multiple machines. At a very basic level, I’m at a point where I have a stream that is being sent from one node to another. But I’m running into a lot of trouble with implementing signaling (pause/resume/close) correctly.

One example of my confusion is this: router.pipeToRouter() does the following:

pipeProducer = await remotePipeTransport.produce({
    id: producer.id,
    kind: pipeConsumer.kind,
    rtpParameters: pipeConsumer.rtpParameters,
    paused: pipeConsumer.producerPaused,
    appData: producer.appData
});
pipeConsumer.observer.on('close', () => pipeProducer.close());

When I blindly translate this snippet within my signaling mechanism, I end up closing the original producer instead of the pipeProducer. This occurs because producer.id is being ‘reused’ in some sense. Obviously, the right thing to do would be to maintain state of the pipeProducer and close that instead.

Similarly, for some reason, I had to pass router.rtpCapabilities instead of rtpParameters in my call to localPipeTransport.consume() to get my stream working across nodes. Again, it’s most likely something I’m doing wrong, but I guess that’s my point…
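
To make the two points above concrete, here is a minimal sketch of how the body of router.pipeToRouter() might be split across two hosts. It is not a reference implementation. The message shapes and the names startPipe, RemotePipeProducers and send are hypothetical; the only mediasoup calls used are the ones pipeToRouter itself makes (consume() with just a producerId, produce(), pause(), resume(), close() and observer.on()). It also assumes the signaling channel delivers messages in order and that both routers use the same mediaCodecs.

```javascript
// Sketch only: message names and helper names are hypothetical.

// HOST_1: consume the original producer on the local PipeTransport and
// forward the pipe Consumer's lifecycle to HOST_2 over signaling.
async function startPipe(localPipeTransport, producer, send) {
  // As in pipeToRouter, consume() on a PipeTransport only needs the producer id.
  const pipeConsumer = await localPipeTransport.consume({ producerId: producer.id });

  send({
    type: 'pipe-produce',
    producerId: producer.id,
    kind: pipeConsumer.kind,
    rtpParameters: pipeConsumer.rtpParameters,
    paused: pipeConsumer.producerPaused,
    appData: producer.appData
  });

  // Mirror pause/resume/close of the pipe Consumer to the remote pipe Producer.
  for (const event of ['pause', 'resume', 'close']) {
    pipeConsumer.observer.on(event, () =>
      send({ type: `pipe-${event}`, producerId: producer.id }));
  }

  return pipeConsumer;
}

// HOST_2: keep pipe Producers in their own map, separate from any map of
// locally created producers, so a shared id can never close the wrong object.
class RemotePipeProducers {
  constructor(remotePipeTransport) {
    this.remotePipeTransport = remotePipeTransport;
    this.pipeProducers = new Map(); // original producer id -> pipe Producer
  }

  async handle(msg) {
    if (msg.type === 'pipe-produce') {
      const pipeProducer = await this.remotePipeTransport.produce({
        id: msg.producerId,
        kind: msg.kind,
        rtpParameters: msg.rtpParameters,
        paused: msg.paused,
        appData: msg.appData
      });
      this.pipeProducers.set(msg.producerId, pipeProducer);
      return;
    }

    const pipeProducer = this.pipeProducers.get(msg.producerId);
    if (!pipeProducer) return; // unknown id or already closed

    if (msg.type === 'pipe-pause') {
      await pipeProducer.pause();
    } else if (msg.type === 'pipe-resume') {
      await pipeProducer.resume();
    } else if (msg.type === 'pipe-close') {
      pipeProducer.close();
      this.pipeProducers.delete(msg.producerId);
    }
  }
}
```

On HOST_2 the application would feed every incoming signaling message into handle(). A complete version would also need the reverse direction (closing the pipe Consumer when the pipe Producer closes) and the PipeTransport connect() exchange, both of which are left out here.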

router.pipeToRouter already does a great job of being a template to get developers to scale mediasoup across machines. But I strongly feel it would greatly benefit me, and maybe the community as a whole, to have some sort of reference implementation of scaling across machines that guarantees correctness.

I don’t understand. The original producer lives in HOST_1 while the pipeProducer lives in HOST_2, so the fact that they share an id should not make your app close the original one.

I don’t understand this either. Are you really using a pipeTransport to connect HOST_1 and HOST_2 or not? And of course, I assume you are using the same mediaCodecs in the routers in HOST_1 and HOST_2, right?

As expected, I was doing something stupid. I think I’ve fixed most of my bugs, and am able to scale across nodes without issues. Mediasoup has been delightfully simple to work with so far.

I still think it would help the community to look at a reference mediasoup implementation that scales across nodes with proper initialization and cleanup, and be able to translate it to their own architecture. I’m happy to try and put my code up for this purpose once I clean it up and confirm that it works at a reasonable scale.

Are there any tools or scripts you are aware of and would recommend for testing mediasoup scaling?


You could add a github gist with the basic concepts and usage. Would love to take a look and comment on it.

There’s a Chrome feature that lets you mock video and create fake users. The problem is that it must be hosted in Kubernetes or something similar, and in different geographies, to simulate real-world conditions. I can share the gist of it if you’re interested.
As far as I’m aware, there’s no ready-to-use tool for stress/load testing.
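
For anyone wondering what that looks like in practice: the Chrome feature mentioned above is presumably its fake media command-line switches. Below is a small sketch that builds the argument list. The helper name fakeMediaChromeArgs and its options are hypothetical; the switches themselves are Chrome's.

```javascript
// Sketch only: fakeMediaChromeArgs is a hypothetical helper.
function fakeMediaChromeArgs({ videoFile, audioFile } = {}) {
  const args = [
    '--use-fake-ui-for-media-stream',     // auto-accept the camera/microphone prompt
    '--use-fake-device-for-media-stream'  // replace real devices with synthetic ones
  ];
  // Optionally feed the fake devices from files instead of the built-in test pattern.
  if (videoFile) args.push(`--use-file-for-fake-video-capture=${videoFile}`);
  if (audioFile) args.push(`--use-file-for-fake-audio-capture=${audioFile}`);
  return args;
}
```

The resulting array can be passed as launch arguments to whatever drives the browser (for example a Selenium or Puppeteer setup), with one browser instance per fake user.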


And if you do, it will be referenced on the website.

I would love to have a look at the potential solution and give it a try.

Hi, gurupras. Were you ever able to put together an example for your solution? I’d love some advice.

I was working on a gist a while back, but I gave up midway :frowning:. I’ll get back to finishing it at some point.
https://gist.github.com/gurupras/c9ac6609f4c22a515b3aadea8d65bad3 If anyone’s interested.


Thank you so much for sharing that, gurupras!

I’d be keen to see any additions to this too, @gurupras! I’m dealing (struggling) with some very similar issues right now and want to see if I can get mediasoup to 3k consumers at some point in the near future!

For load testing, we’ve been using the KITE WebRTC framework, https://github.com/webrtc/KITE.
It allows you to write the test in JavaScript and has been alright so far at connecting to an instance and running the test.

Sometime in the next week or so I want to try and load a Selenium Grid up on AWS Fargate, I will hopefully be able to connect to this with KITE to run larger-scale load tests. If I get anywhere with this I’ll report back here but tag me if you’ve got any questions about the above!