CWLB, a full-fledged load balancer for Mediasoup

Announcing CWLB, a full-fledged load balancer for mediasoup, now available for commercial use.
It currently supports:

  • Horizontal scaling (multiple rooms in multiple servers)

  • Vertical scaling (one room in multiple servers)

  • JIT (Just In Time) media server creation, only when needed, and shutting down of media servers when idle for XX minutes

  • AWS for auto scaling, with support for DigitalOcean and Azure planned for future releases

It currently doesn’t support media server cascading. This feature will be available in the next stable release.

CWLB is the result of 15 months of intense effort. You can visit this link ( to know more about it. If anybody has a commercial interest in such a solution, feel free to drop me an email at

If anybody has any questions related to auto scaling of media servers (or even recording servers!), feel free to ask. I’ll try to answer them to the best of my abilities.

Please don’t ask me to share the code, as it has been developed by a company with commercial interests and with considerable investment.

I hope I am not violating the forum guidelines with my post.

Is it paid?

Yes, it is paid as it is currently available as a commercial solution.

Can you guys share a bit more info on how it would work with current codebases? Is it something like a library that we can add to our current solutions? Does it need to be hosted in AWS?

Would be great to have more details on how this solution works so we can “sell” it to the companies we work for.


Yes, it currently needs to be hosted on AWS, as it only supports the AWS APIs for autoscaling as of now. Once hosted, it works not only as a load balancer but also as an integrated signalling server that can auto scale mediasoup instances based on room creation requests. A client-side JavaScript API will be provided, which you can use to make the room join and room leave calls. It now works pretty much like a video API provider, with a couple of minor differences.

The differences are that we accept deep customization requests, which they don’t.
We are also open to providing the whole setup on the client’s AWS infra as a managed service, which they don’t provide (if I am not wrong!).

We haven’t yet explored the option of providing it as a server-side library, as it needs tight coupling with the signalling server. But we think it is possible to provide it as a Node.js module in the future if there is a need.

Do you have a diagram of the architecture?

How does the signalling server scale, or does it?

How are you determining a broadcast’s use and ensuring it’s delivered 100%? There are situations where a room could drastically increase in size and then drop off completely, leaving servers with available CPU/slots all at random. How are you filling in the holes left over from such events to use the CPU effectively and guarantee that produced users can be consumed no matter what?

The load balancer works with some simple facts as mentioned below.

  1. Every room needs to have a defined size. It may be 10 or 100 or 1000, but it has to be defined. This value can’t be undefined.
  2. Every user joining a room needs to send his/her streams to everybody else currently present in the room (if he/she has publishing rights) and receive streams from everybody else.
  3. The job of the load balancer is to keep track, in real time, of the current number of users in the room and the total number of media servers currently being used by that room. When the next room join request comes in, it does some math to decide the number of streams needed for this user, then checks all the available media servers to see whether the request can be accommodated. If it can, the user is served from that media server; otherwise a new mediasoup instance is spun up to serve the new user. If all the users can be accommodated in one media server instance, then no worries at all; otherwise pipeTransports are there to save us. As soon as a media server is released from duty because the last user using it has left, the load balancer waits for x amount of time (in anticipation of future requests, to save on server start time) before shutting it down. Keep in mind that all the media servers serving one room share a single virtual state and therefore act as one large virtual instance with the combined capacity.
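
To make the three points above a bit more concrete, here is a minimal Python sketch of that kind of placement logic. It is not CWLB's code. The capacity unit (streams per server), the cost formula in `streams_needed`, and all class and method names are assumptions made for illustration, and it does not model pipeTransports or the shared room state at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TRACKS_PER_USER = 2  # assumption: one audio + one video track per publisher


@dataclass
class MediaServer:
    name: str
    capacity: int                       # max streams (hypothetical capacity unit)
    used: int = 0
    idle_since: Optional[float] = None  # time the last user left, if idle


@dataclass
class LoadBalancer:
    server_capacity: int                # assumed identical for every server
    idle_timeout: float                 # the "x amount of time" before shutdown
    servers: List[MediaServer] = field(default_factory=list)
    room_users: Dict[str, int] = field(default_factory=dict)
    next_id: int = 0

    def streams_needed(self, room: str) -> int:
        # Simplified cost: the newcomer's own tracks plus one consumer
        # for every track of every user already in the room.
        return TRACKS_PER_USER * (1 + self.room_users.get(room, 0))

    def join(self, room: str) -> Tuple[MediaServer, int]:
        need = self.streams_needed(room)
        if need > self.server_capacity:
            raise ValueError("a single user does not fit on one server")
        # Reuse the first server with enough free capacity, whatever room it serves.
        server = next((s for s in self.servers if s.capacity - s.used >= need), None)
        if server is None:
            # No existing server can take the user: start a new instance.
            self.next_id += 1
            server = MediaServer(f"ms-{self.next_id}", self.server_capacity)
            self.servers.append(server)
        server.used += need
        server.idle_since = None
        self.room_users[room] = self.room_users.get(room, 0) + 1
        return server, need

    def leave(self, room: str, server: MediaServer, streams: int, now: float) -> None:
        server.used -= streams
        self.room_users[room] -= 1
        if server.used == 0:
            server.idle_since = now     # start the idle countdown

    def reap_idle(self, now: float) -> List[str]:
        """Drop servers that have been idle for at least idle_timeout."""
        def expired(s: MediaServer) -> bool:
            return s.idle_since is not None and now - s.idle_since >= self.idle_timeout
        stopped = [s.name for s in self.servers if expired(s)]
        self.servers = [s for s in self.servers if not expired(s)]
        return stopped
```

With `LoadBalancer(server_capacity=10, idle_timeout=300)`, three users joining the same room need 2, 4 and 6 streams under this cost model, so the first two share `ms-1` and the third triggers a second server. Once that third user leaves, `reap_idle` only removes `ms-2` after the idle timeout has elapsed.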

Yikes… lol

This isn’t scalable; rooms are dedicated. Not only does this cost more in servers, but many servers can sit unused due to fixed room sizing.

What do you mean by it is not scalable when rooms are dedicated? You may be making some assumptions here. It would be helpful if you explained your thoughts.

To clarify from my side, media servers here are not dedicated to rooms but are shared among rooms. Media servers are mere entities used by rooms whenever there is a need for media transportation. The rooms release the resources as soon as the need for media transportation ends, so that other rooms can reuse the same resources.

Do you have some numbers about the kind of efficiency you are talking about?

Here is what we deliver with our load balancer.
A c5a.2xlarge instance in AWS serves one video conference of approximately 50 users, all of them with mic and camera on at 720p resolution, along with up to 3 screen shares at 1080p resolution. The same instance also serves a private live stream with 8 presenters and 250 listen-only participants, with the option of bringing 2 listen-only participants on stage, i.e. allowing them to present to all other participants in the room for a temporary period of time.

That’s 20 rooms, all at 12-24 broadcasts and many viewers. Around 500 users.

Your signal server can’t do this apparently!? :slight_smile:

Perhaps explain what you do with servers/services that require 100K → 1 million viewers in real time, if such were an option without HLS.

Good to know the numbers. I can only say once I understand your use case fully. By the way, you are also using a room-based architecture. Good to know that.

Consider the math below.

Each of the 20 rooms has 18 mediasoup producers on average: 9 producing audio with Opus at 40 Kbps each and 9 producing video with VP8 at 400 Kbps each.
400 + 40 = 440 Kbps per user × 9 users per room = 3960 Kbps per room × 20 rooms = 79200 Kbps (77.34 Mbps)
This should ideally be the receive rate on your network interfaces, but they show a much lower value: 25.21 Mb/s combining both interfaces. This simply means your producers are producing at a much lower bitrate.
25.21 Mbps total input bitrate / 20 rooms = 1.2605 Mbps per room / 9 users = 140 Kbps per user per room
This means each of your users is sending about 40 Kbps of audio and 100 Kbps of video. 40 Kbps of audio is very good for consumption, but 100 Kbps of video is not good if the video is displayed in the consumer’s browser at more than 200 × 200 px.

When the producers produce at such low bitrates, the CPU has to put in much less effort to serve consumers compared to high bitrates like 740 Kbps per user.
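
For anyone who wants to check this, the arithmetic above can be reproduced with a few lines of Python. The variable names are mine; the inputs are the figures quoted in this thread. Note that the 77.34 Mbps figure divides by 1024 while the per-user figure multiplies by 1000, and the script mirrors that as-is.

```python
ROOMS = 20
USERS_PER_ROOM = 9            # each user has one audio and one video producer
AUDIO_KBPS = 40               # Opus
VIDEO_KBPS = 400              # VP8

# Expected ingress if every producer really sent at these bitrates.
per_user_kbps = AUDIO_KBPS + VIDEO_KBPS                   # 440
per_room_kbps = per_user_kbps * USERS_PER_ROOM            # 3960
expected_total_kbps = per_room_kbps * ROOMS               # 79200
expected_total_mbps = expected_total_kbps / 1024          # about 77.34

# Working backwards from the observed receive rate on the interfaces.
observed_total_mbps = 25.21
observed_per_room_mbps = observed_total_mbps / ROOMS      # 1.2605
observed_per_user_kbps = observed_per_room_mbps * 1000 / USERS_PER_ROOM  # about 140
implied_video_kbps = observed_per_user_kbps - AUDIO_KBPS  # about 100

print(f"expected: {expected_total_mbps:.2f} Mbps")
print(f"observed per user: {observed_per_user_kbps:.0f} Kbps")
print(f"implied video: {implied_video_kbps:.0f} Kbps")
```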

Please correct if I am wrong with my above math.

Producers actually don’t cost anything, in terms of either CPU or data transfer, with a service provider like AWS: inbound traffic is free, and the CPU does not have to perform any task if there are no consumers. The producers just terminate in a sink and all their data packets are discarded each second until a consumer is connected. Only then does the CPU kick in to do the heavy lifting of copying the producer’s data packets and sending them to the consumers. The higher the bitrate, the more effort for the CPU.

What I know is that mediasoup is an SFU (not an MCU!) and does not do any transcoding. So it’s the network that has to be well taken care of, rather than worrying about CPU usage. We have our network input/output intelligently optimized for various use cases so that we don’t break our customers’ bank with astronomical data transfer costs.

Regarding our signalling server, I can only say that it’s very capable. Become our customer to test its capabilities!

Regarding our take on 100K → 1 million viewers with WebRTC: currently we think the question is less about the technical challenges than about its financial viability. The difference in resource requirements between the HLS approach and WebRTC is huge. It may be a good theoretical use case, but it may not be a practical one!
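
As a rough illustration of why the outbound side dominates, here is a small Python sketch of ingress versus egress for an SFU that forwards each producer to a given number of consumers. The function name and parameters are hypothetical, and it ignores RTCP, retransmissions, simulcast layers and any bitrate adaptation.

```python
from typing import Tuple


def sfu_bandwidth(users: int, consumers_per_user: int, kbps_per_user: float) -> Tuple[float, float]:
    """Return (ingress_kbps, egress_kbps) for one room.

    Each publishing user is received once by the server (ingress) and
    forwarded once per consuming peer (egress).
    """
    ingress = users * kbps_per_user
    egress = users * consumers_per_user * kbps_per_user
    return ingress, egress


# 9 users at 440 Kbps each, nobody consuming yet: traffic comes in, nothing goes out.
print(sfu_bandwidth(9, 0, 440))
# The same room once everyone consumes the other 8 users.
print(sfu_bandwidth(9, 8, 440))
```

In the second case the egress is 8 times the ingress, which is where the forwarding work and the data transfer bill come from.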

No, we’re still producing at whatever resolution the user can push, up to a cap limit, but we know we can transcode this client side and send it through using less bitrate in exchange for 10-30% of user CPU, which is acceptable.

200KB/s is correct, and 0-30KB/s+ for Opus, of course depending on state.

It does not affect CPU as much as you’d like to think; that’s really going to come down to active transports running with video and/or audio streams. Now in my case I know many users on mobile can’t handle 500KB/s streams, or even less, without their websocket connection being cancelled because the streams have taken over the entire buffer on their connection. At 200KB/s and even lower we can improve this experience and not waste server CPU with simulcast etc.

So how hard do you run a single server on average, what’s your output bandwidth and monthly costs on that, how many users per server?

@skp, you said you provide a client-side JavaScript API. How does it deal with the dominant speaker producer? Are you using the ActiveSpeakerObserver on the Router, or is it being handled on the client side?

@mariouzae We are using the ActiveSpeakerObserver on the routers where producers are producing.

@skp It would be helpful if you could share some stats from the tests you have conducted so far.

Shameless plug: my product provides vertical scalability in a transparent way with an API similar to the mediasoup one, and its sub-modules RemoteMediasoup and MediasoupCluster allow you to control multiple remote instances of mediasoup as if they were a single standard local one, in just 5 minutes and 20 lines of code :slight_smile: