Just wanted to share some numbers using mediasoup. We are currently working on a use case of 1:many broadcasting, and yesterday we ran a performance test. There were two producers and 1600 consumers (1 publisher : 800 subscribers). The system config we are currently using is as follows:
CPU cores : 4
RAM : 30GB
OS : RHEL 7.4
We ran the test for 30 minutes while also capturing server-side metrics, and the results were surprising:
Avg RAM utilization was around 6 GB,
Avg CPU utilization was 44.6%,
Processor load was less than 3% for these 30 minutes.
Yes, it uses pipeToRouter(). We created 4 workers with exactly one router on each. The producers lived on router1, and from there we piped them to the other 3 routers created on the other 3 workers, so all consumers were able to consume the piped producers originating on router1. We set a limit of 400 consumers per router; the next 400 go to the next router.
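The 400-per-router cap described above can be sketched as a small allocation helper. This is my own illustration, not the poster's actual code; the function and variable names are hypothetical, and the mediasoup call shown in the comment is the documented router.pipeToRouter() API:

```javascript
// Sketch of the "max 400 consumers per router" allocation described above.
// Names are illustrative. Producers live on routers[0] and were piped to the
// others up front, e.g.:
//   await routers[0].pipeToRouter({ producerId: producer.id, router: routers[i] });
const MAX_CONSUMERS_PER_ROUTER = 400;

function pickRouter(routers, consumerCounts) {
  // Fill routers in order: first 400 consumers on router1, next 400 on router2, etc.
  for (let i = 0; i < routers.length; i++) {
    if (consumerCounts[i] < MAX_CONSUMERS_PER_ROUTER) {
      consumerCounts[i]++;
      return i; // create the consumer on this router
    }
  }
  throw new Error('All routers are at capacity');
}
```

With 4 routers this yields exactly the 1600-consumer ceiling the test ran at.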
Yes, the audio producer was a simple one. The video producer was a simulcast producer with 3 encodings, with maxBitrate values of 100000, 300000 and 900000.
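For reference, the three simulcast layers mentioned would be expressed client-side as an encodings array passed to mediasoup-client's produce(). This is a sketch against that API, not the poster's code; track and transport setup are omitted:

```javascript
// Three simulcast layers with the maxBitrate values from the test
// (sketch based on mediasoup-client's produce() options; setup omitted).
const simulcastEncodings = [
  { maxBitrate: 100000 }, // low layer
  { maxBitrate: 300000 }, // mid layer
  { maxBitrate: 900000 }, // high layer
];

// In a real client:
// const producer = await sendTransport.produce({ track, encodings: simulcastEncodings });
```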
All receivers were on real networks using different browsers and devices (Android, macOS, iOS, Windows). Browsers used were Safari, Chrome, Opera and Mozilla Firefox. Transports were enabled with UDP only, so there may have been packet losses as well…
Yes, of course, the numbers are really good. The limit of 400 was chosen after reading the CoSMo Software publication and the documentation on mediasoup.org.
The site mentions that a single worker can handle anywhere between 400-600 consumers, so we restricted each router to a maximum of 400 consumers, since we have only one router per worker.
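The arithmetic behind that ceiling is worth spelling out: taking the conservative end of the 400-600 guideline per worker, four workers with one router each give exactly the capacity the test ran at.

```javascript
// Capacity estimate from the figures above, using the conservative
// end of the 400-600 consumers-per-worker guideline.
const consumersPerWorker = 400; // lower bound of the guideline
const workers = 4;              // one router per worker
const maxConsumers = workers * consumersPerWorker;
console.log(maxConsumers); // 1600, matching 2 producers x 800 subscribers
```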
We have many performance tests lined up for this month, where we plan to have more than 5000 subscribers, which means 10000 consumers. There we will also scale producers across servers using pipe transports.
Congrats on the implementation! We’re just entering the same phase with our own implementation and I had a couple of questions about the hardware you’re running. Have you gone the virtualised route, cloud hosting, or local?
These really are some cool numbers to see just before we begin tackling it ourselves!
OK, in this case the test results are affected by the video content captured and encoded on the producer side.
I’ve performed some tests with a similar configuration (using pipeToRouter) but using a recorded video as the source (https://cloud.quavlive.com/s/gcw4ZLrfijJQbwP). You can find the first test results here: Video conference test: 1 producer / N consumers. Next week I will add some more scenarios with N>1 producers, which is more challenging because the number of consumers, without optimizations, grows as N*(N-1).
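The N*(N-1) growth mentioned above follows directly from the conference topology: each of the N participants consumes the streams of the other N-1, so the server-side consumer count grows quadratically. A one-liner makes the scaling concrete:

```javascript
// Consumer count in an N-way conference without optimizations:
// each of N participants consumes the other N-1 streams.
const consumersFor = (n) => n * (n - 1);

console.log(consumersFor(2));  // 2
console.log(consumersFor(10)); // 90
console.log(consumersFor(50)); // 2450
```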
Would like to share our implementation and some questions
We are working on a solution where we have one producer (video/audio) and thousands of viewers.
We made our deployment using AWS EC2 instances (m5n.8xlarge). We are using 32 workers (= # of vCPUs) and 1 router in each worker. We use pipeToRouter for all the workers (plus a pipeToExternalRoute mechanism which we implemented ourselves for other instances in the cluster).
In production we have several instances/machines.
We do not use simulcast or SVC, as we got very low-quality video when using simulcast (not sure why yet).
Video is maxed at ~100KB/s (VP8) and audio is at ~4KB/s (Opus).
We tested (synthetic tests…) and found that each AWS instance can handle 3000 users (3000 video consumers + 3000 audio consumers).
With 3000 users and on one ec2 instance (m5n.8xlarge), we are at:
CPU: 55%
Network Out: 315MB/s
Network In: 5.3MB/s
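Those numbers are internally consistent: at ~100 KB/s video plus ~4 KB/s audio per viewer, 3000 viewers work out to almost exactly the reported egress. A back-of-the-envelope check (my arithmetic, ignoring RTP/RTCP and packet overhead):

```javascript
// Back-of-the-envelope egress check for one m5n.8xlarge (overhead ignored).
const viewers = 3000;
const videoKBps = 100; // VP8, per consumer
const audioKBps = 4;   // Opus, per consumer
const egressMBps = (viewers * (videoKBps + audioKBps)) / 1000;
console.log(egressMBps); // 312 MB/s, close to the observed 315 MB/s
```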
We run tests by spinning up a mass of servers across multiple AWS regions, each running multiple instances of puppeteer, which open the web page with our session and save an image to a central machine. This lets us review the image quality plus additional data we put on screen (bandwidth, etc.).
To lower the bandwidth required from the producer (a standard mobile device), we set keyFrameRequestDelay to 4 seconds (we didn’t implement a ‘re-encoder’ yet) and we set the ideal video resolution in getUserMedia to 640x480.
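The getUserMedia part of that setup might look like the following. This is the standard browser API; the exact constraints object is my guess at what “ideal 640x480” means here, not the poster's code:

```javascript
// Requesting an ideal 640x480 capture from the producer device
// (standard getUserMedia constraints; the actual call runs in a browser).
const constraints = {
  audio: true,
  video: {
    width: { ideal: 640 },
    height: { ideal: 480 },
  },
};

// In the browser:
// const stream = await navigator.mediaDevices.getUserMedia(constraints);
```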
Still, we have a lot of unsolved areas and questions -
So, m5n.8xlarge has 32 logical cores with AVX-512 (https://aws.amazon.com/ec2/instance-types/m5/), and you consumed 55% of it to send out 315MB/s, or around 2.5 gbit/s of traffic. That closely matches my experience: about 0.019 AVX2 cores per typical stream of ours, which is 1.4 mbit/s.