Perfect scaling, and improving. Keep it coming.

I have been at this for months, trialing different configurations. It wasn’t easy, but I think I can explain it nicely for you guys. Here are my numbers first, though; just scale them up!

Bitrate: 10000000 (I/O)
Cores: 2 (1 for producer, 1 for consumer)

1 core = 12 viewers (6 broadcasts)
4 cores = 24 viewers (12 broadcasts)

User weight for viewers (bitrate affected these results more than CPU did):
12 viewers = 1 weight
24 viewers = 2 weight
36 viewers = 3 weight
48 viewers = 4 weight
60 viewers = 5 weight
72 viewers = 6 weight

So here is how this weight system works: your consumer server is rated at 6 weight slots per core. If a producer needs to be sent out to 24 people, it costs the consumer server 2 slots; if 12, then 1 slot.
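As a rough illustration of the weight rule above, here is a minimal Python sketch. It is not the production code; the function names are made up for the example, and rounding up for in-between counts (say, 13 viewers costing 2 slots) is an assumption, since only multiples of 12 are listed.

```python
import math

VIEWERS_PER_WEIGHT = 12   # from the list above: 12 viewers = 1 weight
WEIGHT_PER_CORE = 6       # a consumer core is rated 6/6

def weight_for_viewers(viewers: int) -> int:
    """Weight slots one broadcast costs when it is sent to `viewers` people."""
    # Assumption: partial groups round up, so 13 viewers cost 2 slots.
    return math.ceil(viewers / VIEWERS_PER_WEIGHT)

def remaining_slots(audiences: list[int]) -> int:
    """Slots left on one consumer core after serving each broadcast's audience."""
    used = sum(weight_for_viewers(v) for v in audiences)
    return WEIGHT_PER_CORE - used

print(weight_for_viewers(24))      # 2
print(remaining_slots([12] * 6))   # 0: six broadcasts to 12 viewers fill the core
```

A negative result from `remaining_slots` would mean the core is over-committed and another consumer core is needed.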

So a single consumer core here will allow 12 viewers to view 6 cameras/audio without issue. Some adjustments are needed, however, to avoid buffering problems and the like.

The producer server gets its first broadcast and waits until a user comes to view. At that point a first attempt is made to create the pipe transport with the consumer server; then, if no stream exists there yet, a request is made to have it re-produced for the user and remembered for the next connections.

If all of the viewers leave, the consumer server for that producer resets, unless there is an active broadcast elsewhere (like a different room).

So if you’re a user hanging out alone on camera, it shouldn’t cost a viewers’ server anything.
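The lifecycle described above can be sketched as simple bookkeeping. This is a hedged Python illustration only: it makes no mediasoup calls, `ConsumerServer` and its methods are hypothetical names, and the `pipe_open` flag merely stands in for the real pipe transport. It also assumes "active broadcast elsewhere" means any other broadcast still being viewed on the same consumer server.

```python
class ConsumerServer:
    """Tracks which broadcasts have been re-produced on one consumer server."""

    def __init__(self) -> None:
        # broadcast id -> ids of the viewers currently watching it here
        self.viewers_by_broadcast: dict[str, set[str]] = {}
        self.pipe_open = False  # stand-in for the pipe transport to the producer

    def add_viewer(self, broadcast_id: str, viewer_id: str) -> bool:
        """Attach a viewer; return True if the broadcast had to be re-produced."""
        if not self.pipe_open:
            self.pipe_open = True  # first viewer: create the pipe transport
        reproduced = broadcast_id not in self.viewers_by_broadcast
        # Remember the stream so later viewers reuse it.
        self.viewers_by_broadcast.setdefault(broadcast_id, set()).add(viewer_id)
        return reproduced

    def remove_viewer(self, broadcast_id: str, viewer_id: str) -> None:
        viewers = self.viewers_by_broadcast.get(broadcast_id)
        if viewers is None:
            return
        viewers.discard(viewer_id)
        if not viewers:
            del self.viewers_by_broadcast[broadcast_id]
        if not self.viewers_by_broadcast:
            self.pipe_open = False  # nobody is watching anything: reset


server = ConsumerServer()
print(server.add_viewer("cam1", "alice"))  # True: first viewer triggers re-produce
print(server.add_viewer("cam1", "bob"))    # False: stream is already there
```

A broadcaster with no viewers never appears in `viewers_by_broadcast`, which matches the point that being alone on camera costs the viewer side nothing.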

The numbers are not perfect. It may interest some of you that during my tests CPU usage did not maintain a consistent percentage relative to user count; there were at least a few factors involved, and if I sort those out these numbers may get much better.

Additionally, if I were to add more power to this, it would be to let a producer use 2 or more consumer servers, or more than 6 weight, for the big rooms.

Enjoy that, guys.

So I’ve been testing this for weeks. It’s working perfectly; however, a few issues and improvements need attention to make it work better. Here is the list.

  • Where weight is greater than 2, treat it as individual points so servers can be filled to 100% while still being allowed to continue with their consumption.
  • Separate screen shares from the broadcasts by server and change their weight factors. The reason is their higher bitrate and heavier CPU usage; however, if you set at least one per room, by fill-up it’s A-OK.
  • If you are planning a tier system (in my case tier 1 is 6x12 and tier 2 is 12x24), make sure you close the room and re-open it when a tier is changed so you do not lose the consumer servers. :wink:

Dropped the bitrate so it’s now up to 800 Kb/s, but the plan is to ensure screen shares can have 1-4 Mb/s.

Tests so far have shown that what I listed initially is correct and I can scale like this. mediasoup on every CPU core (without much variance) was capable of delivering up to 140 total audio and/or video transports; across four cores, up to 560 transports.

On 1 Gb/s networks I was pulling a maximum of around 150 Mb/s (combined across all interfaces). Daily I use about 1-20 TB per server depending on whether it gets used; overall, for 100 or less, up to 10-20 TB of bandwidth daily, and if higher, well, you can guess.
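For anyone budgeting bandwidth, converting a sustained rate into a daily transfer total is simple arithmetic. A small Python sketch, assuming decimal units (1 TB = 10^12 bytes) and a constant rate over the whole day, which real traffic will not be; `daily_terabytes` is a name made up for the example.

```python
def daily_terabytes(mbps: float) -> float:
    """Decimal terabytes moved in 24 hours at a sustained rate in megabits/s."""
    bytes_per_second = mbps * 1_000_000 / 8   # megabits/s -> bytes/s
    return bytes_per_second * 86_400 / 1e12   # seconds per day, bytes -> TB

print(round(daily_terabytes(150), 2))    # 1.62
print(round(daily_terabytes(1000), 2))   # 10.8
```

So a steady 150 Mb/s is roughly 1.62 TB per day, and a fully saturated 1 Gb/s link is roughly 10.8 TB per day.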

If you guys need more information, ask. This setup allows me to overload one zone at a time, a zone being my broker handler (main signalling server).

Cheers, I’ll be hitting thousands of users soon, and the improvements coming soon will make this run better and at full use.

If this is something you guys want to learn more about, my goal is to release a small, simplified demo of the idea, with of course the help of you guys if you think we can advance this balancing system across networks. (The demo may take up to a few months, or this may stay as a discussion here, but those venturing into many-to-many will love this.)

I will add this as a disclaimer for those wanting to do major broadcasting: most hosts will not provide you unlimited bandwidth, and even on the softest setups it could cost you thousands. Don’t become that victim. Digital Ocean almost did it to me during testing/setup (but I switched to Tier 1 hosting).

Super interested in this. Please provide link to a demo / your setup.

I currently test live on users (and their rooms). The site is set for registered users only and wouldn’t show much of this concept off; however, the streams would look noticeably clean.

Unfortunately a demo is not possible at this time; however, I will happily talk about it and answer questions. If there are ideas, throw them down.

This project is rather large, so I’m by no means ready to write a simplified demo for users just yet. Hopefully it is explained well enough for developers to take a poke and see what’s up. :slight_smile:

Quick update on the project: results have been perfect, but after experiencing a very nasty attack and really analyzing how far performance will stretch, I have come back with revisions to the plan.

So to start, the front-end will be load-balanced/proxied. Session handling will be done on its own database cluster to keep critters from hurting back-end performance. The account database will be clustered with read-only nodes to service room entry.

For premium, a simple load balancer and its own database server that is used when payments are accepted. It should never be offline or be exposed to non-spending entities.

All rooms are now to have a dedicated chat server that is proxied/handled by the Cloudflare API. This allows the highest QPS without the performance degradation that would be introduced by, say, Redis or other MQ-like services. In rare cases you may want those anyway, and that’s fine; it’s reasonable for DoS protection, fail-over, etc.

For media servers, the diagram is not too clear, but I’ll explain it quickly. Below is our consumer server table; this is applied in production as well.

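| Tier | Broadcasts | Viewers | Required transports | Required cores | Required weight |
|---|---|---|---|---|---|
| 0 | 0 | 12 | 0 | 0 | 0 |
| 1 | 6 | 12 | 132 | 1 | 12 |
| 2 | 12 | 24 | 552 | 4 | 48 |
| 3 | 12 | 48 | 1128 | 8 | 96 |
| 4 | 24 | 48 | 2256 | 16 | 192 |
| 5 | 24 | 96 | 4560 | 32 | 384 |
| 6 | 48 | 96 | 9120 | 64 | 768 |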

So as you can see, the tier determines the allotted broadcasters and viewers: at Tier 3 we can have a room with 12 broadcasters and 48 viewers. This assumes both audio and video will be transmitted, which makes the requirement 8 cores.

The required weight has to do with distribution; consider it points, pounds, whatever you want. A consumer server can handle 12 weight, which keeps things simple and means we never overshoot performance.

So if a broadcaster has…
0 viewers, they weigh 0
1 viewer, they weigh 1
6 viewers, they weigh 1
7 viewers, they weigh 2
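The tier table can be reproduced from a few small formulas. To be clear, these formulas are inferred from the rows and from the weight rule just listed; they are not stated anywhere in the thread. In particular, the `viewers - 1` factor is simply what makes the transport column match (possibly each track going to everyone but the broadcaster), and `tier_requirements` is a hypothetical helper name.

```python
import math

VIEWERS_PER_WEIGHT = 6     # 1-6 viewers weigh 1, 7 viewers weigh 2
WEIGHT_PER_SERVER = 12     # one consumer server (core) handles 12 weight
TRACKS_PER_BROADCAST = 2   # audio + video

def tier_requirements(broadcasts: int, viewers: int) -> tuple[int, int, int]:
    """Return (transports, cores, weight) for a room of the given size."""
    # Inferred: each track is delivered to (viewers - 1) people.
    transports = broadcasts * TRACKS_PER_BROADCAST * (viewers - 1)
    # Each broadcaster weighs ceil(viewers / 6).
    weight = broadcasts * math.ceil(viewers / VIEWERS_PER_WEIGHT)
    cores = math.ceil(weight / WEIGHT_PER_SERVER)
    return transports, cores, weight

TIERS = {1: (6, 12), 2: (12, 24), 3: (12, 48), 4: (24, 48), 5: (24, 96), 6: (48, 96)}
for tier, (broadcasts, viewers) in TIERS.items():
    print(tier, tier_requirements(broadcasts, viewers))
```

With these assumptions every row of the table comes out as listed, for example Tier 3 gives 1128 transports, 8 cores and 96 weight.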

A broadcaster can be consumed by different servers, so these points are never deducted in amounts greater than one at a time. A record is kept of which users are on which server, so when they are removed the weight can be added back properly.
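One way to picture that bookkeeping is the Python sketch below. It is an illustration under assumptions, not the real allocator: `ConsumerPool` is a hypothetical name, and the placement policy (fill a half-used slot first, otherwise take one weight from the server with the most free weight) is a guess rather than anything described in the thread.

```python
VIEWERS_PER_WEIGHT = 6   # one weight covers up to 6 viewers of a broadcaster
SERVER_CAPACITY = 12     # weight each consumer server can handle

class ConsumerPool:
    def __init__(self, server_count: int) -> None:
        self.free = [SERVER_CAPACITY] * server_count  # free weight per server
        # (broadcaster, server index) -> viewers watching that broadcaster there
        self.placements: dict[tuple[str, int], set[str]] = {}

    def _locate(self, broadcaster: str, viewer: str):
        """Return the placement key holding this viewer, or None."""
        for key, viewers in self.placements.items():
            if key[0] == broadcaster and viewer in viewers:
                return key
        return None

    def add_viewer(self, broadcaster: str, viewer: str) -> int:
        """Place a viewer and return the index of the server used."""
        existing = self._locate(broadcaster, viewer)
        if existing is not None:
            return existing[1]
        # Reuse a slot that is already paid for and not yet full.
        for (b, server), viewers in self.placements.items():
            if b == broadcaster and len(viewers) % VIEWERS_PER_WEIGHT != 0:
                viewers.add(viewer)
                return server
        # Otherwise deduct exactly one weight from the emptiest server.
        server = max(range(len(self.free)), key=self.free.__getitem__)
        if self.free[server] < 1:
            raise RuntimeError("no consumer weight left")
        self.free[server] -= 1
        self.placements.setdefault((broadcaster, server), set()).add(viewer)
        return server

    def remove_viewer(self, broadcaster: str, viewer: str) -> None:
        key = self._locate(broadcaster, viewer)
        if key is None:
            return
        viewers = self.placements[key]
        viewers.remove(viewer)
        if len(viewers) % VIEWERS_PER_WEIGHT == 0:
            self.free[key[1]] += 1  # a whole slot emptied: give the weight back
        if not viewers:
            del self.placements[key]
```

The invariant is that a broadcaster always holds `ceil(viewers / 6)` weight on each server, so a seventh viewer takes one more point and removing that viewer returns it.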

The producer server is kept simple: 30 active broadcasts, so that you can actively re-produce these to a consumer server with zero faults. For small rooms this is overkill for sure, but forget that…

So by default all rooms get a dedicated producer server and a chat server; if it’s a free room, it just gets access to the global chat servers.

This design is far from perfect, and unfortunately there won’t be a code release, but it should give some ideas to those looking to make things super fast. If I haven’t explained something well enough, let me know. My goal is to provide a site where thousands of users can enjoy cheap, dedicated many-to-many. I’m almost at the one-year mark, and it may take just a few more months to have all of the above fully in play.

Fun note: doing this solo is not easy. Companies like Discord were finished in two months… disgusting. haha

Slight adjustments for now, as rooms are starting to fill up galore. The site has 10-20 or more rooms relatively filled at a time.

So my producers are now more dynamic:

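| Publishers | Viewers | Weight Required | Consumer Servers | Producer Servers Required | Re-Produce/Consume | Producer Limit |
|---|---|---|---|---|---|---|
| 0 | 12 | 0 | | | | |
| 6 | 12 | 6 | 2 | 1 | 1 | 13 |
| 12 | 24 | 24 | 8 | 1 | 2 | 13 |
| 24 | 48 | 96 | 32 | 4 | 4 | 7 |
| 1 | 96 | 8 | 3 | 1 | 8 | 4 |
| 2 | 96 | 16 | 6 | 1 | 8 | 4 |
| 3 | 96 | 24 | 8 | 1 | 8 | 4 |
| 12 | 96 | 96 | 32 | 3 | 8 | 4 |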

Publishers is the number of broadcasts, and viewers is the number of subscribers. So in a 6x12 room, each publisher uses a single weight when consumed, which opens 12 viewer slots (6 weight total for 6 publishers wanting 12 viewers).

The consumer/producer server counts are the servers required for the room size; each one represents a 1 vCore @ 512 MB machine.
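The room-size table can be approximated with the Python sketch below. The relationships are inferred from the rows rather than confirmed: one re-produce per 12 viewers, 3 weight per consumer server, and a producer limit looked up from the table itself. `room_requirements` and the constant names are made up for the example.

```python
import math

VIEWERS_PER_WEIGHT = 12    # one weight opens 12 viewer slots
WEIGHT_PER_CONSUMER = 3    # inferred: each 1 vCore consumer server takes 3 weight
# Re-produce count -> broadcasts allowed per producer server (copied from the table).
PRODUCER_LIMIT = {1: 13, 2: 13, 4: 7, 8: 4}

def room_requirements(publishers: int, viewers: int) -> tuple[int, int, int, int, int]:
    """Return (weight, consumer servers, producer servers, re-produces, limit)."""
    reproduces = math.ceil(viewers / VIEWERS_PER_WEIGHT)
    weight = publishers * reproduces
    consumers = math.ceil(weight / WEIGHT_PER_CONSUMER)
    limit = PRODUCER_LIMIT[reproduces]
    producers = math.ceil(publishers / limit)
    return weight, consumers, producers, reproduces, limit

print(room_requirements(6, 12))    # (6, 2, 1, 1, 13)
print(room_requirements(24, 48))   # (96, 32, 4, 4, 7)
print(room_requirements(12, 96))   # (96, 32, 3, 8, 4)
```

Viewer counts whose re-produce count is not in `PRODUCER_LIMIT` raise a `KeyError`, since the table gives no limit for them.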

Where it gets tricky is that we need to ask ourselves how many times a producer may need to be re-produced. My results were roughly 1.6%-2.0% CPU on every broadcast, which adds up quickly.
Published audio/video, pipe transports, re-producing and consuming are each their own cost and need to be considered.

Producer Logic:
13 broadcasts (26%) + 2 re-produces each (52%) = 78%
7 broadcasts (14%) + 4 re-produces each (56%) = 70%
4 broadcasts (8%) + 8 re-produces each (64%) = 72%

How this works: 13 broadcasts multiplied by 2 (audio/video) equate to 26% CPU usage. This process is repeated whenever re-producing occurs, so we take that 26%, multiply it by the re-produce count, and add the totals up to get a general CPU ballpark. The goal is to avoid 100% usage, where the server starts to choke and add latency (not usable). If we require 8 re-produces from the list above, we run the producer server at 4 broadcasts total and add more producer servers as required.
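That estimate is easy to put into a few lines of Python. This is a sketch under assumptions: 1% CPU per audio or video track is taken from the worked numbers above, and the strict 80% ceiling in `max_broadcasts` is a guess based on the 20% overhead mentioned later in the thread. Both function names are hypothetical.

```python
TRACK_COST_PCT = 1.0       # assumed: 1% CPU per audio or video track
TRACKS_PER_BROADCAST = 2   # audio + video

def producer_cpu(broadcasts: int, reproduces: int) -> float:
    """Rough CPU % for a producer server: base cost plus one copy per re-produce."""
    base = broadcasts * TRACKS_PER_BROADCAST * TRACK_COST_PCT
    return base + base * reproduces

def max_broadcasts(reproduces: int, budget_pct: float = 80.0) -> int:
    """Largest broadcast count whose estimate stays strictly below the budget."""
    count = 0
    while producer_cpu(count + 1, reproduces) < budget_pct:
        count += 1
    return count

print(producer_cpu(13, 2))   # 78.0
print(producer_cpu(7, 4))    # 70.0
print(producer_cpu(4, 8))    # 72.0
print(max_broadcasts(8))     # 4
```

Under these assumptions the limits of 13, 7 and 4 fall out for 2, 4 and 8 re-produces. For a single re-produce the formula would allow 19, whereas the table lists 13, so the real limit evidently has another factor behind it.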

Consumer Logic:
3 re-produces (6%) + 12 viewers (72%) = 78%
(3 weight/re-produces, and 12 viewers per weight)

Consumer logic does not change much. When a user wants to view a producer, a weight is deducted as explained above, but the numbers are a bit more fine-tuned now because I was overloading before.
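One reading of the consumer numbers that reproduces the 78% figure is sketched below in Python. The 2% per stream (audio plus video) cost and the way the 72% is split up are assumptions, not something spelled out above, and `consumer_cpu` is a made-up name.

```python
PCT_PER_STREAM = 2.0       # assumed: 2% CPU per audio+video stream
VIEWERS_PER_WEIGHT = 12    # 12 viewers per weight

def consumer_cpu(weights: int) -> float:
    """Rough CPU % for one consumer server carrying `weights` re-produced streams."""
    reproduce_cost = weights * PCT_PER_STREAM                     # receiving the streams
    viewer_cost = weights * VIEWERS_PER_WEIGHT * PCT_PER_STREAM   # fanning them out
    return reproduce_cost + viewer_cost

print(consumer_cpu(3))   # 78.0
print(consumer_cpu(4))   # 104.0
```

Read this way, a fourth weight would push the estimate past 100%, which would explain stopping at 3 weight per consumer server.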

Scaling like this has its major advantages: you’re guaranteed the lowest latency, the highest quality streams and consistency across the network. The only issue I truthfully have is that CPU usage jumps around quite a bit; if run too heavy it commonly lags itself, so 20% CPU headroom is kept to handle that.

If there are changes I’ll post the new numbers again. I won’t be uploading/posting code, so please don’t ask. Enjoy the numbers/ideas for your project if they are of any use to you! :slight_smile: