Memory Complexity of adding consumers

I’ve read the Scalability document: https://mediasoup.org/documentation/v3/scalability/. As a constraint, I’m only referring to the One-To-Many section described in the document, and my question assumes there is only one producer.

The document reads:
“If there are more than 200-300 viewers (so 400-600 consumers), the capabilities of a single mediasoup router could be exceeded.”

From what I understand, this clearly describes the compute (CPU) limits of a router, since by definition a router runs on one worker, which is tied to one core (I’m not sure whether that means CPU affinity or simply a single path of execution, but that doesn’t concern my question). This is also discussed in this amazing thread: Preferentially optimize latency while broadcasting.

The immediate question that comes to mind is: “What about memory scalability?”. That is, if I want to use a mediasoup router at its optimal capacity (say 150 viewers, considering 200 is borderline), what amount of memory would be required in terms of the number of viewers of that stream?

Put another way: if I know the amount of RAM (say “m”) used in a meeting:

  • with one producer endpoint having a send-only transport, streaming audio and video.
  • with 30 consumer endpoints having receive-only transports.

Can I safely assume that the amount of memory used by the router after adding 30 more participants would be of the order of “2m”?

I’m saying this because mediasoup does not do any decoding; it just forwards (clones, sort of) incoming RTP packets to all relevant consumers. So for N consumers, N copies, i.e. linear memory complexity in a sense, as sketched below.
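
To make that assumption explicit, this is the mental model I have in mind (a purely illustrative sketch, not mediasoup code; the function and parameter names are made up):

#include <cstddef>

// Illustrative linear model only: total router memory is assumed to be a
// fixed base cost plus a constant per-consumer cost.
std::size_t estimateRouterMemory(
  std::size_t baseBytes,        // worker + router + producer + transports
  std::size_t perConsumerBytes, // buffers and bookkeeping per consumer
  std::size_t numConsumers)
{
	return baseBytes + perConsumerBytes * numConsumers;
}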

I would love to be corrected in case there are inaccuracies in my understanding.

Thanks :+1:

Each video Consumer holds a vector of 600 StorageItem entries (see the definition in RtpStreamSend.hpp). Each StorageItem holds this:

struct StorageItem
{
	// Cloned packet.
	RTC::RtpPacket* packet{ nullptr };
	// Memory to hold the cloned packet (with extra space for RTX encoding).
	uint8_t store[RTC::MtuSize + 100];
	// Last time this packet was resent.
	uint64_t resentAtMs{ 0u };
	// Number of times this packet was resent.
	uint8_t sentTimes{ 0u };
	// Whether the packet has been already RTX encoded.
	bool rtxEncoded{ false };
};

The store field takes RTC::MtuSize + 100 = 1600 bytes, so with the remaining small fields each StorageItem takes roughly 1611 bytes (a bit more on a 64-bit build once pointer size, the uint64_t and struct padding are accounted for). So each video Consumer takes about 1611 * 600 ≈ 966,600 bytes, and 200 Consumers mean ~193 MB.
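
For anyone who wants to reproduce that arithmetic, here is a minimal sketch (the 1500-byte MTU, 100-byte RTX headroom and 600-entry buffer come from the quoted code; the ~11 bytes of bookkeeping is the approximation used above and ignores alignment):

#include <cstddef>
#include <cstdio>

int main()
{
	// Values taken from the quoted RtpStreamSend.hpp excerpt above.
	const std::size_t mtuSize{ 1500 };      // RTC::MtuSize
	const std::size_t rtxHeadroom{ 100 };   // extra space for RTX encoding
	const std::size_t bufferEntries{ 600 }; // StorageItem entries per video Consumer

	// Approximate per-item size: the store[] array plus the small
	// bookkeeping fields (~11 bytes, ignoring padding).
	const std::size_t storageItemBytes = mtuSize + rtxHeadroom + 11; // ~1611 bytes

	const std::size_t perConsumerBytes = storageItemBytes * bufferEntries; // ~966,600 bytes
	const std::size_t consumers        = 200;

	std::printf("per video Consumer: ~%zu bytes\n", perConsumerBytes);
	std::printf("%zu video Consumers: ~%.0f MB\n",
	            consumers,
	            static_cast<double>(perConsumerBytes * consumers) / 1e6);

	return 0;
}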


This is AWESOME!! :star_struck:

Thank you for the amazing answer. :+1:

Is the size the same for audio consumers as well? I don’t see any audio-specific code here, so I assume it is.

Also, I would like to contribute to the scalability docs and add this section there; I think documenting memory complexity would be a great addition.

No, there is no RTP retransmission buffer for audio consumers, so the array is empty.
