Mediasoup integration into existing WebRTC architecture | Mental health teletherapy

Hello MediaSoup Community!

We’re a small mission-driven startup in the behavioral health space (teleo.space) and are facing challenges as we scale our WebRTC architecture. After exploring options, we are considering MediaSoup to address these issues and would deeply appreciate your insights.

Our in-house solution uses Node.js with node-webrtc for WebRTC and gstreamer-superficial for GStreamer integration. The system is peer-to-peer:

  1. One peer is a Debian container which:
    1. Shares its screen from a GStreamer pipeline using
      1. ximagesrc
      2. pulsesrc
      3. appsink
    2. Receives events via data channel for remote control
  2. Additional peers are instantiated per user’s browser and each:
    1. Sends video and audio
    2. Sends and receives events via data channel

The 2 main challenges are:

  1. node-webrtc’s RTCVideoSource only accepts raw video frames. Processing and sending frames for multiple peers is CPU-intensive, limiting us to 3 browser peers plus the container per session before performance degrades.
  2. As we scale the number of peers, the peer-to-peer architecture will become resource-intensive for each peer (CPU and network bandwidth).

We’ve implemented the perfect negotiation pattern and minimized costs using peer-to-peer data flow, with a TURN service as fallback.

We aim to support up to 14 participants per meeting without excessive infrastructure costs and without exceeding users’ CPU and network limits. To achieve this, we’re considering hosting an SFU like MediaSoup.
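To make the scaling concern concrete, here is a minimal sketch of the per-peer stream counts in a full mesh versus an SFU. It is illustrative arithmetic only: the function names are hypothetical, and one "stream" here stands for one participant's combined audio and video.

```typescript
// Per-peer stream counts for an N-party call (illustrative arithmetic only).
interface StreamLoad {
  uploads: number;   // outgoing media streams each peer must encode and send
  downloads: number; // incoming media streams each peer receives
}

// Full mesh: every peer sends its media directly to every other peer.
function meshLoad(participants: number): StreamLoad {
  const others = Math.max(participants - 1, 0);
  return { uploads: others, downloads: others };
}

// SFU: every peer sends once to the server, which forwards to the others.
function sfuLoad(participants: number): StreamLoad {
  const others = Math.max(participants - 1, 0);
  return { uploads: participants > 0 ? 1 : 0, downloads: others };
}

// The 14-participant target from the post.
console.log("mesh:", meshLoad(14));
console.log("sfu:", sfuLoad(14));
```

With 14 participants, each mesh peer uploads 13 streams, while each SFU peer uploads one. The download count is the same in both cases unless the SFU forwards only a subset of streams.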

Option A - Deploying one MediaSoup instance per Debian container (per meeting). (The Debian container could connect to the SFU as a WebRTC peer or via GStreamer using PlainTransport)

Option B - A centralized MediaSoup instance for multiple meetings.

We’d love your feedback on the following:

  1. Do you recommend Option A or Option B? Why?
  2. What should we watch for when transitioning to an SFU architecture?
  3. Are there better open source solutions for our use case?
  4. Given AWS’s network outbound charges, would hosting MediaSoup in-house be more costly than using other cloud-based video solutions (e.g., Zoom SDK, agora.io)?

Thank you in advance for your help!

If you’re interested in providing ongoing support and/or consulting, please check out our post in “Job Opportunities”.

Best,

Murilo Cruz
Founding Engineer at Teleo

Contact us at engineering@teleo.space or schedule any time here.

Unless you do tricks with the TURN servers, your mediasoup instances will probably each need a public IP assigned in your cloud, so running as many mediasoup instances as Debian containers probably won’t work if you want to scale horizontally.

I would rather have a centralised cluster of mediasoup instances with N assigned IPs. I would also take into account the number of CPU cores on those machines and run the same number of mediasoup workers, since each worker is a separate process.
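As a rough sketch of the "one worker per core" idea, the snippet below spreads new rooms over a fixed pool of workers in round-robin order. The `RoundRobinPool` class and the string worker handles are hypothetical stand-ins. In a real server the pool items would be the worker objects resolved by `mediasoup.createWorker()`, created once per CPU core.

```typescript
// Round-robin picker over a fixed pool, e.g. one mediasoup worker per CPU core.
class RoundRobinPool<T> {
  private readonly items: readonly T[];
  private nextIndex = 0;

  constructor(items: readonly T[]) {
    if (items.length === 0) {
      throw new Error("pool must contain at least one item");
    }
    this.items = items;
  }

  // Returns the next item and advances the cursor, wrapping at the end.
  next(): T {
    const item = this.items[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.items.length;
    return item;
  }
}

// Hypothetical stand-ins for worker handles (assumption: a 4-core machine).
const workers = ["worker-0", "worker-1", "worker-2", "worker-3"];
const pool = new RoundRobinPool(workers);

// Each new room is placed on the next worker in turn.
const placements = ["room-a", "room-b", "room-c", "room-d", "room-e"].map(
  (room) => `${room} -> ${pool.next()}`
);
console.log(placements);
```

Round-robin is only the simplest placement policy. A load-aware policy might work better when rooms differ a lot in size.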

Hi @murilo.teleo.space, I have already written to @TomFanella, but I’ll answer some of your questions here too :slight_smile:

I recommend Option B: it’s easier to maintain and deploy, and Mediasoup is capable of handling multiple rooms of up to 14 participants, depending on the capacity of the server, as @ibc already said. If you wanted to host A LOT of rooms, you would need a scaling solution, either managing multiple media server instances yourself or using a horizontal scaling solution like my own, Mafalda SFU (shameless plug :slight_smile: ).

Migrating from a P2P mesh architecture to an SFU is not easy. Frontend code and layout can be reused, but management would have to be done by the app server and rethought from scratch. OTOH, you would no longer need node-webrtc to emulate a client to get the stream, because it can be captured directly from Mediasoup using RTP.

I would need more details about your actual use case, but if you want low-level access to the streams, whether for management, scalability, or performance, I think almost nothing can beat Mediasoup. Other solutions focus on higher-level use cases such as videoconferencing. They might work for you, but you would not only have to throw everything away and start from scratch; these higher-level solutions are also less flexible. So I would think seriously about your particular use case and both your current and likely future needs.

Using Zoom, Agora, and other cloud-based solutions can work at small scale, but they have two big, related problems: you get bound to their “easy to use” API, which makes migration to any other solution almost impossible, and once usage increases, costs become prohibitive. That’s why I run away from any online walled-garden API.

If you already have something working, especially if you’ve just started the migration to Mediasoup, stick with it. Your future self will thank you :slight_smile:

Check out MiroTalk SFU:
Live Demo: https://sfu.mirotalk.com
Repository: https://github.com/miroslavpejic85/mirotalksfu


Thanks Iñaki! This advice is extremely helpful, and I agree that a centralized cluster of mediasoup instances makes the most sense for optimizing horizontal scalability and cost per CPU usage. Do you have any advice and/or examples for scaling mediasoup server instances using Kubernetes (or a similar orchestration system)?

Thank you for the extremely detailed and considerate reply. I’m excited to collaborate with you!

Hi Miroslav, based on the linked GitHub page, it appears that MiroTalk SFU uses mediasoup for the SFU server. Is that correct? Either way, how does MiroTalk SFU differentiate itself from mediasoup?

That’s definitely not my area, I’m afraid.

Hi Tom,

Yes, MiroTalk SFU is built using mediasoup as its core SFU (Selective Forwarding Unit) server, which is a robust and widely used WebRTC framework.

How MiroTalk SFU Differentiates Itself from Mediasoup

  1. Out-of-the-Box Solution:
    While mediasoup is a low-level framework for building WebRTC SFUs, MiroTalk SFU provides a complete, ready-to-use application. This means MiroTalk SFU handles not just the SFU aspect but also the signaling, frontend interface, authentication, and other integrations needed for a functional video conferencing solution.

  2. Ease of Use:
    Mediasoup requires developers to have a good understanding of WebRTC and signaling to build custom solutions. MiroTalk SFU abstracts much of this complexity, offering a user-friendly setup for deploying and managing rooms without deep coding expertise.

  3. Integrated Features:
    MiroTalk SFU includes pre-built features such as:

    • Multi-party video conferencing.
    • Broadcasting (one-to-many) with RTMP stream support.
    • Screen sharing.
    • Recording capabilities.
    • Collaborative Notepad and Whiteboard.
    • ChatGPT integration, AI Avatars, group and private chat, as well as file sharing.
    • Role-based participant management (e.g., host, guest).
    • And many more features, all of which are configurable from a single central config.js file. Here, you can rebrand the application, update the logo and description, choose which buttons to display during meetings, and much more!

  4. Scalability and Deployment:
    MiroTalk SFU is designed for easy deployment with Docker, making it straightforward to scale horizontally by spinning up additional instances. Mediasoup, in contrast, requires developers to architect and manage their scaling strategies.

  5. Community and Support:
    MiroTalk SFU comes with a more focused community and support for its specific use case as a video conferencing platform, while mediasoup is geared toward developers building a variety of media server applications.

If you’re comfortable building from scratch and customizing extensively, mediasoup offers flexibility and power. However, if you need a ready-to-deploy and customizable solution, MiroTalk SFU saves time and effort by packaging the essentials for WebRTC applications.

@TomFanella I see that you are looking for cloud pricing info. I have some numbers from research I did for other projects. They are just for reference, so please check them again before proceeding.

Here it says 500 consumers per router / core:

Each viewer needs at least 2 consumers (audio + video).

** 1 core = 250 viewers; at an efficiency rate of about 50%, call it ~100 viewers / core
** AWS c7gn.8xlarge (32 cores) = 3,200 viewers; ~300 instances = 1M viewers = $600/h
** 1920x1080 at a low-quality bitrate of 0.5 Mbps = 225,000 GB/h for 1M viewers; at a network bandwidth price of $0.05/GB = $11,250/h

So 1M concurrent users cost ~$12k/h.
Scaled down to 1k users: $12/h = $8,640/month.

Notice that the AWS network bandwidth pricing is insane. With a cheaper cloud provider such as Hetzner:
** Network: $0.00119/GB, ~2% of AWS’s $0.05
** Compute: ~10-20% of AWS EC2
For 1k concurrent users it comes to about $250/month.
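The estimate above can be reproduced with a small calculator. This is a sketch under stated assumptions: the $2/h instance price is inferred from the post's "$600/h for ~300 instances", the instance count is left fractional instead of rounded to 300, and all names are hypothetical.

```typescript
interface CostInputs {
  viewers: number;              // concurrent viewers
  viewersPerCore: number;       // 100 in the post (250 theoretical, derated)
  coresPerInstance: number;     // 32 for c7gn.8xlarge
  instancePricePerHour: number; // USD; ~2 is an assumption inferred from the post
  bitrateMbps: number;          // per-viewer bitrate, 0.5 in the post
  egressPricePerGB: number;     // USD per GB, 0.05 in the post
}

interface HourlyCost {
  instances: number; // fractional; round up for a real deployment
  computeUsd: number;
  egressGB: number;
  egressUsd: number;
  totalUsd: number;
}

function estimateHourlyCost(inputs: CostInputs): HourlyCost {
  const viewersPerInstance = inputs.viewersPerCore * inputs.coresPerInstance;
  const instances = inputs.viewers / viewersPerInstance;
  const computeUsd = instances * inputs.instancePricePerHour;
  // Mbps -> MB/s (divide by 8) -> GB/s (divide by 1000) -> GB/h (times 3600)
  const egressGB = ((inputs.viewers * inputs.bitrateMbps) / 8 / 1000) * 3600;
  const egressUsd = egressGB * inputs.egressPricePerGB;
  return { instances, computeUsd, egressGB, egressUsd, totalUsd: computeUsd + egressUsd };
}

const awsOneMillion = estimateHourlyCost({
  viewers: 1_000_000,
  viewersPerCore: 100,
  coresPerInstance: 32,
  instancePricePerHour: 2,
  bitrateMbps: 0.5,
  egressPricePerGB: 0.05,
});
console.log(awsOneMillion);
// Monthly figure for a 30-day month, as in the post.
console.log("monthly USD:", awsOneMillion.totalUsd * 24 * 30);
```

Because the instance count is not rounded, the compute share comes out slightly higher than the post's $600/h, but the total stays close to the quoted ~$12k/h and egress still dominates.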

@TomFanella Regarding the scaling logic: by default the mediasoup demo already scales vertically, setting the number of workers from the CPU cores.
For horizontal scaling with k8s, we would have a central API from which the frontend gets the protoo URL. The API looks in the DB for an available mediasoup instance. The instances post back to this central API to update stats such as number of peers, CPU and RAM usage, etc. The API uses those stats to find an available instance and considers scaling up a new instance if some limit is reached. With the available instance’s info, the API then responds to the frontend with a protoo URL that points to that instance. A cron job also checks for empty instances so they can be scaled down.
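The selection step of such a central API could look roughly like the sketch below. All names, fields, and thresholds are hypothetical, and the stats are assumed to be already loaded from the database.

```typescript
interface InstanceStats {
  protooUrl: string; // signaling URL handed to the frontend
  peers: number;     // peers currently connected to this instance
  cpuUsage: number;  // fraction between 0 and 1, as reported by the instance
}

interface Limits {
  maxPeers: number;
  maxCpuUsage: number;
}

interface Decision {
  instance: InstanceStats | null; // null when nothing has capacity left
  scaleUp: boolean;               // true when a new instance should be started
}

// Pick the least-loaded instance under the limits, or request a scale-up.
function chooseInstance(instances: InstanceStats[], limits: Limits): Decision {
  const available = instances.filter(
    (i) => i.peers < limits.maxPeers && i.cpuUsage < limits.maxCpuUsage
  );
  if (available.length === 0) {
    return { instance: null, scaleUp: true };
  }
  const instance = available.reduce((best, cur) =>
    cur.peers < best.peers ? cur : best
  );
  return { instance, scaleUp: false };
}

// What the periodic cleanup job would look for: instances with no peers.
function emptyInstances(instances: InstanceStats[]): InstanceStats[] {
  return instances.filter((i) => i.peers === 0);
}

// Example with made-up stats and limits.
const fleet: InstanceStats[] = [
  { protooUrl: "wss://sfu-1.example.com", peers: 120, cpuUsage: 0.7 },
  { protooUrl: "wss://sfu-2.example.com", peers: 40, cpuUsage: 0.3 },
  { protooUrl: "wss://sfu-3.example.com", peers: 0, cpuUsage: 0.01 },
];
console.log(chooseInstance(fleet, { maxPeers: 200, maxCpuUsage: 0.8 }));
console.log(emptyInstances(fleet));
```

A real API would also have to send peers who join an existing room to the instance that already hosts it. This sketch only covers placing new rooms.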

Each instance needs a public IP address, so it requires a full-node deployment with host-mode networking exposed. You also need TURN servers to serve people behind strict NATs, which you can combine with mediasoup so that it can be deployed in pods instead. For example, here is a project that claims to offer such a solution: l7mp/stunner on GitHub, a Kubernetes media gateway for WebRTC.
