Best Architecture for Offloading MediaStream from Mediasoup to a Sidecar Node.js Service for Image Processing

Hi team,

We are running a production-grade Mediasoup server and looking to implement a Node.js-based sidecar service (deployed on Kubernetes with HPA) that will perform image processing and detection on all video streams in a room.

Our goal is to offload this processing from the main Mediasoup server since it’s CPU/memory intensive, and we want to keep our media server lightweight and focused solely on media routing (as it’s the core of our application).

Current Setup:
The Mediasoup server is running successfully.

Multiple rooms, each with multiple users.

Each user is publishing a video stream via Mediasoup.

Kubernetes is used to deploy microservices, including the upcoming sidecar service.

Objective:
What would be the best and most efficient architecture to:

Consume all video streams of a room from the Mediasoup server in a Node.js sidecar service (for frame extraction, image processing, etc.).

Ensure the Mediasoup server is not overloaded.

Scale sidecar services horizontally (per room or per load) via Kubernetes HPA.

Specific Questions:
Is it advisable to create a bot-like Mediasoup peer in the sidecar service that joins each room and consumes all producer streams?

Would a single PlainTransport per user stream (or a combined one per sidecar peer) be ideal? (A rough sketch of what I have in mind follows these questions.)

How can I route the video of all users in a room into this sidecar while maintaining performance and network isolation?

Are there examples or best practices for handling this kind of “observation bot” consumer setup in Mediasoup?
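For the PlainTransport option, this is roughly the per-producer plumbing I have in mind on the Mediasoup side (a simplified sketch; the sidecar IP/port names are placeholders and error handling is omitted):

```js
// Per producer: create a PlainTransport on the router, point it at the
// sidecar's RTP/RTCP ports (placeholder values), and consume the producer on it.
const plainTransport = await router.createPlainTransport({
  listenIp: { ip: '0.0.0.0', announcedIp: MEDIASOUP_PUBLIC_IP },
  rtcpMux: false,
  comedia: false,
});

await plainTransport.connect({
  ip: SIDECAR_IP,           // sidecar pod IP (placeholder)
  port: SIDECAR_RTP_PORT,   // placeholder
  rtcpPort: SIDECAR_RTCP_PORT,
});

const consumer = await plainTransport.consume({
  producerId: producer.id,
  rtpCapabilities: router.rtpCapabilities, // consume with the router's own capabilities
  paused: true,
});

// Resume and request a keyframe once the sidecar's decoder is listening.
await consumer.resume();
await consumer.requestKeyFrame();
```

The sidecar would then receive plain RTP on those ports and decode it, with one transport/consumer pair per producer handled by a given sidecar instance.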

Any architectural guidance, working examples, or common pitfalls to avoid would be extremely helpful.

Thanks in advance

If you need to do video processing you need a decoder and hence a proper RTP consuming endpoint (with jitter buffer, NACK/PLI/FIR capabilities, etc.). You cannot just process video RTP packets the way mediasoup receives or forwards them, since they may arrive out of order and some may be missing; also take into account that a single video frame is split across multiple RTP packets.
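One common way to get such an endpoint on the sidecar is to let FFmpeg (or GStreamer) handle RTP depacketization and decoding of the forwarded stream. A rough Node.js sketch follows, assuming the port, payload type and codec shown (in practice they must match the Consumer's rtpParameters), and keeping in mind that plain FFmpeg provides no NACK/PLI feedback, so you would still need to request keyframes from the mediasoup side:

```js
// Sidecar side: describe the incoming RTP stream in an SDP file and let FFmpeg
// depacketize, decode and dump frames for downstream image processing.
// Port, payload type and codec below are assumptions for the sketch.
const { spawn } = require('child_process');
const { writeFileSync } = require('fs');

const RTP_PORT = 5004; // must match the port passed to plainTransport.connect()

const sdp = [
  'v=0',
  'o=- 0 0 IN IP4 127.0.0.1',
  's=mediasoup-sidecar',
  'c=IN IP4 127.0.0.1',
  't=0 0',
  `m=video ${RTP_PORT} RTP/AVP 101`,
  'a=rtpmap:101 VP8/90000',
].join('\n');

writeFileSync('/tmp/stream.sdp', sdp);

// Extract one JPEG frame per second into /tmp/frames for the detection pipeline.
const ffmpeg = spawn('ffmpeg', [
  '-protocol_whitelist', 'file,udp,rtp',
  '-i', '/tmp/stream.sdp',
  '-vf', 'fps=1',
  '-f', 'image2',
  '/tmp/frames/%06d.jpg',
]);

ffmpeg.stderr.on('data', (chunk) => process.stderr.write(chunk));
```

A GStreamer pipeline with rtpbin gives you a proper jitter buffer and RTCP handling if the naive FFmpeg approach is not enough.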

This is silly but it works…

The most efficient approach for the system is having the client capture, process and submit the image. The problems with this, though, are that the client's CPU usage goes up and that the submitted image may not match what the user is actually displaying on the stream.
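As a rough illustration of that client-side approach (the /api/frames endpoint and the sampling interval are made up):

```js
// Browser side: grab a frame from the local video element, downscale it and
// POST it to a hypothetical processing endpoint.
async function captureAndSubmit(videoElement) {
  const canvas = document.createElement('canvas');
  canvas.width = 320;
  canvas.height = 240;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(videoElement, 0, 0, canvas.width, canvas.height);

  const blob = await new Promise((resolve) =>
    canvas.toBlob(resolve, 'image/jpeg', 0.7)
  );

  await fetch('/api/frames', { method: 'POST', body: blob });
}

// e.g. sample one frame every 2 seconds from the local preview video
setInterval(() => captureAndSubmit(document.querySelector('#localVideo')), 2000);
```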