is there an "easy" way to "consume" media on the server?

Hey guys,

I am very new to the mediasoup ecosystem. I have spent a week working with ChatGPT for 4-6 hours every day, and I still do not have a working app, so I must be doing something wrong.

My question is: “Given my requirements, my zero experience, and the fact that ChatGPT can’t really help with mediasoup tasks, do you think I should pursue mediasoup for my app?”

I am building joblander.app, a Chrome extension that gives you real-time AI insights on Zoom. My current architecture is just a MediaRecorder on the client that sends audio chunks to the server over a WebSocket. Everything is working well, except that I only capture the tab media, not the user's mic/cam. For my current needs that is enough. However, for future features I need to send two streams (tab and user) and, on the server, merge audio, cut video, cut audio, etc. Hence mediasoup.

However, after so many attempts, I now understand that there is no easy way to get the audio/video data on the server. I mean, to get an actual Buffer of data that I can .pipe to ffmpeg or whatever transcription server I need, I have to extract RTP packets. Is my understanding correct?

Hey there,

About integrating Mediasoup for your app, it sounds like it could be a good fit for your future feature needs, especially if you’re looking to handle multiple streams and perform server-side processing down the line.

As for accessing audio/video data on the server, you’re correct. While Mediasoup doesn’t provide direct access to raw audio/video buffers, you can set up plain transports and use tools like ffmpeg or gstreamer to consume media server-side.
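To make that a bit more concrete, here is a minimal sketch of forwarding one producer to a local UDP port through a plain transport. It assumes you already have a mediasoup v3 `router` and the id of an existing producer; the function name and the port arguments are made up for illustration, and error handling is left out.

```javascript
// Sketch only: forward one producer's RTP to a local UDP port.
// `router` is an existing mediasoup Router; the function name and the
// rtpPort/rtcpPort values are hypothetical and chosen by the caller.
async function forwardProducerToLocalPort(router, producerId, rtpPort, rtcpPort) {
  const transport = await router.createPlainTransport({
    listenIp: '127.0.0.1',
    rtcpMux: false, // RTP and RTCP use separate ports
    comedia: false  // the remote address is given explicitly via connect()
  });

  // Tell mediasoup where the external receiver (e.g. ffmpeg) is listening.
  await transport.connect({ ip: '127.0.0.1', port: rtpPort, rtcpPort });

  // Start paused so the receiver can be launched before packets flow.
  const consumer = await transport.consume({
    producerId,
    rtpCapabilities: router.rtpCapabilities,
    paused: true
  });

  return { transport, consumer };
}
```

Once the receiving process is up, calling `consumer.resume()` should start the flow of packets to the chosen port.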

If you need a solid starting point for server-side media handling, I’d recommend checking out this demo: Mediasoup3 Record Demo. It’s a great resource for getting started with server-side processing.

And if you find yourself needing further assistance or exploring alternative solutions, consider giving MediaSFU a look. They offer some cool features like capturing images and audio buffers in real-time and returning them to the client-side if that interests you, which could be handy for your project.

Hope this helps, and best of luck with your project!

Dear Bdayz,

Thank you very much for your answer, it helped a lot!

If I were to go through all the complications of extracting RTP packets, as GPT suggests:

const transport = await router.createPlainTransport({ // createPlainRtpTransport in older mediasoup versions
  listenIp: '127.0.0.1', // Local IP
  rtcpMux: false,        // Use separate ports for RTP and RTCP
  comedia: false         // Do not rely on incoming RTP/RTCP to determine remote IP/port
});

Would it make sense to deploy mediasoup as a microservice? Otherwise it feels a bit weird to have it in a Node app but forward the stream to a local port. Or is there another way?

I’m not sure I fully understand the question, but I’ll do my best to clarify.

Deploying mediasoup as a microservice could indeed be a sensible approach, especially if you’re dealing with complexities like extracting RTP packets. By deploying mediasoup as a standalone microservice, you can isolate the media handling logic and ensure scalability and maintainability.

The fundamental concept of mediasoup revolves around sending and receiving media streams to and from the server. In your case, where you require access to RTP packets on the server side for tasks like gesture detection, segmentation, ML processes, or recording, the plain transport comes into play. This transport, described in the mediasoup documentation (mediasoup :: API), sends and receives RTP over plain UDP, so another process on the server can receive the packets.

By consuming a producer over a plain transport, you can have its RTP packets delivered to your own process as they flow through the server, which lets you analyze media content, apply machine learning algorithms, or record streams. Keep in mind that the packets carry encoded media (for example Opus or VP8), so a decoder such as ffmpeg or GStreamer is still needed to get raw audio or video.
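If you want to look at the packets yourself rather than hand them to ffmpeg, a rough sketch in plain Node.js could look like the one below. It parses the RTP header layout from RFC 3550 and listens on a local UDP port; `parseRtpPacket` and `listenForRtp` are names I made up, and the port is whatever you passed to the plain transport. There is no reordering or jitter handling here.

```javascript
const dgram = require('node:dgram');

// Parse one RTP packet (RFC 3550): fixed 12-byte header, optional CSRC
// list, optional header extension, optional trailing padding.
function parseRtpPacket(buf) {
  if (buf.length < 12) throw new Error('packet too short for RTP');
  const version = buf[0] >> 6;
  const hasPadding = (buf[0] & 0x20) !== 0;
  const hasExtension = (buf[0] & 0x10) !== 0;
  const csrcCount = buf[0] & 0x0f;
  const marker = (buf[1] & 0x80) !== 0;
  const payloadType = buf[1] & 0x7f;
  const sequenceNumber = buf.readUInt16BE(2);
  const timestamp = buf.readUInt32BE(4);
  const ssrc = buf.readUInt32BE(8);

  let offset = 12 + 4 * csrcCount;
  if (hasExtension) {
    // Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
    const extensionWords = buf.readUInt16BE(offset + 2);
    offset += 4 + 4 * extensionWords;
  }
  let end = buf.length;
  if (hasPadding) end -= buf[buf.length - 1]; // last byte holds the padding length

  return {
    version, marker, payloadType, sequenceNumber, timestamp, ssrc,
    payload: buf.subarray(offset, end) // still encoded media, not raw samples
  };
}

// Hypothetical helper: bind a UDP socket and hand every parsed packet to a callback.
function listenForRtp(port, onPacket) {
  const socket = dgram.createSocket('udp4');
  socket.on('message', (msg) => onPacket(parseRtpPacket(msg)));
  socket.bind(port, '127.0.0.1');
  return socket;
}
```

For example, `listenForRtp(5004, (p) => console.log(p.sequenceNumber, p.payload.length))` would log each packet arriving on port 5004 (the port number is just an example).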


To put it loosely, your Node.js process running mediasoup has the primary task of relaying your media streams. At the same time, you “punch” a hole through a designated port to access the RTP packets for additional processing or analysis.
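If the thing listening on that designated port is ffmpeg, it usually needs a small SDP file describing the stream. Here is a rough sketch of building one; `buildSdp` is a hypothetical helper, and I am assuming `codec` looks like an entry of `consumer.rtpParameters.codecs` (with `mimeType`, `payloadType`, `clockRate` and, for audio, `channels`).

```javascript
// Sketch: build a minimal SDP for a single RTP stream on localhost.
// `kind` is 'audio' or 'video', `port` is the UDP port the receiver listens on.
function buildSdp(kind, port, codec) {
  const codecName = codec.mimeType.split('/')[1]; // 'audio/opus' -> 'opus'
  const channels = codec.channels ? `/${codec.channels}` : '';
  return [
    'v=0',
    'o=- 0 0 IN IP4 127.0.0.1',
    's=mediasoup-forward',
    'c=IN IP4 127.0.0.1',
    't=0 0',
    `m=${kind} ${port} RTP/AVP ${codec.payloadType}`,
    `a=rtpmap:${codec.payloadType} ${codecName}/${codec.clockRate}${channels}`,
    ''
  ].join('\n');
}
```

You would write the result to a file and point ffmpeg at it, typically with something along the lines of `ffmpeg -protocol_whitelist file,udp,rtp -i stream.sdp ...` (the file name is just an example).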


I hope this explanation sheds some light on the topic.

Dear Bdayz, thank you very much