Pipe the dominant speaker from one router to another

Hi Folks,

Firstly, just wanted to say thank you for an awesome library and examples; I’m only five days into my mediasoup journey but having a great time learning.

Apologies in advance for not being as skilful as some of the legends here; I’ll try my best to explain what I was trying to achieve and my approaches, and I’d appreciate a kick in the right direction.

If we create three routers, one is the master/apex/leader router and the other two are classed as child routers or sub-routers; when the dominant speaker changes in a child/sub-router, that speaker’s producer should be piped to the apex router too.

The apex router essentially becomes a room of rooms, with one producer per child router containing just the video of whoever is the dominant speaker in that child room/router.

A couple of days ago I knew nothing about mediasoup, so I started out with the idea of using pipeToRouter.

So I called createActiveSpeakerObserver to create an observer, and every time it detects a dominant speaker change it pipes the right producer to the apex router. I got the error “Channel request handler with ID bdb527b0-0a12-428e-aa03-7934d818fa73 already exists” when trying to do the pipe; I haven’t figured that out yet. But!
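One thing worth ruling out: if the observer fires again while a previous pipe is still being set up, two concurrent pipeToRouter() calls for the same router pair can race while the underlying pipe transports are created. A minimal sketch of queueing them per router pair, assuming that race is the cause; enqueuePipe is a made-up helper, not part of the mediasoup API:

```javascript
// Serialise pipeToRouter() calls per (source, destination) router pair so
// two dominant-speaker changes in quick succession never overlap.
const pipeQueues = new Map(); // "srcId:dstId" -> tail of the promise chain

function enqueuePipe(srcRouter, dstRouter, producerId) {
  const key = `${srcRouter.id}:${dstRouter.id}`;
  const tail = pipeQueues.get(key) ?? Promise.resolve();
  const next = tail
    .catch(() => {}) // a failed pipe must not block later ones
    .then(() => srcRouter.pipeToRouter({ producerId, router: dstRouter }));
  pipeQueues.set(key, next);
  return next;
}
```

Wired up, this would look something like `activeSpeakerObserver.on('dominantspeaker', ({ producer }) => enqueuePipe(childRouter, apexRouter, producer.id));` (event and payload names per the mediasoup docs at time of writing; do check your version).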

I’ve always been worried that 1) my approach fundamentally won’t work because I don’t know enough about mediasoup yet, and 2) eventually that apex room would end up with a producer for everyone in every child room, which could be really heavy even if I marshal the pausing and resuming.

My brain was saying I need one producer for the video and to somehow write/copy the data from the dominant speaker’s producer in the child router into that new producer to go to the apex router. I’m not sure how to achieve this or if it’s possible.

Last night I stumbled on broadcasters in the demo repo and thought that I could possibly use FFmpeg or GStreamer to create that custom producer and produce the dominant speaker to the apex router.

My question is, would FFmpeg or GStreamer be the right direction to travel, or am I missing a blindingly obvious solution in the mediasoup API?

Unfortunately you cannot pipe router1 into router2 if both Routers have been created in the same Worker. This is a known limitation and we may try to relax it in the future.
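For anyone landing here later, a minimal sketch of working around this: give every router its own Worker, so no pair you want to pipe shares one. The helper name is made up, and the Worker factory is passed in (e.g. `mediasoup.createWorker`) so the shape is an assumption over the standard API rather than code from this thread:

```javascript
// Create one apex router plus N child routers, each on its own Worker,
// so pipeToRouter() between any pair is allowed.
async function createApexAndChildren(createWorker, mediaCodecs, childCount = 2) {
  const routers = [];
  for (let i = 0; i < childCount + 1; i++) {
    // A dedicated Worker per Router: no two routers share a Worker.
    const worker = await createWorker();
    routers.push(await worker.createRouter({ mediaCodecs }));
  }
  const [apex, ...children] = routers;
  return { apex, children };
}
```

One Worker (one process/CPU core) per router is heavy-handed; pooling child routers across a few workers also works, as long as a pair you intend to pipe never lands on the same one.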

Ah ok, that’s really good to know; well, at least I got close. Thank you for the feedback @ibc

Would the right approach, for now, be trying something like GStreamer or FFmpeg to connect as a consumer to the child router and produce to the apex?

Honestly, you should manually pipe the routers every time; it simplifies your process when you scale vertically/horizontally. Or at least understand how we manually pipe server <-> server.

IMO dominant speaker detection can be done entirely client-side; it’s a waste to have the server determine it, especially if you’re routing across many servers to perform this task. I would make the client sort this out and use their CPU. :slight_smile:

Hi @BronzedBroth, thanks for the feedback. When you say manually pipe the routers, do you mean don’t use pipeToRouter and instead manually set up the transports between routers myself, essentially doing what pipeToRouter does under the hood?
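For reference, here’s my rough understanding of what that manual piping would look like, based on the documented PipeTransport API; on a single host both ends can listen on 127.0.0.1, while across servers the ip/port exchange would happen over your own signalling. The helper name and exact arguments are my guesses, not code from this thread:

```javascript
// A hedged sketch of roughly what pipeToRouter() does: cross-connect a
// PipeTransport on each router, consume on the source, re-produce on the
// destination.
async function manuallyPipeProducer(srcRouter, dstRouter, producer) {
  // One PipeTransport on each side.
  const srcTransport = await srcRouter.createPipeTransport({ listenIp: '127.0.0.1' });
  const dstTransport = await dstRouter.createPipeTransport({ listenIp: '127.0.0.1' });

  // Cross-connect them (across machines this is where signalling happens).
  await srcTransport.connect({
    ip: dstTransport.tuple.localIp,
    port: dstTransport.tuple.localPort
  });
  await dstTransport.connect({
    ip: srcTransport.tuple.localIp,
    port: srcTransport.tuple.localPort
  });

  // Consume the producer on the source side...
  const pipeConsumer = await srcTransport.consume({ producerId: producer.id });

  // ...and re-produce it on the destination side with matching parameters.
  const pipeProducer = await dstTransport.produce({
    id: producer.id,
    kind: pipeConsumer.kind,
    rtpParameters: pipeConsumer.rtpParameters,
    paused: pipeConsumer.producerPaused
  });

  return { pipeConsumer, pipeProducer };
}
```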

Where would one learn how you pipe server to server? From the forums, I gathered there was a little mystery around it. I haven’t hit that challenge yet, so I can’t say I’m very knowledgeable or have researched it.

Principally I agree with you on the dominant speaker thing; the problem I have is that anyone who’s in that apex router isn’t going to be able to tell who the dominant speaker is without some virtual room construct to group the producers. I think the other challenge was that I felt like sending all producers of a child room to the apex rather than just the dominant speaker would be heavy and require a lot of signalling.

Notionally I would have some proxy producer, the logic to figure out which data ends up in that proxy producer would be calculated in the child router, and then the apex receives one producer from the child.


I’ve tried to diagram what I’m saying in case that is more helpful than my failing words.

I want to thank the folks who’ve replied; I know it must be really frustrating when people like me ask for help without fully understanding the stack or the terminology. So thank you for investing your time in answering.

Bree

Don’t know how much help this can be, but you can have a look at:
https://github.com/edumeet/edumeet-room-server

This supports having an arbitrary number of media servers that you can pipe between:
https://github.com/edumeet/edumeet-media-node/

Again, don’t know if that is of help, but at least an example of piping between different servers.

Hola @havfo

Thank you for the reply; everything is useful while you’re still learning, right? And I’m 100% still in learning mode with mediasoup.

I did super briefly see these, but I will spend more time looking at them; I think because I’m prototyping right now on one server, maybe that’s where the issue is, and scaling out would ironically be easier. I was trying to avoid going super complex until I fully understood MediaSoup.

Ultimately, piping between servers was a mystery, and now it’s not, so thank you again.

Bree

Somewhat off-topic, but I thought it would be nice to share the “why” behind what I’m doing.

Quick screenshot from our “Holodeck” on the platform I’m building.

These giant TVs are rooms/routers as we know them, so when you enter the shaded space, the client connects to the room, which is pretty cool, and this is where I can use client-side speaker detection.

The problem is that outside these areas, when you’re not in a TV room, you’re in the apex space, and two people standing in a TV area aren’t remotely engaging when you’re on the outside looking in from the apex router, haha. So the idea of producing a stream from each router containing just the dominant speaker is so I can show what’s going on in each router, making those TV spaces much more interesting to visit.
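As an aside, the client-side detection inside a TV room can stay fairly small. A sketch of picking a dominant speaker from periodic per-peer audio levels (0..1, e.g. sampled from RTCRtpReceiver.getSynchronizationSources() in the browser), with a little hysteresis so the selection doesn’t flap; the margin value is an assumption to tune:

```javascript
// Returns an update(levels) function: feed it { peerId: audioLevel } each
// tick and it returns the current dominant speaker's peerId. A challenger
// must beat the incumbent by `margin` before the selection switches.
function createDominantSpeakerPicker({ margin = 0.1 } = {}) {
  let current = null;
  return function update(levels) {
    let bestId = null, bestLevel = -Infinity;
    for (const [peerId, level] of Object.entries(levels)) {
      if (level > bestLevel) { bestLevel = level; bestId = peerId; }
    }
    const currentLevel = current != null ? (levels[current] ?? 0) : -Infinity;
    // Only switch if the challenger is clearly louder than the incumbent.
    if (bestId !== current && bestLevel > currentLevel + margin) {
      current = bestId;
    }
    return current;
  };
}
```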

That’s probably not going to work out well, storing a single stream on your apex server to serve as the dominant speaker’s capture. You can imagine there’d be some delay when switching the tracks.

You could perhaps create a client to get its own take on the room, record it, and provide those TVs via HLS over a CDN.

@BronzedBroth you rock! Thank you for confirming; I think I mentioned in my initial post that I was wondering whether this was a bad solution. It was the switching I thought would be a bad experience, plus the number of producers.

I had literally stumbled over the broadcast code example and wondered if a client using GStreamer or FFmpeg, producing from the child router to the apex, was the right solution.

Massively appreciate the confirmation. No clue what I’m doing, but I’m excited to figure this out tomorrow; if memory from the last four days serves me, I think GStreamer is going to be the best choice for this.

I hadn’t even considered the HLS thing either, and now that I think about it, yeah, you’re 100% right! Tomorrow is going to be epic.

Thanks again, you’ve absolutely made my night!

The way I see it, you may have over 100 different endpoints for a user to potentially produce to and/or consume from. This adds availability and, with good routing logic, full-on horizontal/vertical scalability.

What you don’t want is the apex server being slapped stupid by 100 different media workers submitting/signalling stream switches, etc.

So ideally a bot in each room keeps your apex from doing any extra work, and you can store all the live feeds behind a load-balanced HTTPS URL and fetch them when the user is in proximity.

There’s many ways but that’s a fast thought.

@BronzedBroth Wanted to share my epic win this morning. I spent all night thinking about things differently after your suggestion; I think I was notionally heading in the right direction, but you really accelerated this and pushed it over the line. I’d never thought about using HLS. I did a quick test and within an hour proved the principle!

I think I need to look into GStreamer as a broadcast client, then maybe sink into Kinesis Video Streams, dump to somewhere, and serve via CloudFront. So many ways to skin a cat on this one; it’s really exciting and has massively changed how I was thinking about everything.

Thank you again, and thank you for all being friendly and welcoming; I’ll keep this thread updated with my progress.

Bree

I clearly did not show enough respect for the subject of creating an HLS endpoint with only the dominant speaker. Ouch!

I’ve learned tons from previous posts about jitter buffers, and I’ve just finished watching the RTP and RTCP protocol explainers, plus a little bit of RFC 3550.

TL;DR: there’s no easy or out-of-the-box solution for this.
Options include:

A custom client using Puppeteer

The current theory is creating a plain transport into something, maybe rtp.js, running a signal alongside from the router to say who the dominant speaker is, and forwarding the right RTP packets on to GStreamer to dump into AWS Kinesis via their plugin.
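To make that forwarding step concrete, here’s a rough sketch of what I mean: a selector that is told, via the side-channel signal from the router, which SSRC is currently dominant, and only lets matching RTP packets through to the GStreamer leg. It reads just the SSRC from the fixed RTP header (bytes 8-11, RFC 3550 §5.1); a real forwarder would also have to rewrite sequence numbers and timestamps so the output looks like one continuous stream, which is glossed over here:

```javascript
// Forward only packets whose SSRC matches the current dominant speaker.
// `forward` is whatever sends the packet onwards (e.g. a UDP socket send).
function createRtpSelector(forward) {
  let dominantSsrc = null;
  return {
    // Called from the dominant-speaker side channel.
    setDominantSsrc(ssrc) { dominantSsrc = ssrc; },
    // Called per incoming RTP packet (a Buffer); returns true if forwarded.
    onPacket(packet) {
      if (packet.length < 12) return false;    // shorter than a fixed RTP header
      const ssrc = packet.readUInt32BE(8);     // SSRC field, RFC 3550 §5.1
      if (ssrc !== dominantSsrc) return false; // drop non-dominant media
      forward(packet);
      return true;
    }
  };
}
```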

If you can simplify processing beyond Puppeteer, that’d be most suitable. I would say experiment more from here and see what your project needs. Also, if there isn’t an entire team of programmers/staff, keep it simple for yourself and scalable to the best of your abilities.

Improvements can come in time; just make it ready for 500-1000 concurrent users first and sort it out from there.