VAD support (no transfer during silence)

Hey,

I have searched everywhere but could not find whether it is possible to use voice activity detection with a volume threshold in mediasoup. I tried detecting volume with an AudioWorklet and triggering pause on the stream, but when the producer is paused the stream stops, so it is no longer possible to detect volume. And if I use two streams (one for sending, a second one for detection), the first couple of chunks after the resume() call are not transferred, so the beginning of the audio is cut off. I need about 100 users connected, and their data transfer must be paused when not needed in order to handle that many users over WebRTC.

Is VAD supported, or is there no way to pause the producer client-side during silence?

Thanks for any info.

I just found out there is a disableTrackOnPause: true option which keeps the track enabled while the producer is paused, so the AudioWorklet can keep detecting volume and post pause/resume events to the main JS thread via this.port.postMessage. That way there should be only minimal delay from the message (although I still think it is going to miss the first chunk because of the delay, ~5 µs, caused by the message).
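
For reference, roughly what I mean; a minimal sketch, assuming the producer already exists (the processor name and the 0.01 threshold are placeholders I picked, not anything from mediasoup):

```js
// vad-processor.js — runs in the AudioWorklet thread
class VadProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.speaking = false;
  }

  process(inputs) {
    const channel = inputs[0][0];
    if (!channel) return true;

    // RMS volume of this 128-sample render quantum
    let sum = 0;
    for (let i = 0; i < channel.length; i++) sum += channel[i] * channel[i];
    const rms = Math.sqrt(sum / channel.length);

    const speaking = rms > 0.01; // placeholder threshold
    if (speaking !== this.speaking) {
      this.speaking = speaking;
      this.port.postMessage(speaking ? 'resume' : 'pause');
    }
    return true; // keep the processor alive
  }
}

registerProcessor('vad-processor', VadProcessor);
```

```js
// main thread — `stream` is the mic MediaStream, `producer` the mediasoup-client Producer
await audioContext.audioWorklet.addModule('vad-processor.js');
const vadNode = new AudioWorkletNode(audioContext, 'vad-processor');
audioContext.createMediaStreamSource(stream).connect(vadNode);
vadNode.connect(audioContext.destination); // keeps the node pulled; it outputs silence

vadNode.port.onmessage = ({ data }) =>
  data === 'resume' ? producer.resume() : producer.pause();
```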

Is this approach correct or am I missing something?

You could clone the track instead and let the AudioWorklet work on the cloned one, without resorting to disableTrackOnPause: true.
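
Something like this, as a sketch (vadNode being the AudioWorkletNode from the earlier snippet):

```js
// Analyse a clone instead of the produced track: pausing the producer
// disables the original track but leaves the clone running.
const clone = track.clone();
const source = audioContext.createMediaStreamSource(new MediaStream([clone]));
source.connect(vadNode);
```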

@snnz's approach is the one I use. Clone the track, connect it to an AnalyserNode and use that to measure volume. All of this happens on the client side; you don't need to involve the server at all. You also don't need mediasoup for it; it's all Web Audio API based.
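
A minimal sketch of what I mean, assuming micTrack is the microphone track and producer the mediasoup-client Producer (the 0.01 threshold and 60 Hz interval are just values I use, not requirements):

```js
const clone = micTrack.clone();
const source = audioContext.createMediaStreamSource(new MediaStream([clone]));
const analyser = audioContext.createAnalyser();
source.connect(analyser);

const samples = new Float32Array(analyser.fftSize);

setInterval(() => {
  analyser.getFloatTimeDomainData(samples);

  // RMS volume of the most recent window
  let sum = 0;
  for (const s of samples) sum += s * s;
  const rms = Math.sqrt(sum / samples.length);

  if (rms > 0.01 && producer.paused) producer.resume();
  else if (rms <= 0.01 && !producer.paused) producer.pause();
}, 1000 / 60); // ~60 analyses per second
```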

Are you sure I don't have to use mediasoup? If I don't call pause(), data is being transferred all the time; with 10 users it is about 2 Mbit/s of bandwidth. Also, even if I connect an AnalyserNode, I still need an AudioWorklet to process every chunk, and the only way to communicate with workers is postMessage, which causes delay, so pause/resume won't happen fast enough for the first chunk of speech and it gets lost. When you say a short word quickly, the other consumers won't hear it.

And setInterval is not fast enough, so I would have to buffer old chunks and then somehow prepend them to the producer stream when speaking starts.

Oh, I see what you're trying to do. Yeah, it's a challenge; you do need to use pause() client-side. You maybe don't need an AudioWorklet for performance reasons (I do this at a rate of 60 analyses per second and it works fine in the main JavaScript thread, even on mobile devices), but that's not really your limiting issue.

You could look into buffering the audio on the client side, but that would introduce delay and make your app non-realtime. That would be purely Web Audio API based. There's also an option with mediasoup where you can process individual RTP packets server-side, but then you wouldn't ever be able to pause.
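
If you ever want to explore that route, it goes through a DirectTransport on the server; a rough sketch, assuming router and audioProducer already exist:

```js
// server side — receive a producer's raw RTP in the Node.js process
const directTransport = await router.createDirectTransport();

const consumer = await directTransport.consume({
  producerId: audioProducer.id,
  rtpCapabilities: router.rtpCapabilities
});

consumer.on('rtp', (rtpPacket) => {
  // rtpPacket is a Buffer containing a full RTP packet
  handlePacket(rtpPacket); // hypothetical handler
});
```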

Are you sure you want to structure it like this? With 100 people connected simultaneously, it's basically guaranteed that people will be talking over each other inadvertently. Plus, people generally want the option to mute themselves. Have you considered doing push-to-talk?

Another thing to consider is enabling DTX in the Opus codec. It reduces bandwidth substantially during periods of silence. You might be able to scale your bitrate down too if you're only transmitting speech: you get excellent results at only 40 kbps and can go as low as 8 kbps while retaining reasonable quality.

https://developer.mozilla.org/en-US/docs/Web/Media/Formats/WebRTC_codecs
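
In mediasoup-client, DTX can be requested per producer via codecOptions; a sketch, assuming sendTransport and micTrack exist (the bitrate is just the 40 kbps example from above):

```js
const producer = await sendTransport.produce({
  track: micTrack,
  codecOptions: {
    opusDtx: true,                // transmit (almost) nothing during silence
    opusMaxAverageBitrate: 40000  // ~40 kbps
  }
});
```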

I am building VoIP for in-game integration: distance attenuation, 3D spatial audio, and effects (reverb, muffling, underwater). So everyone is connected to the same room, but they are muted for each other unless within the required range and loud enough. So I think this is the best approach; maybe I am wrong, but I see it as the most efficient.

Thanks for the info about DTX.

It is just the postMessage delay that sometimes causes the first non-silent chunk to be skipped, maybe 1 in 30 resume() calls. Since it is quite rare, perhaps I am just going to leave it that way and recommend a push-to-talk feature to users.

Consider trying to do the audio processing in the main thread; I don't think it's terribly CPU-intensive, but you won't know until you experiment. I took a look at AudioWorklet on MDN earlier and it's not (yet?) supported in Safari, which has substantial market share and might cause some issues for you.