Active speaker detection in React Native

I have a React Native app that uses mediasoup-client backed by my mediasoup media server. In React Native, I want to know whether a producer is speaking or not, so I can show an animated border around the speaking producer. Is there any way to do this in React Native, or should it be done on the server side?

Do it on the client side, no need to do it on the server. Use the hark library (GitHub - otalk/hark: Converts an audio stream to speech events in the browser), which does exactly this: provide it with the audio stream and it will emit events when someone starts or stops speaking.
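
For reference, this is roughly how hark is wired up (a minimal sketch; the stream is whatever MediaStream you already have for that producer, and the option values are just examples):

import hark from 'hark';

function watchSpeaking(stream) {
  // threshold is in dB (negative values); interval is how often hark polls, in ms
  const speechEvents = hark(stream, { threshold: -65, interval: 100 });

  speechEvents.on('speaking', () => {
    console.log('speaking');          // e.g. show the animated border
  });
  speechEvents.on('stopped_speaking', () => {
    console.log('stopped speaking');  // hide the border
  });

  return speechEvents; // call speechEvents.stop() when the consumer closes
}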

@zaidiqbal thanks for your response. I previously tried hark in react-native but it didn’t work. I think the reason is that AudioContext does not exist in react-native and it is a web api. Let me know if I am missing something.

I see, I missed that point, so it will not work in React Native. I think the way to go is the WebRTC Stats API; there are some specific parameters for audio tracks, like audioLevel, that can work for you. You can get these stats from the underlying peer connection of the mediasoup transport:

https://groups.google.com/g/discuss-webrtc/c/_sio59LzQkc

https://w3c.github.io/webrtc-stats/#dom-rtcinboundrtpstreamstats-audiolevel

@zaidiqbal I even tried that one. Here is a function I used, which takes a stats report as parameter, to check whether voice is active or not:

const isVoiceActive = (stats) => {
  const packetsReceived = stats.packetsReceived || 0;
  const jitter = stats.jitter || 0;
  const packetsLost = stats.packetsLost || 0;
  const audioLevel = stats.audioLevel || 0;

  const packetLossThreshold = 5;   // Example: 5% packet loss
  const jitterThreshold = 10;      // Example threshold (note: getStats reports jitter in seconds)
  const audioLevelThreshold = 0.1; // Example: 0.1 (adjust based on your use case)

  // Determine voice activity based on the criteria (guard against dividing by zero)
  const isPacketLossAcceptable =
    packetsReceived === 0 || (packetsLost / packetsReceived) * 100 < packetLossThreshold;
  const isJitterAcceptable = jitter < jitterThreshold;
  const isAudioLevelActive = audioLevel > audioLevelThreshold;

  return isPacketLossAcceptable && isJitterAcceptable && isAudioLevelActive;
};

and here is how I use isVoiceActive:

setInterval(() => {
  consumer.getStats().then((stats) => {
    stats.forEach((report) => {
      if (report.type === 'inbound-rtp' && report.kind === 'audio') {
        console.log('Is Speaking: ', isVoiceActive(report));
      }
    });
  });
}, 1000);

but it always returns false.

I think there is no need to take packet loss and jitter into account; just ignore them and work only with audioLevel. If audioLevel changes when you speak and stop speaking, then it will work for you. First check whether audioLevel is actually changing.
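
Something like this, just a sketch based on your snippet above (the 0.1 threshold and the 5-sample averaging window are guesses you will need to tune):

const levels = [];
const WINDOW = 5;                  // how many recent readings to average
const AUDIO_LEVEL_THRESHOLD = 0.1; // tune this for your environment

setInterval(async () => {
  const stats = await consumer.getStats();
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      // keep a small rolling window of audioLevel readings to smooth out spikes
      levels.push(report.audioLevel || 0);
      if (levels.length > WINDOW) levels.shift();

      const avg = levels.reduce((sum, level) => sum + level, 0) / levels.length;
      console.log('Is Speaking: ', avg > AUDIO_LEVEL_THRESHOLD);
    }
  });
}, 1000);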

@zaidiqbal yes, the audio level is changing over time. As I logged it, audioLevel varies between 0 and 1.
But even when there is no voice, audioLevel is greater than 0. Which value of audioLevel indicates that the producer is speaking?

I think you will have to play with it to find the optimal value, as I haven’t used this approach myself. It should be close to 0 when you are not speaking or there is only background noise, and with a prominent voice it will be more towards 1, like 0.4, 0.5, 1. I was checking with Google Meet in chrome://webrtc-internals: while I speak the value rises, and it comes back to 0 when I stop speaking.
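
If a fixed threshold keeps misfiring because of background noise, you could also calibrate it from the noise floor instead of hard-coding it. Rough sketch only (the margin and the number of calibration samples are guesses, not tested values):

function createSpeechDetector({ margin = 0.08, calibrationSamples = 10 } = {}) {
  const samples = [];
  let noiseFloor = 0;

  return function isSpeaking(audioLevel) {
    if (samples.length < calibrationSamples) {
      // still calibrating: record the "quiet room" audioLevel
      samples.push(audioLevel);
      noiseFloor = Math.max(...samples);
      return false;
    }
    // speech = noticeably above the measured noise floor
    return audioLevel > noiseFloor + margin;
  };
}

// usage: create the detector once, then call it from the inbound-rtp
// audio branch of the getStats() loop above:
// const isSpeaking = createSpeechDetector();
// console.log('Is Speaking: ', isSpeaking(report.audioLevel || 0));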