Hello, I'm trying to add this feature to my project by consuming an external API to which I can send audio (or a microphone stream in real time) and which returns the transcribed text. My intention is to capture the audio from just one side of the video call and show the text on the other side (so that deaf people can use the service). I checked the documentation but I'm not sure where I should add this code. Thanks, and sorry if my question is out of place or in the wrong category.
mediasoup is a library; you don’t just add a piece of code somewhere, you need to build an app with it that does what you need. In this case you’ll have to consume audio from a particular speaker and send it somehow to the service that will do the speech recognition.
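To make "consume audio from a particular speaker" a bit more concrete, here is a minimal sketch of one possible approach, assuming mediasoup v3 and a plain RTP transport that forwards the speaker's audio to a local UDP port. `forwardSpeakerAudio` and the `*Like` interfaces are hypothetical names; the interfaces only mirror the mediasoup calls used here (`router.createPlainTransport`, `transport.connect`, `transport.consume`, `consumer.resume`) so the snippet compiles without importing the library. The IP and port values are assumptions too.

```typescript
// Minimal structural types for the mediasoup v3 calls used below.
// In a real app you would pass the actual mediasoup Router instead.
interface ConsumerLike {
  id: string;
  resume(): Promise<void>;
}
interface PlainTransportLike {
  connect(params: { ip: string; port: number }): Promise<void>;
  consume(params: {
    producerId: string;
    rtpCapabilities: unknown;
    paused?: boolean;
  }): Promise<ConsumerLike>;
}
interface RouterLike {
  rtpCapabilities: unknown;
  createPlainTransport(options: {
    listenIp: string;
    rtcpMux: boolean;
    comedia: boolean;
  }): Promise<PlainTransportLike>;
}

// Forwards the RTP of one audio producer (the speaker to transcribe)
// to host:rtpPort, where some other process can pick it up.
async function forwardSpeakerAudio(
  router: RouterLike,
  audioProducerId: string,
  host: string,
  rtpPort: number,
): Promise<ConsumerLike> {
  // rtcpMux: true means RTP and RTCP share one port (assumption for brevity).
  const transport = await router.createPlainTransport({
    listenIp: '127.0.0.1',
    rtcpMux: true,
    comedia: false,
  });
  // Tell mediasoup where to send the packets.
  await transport.connect({ ip: host, port: rtpPort });
  // Start paused, then resume once everything is wired up.
  const consumer = await transport.consume({
    producerId: audioProducerId,
    rtpCapabilities: router.rtpCapabilities,
    paused: true,
  });
  await consumer.resume();
  return consumer;
}
```

Whatever listens on that port (your own Node.js code, ffmpeg, etc.) then receives the speaker's audio as RTP and can hand it to the recognition service.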
Thanks very much, I needed some orientation on how plausible what I was trying to do was.
This is nothing unusual. I think recognition services even support RTP in some cases, so you can catch RTP packets in Node.js and send them to the service.
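As a rough illustration of "catching RTP packets", here is a sketch of parsing the fixed RTP header (RFC 3550) to get at the audio payload. `RtpPacket` and `parseRtpPacket` are hypothetical names, and the payload is still encoded audio (typically Opus), so a service that does not accept RTP directly would need it decoded first.

```typescript
// Fields of the fixed RTP header (RFC 3550) plus the media payload.
interface RtpPacket {
  version: number;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
  payload: Uint8Array;
}

// Returns null when the data is too short or is not RTP version 2.
function parseRtpPacket(data: Uint8Array): RtpPacket | null {
  if (data.length < 12) return null; // the fixed header is 12 bytes
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const first = view.getUint8(0);
  const version = first >> 6;
  if (version !== 2) return null;
  const hasPadding = (first & 0x20) !== 0;
  const hasExtension = (first & 0x10) !== 0;
  const csrcCount = first & 0x0f;

  // Skip the CSRC list and the optional header extension.
  let offset = 12 + csrcCount * 4;
  if (hasExtension) {
    if (data.length < offset + 4) return null;
    const extensionWords = view.getUint16(offset + 2);
    offset += 4 + extensionWords * 4;
  }

  // With padding, the last byte holds the padding length (itself included).
  let end = data.length;
  if (hasPadding) {
    if (end <= offset) return null;
    end -= data[end - 1];
  }
  if (offset > end) return null;

  return {
    version,
    payloadType: view.getUint8(1) & 0x7f, // the top bit is the marker flag
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: data.subarray(offset, end),
  };
}
```

In Node.js the packets would typically arrive through the `message` event of a `dgram` UDP socket; a `Buffer` is a `Uint8Array`, so it can be passed to the function as-is.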
It’s definitely plausible. I’ve implemented something similar using ffmpeg and Google’s speech-to-text API.
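The exact setup is not shown here, but a rough sketch of what the ffmpeg side of such a pipeline could look like: describe the incoming RTP stream in an SDP file, then have ffmpeg decode it to 16 kHz mono 16-bit PCM on stdout (a format Google's API accepts as `LINEAR16`). `buildOpusSdp` and `buildFfmpegArgs` are hypothetical helpers, and the port, payload type and file path are assumptions; the payload type in particular has to match the consumer's `rtpParameters`.

```typescript
// Builds a minimal SDP describing one Opus RTP stream arriving on localhost.
function buildOpusSdp(rtpPort: number, payloadType: number): string {
  return [
    'v=0',
    'o=- 0 0 IN IP4 127.0.0.1',
    's=speaker-audio',
    'c=IN IP4 127.0.0.1',
    't=0 0',
    `m=audio ${rtpPort} RTP/AVP ${payloadType}`,
    `a=rtpmap:${payloadType} opus/48000/2`,
    '',
  ].join('\n');
}

// Arguments for ffmpeg: read the SDP, decode, and write raw PCM to stdout.
function buildFfmpegArgs(sdpPath: string): string[] {
  return [
    '-protocol_whitelist', 'file,udp,rtp', // needed to read RTP via an SDP file
    '-i', sdpPath,
    '-vn',          // ignore any video
    '-ac', '1',     // mono
    '-ar', '16000', // 16 kHz sample rate
    '-f', 's16le',  // raw signed 16-bit little-endian PCM
    'pipe:1',       // write to stdout
  ];
}
```

You would write the SDP to a file, start ffmpeg with these arguments (for example via `child_process.spawn`), and stream its stdout to the speech-to-text client.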
Thanks both, I'm going that way now that I know it’s possible.