Hello, I’m trying to add this feature to my project by consuming an external API where I can send audio (or a microphone stream in real time) and it returns the transcribed text. My intention is to capture the audio from just one side of the video call and show the text on the other side (so deaf people can use the service). I checked the documentation, but I’m not sure where I should add this code. Thanks, and sorry if my question is in the wrong place or category.
mediasoup is a library. You don’t just add a piece of code somewhere; you need to build an app with it that does what you need. In this case you’ll have to consume the audio from a particular speaker on the server and send it somehow to the service that will do the speech recognition.
Thanks very much, I needed some orientation on how plausible my idea was.
This is nothing unusual. I think some recognition services even accept RTP directly, so you can catch the RTP packets in Node.js and forward them to the service.
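To make "catching RTP packets in Node.js" a bit more concrete, here is a minimal sketch of parsing the fixed RTP header (RFC 3550) out of a received datagram. The names `RtpPacket` and `parseRtpPacket` are made up for this example and are not part of mediasoup. It also assumes you have already arranged for the audio to arrive as plain RTP on a UDP port, for example through a mediasoup plain transport.

```typescript
// Hypothetical helper, not part of mediasoup: parses one RTP packet (RFC 3550).
interface RtpPacket {
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
  payload: Uint8Array; // encoded audio (e.g. Opus), not raw PCM
}

function parseRtpPacket(data: Uint8Array): RtpPacket | null {
  // The fixed RTP header is 12 bytes long.
  if (data.length < 12) return null;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);

  const first = view.getUint8(0);
  if (first >> 6 !== 2) return null; // only RTP version 2 is supported
  const hasPadding = (first & 0x20) !== 0;
  const hasExtension = (first & 0x10) !== 0;
  const csrcCount = first & 0x0f;

  // Skip the optional CSRC list (4 bytes per entry).
  let offset = 12 + csrcCount * 4;
  if (data.length < offset) return null;

  // Skip the optional header extension: 2-byte profile id, 2-byte length in 32-bit words.
  if (hasExtension) {
    if (data.length < offset + 4) return null;
    offset += 4 + view.getUint16(offset + 2) * 4;
    if (data.length < offset) return null;
  }

  // If padding is present, the last byte says how many trailing bytes to drop.
  let end = data.length;
  if (hasPadding) {
    end -= view.getUint8(data.length - 1);
  }
  if (end < offset) return null;

  return {
    payloadType: view.getUint8(1) & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: data.subarray(offset, end),
  };
}
```

In Node.js, the `message` event of a `dgram` UDP socket delivers a `Buffer`, which is a `Uint8Array`, so it can be passed to this function as-is. Keep in mind that the payload is still encoded audio. Unless the recognition service accepts that codec, it has to be decoded before being sent on.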
It’s definitely plausible. I’ve implemented something similar using ffmpeg and Google’s speech-to-text API.
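The post above doesn’t say how ffmpeg was wired in, so what follows is only one possible setup, sketched under assumptions: ffmpeg reads the RTP audio through an SDP file and writes 16 kHz mono 16-bit PCM to stdout, a format that speech-to-text services such as Google’s commonly accept. `buildOpusSdp`, `buildFfmpegArgs`, the port 5004 and the payload type 100 are placeholders chosen for this example. The real payload type and port have to match what your mediasoup consumer and transport actually use.

```typescript
// Hypothetical helpers for illustration; none of these names come from mediasoup or ffmpeg.

// Builds a minimal SDP description of an incoming Opus RTP stream so that
// ffmpeg knows how to interpret the packets arriving on ip:port.
function buildOpusSdp(ip: string, port: number, payloadType: number): string {
  return [
    "v=0",
    `o=- 0 0 IN IP4 ${ip}`,
    "s=audio for transcription",
    `c=IN IP4 ${ip}`,
    "t=0 0",
    `m=audio ${port} RTP/AVP ${payloadType}`,
    `a=rtpmap:${payloadType} opus/48000/2`,
    "",
  ].join("\n");
}

// Builds the ffmpeg argument list: read the SDP file, decode the audio,
// downmix to mono, resample to 16 kHz, and emit raw signed 16-bit
// little-endian PCM on stdout.
function buildFfmpegArgs(sdpPath: string): string[] {
  return [
    "-protocol_whitelist", "file,udp,rtp", // required to open an SDP file that points at RTP
    "-i", sdpPath,
    "-vn",          // ignore any video
    "-ac", "1",     // mono
    "-ar", "16000", // 16 kHz sample rate
    "-f", "s16le",  // raw PCM, no container
    "pipe:1",       // write to stdout
  ];
}

// Assumed values: adjust to your own transport and codec settings.
const sdp = buildOpusSdp("127.0.0.1", 5004, 100);
const ffmpegArgs = buildFfmpegArgs("input.sdp");
```

The idea would be to write `sdp` to `input.sdp`, spawn `ffmpeg` with `ffmpegArgs` using Node’s `child_process.spawn`, and pipe the process’s stdout into the speech-to-text client’s streaming request.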
Thanks, both. I’m going that way now that I know it’s possible.