Need Help Adding Google Speech To Text API On Server Side

Hey there, I want to transcribe real-time audio from producers on the server side. I created a direct consumer corresponding to the producer, so I now receive RTP packets in real time as Node.js Buffers. I first tried feeding these buffers to the Google Speech-to-Text API by converting them to Base64 strings, and got an empty response. Then I fed them directly through the Recognizer.write method, but this time got the error "Audio Timeout: Long duration elapsed without audio." If anyone has experience feeding these packets to the Google Speech-to-Text API, your help will be appreciated!

The Speech-to-Text API isn’t going to know what to do with Base64’d RTP packets. You need to use something like ffmpeg or gstreamer to convert the stream into a suitable format before sending it to the STT API.
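To illustrate why raw RTP packets are not audio the API can decode: each packet starts with an RTP header (RFC 3550) that has to be stripped before the codec payload is reachable. The following is a minimal sketch, not the approach used in this thread; the names parseRtp and RtpPacket are made up for illustration, and a direct consumer's "rtp" event buffer is assumed to hold one complete RTP packet. The payload it returns is still an encoded (for example Opus) frame, so it still needs a container or decoder before going to the STT API.

```typescript
// Fields of a parsed RTP packet (RFC 3550 fixed header plus payload).
// Hypothetical type, named here only for this sketch.
interface RtpPacket {
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
  payload: Uint8Array;
}

// Split one RTP packet into header fields and codec payload.
// A Node.js Buffer is a Uint8Array, so it can be passed in directly.
function parseRtp(packet: Uint8Array): RtpPacket {
  if (packet.length < 12) throw new Error("RTP packet shorter than fixed header");
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);

  const version = packet[0] >> 6;
  if (version !== 2) throw new Error("unsupported RTP version " + version);
  const hasPadding = (packet[0] & 0x20) !== 0;
  const hasExtension = (packet[0] & 0x10) !== 0;
  const csrcCount = packet[0] & 0x0f;

  // Fixed header is 12 bytes, followed by 4 bytes per CSRC entry.
  let offset = 12 + 4 * csrcCount;
  if (hasExtension) {
    if (packet.length < offset + 4) throw new Error("truncated RTP header extension");
    // Extension: 16-bit profile, 16-bit length counted in 32-bit words.
    const extensionWords = view.getUint16(offset + 2);
    offset += 4 + 4 * extensionWords;
  }

  // With the padding bit set, the last byte holds the padding length.
  let end = packet.length;
  if (hasPadding) end -= packet[packet.length - 1];
  if (offset > end) throw new Error("malformed RTP packet");

  return {
    payloadType: packet[1] & 0x7f,
    sequenceNumber: view.getUint16(2), // network byte order (big-endian)
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: packet.subarray(offset, end),
  };
}
```

In a mediasoup setup this would be called from the direct consumer's RTP handler; the sequence number and timestamp are what a later stage needs to detect loss and reordering.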

I’ve gotten this to work in the past with ffmpeg and OGG_OPUS as a muxer format. I needed to use the C API version of ffmpeg to force it to generate a new OGG page after every packet. It’s not efficient space-wise but it’s important to keep the latency of the transcriptions down.
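For readers wondering what "a new OGG page after every packet" looks like on the wire, here is a rough sketch that builds a single Ogg page around one packet, following the page layout in RFC 3533. It is a hand-rolled illustration, not the ffmpeg C API approach described above, and the names oggCrc and oggPage are made up. It also assumes the Opus mapping from RFC 7845, where the granule position counts 48 kHz samples, and it leaves out the OpusHead and OpusTags header pages that a complete Ogg Opus stream must start with.

```typescript
// Lookup table for the Ogg checksum: CRC-32 with polynomial 0x04c11db7,
// processed MSB-first, initial value 0, no final XOR.
const CRC_TABLE = new Uint32Array(256);
for (let i = 0; i < 256; i++) {
  let r = i << 24;
  for (let bit = 0; bit < 8; bit++) {
    r = (r & 0x80000000) !== 0 ? (r << 1) ^ 0x04c11db7 : r << 1;
  }
  CRC_TABLE[i] = r >>> 0;
}

function oggCrc(data: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc = ((crc << 8) ^ CRC_TABLE[((crc >>> 24) ^ data[i]) & 0xff]) >>> 0;
  }
  return crc;
}

// Wrap exactly one packet in its own Ogg page (hypothetical helper).
// headerType: 0x02 marks the first page of a stream, 0x04 the last one.
function oggPage(
  packet: Uint8Array,
  granulePosition: number, // for Opus: total 48 kHz samples up to this page
  serial: number,
  pageSequence: number,
  headerType: number = 0,
): Uint8Array {
  // Lacing: floor(n / 255) segments of 255 bytes, then one of n % 255.
  const fullSegments = Math.floor(packet.length / 255);
  const segmentCount = fullSegments + 1;
  if (segmentCount > 255) throw new Error("packet too large for a single page");

  const page = new Uint8Array(27 + segmentCount + packet.length);
  const view = new DataView(page.buffer);
  page.set([0x4f, 0x67, 0x67, 0x53], 0); // capture pattern "OggS"
  page[4] = 0; // stream structure version
  page[5] = headerType;
  // 64-bit little-endian granule position, written as two 32-bit halves.
  view.setUint32(6, granulePosition >>> 0, true);
  view.setUint32(10, Math.floor(granulePosition / 0x100000000), true);
  view.setUint32(14, serial, true);
  view.setUint32(18, pageSequence, true);
  // Bytes 22..25 (checksum) stay zero while the CRC is computed.
  page[26] = segmentCount;
  page.fill(255, 27, 27 + fullSegments);
  page[27 + fullSegments] = packet.length % 255;
  page.set(packet, 27 + segmentCount);
  view.setUint32(22, oggCrc(page), true);
  return page;
}
```

Each page costs at least 28 bytes of overhead on top of a small Opus frame, which is the space inefficiency mentioned above; the benefit is that every packet can be flushed to the recognizer as soon as it arrives.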

I don’t even think someone who uses a steno keyboard can maintain real-time speeds. There’d be significant delay.

Can you recommend some docs or a tutorial for doing this specific thing with ffmpeg, or do you have public code that might help?

It doesn’t really exist. I would start with one of the demos from the mediasoup website to get a general idea of how integrating with ffmpeg or gstreamer works: mediasoup :: Examples

You’re going to need to read a lot of the ffmpeg/gstreamer and Google STT API docs.

Alright. I saw one of your old GitHub posts in which you used the rtp-ogg-opus library successfully. That library is not really documented in general, so if you could give me some insight into how you used it, it would be much appreciated. Very thankful for your help so far.

I had totally forgotten about this library. This was 3 years ago. If I remember correctly, this kind of worked, but it’s a bit of a hack. RTP packets are not guaranteed to be delivered in order, or even delivered at all. Everything will work fine until a packet is missing or comes out of order.
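To make the ordering problem concrete, here is a minimal sketch of a reorder buffer keyed on the 16-bit RTP sequence number. It is not part of rtp-ogg-opus or mediasoup; ReorderBuffer, seqDelta and the maxHeld limit are hypothetical names chosen for this example. It holds back packets that arrive early, releases them once the gap is filled, and gives up on a missing packet once too many later ones are waiting. Real jitter buffers also use timing, which this sketch ignores.

```typescript
// Signed distance from sequence number a to b, modulo 2^16, so that
// 65535 -> 0 counts as +1 and 0 -> 65535 counts as -1.
function seqDelta(a: number, b: number): number {
  return (((b - a) & 0xffff) << 16) >> 16;
}

class ReorderBuffer<T> {
  private pending = new Map<number, T>();
  private nextSeq = -1; // -1 until the first packet arrives
  private maxHeld: number;

  // maxHeld: how many out-of-order packets may wait before a gap
  // is treated as a lost packet (assumed default, tune as needed).
  constructor(maxHeld: number = 8) {
    this.maxHeld = maxHeld;
  }

  // Add one packet; returns the packets that can now be emitted in order.
  push(seq: number, item: T): T[] {
    if (this.nextSeq < 0) this.nextSeq = seq;
    // Packets older than the next expected one arrived too late: drop them.
    if (seqDelta(this.nextSeq, seq) < 0) return [];
    this.pending.set(seq, item);

    const ready: T[] = [];
    for (;;) {
      // Release the consecutive run starting at the expected sequence number.
      let held = this.pending.get(this.nextSeq);
      while (held !== undefined) {
        ready.push(held);
        this.pending.delete(this.nextSeq);
        this.nextSeq = (this.nextSeq + 1) & 0xffff;
        held = this.pending.get(this.nextSeq);
      }
      if (this.pending.size <= this.maxHeld) return ready;

      // Too many packets are waiting: assume the gap is lost and
      // skip ahead to the oldest packet still held.
      let oldest = this.nextSeq;
      let best = Infinity;
      this.pending.forEach((_value, key) => {
        const distance = seqDelta(this.nextSeq, key);
        if (distance < best) {
          best = distance;
          oldest = key;
        }
      });
      this.nextSeq = oldest;
    }
  }
}
```

A consumer of this buffer would push each packet's sequence number and payload as it arrives and forward whatever comes back, in order, to the muxing stage. When a packet is skipped as lost, the muxer still has to account for the missing audio duration, which this sketch does not do.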

You might be able to get a demo working using this module, but I wouldn’t ship anything to production with it.

Oh, OK. That library is quite poorly documented, and the RTP-to-Ogg-Opus function is not documented at all. If you have any previous demo lying around, can you send it to me?