Best way to package RTP into ogg-opus

I am using MediaSoup to build a voice chat application for language exchange learning. I would like to record the content of the calls, as well as send the audio to Google’s Speech-to-Text API. The Speech-to-Text API accepts audio in the OGG-OPUS format, so I need to repackage the Opus audio that arrives over RTP into an Ogg container. I’m considering two approaches:

Approach #1: Use a DirectConsumer in my node process.

I’ve been using a library to decode the RTP packets and send the frames into libopustools. This works, but it doesn’t handle any lost or out-of-order RTP packets. Following the discussion from this GitHub issue, it feels like my naive implementation isn’t enough.

Question: How much is a DirectConsumer insulated from the unreliability of RTP? I know MediaSoup will request dropped packets from a producer. Does this mean the DirectConsumer doesn’t need to worry about this?

I think for the purposes of recording, a jitter-buffer isn’t necessary. Is this a reasonable assumption?

Approach #2: Connect each call through a PlainRtpTransport to ffmpeg (or maybe PyAV) and pipe back OGG-OPUS.

This seems to be a pretty standard configuration for MediaSoup, but it’s a little more complicated.
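For reference, here is a minimal sketch of the SDP description that the ffmpeg side of this setup needs, so that it knows which port and payload type to expect. The function name `buildOpusSdp`, the session name, the IP, the port, and the payload type are all placeholder assumptions; the real port and payload type have to match whatever the transport and consumer were created with.

```typescript
// Hypothetical helper (not part of mediasoup): builds a minimal SDP
// that describes a single Opus-over-RTP audio stream.
interface OpusSdpOptions {
  ip: string;          // address the RTP receiver listens on
  port: number;        // UDP port the RTP receiver listens on
  payloadType: number; // must match the payload type the consumer sends
}

function buildOpusSdp(opts: OpusSdpOptions): string {
  const lines = [
    "v=0",
    `o=- 0 0 IN IP4 ${opts.ip}`,
    "s=mediasoup-audio",
    `c=IN IP4 ${opts.ip}`,
    "t=0 0",
    `m=audio ${opts.port} RTP/AVP ${opts.payloadType}`,
    // For Opus the rtpmap is always "opus/48000/2", whatever the real
    // channel count of the stream is.
    `a=rtpmap:${opts.payloadType} opus/48000/2`,
  ];
  return lines.join("\r\n") + "\r\n";
}

// Placeholder values, for illustration only.
const sdp = buildOpusSdp({ ip: "127.0.0.1", port: 5004, payloadType: 100 });
```

Saved to a file, this could then be handed to ffmpeg with something like `ffmpeg -protocol_whitelist file,udp,rtp -i call.sdp -c:a copy call.ogg`. The file names are made up, and the exact options may need adjusting for a given setup.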

I have a proof-of-concept working for both approaches. If anyone has any feedback on either, it would be very helpful.

Thank you!

DirectConsumer just receives the RTP that the associated Producer sends to mediasoup, nothing else. DirectConsumer is not an “RTP endpoint”.

This (mediasoup requesting lost packets from the Producer) is only true for video.

Nope. The DirectConsumer will just emit “rtp” events for each RTP packet that it receives from the Producer, in the same order in which they arrive (which may be out of order), with the same packet loss, etc.
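To illustrate what the receiving code then has to do itself, here is a minimal sketch of an RTP header parser plus a small reorder buffer keyed on the 16-bit sequence number. The names `parseRtp`, `seqDelta` and `ReorderBuffer` are made up for this example and are not part of mediasoup. It is also not a real jitter buffer: it ignores timing, and it assumes the first packet it sees is the start of the stream.

```typescript
interface RtpPacket {
  seq: number;         // 16-bit RTP sequence number
  payload: Uint8Array; // codec payload (one Opus packet)
}

// Parse the fixed RTP header (RFC 3550) and return sequence number + payload.
function parseRtp(buf: Uint8Array): RtpPacket {
  const hasPadding = (buf[0] & 0x20) !== 0;
  const hasExtension = (buf[0] & 0x10) !== 0;
  const csrcCount = buf[0] & 0x0f;
  const seq = (buf[2] << 8) | buf[3];
  let offset = 12 + 4 * csrcCount;
  if (hasExtension) {
    // Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
    const extWords = (buf[offset + 2] << 8) | buf[offset + 3];
    offset += 4 + 4 * extWords;
  }
  // With padding, the last byte holds the number of padding bytes.
  const end = hasPadding ? buf.length - buf[buf.length - 1] : buf.length;
  return { seq, payload: buf.subarray(offset, end) };
}

// Signed distance from a to b in 16-bit sequence space (handles wraparound).
function seqDelta(a: number, b: number): number {
  return ((b - a + 0x8000) & 0xffff) - 0x8000;
}

class ReorderBuffer {
  private pending = new Map<number, RtpPacket>();
  private nextSeq: number | null = null;
  private maxPending: number;
  private onPacket: (packet: RtpPacket) => void;
  private onLoss: (seq: number) => void;

  constructor(
    maxPending: number,
    onPacket: (packet: RtpPacket) => void,
    onLoss: (seq: number) => void,
  ) {
    this.maxPending = maxPending;
    this.onPacket = onPacket;
    this.onLoss = onLoss;
  }

  push(packet: RtpPacket): void {
    if (this.nextSeq === null) this.nextSeq = packet.seq;
    // Ignore duplicates and packets older than what was already released.
    if (seqDelta(this.nextSeq, packet.seq) < 0) return;
    if (this.pending.has(packet.seq)) return;
    this.pending.set(packet.seq, packet);
    this.drain();
  }

  private drain(): void {
    while (this.pending.size > 0) {
      const next = this.pending.get(this.nextSeq!);
      if (next) {
        this.pending.delete(this.nextSeq!);
        this.onPacket(next);
      } else if (this.pending.size > this.maxPending) {
        // Waited long enough: declare the missing packet lost and move on.
        this.onLoss(this.nextSeq!);
      } else {
        return; // keep waiting for the missing packet
      }
      this.nextSeq = (this.nextSeq! + 1) & 0xffff;
    }
  }
}
```

Each buffer passed to the “rtp” event handler would go through `parseRtp` and then `push`. What to do in `onLoss` depends on the use case: for a recording, one option is to let the Opus decoder conceal the missing frame rather than silently shortening the audio.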

However, I assume that ffmpeg is a real RTP endpoint and will properly hold received audio packets in a jitter buffer, etc. I don’t know anything about ffmpeg, though. This may also be true of GStreamer.

Thank you! This is just what I needed.