Recorded audio (FFmpeg or GStreamer) removes gaps / silent phases

I am testing recording of video and audio by creating a plain transport that is then consumed by an FFmpeg process spawned by my app. When I play the recorded video, the audio seems to jump ahead and drifts out of sync with the video over time. The longer the video, the larger the displacement/desync.

To debug this further, I removed the video producer/consumer and recorded only the audio stream. I tested by speaking every 5 seconds over a 30-second period, and when I play the recorded audio, it plays without the 5-second gaps of silence in between. The audio player still shows the duration of the generated file as 30 seconds.

I need some pointers on how to debug this. My naive assumption is that each RTP packet in the audio stream is timestamped, and the receiving app (FFmpeg) should honor those timestamps and preserve the silent 5-second stretches in between. With Opus's 48 kHz RTP clock, if packets stop flowing during silence, a 5-second gap should show up as a jump of 5 × 48000 = 240000 in the RTP timestamps between consecutive packets.

I have a Wireshark capture of the RTP packets from the RTP port, but before digging deeper I wanted to check whether others on this forum have suggestions or have solved similar issues with certain flags passed to FFmpeg.

GStreamer gives me similar behavior, but I have not done an audio-only recording with GStreamer yet.

Here are the arguments being fed to FFmpeg.

SDP:

```
v=0
o=- 0 0 IN IP4 192.168.4.33
s=FFmpeg
c=IN IP4 192.168.4.33
t=0 0
m=audio 45198 RTP/AVP 100
a=rtpmap:100 opus/48000/2
a=sendonly
```

FFmpeg args:
```
'-loglevel', 'warning',
'-protocol_whitelist', 'pipe,udp,rtp',
'-f', 'sdp',
'-i', 'pipe:0',
'-fflags', '+genpts',
'-map', '0:a:0',
'-c:a', 'copy',
'-flags', '+global_header',
'./files/1623737013962.webm',
```
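For context, here is roughly how my app feeds those arguments to a spawned FFmpeg process, with the SDP written to stdin (a simplified sketch, not my exact code; variable names are illustrative):

```
import { spawn } from 'child_process';

// The SDP shown above, describing the plain transport's RTP stream.
const sdpString = [
  'v=0',
  'o=- 0 0 IN IP4 192.168.4.33',
  's=FFmpeg',
  'c=IN IP4 192.168.4.33',
  't=0 0',
  'm=audio 45198 RTP/AVP 100',
  'a=rtpmap:100 opus/48000/2',
  'a=sendonly',
].join('\n');

// '-i pipe:0' makes FFmpeg read the SDP from stdin, which is why
// 'pipe' must be in the protocol whitelist alongside 'udp' and 'rtp'.
const ffmpeg = spawn('ffmpeg', [
  '-loglevel', 'warning',
  '-protocol_whitelist', 'pipe,udp,rtp',
  '-f', 'sdp',
  '-i', 'pipe:0',
  '-fflags', '+genpts',
  '-map', '0:a:0',
  '-c:a', 'copy',
  '-flags', '+global_header',
  './files/1623737013962.webm',
]);

ffmpeg.stdin.write(sdpString);
ffmpeg.stdin.end();

ffmpeg.stderr.on('data', (chunk) => process.stderr.write(chunk));
ffmpeg.on('close', (code) => console.log(`ffmpeg exited with code ${code}`));
```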

You won’t believe it, but this is a question about ffmpeg/gstreamer rather than about mediasoup 🙂


I think this might be caused by zeroRtpOnPause, for instance. In that case nothing flows while the producer is paused, so the recorded result will usually jump over the silence in players. I did see the same behavior myself; this is kind of expected.

If there are gaps in the RTP stream, there will be gaps in the PTS timestamps that ffmpeg writes to disk. You can verify this with ffprobe:

```
ffprobe -show_packets ./files/1623737013962.webm
```

Under normal conditions, each packet’s PTS should advance by one Opus frame: 960 samples at 48 kHz, i.e. 20 ms. If there were dropped packets (or zeroRtpOnPause, as nazar-pc mentioned), there will be gaps larger than that.
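If you only want the timestamps, one per line, something like this should work (a sketch using ffprobe’s -show_entries and CSV writer; flags may differ slightly across versions):

```
ffprobe -v error -select_streams a:0 -show_entries packet=pts_time -of csv=p=0 ./files/1623737013962.webm
```

Consecutive pts_time values should increase by 0.02 seconds; any larger jump marks a gap.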

Players don’t always handle these gaps well.

To fix this, you can use the aresample=async=1 filter in ffmpeg. It resamples the audio and fills any gaps with silence. Unfortunately, this requires decoding and re-encoding the audio, which is more expensive than just muxing it. Our app does this anyway because we need to mix multiple audio streams together into a final mixdown.
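For example, swapping the stream copy for a re-encode through the filter could look like this (a sketch; I’m assuming the libopus encoder here, adjust to taste):

```
ffmpeg -protocol_whitelist pipe,udp,rtp -f sdp -i pipe:0 \
  -map 0:a:0 -af aresample=async=1 -c:a libopus \
  -flags +global_header ./files/1623737013962.webm
```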

There may be a way to insert silent Opus packets without re-encoding, but I’ve never tried it.


Jonathan,

Thank you for your response. It worked!

I do need to get more familiar with the ffmpeg toolset, but your detailed answer along with the ffprobe info really helped me understand this otherwise new area.

Really appreciate your help.

This is a great community and I hope I can one day contribute to this common wisdom pool.

Best,
Narinder


@ngaheer is your client code (in the web browser) doing some kind of automatic track-pause mechanism when silence is detected?

That, coupled with the use of zeroRtpOnPause as nazar-pc mentioned (plus disableTrackOnPause, which is enabled by default), would naturally lead to missing RTP packets while the track is paused. That causes gaps in the timestamps, so the recorder must be clever enough to fill them, or to count the gaps and shift all following PTS by an offset.
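For reference, those options live in mediasoup-client’s ProducerOptions and are set when producing the track; a minimal sketch (assuming a transport and track already exist):

```
// mediasoup-client: producing an audio track with the options discussed
// above ('transport' and 'track' are assumed to already exist).
const producer = await transport.produce({
  track,
  // When the producer is paused, send no RTP at all (instead of silence).
  zeroRtpOnPause: true,
  // Enabled by default: pausing the producer also disables the local track.
  disableTrackOnPause: true,
});
```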

However, if your application is not doing anything at all with audio detection or track pausing, then that is weird and I don’t think it should be happening. Unexpected missing PTS will surely break recordings. Re-encoding the audio should not be needed.


The gaps could be from dropped RTP packets on congested networks. This is quite common in our app.

The application is not setting any custom options when creating producers.

Now that you mention it, I went back to the old recordings (without the aresample filter) and played them in VLC. Guess what: they played correctly, with the gaps intact.

I am kicking myself for not doubting the browser (my default player for webm files) and not being more thorough.

I will go back to the old FFmpeg behavior of just copying the stream as-is and see whether the results are consistent. Either way, I learnt a thing or two here.

Which version of FFmpeg are you using?
Have you tried removing the -fflags +genpts option?