I’ve been exploring ways to record the room. Understandably, consuming all producers individually and then post-processing them into a single file (using ffmpeg/GStreamer) can be a big pain, both in implementation effort and in CPU load.
That led me to reconsider a client-side approach, but run on the server: how about we automate a peer that joins the meeting (using Selenium or similar) and record the screen of its UI/window, along with the audio from its speakers? A pseudo-server recording?
Has anybody thought along these lines? Any constructive criticism or input would be highly appreciated.
You can use a server-side Chromium with its audio/video output routed to virtual audio/video devices, and record the resulting audio/video stream using ffmpeg or similar.
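To make the audio half of that routing concrete, here is a minimal sketch, assuming a Linux host running PulseAudio. The sink name `room_recording` and the display `:99` are made-up values, and the helper functions are hypothetical; they only build the command line and environment, they do not start anything. `module-null-sink` creates a virtual output device, `PULSE_SINK` tells a PulseAudio client (here, the browser) which sink to play into, and the sink's `.monitor` source is what a recorder would read from.

```python
import os
import shlex
from typing import Dict, List


def null_sink_command(sink_name: str) -> List[str]:
    """argv asking PulseAudio to create a virtual (null) output sink."""
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"]


def browser_env(sink_name: str, display: str) -> Dict[str, str]:
    """Environment for the browser process: play audio into the virtual sink
    and draw on the given X display (both values are assumptions)."""
    env = dict(os.environ)
    env["PULSE_SINK"] = sink_name
    env["DISPLAY"] = display
    return env


def monitor_source(sink_name: str) -> str:
    """PulseAudio exposes everything played into a sink as '<sink>.monitor'."""
    return f"{sink_name}.monitor"


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(null_sink_command("room_recording")))
    print("record from:", monitor_source("room_recording"))
```

In a real setup you would pass the first list to `subprocess.run` once at startup and launch the browser with the returned environment.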
Let me see if I get this right:
The headless Chromium redirects all consumers (audio & video) to virtual A/V devices, and then in turn I record those?
Just trying to understand: how does this avoid having to post-process individual A/V streams?
I was wondering if we could get our headless Chromium to output the (screen-share) UI to something like Xvfb (a display server that renders into virtual memory) and then record that output through ffmpeg?
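For what it's worth, the Xvfb idea could look roughly like the sketch below. It only builds the two command lines; the display number, resolution, frame rate, PulseAudio source name and output file are all assumptions, and the encoder choices (`libx264`, `aac`) are just one reasonable option. ffmpeg's `x11grab` input grabs the virtual display, and the `pulse` input picks up the browser's audio.

```python
from typing import List


def xvfb_command(display: str, width: int, height: int) -> List[str]:
    """argv for a virtual X display with a 24-bit screen of the given size."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x24"]


def ffmpeg_command(display: str, width: int, height: int, fps: int,
                   pulse_source: str, output: str) -> List[str]:
    """argv that grabs the X display and a PulseAudio source into one file."""
    return [
        "ffmpeg", "-y",
        # Video input: the virtual display the browser window is drawn on.
        "-f", "x11grab", "-video_size", f"{width}x{height}",
        "-framerate", str(fps), "-i", display,
        # Audio input: a PulseAudio source (assumed to carry the browser's audio).
        "-f", "pulse", "-i", pulse_source,
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        output,
    ]


if __name__ == "__main__":
    print(" ".join(xvfb_command(":99", 1280, 720)))
    print(" ".join(ffmpeg_command(":99", 1280, 720, 30,
                                  "room_recording.monitor", "room.mp4")))
```

Because the browser has already composed the layout on screen, ffmpeg encodes a single video and a single audio input here, rather than one per producer.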
“Consuming all producers” is something you’ll need to do anyway to get the A/V streams. You can do that from headless Chromium (via Puppeteer for sure, and probably Selenium), and use the screen recording API to get your A/V streams and UI dressing into a recording.
In this model the headless Chromium is essentially a participant in the call, and Chromium is your post-processor (muxing the A/V streams and UI dressing), like any browser in the call is.
I’ve been experimenting with Puppeteer, but I’m facing issues with the screen-capture API: it needs explicit permission (a dialog box pops up), and even that can only be handled in headful mode as of now.
Do you have any insights into using the screen-capture API in a specific way, or any other ideas, to get the composite of the A/V streams?
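One possible way around the permission dialog is to run a headful Chromium on a virtual display and pass command-line switches that pre-answer the prompts. The sketch below only assembles such an argument list. The binary name `chromium`, the room URL and the capture source title are assumptions, and the switches are Chromium switches as I understand them (`--use-fake-ui-for-media-stream` auto-accepts camera/microphone prompts, `--auto-select-desktop-capture-source` pre-selects the screen-share source by title), so verify them against your Chromium build.

```python
from typing import List


def chromium_args(room_url: str, width: int, height: int,
                  capture_source: str) -> List[str]:
    """argv for a headful Chromium that joins the room without user prompts."""
    return [
        "chromium",  # assumption: may be chromium-browser or google-chrome on your system
        "--kiosk",  # full-screen window without browser chrome
        f"--window-size={width},{height}",
        "--autoplay-policy=no-user-gesture-required",  # let remote media play unattended
        "--use-fake-ui-for-media-stream",  # skip the getUserMedia permission dialog
        f"--auto-select-desktop-capture-source={capture_source}",  # skip the screen picker
        room_url,
    ]


if __name__ == "__main__":
    print(" ".join(chromium_args("https://example.com/room/test", 1280, 720,
                                 "Entire screen")))
```

If the window is grabbed from outside the browser (for example from the virtual display), the page may not need to call the screen-capture API at all, in which case the last switch is unnecessary.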
Explored this: Chrome extensions don’t work in headless mode, so that route can’t run headless. Also, people seem to complain about bad frame rates. I guess I’ll have to resort to recording individual streams and then post-processing them.
Let me know if anybody has any other insights regarding this.
Thanks!
Unless you consider taking the Chromium mixed/composed audio and video output streams and routing them to a running ffmpeg (or similar), as suggested in previous comments.
I don’t have anything else in mind. I did exactly that in the past. However, I cannot provide the full solution here. There are some tips above in this thread. In addition, Jitsi does something similar. You can look for it in the Jitsi GitHub repositories.