Using Selenium for screen-recording

I've been exploring ways to record the room. Understandably, consuming all producers individually and then post-processing them into a single file (using ffmpeg/GStreamer) can be a big pain, both in implementation effort and in CPU cost.

That led me to consider client-side recording after all, but run on the server side: how about we automate a peer that enters the meeting (using Selenium or similar) and record the screen of its UI/window, together with the audio from the speakers? Pseudo-server recording?

Has anybody thought along these lines? Any constructive criticism or input would be highly appreciated.


You can use a server-side Chromium with its audio/video output routed to virtual audio/video devices, and record the resulting audio/video stream using ffmpeg or similar.

Let me see if I get this right:
the headless Chromium redirects all consumers (audio & video) to virtual A/V devices, and then I in turn record those?

Just trying to understand - how will this solve not wanting to post-process individual A/V streams?

I was wondering if we could get our headless Chromium to output the (screen-share) UI to something like Xvfb (display server that runs in virtual memory) & then record that output through ffmpeg?

Chromium sends its single audio output and single video output (that is, the mixed audio and the composed video) to (virtual) devices.
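
For the audio half, here is a minimal sketch of what "virtual device" could mean on Linux, assuming PulseAudio is the sound server. It only builds the commands and the environment; nothing is executed. The sink name `call_recorder` and the helper names are made up for illustration. The idea is that a null sink acts as the virtual speaker, Chromium is pointed at it through the `PULSE_SINK` environment variable, and the already-mixed audio can then be read from the sink's monitor source.

```python
from __future__ import annotations

import os

SINK_NAME = "call_recorder"  # hypothetical name, pick anything unique


def null_sink_command(sink_name: str = SINK_NAME) -> list[str]:
    """pactl invocation that creates a virtual (null) output device."""
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"]


def audio_env(sink_name: str = SINK_NAME,
              base: dict[str, str] | None = None) -> dict[str, str]:
    """Environment for the browser process so its playback goes to the sink."""
    env = dict(os.environ if base is None else base)
    env["PULSE_SINK"] = sink_name  # honoured by PulseAudio client applications
    return env


def monitor_source(sink_name: str = SINK_NAME) -> str:
    """Everything played into a sink is readable from '<sink>.monitor'."""
    return f"{sink_name}.monitor"
```

You would run the `pactl` command once, start Chromium with `audio_env()`, and hand `monitor_source()` to the recorder as its audio input.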

“Consuming all producers” is something you’ll need to do anyway to get the A/V streams. You can do that from headless Chromium (via Puppeteer for sure, and probably Selenium), and use the screen recording API to get your A/V streams and UI dressing into a recording.

In this model the headless Chromium is essentially a participant in the call, and Chromium is your post-processor (muxing the A/V streams and UI dressing), like any other browser in the call.

(And yes, Xvfb is your friend here.)
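
To make the Xvfb part concrete, here is a rough sketch that builds the two command lines involved: one for the virtual display and one for a regular (headful) Chromium drawn onto it. It is an illustration under assumptions, not a tested recipe: the display number `:99`, the 1280x720 size, the binary name `chromium` and the helper names are all placeholders, and the meeting URL is whatever your app uses.

```python
from __future__ import annotations

DISPLAY = ":99"            # hypothetical display number
WIDTH, HEIGHT = 1280, 720  # hypothetical recording size


def xvfb_command(display: str = DISPLAY, width: int = WIDTH,
                 height: int = HEIGHT) -> list[str]:
    """Start a virtual X server with one 24-bit screen of the given size."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x24"]


def chromium_command(url: str, width: int = WIDTH, height: int = HEIGHT,
                     binary: str = "chromium") -> list[str]:
    """Headful Chromium filling the virtual screen; binary name is an assumption."""
    return [
        binary,
        "--kiosk",                                     # full screen, no browser UI
        f"--window-size={width},{height}",
        "--autoplay-policy=no-user-gesture-required",  # let remote media play unattended
        url,
    ]


def display_env(display: str = DISPLAY) -> dict[str, str]:
    """Variables to merge into the browser's environment so it draws on Xvfb."""
    return {"DISPLAY": display}
```

Because the browser is headful from its own point of view, anything that refuses to work in headless mode is not affected; the screen simply lives in memory instead of on a monitor.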


You could look at this project: GitHub - aau-zid/BigBlueButton-liveStreaming: Streams a given BBB Meeting to an RTMP Server.

You can even consume them in real time and generate a single composite stream, without resorting to headless Chrome and Selenium. PM me for a demo.


Any leads on how that can be achieved from Chromium? Getting a composite of all the video + audio streams?

Sorry, that’s too much. I cannot provide such detailed information.

Thank you so much for the reply!

I’ve been experimenting with Puppeteer, but I’m running into issues with the screen-capture API: it needs explicit permission (a dialog box pops up), and as of now that can only be handled in headful mode.

Do you have any insights into using the screen-capture API in a specific way, or any other ideas, to get the composite of the A/V streams?

Really appreciate the help. Thanks again!

No worries - thank you so much for taking the time to help!

GitHub - muralikg/puppetcam: Export puppeteer tab as webm video might have some hints.

Explored this: Chrome extensions don’t work in headless mode, so it can’t be run headless. Also, people seem to complain about bad frame rates. I guess I’ll have to resort to recording individual streams and then post-processing them.

Let me know if anybody has any other insights regarding this.

Unless you consider taking the Chromium mixed/composed audio and video output streams and routing them to a running ffmpeg (or similar), as suggested in previous comments.
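
As a rough illustration of the ffmpeg side, the sketch below builds a single command that grabs the X display Chromium is drawn on (`x11grab`) and the PulseAudio source carrying its mixed audio (`pulse`), and muxes both into one file. The display `:99`, the source name `call_recorder.monitor`, the encoder choices and the function name are assumptions for the example; substitute whatever your setup uses.

```python
from __future__ import annotations


def ffmpeg_record_command(output: str,
                          display: str = ":99",
                          size: str = "1280x720",
                          fps: int = 30,
                          audio_source: str = "call_recorder.monitor") -> list[str]:
    """Build an ffmpeg argv that records one X display plus one Pulse source."""
    return [
        "ffmpeg", "-y",
        # Video input: the (virtual) X display; input options go before -i.
        "-f", "x11grab", "-draw_mouse", "0",
        "-video_size", size, "-framerate", str(fps), "-i", display,
        # Audio input: a PulseAudio source, e.g. the monitor of a null sink.
        "-f", "pulse", "-i", audio_source,
        # Encoding settings are just one reasonable choice, not a requirement.
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        output,
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_record_command("meeting.mp4")))
```

When stopping the recording, send ffmpeg an interrupt (SIGINT) or `q` on stdin rather than killing it, so it can finalize the output file.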

How to get those composed A/V streams is the question I’m having trouble answering.

What do you have in mind?

I don’t have anything in mind. I did exactly that in the past. However I cannot provide the full solution here. There are some tips above in this thread. In addition, Jitsi does something similar. You can look for it in Jitsi GitHub repositories.

That’s the point of Xvfb in this recipe.