difference in resource consumption between audio & video consumers

The docs state:

Depending on the host CPU capabilities, a mediasoup C++ subprocess can typically handle over ~500 consumers in total.

But intuitively, video consumers should cost more than audio consumers, right?
Can we relate the generic figure of 500 to separate limits for audio and video consumers?

That figure is a rough estimate for a mixed audio/video workload. Always run your own benchmarks for your specific use case and requirements.
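
Once you have benchmarked your own setup, one way to turn the measurements into a capacity figure is a simple weighted average. The sketch below is only an illustration: `ConsumerCost`, `estimateMaxConsumers`, and the cost numbers are hypothetical and not part of mediasoup. It assumes you have measured how much CPU time one audio consumer and one video consumer each use per second of wall time on your host (for example by sampling the worker's CPU usage under a known load), and that cost scales roughly linearly with the number of consumers, which you should verify yourself.

```typescript
// Hypothetical per-consumer CPU cost, in microseconds of CPU time
// per second of wall time, taken from YOUR OWN benchmark.
interface ConsumerCost {
  audioUs: number;
  videoUs: number;
}

// Estimate how many consumers fit into a CPU budget for a given mix.
// videoShare is the fraction of consumers that are video (0..1).
// budgetUs defaults to one full core (1,000,000 us per second).
function estimateMaxConsumers(
  cost: ConsumerCost,
  videoShare: number,
  budgetUs: number = 1_000_000
): number {
  if (videoShare < 0 || videoShare > 1) {
    throw new RangeError("videoShare must be between 0 and 1");
  }
  // Weighted average cost of one consumer in this mix.
  const avgCostUs =
    videoShare * cost.videoUs + (1 - videoShare) * cost.audioUs;
  if (avgCostUs <= 0) {
    throw new RangeError("measured costs must be positive");
  }
  return Math.floor(budgetUs / avgCostUs);
}

// Placeholder numbers only; replace them with your measurements.
const measured: ConsumerCost = { audioUs: 1000, videoUs: 4000 };
const halfVideo = estimateMaxConsumers(measured, 0.5);
console.log(`Estimated consumers at a 50/50 mix: ${halfVideo}`);
```

The point of the sketch is that the limit depends on the audio/video mix, so there is no single number that fits every deployment. Video cost in particular varies with bitrate and simulcast/SVC layers, so measure with the streams you actually expect.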