mediasoup integration with generic WebRTC client speaking Unified SDP

First of all, I’d like to mention that I have read the documentation, understand the rationale behind the current design and respect its decisions.

Mediasoup looks great in terms of the low-level access it provides, which is very useful and flexible.
However, while it is often mentioned that being agnostic to signaling is a big advantage, de-facto requiring a custom client is an equally huge disadvantage. It basically turns the trivial task of connecting anything that speaks WebRTC into a long, non-trivial and IMO unnecessarily complex journey.
Requiring a custom data structure that needs a custom implementation for parsing and generating SDP on every single platform basically defeats the purpose of being signaling agnostic (IMHO) and is not a requirement that facilitates adoption.

I’m coming from the GStreamer side, but unlike many others, from webrtcbin, its WebRTC implementation, similarly to @vpalmisano in “Using GStreamer webrtcbin as MediaSoup client”.
However, I’m using a different language (Rust) and essentially have to re-implement all of the work that others have already done a few times.
This is counter-productive for many reasons and from my quick search I don’t seem to be the only one so frustrated.

What do mediasoup developers think about fromSdp/toSdp utility functions on the mediasoup server?
The de-facto standard to use with WebRTC today is SDP with Unified Plan.
Just parse and generate Unified plan SDP, no need to change mediasoup internals.
Was this considered before as an option?
I suspect mediasoup-client has a lot of this done already, at least for SDP generation.

This would be a huge win in terms of interoperability with GStreamer, plain libwebrtc, Pion and many other implementations and bindings in various languages. It would also no longer force people to use mediasoup-client just because it is the only way to connect to a mediasoup server; they would pick it only when they need what else it offers.


mediasoup v1 was like you say: no client library, and consuming/producing SDPs on the server side. Never again.

SDP is not suitable to signal some settings such as SVC spatial/temporal layers. Well, not true, there is some RFC or draft for that somewhere in the IETF, but nobody implements it: if you tell Chrome/libwebrtc to generate SVC VP9 with 3 spatial layers and 3 temporal layers, you don’t have that info in the SDP generated by libwebrtc.
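
To make the SVC point concrete, here is a minimal TypeScript sketch. The `scalabilityMode` string (for example `L3T3`) follows the naming of the WebRTC SVC extension and travels in the encoding object instead of the SDP. The `Encoding` and `Layers` types and the `parseLayers` helper are names made up for this example, not the mediasoup API.

```typescript
// Illustrative shape of one RTP encoding as signaled outside the SDP.
// `scalabilityMode` uses the "LxTy" naming of the WebRTC SVC extension.
interface Encoding {
  ssrc?: number;
  scalabilityMode?: string;
}

interface Layers {
  spatialLayers: number;
  temporalLayers: number;
}

// Hypothetical helper: read the layer counts out of a mode such as "L3T3".
// Falls back to a single layer when the field is missing or not recognized.
function parseLayers(encoding: Encoding): Layers {
  const match = /^[LS]([1-9]\d?)T([1-9]\d?)/.exec(encoding.scalabilityMode ?? "");
  if (!match) {
    return { spatialLayers: 1, temporalLayers: 1 };
  }
  return { spatialLayers: Number(match[1]), temporalLayers: Number(match[2]) };
}

// VP9 with 3 spatial and 3 temporal layers is expressible in one field.
const vp9Encoding: Encoding = { ssrc: 11111111, scalabilityMode: "L3T3" };
const vp9Layers = parseLayers(vp9Encoding);
```

The whole SVC configuration fits in a single property of the encoding, which is exactly the piece of information that the SDP generated by libwebrtc does not carry.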

More: sending a complete SDP to the server means that the server must inspect the whole SDP for any new addition. It also means that, based on the SDP, the server should check whether a new transport is required or not (bundle, no bundle, closed transports, etc.). Never again. I wrote at length about this here:

In mediasoup you create transports (WebRtcTransport, PlainTransport or PipeTransport) and then you create Producers and Consumers on them. It does not even matter which kind of transport you created: you can create as many Producers and Consumers on it as you need. Producers and Consumers do not care about which kind of transport they are created on.

mediasoup is a low level library, it will not deal with infamous monsters such as 2000 lines long SDPs.

Not sure which “custom data structure” you mean. mediasoup consumes ORTC parameters. ORTC is not in use, but WebRTC 1.0 adopted those structures/objects (RTCRtpParameters, RTCRtpCapabilities, RTCCodecParameters, etc) from ORTC. All those structures are also defined in WebRTC 1.0 spec.
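
As a rough illustration of how close these objects are to what an SDP already says, the sketch below maps a single `a=rtpmap` line to an object with the usual codec fields (`mimeType`, `payloadType`, `clockRate`, `channels`). The `CodecParameters` interface and the `codecFromRtpmap` function are names invented for this example, and real codec parameters carry more than this (fmtp parameters, RTCP feedback).

```typescript
// Subset of the codec fields shared by the ORTC-style structures.
interface CodecParameters {
  mimeType: string;
  payloadType: number;
  clockRate: number;
  channels?: number;
}

// Hypothetical helper: turn one rtpmap attribute into codec parameters.
// Returns undefined when the line is not an rtpmap attribute.
function codecFromRtpmap(kind: "audio" | "video", line: string): CodecParameters | undefined {
  // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
  const match = /^a=rtpmap:(\d+) ([^/\s]+)\/(\d+)(?:\/(\d+))?$/.exec(line.trim());
  if (!match) {
    return undefined;
  }
  const codec: CodecParameters = {
    mimeType: kind + "/" + match[2],
    payloadType: Number(match[1]),
    clockRate: Number(match[3]),
  };
  if (match[4] !== undefined) {
    codec.channels = Number(match[4]);
  }
  return codec;
}

const opusCodec = codecFromRtpmap("audio", "a=rtpmap:111 opus/48000/2");
const vp8Codec = codecFromRtpmap("video", "a=rtpmap:96 VP8/90000");
```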

I agree that having more utilities for different clients/languages would be good, but that’s where the community could contribute. We just focus on providing very low-level, flexible and reliable server-side and client-side libraries.

Perhaps some of the internals of mediasoup-client (or C++ libmediasoupclient), those internals that deal with SDP generation and parsing, could be exported to other languages. Again, the community has a good chance to contribute by creating them.

That said, I understand and respect your point but we will not go back to SDP land. Never. SDP is not serious, and when WebRTC NV becomes real and, instead of using a PeerConnection, applications implement their own messaging exchange for signaling and their own data format for media exchange (custom data over QUIC instead of SRTP over UDP/TCP), nobody will see any SDP in those new applications.

I forgot to comment on this:

De-facto usage is Plan-B. Most multi-participant WebRTC applications use Plan-B and are Chrome only.

react-native-webrtc (which integrates libwebrtc) uses Plan-B instead of Unified-Plan. Now imagine that we should retrieve a SDP (PlanB) from react-native-webrtc, then convert it to Unified-Plan, then pass it to mediasoup, then parse everything. Definitely that won’t happen.

“Just parse and generate Unified plan SDP, no need to change mediasoup internals.”

mediasoup is not just for WebRTC endpoints. It supports plain RTP transports, and those clients (ffmpeg, gstreamer without webrtcbin, etc.) do not generate Unified-Plan SDPs.

Rant

Well, I’d say even limited interoperability with SDP (with a list of restrictions), now that even Chrome defaults to Unified Plan, would be huge compared to no support at all.

SVC has a bright future, but VP9 support is limited at best and not as much of a concern for most scenarios until AV1 SVC lands in major browsers.

So even though I understand the pain (read the article and really do), the reality is that people that know less about this than you have to go over those difficulties again and again.

I’m only talking about WebRTC endpoints here of course, not plain RTP transports.

Switching to a more constructive conversation given your initial response: what do you think about abstracting away what mediasoup-client does internally into a Rust or portable C library that can easily be included in basically any other language, including the browser (with WebAssembly)?

That would also allow consolidating the effort in one place, even beyond just mediasoup.
Rust in particular has a way to generate WASM with TypeScript typings all with one command and can expose C API too, so should be straightforward to use in other projects. Compiling C to WASM is not difficult with Emscripten, but introduces a lot of otherwise unnecessary boilerplate (I have successfully ported a bunch of libraries from both languages to WASM already).


This is super interesting but I have tons of questions. Super interested in this BTW. I don’t have time now so will come back to this next week, just a few questions (I don’t know much about WASM yet):

  • Do you mean rewriting the whole mediasoup-client JS into WASM? Or just some internal components?
  • Can WASM call methods in the browser JS runtime such as new RTCPeerConnection(), etc?

Ideally at least the base of it, such that it would be easier to hook in necessary implementation details for GStreamer and Pion for instance. But I’m just throwing an idea without a full plan.

Absolutely! You can basically write inline JavaScript kind of sort of like you write inline assembly in other languages.

We’re also interested in compiling from a portable language into WASM, and would be happy to contribute to a project that’s aiming to do that. Our pain point is maintaining client-side code that needs to run on several platforms (native Android, native iOS, React Native, Electron, browser JavaScript).

We have our own signaling, of course, plus lots of management of call state and logic. (For example, we switch on the fly between peer-to-peer and mediasoup-sfu modes during calls.)

We’re doing some early pieces of this “cross-platform” architecture work now, but definitely have a longer term roadmap in mind. We have opinions about architecture and are interested in hearing other people’s opinions, too! If we can contribute at the mediasoup level in a way that benefits the community and also fits well with what we’re doing to implement our APIs/libraries on top of mediasoup, we’d love to do that.

I’d love to have this but, as I said, I don’t know much about WASM yet. A few more questions:

Usually the wasm file is located alongside the JS file that loads it. There are even some webpack plugins for doing the bundling nicely (they are aware of wasm files and move them accordingly), but I have no personal experience with them.

Whenever you need to, you basically jump out of wasm into JS, call a few things, collect the result and jump back into wasm. I do not see anything special in those dependencies that would make them impossible to integrate.

I have looked at mediasoup-client and it would be interesting to integrate indeed, but I think it should be possible to leave browser-specific code in JS and transition common code into native code. It would not be trivial, but people write game engines with 3D sound and WebGL using compilation to wasm, I’m pretty sure we can do this somewhat simpler task 🙂

Wasm is a binary format, you don’t write it manually in most cases, but rather compile to it from other languages. I’d recommend writing things in Rust and then compiling to a WebAssembly/C-like library. wasm_bindgen and some other crates make development rather nice. A simple hello-world looks something like this (it includes calling a native JS function from within Rust code):

WARNING! Rust code
use wasm_bindgen::prelude::*;

// Fancy way to explain that there is a JS function with
// those arguments somewhere in JS
#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

// Exported so that the generated JS/TypeScript wrapper can call it
#[wasm_bindgen]
pub fn greet() {
    // We can now call the JS function just as if
    // it were a native Rust function (FFI basically)
    alert("Hello, wasm-game-of-life!");
}

Then in TypeScript (generated by wasm-pack):

import * as wasm from './wasm_game_of_life_bg';

export function greet() {
    return wasm.greet();
}

Should be quite familiar I think. It should be possible to write different externs for wasm and C API, so designing common ground would be the main challenge in my opinion.

Thanks, however:

If we leave those libs outside of WASM then they should be implemented in other languages, breaking the purpose of this WASM topic, right?

I do not fully understand the question. Let me re-phrase.
Code base of mediasoup-client can consist of 2 parts after hypothetical transition:

  • Rust-based native code with common functionality necessary for client library on any platform in any language
    • Which can be compiled to wasm for usability in browsers (with auto-generated wrapper that loads wasm and exposes nice bindings in TypeScript)
    • Same code can be compiled, for instance, as shared C library for integration with other languages
    • Same code will likely be usable as regular Rust library
  • Browser-specific code is still written in JavaScript/TypeScript and uses common code compiled to wasm

Ok, I understand. What I mean is exactly this:

Why should awaitqueue, sdp-transform and h264-profile-level-id be “browser-specific” code? In fact they are not. Even more, all of them are very much part of the “mediasoup-client core”. For instance, all files within src/handlers/sdp depend on the sdp-transform library.

Those are indeed not platform specific except probably awaitqueue that uses native promises.
For sdp-transform there are a few Rust crates (one of which I use) with a type-safe API for parsing and serializing SDP, so it should be replaceable. As for h264-profile-level-id, I do not see a direct replacement; it will probably need to be ported over with the rest of the code.
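
For a sense of what porting h264-profile-level-id would start from, here is a minimal TypeScript sketch of the first step only. Per RFC 6184, the `profile-level-id` fmtp parameter is three hex-encoded bytes: profile_idc, profile-iop (the constraint flags) and level_idc. The `ProfileLevelId` type and the `parseProfileLevelId` function are names chosen for this example; as far as I can tell the real library goes further (mapping to named profiles, special cases such as level 1b), which this sketch does not attempt.

```typescript
interface ProfileLevelId {
  profileIdc: number; // e.g. 0x42 = 66 (Baseline)
  profileIop: number; // constraint-set flags
  levelIdc: number; // e.g. 0x1f = 31 (level 3.1)
}

// Hypothetical helper: split the 6 hex digits into their three bytes.
// Returns undefined for anything that is not exactly 6 hex digits.
function parseProfileLevelId(value: string): ProfileLevelId | undefined {
  if (!/^[0-9a-fA-F]{6}$/.test(value)) {
    return undefined;
  }
  const bits = parseInt(value, 16);
  return {
    profileIdc: (bits >> 16) & 0xff,
    profileIop: (bits >> 8) & 0xff,
    levelIdc: bits & 0xff,
  };
}

// "42e01f" is a value commonly seen in browser-generated SDPs.
const parsedProfile = parseProfileLevelId("42e01f");
```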

It is doable and I will probably be able to help with many of those things in case mediasoup works out for the project I’m working on currently.

C++ is also compilable to wasm, but C/C++ scares me in general (unless there is no other way around) and infrastructure is not as nice there IMHO.

We strongly rely on the syntax generated by sdp-transform. Replacing it with yet another SDP lib is not a suitable option right now.

It’s a port of libwebrtc C++ code, so it should be easy to integrate, right?

Then what? moving everything to WASM but just awaitqueue? And how to deal with other languages?

Not sure about easy, but should be doable.

Other languages can plug in their own implementation if needed. I’m not very familiar with where and how it is used, so I can’t comment in much detail for now.

I think if we move everything except those 3 dependencies that would be a huge step forward already.

I am working on the very same integration. Actually I made a PR to react-native-webrtc to support Unified Plan. Using Plan-B is just insane in 2020 for new projects. Everyone has supported Unified Plan for more than 2 years already. Following Plan-B was a huge pain for us.

Hello Iñaki @ibc,

I’d like to share my experience as a “newbie”, which relates to what @nazar-pc wrote.

We’ve done an extensive review of server-side options for WebRTC recently, and among all the open source options (Jitsi, Janus, Red5, AntMedia, Medooze, Kurento, etc.), MediaSoup as a node package stood out as the best mix of cutting-edge, flexibility and performance.

Then, reading the docs, we realized that the MediaSoup architecture was dependent on a client lib that was not compatible with all existing WebRTC client code.
That was a complete show-stopper for the team, and a disappointment as big as the initial excitement.

Instead of giving up silently, I thought it would be useful to share the experience, and found this thread.

Like Nazar, I understand the rationale and respect its decisions.
And reading your replies, I understand there is experience and vision to support this choice.

1/ I fully agree that SDP is a mess.

2/ I understand your vision that RTCPeerConnection might be replaced by some other transport, and that abstracting it away makes mediasoup better “future-proofed”.

3/ Beyond the first layer of abstraction, most structures/objects (RTCRtpParameters, RTCRtpCapabilities, RTCCodecParameters, etc.) are W3C-WebRTC compliant, so abstraction is light and thoughtful.

Now the obvious drawback is the total incompatibility with current WebRTC client code and with any “building block” that speaks WebRTC. As Nazar highlighted, this defeats the purpose of being low-level and signaling agnostic, and creates a huge barrier to adoption.

How could this be resolved?

Porting mediasoup-client seems only part of the solution, as other WebRTC “building blocks” would not work.
In some way Google’s libwebrtc is the de-facto standard, or at least the dominant force in the marketplace.

Could a connector with Unified Plan and some SDP interoperability make sense?

Congrats on all your hard work and on helping the WebRTC world progress, Iñaki!


Thanks for this good read. I’ll answer properly in the next few days.


A few days ago I wrote an independent, incomplete MVP Unified Plan SDP serializer that works from the list of consumers (which can be active or inactive), so that I can have an SDP offer for GStreamer, plus a tiny parser that extracts DTLS parameters from the answer. So I have a bit more understanding of how things work now, after reading mediasoup-client.
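
To give an idea of the shape of such a serializer, here is a heavily simplified TypeScript sketch, not my actual code. The `ConsumerInfo` type is a hypothetical, flattened view of what the server signals for one consumer, and `mediaSectionLines` and `mediaSections` are names invented here. It only emits media sections: a real offer also needs the session-level lines (`v=`, `o=`, `s=`, `t=`, the BUNDLE group) plus ICE and DTLS attributes, RTCP feedback and header extensions.

```typescript
// Hypothetical, flattened view of one consumer as received via signaling.
interface ConsumerInfo {
  mid: string;
  kind: "audio" | "video";
  payloadType: number;
  codecName: string; // e.g. "VP8" or "opus"
  clockRate: number;
  channels?: number;
  ssrc: number;
  cname: string;
  active: boolean;
}

// Serialize one consumer into the lines of one Unified Plan media section.
function mediaSectionLines(consumer: ConsumerInfo): string[] {
  let rtpmap = consumer.codecName + "/" + consumer.clockRate;
  if (consumer.channels !== undefined) {
    rtpmap += "/" + consumer.channels;
  }
  return [
    "m=" + consumer.kind + " 7 UDP/TLS/RTP/SAVPF " + consumer.payloadType,
    "c=IN IP4 127.0.0.1",
    "a=mid:" + consumer.mid,
    // From the offerer's point of view the server sends and the client receives.
    consumer.active ? "a=sendonly" : "a=inactive",
    "a=rtpmap:" + consumer.payloadType + " " + rtpmap,
    "a=rtcp-mux",
    "a=ssrc:" + consumer.ssrc + " cname:" + consumer.cname,
  ];
}

// One media section per consumer, in a stable order.
function mediaSections(consumers: ConsumerInfo[]): string {
  const lines: string[] = [];
  for (const consumer of consumers) {
    for (const line of mediaSectionLines(consumer)) {
      lines.push(line);
    }
  }
  return lines.join("\r\n") + "\r\n";
}
```

The output depends only on the list passed in, so adding, pausing or removing a consumer just means calling the function again with the updated list.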

My observation is that the hypothetical library may not need to be asynchronous or have any complex logic or even any internal state.
It can instead be a set of basically pure functions that take inputs and return outputs, independent from environment just like mediasoup is independent of signaling layer.

So you would receive something using signaling, apply it to application-specific state and then call a library function to produce SDP from that state.
This is like creating an RTCPeerConnection and getting its SDP to infer RTP capabilities, but in some cases you may not need to because the capabilities are known upfront and you can skip the step entirely.

Such a library would be a dependency of mediasoup-client and possibly other platform-specific clients, containing necessary parsing and serialization methods, but avoiding complex state and calling platform-specific APIs on its own.
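
As an illustration of how small and stateless such functions can be, here is a minimal TypeScript sketch of the DTLS parser mentioned earlier. It takes the SDP answer as a string and returns an object with a `role` and a list of `fingerprints`; the `DtlsParameters` shape below is my assumption of what the server wants, and the mapping of `a=setup` values (`active` to `client`, `passive` to `server`, `actpass` to `auto`) is the one I understood from reading mediasoup-client.

```typescript
type DtlsRole = "auto" | "client" | "server";

interface DtlsFingerprint {
  algorithm: string;
  value: string;
}

interface DtlsParameters {
  role: DtlsRole;
  fingerprints: DtlsFingerprint[];
}

// Pure function: SDP text in, DTLS parameters out, no state and no I/O.
function extractDtlsParameters(sdp: string): DtlsParameters {
  const fingerprints: DtlsFingerprint[] = [];
  let role: DtlsRole | undefined;

  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();

    const fingerprint = /^a=fingerprint:(\S+) (\S+)$/.exec(line);
    if (fingerprint) {
      const algorithm = fingerprint[1];
      const value = fingerprint[2];
      // The same fingerprint is usually repeated in every media section.
      const known = fingerprints.some((f) => f.algorithm === algorithm && f.value === value);
      if (!known) {
        fingerprints.push({ algorithm, value });
      }
      continue;
    }

    // Only the first a=setup line is considered.
    const setup = /^a=setup:(\S+)$/.exec(line);
    if (setup && role === undefined) {
      if (setup[1] === "active") role = "client";
      else if (setup[1] === "passive") role = "server";
      else if (setup[1] === "actpass") role = "auto";
    }
  }

  if (fingerprints.length === 0) {
    throw new Error("SDP contains no a=fingerprint line");
  }
  return { role: role ?? "auto", fingerprints };
}
```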

Usage pseudo code could look something like this:

// Optional: infer RTP capabilities from a throwaway peer connection
rtpCapabilities = {
    peer = new RTCPeerConnection()
    offer = peer.createOffer()
    lib.extractRtpCapabilities(offer)
}

signaling.sendRtpCapabilities(rtpCapabilities)

// First consumer: build the offer, answer it and send our DTLS parameters
dtlsAndIceParameters = signaling.receiveDtlsAndIceParameters()
consumer = signaling.receiveConsumer()
consumers = [consumer]
offer = lib.generateOfferSdp(consumers, dtlsAndIceParameters)
peer = new RTCPeerConnection()
peer.setRemoteDescription(offer)
answer = peer.createAnswer()
peer.setLocalDescription(answer)
dtlsParameters = lib.extractDtlsParameters(answer)
signaling.sendDtlsParameters(dtlsParameters)
signaling.resumeConsumer(consumer.id)

// Next consumer: regenerate the offer from the full list and renegotiate
consumer = signaling.receiveConsumer()
consumers.push(consumer)
offer = lib.generateOfferSdp(consumers, dtlsAndIceParameters)
peer.setRemoteDescription(offer)
answer = peer.createAnswer()
peer.setLocalDescription(answer)
signaling.resumeConsumer(consumer.id)

I hope it makes sense 🙂


@ea167 @nazar-pc I fully understand your point and I understand the need to make it easier to integrate non-browsers into mediasoup ecosystem without forcing the app developer to deal with ORTC-to-SDP transformation.

Some questions:

  • I assume this is about using RTP clients that could be command line tools or libraries written in whichever language, so a client side JS library is not the way to go. Right?

  • If so, I think that a server-side Node library may do the job. That is, a library that runs in the server-side Node app in which mediasoup runs (but is not part of mediasoup).

  • Such a library just supports Unified Plan.

  • Such a library pretends to be an “SDP endpoint” but just for sending RTP or receiving RTP (and not both), for simplicity, ok?

  • When RTP traffic is from client to server, such a library must provide an API that takes the client’s SDP offer and creates a mediasoup WebRtcTransport and as many Producers as those indicated in the SDP offer.

    • Then it creates a suitable SDP answer.
    • The API must allow passing a new SDP re-offer from client side (i.e. to add a new sending media section or disable/stop an existing one).
  • When RTP traffic is from server to client, such a library must provide an API that is provided with all Producers to consume and it will create a WebRtcTransport and the corresponding Consumers, and will generate a SDP offer and expect a SDP answer from the remote client.

    • It will happily ignore such a SDP answer BTW, because we don’t need it for anything.
    • And here is the problem: on the mediasoup side we need to know WHAT WE CAN SEND to the remote endpoint before we start sending to it. That’s why we need the remote endpoint’s RtpCapabilities, but those RtpCapabilities must have the same codec payload types and header extension ids as those in router.rtpCapabilities (see related doc).
    • So there is work to be done here, but IMHO it could be done by providing this server-side library with a full client SDP (no matter whether pts and ids match) and doing some magic within the library, as we do within mediasoup-client.
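
A very rough TypeScript sketch of the codec half of that “magic”, under simplifying assumptions: codecs are matched only by MIME type, clock rate and channel count, and each matching client codec gets the router’s payload type. The `Codec` type and the `alignWithRouter` function are hypothetical names for this example; a real implementation would also compare codec-specific parameters (for instance H264 `packetization-mode` and `profile-level-id`) and align header extension ids in the same way.

```typescript
// Minimal codec capability; preferredPayloadType is the field that must
// end up matching the router's value.
interface Codec {
  mimeType: string;
  clockRate: number;
  channels?: number;
  preferredPayloadType: number;
}

// Hypothetical helper: keep the client codecs that the router also supports
// and rewrite their payload types to the router's ones.
function alignWithRouter(clientCodecs: Codec[], routerCodecs: Codec[]): Codec[] {
  const aligned: Codec[] = [];
  for (const client of clientCodecs) {
    for (const router of routerCodecs) {
      const sameCodec =
        router.mimeType.toLowerCase() === client.mimeType.toLowerCase() &&
        router.clockRate === client.clockRate &&
        router.channels === client.channels;
      if (sameCodec) {
        aligned.push({ ...client, preferredPayloadType: router.preferredPayloadType });
        break; // first router match wins
      }
    }
  }
  return aligned;
}
```

With this, the payload types found in the client SDP do not need to match the router’s: the library would pass the aligned capabilities to mediasoup and use the same mapping when it writes the SDP for the client.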