DTLS handshake fails with custom certificate when server has a non-standard MTU

Hi all,
I’m experiencing this problem in a setup where mediasoup is installed on a cloud server exposed on a public IP, and the client is on a private home network connected to a home Wi-Fi router. Producers (started from the same client network and/or from other networks) work normally, but when I try to start a consumer, the DTLS transport fails to connect. The DTLS negotiation does work when mediasoup’s local role is “client” (producer transport).
For now I’ve worked around the problem by disabling UDP and enabling TCP in the mediasoup settings. Could it be related to UDP packet loss or out-of-order delivery?
Edit: the server network interface MTU is 9000.

2020-09-03T13:16:32.145Z mediasoup:Channel [pid:41] RTC::WebRtcTransport::MayRunDtlsTransport() | running DTLS transport in local role 'server'
2020-09-03T13:16:32.145Z mediasoup:Channel [pid:41] RTC::WebRtcTransport::OnDtlsTransportConnecting() | DTLS connecting
2020-09-03T13:16:32.146Z mediasoup:Channel [pid:41] RTC::DtlsTransport::Run() | running [role:server]
2020-09-03T13:16:32.146Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | DTLS handshake start
2020-09-03T13:16:32.146Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'before SSL initialization']
2020-09-03T13:16:32.146Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'before SSL initialization']
2020-09-03T13:16:32.175Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'before SSL initialization']
2020-09-03T13:16:32.175Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS read client hello']
2020-09-03T13:16:32.175Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS write server hello']
2020-09-03T13:16:32.175Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS write certificate']
2020-09-03T13:16:32.178Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS write key exchange']
2020-09-03T13:16:32.179Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS write certificate request']
2020-09-03T13:16:32.179Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | [role:server, action:'SSLv3/TLS write server done']
2020-09-03T13:16:32.179Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:32.238Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:32.371Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:32.634Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:33.161Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:34.217Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:36.328Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:40.551Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:16:48.998Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:17:05.895Z mediasoup:Channel [pid:41] RTC::DtlsTransport::OnSslInfo() | role: server, waiting:'SSLv3/TLS write server done']
2020-09-03T13:17:39.684Z mediasoup:ERROR:Channel [pid:41 RTC::DtlsTransport::CheckStatus() | OpenSSL error [desc:'SSL status: SSL_ERROR_SSL', error:'error:1413E138:SSL routines:dtls1_check_timeout_num:read timeout expired']
2020-09-03T13:17:39.685Z mediasoup:WARN:Channel [pid:41] RTC::DtlsTransport::CheckStatus() | connection failed
2020-09-03T13:17:39.685Z mediasoup:WARN:Channel [pid:41] RTC::DtlsTransport::Reset() | resetting DTLS transport
2020-09-03T13:17:39.685Z mediasoup:WARN:Channel [pid:41] RTC::WebRtcTransport::OnDtlsTransportFailed() | DTLS failed
2020-09-03T13:18:39.686Z mediasoup:WARN:Channel [pid:41] RTC::WebRtcTransport::OnDtlsDataReceived() | Transport is not 'connecting' or 'connected', ignoring received DTLS data
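The repeated “waiting” lines in the log above are spaced further and further apart. Recomputing the gaps from the timestamps shows that they roughly double each time, which looks like the server retransmitting its handshake flight on a backoff timer without ever getting an answer, until OpenSSL gives up with “read timeout expired”. Reading each “waiting” line as one retransmission is an assumption about what mediasoup logs, not something the log states. A POSIX sh + awk sketch, with the timestamps copied by hand from the log:

```shell
#!/bin/sh
# Seconds after 13:16:00 for the "write server done" line, every later
# "waiting" line and the final "read timeout expired" error
# (13:17 entries are written as 60 + seconds).
TIMESTAMPS="32.179 32.238 32.371 32.634 33.161 34.217 36.328 40.551 48.998 65.895 99.684"

# Print each gap between consecutive events in ms, plus its ratio to the
# previous gap. Arguments: the timestamps, in seconds.
backoff_intervals() {
  printf '%s\n' "$@" | awk '
    NR > 1 {
      gap = ($1 - prev) * 1000
      if (NR > 2) printf "%.0f ms (x%.2f)\n", gap, gap / last
      else printf "%.0f ms\n", gap
      last = gap
    }
    { prev = $1 }'
}

# $TIMESTAMPS is left unquoted on purpose so it splits into arguments.
backoff_intervals $TIMESTAMPS
```

The gaps come out at 59, 133, 263, 527, 1056, 2111, 4223, 8447, 16897 and 33789 ms, so about x2 per step. RFC 6347 (section 4.2.4.1) likewise recommends doubling the DTLS retransmission timer on each retransmission.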

Edit:
If I change the server MTU to 1500, tcpdump reports this error:
UDP, bad length 1491 > 1472
If I set the MTU to 1472, the problem seems to be fixed.
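tcpdump’s “bad length 1491 > 1472” is consistent with plain IPv4/UDP header arithmetic. With a 1500-byte MTU, one IPv4 packet (20-byte header, assuming no IP options) carrying a UDP header (8 bytes) has room for at most 1500 - 20 - 8 = 1472 bytes of UDP payload. A datagram whose UDP header announces 1491 payload bytes therefore has to be split into IP fragments, and tcpdump is most likely complaining about the first fragment. A small POSIX sh sketch of that arithmetic (the function names are made up for illustration):

```shell
#!/bin/sh
# Assumes plain IPv4 (20-byte header, no options) carrying UDP (8-byte header).
IPV4_HEADER=20
UDP_HEADER=8

# Largest UDP payload that fits in a single unfragmented IPv4 packet at MTU $1.
max_udp_payload() {
  echo $(( $1 - IPV4_HEADER - UDP_HEADER ))
}

# Exit status 0 if a UDP payload of $1 bytes must be IP-fragmented at MTU $2.
needs_fragmentation() {
  [ "$1" -gt "$(max_udp_payload "$2")" ]
}

echo "max UDP payload at MTU 1500: $(max_udp_payload 1500) bytes"
if needs_fragmentation 1491 1500; then
  echo "a 1491-byte UDP payload is fragmented at MTU 1500"
fi
```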

Could this be related to https://github.com/versatica/mediasoup/pull/217 ?

@vpalmisano,

The DTLS MTU is set to 1200 bytes in mediasoup. Can you confirm that setting the MTU to 1472 in the server OS fixes the problem?

In the past I’ve seen some home routers filter out DTLS messages that were too big, for example when using signed certificates, which are much bigger than self-signed ones.

I see MTU=1350 here: https://github.com/versatica/mediasoup/blob/v3/worker/src/RTC/DtlsTransport.cpp#L64

When I set the server MTU to a value below 1494, tcpdump always shows some UDP packets with sizes in the range [1466, 1491], which is larger than 1350, and yet, for some reason, the DTLS connection works. When the server MTU is 1494 or higher, the DTLS connection fails.

The home connection has an MTU of 1492.
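One reading of the 1494 threshold, using numbers from the capture posted later in this thread: the retransmitted server flight has a UDP Length of 1474 bytes (Wireshark’s UDP Length includes the 8-byte UDP header), so the IPv4 packet carrying it is 1474 + 20 = 1494 bytes. With a server MTU of 1494 or more, that packet leaves the server unfragmented and is 2 bytes larger than the 1492-byte home link; below 1494, the server fragments it itself. This is an inference from the numbers, not something verified on the network path. A sketch, again assuming a 20-byte IPv4 header and using hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers. Assumes a 20-byte IPv4 header (no options); "UDP length"
# is the UDP length field, which already includes the 8-byte UDP header.
IPV4_HEADER=20

# Size of the IPv4 packet carrying a UDP datagram with UDP length $1.
ip_packet_size() {
  echo $(( $1 + IPV4_HEADER ))
}

# Exit status 0 if that datagram fits in one unfragmented packet at MTU $2.
fits_unfragmented() {
  [ "$(ip_packet_size "$1")" -le "$2" ]
}

flight=1474    # UDP Length of the retransmitted DTLS flight in the capture
home_mtu=1492  # MTU of the home connection

for server_mtu in 1472 1493 1494 1500 9000; do
  if ! fits_unfragmented "$flight" "$server_mtu"; then
    echo "server MTU $server_mtu: the server fragments the flight itself"
  elif ! fits_unfragmented "$flight" "$home_mtu"; then
    echo "server MTU $server_mtu: a $(ip_packet_size "$flight")-byte packet is sent, larger than the $home_mtu-byte home link"
  else
    echo "server MTU $server_mtu: the packet fits end to end"
  fi
done
```

The split between the first two MTUs (fragmented by the server) and the last three (one oversized packet) lines up with what was observed: 1472 and 1493 work, 1494 and above fail.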

Correct, the DTLS MTU in mediasoup is 1350.

It somehow makes sense: if the server MTU is bigger than the client’s, the server may send the client unfragmented packets that the client cannot handle.

Are those packets in the [1466, 1491] range generated by mediasoup? Can you paste the Wireshark/tcpdump extract here so we can see which network layer accounts for which size?

Yes, these are the packets captured on the server side (I can send you the complete capture files if you need them):

MTU=1500 (columns: No., Time, Source, Destination, Protocol, frame length in bytes, Info)

1	0.000000	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:20057
2	0.031432	192.168.121.13	XX.XX.XX.XX	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=0, ID=9edd) [Reassembled in #3]
3	0.031457	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	53	Server Hello, Certificate, Server Key Exchange (Fragment), Server Key Exchange (Reassembled), Certificate Request, Server Hello Done
UDP Length: 1499 (Reassembled)

4	0.046932	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:20057
5	0.094880	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:20057
6	0.131307	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	1508	Server Hello, Certificate, Server Key Exchange, Certificate Request, Server Hello Done
UDP Length: 1474

MTU=1493 (columns: No., Time, Source, Destination, Protocol, frame length in bytes, Info)

1	0.000000	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:44322
2	0.032396	192.168.121.13	XX.XX.XX.XX	IPv4	1506	Fragmented IP protocol (proto=UDP 17, off=0, ID=678e) [Reassembled in #3]
3	0.032412	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	61	Server Hello, Certificate, Server Key Exchange (Fragment), Server Key Exchange (Reassembled), Certificate Request, Server Hello Done
UDP Length: 1499 (Reassembled)

4	0.046168	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:44322
5	0.093840	192.168.121.13	XX.XX.XX.XX	STUN	106	Binding Success Response XOR-MAPPED-ADDRESS: XX.XX.XX.XX:44322
6	0.131799	192.168.121.13	XX.XX.XX.XX	IPv4	1506	Fragmented IP protocol (proto=UDP 17, off=0, ID=679d) [Reassembled in #7]
7	0.131855	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	36	Server Hello, Certificate, Server Key Exchange, Certificate Request, Server Hello Done
UDP Length: 1474 (Reassembled)

8	0.168816	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	109	Change Cipher Spec, Encrypted Handshake Message
9	0.169521	192.168.121.13	XX.XX.XX.XX	DTLSv1.2	171	Application Data
10	0.172032	192.168.121.13	XX.XX.XX.XX	UDP	176	37191 → 44322 Len=134
11	0.192268	192.168.121.13	XX.XX.XX.XX	UDP	176	37191 → 44322 Len=134
12	0.212764	192.168.121.13	XX.XX.XX.XX	UDP	167	37191 → 44322 Len=125
13	0.224084	192.168.121.13	XX.XX.XX.XX	UDP	1250	37191 → 44322 Len=1208
14	0.224218	192.168.121.13	XX.XX.XX.XX	UDP	1251	37191 → 44322 Len=1209
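The frame lengths in both captures can be reproduced with ordinary IPv4 fragmentation arithmetic, which supports reading the 1499- and 1474-byte UDP Lengths as whole DTLS flights that the server’s IP stack splits into fragments. Assumptions: Ethernet II framing as Wireshark shows it (14-byte header, no FCS), a 20-byte IPv4 header, and the rule that every fragment except the last carries a multiple of 8 data bytes. `fragment_frames` is a made-up helper:

```shell
#!/bin/sh
# Assumptions: Ethernet II framing as captured (14-byte header, no FCS),
# a 20-byte IPv4 header with no options, standard IPv4 fragmentation.
ETH_HEADER=14
IPV4_HEADER=20

# Print the captured frame length of each packet needed to carry a UDP
# datagram with UDP length $1 (header included) over a link with MTU $2.
# POSIX sh has no local variables, hence the frag_ prefixed names.
fragment_frames() {
  frag_left=$1
  frag_mtu=$2
  # Small enough for one packet: no fragmentation.
  if [ $(( frag_left + IPV4_HEADER )) -le "$frag_mtu" ]; then
    echo $(( ETH_HEADER + IPV4_HEADER + frag_left ))
    return
  fi
  # Every fragment except the last carries a multiple of 8 data bytes.
  frag_max=$(( (frag_mtu - IPV4_HEADER) / 8 * 8 ))
  while [ "$frag_left" -gt "$frag_max" ]; do
    echo $(( ETH_HEADER + IPV4_HEADER + frag_max ))
    frag_left=$(( frag_left - frag_max ))
  done
  echo $(( ETH_HEADER + IPV4_HEADER + frag_left ))
}

for mtu in 1500 1493; do
  for udp_length in 1499 1474; do
    echo "MTU=$mtu, UDP Length $udp_length -> frames: $(fragment_frames "$udp_length" "$mtu" | tr '\n' ' ')"
  done
done
```

It prints 1514 53 and 1508 for MTU=1500, and 1506 61 and 1506 36 for MTU=1493, matching frames 2/3 and 6 of the first capture and frames 2/3 and 6/7 of the second.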

One question: are you using your own TLS certificates, or letting mediasoup create its own?

I’m using self-signed certificates generated with openssl.

I tried changing the DtlsMtu value used by SSL_set_mtu and DTLS_set_link_mtu, but I saw no change in the UDP payload lengths sent by the server.
By setting the SSL_OP_NO_QUERY_MTU flag you are disabling OpenSSL’s fragmentation mechanism, so what happens if the DTLS payload is larger than DtlsMtu?

On what happens when the DTLS payload is larger than DtlsMtu: it seems the no-fragmentation setting prevails.

Could you please try with a mediasoup-generated certificate and let us know? Chances are that your certificate is too big.

Does mediasoup generate a certificate?

I’m using these commands:

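# RSA private key (no key size given, so OpenSSL's default is used)
openssl genrsa -out certs/privkey.pem
# Certificate signing request for that key
openssl req -new -key certs/privkey.pem -out certs/server.csr \
    -subj "/C=US/ST=State/L=Location/O=Org/OU=org/CN=name/emailAddress=info@name"
# Self-sign the CSR with the same key, valid for 365 days
openssl x509 -req -days 365 -in certs/server.csr -signkey certs/privkey.pem -out certs/fullchain.pem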
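To test the “certificate is too big” theory without touching mediasoup, one can compare certificate sizes directly. The captured server flight includes a Certificate message and a Server Key Exchange, which carries a signature made with the certificate’s key, so a smaller key shrinks both. The script below is a sketch that builds two throwaway self-signed certificates with the same subject as above: one with a 2048-bit RSA key (assumption: the key size from the genrsa command above isn’t stated, 2048 is just a common value) and one with an ECDSA P-256 (prime256v1) key, then prints their DER sizes. That mediasoup’s own GenerateCertificateAndPrivateKey() uses an elliptic-curve key is my assumption, worth checking in DtlsTransport.cpp. It needs the openssl CLI and writes only to a temporary directory:

```shell
#!/bin/sh
# Sketch: compare DER sizes of two throwaway self-signed certificates.
# Assumption: 2048-bit RSA stands in for the genrsa key above (size not stated).
# Requires the openssl CLI; writes only to a temporary directory.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

subject="/C=US/ST=State/L=Location/O=Org/OU=org/CN=name/emailAddress=info@name"

# RSA key and a self-signed certificate for it.
openssl genrsa -out "$dir/rsa.key" 2048 2>/dev/null
openssl req -new -x509 -days 365 -key "$dir/rsa.key" -subj "$subject" -out "$dir/rsa.pem"

# ECDSA P-256 (prime256v1) key and a self-signed certificate for it.
openssl ecparam -name prime256v1 -genkey -noout -out "$dir/ec.key"
openssl req -new -x509 -days 365 -key "$dir/ec.key" -subj "$subject" -out "$dir/ec.pem"

# Size in bytes of a PEM certificate converted to DER, the encoding
# carried in the handshake's Certificate message.
der_size() {
  echo $(( $(openssl x509 -in "$1" -outform DER | wc -c) ))
}

rsa_size=$(der_size "$dir/rsa.pem")
ec_size=$(der_size "$dir/ec.pem")
echo "RSA-2048 certificate: $rsa_size bytes"
echo "EC P-256 certificate: $ec_size bytes"
```

If the EC certificate is noticeably smaller, generating the key with `openssl ecparam -name prime256v1 -genkey -noout` instead of `openssl genrsa` is a cheap thing to try with your own certificate files.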

The certificate file is not mandatory; the mediasoup worker creates one if it is not provided.

It worked, even with MTU=9000.
I’m looking at the `GenerateCertificateAndPrivateKey` method to understand which option in my openssl command line was wrong.
Thanks!
