Rare crash in SendSctpData when enableSctp is enabled

I was investigating the crash and found some code that I assume is a workaround for a similar problem, so I have a question about it.

In what cases has this destroying flag been effective? It appears that both the Transport destructor and onSendSctpData are called on the same Worker thread. Is this correct? If so, in what case was onSendSctpData called during the Transport destructor?

If onSendSctpData is ever called during destruction, the effect of this flag appears to be limited: it is only set to true in the Transport destructor, and by that time the WebRtcTransport has already been destroyed, as noted in the comment.

Thanks for reading.

this.destroyed is a member of the parent Transport class so it still exists at this time. However the WebRtcTransport child class was destroyed so we should NOT call SendSctpData() anymore since it’s an abstract method only defined in child classes.

Can the following sequence occur?

  1. The WebRtcTransport destructor runs (delete dtlsTransport).
  2. onSendSctpData is called (destroying is still false, but dtlsTransport or iceServer is null or points to freed memory).
  3. The Transport destructor runs (the flag is set to true).

It then crashes at step 2, without ever reaching step 3.

I cannot check it now but I wrote that code for a reason (to avoid a crash). Not sure if it’s crashing for you or what. Is it? I’ll check the flow you said next week.

This is what one would expect. The destructor of the derived class is called before the destructor of the base class. Either this flag or some other flag should be set before WebRtcTransport starts disposing of its resources.
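To make the ordering concrete, here is a minimal sketch of the idea, not mediasoup's actual code. BaseTransport, DerivedTransport, OnSendData and the events log are hypothetical names invented for illustration. It assumes the guard flag lives in the base class and is set as the first statement of the derived destructor, so a callback that fires during teardown is dropped instead of touching freed resources.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the base class; not mediasoup's actual Transport.
class BaseTransport {
public:
    virtual ~BaseTransport() {
        // Runs last: the derived part of the object is already gone here,
        // so setting the flag at this point would be too late.
        events.push_back("~BaseTransport");
    }

    // Entry point a send callback would hit. Returns false if the call
    // was dropped because teardown has started.
    bool OnSendData() {
        if (destroying) {
            return false;
        }
        SendData();
        return true;
    }

protected:
    explicit BaseTransport(std::vector<std::string>& events) : events(events) {}
    virtual void SendData() = 0;

    std::vector<std::string>& events;  // external log, outlives the object
    bool destroying{false};
};

// Hypothetical stand-in for the derived class that owns the resource.
class DerivedTransport : public BaseTransport {
public:
    explicit DerivedTransport(std::vector<std::string>& events)
        : BaseTransport(events), socket(std::make_unique<int>(42)) {}

    ~DerivedTransport() override {
        destroying = true;  // set the guard BEFORE releasing anything
        events.push_back("~DerivedTransport");
        // Simulate a callback arriving in the middle of teardown.
        events.push_back(OnSendData() ? "sent" : "dropped");
        socket.reset();
    }

private:
    void SendData() override {
        assert(socket != nullptr);
        events.push_back("send " + std::to_string(*socket));
    }

    std::unique_ptr<int> socket;  // stands in for dtlsTransport / iceServer
};

// Creates a transport, sends once, then lets it go out of scope.
std::vector<std::string> DestroyOrderDemo() {
    std::vector<std::string> events;
    {
        DerivedTransport transport(events);
        transport.OnSendData();
    }
    return events;
}
```

Calling DestroyOrderDemo() records one successful send, then the derived destructor, the dropped callback, and finally the base destructor. If the flag were only set in ~BaseTransport, the simulated callback would reach SendData() while the derived class is already tearing down.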

Upon further investigation of the code, I discovered an interesting problem.
This may only occur in the Rust version where there are multiple Workers in one process.

The thread that creates and runs the WebRtcTransport and the thread that calls onSendSctpData may be different.
The checker is thread-local, but it only runs on the first Worker thread created (since numSctpAssociations is a normal global variable).

As a result, Transports created in the second Worker appear to be run on the first Worker's thread.
Perhaps the crash would be eliminated if this could be called on the same Worker thread that created the Transport. However, the usrsctp interface is difficult, and I do not know whether that is possible.
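The mismatch between per-thread and per-process state can be shown with a small sketch. This is an illustration under assumptions, not the real DepUsrSCTP code: checkerRunning, numAssociations and RegisterAssociation are hypothetical names, the checker is reduced to a thread-local bool, and the counter is made atomic only to keep the demo free of data races. The two "Worker" threads run one after the other so the result is deterministic; in a real process both would be alive at the same time.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Per-thread state: each Worker thread gets its own copy.
thread_local bool checkerRunning = false;

// Process-wide state: shared by every Worker thread.
std::atomic<int> numAssociations{0};

// Assumed logic: the checker is started only on the 0 -> 1 transition
// of the shared counter. Returns whether THIS thread's checker runs.
bool RegisterAssociation() {
    if (numAssociations.fetch_add(1) == 0) {
        checkerRunning = true;
    }
    return checkerRunning;
}

struct TwoWorkerResult {
    bool firstStarted;
    bool secondStarted;
    int total;
};

// Registers one association from each of two threads and reports
// which thread ended up with a running checker.
TwoWorkerResult RunTwoWorkers() {
    numAssociations.store(0);
    TwoWorkerResult result{};

    std::thread worker1([&result] { result.firstStarted = RegisterAssociation(); });
    worker1.join();

    std::thread worker2([&result] { result.secondStarted = RegisterAssociation(); });
    worker2.join();

    result.total = numAssociations.load();
    return result;
}
```

Under these assumptions only the first thread starts its checker, even though two associations are registered. Any timer-driven work for the second thread's association would then have to be driven from the first thread.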

I think we should fix the bug properly since IMHO it could indeed also happen in mediasoup Node. PR done here, please @satoren check it:

Thank you for fixing the crash.

However, I believe the Rust version still has the following problems.

  1. Create Worker1 and Transport1 (this creates a DepUsrSCTP::Checker and stores it in DepUsrSCTP::checker, a thread-local variable).
  2. Create Worker2 and Transport2.
  3. Destroy Worker1 and Transport1.

The DepUsrSCTP::Checker is not deleted because numSctpAssociations has not dropped to zero, but the Worker thread that runs it has stopped.
The DepUsrSCTP::Checker also leaks, because nothing holds the pointer to it anymore.
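The three steps above can be replayed with a small single-threaded model. This is only a sketch under assumptions: WorkerModel, Register, Deregister and StopWorker are hypothetical names, a struct field stands in for the thread-local checker, and the start-on-first / stop-on-last rule is assumed rather than copied from the DepUsrSCTP source.

```cpp
#include <cassert>

// Models one Worker thread and its thread-local checker.
struct WorkerModel {
    bool checkerRunning{false};
    bool alive{true};
};

// Models the process-wide association counter.
int numSctpAssociations = 0;

// Assumed rule: start the caller's checker on the 0 -> 1 transition.
void Register(WorkerModel& worker) {
    if (++numSctpAssociations == 1) {
        worker.checkerRunning = true;
    }
}

// Assumed rule: stop the caller's checker on the 1 -> 0 transition.
void Deregister(WorkerModel& worker) {
    assert(numSctpAssociations > 0);
    if (--numSctpAssociations == 0) {
        worker.checkerRunning = false;
    }
}

// The Worker thread exits; whatever it was running can no longer tick.
void StopWorker(WorkerModel& worker) {
    worker.alive = false;
}

struct Outcome {
    int remaining;  // associations still registered
    bool ticking;   // is any live Worker still driving a checker?
};

Outcome ReplayScenario() {
    numSctpAssociations = 0;
    WorkerModel worker1;
    WorkerModel worker2;

    Register(worker1);    // step 1: Worker1 + Transport1, checker starts here
    Register(worker2);    // step 2: Worker2 + Transport2, no new checker
    Deregister(worker1);  // step 3: Transport1 destroyed, counter is still 1
    StopWorker(worker1);  //         Worker1 thread stops

    bool ticking = (worker1.alive && worker1.checkerRunning) ||
                   (worker2.alive && worker2.checkerRunning);
    return Outcome{numSctpAssociations, ticking};
}
```

In this model ReplayScenario() ends with one association still registered and no live Worker driving a checker, which matches the situation described above.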

@satoren can you please report an issue for this in GH?

I'll write a reproducible test and then create a GH issue.

Yes please. Thanks.