Mediasoup worker died, exiting in 2 seconds...

System Information:

  • OS: Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-105-generic x86_64)
  • Mediasoup Version: 3.14.5
  • Mediasoup Client Version: 3.7.7
  • Compiler: gcc/g++/c++ 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
  • Node.js Version: v18.17.1
  • Python Version: 3.10.12

Hello,

Since updating to Mediasoup 3.14.5, we frequently see a Mediasoup worker terminate unexpectedly with the message "Mediasoup worker died, exiting in 2 seconds...". We followed the troubleshooting steps outlined in this guide, but no core dumps appear in the designated folder. Manually forcing the worker to terminate does not produce a dump either.

It’s worth noting that we don't use the WebRtcServer option with our workers.
Has anyone else experienced a similar issue, and does anyone have any insights into how we might generate the crash dump?
Thank you.

Please make core dumps work and then report an issue in GH. Without a core dump we cannot do anything.

Apologies for the oversight. I forgot to mount the /tmp/cores folder in the docker-compose.yml file. Once I have a new crash report, I’ll attach all relevant files. Thank you.
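
For anyone hitting the same problem, the usual core dump prerequisites inside the container can be checked with a small sketch like the one below. It assumes the dump directory is /tmp/cores, as in this thread. The function name check_core_setup is made up for illustration, and the checks are not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: report common reasons a core dump is not written.
# Assumes a Linux container; /tmp/cores is the directory used in this thread.
check_core_setup() {
    dir="${1:-/tmp/cores}"
    status=0

    # A core size limit of 0 disables core dumps for processes started from this shell.
    limit="$(ulimit -c)"
    if [ "$limit" = "0" ]; then
        echo "core size limit is 0 (try: ulimit -c unlimited)"
        status=1
    fi

    # The kernel-wide pattern decides where core files are written and how they are named.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi

    # The target directory must exist and be writable by the crashing process.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "directory $dir is missing or not writable"
        status=1
    fi

    return "$status"
}
```

Calling check_core_setup /tmp/cores inside the container should flag a missing mount like the one described above. As far as I know, core_pattern is a host-wide kernel setting, so it is worth checking on the host as well.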

Hi there,

I’ve obtained the core dump of the worker. While I’m not an expert in GDB, I’ve tried to use it within a Docker container to analyze a core dump file located inside that container.

Here’s the output:

/src# gdb /src/node_modules/mediasoup/worker/out/Release/mediasoup-worker /tmp/cores/core.mediasoup-worke.sig11.27
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /src/node_modules/mediasoup/worker/out/Release/mediasoup-worker...
(No debugging symbols found in /src/node_modules/mediasoup/worker/out/Release/mediasoup-worker)

warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing

warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing

warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing

warning: Can't open file anon_inode:[io_uring] which was expanded to anon_inode:[io_uring] during file-backed mapping note processing
[New LWP 27]
[New LWP 28]
[New LWP 29]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/src/node_modules/mediasoup/worker/out/Release/mediasoup-worker --logLevel=debu'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005594a79ec95d in non-virtual thunk to RTC::WebRtcTransport::OnIceServerTupleRemoved(RTC::IceServer const*, RTC::TransportTuple*) ()
[Current thread is 1 (Thread 0x7ff57e279440 (LWP 27))]
(gdb) bt full
#0  0x00005594a79ec95d in non-virtual thunk to RTC::WebRtcTransport::OnIceServerTupleRemoved(RTC::IceServer const*, RTC::TransportTuple*) ()
No symbol table info available.
#1  0x00005594a791a4f2 in RTC::IceServer::OnTimer(TimerHandle*) ()
No symbol table info available.
#2  0x00005594a7cd5fc0 in uv__run_timers ()
No symbol table info available.
#3  0x00005594a7cd9cf8 in uv_run ()
No symbol table info available.
#4  0x00005594a789b959 in DepLibUV::RunLoop() ()
No symbol table info available.
#5  0x00005594a78a659a in Worker::Worker(Channel::ChannelSocket*) ()
No symbol table info available.
#6  0x00005594a789a9f4 in mediasoup_worker_run ()
No symbol table info available.
#7  0x00005594a7899b3c in main ()
No symbol table info available.

The worker binary has no debugging symbols, so the backtrace only shows function names, but the crash occurred in RTC::WebRtcTransport::OnIceServerTupleRemoved, called from RTC::IceServer::OnTimer. Could this be a null pointer dereference or another memory-related issue in the handling of ICE server events or timers within the mediasoup worker?
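
The same backtrace can also be collected non-interactively, which makes it easier to attach to an issue. The sketch below wraps gdb's -batch and -ex options. The function name core_backtrace is hypothetical, and the paths in the usage note are the ones from the session above.

```shell
#!/bin/sh
# Hypothetical helper: print a full backtrace of every thread from a core file.
# -batch makes gdb exit after running the commands given with -ex.
core_backtrace() {
    bin="$1"
    core="$2"
    # Refuse to run gdb when either file is missing.
    if [ ! -f "$bin" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <worker-binary> <core-file>" >&2
        return 2
    fi
    gdb -batch -ex "thread apply all bt full" "$bin" "$core"
}
```

For example: core_backtrace /src/node_modules/mediasoup/worker/out/Release/mediasoup-worker /tmp/cores/core.mediasoup-worke.sig11.27 > backtrace.txt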

Can you please file an issue in the mediasoup GitHub repository with context, that core dump, and steps to reproduce if possible?

Also, if you could build mediasoup with debug symbols, that would be helpful:

cd worker; MEDIASOUP_BUILDTYPE='Debug' make

Then use the same env variable when starting the service that uses mediasoup, i.e.: MEDIASOUP_BUILDTYPE='Debug' npm start

It looks like after this commit IceServer::OnTimer may end up calling IceServer::RemoveTuple, in the same way IceServer::~IceServer does.

Sure. Reproducing it is the real challenge, but I’ll try…

Thank you for the recommendations!
I’ve rebuilt the worker and made sure it is the one being used by setting MEDIASOUP_WORKER_BIN='/root/mediasoup/worker/out/Debug/mediasoup-worker'.
I’ll keep you updated with the results once the issue resurfaces.
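
Before waiting for the next crash, it may be worth confirming that the rebuilt binary really carries debug symbols, so that gdb no longer reports "No debugging symbols found". A minimal sketch, assuming readelf from binutils is installed. The function name has_debug_info is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check whether an ELF binary contains DWARF debug info.
has_debug_info() {
    bin="$1"
    if [ ! -f "$bin" ]; then
        echo "no such file: $bin" >&2
        return 2
    fi
    # A binary built with debug symbols normally carries a .debug_info section.
    if readelf -S "$bin" | grep -q '\.debug_info'; then
        echo "debug info present"
    else
        echo "no debug info"
        return 1
    fi
}
```

For example: has_debug_info /root/mediasoup/worker/out/Debug/mediasoup-worker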

The issue has been created as requested here. Thank you!