High memory usage over time (even with jemalloc)

Paul · April 12, 2024, 12:28am

I have a chat (using datachannels) and VoIP application, and I’m noticing memory usage from mediasoup increasing over time. I see this behaviour with the default allocator and also with jemalloc. Some of my applications are getting OOM-killed with 8 GB allocated.

Using the default allocator and heaptrack, I ran one server for 45 min, and then marked it offline so that all users would disconnect. This is the flamegraph view from heaptrack of the remaining allocations after all users had disconnected:

Using jemalloc’s profiling, I created a dump of the memory in use on a server that had been running for almost a day and was at 90% memory usage. I similarly marked the server offline so that all active users would disconnect and this is the flamegraph of the remaining allocations:

These show there is a lot of memory in use that was allocated from onAlloc inside of uv_read, but I’m not sure how to get more information.

Does anyone have any tips on how to further debug this?
I’m not familiar with how libuv is used in mediasoup. Is it used only for SCTP, or also for RTP? We do allow some large messages over datachannels, so I’m wondering if this could be the result of some very large buffers that don’t get cleaned up.

Thanks

snnz · April 12, 2024, 9:27am

It seems to be the Rust version, and you must be creating a lot of Workers. Each Worker allocates only one buffer of this kind, about 4 MB in size. It can leak only if the Worker’s thread somehow exits incorrectly (without calling the destructors) or does not exit at all.

Paul · April 12, 2024, 3:56pm

Yes, this is using the Rust version. We only create one worker, at application startup.

snnz · April 12, 2024, 5:34pm

Well, the graph shows a call stack. Is the “176.4MB leaked in total” at the bottom related to this call stack? How many times did this call happen? If one Worker was created, then one worker thread was created, mediasoup_worker_run was called once in this thread, and the read buffer should have been allocated only once: 4 MB, not 176 MB.

Paul · April 12, 2024, 5:56pm

The “176.4MB leaked in total” is the amount leaked at the current reading. 164.4MB of that was allocated in onAlloc. These flamegraphs don’t show the number of times a call happened, only the amount of memory allocated at the time of viewing. The width of each bar is the % of the total (176.4MB in that particular one).

I’m quite confident there’s only one worker, because if we had multiple we’d start running into this bug again.
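
For scale, the two figures quoted above can be put side by side. This is only a back-of-the-envelope sketch: it assumes the roughly 4 MB per-Worker read buffer size mentioned earlier in the thread and the 164.4 MB attributed to onAlloc, and the function name `buffers_needed` is made up for illustration. It says nothing about where the memory actually comes from.

```rust
/// Approximate size of the per-Worker read buffer quoted in the thread (about 4 MB).
const READ_BUFFER_MB: f64 = 4.0;

/// Number of buffers of `buffer_mb` needed to account for `retained_mb`.
/// Hypothetical helper, not part of mediasoup.
fn buffers_needed(retained_mb: f64, buffer_mb: f64) -> u64 {
    (retained_mb / buffer_mb).ceil() as u64
}

fn main() {
    // Amount the flamegraph attributes to onAlloc, as reported in the thread.
    let retained_in_on_alloc_mb = 164.4;
    let n = buffers_needed(retained_in_on_alloc_mb, READ_BUFFER_MB);
    println!(
        "{} MB is roughly {} buffers of {} MB each",
        retained_in_on_alloc_mb, n, READ_BUFFER_MB
    );
}
```

With a single Worker, a single buffer of that size would account for only a small fraction of what the profile reports, which is why the number of allocations matters here.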

jason-shen · April 27, 2024, 12:10am

It looks like you are not cleaning up after peers leave. You need to clean up everything you track; Rust doesn’t have garbage collection.
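
A minimal sketch of what that kind of cleanup could look like on the application side. Everything here is hypothetical: `PeerState`, `PeerRegistry` and the callback names are illustrative and are not mediasoup types or APIs. The point is only that state held in a long-lived map stays allocated until its entry is removed.

```rust
use std::collections::HashMap;

/// Hypothetical per-peer state an application might keep (not a mediasoup type).
struct PeerState {
    /// Stand-in for buffered datachannel messages.
    pending_messages: Vec<Vec<u8>>,
}

/// Hypothetical registry of connected peers, keyed by an application-defined id.
#[derive(Default)]
struct PeerRegistry {
    peers: HashMap<u64, PeerState>,
}

impl PeerRegistry {
    fn on_connect(&mut self, peer_id: u64) {
        self.peers.insert(peer_id, PeerState { pending_messages: Vec::new() });
    }

    fn on_message(&mut self, peer_id: u64, payload: Vec<u8>) {
        if let Some(peer) = self.peers.get_mut(&peer_id) {
            peer.pending_messages.push(payload);
        }
    }

    /// Removing the entry drops the `PeerState`, which frees its buffers.
    /// If this is never called, the memory stays reachable from the map.
    fn on_disconnect(&mut self, peer_id: u64) -> bool {
        self.peers.remove(&peer_id).is_some()
    }

    /// Total payload bytes still held for all tracked peers.
    fn retained_bytes(&self) -> usize {
        self.peers
            .values()
            .flat_map(|p| p.pending_messages.iter())
            .map(|m| m.len())
            .sum()
    }
}

fn main() {
    let mut registry = PeerRegistry::default();
    registry.on_connect(1);
    registry.on_message(1, vec![0u8; 1024]);
    println!("retained while connected: {} bytes", registry.retained_bytes());
    registry.on_disconnect(1);
    println!("retained after disconnect: {} bytes", registry.retained_bytes());
}
```

Logging something like `retained_bytes` after all users have disconnected is one cheap way to check whether application-side tracking is holding memory, independent of the allocator profile.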
