I have a Java application that uses 10 threads. Each thread opens a MATLAB session via the matlabcontrol Java library. I'm running the application on a cluster running CentOS 6.
The physical memory actually used by the whole application (Max Memory) is around 5 GB, as expected, but the reserved memory (Max Swap) is around 80 GB, which is far too high. Here is a short description from the cluster wiki:
A note on terminology: in LSF the Max Swap is the memory allocated by
an application and the Max Memory is the memory physically used (i.e.,
it is actually written to). As such, Max Swap > Max Memory. In most
applications Max Swap is about 10–20% higher than Max Memory
I think the problem is Java (or perhaps the combination of Java and MATLAB). By default, Java tends to size itself against the physical memory of the whole compute node, allocating on the order of 50% of it. A Java process assumes it can use all the resources of the machine it is running on. That is also why it starts several hundred threads even though my application only uses 11: it sees 24 cores and lots of memory, while the batch system has reserved only 11 cores for the job.
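To check what the JVM believes it has to work with, as opposed to what the batch system reserved, a quick probe like the following can help (the class name is just an illustration):

```java
// Prints the resources the JVM detects on the host. On a shared node this
// reports the whole machine (e.g. all 24 cores and a heap sized from total
// physical memory), regardless of what the batch system reserved for the job.
public class ResourceCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Cores visible to the JVM: " + rt.availableProcessors());
        System.out.println("Default max heap (MB):    " + rt.maxMemory() / (1024 * 1024));
    }
}
```

Running this inside the batch job would confirm whether the JVM sees the full node; the heap can then be capped explicitly with `-Xmx`.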
Is there a workaround for this issue?
Edit: I've just found the following passage in the matlabcontrol documentation:
When running outside MATLAB, the proxy makes use of multiple
internally managed threads. When the proxy becomes disconnected from
MATLAB it notifies its disconnection listeners and then terminates all
threads it was using internally. A proxy may disconnect from MATLAB
without exiting MATLAB by calling disconnect().
This explains why so many threads are created, but it does not explain the large amount of reserved memory.
Edit2: Setting the environment variable MALLOC_ARENA_MAX=4
brought the reserved memory down to 30 GB. What value of MALLOC_ARENA_MAX should I choose, and are there other tuning possibilities?
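For reference, here is a sketch of how these knobs could be combined in a job script. The values and the jar name are placeholders, not recommendations: MALLOC_ARENA_MAX and MALLOC_TRIM_THRESHOLD_ are glibc malloc tunables, while -Xmx and -Xss cap the JVM's heap and per-thread stack reservations (relevant here, since several hundred threads each reserve a full default stack):

```shell
# Sketch of a job script; all values are illustrative starting points.

# glibc malloc: limit the number of per-thread arenas (a common source of
# reserved-but-unused virtual memory) and return freed memory to the OS sooner.
export MALLOC_ARENA_MAX=2
export MALLOC_TRIM_THRESHOLD_=131072

# JVM: cap the heap and the per-thread stack size so the many internally
# managed threads do not each reserve the default (often 1 MB) stack.
java -Xmx4g -Xss512k -jar myapp.jar   # myapp.jar is a placeholder
```

Lowering MALLOC_ARENA_MAX generally trades some multi-threaded malloc throughput for a smaller reservation, so the usual approach is to reduce it stepwise while watching Max Swap and runtime.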