Creating and Analyzing Large Heap Dumps

I recently needed to investigate a possible memory leak in a JBoss container running on a 64-bit JVM on Red Hat Enterprise Linux 5.  The Java heap size was set to 8 GB, with JRE version 1.5.0_15 and JBoss version 4.0.3SP1.  The process generally ran for about two weeks before it ran out of memory.

To create the heap dump, I ran the following command (as root):
jmap -heap:format=b <PID>

Unfortunately, this did not work. The error was:

Attaching to process ID 9660, please wait...
Exception in thread "main" java.lang.UnsatisfiedLinkError: no saproc in java.library.path
        at java.lang.ClassLoader.loadLibrary(Unknown Source)
        at java.lang.Runtime.loadLibrary0(Unknown Source)
        at java.lang.System.loadLibrary(Unknown Source)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.<clinit>(LinuxDebuggerLocal.java:563)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.setupDebuggerLinux(BugSpotAgent.java:790)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.setupDebugger(BugSpotAgent.java:500)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:475)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:314)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:146)
        at sun.jvm.hotspot.tools.JMap.main(JMap.java:126)

After some searching, I found that the problem is that JRE 1.5 does not include the 64-bit library libsaproc.so.  This library is available in 1.6, so I downloaded and installed JDK 1.6.0_31 for 64-bit Linux on a different machine and found the library at $JAVA_HOME/jre/lib/amd64/libsaproc.so

I copied this file to $JRE_HOME/lib/amd64 on the server and ran jmap successfully.
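
To confirm that the copied library is somewhere the JVM will look, a small check like the one below can help. It is only a sketch: the class name SaprocCheck and the method findLibrary are made up for illustration, and it assumes the JVM searches the directories listed in the java.library.path and sun.boot.library.path system properties (the second is a HotSpot-specific property, so it may not exist on other JVMs). Run it with the same java binary that sits next to jmap.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: looks for a native library
// in each directory of a path-separator-delimited search path.
public class SaprocCheck {

    static List<File> findLibrary(String libName, String searchPath) {
        List<File> hits = new ArrayList<File>();
        if (searchPath == null) {
            return hits; // property not set on this JVM
        }
        // Platform-specific file name, e.g. "libsaproc.so" on Linux.
        String fileName = System.mapLibraryName(libName);
        String[] dirs = searchPath.split(File.pathSeparator);
        for (int i = 0; i < dirs.length; i++) {
            if (dirs[i].length() == 0) {
                continue; // skip empty path entries
            }
            File candidate = new File(dirs[i], fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: these two properties cover the directories searched.
        String[] props = { "sun.boot.library.path", "java.library.path" };
        for (int i = 0; i < props.length; i++) {
            List<File> hits = findLibrary("saproc", System.getProperty(props[i]));
            System.out.println(props[i] + ": " + (hits.isEmpty() ? "not found" : hits.toString()));
        }
    }
}
```

If both lines print "not found", the copy most likely landed in a directory that this JRE does not search.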

When jmap ran, top reported that the process was using around 5 GB of memory. The resulting heap.bin file was about the same size.

Because the heap dump was larger than 2 GB, a 64-bit machine with plenty of memory was required to open the file.  To determine where the process was leaking memory, I took several heap dumps over a period of about 10 days.  Using Eclipse MAT's compare basket, I compared histograms of the heap files and found the source of the leak: someone was writing to a Collection and not removing entries when they were finished.
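
For anyone who has not run into this kind of leak, here is a minimal sketch of the pattern. It is not the actual code from the application; the class RequestRegistry and its methods are invented for illustration. The point is that a long-lived collection keeps every entry strongly reachable until it is explicitly removed, so the garbage collector can never reclaim entries that are added but never taken out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of the leak pattern; all names are illustrative.
public class RequestRegistry {

    // Long-lived map: entries stay reachable until remove() is called.
    private final Map<String, byte[]> active = new HashMap<String, byte[]>();

    void begin(String id) {
        active.put(id, new byte[1024]); // stand-in for per-request state
    }

    // The leaking code path never calls this, so entries accumulate.
    void end(String id) {
        active.remove(id);
    }

    int size() {
        return active.size();
    }

    public static void main(String[] args) {
        RequestRegistry leaky = new RequestRegistry();
        RequestRegistry fixed = new RequestRegistry();
        for (int i = 0; i < 1000; i++) {
            String id = "req-" + i;
            leaky.begin(id); // entry is never removed
            fixed.begin(id);
            try {
                // ... the real work would happen here ...
            } finally {
                fixed.end(id); // always release the entry when finished
            }
        }
        System.out.println("leaky entries: " + leaky.size()); // prints 1000
        System.out.println("fixed entries: " + fixed.size()); // prints 0
    }
}
```

In a histogram comparison, a pattern like this shows up as instance counts for the map's entry class and the stored value type that keep growing from one dump to the next.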
