Heap dump

From Resin 3.0

Revision as of 00:02, 21 February 2006 by Sam (Talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


If an application gobbles up memory until it throws an OutOfMemory exception, or seems to be spending an excessive amount of time doing garbage collection, a heap dump can help track down the source of the problem.

What you really need to do is get a good CPU and heap profile. The JDK has a simple one (but not very user-friendly), so you don't absolutely need to buy a profiler.

It's a standard java argument, so "java -Xrunhprof:help" will tell you how to start it. For example, you could start Resin as

unix$ bin/httpd.sh -J-Xrunhprof:heap=sites,cpu=samples
win> httpd.exe -J-Xrunhprof:heap=sites,cpu=samples

Run a load against it for a decent amount of time (you can even use something like Apache's "ab" for 10 minutes), and then stop the server nicely. i.e. you can't just kill it with Ctrl-C, but you need to make it do a clean exit.

That will dump a java.hprof.txt. Skip to the bottom to get the trace.

Understanding java.hprof.txt heap

Assuming you're being cheap, and using the JDK's heap profiling instead of buying a profiler, you'll need a bit of help interpreting it.

The following is an example of a heap dump running Resin. Here's what you might see skipping down to the "SITES BEGIN" section. (I did mention this is the cheap way of doing things.) As with most profiling, you only want to pay attention to the top 20. Everything else is in the noise. Ignore it.

SITES BEGIN (ordered by live bytes) Tue Jan  9 17:44:33 2001
          percent         live       alloc'ed  stack class
 rank   self  accum    bytes objs   bytes objs trace name
    1 11.87% 11.87%   983520  120 1393320  170  2553 [B
    2  9.89% 21.76%   819600  100 1286772  157  4070 [B
    3  9.09% 30.85%   753756   23 3539376  108  4970 [L<Unknown>;
    4  5.83% 36.68%   483564   59  778620   95  7180 [B
    5  5.74% 42.42%   475368   58  745836   91  7178 [B
    6  4.35% 46.77%   360624   44  696660   85  7182 [B
    7  2.97% 49.74%   245880   30  450780   55  7176 [B
    8  2.37% 52.11%   196704   24  352428   43  7254 [B
    9  1.88% 53.99%   155724   19  262272   32  7174 [B
   10  1.78% 55.77%   147528   18  245880   30  7137 [B
   11  1.53% 57.30%   126988 1063 16973092 129113  3271 [C
   12  1.34% 58.64%   110684 3953 20362832 727244  1213 sun/io/CharToByteISO8859_1
   13  1.25% 59.88%   103320  738  141820 1013  5942 java/lang/Class
   14  1.21% 61.10%   100548   49  221616  108  5003 [L<Unknown>;
   15  1.21% 62.31%   100548   49  221616  108  5005 [L<Unknown>;
   16  1.07% 63.38%    89080 1532 18393580 317347  1340 [B
   17  0.79% 64.18%    65568    8   81960   10  8408 [B
   18  0.79% 64.97%    65552    4   65552    4 27630 [C
   19  0.70% 65.67%    58232   24 1110128  386  5038 [C
   20  0.68% 66.35%    56200  450  116816  980  7186 [C

There are two things to look for. First, if any one class starts chewing up a large number in the "live objs" column, you'll need to take a look. That might be a memory leak.

Second, if some class has a huge number in the "alloc'ed objs" column, you may be wasting lots of garbage collection time that could be easily solved with some caching.

The [C in the class name means a character array. To see what that really means, you'll need to look at the stack trace (3271):

TRACE 3271:
     java/lang/String.<init>(String.java:244)
     com/caucho/util/CharBuffer.close(CharBuffer.java:714)
     com/caucho/vfs/FilesystemPath.normalizePath(FilesystemPath.java:162)
     com/caucho/vfs/FilesystemPath.schemeWalk(FilesystemPath.java:127)

That's part of Resin's VFS code. Maybe in the future it'll make sense to try to reduce that.

You can get a longer trace using the "depth=" argument:

unix$ bin/httpd.sh -J-Xrunhprof:heap=sites,cpu=samples,depth=10
win> httpd.exe -J-Xrunhprof:heap=sites,cpu=samples,depth=10

Understanding java.hprof.txt CPU

The CPU profiling is a little easier to read. On some JDK's you'll need to run it with the JIT disabled.

CPU SAMPLES BEGIN (total = 424614) Tue Jan  9 17:44:33 2001
rank   self  accum   count trace method
   1 21.36% 21.36%   90704  7266 com/caucho/server/http/VirtualHost.logAccess
   2 10.84% 32.20%   46041  7269 java/net/SocketInputStream.socketRead
   3  5.99% 38.19%   25428  1213 java/lang/Class.newInstance0
   4  5.11% 43.31%   21715  7896 com/caucho/util/CharBuffer.toString
   5  4.82% 48.13%   20463  1286 sun/io/CharToByteISO8859_1.convert
   6  3.54% 51.66%   15018  1242 sun/io/CharToByteConverter.<init>
   7  2.68% 54.35%   11388  7241 java/io/PrintWriter.<init>
   8  2.47% 56.82%   10508  7748 com/caucho/server/http/Request.fillCookies
   9  2.27% 59.09%    9650  1214 sun/io/ByteToCharConverter.<init>
  10  1.85% 60.94%    7857  5097 java/lang/String.<init>
  11  1.59% 62.53%    6754  1341 java/lang/String.substring
  12  1.57% 64.10%    6650  1340 java/lang/String.getBytes
  13  0.92% 65.02%    3907  7897 java/lang/String.<init>
  14  0.76% 65.78%    3227  3259 com/caucho/vfs/FilePath.fsWalk
  15  0.75% 66.53%    3195  7895 com/caucho/server/http/Request.fillCookie
  16  0.71% 67.25%    3031  7321 java/lang/String.getBytes
  17  0.71% 67.95%    2996  3270 com/caucho/util/CharBuffer.close
  18  0.68% 68.63%    2892  3271 java/lang/String.<init>
  19  0.66% 69.29%    2782  7318 com/caucho/vfs/FilePath.openWriteImpl
  20  0.61% 69.90%    2604  7320 java/io/FileOutputStream.<init>

You should only pay attention to the top 20 or so. You'll probably need to ignore a few of the top 10, because they're just waiting for a user response. The SocketInputStream.socketRead is an example.

You can use the trace number to get a call trace:

TRACE 7266:
      com/caucho/server/http/VirtualHost.logAccess(VirtualHost.java:487)
      com/caucho/server/http/Application.logAccess(Application.java:1846)
      com/caucho/server/http/Response.finish(Response.java:1345)
      com/caucho/server/http/Request.finish(Request.java:416)
Personal tools