Tuning (JVM)

From Resin 3.0

Jump to: navigation, search

<document>

 <header>
   <product>resin</product>
   <title>Performance Tuning</title>
 </header>
 <body>
   <localtoc/>

<s1 title="Resin Threads">

Resin will automatically allocate and free threads as the load requires. Since the threads are pooled, Resin can reuse old threads without the performance penalty of creating and destroying the threads. When the load drops, Resin will slowly decrease the number of threads in the pool until is matches the load.

Most users can set thread-max to something large (200 or greater) and then forget about the threading. Some ISPs dedicate a JVM per user and have many JVMs on the same machine. In that case, it may make sense to reduce the thread-max to throttle the requests.

Since each servlet request gets its own thread, thread-max determines the maximum number of concurrent users. So if you have a peak of 100 users with slow modems downloading a large file, you'll need a thread-max of at least 100. The number of concurrent users is unrelated to the number of active sessions. Unless the user is actively downloading, he doesn't need a thread (except for "keepalives").

</s1>

<s1 title="Keepalives">

Keepalives make HTTP and srun requests more efficient. Connecting to a TCP server is relatively expensive. The client and server need to send several packets back and forth to establish the connection before the first data can go through. HTTP/1.1 introduced a protocol to keep the connection open for more requests. The srun protocol between Resin and the web server plugin also uses keepalives. By keeping the connection open for following requests, Resin can improve performance.

<example title="resin.xml for thread-keepalive"> <resin xmlns="http://caucho.com/ns/resin">

 <cluster id="app-tier">
   <server-default>
     <thread-max>250</thread-max>
     <keepalive-max>500</keepalive-max>
     <keepalive-timeout>120s</keepalive-timeout>
   </server-default>
   ...
 </cluster>

</resin> </example>

<s2 title="Timeouts">

Requests and keepalive connections can only be idle for a limited time before Resin closes them. Each connection has a read timeout, request-timeout. If the client doesn't send a request within the timeout, Resin will close the TCP socket. The timeout prevents idle clients from hogging Resin resources.

<example> <resin xmlns="http://caucho.com/ns/resin">

 <cluster id="app-tier">
   <server-default>
     <thread-max>250</thread-max>
     <socket-timeout>30s</socket-timeout>
     <http port="8080"/>
   </server-default>
   ...
 </cluster>

</resin> </example>

<example> <resin xmlns="http://caucho.com/ns/resin">

 <cluster id="app-tier">
   <server-default>
     <thread-max>250</thread-max>
     <load-balance-idle-time>20s</load-balance-idle-time>
     <socket-timeout>30s</socket-timeout>
     <http port="8080"/>
   </server-default>
    <server id="app-a" address="192.168.2.1" port="6802"/>
    ...
  <cluster>
<resin>

</example>

In general, the socket-timeout and keepalives are less important for Resin standalone configurations than Apache/IIS/srun configurations. Very heavy traffic sites may want to reduce the timeout for Resin standalone.

Since socket-timeout will close srun connections, its setting needs to take into consideration the <<a href="server-tags.xtp#load-balance-idle-time">load-balance-idle-time</a> setting for mod_caucho or isapi_srun. load-balance-idle-time is the time the plugin will keep a connection open. socket-timeout must always be larger than load-balance-idle-time, otherwise the plugin will try to reuse a closed socket.

</s2>

<s2 title="Plugin keepalives (mod_caucho/isapi_srun)">

The web server plugin, mod_caucho, needs configuration for its keepalive handling because requests are handled differently in the web server. Until the web server sends a request to Resin, it can't tell if Resin has closed the other end of the socket. If the JVM has restarted or if closed the socket because of socket-timeout, mod_caucho will not know about the closed socket. So mod_caucho needs to know how long to consider a connection reusable before closing it. load-balance-idle-time tells the plugin how long it should consider a socket usable.

Because the plugin isn't signalled when Resin closes the socket, the socket will remain half-closed until the next web server request. A netstat will show that as a bunch of sockets in the FIN_WAIT_2 state. With Apache, there doesn't appear to be a good way around this. If these become a problem, you can increase socket-timeout and load-balance-idle-time so the JVM won't close the keepalive connections as fast.

<example> unix> netstat ... localhost.32823 localhost.6802 32768 0 32768 0 CLOSE_WAIT localhost.6802 localhost.32823 32768 0 32768 0 FIN_WAIT_2 localhost.32824 localhost.6802 32768 0 32768 0 CLOSE_WAIT localhost.6802 localhost.32824 32768 0 32768 0 FIN_WAIT_2 ... </example>

</s2>

<s2 title="TCP limits (TIME_WAIT)">

A client and a server that open a large number of TCP connections can run into operating system/TCP limits. If mod_caucho isn't configured properly, it can use too many connections to Resin. When the limit is reached, mod_caucho will report "can't connect" errors until a timeout is reached. Load testing or benchmarking can run into the same limits, causing apparent connection failures even though the Resin process is running fine.

The TCP limit is the TIME_WAIT timeout. When the TCP socket closes, the side starting the close puts the socket into the TIME_WAIT state. A netstat will short the sockets in the TIME_WAIT state. The following shows an example of the TIME_WAIT sockets generated while benchmarking. Each client connection has a unique ephemeral port and the server always uses its public port:

<example title="Typical Benchmarking Netstat"> unix> netstat ... tcp 0 0 localhost:25033 localhost:8080 TIME_WAIT tcp 0 0 localhost:25032 localhost:8080 TIME_WAIT tcp 0 0 localhost:25031 localhost:8080 TIME_WAIT tcp 0 0 localhost:25030 localhost:8080 TIME_WAIT tcp 0 0 localhost:25029 localhost:8080 TIME_WAIT tcp 0 0 localhost:25028 localhost:8080 TIME_WAIT ... </example>

The socket will remain in the TIME_WAIT state for a system-dependent time, generally 120 seconds, but usually configurable. Since there are less than 32k ephemeral socket available to the client, the client will eventually run out and start seeing connection failures. On some operating systems, including RedHat Linux, the default limit is only 4k sockets. The full 32k sockets with a 120 second timeout limits the number of connections to about 250 connections per second.

If mod_caucho or isapi_srun are misconfigured, they can use too many connections and run into the TIME_WAIT limits. Using keepalives effectively avoids this problem. Since keepalive connections are reused, they won't go into the TIME_WAIT state until they're finally closed. A site can maximize the keepalives by setting thread-keepalive large and setting live-time and request-timeout to large values. thread-keepalive limits the maximum number of keepalive connections. live-time and request-timeout will configure how long the connection will be reused.

<example title="Configuration for a medium-loaded Apache"> <resin xmlns="http://caucho.com/ns/resin">

 <cluster id="app-tier">
   <server-default>
     <thread-max>250</thread-max>
     <keepalive-max>250</keepalive-max>
     <keepalive-timeout>120s</keepalive-timeout>
     <load-balance-idle-time>100s</load-balance-idle-time>
   </server-default>
   <server id="app-a" address="192.168.2.1"/>
   ...
 </cluster>

</resin> </example>

socket-timeout must always be larger than load-balance-idle-time. In addition, keepalive-max should be larger than the maximum number of Apache processes.

</s2>

<s2 title="Apache 1.3 issues">

Using Apache as a web server on Unix introduces a number of issues because Apache uses a process model instead of a threading model. The Apache processes don't share the keepalive srun connections. Each process has its own connection to Resin. In contrast, IIS uses a threaded model so it can share Resin connections between the threads. The Apache process model means Apache needs more connections to Resin than a threaded model would.

In other words, the keepalive and TIME_WAIT issues mentioned above are particularly important for Apache web servers. It's a good idea to use netstat to check that a loaded Apache web server isn't running out of keepalive connections and running into TIME_WAIT problems.

</s2>

</s1>

 </body>

</document>

<document> <header> <product>resin</product> <title>JVM Tuning</title> <description>

Better performance in production servers is possible with proper configuration of JVM parameters, particularily those related to memory usage and garbage collection.

</description> </header>

<body> <summary/>


<s1 name="memory" title="Heap size">

The allocation of memory for the JVM is specified using -X options when starting Resin (the exact options may depend upon the JVM that you are using, the examples here are for the Sun JVM).

<deftable> <tr>

 <th>JVM option passed to Resin</th>
 <th>Meaning</th>

</tr> <tr>

 <td>-Xms</td>
 <td>initial java heap size</td>

</tr> <tr>

 <td>-Xmx</td><td>maximum java heap size</td>

</tr> <tr>

 <td>-Xmn</td>
 <td>the size of the heap for the young generation</td>
 </tr>
 </deftable>

<example title="Example: resin.xml startup with heap memory options"> <resin xmlns="http://caucho.com/ns/resin"> <cluster id="">

 <server id="" address="127.0.0.1" port="6800">
   <jvm-arg>-Xmx500M</jvm-arg>
   <jvm-arg>-Xms500M</jvm-arg>
   <jvm-arg>-Xmn100M</jvm-arg>
   <http port="80"/>
 </server>
 ...

</cluster> </resin> </example>

It is good practice with server-side Java applications like Resin to set the minimum -Xms and maximum -Xmx heap sizes to the same value.

For efficient <a href="#garbage-collection">garbage collection</a>, the -Xmn value should be lower than the -Xmx value.

<s2 title="Heap size does not determine the amount of memory your process uses">

If you monitor your java process with an OS tool like top or taskmanager, you may see the amount of memory you use exceed the amount you have specified for -Xmx. -Xmx limits the java heap size, java will allocate memory for other things, including a stack for each thread. It is not unusual for the total memory consumption of the VM to exceed the value of -Xmx.

</s2>

</s1>

<s1 name="garbage-collection" title="Garbage collection">


(thanks to Rob Lockstone for his comments)

There are essentially two GC threads running. One is a very lightweight thread which does "little" collections primarily on the Eden (a.k.a. Young) generation of the heap. The other is the Full GC thread which traverses the entire heap when there is not enough memory left to allocate space for objects which get promoted from the Eden to the older generation(s).


If there is a memory leak or inadequate heap allocated, eventually the older generation will start to run out of room causing the Full GC thread to run (nearly) continuously. Since this process "stops the world", Resin won't be able to respond to requests and they'll start to back up.


The amount allocated for the Eden generation is the value specified with -Xmn. The amount allocated for the older generation is the value of -Xmx minus the -Xmn. Generally, you don't want the Eden to be too big or it will take too long for the GC to look through it for space that can be reclaimed.


See also:

</s1>

<s1 name="stack-size" title="Stack size">

Each thread in the VM get's a stack. The stack size will limit the number of threads that you can have, too big of a stack size and you will run out of memory as each thread is allocated more memory than it needs.

The Resin startup scripts (resin.exe on Windows, resin.sh on Unix) will set the stack size to 2048k, unless it is specified explicity. 2048k is an appropriate value for most situations.

<deftable title="Stack configuration"> <tr>

 <th><jvm-arg></th>
 <th>Meaning</th>

</tr> <tr>

 <td>-Xss</td>
 <td>the stack size for each thread</td>

</tr> </deftable>

-Xss determines the size of the stack: -Xss1024k. If the stack space is too small, eventually you will see an exception <a href="javadoc|java.lang.StackOverflowError|"/>.

Some people have reported that it is necessary to change stack size settings

at the OS level for Linux. A call to ulimit may be necessary, and

is usually done with a command in /etc/profile:

<example title="Limit thread stack size on Linux"> unix> ulimit -s 2048 </example> </s1>

<s1 name="monitor" title="Monitoring the JVM">

JDK 5 includes a number of tools that are useful for monitoring the JVM. Documentation for these tools is available from the <a href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/index.html#manage">Sun website</a>. For JDK's prior to 5, Sun provides the <a href="http://developer.sun.com/dev/coolstuff/jvmstat">jvmstat tools</a>.

The most useful tool is jconsole. Details on using jconsole are provided in the <a href="jmx.xtp#console">Administration</a> section of the Resin documentation.

<example title="Example: jconsole configuration"> <resin xmlns="http://caucho.com/ns/resin"> <cluster id="">

 <server-default>
   <jvm-arg>-Dcom.sun.management.jmxremote</jvm-arg>
 </server-default>
 <server id="" address="127.0.0.1" port="6800"/>
 ...

</cluster> </resin> </example>

<example title="Example: jconsole launching"> ... in another shell window ...

win> jconsole.exe unix> jconsole

Choose Resin's JVM from the "Local" list. </example>

jps and jstack are also useful, providing a quick command line method for obtaining stack traces of all current threads. Details on obtaining and interpreting stack traces is in the <a href="troubleshoot.xtp#thread-dump">Troubleshooting</a> section of the Resin documentation.

<example title="jps and jstack"> # jps 12903 Jps 20087 Resin # jstack 20087 Attaching to process ID 20087, please wait... Debugger attached successfully. Client compiler detected. JVM version is 1.5.0-beta2-b51 Thread 12691: (state = BLOCKED)

- java.lang.Object.wait(long) (Compiled frame; information may be imprecise)
- com.caucho.util.ThreadPool.runTasks() @bci=111, line=474 (Compiled frame)
- com.caucho.util.ThreadPool.run() @bci=85, line=423 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


Thread 12689: (state = BLOCKED)

- java.lang.Object.wait(long) (Compiled frame; information may be imprecise)
- com.caucho.util.ThreadPool.runTasks() @bci=111, line=474 (Compiled frame)
- com.caucho.util.ThreadPool.run() @bci=85, line=423 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)

...

</example>

</s1>

</body> </document>

Personal tools