Admin: Postmortem

From Resin 3.0

When Resin restarts or crashes for an unknown reason, you can use the postmortem analysis to get a better understanding of what was happening in the server just before the crash.

Health

Resin's internal health check is graphed for the hour before the crash. The values are reported as numbers: OK is zero, Warning is one, and Fail is two. If any health check is two before the restart, then Resin restarted to protect itself from a fatal error.

If the graph shows zero for the whole time period, then Resin itself did not detect any problems just before the crash.
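The numeric encoding above can be read back into named levels when scanning the graph values by hand. The following is a minimal sketch, not part of Resin: the class `HealthSeries`, the enum `Level`, and the sample values are hypothetical and only illustrate the OK = 0, Warning = 1, Fail = 2 mapping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Resin class: maps graphed health values to names.
class HealthSeries {
    // Levels in the order described above: OK = 0, Warning = 1, Fail = 2.
    enum Level { OK, WARNING, FAIL }

    static Level fromValue(int value) {
        switch (value) {
            case 0: return Level.OK;
            case 1: return Level.WARNING;
            case 2: return Level.FAIL;
            default:
                throw new IllegalArgumentException("unknown health value: " + value);
        }
    }

    // Returns the worst level seen in the sampled period (OK for an empty list).
    static Level worst(List<Integer> samples) {
        int max = 0;
        for (int sample : samples) {
            max = Math.max(max, sample);
        }
        return fromValue(max);
    }

    public static void main(String[] args) {
        // Made-up samples for the hour before a restart.
        List<Integer> samples = Arrays.asList(0, 0, 1, 2);
        System.out.println("worst health level: " + worst(samples));
    }
}
```

A result of `FAIL` corresponds to the case where Resin restarted to protect itself; `OK` corresponds to a graph that stayed at zero.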

HTTP Request Count

The HTTP request count measures the number of HTTP requests just before the restart. Each data point is the number of requests per minute.

An unusually high number of requests might indicate a security attack or a load-balancer failure, for example, the load balancer directing too many requests to this server.

A low count might be an indication of a deadlock.

HTTP Request Time

The HTTP request time shows the average time in milliseconds for all the HTTP requests in the last minute.

An unexpected spike in the request time might point to a reason for the restart, for example, threads getting into a loop.

Database Query Active

Measures the total number of active database queries at the end of the measurement period.

If the number of active queries is much higher before the restart, it may indicate a blockage or overload of the database.

Database Query Time

Measures the average time of database queries across the measurement period.

If the average time is much higher before the restart, it may indicate a database freeze or overload situation.

Thread

Shows the active thread count and state of the Resin thread pool.

If the active thread count is much higher than normal, it may indicate a deadlock or other locking problem.

If the Thread Idle Count goes to zero, then Resin's thread pool has become empty and Resin cannot dispatch threads or requests. This can be either a configuration issue or a consequence of a very high active thread count.
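When a deadlock is suspected, the JVM itself can be asked for thread counts and deadlocked threads through the standard `java.lang.management` API. The sketch below is an assumption-laden illustration, not Resin's own implementation: it reports JVM-wide threads rather than the Resin thread pool, and the class name `ThreadSnapshot` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not a Resin class: JVM-wide thread statistics.
class ThreadSnapshot {
    // Number of threads the JVM reports as deadlocked (0 when there are none).
    static int deadlockedCount(ThreadMXBean threads) {
        // findDeadlockedThreads() returns null when no deadlock is found.
        // Some JVMs may throw UnsupportedOperationException for this call.
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + threads.getThreadCount());
        System.out.println("peak threads: " + threads.getPeakThreadCount());
        System.out.println("deadlocked:   " + deadlockedCount(threads));
    }
}
```

A nonzero deadlocked count alongside a high active thread count would be consistent with the locking problem described above.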

CPU

Shows the CPU usage on systems where Resin can measure it.

If the average CPU is high, or if any individual CPU is at 100%, that may indicate a thread or process that's gone out of control.

Memory

Shows the heap memory usage in the JVM.

If free memory, either heap or PermGen, goes to zero, then the restart may have been due to an out-of-memory condition.
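For comparison with the graph, the current heap figures can be read inside any JVM through the standard `MemoryMXBean`. This is a minimal sketch under the assumption that "free" means the configured maximum minus the used heap; the class `HeapSnapshot` and the method `freeBytes` are hypothetical names, and Resin may compute its graph differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not a Resin class: reads heap usage from the JVM.
class HeapSnapshot {
    // Free heap as max minus used; -1 when the JVM reports no defined maximum.
    static long freeBytes(long used, long max) {
        return max < 0 ? -1 : max - used;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 if the maximum heap size is undefined.
        long free = freeBytes(heap.getUsed(), heap.getMax());
        System.out.println("heap used (bytes): " + heap.getUsed());
        System.out.println("heap max  (bytes): " + heap.getMax());
        System.out.println("heap free (bytes): " + free);
    }
}
```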

GC Time

Shows the time in milliseconds taken by the garbage collector for each 60s measurement interval.

If the GC time is high (in the seconds range), the server may be running out of memory. Normally this will also show up in the memory graph.
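A per-interval GC time like the one graphed can be approximated by sampling the cumulative collection time from the standard `GarbageCollectorMXBean` instances and subtracting consecutive samples. The sketch below assumes that approach; the class `GcTimeSampler` and its methods are hypothetical, and it is not a description of how Resin gathers the value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not a Resin class: cumulative GC time across collectors.
class GcTimeSampler {
    // Sum of cumulative collection time in milliseconds over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the value is undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time spent between two samples, e.g. taken 60 seconds apart.
    static long intervalMillis(long previousTotal, long currentTotal) {
        return currentTotal - previousTotal;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        System.gc(); // only a hint to the JVM; a collection is not guaranteed
        long after = totalGcMillis();
        System.out.println("GC ms in interval: " + intervalMillis(before, after));
    }
}
```

An interval value in the thousands of milliseconds would match the "seconds range" warning sign described above.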

Log Messages

The postmortem displays the last log messages at the warning level or higher. The logs may show information about why the server restarted.
