Admin: Postmortem
From Resin 3.0
Latest revision as of 22:48, 28 September 2010
When Resin restarts or crashes for an unknown reason, you can use the postmortem analysis to get a better understanding of what was happening in the server just before the crash.
Health
Resin's internal health check is graphed for the hour before the crash. The values are reported as numbers: OK is zero, Warning is one, and Fail is two. If any health check is two before the restart, then Resin restarted to protect itself from a fatal error.
If the graph shows zero for the whole time period, then Resin itself did not detect any problems just before the crash.
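The numeric encoding above can be checked mechanically. The following is a minimal sketch, assuming the graphed values have been copied out by hand into a list; the function name `summarize_health` and the sample data are hypothetical and are not part of Resin.

```python
# Encoding taken from the text above: OK = 0, Warning = 1, Fail = 2.
HEALTH_LABELS = {0: "OK", 1: "Warning", 2: "Fail"}


def summarize_health(samples):
    """Return the worst label seen and the positions where a Fail was recorded."""
    worst = max(samples, default=0)
    fail_positions = [i for i, value in enumerate(samples) if value == 2]
    return HEALTH_LABELS[worst], fail_positions


# Made-up samples for illustration only, not real Resin output.
samples = [0, 0, 0, 1, 1, 2]
worst, fail_positions = summarize_health(samples)
print(worst, fail_positions)
```

A result of "Fail" corresponds to the case where Resin restarted to protect itself; a result of "OK" corresponds to a graph that stays at zero.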
HTTP Request Count
The HTTP request count measures the number of HTTP requests just before the restart. Each count is the number of requests per minute.
An unusually high number of requests might indicate a security attack or a load-balancer failure, for example, the load balancer directing too many requests to this server.
A low count might be an indication of a deadlock.
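One way to make "unusually high" and "low" concrete is to compare the last per-minute count with the median of the earlier ones. This is only a sketch: the function name and the threshold factors (3x and 0.2x) are assumptions chosen for illustration, not values defined by Resin.

```python
from statistics import median


def classify_request_count(counts, high_factor=3.0, low_factor=0.2):
    """Compare the last per-minute count with the median of the earlier counts.

    `counts` must hold at least two values. The factors are illustrative
    assumptions and should be tuned to the normal traffic of the server.
    """
    *history, last = counts
    baseline = median(history)
    if last > baseline * high_factor:
        return "high"  # possible attack or load-balancer problem
    if last < baseline * low_factor:
        return "low"   # possible deadlock
    return "normal"


# Made-up per-minute counts, oldest first.
print(classify_request_count([100, 110, 90, 105, 900]))
```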
HTTP Request Time
The HTTP request time shows the average time in milliseconds for all the HTTP requests in the last minute.
An unexpected spike in the request time might indicate a reason for the restart, for example, threads getting into a loop.
Database Query Active
Measures the total number of active database queries at the end of the measurement period.
If the number of active queries is much higher before the restart, it may indicate a blockage or overload of the database.
Database Query Time
Measures the average time of database queries across the measurement period.
If the average time is much higher before the restart, it may indicate a database freeze or overload situation.
Thread
Shows the active thread count and state of the Resin thread pool.
If the active thread count is much higher than normal, it may indicate a deadlock or other locking problem.
If the Thread Idle Count goes to zero, then Resin's thread pool has become empty and Resin cannot dispatch threads or requests. This can be either a configuration issue or a result of a very high active thread count.
CPU
Shows the CPU usage on systems where Resin can measure it.
If the average CPU is high, or if any individual CPU is at 100%, that may indicate a thread or process that's gone out of control.
Memory
Shows the heap memory usage in the JVM.
If the free memory, either heap or PermGen, goes to zero, then the restart may have been due to an out-of-memory condition.
GC Time
Shows the time in milliseconds taken by the garbage collector for each 60s measurement interval.
If the GC time is high (in the seconds range), the server may be running out of memory. Normally this will also show up in the memory graph.
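To put a GC time reading in proportion, it can be expressed as a fraction of the 60s measurement interval. The sketch below only does that arithmetic; the function name `gc_fraction` and the sample reading are hypothetical.

```python
INTERVAL_MS = 60_000  # the 60s measurement interval, in milliseconds


def gc_fraction(gc_time_ms):
    """Return the share of one measurement interval spent in garbage collection."""
    return gc_time_ms / INTERVAL_MS


# A made-up reading of 3000 ms, i.e. GC time "in the seconds range".
print(f"{gc_fraction(3000):.1%} of the interval spent in GC")
```

A reading of a few seconds per interval means several percent of wall-clock time went to garbage collection, which is worth comparing against the memory graph.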
Log Messages
The postmortem displays the last log messages at the warning level or higher. The logs may show information about why the server restarted.