Admin: Graphs
From Resin 3.0
Latest revision as of 23:28, 14 October 2010
Graphs
The Graphs tab in /resin-admin gives you a view of the meter data collected by Resin across the cluster. The Statistics service that gathers the meter data is available in Resin Professional.
Graph Browsing
When looking at the server statistics, you first need to select meters to display. Resin's meters are the statistics data streams gathered every minute and stored by the triad servers. Some meters are JMX attributes collected over time, and others record data from Resin's embedded sensors.
Selecting a meter adds it to the graph. Once you've selected a group of meters, you can save them as a named meter set with the Meter Save Name form.
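Resin's own collector is not shown on this page. As a rough illustration of a JMX attribute "collected over time", the hypothetical `JmxMeterSampler` class below reads the standard `ThreadCount` attribute of the JVM's `java.lang:type=Threading` MBean on a fixed schedule, using only `java.lang.management`, `javax.management`, and `java.util.concurrent`. The class, its methods, and the in-memory sample list are assumptions for this sketch, not Resin's API.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch (not Resin's code): turns one numeric JMX attribute
// into a time series by sampling it on a fixed period.
final class JmxMeterSampler {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    private final ObjectName objectName;
    private final String attribute;
    private final List<Number> samples = new ArrayList<>();

    JmxMeterSampler(String objectName, String attribute) {
        try {
            this.objectName = new ObjectName(objectName);
        } catch (JMException e) {
            throw new IllegalArgumentException("bad ObjectName: " + objectName, e);
        }
        this.attribute = attribute;
    }

    // Reads the attribute once and appends the value to the series.
    synchronized Number sampleOnce() {
        try {
            Number value = (Number) server.getAttribute(objectName, attribute);
            samples.add(value);
            return value;
        } catch (JMException e) {
            throw new IllegalStateException("cannot read " + objectName + " " + attribute, e);
        }
    }

    synchronized List<Number> samples() {
        return new ArrayList<>(samples);
    }

    // Resin gathers meters every minute; the period is a parameter here so the
    // sketch can be tried quickly. An exception in a sample stops later samples.
    // The caller must shut the returned executor down.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::sampleOnce, 0, period, unit);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        JmxMeterSampler threadCount = new JmxMeterSampler("java.lang:type=Threading", "ThreadCount");
        ScheduledExecutorService executor = threadCount.start(1, TimeUnit.SECONDS);
        Thread.sleep(3500);
        executor.shutdown();
        System.out.println("ThreadCount samples: " + threadCount.samples());
    }
}
```

Resin stores its meter data on the triad servers; this sketch only keeps samples in memory.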
Meter Names
Meter names follow a standard convention: "00|Author|Group|Name". The "00" is the server index in the cluster. Author is something like "JVM", "OS", or "Resin", or "MyCom" for custom meters.
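A minimal sketch of splitting a full meter name into its parts, assuming the "00|Author|Group|Name" layout above. The `MeterName` class is hypothetical. It splits on at most the first three bars because some meter names on this page contain a bar themselves, such as "GC Time|PS MarkSweep" in the JVM|Memory group.

```java
// Hypothetical helper (not part of Resin): splits a full meter name of the
// form "00|Author|Group|Name" into its four parts.
final class MeterName {
    final int serverIndex;
    final String author;
    final String group;
    final String name;

    private MeterName(int serverIndex, String author, String group, String name) {
        this.serverIndex = serverIndex;
        this.author = author;
        this.group = group;
        this.name = name;
    }

    static MeterName parse(String fullName) {
        // A limit of 4 keeps any further '|' inside Name,
        // as in "00|JVM|Memory|GC Time|PS MarkSweep".
        String[] parts = fullName.split("\\|", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 00|Author|Group|Name: " + fullName);
        }
        return new MeterName(Integer.parseInt(parts[0]), parts[1], parts[2], parts[3]);
    }

    // Assumes a two-digit server index, as in "00".
    @Override
    public String toString() {
        return String.format("%02d|%s|%s|%s", serverIndex, author, group, name);
    }

    public static void main(String[] args) {
        MeterName m = parse("00|JVM|Memory|GC Time|PS MarkSweep");
        System.out.println(m.author + " / " + m.group + " / " + m.name);
    }
}
```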
Server Groups
Graphs for servers in the cluster can be displayed in three basic modes: single server, one graph, or multiple graphs. In one-graph mode, each meter for each server has its own line on a single graph. In multiple-graph mode, each server gets its own graph. The multiple-graph mode is more useful for comparisons across the cluster.
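The difference between the one-graph and multiple-graph modes can be sketched as a grouping step: one graph draws every selected meter line together, while multiple graphs partition the lines by the server index that prefixes each meter name. The `GraphModes` class is a hypothetical illustration, not how /resin-admin is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: groups selected meter lines into one graph per server,
// keyed by the server index before the first '|'.
final class GraphModes {
    static SortedMap<Integer, List<String>> multipleGraphs(List<String> meterNames) {
        SortedMap<Integer, List<String>> graphs = new TreeMap<>();
        for (String fullName : meterNames) {
            int bar = fullName.indexOf('|');
            if (bar <= 0) {
                throw new IllegalArgumentException("missing server index: " + fullName);
            }
            int server = Integer.parseInt(fullName.substring(0, bar));
            graphs.computeIfAbsent(server, k -> new ArrayList<>()).add(fullName);
        }
        return graphs;
    }

    public static void main(String[] args) {
        List<String> selected = List.of(
                "00|JVM|Thread|JVM Thread Count",
                "01|JVM|Thread|JVM Thread Count",
                "00|Resin|Thread|Thread Active Count");
        // One graph would draw all three lines together; here each server gets its own.
        multipleGraphs(selected).forEach((server, lines) ->
                System.out.println("server " + server + ": " + lines));
    }
}
```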
Cookbook: setting up a thread graph
- Clear all the meters by clicking the "Clear Meters" button on the right.
- Open the "JVM|Thread" group to find the recorded data from the JVM's own thread count.
- Select "JVM Thread Count". You should see the JVM's thread count plotted in the graph. You can use the "Time" selector to change the timescale.
- Open the "Resin|Thread" group for the meters in Resin's own thread pool.
- Select all the meters in the "Resin|Thread" group. You should see a graph with about 4 lines visible and the rest at zero.
- Type "threads" in the Meter Save Name form and select "Save Meters". Saving the meters will add "threads" as a predefined meter group in the Meters selection at the top.
Meters
The predefined meters are in three groups: JVM, OS, and Resin.
- JVM is data from the JVM's JMX beans, like thread counts and garbage collection.
- OS is data from the operating system, like CPU counts.
- Resin is data from Resin's JMX MBeans and sensors.
JVM|Compilation
The JVM compilation group measures JIT compilation times as reported by the JVM.
- Compilation Time
- the time taken for JIT compilation in the last 60 seconds.
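The JVM exposes JIT compilation time only as a running total (`CompilationMXBean.getTotalCompilationTime()`, in milliseconds), so one way to get a "last 60 seconds" value is the difference between two samples taken a minute apart. The sketch below assumes that is how such a meter is derived. The `CompilationTimeMeter` class is hypothetical, and its `forJvm()` factory returns null on JVMs without a JIT or without compilation-time monitoring.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

// Hypothetical sketch: converts a cumulative counter into per-interval values.
final class CompilationTimeMeter {
    private final LongSupplier cumulative;
    private long last;

    CompilationTimeMeter(LongSupplier cumulative) {
        this.cumulative = cumulative;
        this.last = cumulative.getAsLong();
    }

    // Time accumulated since the previous call (or since construction).
    long sampleDelta() {
        long now = cumulative.getAsLong();
        long delta = now - last;
        last = now;
        return delta;
    }

    // Wraps the JVM's CompilationMXBean, or returns null if unsupported.
    static CompilationTimeMeter forJvm() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return null;
        }
        return new CompilationTimeMeter(jit::getTotalCompilationTime);
    }

    public static void main(String[] args) throws InterruptedException {
        CompilationTimeMeter meter = forJvm();
        if (meter == null) {
            System.out.println("compilation time monitoring not supported");
            return;
        }
        Thread.sleep(1000); // a 60-second meter would wait a minute between samples
        System.out.println("JIT ms in last interval: " + meter.sampleDelta());
    }
}
```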
JVM|Memory
The JVM's memory and garbage collection information is useful when tuning memory, checking for memory leaks, and checking that GC time is in a reasonable range.
- GC Time|PS MarkSweep
- the GC time taken in the last 60 seconds for full mark-sweep collection as reported by the JVM.
- GC Time|PS Scavenge
- the GC time taken for short scavenge (young-generation) collections as reported by the JVM.
- Heap Memory Free
- free heap memory in bytes as reported by the JVM
- Heap Memory Used
- total allocated memory in the heap
- Loaded Classes
- total number of classes loaded by the JVM
- PermGen Memory Free
- memory free in the "perm gen" pool, used for .class data
- PermGen Memory Used
- allocated memory in the perm gen pool.
- Tenured Memory Free
- free memory in the long-term tenured heap
- Tenured Memory Used
- allocated memory in the long-term tenured heap
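For comparison with the graphs, several JVM|Memory values can be read directly from the standard `java.lang.management` beans, as in the hypothetical `JvmMemorySnapshot` class below. GC times are cumulative milliseconds per collector, so a per-minute value would be the difference between samples; the bean names "PS MarkSweep" and "PS Scavenge" appear only with the parallel collector. The sketch leaves out the PermGen and Tenured pools because their pool names vary by JVM and collector (JDK 8 and later have no perm gen), and its definition of free heap is an assumption that may not match Resin's.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: reads JVM values corresponding to several JVM|Memory meters.
final class JvmMemorySnapshot {
    // Cumulative GC time in milliseconds per collector (-1 if undefined).
    static Map<String, Long> gcTimes() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            times.put(gc.getName(), gc.getCollectionTime());
        }
        return times;
    }

    // Heap bytes currently in use.
    static long heapUsed() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Free bytes in the currently allocated heap (assumed meaning of "free").
    static long heapFree() {
        return Runtime.getRuntime().freeMemory();
    }

    // Number of classes currently loaded.
    static int loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("GC ms by collector: " + gcTimes());
        System.out.println("heap used=" + heapUsed() + " free=" + heapFree());
        System.out.println("loaded classes=" + loadedClasses());
    }
}
```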
JVM|Thread
The JVM's thread group reports the total number of threads in the JVM.
- JVM Thread Count
- The total number of threads in the JVM
OS|CPU
The OS|CPU group shows CPU load as reported by the JVM. The available meters differ between operating systems. On Linux, CPU load is reported for each CPU and for all CPUs combined.
- Unix Load Avg
- On Unix systems other than Linux, reports the system's load average. The load average is the average number of runnable processes; it is not directly a CPU load measure.
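The JVM reports the one-minute system load average through `OperatingSystemMXBean.getSystemLoadAverage()`, which returns a negative value where no load average is available (for example on Windows). Because the load average counts runnable processes rather than CPU time, a common rough heuristic is to divide it by the number of processors. The hypothetical `LoadAverage` class below sketches that heuristic, which is not part of Resin.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sketch: load average per available processor.
final class LoadAverage {
    // Returns NaN when the platform has no load average or the CPU count is invalid.
    static double perProcessor(double loadAverage, int processors) {
        if (loadAverage < 0 || processors <= 0) {
            return Double.NaN;
        }
        return loadAverage / processors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative if unavailable
        int cpus = os.getAvailableProcessors();
        System.out.printf("load=%.2f cpus=%d per-cpu=%.2f%n", load, cpus, perProcessor(load, cpus));
    }
}
```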
OS|Memory
The OS|Memory group reports operating system memory.
- Physical Memory Free
- The physical free memory as reported in JMX.
- Swap Free
- The free swap as reported in JMX.
OS|Process
Process-related information as reported by the OS.
- File Descriptor Count
- The number of open files and sockets in the JVM process
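The OS|Memory and OS|Process values correspond to attributes of the HotSpot/OpenJDK extension interfaces in `com.sun.management`. The hypothetical `OsMeters` class below reads them when those interfaces are present and returns -1 otherwise; other JVMs may not provide them, and Resin may obtain the values differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sketch: OS-level values via the com.sun.management extensions.
final class OsMeters {
    // Free physical memory in bytes, or -1 if the extension is unavailable.
    @SuppressWarnings("deprecation") // newer JDKs also offer getFreeMemorySize()
    static long physicalMemoryFree() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getFreePhysicalMemorySize();
        }
        return -1;
    }

    // Free swap space in bytes, or -1 if the extension is unavailable.
    static long swapFree() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getFreeSwapSpaceSize();
        }
        return -1;
    }

    // Open file descriptors (files and sockets) of this JVM process; Unix only.
    static long fileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("physical free=" + physicalMemoryFree()
                + " swap free=" + swapFree()
                + " fds=" + fileDescriptorCount());
    }
}
```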
Resin|Cache
The cache statistics include both the proxy cache and Resin's underlying block cache, which is also used for distributed sessions.
- Block Miss Count
- How many times the low-level block cache missed, causing Resin to read or write from disk
- Block Read Count
- The count of blocks read in the last 60 seconds.
- Block Write Count
- How many blocks were written to disk in the last 60 seconds.
- Proxy Cache Hit Count
- How many requests successfully used the proxy cache in the last 60 seconds.
- Proxy Cache Miss Count
- How many cacheable requests failed to find a valid page in the proxy cache in the last 60 seconds.
Troubleshooting
A high block read or write count may indicate that the block cache is too small. Since the purpose of the block cache is to reduce slow filesystem reads and writes, high block reads and writes mean Resin is spending more time reading and writing files.
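For the proxy cache, the hit and miss counts for the same 60-second window combine into a hit ratio, hits / (hits + misses). This sketch assumes every hit is also a cacheable request, so the sum counts all cacheable requests in the window; `ProxyCacheRatio` is a hypothetical name, not a Resin meter.

```java
// Hypothetical sketch: proxy cache hit ratio from Proxy Cache Hit Count and Miss Count.
final class ProxyCacheRatio {
    // Fraction of cacheable requests served from the cache; NaN if there were none.
    static double hitRatio(long hitCount, long missCount) {
        long total = hitCount + missCount;
        return total == 0 ? Double.NaN : (double) hitCount / total;
    }

    public static void main(String[] args) {
        System.out.println("hit ratio: " + hitRatio(900, 100));
    }
}
```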
Resin|Cluster
The Cluster group measures outgoing connections to other servers in the Resin system. This measurement is similar to the heartbeat since it counts cluster connections.
There are separate meters for each target server, so server #2 has meters for its connections to servers #0 and #1.
- Connection Active|NN cluster-id
- Measures the current number of active connections from this server to the target server named by "NN:cluster-id"
- Connection Count|NN cluster-id
- Counts the number of connections created in the last 60s from this server to the target server.
- Idle Active|NN cluster-id
- Counts the current idle connections in the pool from this server to the target server.
- Idle Count|NN cluster-id
- Counts the number of transitions to the idle state.
- Request Active|NN cluster-id
- The current number of active requests from this server to the target server
- Request Count|NN cluster-id
- The number of requests to the target server in the last 60s.
- Request Fail|NN cluster-id
- The number of failed requests to the target server in the last 60s
- Request Time|NN cluster-id
- The average request time for requests to the target server in the last 60s.
- Request Time Max|NN cluster-id
- The longest request time for a request to the target server in the last 60s
- Request Time 95%|NN cluster-id
- The time within which 95% of requests to the target server complete
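The average, maximum, and 95% meters summarize the request times seen in one 60-second window. The sketch below computes all three from a window's samples, using the nearest-rank 95th percentile (the smallest sample that at least 95% of samples are at or below). Resin's own percentile estimate may be computed differently, and `RequestTimeSummary` is a hypothetical class. The same kind of summary fits the Query Time and Request Time meters in the Resin|Database and Resin|Http groups.

```java
import java.util.Arrays;

// Hypothetical sketch: average, max, and nearest-rank 95th percentile of one window.
final class RequestTimeSummary {
    final double average;
    final long max;
    final long p95;

    private RequestTimeSummary(double average, long max, long p95) {
        this.average = average;
        this.max = max;
        this.p95 = p95;
    }

    static RequestTimeSummary of(long[] timesMs) {
        if (timesMs.length == 0) {
            return new RequestTimeSummary(0, 0, 0); // no requests in the window
        }
        long[] sorted = timesMs.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (long t : sorted) {
            sum += t;
        }
        int n = sorted.length;
        // 1-based nearest rank: ceil(0.95 * n), in integer arithmetic.
        int rank = (int) ((95L * n + 99) / 100);
        return new RequestTimeSummary((double) sum / n, sorted[n - 1], sorted[rank - 1]);
    }

    public static void main(String[] args) {
        RequestTimeSummary s = of(new long[] {12, 15, 9, 240, 11});
        System.out.println("avg=" + s.average + " max=" + s.max + " p95=" + s.p95);
    }
}
```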
Resin|Database
The data for the Resin database pool lets you tune the pool and check for slow database queries.
- Query Active
- Counts the current number of active queries.
- Query Count
- Counts the queries in the last 60s
- Query Time
- The average query time in the last 60s
- Query Time Max
- The maximum query time in the last 60s
- Query Time 95%
- The time for 95% of queries to complete
- Connection Active
- The current number of active database connections
- Connection Count
- The number of connections created in the last 60s
- Connection Time
- The average open time for connections in the last 60s.
- Idle Active
- The current number of idle connections in the database pool
- Idle Count
- The number of connections changing to the idle state in the last 60s
- Idle Time
- The average time connections are idle for the last 60s.
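One derived number for pool tuning is the share of pooled connections in use. The sketch below assumes every pooled connection is either active (Connection Active) or idle (Idle Active); under that assumption, a value that stays near 1.0 suggests the pool is running out of idle connections. `PoolUtilization` is a hypothetical helper, not a Resin meter.

```java
// Hypothetical sketch: fraction of pooled database connections currently in use.
final class PoolUtilization {
    // Returns 0.0 for an empty pool.
    static double of(long connectionActive, long idleActive) {
        long pooled = connectionActive + idleActive;
        return pooled == 0 ? 0.0 : (double) connectionActive / pooled;
    }

    public static void main(String[] args) {
        System.out.println("pool utilization: " + of(18, 2));
    }
}
```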
Resin|Health
Each health check in Resin's health system records its current status as a meter. Since the "OK" level is zero, a stable system has a zero graph. The warning level is 1 and the fail level is 2.
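Because the levels are small integers, a window of a health meter can be read back into a status by taking the worst level seen. A minimal sketch with a hypothetical `HealthLevel` enum whose ordinals match the levels above (OK 0, warning 1, fail 2); Resin's own status types are not shown here.

```java
// Hypothetical sketch: decodes Resin|Health meter values.
enum HealthLevel {
    OK, WARNING, FAIL; // ordinal() matches the meter values 0, 1, 2

    static HealthLevel fromMeter(int value) {
        if (value < 0 || value >= values().length) {
            throw new IllegalArgumentException("unknown health level: " + value);
        }
        return values()[value];
    }

    // A stable system graphs as all zeros; otherwise report the worst level seen.
    static HealthLevel worst(int[] samples) {
        HealthLevel worst = OK;
        for (int sample : samples) {
            HealthLevel level = fromMeter(sample);
            if (level.compareTo(worst) > 0) {
                worst = level;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println(worst(new int[] {0, 0, 1, 0}));
    }
}
```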
Resin|Http
HTTP requests and sessions are recorded in the Resin|Http section, letting you check for slow requests and unexpected HTTP session sizes.
- Request Active
- The current number of active requests
- Request Bytes
- The number of bytes transferred in the last 60s
- Request Count
- The number of requests in the last 60s
- Request Time
- The average time for a request in the last 60s
- Request Time Max
- The slowest request in the last 60s
- Request Time 95%
- The time for 95% of requests to complete
- Session Save Count
- The number of sessions saved in the last 60s
- Session Save Size
- The average serialized session size in the last 60s
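Session Save Size reports the serialized size of sessions. To see what individual session attributes cost, standard Java serialization can measure them, as in the hypothetical `SerializedSize` helper below. Resin may use a different serializer, so the absolute numbers are only a rough guide; a value that does not implement `Serializable` makes the measurement fail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical sketch: byte size of a value under standard Java serialization.
final class SerializedSize {
    static int of(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            // e.g. NotSerializableException for a non-serializable attribute
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        HashMap<String, String> session = new HashMap<>();
        session.put("user", "alice");
        System.out.println("small session: " + of(session) + " bytes");
        session.put("cart", "x".repeat(10_000));
        System.out.println("with large attribute: " + of(session) + " bytes");
    }
}
```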
Resin|Thread
Statistics related to the Resin thread pool, used for requests and timers.
- Thread Active Count
- The number of threads currently active
- Thread Count
- The total number of threads managed by Resin
- Thread Create Count
- The number of threads created by Resin in the last 60s
- Thread Idle Count
- The current number of threads idle in the pool.
- Thread Overflow Count
- The number of threads Resin created using the overflow method.
- Thread Priority Queue
- The number of threads dispatching the priority queue
- Thread Starting Count
- The current number of threads starting
- Thread Task Queue
- The number of threads reading from the task queue
- Thread Wait Count
- The number of requests waiting for an active thread
Troubleshooting
- The thread create count should generally be very low, and preferably zero. If the create count is high, the pool is not reusing its threads effectively.
- The overflow count should be zero in normal operation; a nonzero value means the pool has overflowed.
- The Priority Queue and Task Queue counts should generally be zero unless there's a thread spike.
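The three rules above can be applied mechanically to one minute of Resin|Thread values. In this hypothetical `ThreadPoolCheck` sketch, any nonzero value is flagged, which is stricter than "very low" and is an assumption; a real check might tolerate a small create count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: applies the Resin|Thread troubleshooting rules to one minute of values.
final class ThreadPoolCheck {
    static List<String> warnings(long createCount, long overflowCount,
                                 long priorityQueue, long taskQueue) {
        List<String> warnings = new ArrayList<>();
        if (createCount > 0) {
            warnings.add("Thread Create Count=" + createCount + ": pool may not be reusing threads");
        }
        if (overflowCount > 0) {
            warnings.add("Thread Overflow Count=" + overflowCount + ": pool overflowed");
        }
        if (priorityQueue > 0 || taskQueue > 0) {
            warnings.add("queues nonzero (priority=" + priorityQueue + ", task=" + taskQueue
                    + "): possible thread spike");
        }
        return warnings;
    }

    public static void main(String[] args) {
        warnings(3, 0, 0, 1).forEach(System.out::println);
    }
}
```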