From Resin 3.0

Revision as of 22:36, 19 September 2010 by Ferg (Talk | contribs)


Graphs

The Graphs tab in /resin-admin gives you a view of the meter data collected by Resin across the cluster. The Statistics service that gathers the meter data is available in Resin Professional.

Graph Browsing

When looking at the server statistics, you first need to select meters to display. Resin's meters are the statistics data streams gathered every minute and stored by the triad servers. Some meters are JMX attributes collected over time, and others record data from Resin's embedded sensors.

Selecting a meter adds it to the graph. Once you've selected a group of meters, you can save them as a named meter set by entering a name in the Meter Save Name form and selecting "Save Meters".

Meter Names

Meter names follow a standard convention: "00|Author|Group|Name". The "00" is the server index in the cluster. Author is something like "JVM", "OS", or "Resin", or a custom value such as "MyCom" for custom meters.
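The naming convention can be illustrated with a small parser. This is a minimal sketch, not part of Resin: it assumes the fields are separated by the "|" character and that anything after the third separator belongs to the meter's own name (as in "GC Time|PS MarkSweep"). The `MeterName` type and `parse_meter_name` function are hypothetical names chosen for the example.

```python
from typing import NamedTuple


class MeterName(NamedTuple):
    server_index: int  # position of the server in the cluster, e.g. 0 for "00"
    author: str        # e.g. "JVM", "OS", "Resin", or a custom author
    group: str         # e.g. "Thread"
    name: str          # remainder of the name; may itself contain "|"


def parse_meter_name(full_name: str) -> MeterName:
    """Split "00|Author|Group|Name" into its four parts (hypothetical helper)."""
    parts = full_name.split("|", 3)  # at most 3 splits, so "|" inside Name survives
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields: {full_name!r}")
    index, author, group, name = parts
    return MeterName(int(index), author, group, name)


print(parse_meter_name("00|JVM|Thread|JVM Thread Count"))
```

Limiting the split to three separators keeps multi-part meter names intact instead of truncating them at the first "|".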

Server Groups

Graphs for servers in the cluster can be displayed in three basic modes: single server, one graph, or multiple graphs. In the one-graph mode, each meter for each server has its own graph line. In the multiple-graph mode, each server gets its own graph. The multiple-graph mode is more useful for comparisons across the cluster.
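The difference between the modes comes down to how the per-server data series are grouped into graphs. The following is a rough sketch of that grouping, assuming each series is identified by a (server index, meter name) pair. The function name, the mode strings "single", "one", and "multiple", and the graph labels are all hypothetical and do not reflect how /resin-admin is implemented.

```python
from collections import defaultdict


def group_series(series, mode, server=None):
    """Group (server_index, meter_name) pairs into graphs.

    "single":   one graph showing only the chosen server's meters
    "one":      one graph with a line for every server and meter
    "multiple": one graph per server
    """
    if mode == "single":
        chosen = [pair for pair in series if pair[0] == server]
        return {f"server {server:02d}": chosen}
    if mode not in ("one", "multiple"):
        raise ValueError(f"unknown mode: {mode!r}")
    graphs = defaultdict(list)
    for index, meter in series:
        key = "all servers" if mode == "one" else f"server {index:02d}"
        graphs[key].append((index, meter))
    return dict(graphs)


series = [(0, "JVM Thread Count"), (1, "JVM Thread Count")]
print(group_series(series, "multiple"))
```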

Cookbook: setting up a thread graph

Clear all the meters by clicking the "Clear Meters" button on the right.
Open the "JVM|Thread" group to find the recorded data from the JVM's own thread count.
Select "JVM Thread Count". You should see the JVM's thread count plotted in the graph. You can use the "Time" selector to change the timescale.
Open the "Resin|Thread" group for the meters in Resin's own thread pool.
Select all the meters in the "Resin|Thread" group. You should see a graph with about 4 lines visible and the rest at zero.
Type "threads" in the Meter Save Name form and select "Save Meters". Saving the meters will add "threads" as a predefined meter group in the Meters selection at the top.

Meters

The predefined meters are in three groups: JVM, OS, and Resin.

JVM is data from the JVM's JMX beans, like thread counts and garbage collection.
OS is data from the operating system, like CPU counts.
Resin is data from Resin's JMX and sensors.

JVM|Compilation

The JVM compilation group measures JIT compilation times as reported by the JVM.

Compilation Time: the time taken for JIT compilation in the last 60 seconds.

JVM|Memory

The JVM's memory and garbage collection information is useful for tuning memory, checking for memory leaks, and verifying that GC time stays in a reasonable range.

GC Time|PS MarkSweep: the GC time taken in the last 60 seconds for full mark-sweep collection as reported by the JVM.
GC Time|PS Scavenge: the GC time taken for short GC scavenging as reported by the JVM.
Heap Memory Free: free heap memory in bytes as reported by the JVM
Heap Memory Used: total allocated memory in the heap
Loaded Classes: total number of classes loaded by the JVM
PermGen Memory Free: free memory in the "perm gen" pool, used for .class data
PermGen Memory Used: allocated memory in the perm gen pool.
Tenured Memory Free: free memory in the long-term tenured heap
Tenured Memory Used: allocated memory in the long-term tenured heap
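The JVM reports GC time as a cumulative total, while the GC Time meters describe the last 60 seconds. How Resin derives its per-minute values is not described here. The sketch below only shows one plausible approach, assuming a cumulative counter sampled once a minute, and how a per-minute GC time can be checked against the length of the interval. The function names and the sample values are hypothetical.

```python
def per_interval(cumulative_samples):
    """Turn cumulative counter readings into per-interval differences."""
    return [
        later - earlier
        for earlier, later in zip(cumulative_samples, cumulative_samples[1:])
    ]


def gc_time_fraction(gc_millis, interval_seconds=60):
    """Fraction of the sampling interval that was spent in GC."""
    return gc_millis / (interval_seconds * 1000)


# Hypothetical cumulative GC times in milliseconds, sampled once a minute.
samples = [1200, 1250, 1900, 1910]
deltas = per_interval(samples)  # [50, 650, 10]
for millis in deltas:
    print(f"{millis} ms of GC, {gc_time_fraction(millis):.2%} of the minute")
```

Expressing GC time as a fraction of the interval makes it easier to judge whether a value is "in a reasonable range" than reading raw milliseconds.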

JVM|Thread

The JVM's thread group reports the total number of threads in the JVM.

JVM Thread Count: The total threads in the JVM

OS|CPU

The CPU load as reported by the JVM. The report differs between operating systems. On Linux, the load is reported for each CPU and as a combined value.

Unix Load Avg: On Unix systems (non-Linux), reports the system's load average. The load average is the count of runnable processes, so it is not a direct measure of CPU load.
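Because the load average counts runnable processes, it is usually read relative to the number of CPUs. The sketch below illustrates that interpretation outside of Resin, using Python's standard `os.getloadavg()` and `os.cpu_count()`. The `load_per_cpu` helper is a hypothetical name, and the reading "above 1.0 suggests processes are waiting for a CPU" is a common rule of thumb rather than something Resin reports.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Runnable processes per CPU for a given load average."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


try:
    one_minute = os.getloadavg()[0]  # 1-minute load average
except (AttributeError, OSError):    # not available on every platform
    one_minute = None

if one_minute is not None:
    print(load_per_cpu(one_minute, os.cpu_count() or 1))
```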

OS|Memory

The OS|Memory group reports operating system memory.

Physical Memory Free: The physical free memory as reported in JMX.
Swap Free: The free swap as reported in JMX.

OS|Process

Process-related information as reported by the OS.

File Descriptor Count: The number of open files and sockets in the JVM process

Resin|Cache

The cache statistics include both the proxy cache and Resin's underlying block cache, which is also used for distributed sessions.

Block Miss Count: How many times the low-level block cache missed, causing Resin to read or write from disk.
Block Read Count: The count of blocks read in the last 60 seconds.
Block Write Count: How many blocks were written to disk in the last 60 seconds.
Proxy Cache Hit Count: How many requests successfully used the proxy cache in the last 60 seconds.
Proxy Cache Miss Count: How many cacheable requests failed to find a valid page in the proxy cache in the last 60 seconds.

Troubleshooting

A high block read or write count may indicate that the block cache is too small. Since the purpose of the block cache is to reduce slow filesystem reads and writes, high block read and write counts mean Resin is spending more time reading and writing files.
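The per-minute cache counters are easier to compare over time when reduced to rates and ratios. The following is a small sketch under that assumption; the helper names are hypothetical, and what counts as "high" depends on the deployment, so no threshold is suggested.

```python
def hit_ratio(hit_count, miss_count):
    """Share of cacheable requests served from the proxy cache, or None if idle."""
    total = hit_count + miss_count
    return hit_count / total if total else None


def block_io_per_second(read_count, write_count, interval_seconds=60):
    """Average block reads plus writes per second over one sampling interval."""
    return (read_count + write_count) / interval_seconds


# Hypothetical values for one 60-second sample.
print(hit_ratio(90, 10))             # proxy cache hit ratio
print(block_io_per_second(120, 60))  # block I/O operations per second
```

Tracking these values before and after a cache-size change gives a direct view of whether the change reduced disk activity.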

Resin|Cluster

The Cluster group measures outgoing connections to other servers in the Resin system. This measurement is similar to the heartbeat since it counts cluster connections.

There are separate meters for each target server, so server #2 will have meters for data going to servers #0 and #1.

Connection Active|NN:cluster-id: the current number of active connections from this server to the target server named by "NN:cluster-id".
Connection Count|NN:cluster-id: the number of connections created in the last 60 seconds from this server to the target server.
Idle Active|NN:cluster-id: the current number of idle connections in the pool from this server to the target server.
Idle Count|NN:cluster-id: the number of transitions to the idle state.
Request Active|NN:cluster-id: the current number of active requests from this server to the target server.
Request Count|NN:cluster-id: the number of requests to the target server in the last 60 seconds.
Request Fail|NN:cluster-id: the number of failed requests to the target server in the last 60 seconds.
Request Time|NN:cluster-id: the average request time for requests to the target server in the last 60 seconds.
Request Time Max|NN:cluster-id: the longest request time for a request to the target server in the last 60 seconds.
Request Time 95%|NN:cluster-id: the time within which 95% of requests complete.
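To make the relationship between the average, maximum, and 95% request times concrete, the sketch below summarizes one window of request times. It is an illustration only: Resin's own percentile calculation is not documented here, so the nearest-rank method is an assumption, and `summarize_requests` and its field names are hypothetical.

```python
def summarize_requests(times_ms, failures=0):
    """Summarize one 60-second window of request times, in milliseconds."""
    if not times_ms:
        return None
    ordered = sorted(times_ms)
    count = len(ordered)
    # Nearest-rank 95th percentile: ceil(0.95 * count), in integer arithmetic.
    rank = (95 * count + 99) // 100
    return {
        "count": count,
        "fail": failures,
        "time_avg": sum(ordered) / count,
        "time_max": ordered[-1],
        "time_95": ordered[rank - 1],
    }


# Hypothetical window: 100 requests taking 1 ms, 2 ms, ..., 100 ms.
print(summarize_requests(list(range(1, 101)), failures=2))
```

A 95% time that sits far below the maximum indicates a few slow outliers, while an average close to the 95% time suggests that requests are uniformly slow.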
