Wednesday, July 9, 2008

How to use Windows Server 2008 Reliability and Performance Monitor

The Reliability and Performance Monitor snap-in enables you to monitor server performance in real time. You can monitor hardware and application performance and create threshold alerts and performance reports. In terms of defining performance and reliability, performance describes how quickly the server completes the tasks it must accomplish. Reliability, on the other hand, is more a measure of how often the server performs exactly as you would expect in relation to its configuration.

The Reliability and Performance Monitor snap-in also provides access to the Performance Monitor, which was available in Windows Server 2003, and the new Reliability Monitor. The Performance Monitor enables you to add counters to quickly view real-time hardware information such as the percent processor time and also view information related to system services such as HTTP (on a web server).

The Reliability Monitor provides a System Stability chart that can be used to quickly view specific information about hardware, application, and Windows failures. You can click a date on the chart (dates run along the x-axis) and then view the system stability reports related to alerts and failures for that date. The Reliability Monitor, which in effect provides some of the same type of information that you could glean from the Event Viewer, is discussed later.

Obviously, the Reliability and Performance Monitor provides a lot of potential information related to how a server is performing in terms of both hardware and software (including the operating system). What you are really trying to do when you monitor server performance is identify potential performance bottlenecks (say, the CPU or the hard drive). When you measure reliability, you are looking for such things as device drivers that failed to initialize or services that had to stop and restart. Reliability often relates to how the server is configured, whereas performance more often relates to the hardware configuration.

You can open the Reliability and Performance Monitor in the Server Manager (Start, Administrative Tools, Server Manager). Expand the Diagnostics node and then select the Reliability and Performance node.

You can also run the Reliability and Performance Monitor snap-in in the MMC (Start, Administrative Tools, Reliability and Performance Monitor).

The Resource View pane of the Reliability and Performance Monitor provides you with a quick look at CPU, Disk, Network, and Memory usage on the server. Real-time counters at the top of the window show you how each of these resources is currently affected by demand on the server from such things as user access, resources served to users, and other processes running on the server that are related to the various roles you have assigned the server.

Below the Resource View graphs is the Resource View details area. By default, all the Resource details are closed and show a counter that provides the running data points that are shown in the associated graph.

You can expand each of the Resource views to view the details related to a particular resource such as the CPU resource, which measures the total percentage of CPU capacity currently in use. When you expand the CPU resource, you are in the Resource Overview details (for CPU capacity), which provides a detail table.

Let's look at each of the resources measured in the Reliability and Performance Monitor and what kind of details are provided when you look at the expanded view details for a particular resource. The Resource view provides the following information:

CPU— The total percentage of CPU use is displayed in green. The CPU Maximum Frequency is displayed in blue. The details table contains the following:
Image— Application using the CPU
PID— The application instance's process ID
Description— The application name
Threads— Active threads from the application instance
CPU— CPU cycles active from the application instance
Average CPU— Average CPU load (over the last 60 seconds) from the application instance

The PID or process identifier is the unique number the operating system assigns to a process. A thread is part of an application that can execute independently.
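
The Average CPU column is described above as a load figure averaged over the last 60 seconds. As a rough illustration of that idea only, here is a minimal Python sketch of a rolling average. It assumes one sample per second and a plain arithmetic mean; the class name RollingAverage and the sample readings are hypothetical, and nothing here is taken from how the snap-in itself computes the value.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` samples (hypothetical helper)."""

    def __init__(self, window=60):
        # A deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record one sample and return the current rolling mean."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    avg = RollingAverage(window=60)
    for cpu_percent in (10.0, 20.0, 60.0):  # made-up readings, one per second
        print(avg.add(cpu_percent))
```

Until the window fills, the mean covers only the samples seen so far, which is one reasonable choice among several.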

Disk— The total input/output (current) is displayed in green. The percentage for the highest active time is displayed in blue. The details table contains the following:
Image— Application using the disk
PID— The application instance's process ID
File— The file read/written by an application
Read— The current read speed (in bytes/minute) for the data by an application
Write— The speed (bytes/minute) at which the application is writing data
IO Priority— The I/O task priority for the application
Response Time— Disk response time in milliseconds

Network— Displays the total network traffic (Kbps) in green and the network capacity percentage currently in use in blue. The details table contains the following:
Image— Application using the network resources
PID— The application instance's process ID
Description— The application name
Address— The network address (IP address, FQDN name, or computer name) with which the local computer is exchanging information
Send— The amount of data (bytes/minute) currently being sent from the local computer by the application named in the Image line
Receive— The amount of data currently being received (bytes/minute)
Total— Total bandwidth used (that is, sent and received) in bytes/minute by the application
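
Because the graph reports traffic in Kbps while the details table reports bytes/minute, comparing the two takes a unit conversion. The following Python sketch shows the arithmetic. It assumes 1 Kbps means 1,000 bits per second, and the function names are hypothetical helpers written for this example, not part of any Windows tool.

```python
def bytes_per_min_to_kbps(bytes_per_minute):
    """Convert bytes/minute to kilobits/second (assumes 1 Kbps = 1000 bits/s)."""
    bits_per_second = bytes_per_minute * 8 / 60
    return bits_per_second / 1000


def total_kbps(send_bytes_per_min, receive_bytes_per_min):
    """Total column expressed in Kbps: sent plus received, then converted."""
    return bytes_per_min_to_kbps(send_bytes_per_min + receive_bytes_per_min)


if __name__ == "__main__":
    # Made-up figures for one application row.
    print(total_kbps(375000, 375000))
```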

Memory— Displays the hard faults per second in green and the percentage of physical memory currently in use in blue. The details table contains the following:
Image— Application using the memory resources
PID— The application instance's process ID
Description— The application name
Hard Faults/Min— Hard faults (per minute) resulting from the application instance; a lot of hard faults would indicate that your server's memory is becoming a performance bottleneck
Working Set (KB)— The amount of memory (in kilobytes) currently being used by the application instance
Shareable (KB)— The amount of memory in the working set that may be available to other applications
Private (KB)— The amount of memory in the working set reserved for the application instance

A hard fault (a type of page fault) occurs when data requested by the application instance is not in physical memory and so must be retrieved from the paging file on disk and loaded into memory.
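
Since a high Hard Faults/Min value is the signal to watch for, one way to work with figures copied from the details table is to filter them against a threshold. The Python sketch below does that. The row layout, the process names, and the threshold of 100 faults per minute are all assumptions made for illustration; what counts as "a lot" depends on the server and its workload.

```python
def flag_hard_faulting(rows, threshold_per_min=100):
    """Return rows whose hard faults/min exceed the threshold, highest first.

    `rows` is a list of dicts with a "hard_faults_per_min" key
    (a hypothetical layout for values copied from the details table).
    """
    flagged = [r for r in rows if r["hard_faults_per_min"] > threshold_per_min]
    return sorted(flagged, key=lambda r: r["hard_faults_per_min"], reverse=True)


if __name__ == "__main__":
    # Made-up sample rows; names and numbers are not real measurements.
    sample = [
        {"image": "app_a.exe", "pid": 1001, "hard_faults_per_min": 12},
        {"image": "app_b.exe", "pid": 1002, "hard_faults_per_min": 450},
        {"image": "app_c.exe", "pid": 1003, "hard_faults_per_min": 130},
    ]
    for row in flag_hard_faulting(sample):
        print(row["image"], row["pid"], row["hard_faults_per_min"])
```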

Obviously, the Resource view details provide a lot of information. But the key to using this information really lies in the fact that server performance can be affected in a negative way by two things: hardware problems and software problems.

The typical hardware bottlenecks for a server are the CPU, disks, network adapter (or adapters), and memory. The Reliability and Performance Monitor provides graphs for these hardware components because they can often be the reason the server is underperforming.

If the problem isn't directly related to a hardware malfunction, the problem can be a software issue that is monopolizing one of the key server hardware components, such as the CPU or the network adapter. Having quick access to the information related to the application instance enables you to potentially identify a malfunctioning software entity. So, although you can gain more specific real-time data using the various counters available in the Performance Monitor and more details related to server hardware and software events that are logged in the Event Viewer, the Reliability and Performance Monitor is definitely a quick way to survey a server's health.
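
To make the idea of one application monopolizing a resource concrete, here is a small Python sketch that checks whether a single process accounts for most of a column's total, for example Average CPU. The function name dominant_process, the row layout, and the 80 percent cutoff are hypothetical choices for this example, not anything the snap-in provides.

```python
def dominant_process(rows, column, share=0.8):
    """Return the row holding at least `share` of the column total, else None.

    `rows` is a list of dicts keyed by column name (hypothetical layout).
    """
    total = sum(row[column] for row in rows)
    if total <= 0:
        return None  # no rows, or nothing is using the resource
    top = max(rows, key=lambda row: row[column])
    return top if top[column] / total >= share else None


if __name__ == "__main__":
    # Made-up sample rows; names and numbers are not real measurements.
    sample = [
        {"image": "app_a.exe", "avg_cpu": 90.0},
        {"image": "app_b.exe", "avg_cpu": 5.0},
        {"image": "app_c.exe", "avg_cpu": 5.0},
    ]
    print(dominant_process(sample, "avg_cpu"))
```

A None result suggests the load is spread across several processes, which points more toward an undersized hardware component than toward one misbehaving application.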

The Reliability Monitor, a new tool provided by the Reliability and Performance Monitor snap-in, provides a System Stability chart that enables you to view events related to Windows, application, and hardware failures. It provides quick access to "bad" events in a timeline, making it a useful addition to server troubleshooting, particularly when used with Event Viewer data.
