Friday, March 7, 2014

Windows Performance Counters

Microsoft Windows records performance metrics from running applications through Windows Performance Counters.  This is a high-speed system that can be used to gather counts, rates, averages or other numbers based on raw counts or ratios like items/second.  The system has low overhead and is capable of a high rate of capture: I measured 10 million messages/second using 4 threads on a 2011 MacBook Pro.  Performance Counters are generated and updated by major system modules and can be extended to include your own modules. Windows 8 comes with over 29,000 standard counters. You can list them by running typeperf.exe -qx > counterlist.txt, and you can watch them in action using the perfmon application.

This screenshot shows the perfmon application monitoring a CPU performance counter and a custom test performance counter generated by this C# library I've put up on github.  The ROCPS64 counter is a Rate Operations Per Second 64-bit counter that counts the number of updates per second.  Look at the Last field, where it shows 10,056,419 operations per second.  (I've scaled the view by dividing by 1,000,000 so that it can be overlaid with the % CPU processor time.)  This test consumes about 45% of the CPU, which represents 4 of 8 hyperthreads running at almost 100% utilization.

Note: Perfmon is showing a standard CPU counter overlaid with a custom "ROCPS64" rate of operations per second counter.

Hierarchy


Performance counters are managed under a Category / Instance / Counter hierarchy.  

  • The Category represents the area of interest like CPU, Disk or Application.  
  • The Instance represents one of many of that item, like an individual disk drive (C:, D:).  Some categories have only a single, default instance. Your laptop's power management counters exist only once, for the one laptop. These use the default instance, which is specified by a null in the API.  
  • Actual performance information is recorded in individual Counters.  Each Category/Instance can have any number of counters. Each instance within a category normally has the same counters.  For example: each CPU\Core has the same statistics, such as % load and current frequency.
Microsoft provides access to the performance counters with a C function library and a set of C# wrapper classes.  Both APIs are string based: each counter update call accepts the string names of the Category, the (optional) Instance and the Counter.
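
To make the Category / Instance / Counter naming concrete, here is a minimal sketch that reads a standard counter through the .NET System.Diagnostics classes. It assumes an English-language Windows install, where the category is named "Processor", the counter "% Processor Time" and the instance "_Total"; those names can differ on localized systems.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class ReadCpuCounter
{
    static void Main()
    {
        // Category = "Processor", Counter = "% Processor Time", Instance = "_Total".
        // readOnly is true because this program only samples the value.
        using (var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total", true))
        {
            // A calculated counter needs two samples, so the first call only primes it.
            cpu.NextValue();
            Thread.Sleep(1000);

            float percent = cpu.NextValue();
            Console.WriteLine("CPU: {0:F1}%", percent);
        }
    }
}
```

Reading a counter like this should not need elevated rights; only creating categories does.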

API Performance

The Performance Counter subsystem is capable of very high parallel counter update rates, so all the O/S and application metrics can be written with minimal impact on the system. As mentioned above, parallel testing shows that a userland application can update a custom counter at about 2.5 million updates per second per hyperthread on a circa-2011 Intel i7 laptop. This is the maximum rate. Programs should update counters at significantly lower rates to leave CPU time for actual work.  

Programs first retrieve the counter as part of a Category. My testing shows that Category/Counter retrieval is very slow, on the order of 300-400ms.  

Programs should retrieve the Category/Counter objects one time and cache them to get reasonable performance. Counters are thread safe, so you can update individual counters from multiple threads with almost linear scaling.
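
A minimal sketch of the caching idea, assuming the .NET System.Diagnostics wrapper classes. The CounterCache class is a hypothetical helper written for illustration; it is not the API of the library on github.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical helper: pays the slow counter lookup once per name triple.
sealed class CounterCache
{
    private readonly ConcurrentDictionary<Tuple<string, string, string>, PerformanceCounter> counters =
        new ConcurrentDictionary<Tuple<string, string, string>, PerformanceCounter>();

    // A null instance means the category's default (single) instance.
    public PerformanceCounter Get(string category, string instance, string counter)
    {
        var key = Tuple.Create(category, instance ?? string.Empty, counter);

        // Under a race the factory may run more than once, but only one
        // counter object is stored and returned to every caller.
        return counters.GetOrAdd(
            key,
            k => new PerformanceCounter(k.Item1, k.Item3, k.Item2, false));
    }
}
```

A caller would then write something like cache.Get("MyApp.Sketch", null, "Messages").Increment(), where the category and counter names are placeholders for counters that already exist on the machine.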

Programs update the counters through the Increment(), IncrementBy() or Decrement() methods provided by the API. Increment() is up to 10 times faster than IncrementBy(1).  Use IncrementBy() to update counters showing things like "Bytes per message". Do not call Increment() 40,000 times to represent 40,000 bytes. 

Counter Types

All counters have the same Increment(), IncrementBy(), Decrement() update API and the same NextValue() retrieval API.  NextValue() returns the calculated value based on the counter type.  Microsoft doesn't call them this, but I think of the counters as three different types: Simple, Implicit Ratio and Compound Ratio.  The first two types each use a single actual counter.  The last type requires two bound counters whose values are combined to produce the calculated value.  

Simple

Simple counters just keep a rolling count.  NextValue() returns the current count. 

Implicit Ratio

Implicit Ratio counters calculate ratios using the counter value as the numerator and some implicit denominator, like seconds.  The RateOfCountsPerSecond counter is a good example.  You increment the counter, and the value returned by NextValue() is computed from the count and some values from the system clocks.  
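
The arithmetic behind a rate counter can be sketched as follows. As far as I can tell this matches the formula Microsoft documents for the RateOfCountsPerSecond types, (N1 - N0) / ((T1 - T0) / F), where N is the raw count, T is a clock timestamp in ticks and F is the clock frequency in ticks per second. The RateMath class and its parameter names are my own, for illustration only.

```csharp
using System;

static class RateMath
{
    // Rate between two samples: change in count divided by elapsed seconds.
    public static double CountsPerSecond(
        long count0, long count1, long ticks0, long ticks1, long ticksPerSecond)
    {
        double elapsedSeconds = (double)(ticks1 - ticks0) / ticksPerSecond;

        // No time has passed (or the clock went backwards): report no rate.
        if (elapsedSeconds <= 0)
        {
            return 0.0;
        }

        return (count1 - count0) / elapsedSeconds;
    }
}
```

For example, 5,000 increments observed across 20,000,000 ticks of a 10,000,000 ticks/second clock is 5,000 counts in 2 seconds, so the function returns 2,500. This is also why a rate counter needs two samples before it can report anything meaningful.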

Compound Ratio

More sophisticated counters represent ratios or calculations where more than one counter is involved in producing the value.  Each compound counter is made up of a primary counter and a base counter.  Example: You could have a bytes per dollar counter, where one counter represents the number of bytes processed and the other counter represents the CPU cost of the processing. Windows requires that a counter and its base counter be created sequentially, with the counter first and the base counter second.
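
As a sketch of how a compound pair is updated, the code below assumes a category named "MyApp.Sketch" that already contains an AverageCount64 counter called "Bytes/message" followed by its AverageBase counter "Bytes/message base". All three names are hypothetical, and the pairing of these two counter types is my choice for the example.

```csharp
using System.Diagnostics;

class BytesPerMessageSketch
{
    static void Main()
    {
        const string category = "MyApp.Sketch"; // hypothetical category name

        // readOnly is false because this program writes to both counters.
        using (var bytes = new PerformanceCounter(category, "Bytes/message", false))
        using (var messages = new PerformanceCounter(category, "Bytes/message base", false))
        {
            RecordMessage(bytes, messages, 40000);
            RecordMessage(bytes, messages, 1500);
        }
    }

    // The primary counter accumulates the numerator (bytes);
    // the base counter accumulates the denominator (messages).
    static void RecordMessage(PerformanceCounter bytes, PerformanceCounter messages, long byteCount)
    {
        bytes.IncrementBy(byteCount);
        messages.Increment();
    }
}
```

A reader such as perfmon then divides the change in the primary counter by the change in the base counter over the sample interval, which yields average bytes per message.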

Creating Categories and Counters

Microsoft requires that you have Administrative privileges to create Performance Counters.  You can create your own counters using PowerShell or in your C# code.  You must run PowerShell, your program or Visual Studio as Administrator in order to create your own counters. Programs can read counters without escalated rights.

This PowerShell code creates a Category with four counters that make up three metrics: There is one raw counter, one simple rate counter and a compound counter made up of two counters. Paired counters must be added to the counter list consecutively with the main counter first and the base counter second.
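
The same shape of category can also be created from C#. This is a rough sketch using the .NET System.Diagnostics classes, with hypothetical category and counter names and counter types picked to match the description above (one raw counter, one rate counter and one counter/base pair). It must run as Administrator.

```csharp
using System;
using System.Diagnostics;

class CreateCounters
{
    const string Category = "MyApp.Sketch"; // hypothetical category name

    static void Main()
    {
        if (PerformanceCounterCategory.Exists(Category))
        {
            Console.WriteLine("Category already exists.");
            return;
        }

        var counters = new CounterCreationDataCollection();

        // Raw counter: a simple running count.
        counters.Add(new CounterCreationData(
            "Messages", "Total messages processed", PerformanceCounterType.NumberOfItems64));

        // Simple rate counter: increments per second.
        counters.Add(new CounterCreationData(
            "Messages/sec", "Messages processed per second", PerformanceCounterType.RateOfCountsPerSecond64));

        // Compound counter: the main counter first, its base immediately after.
        counters.Add(new CounterCreationData(
            "Bytes/message", "Average bytes per message", PerformanceCounterType.AverageCount64));
        counters.Add(new CounterCreationData(
            "Bytes/message base", "Base for Bytes/message", PerformanceCounterType.AverageBase));

        PerformanceCounterCategory.Create(
            Category,
            "Sketch category for a blog example",
            PerformanceCounterCategoryType.SingleInstance,
            counters);

        Console.WriteLine("Created category " + Category);
    }
}
```

PerformanceCounterCategory.Delete(Category) removes the category again, which is handy for tests that create and clean up their own counters.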


Code

C# code that provides automatic caching of Category / Instance / Counter objects is on github.  I wrote this code to provide Performance Counter access to Java, which I'll discuss in another post. The C# code and unit tests are located in the WindowsPerformanceCountersForJava directory. Benchmark results were generated using the multi-threaded performance unit test.  

All unit tests create their own performance counters before running the test and remove them afterwards.  This means you have to build and run the code with Visual Studio running as Administrator. I could have created a PowerShell script for counter creation so that VS wouldn't require Administrator access, but this provides a single solution for automated testing.

You can find a pre-built FreemanSoft.PerformanceCounters.dll in the Packages folder on github.

Related Posts


http://flyingpies.wordpress.com/2011/04/07/performance-counters-types-with-the-net-framework/


1 comment:

  1. Hi Joe,

    I tried to post a comment previously but not sure if it made it through. You seem to be quite familiar with the windows performance counters and its performance. I'm currently having an issue with performance, and trying to determine if it's my code, or just the performance of the counter repository. I've posted an issue over on StackOverflow; would you be so kind as to take a look and let me know if you have any insight? I would greatly appreciate it! http://stackoverflow.com/questions/25372073/performance-counters-nextvalue-very-slow-1-000-counters
