Calculating the Core Frequencies of a Modern Intel CPU with Clock-Varying Features in Visual C++ on a Windows Machine

Note that the complete solution and download link are at the bottom. Modern Intel CPUs have features such as Turbo Boost, which vary the frequency of your CPU. Please note (and I have seen this on multiple machines I own) that you should disable the Hyper-V feature in Windows before trying out the code below, as Hyper-V seems to stop clock variations from taking place on Intel CPUs. In other words, it seems to stop Turbo Boost. To disable Hyper-V, go to Control Panel -> Uninstall a program -> Turn Windows features on or off (from the menu to the left of the window), deselect the Hyper-V check-box, and hit OK.

Before the introduction of Turbo Boost, functions such as QueryPerformanceFrequency in conjunction with some other methods provided an effective means of detecting a CPU’s clock speed. This is no longer the case. QueryPerformanceFrequency only seems to reflect the maximum frequency before clock-varying technologies start to kick in. However, this function is still going to be important (we still use it, together with __rdtsc, to measure that maximum frequency). The actual formula for calculating CPU frequency is the maximum CPU frequency before clock-varying technologies multiplied by the ratio of two Intel-specific registers, which gives you actual performance over maximum performance (actualFrequency = maximumFrequencyWithoutClockSpeedVariations * APERF / MPERF). These registers are known as APERF and MPERF, and the ratio APERF / MPERF is what we need to fetch.
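
To make the formula concrete, here is a minimal sketch of the arithmetic alone in portable C++. It assumes you already have two samples of each counter taken some interval apart, and it uses the difference between the samples so that the ratio covers just that interval. That is a variation on the code later in this post, which averages the raw ratio read at two points in time. The function name and parameters are illustrative and are not part of any Windows or Intel API.

```cpp
#include <cstdint>

// Hypothetical helper: estimates the effective frequency in MHz over a sampling interval.
// baseMHz is the maximum frequency without clock-speed variations.
double EffectiveFrequencyMHz(double baseMHz,
                             std::uint64_t aperfBefore, std::uint64_t aperfAfter,
                             std::uint64_t mperfBefore, std::uint64_t mperfAfter)
{
    // Unsigned subtraction stays correct even if a counter wraps around once.
    const std::uint64_t aperfDelta = aperfAfter - aperfBefore;
    const std::uint64_t mperfDelta = mperfAfter - mperfBefore;
    if (mperfDelta == 0)
        return 0.0; // no reference cycles elapsed, so there is nothing to scale by
    return baseMHz * static_cast<double>(aperfDelta) / static_cast<double>(mperfDelta);
}
```

For example, with a 2400 MHz base clock, an APERF delta of 3,000,000 and an MPERF delta of 2,000,000, the estimate is 2400 * 1.5 = 3600 MHz.
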

To obtain the values of model-specific registers such as APERF or MPERF, we need to generate the rdmsr instruction using an Intrinsic Function known as __readmsr. Unfortunately, calling this function is not as simple as just, well…calling it. As the remark for it on MSDN goes, “This function is only available in kernel mode, and the routine is only available as an intrinsic.” When you write an application in Windows, like a WPF application or a Console Application, it runs as a user-mode application, meaning that it has some security restrictions in what you can do with it and that it runs in a ring of security called Ring 3, the least privileged security level. However, a kernel-mode application runs in Ring 0, the most privileged security level. Wikipedia has a great article on Protection Rings if you’d like to read more about them. As far as I know, the only way to get into kernel-mode is to write a kernel-mode driver. What this all means is that we need to have a user-mode application that talks to a kernel-mode driver in order to fetch the values of APERF and MPERF.

I started by downloading the Windows Driver Kit, which you will also need, so go ahead and grab it. I spent a lot of time trying to find examples of kernel-mode drivers that communicated with user-mode applications, and I came across a nice example on CodeProject. I noticed that the example in question uses driver source code that is only for WDM-based drivers, whereas most driver templates in the Windows Driver Kit are WDF-based. In fact, the latest Windows Driver Kit has no WDM driver with source code in it (only a blank template with absolutely no code). You can read about the differences between WDF and WDM on MSDN. WDM is clearly what we are after, as this article states, “The WDM model is closely tied to the operating system. Drivers interact directly with the operating system by calling system service routines and manipulating operating system structures. Because WDM drivers are trusted kernel-mode components, the system provides limited checks on driver input.” After spending some time looking through the Windows Driver Kit samples, I found the IOCTL sample driver which is a perfect sample skeleton driver that we can alter to fit our needs. Its description reads, “This sample driver is not a Plug and Play driver. This is a minimal driver meant to demonstrate a feature of the operating system. Neither this driver nor its sample programs are intended for use in a production environment. Instead, they are intended for educational purposes and as a skeleton driver.” I began modifying that driver to fit my actual needs. I also learned a few things along the way. For one, kernel drivers don’t like floating point arithmetic. I’m not even entirely sure that they will compile if you try to make use of a double or float type unless you find a means of explicitly including that type definition inside of your driver. Also, you’re limited because you cannot use the Windows header, and must use a special kernel-mode header known as the Ntddk header. 
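
Since floating point is awkward inside a driver, one option is to keep any ratio in fixed point using integers only. This is not the route taken in the driver below, which simply hands the raw register values back to user mode, but it shows the idea. A minimal sketch in portable C++, assuming a parts-per-thousand scale and a hypothetical helper name:

```cpp
#include <cstdint>

// Hypothetical helper: returns numerator / denominator scaled by 1000, using integers only.
// Returns 0 when the denominator is 0.
std::uint64_t RatioPerMille(std::uint64_t numerator, std::uint64_t denominator)
{
    if (denominator == 0)
        return 0;
    // Split into whole and fractional parts so that numerator * 1000 is never formed.
    const std::uint64_t whole = numerator / denominator;
    const std::uint64_t remainder = numerator % denominator;
    // Assumes the ratio is small and that remainder * 1000 fits in 64 bits.
    return whole * 1000 + (remainder * 1000) / denominator;
}
```

A result of 1500 would then mean a ratio of 1.5, which user-mode code can turn into a double if it needs to.
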
I’ve modified the IOCTL sample skeleton as follows:

driver.c

//
// Include files.
//

#include <ntddk.h>          // various NT definitions
#include <string.h>
#include <intrin.h>

#include "driver.h"

#define NT_DEVICE_NAME      L"\\Device\\KernelModeDriver"
#define DOS_DEVICE_NAME     L"\\DosDevices\\KernelModeDriver"

#if DBG
#define DRIVER_PRINT(_x_) \
                DbgPrint("KernelModeDriver.sys: ");\
                DbgPrint _x_;

#else
#define DRIVER_PRINT(_x_)
#endif

//
// Device driver routine declarations.
//

DRIVER_INITIALIZE DriverEntry;

_Dispatch_type_(IRP_MJ_CREATE)
_Dispatch_type_(IRP_MJ_CLOSE)
DRIVER_DISPATCH DriverCreateClose;

_Dispatch_type_(IRP_MJ_DEVICE_CONTROL)
DRIVER_DISPATCH DriverDeviceControl;

DRIVER_UNLOAD DriverUnloadDriver;

VOID
PrintIrpInfo(
    PIRP Irp
    );
VOID
PrintChars(
    _In_reads_(CountChars) PCHAR BufferAddress,
    _In_ size_t CountChars
    );

#ifdef ALLOC_PRAGMA
#pragma alloc_text( INIT, DriverEntry )
#pragma alloc_text( PAGE, DriverCreateClose)
#pragma alloc_text( PAGE, DriverDeviceControl)
#pragma alloc_text( PAGE, DriverUnloadDriver)
#pragma alloc_text( PAGE, PrintIrpInfo)
#pragma alloc_text( PAGE, PrintChars)
#endif // ALLOC_PRAGMA


NTSTATUS
DriverEntry(
    _In_ PDRIVER_OBJECT   DriverObject,
    _In_ PUNICODE_STRING      RegistryPath
    )
/*++

Routine Description:
    This routine is called by the Operating System to initialize the driver.

    It creates the device object, fills in the dispatch entry points and
    completes the initialization.

Arguments:
    DriverObject - a pointer to the object that represents this device
    driver.

    RegistryPath - a pointer to our Services key in the registry.

Return Value:
    STATUS_SUCCESS if initialized; an error otherwise.

--*/

{
    NTSTATUS        ntStatus;
    UNICODE_STRING  ntUnicodeString;    // NT Device Name "\Device\KernelModeDriver"
    UNICODE_STRING  ntWin32NameString;    // Win32 Name "\DosDevices\KernelModeDriver"
    PDEVICE_OBJECT  deviceObject = NULL;    // ptr to device object

    UNREFERENCED_PARAMETER(RegistryPath);

    RtlInitUnicodeString( &ntUnicodeString, NT_DEVICE_NAME );

    ntStatus = IoCreateDevice(
        DriverObject,                   // Our Driver Object
        0,                              // We don't use a device extension
        &ntUnicodeString,               // Device name "\Device\KernelModeDriver"
        FILE_DEVICE_UNKNOWN,            // Device type
        FILE_DEVICE_SECURE_OPEN,		// Device characteristics
        FALSE,                          // Not an exclusive device
        &deviceObject );                // Returned ptr to Device Object

    if ( !NT_SUCCESS( ntStatus ) )
    {
        DRIVER_PRINT(("Couldn't create the device object\n"));
        return ntStatus;
    }

    //
    // Initialize the driver object with this driver's entry points.
    //

	DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverCreateClose;
	DriverObject->MajorFunction[IRP_MJ_CLOSE] = DriverCreateClose;
	DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DriverDeviceControl;
	DriverObject->DriverUnload = DriverUnloadDriver;

    //
    // Initialize a Unicode String containing the Win32 name
    // for our device.
    //

    RtlInitUnicodeString( &ntWin32NameString, DOS_DEVICE_NAME );

    //
    // Create a symbolic link between our device name  and the Win32 name
    //

    ntStatus = IoCreateSymbolicLink(
                        &ntWin32NameString, &ntUnicodeString );

    if ( !NT_SUCCESS( ntStatus ) )
    {
        //
        // Delete everything that this routine has allocated.
        //
        DRIVER_PRINT(("Couldn't create symbolic link\n"));
        IoDeleteDevice( deviceObject );
    }


    return ntStatus;
}


NTSTATUS
DriverCreateClose(
    PDEVICE_OBJECT DeviceObject,
    PIRP Irp
    )
/*++

Routine Description:

    This routine is called by the I/O system when the KernelModeDriver is opened or
    closed.

    No action is performed other than completing the request successfully.

Arguments:

    DeviceObject - a pointer to the object that represents the device
    that I/O is to be done on.

    Irp - a pointer to the I/O Request Packet for this request.

Return Value:

    NT status code

--*/

{
    UNREFERENCED_PARAMETER(DeviceObject);

    PAGED_CODE();

    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;

    IoCompleteRequest( Irp, IO_NO_INCREMENT );

    return STATUS_SUCCESS;
}

VOID
DriverUnloadDriver(
    _In_ PDRIVER_OBJECT DriverObject
    )
/*++

Routine Description:

    This routine is called by the I/O system to unload the driver.

    Any resources previously allocated must be freed.

Arguments:

    DriverObject - a pointer to the object that represents our driver.

Return Value:

    None
--*/

{
    PDEVICE_OBJECT deviceObject = DriverObject->DeviceObject;
    UNICODE_STRING uniWin32NameString;

    PAGED_CODE();

    //
    // Create counted string version of our Win32 device name.
    //

    RtlInitUnicodeString( &uniWin32NameString, DOS_DEVICE_NAME );


    //
    // Delete the link from our device name to a name in the Win32 namespace.
    //

    IoDeleteSymbolicLink( &uniWin32NameString );

    if ( deviceObject != NULL )
    {
        IoDeleteDevice( deviceObject );
    }



}

NTSTATUS
DriverDeviceControl(
    PDEVICE_OBJECT DeviceObject,
    PIRP Irp
    )

/*++

Routine Description:

    This routine is called by the I/O system to perform a device I/O
    control function.

Arguments:

    DeviceObject - a pointer to the object that represents the device
        that I/O is to be done on.

    Irp - a pointer to the I/O Request Packet for this request.

Return Value:

    NT status code

--*/

{
    PIO_STACK_LOCATION  irpSp;// Pointer to current stack location
    NTSTATUS            ntStatus = STATUS_SUCCESS;// Assume success
    ULONG               inBufLength; // Input buffer length
	ULONG               outBufLength; // Output buffer length
	void				*inBuf; // pointer to input buffer
	unsigned __int64    *outBuf; // pointer to the output buffer

    UNREFERENCED_PARAMETER(DeviceObject);

    PAGED_CODE();

    irpSp = IoGetCurrentIrpStackLocation( Irp );
	inBufLength = irpSp->Parameters.DeviceIoControl.InputBufferLength;
	outBufLength = irpSp->Parameters.DeviceIoControl.OutputBufferLength;

	if (!inBufLength || !outBufLength || outBufLength != sizeof(unsigned __int64)*2)
    {
        ntStatus = STATUS_INVALID_PARAMETER;
        goto End;
    }

    //
    // Determine which I/O control code was specified.
    //

    switch ( irpSp->Parameters.DeviceIoControl.IoControlCode )
    {
    case IOCTL_SIOCTL_METHOD_BUFFERED:

        //
        // In this method the I/O manager allocates a buffer large enough to
        // to accommodate larger of the user input buffer and output buffer,
        // assigns the address to Irp->AssociatedIrp.SystemBuffer, and
        // copies the content of the user input buffer into this SystemBuffer
        //

        DRIVER_PRINT(("Called IOCTL_SIOCTL_METHOD_BUFFERED\n"));
        PrintIrpInfo(Irp);

        //
        // Input buffer and output buffer is same in this case, read the
        // content of the buffer before writing to it
        //

        inBuf = (void *)Irp->AssociatedIrp.SystemBuffer;
		outBuf = (unsigned __int64 *)Irp->AssociatedIrp.SystemBuffer;

        //
        // Read the data from the buffer
        //

        DRIVER_PRINT(("\tData from User :"));
        //
        // We are using the following function to print characters instead
        // DebugPrint with %s format because we string we get may or
        // may not be null terminated.
        //
        PrintChars(inBuf, inBufLength);

        //
        // Write to the buffer
        //

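		// Read the two model-specific registers: 232 (0xE8) is IA32_APERF and
		// 231 (0xE7) is IA32_MPERF. Only two 64-bit values are needed.
		unsigned __int64 data[2];
		data[0] = __readmsr(232);
		data[1] = __readmsr(231);

		// The values are 64-bit, so print them with a 64-bit format specifier.
		DRIVER_PRINT(("data[0]: %I64u\n", data[0]));
		DRIVER_PRINT(("data[1]: %I64u\n", data[1]));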

		RtlCopyBytes(outBuf, data, outBufLength);

        //
        // Assign the length of the data copied to IoStatus.Information
        // of the Irp and complete the Irp.
        //

		Irp->IoStatus.Information = sizeof(unsigned __int64)*2;

        //
        // When the Irp is completed the content of the SystemBuffer
        // is copied to the User output buffer and the SystemBuffer is
        // is freed.
        //

       break;

    default:

        //
        // The specified I/O control code is unrecognized by this driver.
        //

        ntStatus = STATUS_INVALID_DEVICE_REQUEST;
        DRIVER_PRINT(("ERROR: unrecognized IOCTL %x\n",
            irpSp->Parameters.DeviceIoControl.IoControlCode));
        break;
    }

End:
    //
    // Finish the I/O operation by simply completing the packet and returning
    // the same status as in the packet itself.
    //

    Irp->IoStatus.Status = ntStatus;

    IoCompleteRequest( Irp, IO_NO_INCREMENT );

    return ntStatus;
}

VOID
PrintIrpInfo(
    PIRP Irp)
{
    PIO_STACK_LOCATION  irpSp;
    irpSp = IoGetCurrentIrpStackLocation( Irp );

    PAGED_CODE();

    DRIVER_PRINT(("\tIrp->AssociatedIrp.SystemBuffer = 0x%p\n",
        Irp->AssociatedIrp.SystemBuffer));
    DRIVER_PRINT(("\tIrp->UserBuffer = 0x%p\n", Irp->UserBuffer));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.Type3InputBuffer = 0x%p\n",
        irpSp->Parameters.DeviceIoControl.Type3InputBuffer));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.InputBufferLength = %d\n",
        irpSp->Parameters.DeviceIoControl.InputBufferLength));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.OutputBufferLength = %d\n",
        irpSp->Parameters.DeviceIoControl.OutputBufferLength ));
    return;
}

VOID
PrintChars(
    _In_reads_(CountChars) PCHAR BufferAddress,
    _In_ size_t CountChars
    )
{
    PAGED_CODE();

    if (CountChars) {

        while (CountChars--) {

            if (*BufferAddress > 31
                 && *BufferAddress != 127) {

                KdPrint (( "%c", *BufferAddress) );

            } else {

                KdPrint(( ".") );

            }
            BufferAddress++;
        }
        KdPrint (("\n"));
    }
    return;
}
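
driver.h is not reproduced in this post, but it is where the control code IOCTL_SIOCTL_METHOD_BUFFERED shared by the driver and the application comes from. In the WDK such codes are built with the CTL_CODE macro, which packs four fields into 32 bits. The sketch below re-implements that packing as a plain C++ function so the layout is visible. The function and constant names are mine, and the device type 40000 and function number 0x902 are assumptions based on the WDK IOCTL sample, so they may not match the values in the download.

```cpp
#include <cstdint>

// Same bit layout as the CTL_CODE macro in the Windows headers:
// device type in bits 31..16, required access in 15..14, function in 13..2, method in 1..0.
constexpr std::uint32_t MakeControlCode(std::uint32_t deviceType, std::uint32_t function,
                                        std::uint32_t method, std::uint32_t access)
{
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kMethodBuffered = 0; // numeric value of METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess = 0;  // numeric value of FILE_ANY_ACCESS

// Assumed device type and function number (see the note above).
constexpr std::uint32_t kExampleControlCode =
    MakeControlCode(40000, 0x902, kMethodBuffered, kFileAnyAccess);
```

Because both sides include the same header, the switch statement in DriverDeviceControl and the DeviceIoControl call in the application agree on the same 32-bit value.
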

And finally, the user-mode Win32 Console Application that loads and also runs this driver:

FrequencyCalculator.cpp

#include "stdafx.h"
#include <iostream>
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strsafe.h>
#include <process.h>
#include "..\KernelModeDriver\driver.h"

using namespace std;

BOOLEAN
ManageDriver(
_In_ LPCTSTR  DriverName,
_In_ LPCTSTR  ServiceName,
_In_ USHORT   Function
);

HANDLE hDevice;
TCHAR driverLocation[MAX_PATH];

void InstallDriver()
{
	DWORD errNum = 0;
	GetCurrentDirectory(MAX_PATH, driverLocation);
	_tcscat_s(driverLocation, _T("\\KernelModeDriver.sys"));

	std::wcout << "Trying to install driver at " << driverLocation << std::endl;

	//
	// open the device
	//

	if ((hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
		GENERIC_READ | GENERIC_WRITE,
		0,
		NULL,
		CREATE_ALWAYS,
		FILE_ATTRIBUTE_NORMAL,
		NULL)) == INVALID_HANDLE_VALUE) {

		errNum = GetLastError();

		if (errNum != ERROR_FILE_NOT_FOUND) {

			printf("CreateFile failed! Error = %lu\n", errNum);

			return;
		}

		//
		// The driver is not started yet, so let us install the driver.
		// First setup full path to driver name.
		//

		if (!ManageDriver(_T(DRIVER_NAME),
			driverLocation,
			DRIVER_FUNC_INSTALL
			)) {

			printf("Unable to install driver. \n");

			//
			// Error - remove driver.
			//

			ManageDriver(_T(DRIVER_NAME),
				driverLocation,
				DRIVER_FUNC_REMOVE
				);

			return;
		}

		hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
			GENERIC_READ | GENERIC_WRITE,
			0,
			NULL,
			CREATE_ALWAYS,
			FILE_ATTRIBUTE_NORMAL,
			NULL);

		if (hDevice == INVALID_HANDLE_VALUE){
			printf("Error: CreateFile Failed : %lu\n", GetLastError());
			return;
		}
	}
}

void UninstallDriver()
{
	//
	// close the handle to the device.
	//

	CloseHandle(hDevice);

	//
	// Unload the driver.  Ignore any errors.
	//
	ManageDriver(_T(DRIVER_NAME),
		driverLocation,
		DRIVER_FUNC_REMOVE
		);
}

double GetPerformanceRatio()
{
	BOOL bRc;
	ULONG bytesReturned;

	int input = 0;
	unsigned __int64 output[2];
	memset(output, 0, sizeof(unsigned __int64) * 2);

	//printf("InputBuffer Pointer = %p, BufLength = %d\n", &input, sizeof(&input));
	//printf("OutputBuffer Pointer = %p BufLength = %d\n", &output, sizeof(&output));

	//
	// Performing METHOD_BUFFERED
	//

	//printf("\nCalling DeviceIoControl METHOD_BUFFERED:\n");

	bRc = DeviceIoControl(hDevice,
		(DWORD)IOCTL_SIOCTL_METHOD_BUFFERED,
		&input,
		sizeof(input),
		output,
		sizeof(unsigned __int64)*2,
		&bytesReturned,
		NULL
		);

	if (!bRc)
	{
		//printf("Error in DeviceIoControl : %d", GetLastError());
		return 0;

	}
	//printf("    OutBuffer (%d): %d\n", bytesReturned, output);
	if (output[1] == 0)
	{
		return 0;
	}
	else
	{
		return (double)output[0] / (double)output[1];
	}
}

struct Core
{
	int CoreNumber;
};

int GetNumberOfProcessorCores()
{
	SYSTEM_INFO sysinfo;
	GetSystemInfo(&sysinfo);
	return sysinfo.dwNumberOfProcessors;
}

float GetCoreFrequency()
{
	// __rdtsc: Returns the processor time stamp which records the number of clock cycles since the last reset.
	// QueryPerformanceCounter: Returns a high resolution time stamp that can be used for time-interval measurements.
	// Get the frequency (ticks per second) which defines the step size of the QueryPerformanceCounter method.
	LARGE_INTEGER frequency;
	QueryPerformanceFrequency(&frequency);
	// Get the number of cycles before we start. __rdtsc returns a 64-bit value, so do not truncate it to a ULONG.
	unsigned __int64 cyclesBefore = __rdtsc();
	// Get the Intel performance ratio at the start.
	double ratioBefore = GetPerformanceRatio();
	// Get the start time.
	LARGE_INTEGER startTime;
	QueryPerformanceCounter(&startTime);
	// Wait long enough for the time stamp counter and the performance counter to advance measurably.
	Sleep(1000);
	unsigned __int64 cyclesAfter = __rdtsc();
	// Get the Intel performance ratio at the end.
	double ratioAfter = GetPerformanceRatio();
	// Get the end time.
	LARGE_INTEGER endTime;
	QueryPerformanceCounter(&endTime);
	// Elapsed time in seconds. Divide as doubles, because integer division would truncate the result to whole seconds.
	double elapsedSeconds = (double)(endTime.QuadPart - startTime.QuadPart) / (double)frequency.QuadPart;
	// Return the number of MHz. Multiply the core's frequency by the mean MSR (model-specific register) ratio (the APERF register's value divided by the MPERF register's value) between the two timestamps.
	return (float)(((ratioAfter + ratioBefore) / 2) * (double)(cyclesAfter - cyclesBefore) * 1e-6 / elapsedSeconds);
}

struct CoreResults
{
	int CoreNumber;
	float CoreFrequency;
};

CRITICAL_SECTION printLock;

static void printResult(void *param)
{
	EnterCriticalSection(&printLock);
	CoreResults coreResults = *((CoreResults *)param);
	std::cout << "Core " << coreResults.CoreNumber << " has a speed of " << coreResults.CoreFrequency << " MHz" << std::endl;
	delete (CoreResults *)param;
	LeaveCriticalSection(&printLock);
}

volatile bool closed = false;

static void startMonitoringCoreSpeeds(void *param)
{
	Core core = *((Core *)param);
	SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << core.CoreNumber);
	while (!closed)
	{
		CoreResults *coreResults = new CoreResults();
		coreResults->CoreNumber = core.CoreNumber;
		coreResults->CoreFrequency = GetCoreFrequency();
		_beginthread(printResult, 0, coreResults);
	}
	delete (Core *)param;
}

int _tmain(int argc, _TCHAR* argv[])
{
	InitializeCriticalSection(&printLock);
	InstallDriver();
	for (int i = 0; i < GetNumberOfProcessorCores(); i++)
	{
		Core *core = new Core{ 0 };
		core->CoreNumber = i;
		_beginthread(startMonitoringCoreSpeeds, 0, core);
	}
	std::cin.get();
	closed = true;
	UninstallDriver();
	DeleteCriticalSection(&printLock);
}
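
One detail in GetCoreFrequency is easy to get wrong: converting the QueryPerformanceCounter tick difference into seconds. Dividing two 64-bit integers discards the fraction, so a 1.5 second interval would count as 1 second. The following sketch isolates that arithmetic in portable C++. The function name and parameters are hypothetical stand-ins for the values read in the listing above.

```cpp
#include <cstdint>

// Hypothetical helper: average cycles per microsecond (MHz) over an interval.
// cycleDelta:     difference between two __rdtsc readings.
// tickDelta:      difference between two QueryPerformanceCounter readings.
// ticksPerSecond: value reported by QueryPerformanceFrequency.
double MeasuredMHz(std::uint64_t cycleDelta, std::int64_t tickDelta, std::int64_t ticksPerSecond)
{
    if (tickDelta <= 0 || ticksPerSecond <= 0)
        return 0.0;
    // Floating-point division keeps the fractional part of the elapsed time.
    const double elapsedSeconds =
        static_cast<double>(tickDelta) / static_cast<double>(ticksPerSecond);
    // Cycles divided by elapsed microseconds gives MHz.
    return static_cast<double>(cycleDelta) / (elapsedSeconds * 1e6);
}
```

With 3,000,000,000 cycles over 15,000,000 ticks at 10,000,000 ticks per second, this gives about 2000 MHz, whereas integer division of the ticks would have treated the interval as 1 second and reported 3000 MHz.
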

To have the user-mode application actually install this driver, you need to either:

  • Sign the driver with a Code Signing certificate you’ve obtained from some third party issuer. To sign the driver, right click the driver’s project file, go to Configuration Properties -> Driver Signing -> General, and change the Sign Mode, setting the Production Certificate to the Code Signing certificate you have installed into your machine from the third party issuer.
  • Do not sign the driver with a production certificate. Instead, either disable Driver Signature Verification from the advanced Windows start-up boot options, or disable Secure Boot from your BIOS and enable test signing. Then generate a new test certificate from the drop-down menu under the driver’s Configuration Properties -> Driver Signing -> General -> Test Certificate.

If you’ve completed the above steps, the code should run just fine (if you run it under Administrator credentials). Finally, you can download the complete source code of my implementation. Note that you use this code at your own risk and are entirely liable for any problems that may arise out of it, whatsoever. I do ask that if you use my driver, you give me a little credit. Just throw my name in a comment somewhere. Here is the download link: FrequencyCalculator.zip.

Debugging Heap Corruptions in Production (Release Mode) MSVC++ Windows Applications with Global Flags

Say you have a Release mode MSVC++ Win32 Console Application with the following code:

#include "stdafx.h"
#include <iostream>

using namespace std;

void CorruptTheHeap()
{
	// Create a pointer to an integer.
	int *reallyBadlyCodedIntegerAllocation;
	// Allocate insufficient space for this pointer to store an actual integer.
	cout << "The expected memory size of this type is " << sizeof(int) << " bytes, which is " << 8*sizeof(int) << " bits." << endl;
	int sizeToAllocateForThisType = 0;
	cout << "The actual size of this integer will be set to " << sizeToAllocateForThisType << " bytes." << endl;
	reallyBadlyCodedIntegerAllocation = (int *)malloc(sizeToAllocateForThisType);
	// Try to set the integer to an integer value (causing a heap buffer overrun at this point, since we are writing past the bounds of the memory we actually own).
	*reallyBadlyCodedIntegerAllocation = 0;
	// The heap is already corrupted by the write above. Freeing the block is typically where the damaged heap data is noticed, and in a Release build this can cause a crash.
	free(reallyBadlyCodedIntegerAllocation);
}

int _tmain(int argc, _TCHAR* argv[])
{
	CorruptTheHeap();
	cout << "Press any key to exit." << endl;
	cin.get();
	return 0;
}

Make sure the application’s project settings build it so that its *.pdb files are in the same folder as the one you execute it from. Now get the Debugging Tools for Windows, which include both WinDbg and Global Flags. For this example, my application is built in Release (it must be in Release) as an x64 architecture application, so the next step for me is to start Global Flags (X64). A simple Windows start screen search for Global Flags should uncover it, but for me it installed at C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\gflags.exe. Open it up and go to the Image File tab. Type in the name of your image file. My Console Application compiles and runs as HeapCorruptor.exe, so I typed that in and pressed the TAB key on my keyboard. You’ll notice that Global Flags keeps track of executables based on their name, so when you restart it, type in the executable’s name again and press the TAB key, it loads your last configuration (I know, it’s kind of weird and takes a bit of getting used to).

Check off Debugger and set its path to the path of your WinDbg executable. For me, a default setup of the debugging tools installed WinDbg at “C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\windbg.exe”. Then set up any other necessary flags, as I have in the screenshot below, hit Apply, and then OK to close it down.

[Screenshot: Global Flags settings for detecting heap corruptions in production.]

Now, run your executable. WinDbg will start up along with it. It starts paused, so type in g (for go) to get it to continue, or hit F5 in WinDbg. After you type in g and hit enter, you should see an error as the application starts up and runs:

0:000> g


===========================================================
VERIFIER STOP 000000000000000F: pid 0x3DC0: corrupted suffix pattern 

	0000003E3B071000 : Heap handle
	0000003E3C191500 : Heap block
	0000000000000001 : Block size
	0000003E3C191501 : corruption address
===========================================================
This verifier stop is not continuable. Process will be terminated 
when you use the `go' debugger command.
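
The phrase "corrupted suffix pattern" refers to a common heap-checking idea: fill the bytes just past the end of each allocation with a known pattern, and check that pattern later. The toy allocator below illustrates that general idea in portable C++. It is only a sketch with made-up names and constants, and not a description of how the Windows verifier is actually implemented.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr unsigned char kSuffixByte = 0xD0;                      // arbitrary fill value
constexpr std::size_t kSuffixLength = 8;                         // guard bytes after each block
constexpr std::size_t kHeaderLength = alignof(std::max_align_t); // room to remember the size

// Allocates size usable bytes followed by a guard pattern.
void *GuardedAlloc(std::size_t size)
{
    unsigned char *raw = static_cast<unsigned char *>(
        std::malloc(kHeaderLength + size + kSuffixLength));
    if (raw == nullptr)
        return nullptr;
    std::memcpy(raw, &size, sizeof(size)); // remember the user size in the header
    std::memset(raw + kHeaderLength + size, kSuffixByte, kSuffixLength);
    return raw + kHeaderLength;
}

// Frees a block from GuardedAlloc. Returns false if the guard pattern was overwritten.
bool GuardedFree(void *userPointer)
{
    unsigned char *raw = static_cast<unsigned char *>(userPointer) - kHeaderLength;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    bool intact = true;
    for (std::size_t i = 0; i < kSuffixLength; ++i)
    {
        if (raw[kHeaderLength + size + i] != kSuffixByte)
            intact = false;
    }
    std::free(raw);
    return intact;
}
```

Writing even one byte past the end of a 1-byte block overwrites the first guard byte, so GuardedFree reports the damage, much like the verifier stop above reports a corruption address one byte into the block.
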

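We have caught a heap corruption in the heap block at memory address 0000003E3C191500 (the corruption itself is at 0000003E3C191501, just past the end of the 1-byte block). Type in !heap -p -a 0000003E3C191500 to get more information on the error: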

    address 0000003e3c191500 found in
    _HEAP @ 3e3c150000
              HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
        0000003e3c1914b0 0008 0000  [00]   0000003e3c191500    00001 - (busy)
        7ffc110a83f3 verifier!VerifierDisableFaultInjectionExclusionRange+0x00000000000022ff
        7ffc4e438e98 ntdll!RtlpNtMakeTemporaryKey+0x0000000000000330
        7ffc4e3f333a ntdll!memset+0x00000000000148fa
        7ffc4e3774e7 ntdll!RtlAllocateHeap+0x0000000000000187
        7ffc110c17ab verifier!VerifierGetPropertyValueByName+0x00000000000133cf
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\WINDOWS\SYSTEM32\MSVCR120.dll - 
        7ffc10e96a57 MSVCR120!malloc+0x000000000000005b
*** WARNING: Unable to verify checksum for HeapCorruptor.exe
        7ff7698d1316 HeapCorruptor!CorruptTheHeap+0x00000000000000a6
        7ff7698d1339 HeapCorruptor!wmain+0x0000000000000009
        7ff7698d1e37 HeapCorruptor!__tmainCRTStartup+0x000000000000010f
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\WINDOWS\system32\KERNEL32.DLL - 
        7ffc4bc216ad KERNEL32!BaseThreadInitThunk+0x000000000000000d
        7ffc4e3b4629 ntdll!RtlUserThreadStart+0x000000000000001d

This is great because it shows us some important bits of information, like the call stack that allocated the corrupted block. In particular, we can see that malloc was called from CorruptTheHeap(), so we can pinpoint the problem to be in that function. (The "Unable to verify checksum" warning in the output is only a symbol-loading message for HeapCorruptor.exe and is not part of the corruption report.) Sometimes the stack may show you even more granular methods of underlying framework code that you are using to help you drill down to the actual function causing problems, but in a nutshell that’s essentially it. When you are done, make sure to follow the steps above again and clear all check-boxes for that executable’s name in Global Flags, or else WinDbg will always start with your executable.
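
Once the debugger has pointed at CorruptTheHeap(), the fix itself is small: request at least sizeof(int) bytes and check the result before writing. A minimal sketch of a corrected version, with a hypothetical function name:

```cpp
#include <cstdlib>

// Hypothetical corrected counterpart of CorruptTheHeap(): stores a value in a
// properly sized heap allocation and returns what was stored (or -1 on failure).
int StoreOnHeap(int value)
{
    int *storage = static_cast<int *>(std::malloc(sizeof(int)));
    if (storage == nullptr)
        return -1;
    *storage = value; // in bounds: the block is sizeof(int) bytes long
    const int result = *storage;
    std::free(storage);
    return result;
}
```

Running this version under the same Global Flags settings should no longer trigger a verifier stop, since every write stays inside the block that was allocated.
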

Visual Studio’s Out-of-the-Box Low-Level Debugging Tools: An IL Disassembler and IL Assembler How-To

Ildasm.exe (IL Disassembler) and Ilasm.exe (IL Assembler) are an out-of-the-box disassembler and assembler packaged with Visual Studio. Let’s try using Ildasm.exe and Ilasm.exe to disassemble and then re-assemble a portable executable from a C# Console Application. Take the following C# Console Application for example:

using System;

namespace PrintSomething
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Started the console application. Press any key to exit.");
            Console.ReadLine();
        }
    }
}

Build this code in Debug mode and run it. Figure out where the code was built and copy the path to your executable for this C# Console Application. Then do a search on your computer for the VS2012 x86 Native Tools Command Prompt and open it up (it will help us run the IL Assembler and Disassembler more easily). Try to change the directory somewhere where you have permissions…for me, the desktop is an alright place so the first command I ran was this:

cd "C:\Users\YourUserAccountName\Desktop"

Here comes the sweet stuff. To disassemble your executable, run this command on it (PrintSomething.exe is the name of the Console Application I’ve shown the code for above, but you can paste whatever path you have copied to your own executable):

ildasm "C:\PathToYourExecutable\PrintSomething.exe" /out:"Disassembly.asm"

You can now inspect the disassembled IL code in Disassembly.asm by opening it up in your favourite text editor (you will find Disassembly.asm in your working directory, which I changed to be my desktop earlier):


//  Microsoft (R) .NET Framework IL Disassembler.  Version 4.0.30319.18020
//  Copyright (c) Microsoft Corporation.  All rights reserved.



// Metadata version: v4.0.30319
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                         // .z\V.4..
  .ver 4:0:0:0
}
.assembly PrintSomething
{
  .custom instance void [mscorlib]System.Runtime.Versioning.TargetFrameworkAttribute::.ctor(string) = ( 01 00 1C 2E 4E 45 54 46 72 61 6D 65 77 6F 72 6B   // ....NETFramework
                                                                                                        2C 56 65 72 73 69 6F 6E 3D 76 34 2E 35 2E 31 01   // ,Version=v4.5.1.
                                                                                                        00 54 0E 14 46 72 61 6D 65 77 6F 72 6B 44 69 73   // .T..FrameworkDis
                                                                                                        70 6C 61 79 4E 61 6D 65 14 2E 4E 45 54 20 46 72   // playName..NET Fr
                                                                                                        61 6D 65 77 6F 72 6B 20 34 2E 35 2E 31 )          // amework 4.5.1
  .custom instance void [mscorlib]System.Reflection.AssemblyTitleAttribute::.ctor(string) = ( 01 00 0E 50 72 69 6E 74 53 6F 6D 65 74 68 69 6E   // ...PrintSomethin
                                                                                              67 00 00 )                                        // g..
  .custom instance void [mscorlib]System.Reflection.AssemblyDescriptionAttribute::.ctor(string) = ( 01 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Reflection.AssemblyConfigurationAttribute::.ctor(string) = ( 01 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Reflection.AssemblyCompanyAttribute::.ctor(string) = ( 01 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Reflection.AssemblyProductAttribute::.ctor(string) = ( 01 00 0E 50 72 69 6E 74 53 6F 6D 65 74 68 69 6E   // ...PrintSomethin
                                                                                                67 00 00 )                                        // g..
  .custom instance void [mscorlib]System.Reflection.AssemblyCopyrightAttribute::.ctor(string) = ( 01 00 12 43 6F 70 79 72 69 67 68 74 20 C2 A9 20   // ...Copyright .. 
                                                                                                  20 32 30 31 34 00 00 )                            //  2014..
  .custom instance void [mscorlib]System.Reflection.AssemblyTrademarkAttribute::.ctor(string) = ( 01 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Runtime.InteropServices.ComVisibleAttribute::.ctor(bool) = ( 01 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Runtime.InteropServices.GuidAttribute::.ctor(string) = ( 01 00 24 36 37 36 65 36 33 32 34 2D 38 65 62 34   // ..$676e6324-8eb4
                                                                                                  2D 34 33 33 33 2D 39 33 37 37 2D 39 61 65 37 39   // -4333-9377-9ae79
                                                                                                  38 62 33 31 61 30 39 00 00 )                      // 8b31a09..
  .custom instance void [mscorlib]System.Reflection.AssemblyFileVersionAttribute::.ctor(string) = ( 01 00 07 31 2E 30 2E 30 2E 30 00 00 )             // ...1.0.0.0..

  // --- The following custom attribute is added automatically, do not uncomment -------
  //  .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::.ctor(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = ( 01 00 07 01 00 00 00 00 ) 

  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::.ctor(int32) = ( 01 00 08 00 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::.ctor() = ( 01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78   // ....T..WrapNonEx
                                                                                                             63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 )       // ceptionThrows.
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}
.module PrintSomething.exe
// MVID: {5750456D-A154-4CDE-A849-0BC5414E34EE}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00020003    //  ILONLY 32BITPREFERRED
// Image base: 0x009C0000


// =============== CLASS MEMBERS DECLARATION ===================

.class private auto ansi beforefieldinit PrintSomething.Program
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       19 (0x13)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Started the console application. Press any key to "
    + "exit."
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  call       string [mscorlib]System.Console::ReadLine()
    IL_0011:  pop
    IL_0012:  ret
  } // end of method Program::Main

  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor

} // end of class PrintSomething.Program


// =============================================================

// *********** DISASSEMBLY COMPLETE ***********************
// WARNING: Created Win32 resource file Disassembly.res

Try changing the text in this file. For example, change this line:

IL_0001:  ldstr      "Started the console application. Press any key to "
    + "exit."

To this line:

IL_0001:  ldstr      "That just happened."

Now assemble the code with the following command:

ilasm "Disassembly.asm"

You should see a new executable pop out of this file in your current working directory (for me, it’s the desktop, as I’ve shown before): Disassembly.exe. Run it, and voilà, you’ll see that ilasm reassembled your code (and included any changes you’ve made). Pretty nice for touching up some very low-level code in Visual Studio applications, or even for dynamically changing your application’s manifest information.

Adding or Posting Source Code in a WordPress Blog

This took me a little while to get the hang of because my source code didn’t keep its formatting whenever I uploaded it to my blog. To get source code to format correctly, you need to use the “Text” tab in the “Add New Post” menu instead of the “Visual” tab so that tabs and spaces are preserved. By default, the Visual tab seems to strip spaces and tabs from pasted source code. Here is a screenshot of where you should be when you paste code in:

[Screenshot: where you want to be when you paste code in a WordPress blog.]

Once you’ve pasted it with correct formatting, just wrap your code in [code language="X"] and [/code] tags, where X is one of the following values (the language your code is written in and thus the language you want it to be highlighted as):

  • actionscript3
  • bash
  • clojure
  • coldfusion
  • cpp
  • csharp
  • css
  • delphi
  • erlang
  • fsharp
  • diff
  • groovy
  • html
  • javascript
  • java
  • javafx
  • matlab (keywords only)
  • objc
  • perl
  • php
  • text
  • powershell
  • python
  • r
  • ruby
  • scala
  • sql
  • vb
  • xml

Load Testing CPU Cores in C#

I wrote a small Console application in C# that will nuke your logical processors (by nuke, I mean that it will eat your CPU utilization, and you’ll see each logical processor spool up to 100% usage):

using System;
using System.Runtime.InteropServices;
using System.Threading;

namespace CPUHog
{
    class Program
    {

        [DllImport("kernel32.dll")]
        static extern IntPtr GetCurrentThread();
        [DllImport("kernel32.dll")]
        static extern IntPtr SetThreadAffinityMask(IntPtr hThread, IntPtr dwThreadAffinityMask);

        static void Main(string[] args)
        {
            Console.WriteLine("Starting the CPU hog.");
            Console.WriteLine("Number of logical processors: {0}", Environment.ProcessorCount);
            for (int i = 0; i < Environment.ProcessorCount; i++)
            {
                SpoolThread(i);
            }
            // Console.ReadLine() below waits for the Enter key, not for any key.
            Console.WriteLine("Press Enter to exit the CPU hog.");
            Console.ReadLine();
            Environment.Exit(0);
        }

        // Spools a spinning thread on a given CPU core.
        static void SpoolThread(int core)
        {
            new Thread(() =>
            {
                Thread.BeginThreadAffinity();
                Console.WriteLine("Nuking logical processor {0}.", core);
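                // Build the mask with a 64-bit shift in a 64-bit process so that logical processors 31 and above map to the correct bit.
                IntPtr mask = IntPtr.Size == 8 ? new IntPtr(1L << core) : new IntPtr(1 << core);
                SetThreadAffinityMask(GetCurrentThread(), mask);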
                // The while loop below is what actually does the nuking.
                while (true) { }
            }).Start();
        }
    }
}

To view its effects, use Task Manager (CTRL + SHIFT + ESCAPE) -> Performance tab -> Click on CPU -> Right click the graph on the right-hand side (in the CPU view) -> Change graph to -> Logical processors. You’ll see something like this:

[Screenshot: showcasing the CPUHog in action.]
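
The affinity mask handed to SetThreadAffinityMask is a bit field with one bit per logical processor, so pinning a thread comes down to setting the right bit. Here is a minimal sketch of that arithmetic, written in C++ rather than C#. The function names AffinityMaskForCore and AffinityMaskForCores are made up for illustration, and the sketch assumes a single processor group (at most 64 logical processors).

```cpp
#include <cstdint>
#include <initializer_list>

// Returns a mask with only the bit for one logical processor set.
// The shift is done on a 64-bit value, because shifting a 32-bit 1 by
// 31 or more positions does not produce the intended single-bit mask.
// Valid for core 0 through 63 (hypothetical helper, not a Windows API).
std::uint64_t AffinityMaskForCore(unsigned core) {
    return std::uint64_t{1} << core;
}

// Combines several logical processors into a single mask, for a thread
// that is allowed to run on any of them.
std::uint64_t AffinityMaskForCores(std::initializer_list<unsigned> cores) {
    std::uint64_t mask = 0;
    for (unsigned core : cores) {
        mask |= AffinityMaskForCore(core);
    }
    return mask;
}
```

For example, AffinityMaskForCore(3) gives 8 (binary 1000), and AffinityMaskForCores({0, 2}) gives 5 (binary 101).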

Decoding VOX Files in C# (Converting VOX Files to WAV Files)

I wrote a C# class to decode VOX files into WAV files. It follows the Dialogic ADPCM specification strictly. If you read through that specification, the code below will become a lot clearer; otherwise, you might think you’re reading another language altogether. The specification is really quite simple and nice once you boil it down. Note that VOX files created by NMS Communications libraries use a slightly different file format from the one in the Dialogic ADPCM specification, so the code below will not work for those files without some tweaks.
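
If you have not gone through the specification yet, here is a minimal sketch of the per-sample arithmetic it describes, written in C++ and kept separate from the C# class that follows. The names AdpcmState and DecodeNibble are made up for illustration, and the clamp to plus or minus 2047 is an assumption chosen to match the C# code rather than something taken from the specification.

```cpp
#include <algorithm>
#include <cstdint>

// Step-size table from the Dialogic ADPCM specification (49 entries).
static const int kStepSizes[49] = {
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66, 73,
    80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279,
    307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963,
    1060, 1166, 1282, 1411, 1552 };

// Index adjustment selected by the three magnitude bits of a nibble.
static const int kIndexAdjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

// Hypothetical decoder state: the running signal and the table index.
struct AdpcmState {
    int signal = 0;
    int stepIndex = 0;
};

// Decodes one 4-bit sample and returns the updated signal value.
int DecodeNibble(AdpcmState& state, std::uint8_t nibble) {
    // The current step size is used for this nibble.
    const int step = kStepSizes[state.stepIndex];

    // difference = step*B2 + step/2*B1 + step/4*B0 + step/8
    int difference = step / 8;
    if (nibble & 4) difference += step;
    if (nibble & 2) difference += step / 2;
    if (nibble & 1) difference += step / 4;
    // Bit 3 is the sign bit.
    if (nibble & 8) difference = -difference;

    // Accumulate and keep the signal within the 12-bit range.
    state.signal = std::max(-2047, std::min(2047, state.signal + difference));

    // Adapt the step size for the next nibble, staying inside the table.
    state.stepIndex = std::max(0, std::min(48, state.stepIndex + kIndexAdjust[nibble & 7]));
    return state.signal;
}
```

Feeding the nibbles 0x7, 0xF, and 0x0 into a fresh AdpcmState yields signal values of 30, -33, and -24, with the table index moving from 0 to 8, then 16, then 15.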

My implementation to decode from VOX to WAV files is as follows:

using System;
using System.IO;

class VOXDecoder
{

    static float signal = 0;
    static int previousStepSizeIndex = 0;
    static bool computedNextStepSizeOnce = false;
    static int[] possibleStepSizes = new int[49] { 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552 };

    public static void Decode(string inputFile, out string outputFile)
    {
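        // Reset the decoder state so that Decode can be called more than once.
        signal = 0;
        previousStepSizeIndex = 0;
        computedNextStepSizeOnce = false;
        // Write the WAV file next to the input file. Path.ChangeExtension also works when inputFile has no directory part.
        outputFile = Path.ChangeExtension(inputFile, ".wav");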
        using (FileStream inputStream = File.Open(inputFile, FileMode.Open))
        using (BinaryReader reader = new BinaryReader(inputStream))
        using (FileStream outputStream = File.Create(outputFile))
        using (BinaryWriter writer = new BinaryWriter(outputStream))
        {
            // Note that 32-bit integer values always take up 4 bytes.
            // Note that 16-bit integer values (shorts) always take up 2 bytes.
            // Note that hex literals resolve as 32-bit integers unless cast to something else, such as a short.
            // ChunkID: "RIFF"
            writer.Write(0x46464952);
            // ChunkSize: The size of the entire file in bytes minus 8 bytes for the two fields not included in this count: ChunkID and ChunkSize.
            writer.Write((int)(reader.BaseStream.Length * 4) + 36);
            // Format: "WAVE"
            writer.Write(0x45564157);
            // Subchunk1ID: "fmt " (with the space).
            writer.Write(0x20746D66);
            // Subchunk1Size: 16 for PCM.
            writer.Write(16);
            // AudioFormat: 1 for PCM.
            writer.Write((short)1);
            // NumChannels: 1 for Mono. 2 for Stereo.
            writer.Write((short)1);
            // SampleRate: 8000 is usually the default for VOX.
            writer.Write(8000);
            // ByteRate: SampleRate * BlockAlign. Each sample is written into a 2-byte block, so this is 8000 * 2 = 16000.
            writer.Write(16000);
            // BlockAlign: NumChannels * BitsPerSample / 8. I rounded this up to 2. It sounds best this way.
            writer.Write((short)2);
            // BitsPerSample: I will set this as 12 (12 bits per raw output sample as per the VOX specification).
            writer.Write((short)12);
            // Subchunk2ID: "data"
            writer.Write(0x61746164);
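            // Subchunk2Size: the number of bytes of sample data. Each input byte holds two samples and each sample is written as a 2-byte short, so this is the input length * 4. You can also think of this as the size of the rest of the subchunk following this number.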
            writer.Write((int)(reader.BaseStream.Length * 4));
            // Write the data stream to the file in linear audio.
            while (reader.BaseStream.Position != reader.BaseStream.Length)
            {
                byte b = reader.ReadByte();
                float firstDifference = GetDifference((byte)(b / 16));
                signal += firstDifference;
                writer.Write(TruncateSignalIfNeeded());
                float secondDifference = GetDifference((byte)(b % 16));
                signal += secondDifference;
                writer.Write(TruncateSignalIfNeeded());
            }
        }
    }

    static short TruncateSignalIfNeeded()
    {
        // Keep signal truncated to 12 bits since, as per the VOX spec, each 4 bit input has 12 output bits.
        // Note that 12 bits is 0b111111111111. That's 0xFFF in HEX. That's also 4095 in decimal.
        // The sound wave is a signed signal, so factoring in 1 unused bit for the sign, that's 4095/2 rounded down to 2047.
        if (signal > 2047)
        {
            signal = 2047;
        }
        if (signal < -2047)
        {
            signal = -2047;
        }
        return (short)signal;
    }

    static float GetDifference(byte nibble)
    {
        int stepSize = GetNextStepSize(nibble);
        float difference = ((stepSize * GetBit(nibble, 2)) + ((stepSize / 2) * GetBit(nibble, 1)) + (stepSize / 4 * GetBit(nibble, 0)) + (stepSize / 8));
        if (GetBit(nibble, 3) == 1)
        {
            difference = -difference;
        }
        return difference;
    }

    static byte GetBit(byte b, int zeroBasedBitNumber)
    {
        // Shift the bits to the right by the number of the bit you want to get, and then bitwise AND the result with 1 to clear the bits to the left of your desired bit.
        return (byte)((b >> zeroBasedBitNumber) & 1);
    }

    static int GetNextStepSize(byte nibble)
    {
        // As per the specification, the current step size is used to decode this nibble,
        // and this nibble's magnitude then selects the step size for the next nibble.
        int stepSize = possibleStepSizes[previousStepSizeIndex];
        // Clamp the index so that it always stays inside the 49-entry table (0 to 48).
        previousStepSizeIndex = Math.Max(0, Math.Min(48, previousStepSizeIndex + GetMagnitude(nibble)));
        return stepSize;
    }

    static int GetMagnitude(byte nibble)
    {
        if (nibble == 15 || nibble == 7)
            return 8;
        else if (nibble == 14 || nibble == 6)
            return 6;
        else if (nibble == 13 || nibble == 5)
            return 4;
        else if (nibble == 12 || nibble == 4)
            return 2;
        else
            return -1;
    }
}

It is easily called through the following two lines:

string outputWAVFilePath;
VOXDecoder.Decode(pathToYourVOXFile, out outputWAVFilePath);

Give it a shot with this sample Dialogic ADPCM VOX audio file.
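
The size fields in the WAV header follow directly from the size of the VOX input, which makes for a quick sanity check on a converted file. Here is a minimal sketch of that arithmetic in C++. The names WavSizes, ComputeWavSizes, and DurationSeconds are made up for illustration, and the sketch assumes mono output with each decoded sample stored in a 2-byte slot, as in the decoder above.

```cpp
#include <cstdint>

// Sizes that the WAV header needs, derived from the size of the VOX input.
struct WavSizes {
    std::uint32_t sampleCount;    // number of decoded samples
    std::uint32_t dataBytes;      // Subchunk2Size
    std::uint32_t riffChunkSize;  // ChunkSize
};

WavSizes ComputeWavSizes(std::uint32_t voxBytes) {
    WavSizes sizes;
    // Each input byte holds two 4-bit samples.
    sizes.sampleCount = voxBytes * 2;
    // Each decoded sample is written as a 2-byte short.
    sizes.dataBytes = sizes.sampleCount * 2;
    // The header is 44 bytes; ChunkID and ChunkSize (8 bytes) are not counted.
    sizes.riffChunkSize = 36 + sizes.dataBytes;
    return sizes;
}

// Playback length in seconds (8000 Hz is the usual VOX sample rate).
double DurationSeconds(const WavSizes& sizes, std::uint32_t sampleRate = 8000) {
    return static_cast<double>(sizes.sampleCount) / sampleRate;
}
```

For example, a 4000-byte VOX file decodes to 8000 samples, a 16000-byte data subchunk, a ChunkSize of 16036, and one second of audio at 8000 Hz.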

P/Invoke NotifyServiceStatusChange from C#

This article actually touches on some advanced topics in C#, including some things that you may never have come across. MSDN has this to say about threads:

An operating-system ThreadId has no fixed relationship to a managed thread, because an unmanaged host can control the relationship between managed and unmanaged threads. Specifically, a sophisticated host can use the CLR Hosting API to schedule many managed threads against the same operating system thread, or to move a managed thread between different operating system threads.

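What this is really getting at is thread affinity: depending on the CLR (Common Language Runtime) host that is running your code, you are not guaranteed that a managed thread maps 1-to-1 to a native thread. This is important to know when you P/Invoke into native functions that call back into your C# code after a period of time (such as NotifyServiceStatusChange). NotifyServiceStatusChange delivers its callback as an asynchronous procedure call (APC) queued to the native thread that made the call, and that thread has to be in an alertable wait to receive it. We want to maintain that 1-to-1 relationship using Thread.BeginThreadAffinity() so that the managed thread that registers the callback is still on the same native thread when it waits, and because the marshaling layer needs to have a valid callback reference at all times.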

Here is the code that you can use to P/Invoke NotifyServiceStatusChange in C# in order to wait for a service to stop:

using System;
using System.Runtime.InteropServices;
using System.Threading;

class ServiceAssistant
{
    [StructLayout(LayoutKind.Sequential)]
    public class SERVICE_NOTIFY
    {
        public uint dwVersion;
        public IntPtr pfnNotifyCallback;
        public IntPtr pContext;
        public uint dwNotificationStatus;
        public SERVICE_STATUS_PROCESS ServiceStatus;
        public uint dwNotificationTriggered;
        public IntPtr pszServiceNames;
    };

    [StructLayout(LayoutKind.Sequential)]
    public struct SERVICE_STATUS_PROCESS
    {
        public uint dwServiceType;
        public uint dwCurrentState;
        public uint dwControlsAccepted;
        public uint dwWin32ExitCode;
        public uint dwServiceSpecificExitCode;
        public uint dwCheckPoint;
        public uint dwWaitHint;
        public uint dwProcessId;
        public uint dwServiceFlags;
    };

    [DllImport("advapi32.dll")]
    static extern IntPtr OpenService(IntPtr hSCManager, string lpServiceName, uint dwDesiredAccess);

    [DllImport("advapi32.dll")]
    static extern IntPtr OpenSCManager(string machineName, string databaseName, uint dwAccess);

    [DllImport("advapi32.dll")]
    static extern uint NotifyServiceStatusChange(IntPtr hService, uint dwNotifyMask, IntPtr pNotifyBuffer);

    [DllImport("kernel32.dll")]
    static extern uint SleepEx(uint dwMilliseconds, bool bAlertable);

    [DllImport("advapi32.dll")]
    static extern bool CloseServiceHandle(IntPtr hSCObject);

    delegate void StatusChangedCallbackDelegate(IntPtr parameter);

    /// <summary> 
    /// Block until a service stops, is killed, or is found to be already dead.
    /// </summary> 
    /// <param name="serviceName">The name of the service you would like to wait for.</param>
    /// <param name="timeout">An amount of time you would like to wait for. uint.MaxValue is the default, and it will force this thread to wait indefinitely.</param>
    public static void WaitForServiceToStop(string serviceName, uint timeout = uint.MaxValue)
    {
        // Ensure that this thread's identity is mapped, 1-to-1, with a native OS thread.
        Thread.BeginThreadAffinity();
        GCHandle notifyHandle = default(GCHandle);
        StatusChangedCallbackDelegate changeDelegate = ReceivedStatusChangedEvent;
        IntPtr hSCM = IntPtr.Zero;
        IntPtr hService = IntPtr.Zero;
        try
        {
            // 0xF003F is SC_MANAGER_ALL_ACCESS, so the caller generally needs to run elevated.
            hSCM = OpenSCManager(null, null, (uint)0xF003F);
            if (hSCM != IntPtr.Zero)
            {
                hService = OpenService(hSCM, serviceName, (uint)0xF003F);
                if (hService != IntPtr.Zero)
                {
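                    SERVICE_NOTIFY notify = new SERVICE_NOTIFY();
                    // 2 is SERVICE_NOTIFY_STATUS_CHANGE, the structure version that the API expects.
                    notify.dwVersion = 2;
                    notify.pfnNotifyCallback = Marshal.GetFunctionPointerForDelegate(changeDelegate);
                    notify.ServiceStatus = new SERVICE_STATUS_PROCESS();
                    // Pin the structure so that its address stays valid while the native side holds on to it.
                    notifyHandle = GCHandle.Alloc(notify, GCHandleType.Pinned);
                    IntPtr pinnedNotifyStructure = notifyHandle.AddrOfPinnedObject();
                    // 0x00000001 is SERVICE_NOTIFY_STOPPED. A return value of 0 (ERROR_SUCCESS) means the notification was registered.
                    if (NotifyServiceStatusChange(hService, (uint)0x00000001, pinnedNotifyStructure) == 0)
                    {
                        // Wait alertably so that the callback, which arrives as an APC, can run on this thread and end the wait.
                        SleepEx(timeout, true);
                    }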
                }
            }
        }
        finally
        {
            // Clean up at the end of our operation, or if this thread is aborted.
            if (hService != IntPtr.Zero)
            {
                CloseServiceHandle(hService);
            }
            if (hSCM != IntPtr.Zero)
            {
                CloseServiceHandle(hSCM);
            }
            // Keep our callback method around until it is called (until this line of code).
            GC.KeepAlive(changeDelegate);
            if (notifyHandle.IsAllocated)
            {
                notifyHandle.Free();
            }
            Thread.EndThreadAffinity();
        }
    }

    static void ReceivedStatusChangedEvent(IntPtr parameter)
    {
        // Do nothing.
    }
}

It’s so simple that it can be called as follows:

ServiceAssistant.WaitForServiceToStop("YourWindowsServiceName");

Note that this is significantly different from the ServiceController.WaitForStatus method that is available to you out of the box in .NET, because WaitForStatus polls, sleeping for about 250 ms between status checks according to its remarks, whereas NotifyServiceStatusChange is event-driven and subscribes to that particular event (so it’s less overhead in terms of CPU usage).