Windows Error Reporting

Posted by Matt

Microsoft has created their own error reporting system that’s built into Windows XP and Vista.  These operating systems will automatically collect crash data for any application that crashes and upload it to Microsoft’s servers.  There the data is collected, sorted, and distributed to the companies creating the offending software.  I’ve been using the system with my software for quite some time now, so I figured I’d collect my thoughts on it for people thinking about using it.

From your software’s point of view, there’s nothing you really need to do to get it working.  When your software crashes, Windows will automatically catch the error, as long as you’re not catching it yourself.  It’s helpful though if you properly version your binaries (EXE and all DLLs).  This makes reading the logs later on easier.

In order to see your software’s crash logs, you need to have a code signing certificate.  This is required to create your company’s account with winqual, and the certificate costs some money.  I’d prefer it if we were able to use the system without this requirement.

Once logged in, you’ll notice that the winqual system does more than just error reporting, but that will be outside the scope of this article. 

In order to view reports, you’ll need to inform the system about your product.  Winqual has a small downloadable application that will inspect your binaries and create a file mapping XML file for your product.  Once you upload the XML file to winqual, the system will start collecting data for your product.

The system allows you to see a couple of different reports:  all crashes by version and hottest crashes by version.  These are the most common ones you’ll use.  It’s nice that it breaks the reports down by product and version.  You’re able to see a list of crashes, sorted by occurrences and counted.  You can see a date break-down for each crash of when the crash occurred.  For many of the crashes, you’re then able to download the crash data for debugging.

What’s missing though is the ability to search for crashes by date.  Given a date, you cannot see what crashes occurred.  This would be useful when you know that a crash occurred on a certain day, because it would then be easier to track down.

As it is now, the system is great for fixing your most common crashes.  However, the system is not good for finding and fixing particular crashes.  There’s no way to link a crash to a customer.  Microsoft says this is intentional.  However, in the end, as a support tool, it fails.

Also, it often takes a week to a week-and-a-half to get updated with crash data.  So you’ll be waiting a while for a particular crash to appear.

On the good side, when you do look at a particular crash and download the crash data, you’re able to easily load the data in Visual Studio 2008 and see the crash as if it occurred locally.  Nice touch.  The initial data does not contain heap data, but you can later instruct the system to collect heap data on subsequent crashes.  So produce debugging information for your release binaries and archive it.

If you’re planning on certifying your product, the use of winqual error reporting is a requirement.  We were disappointed with this because we had our own error reporting system that we feel better serves support issues.  We got our report data immediately, and we are able to search the data any way we see fit.  Because we certified our product, we had to abandon our better system for Microsoft’s.


Selecting Multiple Files using CFileDialog, Properly

Posted by Matt

I ran into an issue today.  I had a CFileDialog object set to select multiple files (using OFN_ALLOWMULTISELECT).  When I launched the dialog, I selected 24 files.  However, after looping through the files using CFileDialog::GetNextPathName(), there were only 10 being returned.

The problem is that there wasn’t an error being set anywhere (that I could find).  CFileDialog was being quite sneaky in that it wasn’t giving me any signal that 14 of the selected files were basically being ignored.  At first, I thought it was just a bug in Windows.  After some thinking, it turns out that there’s an internal buffer used to store these files.  This may seem very obvious to some people, but it wasn’t obvious to me.

Part of the OPENFILENAME structure is a buffer for holding (a) the default file when the dialog is opened, and (b) the selected files when the dialog is closed.  This member is called lpstrFile.  nMaxFile is also used in conjunction to specify the length of the buffer pointed to by lpstrFile.

By default, CFileDialog uses a buffer 260 characters long.  In my case above, 260 characters could hold only 10 of the selected files.  The rest were dropped silently.
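To see why 260 characters run out so quickly, it helps to look at how the buffer is filled. For Explorer-style dialogs, the OPENFILENAME documentation describes the multi-select result as the directory, a NUL, then each file name followed by a NUL, with one extra NUL at the end. The sketch below is portable C++ that only models that layout with narrow characters; requiredChars and splitSelection are hypothetical helper names, not part of MFC or the Win32 API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Characters needed for the Explorer-style multi-select layout:
//   directory NUL name1 NUL name2 NUL ... nameN NUL NUL
std::size_t requiredChars(const std::string& directory,
                          const std::vector<std::string>& names)
{
    std::size_t total = directory.size() + 1;   // directory plus its NUL
    for (const std::string& name : names)
        total += name.size() + 1;               // each name plus its NUL
    return total + 1;                           // extra NUL that ends the list
}

// Split a buffer in that layout back into full paths.
// A single selection is stored as one complete path, so it is returned as-is.
std::vector<std::string> splitSelection(const char* buffer)
{
    std::vector<std::string> parts;
    for (const char* p = buffer; *p != '\0'; p += parts.back().size() + 1)
        parts.emplace_back(p);

    if (parts.size() <= 1)
        return parts;

    // parts[0] is the directory; avoid a doubled backslash for roots like "C:\".
    std::string prefix = parts[0];
    if (prefix.empty() || prefix.back() != '\\')
        prefix += '\\';

    std::vector<std::string> paths;
    for (std::size_t i = 1; i < parts.size(); ++i)
        paths.push_back(prefix + parts[i]);
    return paths;
}
```

As an illustration, 24 names of 14 characters each under a 22-character directory need 384 characters, well past the default 260. In real code, CFileDialog::GetNextPathName() already does the splitting for you; the helper here is only meant to make the layout visible.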

The workaround is to use a larger buffer.  It’s quite easy to do this:

// Create our dialog object
CFileDialog dialog(TRUE, NULL, NULL, OFN_HIDEREADONLY | 
  OFN_ALLOWMULTISELECT);

// This is the trick
const int nBufferSize = 128*1024; // May be excessive?
TCHAR *szBuffer = new TCHAR[nBufferSize];
memset(szBuffer, 0, sizeof(TCHAR) * nBufferSize);
dialog.m_ofn.lpstrFile = szBuffer;
dialog.m_ofn.nMaxFile = nBufferSize - 1;

// Show the dialog
if (dialog.DoModal() == IDOK)
{
  POSITION pos = dialog.GetStartPosition();
  while (pos != NULL)
  {
    CString sFile = dialog.GetNextPathName(pos);
    // Process the file
  }
}

delete[] szBuffer; // No leaks

The trick is to set lpstrFile and nMaxFile to a larger buffer.  128k may be excessive, but at least it will work.  Feel free to use whatever size buffer you want to.

Note: Above, I created the buffer on the heap using ‘new’.  If you don’t want to deal with ‘new’ and ‘delete’, you could create it on the stack like:

TCHAR szBuffer[nBufferSize];

and skip the ‘delete’ call.  However, this will eat up a good chunk of your stack space.  The default stack is 1 MB, and 128K TCHARs take 128 KB in an ANSI build (about 12% of the stack) or 256 KB in a Unicode build (about 25%).  For this reason, I created it on the heap.
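A third option, assuming a standard container is acceptable in your code base, is to let std::vector own the buffer. It lives on the heap, starts out zero-filled, and is released automatically even if something throws while the files are being processed. The sketch below is portable: FileBufferFields is a hypothetical stand-in for the two OPENFILENAME members used above, and attachBuffer is a made-up helper. In the dialog code you would assign to dialog.m_ofn.lpstrFile and dialog.m_ofn.nMaxFile instead, with a std::vector<TCHAR>.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two OPENFILENAME members used in this post,
// so the sketch compiles without the Windows headers.
struct FileBufferFields
{
    char*         lpstrFile;  // buffer that receives the selected files
    unsigned long nMaxFile;   // size of that buffer, in characters
};

// Point the fields at storage owned by a vector.
// The vector must outlive the dialog call that uses the fields.
void attachBuffer(FileBufferFields& ofn, std::vector<char>& storage)
{
    ofn.lpstrFile = storage.data();
    ofn.nMaxFile  = static_cast<unsigned long>(storage.size());
}
```

With this approach there is no ‘delete[]’ to forget, and the stack is left alone.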

Macromedia Flash as a performance enhancer, solved.

Posted by Matt

Well, after much struggling and many uses of the AQTime profiler, the issue has been solved.  The problem was not in the painting, which was where I was looking initially.  My first hypothesis (along with most of my others) was that the video card was drawing slowly, but Flash somehow kicked up the video acceleration, or something to that effect.  But I knew it wasn’t necessarily a hardware problem because a quick MFC test application worked fine.  So it must be something in our code.

That entire time, I was looking at the wrong end of the pipeline.   I was looking at the paint events and other message handlers.  The problem was at the message pump end.

Many Windows programs have a message pump that looks like this:

MSG msg;
while (GetMessage(&msg, NULL, 0, 0))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

We needed some custom functionality to handle idle-time functions, so our pump looked different:

for (;;)
{
    MSG msg;
    yield(); // So we don't hog the CPU
    if (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    else
    {
        doIdleWork();
    }
}

What this ended up doing is yielding the CPU between messages, even though we had messages still pending.

Moving the yield to after the idle work did the trick:

for (;;)
{
    MSG msg;
    if (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    else
    {
        doIdleWork();
        yield(); // So we don't hog the CPU
    }
}

The buttons paint faster, and theoretically, the application should be more responsive.  This chunk of code is over 10 years old, written for Windows 95/98.   We’ve noticed this problem for a while, but were unable to fix it until now.  All because we were looking in the wrong place.  In the end, it was a one-line fix.
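To make the difference between the two loops concrete, here is a small portable simulation. It is not Win32 code: the message queue is a plain std::queue, nothing is really yielded, and PumpStats and runPump are hypothetical names. It only counts how often each loop ordering would give up the CPU while messages are still waiting.

```cpp
#include <cstddef>
#include <queue>

struct PumpStats
{
    std::size_t dispatched = 0;             // messages handled
    std::size_t yieldsWithWorkPending = 0;  // yields while messages were waiting
    std::size_t yieldsWhenIdle = 0;         // yields with an empty queue
};

// yieldFirst == true mirrors the original loop (yield before every peek).
// yieldFirst == false mirrors the fixed loop (yield only after idle work).
// The loop stops after idlePasses idle iterations so the simulation terminates.
PumpStats runPump(std::queue<int> pending, bool yieldFirst, std::size_t idlePasses)
{
    PumpStats stats;
    std::size_t idleSeen = 0;
    while (idleSeen < idlePasses)
    {
        if (yieldFirst)
        {
            if (pending.empty())
                ++stats.yieldsWhenIdle;
            else
                ++stats.yieldsWithWorkPending;
        }

        if (!pending.empty())
        {
            pending.pop();          // stands in for Translate/DispatchMessage
            ++stats.dispatched;
        }
        else
        {
            ++idleSeen;             // stands in for doIdleWork()
            if (!yieldFirst)
                ++stats.yieldsWhenIdle;
        }
    }
    return stats;
}
```

With five queued messages, the first ordering yields once per message before it ever reaches the idle branch, while the second dispatches all five back to back and yields only when there is nothing left to do.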

Macromedia Flash as a performance enhancer?

Posted by Matt

My application has run into an interesting problem on some computers:  it draws very slowly (especially toolbars).  However, on those same computers, if you go to a webpage that has a Macromedia Flash animation on it, the painting suddenly becomes lightning fast.

It could be Windows, it could be the video driver, it could be our code.

There are some common threads between the computers that are affected: many are Dells, but not all.

We cannot explain it.

The problem is that this is one of those things where, once you start profiling or debugging, you’re affecting the performance outcome, which leads you down many false trails.  These are the worst types of problems to debug.

Tutorial: How to Get a List of Available Network Interfaces

Posted by Matt

If you're creating a network application, often you need to know if you have an active network connection to a LAN.  (Note that this is not the same thing as determining if you have a valid Internet connection.)  Winsock has a function which will let you get a list of available network interfaces:  WSAIoctl.

Using WSAIoctl, you can get a list of interfaces on the computer as well as some status information about them.  For example, it will tell you if the connection is up, loopback, point-to-point, supports multicasting, or supports broadcasting.

To get the list, you would use:

INTERFACE_INFO interfaces[32];
unsigned long nReturned = 0;
int nRet = WSAIoctl(s, // an open socket handle, e.g. from socket(AF_INET, SOCK_DGRAM, 0)
    SIO_GET_INTERFACE_LIST, 
    0, 
    0, 
    &interfaces,
    sizeof(INTERFACE_INFO) * 32, 
    &nReturned, 
    0, 
    0);

if (nRet == SOCKET_ERROR) 
{
    int nError = WSAGetLastError();
    _tprintf(_T("Error getting interface list: %i\n"), nError);
    return;
}

After this, your list of interfaces will be in the array and the number of interfaces can be determined by:

int nNumInterfaces = nReturned / sizeof(INTERFACE_INFO);

From here, you can iterate through the interfaces checking for a connected interface.

for (int i = 0; i < nNumInterfaces; ++i) 
{
    INTERFACE_INFO *pIf = &interfaces[i];
    if ((pIf->iiFlags & IFF_LOOPBACK) != 0)
        continue;

    if ((pIf->iiFlags & IFF_UP) == 0)
        continue;

    // If you get here, you have a valid
    // interface to the LAN.
}

In the above case, we're looking for an interface that is up and not a loopback.
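The same flag test can be pulled into a small helper, along with a function that spells the flags out for logging. This is a portable sketch: the constants in the sketch namespace are assumptions that mirror the values I understand the Winsock headers to give the IFF_* macros, and isUsableLanInterface and describeFlags are hypothetical names. In a Windows build, use the real IFF_* macros and pass in iiFlags.

```cpp
#include <string>

namespace sketch
{
    // Assumed to match IFF_UP, IFF_BROADCAST, IFF_LOOPBACK,
    // IFF_POINTTOPOINT and IFF_MULTICAST from the Winsock headers.
    const unsigned long kUp           = 0x01;
    const unsigned long kBroadcast    = 0x02;
    const unsigned long kLoopback     = 0x04;
    const unsigned long kPointToPoint = 0x08;
    const unsigned long kMulticast    = 0x10;
}

// True for an interface that is up and is not the loopback interface.
bool isUsableLanInterface(unsigned long flags)
{
    return (flags & sketch::kUp) != 0 && (flags & sketch::kLoopback) == 0;
}

// Human-readable list of the flags that are set, separated by spaces.
std::string describeFlags(unsigned long flags)
{
    std::string text;
    auto add = [&text](const char* name)
    {
        if (!text.empty())
            text += ' ';
        text += name;
    };
    if (flags & sketch::kUp)           add("up");
    if (flags & sketch::kBroadcast)    add("broadcast");
    if (flags & sketch::kLoopback)     add("loopback");
    if (flags & sketch::kPointToPoint) add("point-to-point");
    if (flags & sketch::kMulticast)    add("multicast");
    return text;
}
```

Inside the loop above, the two ‘continue’ checks would collapse to a single call such as isUsableLanInterface(pIf->iiFlags).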

There is some other useful information in the INTERFACE_INFO structure:  the IP address of the interface, its netmask, its broadcast address, whether the connection is point-to-point, and whether it supports multicasting or broadcasting.
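Once you have the address and netmask, the network address and the directed broadcast address follow from simple bit arithmetic, which is handy for cross-checking whatever the system reports. The sketch below is portable and works on host-order 32-bit values parsed from dotted-quad text; parseIPv4, formatIPv4, networkAddress, and directedBroadcast are hypothetical helper names. On Windows you would instead read the IPv4 address out of the structure's sockaddr members and convert it with ntohl first.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Parse "a.b.c.d" into a host-order 32-bit value. Returns false on bad input.
bool parseIPv4(const std::string& text, std::uint32_t& out)
{
    unsigned a = 0, b = 0, c = 0, d = 0;
    char extra = 0;
    // Exactly four fields must match; a fifth match means trailing junk.
    if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return false;
    out = (static_cast<std::uint32_t>(a) << 24) |
          (static_cast<std::uint32_t>(b) << 16) |
          (static_cast<std::uint32_t>(c) << 8)  |
           static_cast<std::uint32_t>(d);
    return true;
}

// Format a host-order 32-bit value as "a.b.c.d".
std::string formatIPv4(std::uint32_t value)
{
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "%u.%u.%u.%u",
                  static_cast<unsigned>((value >> 24) & 0xFF),
                  static_cast<unsigned>((value >> 16) & 0xFF),
                  static_cast<unsigned>((value >> 8) & 0xFF),
                  static_cast<unsigned>(value & 0xFF));
    return buffer;
}

// Network address: keep only the bits covered by the mask.
std::uint32_t networkAddress(std::uint32_t ip, std::uint32_t mask)
{
    return ip & mask;
}

// Directed broadcast: set every host bit (the bits outside the mask).
std::uint32_t directedBroadcast(std::uint32_t ip, std::uint32_t mask)
{
    return ip | ~mask;
}
```

For example, an interface at 192.168.1.37 with netmask 255.255.255.0 sits on network 192.168.1.0 and has the directed broadcast address 192.168.1.255.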

I will reiterate that this does not tell you if you are connected to the Internet, nor will it tell you whether there's anything on the other end to communicate with.