The case of a hard to reproduce bug



This is the story of a bug that was hard to reproduce. It is about the investigation and it shows how well intended code can make bugs even harder to reproduce. The bug (and the fix) is trivial. However, a combination of factors make it remarkable.

When Know Inc. starts for the first time after a clean installation, it displays a popup window with an introduction to the game. After the dialog closes it should not show up again. However, on some installations, the dialog would still appears on the second or even third game start. This would happen until a mission was started. Then, it would stop showing up no matter what. The bug was reported about a month ago and until two days ago I was not able to reproduce it.

I uninstalled the game on my phone and reinstalled it from the Store. Then I noticed that every time I start the game, I get the intro popup. On Windows Phone, if you install the game from the Store you cannot access the isolated storage where the files are stored. That can only be done for side-loaded applications. So I had the bug reproducing on my phone but no way to investigate.

I uninstalled the store version of the game and installed the developer version so I can debug. Guess what? I couldn’t reproduce it anymore. At least I knew that it is somehow reproducible on my phone. I tried to remember what I did with the store version before I noticed the bug. I opened the game… clicked the Story button… probably closed the game… opened it again… started a mission… dismissed the how to play dialog… closed the game… I knew it was reproducible, I just had to find the correct steps. Tried for an hour or so but nothing. At that point, I didn’t know that it only works on clean installations (you’ll see soon why). No luck, so that was it, no repro.

I was working on a new feature when I saw the bug again. Eureka! Repro on developer build. The first thing I did was to enable all the exceptions to be thrown. I noticed that a SerializationException was thrown but it was hidden by a catch. The code below makes sure that even if the player profile file gets corrupted, the game can still recover and it doesn’t crash. The invalid profile is discarded and a new profile is created. While not a nice solution, it prevents situations when the entire game gets in a corrupted state.

PlayerProfile profile = null;
try
{
    using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
    {
        if (store.FileExists(ProfileFileName))
        {
            using (IsolatedStorageFileStream profileFile = store.OpenFile(ProfileFileName, FileMode.Open, FileAccess.Read))
            {
                DataContractSerializer serializer = new DataContractSerializer(typeof(PlayerProfile));
                profile = serializer.ReadObject(profileFile) as PlayerProfile;
            }
        }
    }
}
catch (SerializationException)
{
    profile = new PlayerProfile();
}
if (profile == null)
{
    profile = new PlayerProfile();
}

A SerializationException means that whatever was saved before is invalid. (Ugh!) The player profile file is an XML that stores information about the completed missions and some flags about what dialogs were already displayed. If this file is corrupted, a new profile is created and all the flags are reset - it makes sense now why the game intro shows on every game start. However, the question are “why?” and “how to fix it?”.

I looked over the code that saves the game and there was nothing obviously wrong:

DataContractSerializer serializer = new DataContractSerializer(typeof(PlayerProfile));
using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
{
    using (IsolatedStorageFileStream profileFile = store.OpenFile(ProfileFileName, FileMode.OpenOrCreate, FileAccess.Write))
    {
        serializer.WriteObject(profileFile, profile);
    }
}

However, I downloaded the profile file from the isolated storage and indeed it was invalid (notice the extra > at the end of the file):

<profile xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/VH.Cresson.Game">
  <notifications>
    <howtoplay>false</howtoplay>
    <introgame>true</introgame>
  </notifications></p>
<played>0</played>
  <story />
  <won>0</won>
</profile>>

How is that even possible? Is that a bug in the XML data serializer?

Pause here for a minute and try to see if you can find the bug yourself.

Did you figure it out? The bug is in the code that saves the profile file:

DataContractSerializer serializer = new DataContractSerializer(typeof(PlayerProfile));
using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
{
    using (IsolatedStorageFileStream profileFile = store.OpenFile(ProfileFileName, FileMode.OpenOrCreate, FileAccess.Write))
    {
        serializer.WriteObject(profileFile, profile);
    }
}

Notice the FileMode on the highlighted line. The mode means that if the file exists, it is opened and the existing content is preserved in the stream. The serializer overwrites the content of the stream. However, if the XML representation of the new profile is shorter than the previous one, there will be some leftovers from the old profile. Here is an example:

Before (on disk):

<profile>Hello world</profile>

Object to write:

<profile>Hello</profile>

After (on disk):

<profile>Hello</profile>ofile>

But now it gets interesting. Why was this so hard to find? Because in the case of Know Inc., most of the changes to the profile are additive. However, in the case of popups, the flag gets changed from false to true (5 characters -> 4 characters). Which means that unless other additive changes are made to the profile, we get one character from the old profile file (the extra >).

What is more interesting, is that almost no one will hit this bug because all the boolean flags are changed only before a mission is won. There are no flags that get changed after that. You can call this a lucky coincidence.

As I mentioned in the beginning of the post, the fix is trivial: change the file mode from OpenOrCreate to CreateNew. This way, the stream is empty by default.