I had a very nice design and setup ready for my Hextille project when I decided to dabble with serialization. Like the old sweater where you pull on one string only to dislodge another, my design had to change to allow for serialization. In the end, I just wanted to scream “Not now, silent singer. Not now!” Basically what I want to accomplish is to serialize a graph (vertices and edges) including support for cyclic references, generics, child nodes, enumerated types, etc.
I encountered a fair amount of pain trying to get various forms of serialization working. There’s a nice long article which may or may not ever see the light of day about that. Regardless, this post is specifically about how fast certain serialization techniques.
So, what forms of serialization are there? The main ones covered with any depth here will be:
|[Serializable]||Simplest and easiest implementation. You tag each object with the [Serializable] attribute and let the serializer work it out.|
|ISerializable Interface||Adding the ISerializable interface to your class objects lets you have fine-grained control over how things are serialized.|
|XML Attribute Serialization||Similar to the [Serializable] attribute, but outputs XML.|
|IXmlSerializable Interface||Similar to adding the ISerializable interface to your objects, but allows you to write output via an XML Serializer|
|JSON||Similar to XML serialization, except it produces somewhat more readable text with less verbosity.|
|Binary||Involves simply dumping members of your object(s) to a binary stream.|
Here’s the thing: every form of serialization has some form of drawback.
|Type||How it “Brings The Pain”|
|[Serializable]||Produces files that are a fair bit bigger than one might expect. Isn’t very fast (see below)|
|ISerializable Interface||Implementing the ISerializable Interface on your classes means that you can’t use the default [Serializable] method. Once you start implementing the interface, it’s all or nothing. Suffers performance issues (see below).|
|XML Serialization||For collections, only works on those which implement ICollection or IEnumerable. That means serialization of Dictionary<> is not possible without a custom solution (and once you use that solution, things get messy if you want to also use [Serializable]). Suffers massive performance and size issues (see below).|
|JSON||JSON produces very nice-looking, readable and compact output, except you can’t export a dictionary that isn’t <string, object>.|
|Binary||Saving things as binary-only loses all type information and has potential byte-order problems.|
There is a more in-depth writeup about the pros and cons of serialization techniques by Iván Loire.
[It's a Big, Big, Big World Out There]
When creating a Hextille<> data structure, you pass in a radius which represents how many cells you can go between the origin and the edge of the “map”. The formula for determining how many cells there are is as follows:
Each cell (vertex) contains path information (edges) to adjacent cells, including the destination, a cost and whether the path is open. For a Hextille<> with a radius of 1000, you’re looking at 3003001 vertices (the center cell is the odd one out) and 18006000 edges. The benefits (or at least the detriments) of this implementation are moderately clear. On the one hand, since this isn’t implemented using flat arrays, you need to have some contextual information about your location and the destination location. That adds a little bit of overhead to each cell and each path. On the other hand, you are given the ability to have a true graph structure and quick lookups, not to mention the ability to have a path to non-adjacent nodes. Since I’m also storing references to adjacent nodes, that creates a situation where memory becomes an issue at about R=1200. One positive aspect of this system is that extensions to the HexNode class don’t proportionally add much more to the map sizes.
For most potential games using this system, I don’t foresee needing more than R=100 for any individual map (and maps can be stitched together regardless).
[I Feel The Need... The Need For Speed]
So, given the above list of serialization options, how did they fare?
The first thing I did was to try out [Serializable]. The speed was poor, and every subsequent test just produced worse results:
The methodology employed was to create a Hextille<> of radius R (0 to 99), save it and then reload it. Since there was some overhead involved in creating the file, and saving a single cell time was so small, I tossed out all the 0-radius times. I used a binary formatter to serialize.
From the above it becomes obvious that the best “serialization” technique is to simply use the default [Serializable] attribute tag. For raw speed though, nothing beat simply outputting the data as binary. More on that below. The worst result — the IXmlSerializable Interface — took nearly 10 seconds to save a Hextille<> of radius=100. One might be able to get away with that in a map editor, but for anything real-time that just won’t work.
Loading times produced interesting results:
Here, again, we see [Serializable] as the best choice for “serialization”. What I found surprising was that the loading times for using either interface method came out about the same. Still, the loading time for a Hextille<> with R=99 took nearly 25 seconds to load. That’s simply unacceptable for a game.
However, the BinaryReader was blazingly fast. So fast that I quadruple-checked to ensure it was working properly. Want to know how fast it is? When creating a Hextille<>, the constructor will link the cells together with proper path information, which also contains a reference to the destination node. Obviously you cannot serialize a reference to an object in memory, so that information is lost when saving. To fix this, one has to re-link the references at load time. This re-linking isn’t included in the timing for [Serializable], ISerializable, or IXmlSerializable. Only the BinaryReader loading technique implemented the re-linking (so I could check it was importing properly). The time above for BinaryReader includes the re-linking. It’s that fast.
["All Programming Is An Exercise In Caching" -- Terje Mathisen]
“Okay”, I hear you think, “But what if the data is fitting nicely into the file cache when using the BinaryReader?” Well, that’s a good point. Outputting just the binary loses all type information, and everything is nicely aligned and small. What about that?
That’s obviously not the issue. A Hextille<> with a radius=250 has 188251 vertices and 1126500 edges and still it only took about 2.5 seconds to handle.
Even with the BinaryReader/Writer being the clear winner for storing or transmitting data, I still see uses for going with the [Serialization] attribute method. Losing type information is a big deal in situations where you may not know how the object was originally stored. It’s also ridiculously simple to add support to a project. I am a big fan of the fire-and-forget methodology, and so I will be revisiting using [Serialization] in the future.
Using XML to store data that doesn’t need to be edited by a human with a text-editor is a bad idea. There’s very little benefit, and if you’re going to go that route you might as well use something like JSON, provided you don’t need to store dictionaries that aren’t <string,object>. (There may be a fix for that, I’m still looking and will report back when I know with any certainty).
Overall, the most surprising thing was that using the ISerializable Interface turned out to be slower than the built-in method. I would have figured it to be faster since you control exactly how the object gets serialized. In a future post I’ll track down exactly why this is the case.