Last few months I have been working on a highly concurrent blockchain system. We choose Go language as it has some nice language level constructs to deal with concurrency. The overall experience of using Go for the project had been very pleasant because of the fast compilation time and server startup time. These two along make it very productive for the development team.
There was one nagging issue that took a while to figure out. The protocol requires collecting a set of unique tickets. These tickets were stored as an array within a struct representing the block (of the block chain). Occasionally, we see the block being received to contain duplicate tickets. This should never happen. We took care that this never happens with a mutex as tickets can be added concurrently (very rarely).
It turns out the reason for the duplicates is a combination of concurrency, reflection and marshalling. That is, while the struct is getting marshalled, the slice is getting simultaneously updated resulting in this problem. The reason it took a while to figure this out is, the modification is done in a such a way that it doesn’t preserve the order of the original tickets when merging another set of tickets.
If you look at the code in Msgpack, at
https://github.com/vmihailenco/msgpack/blob/master/encode_slice.go
it first gets the length of the array and iterates over that. So, while the iteration is happening, even if the array grows, it will only marshall up to the initial length. That’s not the problem. The real problem is, the reflection on a struct field always returns the latest value in the field. So, even if the array field has been changed, the reflected value will always gives the array field of the struct and the Index function will return the latest value in the index. Obviously this is the right behavior as reflection shouldn’t be working off of copies of data as of a call to Value or Field.
So, what was the fix? Ensuring that the merging happens by only appending new tickets and not disturbing the order of existing tickets. It was OK to marshall only a subset of tickets based on the length at the time of calling the encoding function.
One interesting aspect of this is that when dealing with marshalling, it’s practically not possible to do fine grained locking to the field level since the marshalling is for the enclosed object (immediate or multiple levels above).