Ok, let’s first have a look at the vertex computation class, SimpleHopsComputation. In the compute method you can see the message type and the type of the internal vertex state (SimpleHopsMessage and SimpleHopsVertexValue). You can also see that the computation is kicked off by asking all neighbors for the number of hops to the other vertices.
After that, a lot of messages are floating around, and each of them needs to be processed. A message can be processed in one of three ways:
The current vertex is not the final destination of the message. In this case, we just forward it to all other neighbors.
The current vertex is the final destination of the message. In this case, the vertex sends the message back to its origin, so that the origin vertex learns the hop count between the two vertices.
The message has already reached its final destination and come back. In this case, we only need to update the number of hops to the other vertex in our internal state, and we are done.
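The three cases above can be sketched as a small decision function. This is only an illustration of the branching, not the actual compute method: the class HopsMessageSketch, its fields (originId, destinationId, hops, destinationReached) and the MessageAction enum are names I made up for this sketch, and it assumes that a reply is sent directly to the origin vertex instead of being relayed hop by hop.

```java
// Hypothetical stand-in for SimpleHopsMessage; all field names are assumptions.
final class HopsMessageSketch {
    final long originId;              // vertex that asked for the hop count
    final long destinationId;         // vertex the question is about
    final int hops;                   // hops travelled so far
    final boolean destinationReached; // true once the destination has answered

    HopsMessageSketch(long originId, long destinationId, int hops, boolean destinationReached) {
        this.originId = originId;
        this.destinationId = destinationId;
        this.hops = hops;
        this.destinationReached = destinationReached;
    }
}

// The three ways a message can be handled, as described in the text.
enum MessageAction {
    FORWARD_TO_NEIGHBORS,
    REPLY_TO_ORIGIN,
    RECORD_HOP_COUNT
}

final class MessageRouting {
    // Decides what the vertex with id currentVertexId should do with the message.
    static MessageAction classify(long currentVertexId, HopsMessageSketch message) {
        if (message.destinationReached) {
            // The message came back from its destination: store the hop count.
            return MessageAction.RECORD_HOP_COUNT;
        }
        if (message.destinationId == currentVertexId) {
            // We are the destination: answer the origin.
            return MessageAction.REPLY_TO_ORIGIN;
        }
        // We are just a stop on the way: pass the message on.
        return MessageAction.FORWARD_TO_NEIGHBORS;
    }
}
```

In a real compute method each branch would send messages or update the vertex value; returning an enum here just keeps the sketch free of any Giraph dependency.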
Vertex and message data
Working with complex data types in Apache Giraph is actually easy. You only need to consider two main things:
Implement the org.apache.hadoop.io.Writable interface.
Always have a public constructor without any parameters. This is necessary so that Giraph is able to create your object via reflection.
As you can see, this class looks like any other value-object class, except that it implements the Writable interface. This interface brings with it two void methods, write(DataOutput) and readFields(DataInput). These are the methods in which you serialize and deserialize your class. Because SimpleHopsMessage contains only scalar values, the serialization part is easy: you can use the predefined DataOutput.writeX and DataInput.readX methods. Just make sure that readFields reads the fields in exactly the same order in which write writes them.
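To make the write/readFields pairing concrete, here is a minimal sketch of a scalar-only message. The class name ScalarMessageSketch and its three fields are assumptions for illustration, not the real SimpleHopsMessage. To keep the sketch runnable without Hadoop on the classpath it does not declare the interface; in a Giraph job the class would be public and additionally declare "implements org.apache.hadoop.io.Writable", with the same two method signatures. The roundTrip helper is also just part of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical scalar-only message; field names are assumptions.
class ScalarMessageSketch {
    private long originId;
    private long destinationId;
    private int hops;

    // Public no-argument constructor, needed for creation via reflection.
    public ScalarMessageSketch() {
    }

    public ScalarMessageSketch(long originId, long destinationId, int hops) {
        this.originId = originId;
        this.destinationId = destinationId;
        this.hops = hops;
    }

    // Serialize: the order chosen here defines the wire format.
    public void write(DataOutput out) throws IOException {
        out.writeLong(originId);
        out.writeLong(destinationId);
        out.writeInt(hops);
    }

    // Deserialize: must read the fields in exactly the order write() wrote them.
    public void readFields(DataInput in) throws IOException {
        originId = in.readLong();
        destinationId = in.readLong();
        hops = in.readInt();
    }

    public long getOriginId() { return originId; }
    public long getDestinationId() { return destinationId; }
    public int getHops() { return hops; }

    // Helper for trying it out: serialize to bytes and read back into a fresh object.
    static ScalarMessageSketch roundTrip(ScalarMessageSketch message) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            message.write(new DataOutputStream(buffer));
            ScalarMessageSketch copy = new ScalarMessageSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If you swapped two of the reads in readFields, the round trip would still succeed without any exception but hand you wrong values, which is why the ordering rule matters so much.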
SimpleHopsMessage was a piece of cake. Now let’s have a look at the more complex SimpleHopsVertexValue class. Instead of scalar values, we now have a Map<Long, Integer> and a Set<Map.Entry<Long, Long>>.
Looking at DataOutput and DataInput, there are only methods for reading and writing scalar values. So what can we do with complex values? For me, the easiest way seemed to be to serialize both collections element by element: first the number of elements, and then, for each element, the key followed by the value.
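Here is a minimal sketch of that count-then-elements scheme for a Map<Long, Integer> and a Set<Map.Entry<Long, Long>>. The class name VertexValueSketch, the field names hopsByVertex and pairs, and the helper methods are assumptions for illustration; the real SimpleHopsVertexValue may look different. As before, the interface declaration is left out so the sketch runs without Hadoop; in Giraph the class would declare "implements org.apache.hadoop.io.Writable".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical vertex value holding a map and a set of pairs; names are assumptions.
class VertexValueSketch {
    private final Map<Long, Integer> hopsByVertex = new HashMap<>();
    private final Set<Map.Entry<Long, Long>> pairs = new HashSet<>();

    public VertexValueSketch() {
    }

    public void putHops(long vertexId, int hops) { hopsByVertex.put(vertexId, hops); }
    public void addPair(long first, long second) { pairs.add(new AbstractMap.SimpleEntry<>(first, second)); }
    public Map<Long, Integer> getHopsByVertex() { return hopsByVertex; }
    public Set<Map.Entry<Long, Long>> getPairs() { return pairs; }

    public void write(DataOutput out) throws IOException {
        // Map: element count first, then key followed by value for each entry.
        out.writeInt(hopsByVertex.size());
        for (Map.Entry<Long, Integer> entry : hopsByVertex.entrySet()) {
            out.writeLong(entry.getKey());
            out.writeInt(entry.getValue());
        }
        // Set of pairs: same scheme, both halves are longs.
        out.writeInt(pairs.size());
        for (Map.Entry<Long, Long> pair : pairs) {
            out.writeLong(pair.getKey());
            out.writeLong(pair.getValue());
        }
    }

    public void readFields(DataInput in) throws IOException {
        // Clear first, in case the same instance is reused for several reads.
        hopsByVertex.clear();
        pairs.clear();
        int mapSize = in.readInt();
        for (int i = 0; i < mapSize; i++) {
            long key = in.readLong();
            int value = in.readInt();
            hopsByVertex.put(key, value);
        }
        int setSize = in.readInt();
        for (int i = 0; i < setSize; i++) {
            long first = in.readLong();
            long second = in.readLong();
            pairs.add(new AbstractMap.SimpleEntry<>(first, second));
        }
    }

    // Helper for trying it out: serialize to bytes and read back into a fresh object.
    static VertexValueSketch roundTrip(VertexValueSketch value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            value.write(new DataOutputStream(buffer));
            VertexValueSketch copy = new VertexValueSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the element count up front is what lets readFields know how many key/value pairs to expect before the next collection starts.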
There is also a different, more modular way to accomplish something similar; I’ll talk about it in a later post. For now, this is sufficient to serialize and deserialize the elements.
Using complex types within Giraph is easy and straightforward. Working with large lists and maps is not very efficient, though: they can either eat up memory or slow things down when you serialize and deserialize them.