I used Apache Giraph during a project for my studies. There is a great official Quick Start tutorial on how to set up Giraph on top of Apache Hadoop 0.20.203.0-RC1 and run your first example application. If you follow it, you will have a running environment where you can test your code. Still, you won’t have any idea how to write that code (at least I didn’t).
A friend of mine pointed me to her blog posts about Giraph:
These posts gave me a clearer picture. I understood better what is happening and how to build new things. (Think like a vertex; you are the vertex.) But a clear explanation of the code was still missing. OK, Apache Giraph is an open source project, and there is a giraph-examples folder in the repository, so let’s just have a look at it.
Making sense of the code for SimpleShortestPathVertex wasn’t too difficult. Only a few lines are relevant, and once we remove the logging and add some comments, it is actually easy to grasp:
Now the problem with this example is that it’s too easy. All messages and the internal state consist of only one double value. In reality, however, this won’t be enough, and I just could not find anything that described the next steps. So I had to experiment, and eventually I wrote my own simple algorithm to learn how to write code for Giraph.
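To make the "think like a vertex" idea a bit more concrete, here is a minimal sketch of vertex-centric shortest paths in plain Java. It does not use Giraph's API at all: the class `VertexCentricSketch`, the `Edge` helper and the superstep loop are made up for illustration, and I assume non-negative edge weights. What it does share with the example is the shape of the problem: every vertex stores a single double, and every message is a single double.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VertexCentricSketch {

    /** A directed, weighted edge (hypothetical helper type, not a Giraph class). */
    static final class Edge {
        final int target;
        final double weight;

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    /** Creates one empty message list per vertex. */
    static List<List<Double>> emptyInboxes(int n) {
        List<List<Double>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    /** Runs supersteps until no vertex sends a message; assumes non-negative weights. */
    static double[] shortestPaths(List<List<Edge>> outEdges, int source) {
        int n = outEdges.size();
        double[] value = new double[n];            // the whole vertex state: one double
        Arrays.fill(value, Double.POSITIVE_INFINITY);
        List<List<Double>> inbox = emptyInboxes(n);

        for (long superstep = 0; ; superstep++) {
            List<List<Double>> outbox = emptyInboxes(n);
            boolean anyMessageSent = false;

            for (int v = 0; v < n; v++) {
                // After superstep 0, a vertex only wakes up when it has messages.
                if (superstep > 0 && inbox.get(v).isEmpty()) {
                    continue;
                }
                // "You are the vertex": take the best distance offered to you.
                double minDist = (v == source) ? 0.0 : Double.POSITIVE_INFINITY;
                for (double message : inbox.get(v)) {
                    minDist = Math.min(minDist, message);
                }
                // If it improves your value, keep it and tell your neighbours.
                if (minDist < value[v]) {
                    value[v] = minDist;
                    for (Edge e : outEdges.get(v)) {
                        outbox.get(e.target).add(minDist + e.weight);
                        anyMessageSent = true;
                    }
                }
            }
            if (!anyMessageSent) {
                return value;                      // every vertex is idle, so we stop
            }
            inbox = outbox;                        // messages arrive in the next superstep
        }
    }

    /** Small example graph: 0 -> 1 (1.0), 0 -> 2 (4.0), 1 -> 2 (2.0); vertex 3 is isolated. */
    static List<List<Edge>> exampleGraph() {
        List<List<Edge>> g = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            g.add(new ArrayList<>());
        }
        g.get(0).add(new Edge(1, 1.0));
        g.get(0).add(new Edge(2, 4.0));
        g.get(1).add(new Edge(2, 2.0));
        return g;
    }

    public static void main(String[] args) {
        // prints [0.0, 1.0, 3.0, Infinity]
        System.out.println(Arrays.toString(shortestPaths(exampleGraph(), 0)));
    }
}
```

The loop over all vertices stands in for what a framework would do in parallel across workers. As soon as a vertex needs to remember more than one number, or to send more than one number per message, this single-double shape no longer fits.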
In the coming days I’ll publish more blog posts about:
- Implementing a simple hop-count (or modified shortest-path) algorithm in Giraph (source code)
- Implementing the Ja-be-Ja algorithm in Giraph (source code)
- Why not to use Giraph for edge-centric algorithms