Riak is a key-value store written in Erlang by a company called Basho. They originally wrote it for their own use, but then concluded that they could build a company around it, similar to the origin story of Redis. Basho also released a couple of other products, also written in Erlang, and created an auto-build system. For those not in the know, Erlang is a functional programming language created by Ericsson for high-volume, high-availability telephony installations. Its virtual machine is called BEAM, and the libraries and tools that ship with it are called OTP (Open Telecom Platform), but don’t be put off by the name: it can be used outside of telephone switches. There’s another NoSQL DB written in Erlang, CouchDB, which I covered in another article. Erlang has been credited with enabling these programs to be written quickly, with concurrency and high availability built in.

What makes Riak different from other NoSQL solutions is that it combines key-value storage with out-of-the-box clustering (a key-ring cluster). Its model is very similar to Cassandra’s: the application specifies how many nodes to write to and read from, and it can talk to any node. Riak takes care of the replication in the background. As with other NoSQL datastores, durability is supposedly achieved through replication rather than disk writes. Unlike Redis, it lets you use a storage engine that can handle data sets larger than memory. That points to another feature of Riak: pluggable storage engines, similar to, say, MySQL (in fact, one of the two options is Innostore, an API to embedded InnoDB of MySQL fame).
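
To make the "how many nodes to write to and read from" idea concrete, here is a minimal sketch of the usual quorum rule from Dynamo-style systems. The function name and the parameter names n, r and w are my own for illustration, not Riak client calls: n is the number of replicas of a key, w the number of replicas that must acknowledge a write, and r the number consulted on a read.

```python
def quorums_overlap(n, r, w):
    """Return True when every read quorum must share a replica with every write quorum.

    n: replicas kept for each key
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # If r + w > n, any r replicas and any w replicas have at least one
    # replica in common, so a read sees at least one copy of the last write.
    return r + w > n


if __name__ == "__main__":
    print(quorums_overlap(3, 2, 2))  # True: reads always touch a fresh replica
    print(quorums_overlap(3, 1, 1))  # False: faster, but a read can miss the write
```

The trade-off is the usual one: low r and w give faster responses, while higher values make stale reads less likely.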

The installation

Oh boy. There are only a few new DBs that don’t work well with Windows, either natively or through the magic of Cygwin, and Riak is one of them. Why? Erlang works on Windows, so why should Riak not? I don’t know. I wish I did, but there you go. I tried compiling Erlang with Cygwin, but once again was limited by the extent of the libraries available. I stuck with this one for probably three or four hours, but after making sluggish progress I gave up and installed on an Ubuntu 10.10 Server VM. The installation took almost no time after Erlang was installed, and I was up and running.

The product

Riak is built to be like Dynamo, the eventually consistent key-value store built by Amazon, and described in their seminal paper. The Basho team took this paper and built a system very similar to this product, including features like a gossip protocol to detect failed nodes as well as vector clocks to resolve versioning issues. Another feature is consistent hashing that allows nodes to be added and removed without too much fuss.
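
Two of those ideas are easy to sketch in a few lines. The code below is a generic illustration, not Riak's actual implementation: HashRing, compare_clocks and the vnodes parameter are names I made up. The first part shows why consistent hashing lets nodes come and go cheaply, and the second shows how vector clocks tell "newer" apart from "conflicting".

```python
import bisect
import hashlib


def _position(value):
    """Map a string to a point on the ring using SHA-1."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring; each node owns several points on it."""

    def __init__(self, nodes=(), vnodes=16):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def owner(self, key):
        """Return the node at the first ring point at or after the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_position(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


def descends(a, b):
    """True when clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare_clocks(a, b):
    """Classify two vector clocks (dicts mapping actor -> update counter)."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "conflict"  # concurrent updates: the application must reconcile


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    print(ring.owner("user:42"))
    print(compare_clocks({"x": 2, "y": 1}, {"x": 1, "y": 1}))  # a_newer
    print(compare_clocks({"x": 2}, {"y": 1}))                  # conflict
```

The useful property of the ring is that adding a node only moves keys onto the new node; every other key stays where it was, which is why membership changes happen "without too much fuss".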

The basic design is a pluggable datastore with a server written in Erlang exposing a REST API as well as a Protocol Buffers API. The storage is key-value, with the addition of buckets (groupings of keys, similar to a table or a database).
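
As a rough idea of what the REST side looks like, here is a sketch that builds (but does not send) a request to store a JSON value under a bucket and key. The base URL is an assumption on my part: I am assuming a local node listening on port 8098 with objects under /riak/<bucket>/<key>, so check the documentation for your version. The helper names object_url and build_put are invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed default for a local node; adjust for your installation.
DEFAULT_BASE = "http://127.0.0.1:8098/riak"


def object_url(bucket, key, base=DEFAULT_BASE):
    """Build the URL for one object, percent-encoding bucket and key."""
    return f"{base}/{quote(bucket, safe='')}/{quote(key, safe='')}"


def build_put(bucket, key, value, base=DEFAULT_BASE):
    """Prepare an HTTP PUT that would store `value` as JSON (nothing is sent)."""
    body = json.dumps(value).encode("utf-8")
    return Request(
        object_url(bucket, key, base),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_put("users", "alice", {"name": "Alice", "age": 30})
    print(req.get_method(), req.full_url)
    # To actually send it against a running node:
    #   from urllib.request import urlopen
    #   urlopen(req)
```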

The client

Fortunately, this took almost no time because Basho already had a Java client. Man, what a joy that was! The Java client is about 123K, and it was slightly more complex than the Redis client, just because of the bucket organization of Riak.

The implementation

Riak, just like Redis and Kyoto Tycoon, has a very simple client API, so I just used the client and JSON serialization to throw my objects into the bit bucket. I used the etags feature to tag the items that I wanted to retrieve by a secondary key. That took no time and I had my implementation done in less than an hour.
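
The general shape of that approach looks something like the sketch below. It is not my original Java code and it does not use the Riak client: FakeBucketStore is an in-memory stand-in I made up so the example runs anywhere, and the secondary lookup here is done with a plain second bucket, which is an assumption for illustration rather than the tagging approach described above.

```python
import json


class FakeBucketStore:
    """In-memory stand-in for a bucket/key store (hypothetical, not a Riak client)."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, key, value):
        # Values are serialized to JSON, as they would be before going on the wire.
        self._buckets.setdefault(bucket, {})[key] = json.dumps(value)

    def get(self, bucket, key):
        raw = self._buckets.get(bucket, {}).get(key)
        return None if raw is None else json.loads(raw)


def save_user(store, user):
    """Store the user by id, plus an email -> id entry for secondary lookups."""
    store.put("users", user["id"], user)
    store.put("users_by_email", user["email"], user["id"])


def find_user_by_email(store, email):
    """Resolve the secondary key to the primary key, then fetch the object."""
    user_id = store.get("users_by_email", email)
    return None if user_id is None else store.get("users", user_id)


if __name__ == "__main__":
    store = FakeBucketStore()
    save_user(store, {"id": "u1", "email": "ann@example.com", "name": "Ann"})
    print(find_user_by_email(store, "ann@example.com"))
```

Note that the two writes are not atomic, so a real implementation would have to cope with one succeeding and the other failing.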

The conclusion

Riak is a solid product that has been used by several customers in a production environment, and it is backed by a commercial company that pushes new features and provides (paid) support. Riak has some interesting features that I barely even touched on: MapReduce, clustering, and pluggable back-ends. I think these could come in handy, but frankly I want really boring features like replica consistency, durability, and data modeling. Oh, yeah, and hot point-in-time backups.

Why am I bringing up those features? I think Riak is trying to position itself as a possible replacement for MySQL, and it is focusing on providing that sphere of features. I think that eventually they will have to write those features, but most RDBMSs already have them. Why trust your data to untested systems that lack basic functionality? On the other hand, it has a more advanced protocol than, say, Redis, so implementing a .NET client (which is not officially supported) takes some time. Also, it has more overhead than Redis. Why use a product that is stuck between the beautiful minimalism of Redis and the powerful feature-mine of Postgres? You tell me.