CouchDB was my gateway drug into NoSQL. It was the most appealing because of its simplicity and its out-of-the-box functionality (a GUI admin, specifically). It is a document store, which means each record is a document made up of fields, and those fields can themselves be nested documents. One quote that I think is particularly telling of Couch is this: “CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP,” comments one developer (Jacob Kaplan-Moss). It uses HTTP as its application protocol and JSON to represent data models, and its equivalent of materialized views is written in JavaScript. Furthermore, it is written in Erlang, a language that was specifically created for massive concurrency; in other words, a language suited for large-scale Web applications.

Installation

I work on Windows, and while I have several Linux VMs, I only use those as a last resort for testing new products because I like being able to develop on one system. I found an installer that had both Erlang and CouchDB in it, and it installed without any trouble. Installing on Ubuntu was as simple as running “apt-get install couchdb”.

Setup

The installer handled most of the setup, so there is really nothing to report. CouchDB stores documents in databases, and creating one takes a single HTTP PUT to the database name (or a click in Futon); nothing else has to be defined first. All I had to do was start CouchDB, create a database, and begin inserting documents into it.

Clients

I looked around at the various Java clients for Couch, but most required that I change my data objects. Having just finished experimenting with db4o, I knew this was not necessary and so I decided to implement my own.

Model

Couch does not require that you specify anything beforehand, except views, which are the main way of querying a database. The top-level container is a database, and inside it there are just documents, which are JSON objects. Documents in the same database are not required to share any fields. You write MapReduce functions in JavaScript that build a materialized view. These can take a while to build, so make a note of that.
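To make the view idea concrete, here is a minimal sketch of how a client might assemble a design document, which is the JSON document that carries a view's JavaScript map function. The names "people" and "by_type" and the "type" field are hypothetical, and the escaping helper only covers single-line functions; a real client would hand this job to its JSON library.

```java
class ViewDesignDoc {
    // Escape backslashes and double quotes so JavaScript source can sit inside a JSON string.
    // Assumption: the function is written on one line, so newlines are not handled here.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build a design document holding a single view with a map function and no reduce step.
    static String designDocJson(String designName, String viewName, String mapFunction) {
        return "{\"_id\":\"_design/" + designName + "\","
             + "\"views\":{\"" + viewName + "\":{\"map\":\"" + escapeJson(mapFunction) + "\"}}}";
    }

    public static void main(String[] args) {
        // Hypothetical map function: index every document that has a "type" field.
        String map = "function(doc) { if (doc.type) { emit(doc.type, null); } }";
        System.out.println(designDocJson("people", "by_type", map));
    }
}
```

The resulting JSON would be stored with an HTTP PUT like any other document, under the "_design/people" ID inside the database.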

Writing a client

Since it uses HTTP, writing a client required two things: an HTTP client and a JSON library. I chose Apache Commons and Flexjson (SourceForge) to fill these roles, the former for obvious reasons and the latter because it required no modification of the objects it was serializing or deserializing. I also wanted to be able to use Couch views to gather up collections of objects, and for this I used Jackson (Codehaus). The biggest trouble with these was finding all the dependencies. If you are doing the same thing, I would recommend looking at each project's site for its dependencies, or searching for “Jackson dependencies”.

That was the hardest part. After that, it was just a matter of reading the documentation on how to implement the various CRUD functions, and then how to do basic queries, which I found out was as simple as adding GET parameters. This is worth noting: you can query on things other than the primary key by defining a view and passing the key you want in the standard query string of the view's URL (for example, localhost:5984/db/_design/doc/_view/view?key="value").
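As a rough illustration of querying by something other than the primary key, the sketch below builds the URL for a view lookup. It assumes the /db/_design/<doc>/_view/<name> path layout and the "key" parameter used by CouchDB 1.x, where the key is a JSON value and therefore needs its quotes URL-encoded. The host, database, and view names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ViewQuery {
    // Build the GET URL for looking up one string key in a view.
    static String viewUrl(String base, String db, String designDoc, String view, String key) {
        // The key travels as JSON, so a string key is wrapped in double quotes.
        String jsonKey = "\"" + key.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        // URLEncoder targets form encoding and turns spaces into '+'; use %20 instead.
        String encoded = URLEncoder.encode(jsonKey, StandardCharsets.UTF_8).replace("+", "%20");
        return base + "/" + db + "/_design/" + designDoc + "/_view/" + view + "?key=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(viewUrl("http://localhost:5984", "db", "people", "by_type", "user"));
    }
}
```

Any HTTP client can then issue a plain GET against that URL and pass the JSON response to the JSON library.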

Unforeseen complications

CouchDB uses an MVCC design to accomplish concurrency. For those who don’t know, that means that there are multiple versions of an object, which in Couch means that you have to reference both an object ID and a revision ID when you update it. Consequently, I unfortunately could not just use my data objects without adding a secondary ID. (This is not altogether true, but the other way of accomplishing this would be to store the object and revision IDs in the same field, something that strikes me as denormalizing, and I’m sure you’ve dealt with those types of issues before.) It is rather strange that you cannot just write to the latest version, but I would guess this is how Couch enables concurrent writing and access.
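The revision requirement is easier to see in a toy model. The sketch below is not CouchDB code; it is a small in-memory store, written for illustration, that rejects a write unless the caller supplies the revision it last read. Real CouchDB revisions are strings rather than integers, and a stale write comes back as an HTTP 409 response rather than an exception.

```java
import java.util.HashMap;
import java.util.Map;

class RevisionedStore {
    // Thrown when the caller's revision does not match the stored one.
    static class Conflict extends RuntimeException {
        Conflict(String message) { super(message); }
    }

    private final Map<String, Integer> revisions = new HashMap<>();
    private final Map<String, String> bodies = new HashMap<>();

    // Write a document. Pass null as expectedRev when creating it; returns the new revision.
    int put(String id, Integer expectedRev, String body) {
        Integer current = revisions.get(id);
        boolean matches = (current == null) ? expectedRev == null : current.equals(expectedRev);
        if (!matches) {
            throw new Conflict("stale revision for " + id);
        }
        int next = (current == null) ? 1 : current + 1;
        revisions.put(id, next);
        bodies.put(id, body);
        return next;
    }

    String get(String id) {
        return bodies.get(id);
    }

    public static void main(String[] args) {
        RevisionedStore store = new RevisionedStore();
        int rev = store.put("doc1", null, "{\"n\":1}");
        rev = store.put("doc1", rev, "{\"n\":2}");
        System.out.println("doc1 is at revision " + rev);
    }
}
```

Two writers that both read revision 1 cannot silently overwrite each other: the second write fails and has to re-read the document first. That is why the data object has to carry the revision alongside its ID.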

Replication

The process is rather simple but has some downsides. It comes in two flavors: replicate once or replicate continuously. You can’t enable continuous replication from the graphical interface, so you have to use wget or curl. It was easy to set up after making sure that the two nodes were not binding to the loopback address, but instead to all addresses or to the specific external IP of your choice. You control this through Futon or through the ini file. If the server restarts, the replication stops and has to be started again.
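For anyone who would rather trigger continuous replication from Java than from curl, here is a minimal sketch using the JDK's java.net.http classes (Java 11 or later). It assumes CouchDB's /_replicate endpoint with "source", "target", and "continuous" fields in the JSON body. The host names and the database name are hypothetical, and the values are assumed not to need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ReplicationRequest {
    // JSON body asking Couch to replicate source into target.
    static String body(String source, String target, boolean continuous) {
        return "{\"source\":\"" + source + "\",\"target\":\"" + target
             + "\",\"continuous\":" + continuous + "}";
    }

    // Build (but do not send) the POST request that starts a continuous replication.
    static HttpRequest build(String couchBase, String source, String target) {
        return HttpRequest.newBuilder(URI.create(couchBase + "/_replicate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(source, target, true)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:5984", "db", "http://backup.example:5984/db");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("db", "http://backup.example:5984/db", true));
    }
}
```

Sending it is one more call, HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), which needs both nodes to be reachable.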

Downsides

The downsides of Couch are many. Replication does not seem that stable, and each replication runs one way, from a source to a target. The first stable release (v1.0) had a bug that could delete data. Views are not meant to be created at run-time. There is no binary protocol. There is no clustering. Developers have to roll their own authentication and security. Scheduled jobs also don’t come out of the box; you can always use cron or scheduled tasks, but both lack much of the functionality available in, say, SQL Server. Hot backups are not available, although you can snapshot to another server.

Uses

I think Couch has some really good applications: mobile apps (it is available for iOS and Android), small apps that need rapid development but also the interoperability that comes with JSON, and applications that have a variable schema (specifically, a schema with some fixed parts but also variable fields). If you use a client like the one I built, you don’t have to do much to start using Couch.

Summary

Couch makes a great little store for applications that won’t get that big and that need the richness and flexibility of the document model. Being a paranoid DBA, I would not trust it in production to handle important data or large loads. The motto of Couch is “relax”, but the creators of this software need to get going if they want to build a product that can handle the big time.