The most serious competition to SQL databases in everyday development comes from document systems. A document is a collection of key/value pairs, and the keys can vary from document to document, even between documents that represent the same kind of thing. There are several such systems on the market currently; MongoDB is one of them, written in C++ and built for clustering. The name (from "humongous") indicates its function: storing a lot of data very quickly. It is open source but created and maintained by a commercial company called 10gen.

When my company was looking for a database for session storage, we looked at Mongo because we wanted really fast reads and writes, and we did not need durability (more on Mongo’s architecture later). One thing we did need was consistency, and Mongo does not guarantee this. We did find that it was very easy to set up, and the .NET client was great. It also won our write tests with flying colors. We installed it on Linux (Ubuntu 10.10) as well as Windows for some of our tests, and it worked great on both.

The product

Like many other NoSQL products, Mongo has clustering built in from the ground up, but it is not in the Dynamo family; instead it implements an architecture more like Google’s BigTable, with specific nodes that control load balancing. It uses sharding (horizontal partitioning) to spread the load. When you write to the system, you are just writing to memory, and Mongo writes it to disk later. This allows incredible speed because you never have to wait for spinning disks, but it does not guarantee durability (an option is available through journaling, although this slows down writes and reads somewhat). The way Mongo structures its data is also prone to massive corruption and to using a lot of disk space (it writes to disk in much the same layout as the in-memory data, via memory-mapped files). There is only one writer, and since there are no transaction logs, there are no hot backups. Backups can be run from offline secondary nodes, however.
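To make the memory-mapped trade-off concrete, here is a sketch in plain Java NIO (illustrative only, not Mongo’s actual code): writes land in the page cache instantly, and durability only arrives when someone forces a flush.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    // Map a region of the file and write into it. The put() returns as soon
    // as memory is touched; the OS flushes to disk later. That is where the
    // speed comes from, and also where the durability gap lives if the
    // machine dies before the flush.
    static long writeMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("session:abc -> {user: 42}".getBytes());
            buf.force(); // explicit flush -- roughly the guarantee journaling buys
        }
        return Files.size(file); // mapping extended the file to 4096 bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeMapped(Files.createTempFile("mongo-style", ".dat")));
    }
}
```

Skip the `force()` call and the write is exactly as fast, and exactly as lost, as an unjournaled Mongo write on a crash.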

Mongo uses its own serialization protocol called BSON (the B is for Binary), similar to, but different in important ways from, the ever-popular JSON. At the logical level, there are databases, collections (tables), and documents (rows). A document is retrieved by its primary key or by a field within the document, or it can be returned by a map reduce query. Operations on a single document are atomic, but atomicity cannot span multiple documents. The Mongo client will also return before a write is done, so writes can be silently lost, and a read issued right after a write may not see it.
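The BSON framing itself is simple. As a sketch, here is a hand-rolled encoding of a one-string-field document following the published BSON spec (a real driver covers every element type; this handles only UTF-8 strings):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class BsonSketch {
    // Encode {name: value} per the BSON spec: int32 total length,
    // 0x02 (string element), cstring key, int32 value length (including
    // the trailing NUL), value bytes, NUL, then a final NUL for the document.
    static byte[] encodeStringDoc(String name, String value) {
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        byte[] val = value.getBytes(StandardCharsets.UTF_8);
        int total = 4 + 1 + key.length + 1 + 4 + val.length + 1 + 1;
        ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(total);          // document length, little-endian
        buf.put((byte) 0x02);       // element type: UTF-8 string
        buf.put(key).put((byte) 0); // element name as cstring
        buf.putInt(val.length + 1); // string length including NUL
        buf.put(val).put((byte) 0);
        buf.put((byte) 0);          // end of document
        return buf.array();
    }

    public static void main(String[] args) {
        // {"hello": "world"} encodes to 22 (0x16) bytes.
        System.out.println(encodeStringDoc("hello", "world").length);
    }
}
```

The binary framing and length prefixes are what make BSON fast to scan in place, which is the trade 10gen made against JSON’s readability.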

The client

Unfortunately, the Java client is not as well developed as the .NET client, and it does not include one-step serialization. I wrote a client that allows you to serialize an object in one step, but this took a little while, and it is not recursive. Unlike CouchDB, which can take a JSON doc directly, Mongo requires that it be put into BSON. It was not that difficult, and the other client features were simple after learning the basic logical structure.
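A minimal, non-recursive one-step serializer of the kind described above can be sketched with reflection. This is illustrative only; the names `toDocument` and `Session` are hypothetical, not part of any driver:

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class OneStepSerializer {
    // Flatten an object's declared fields into a key/value map, the shape a
    // Mongo document takes before BSON encoding. Non-recursive, like the
    // client described above: nested objects are stored as-is, not descended.
    static Map<String, Object> toDocument(Object obj) throws IllegalAccessException {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            doc.put(f.getName(), f.get(obj));
        }
        return doc;
    }

    // Hypothetical example type.
    static class Session { String id = "abc"; int userId = 42; }

    public static void main(String[] args) throws Exception {
        System.out.println(toDocument(new Session())); // {id=abc, userId=42}
    }
}
```

Making this recursive means walking nested objects and collections into sub-maps, which is where the real work in such a client lives.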

The implementation

I wrote the client very quickly. Retrieving by a secondary key was harder, of course, but it didn’t take that long. You do this by issuing a query, getting a cursor back, and then iterating over that cursor to get the documents. I presume that the client goes back to the server each time the cursor’s next function is called, but I hope that it caches at least a few documents at a time.
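That round-trip-per-document worry is usually solved by batching. As an illustrative client-side sketch (the `Source` interface here is a stand-in for the server, not a driver API), a cursor can pull results a batch at a time so `next()` is almost always served from memory:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchedCursor<T> implements Iterator<T> {
    // Stand-in for the server side of a query: return up to batchSize
    // results starting at offset skip.
    interface Source<T> { List<T> fetch(int skip, int batchSize); }

    private final Source<T> source;
    private final int batchSize;
    private List<T> batch = new ArrayList<>();
    private int pos = 0, fetched = 0;
    private boolean exhausted = false;

    BatchedCursor(Source<T> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    public boolean hasNext() {
        if (pos < batch.size()) return true;
        if (exhausted) return false;
        batch = source.fetch(fetched, batchSize); // one "round trip" per batch
        fetched += batch.size();
        pos = 0;
        if (batch.size() < batchSize) exhausted = true; // short batch: no more data
        return !batch.isEmpty();
    }

    public T next() {
        if (!hasNext()) throw new java.util.NoSuchElementException();
        return batch.get(pos++);
    }

    public static void main(String[] args) {
        List<Integer> data = java.util.Arrays.asList(1, 2, 3, 4, 5);
        BatchedCursor<Integer> c = new BatchedCursor<>((skip, n) ->
            data.subList(Math.min(skip, data.size()), Math.min(skip + n, data.size())), 2);
        while (c.hasNext()) System.out.println(c.next());
    }
}
```

Five documents with a batch size of two costs three fetches instead of five, and the same shape scales to a cursor id plus follow-up requests against a real server.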

The conclusion

Document systems are incredibly powerful in that they have a schema, but one that does not need to be specified beforehand and can vary from document to document. Sometimes this is an asset, when the very nature of the data being stored is variable, but at other times I presume the lack of control over what gets saved to your database can be annoying. I regret that 10gen decided to write their own protocol when there are already so many standards out there. Translating datatypes is the main reason people hate working with SQL, and creating yet another data-storage language for one product is nonsensical.

10gen has done a good job maintaining its products for both Linux and Windows, which I think is absolutely great. There are many downsides to Windows, but it is still the best consumer OS by far (and I used an OS X machine as a workstation for a year). There are also many clients for Mongo, as well as a healthy amount of documentation.

In the end, my company decided to go with Redis because it fit our needs: low administration, fast writes and reads, and a very simple API. Mongo is definitely a product that I will continue watching, because I think it has potential due to the professional quality of 10gen, not because it has the best vision or technology. The best use for it would be medium-priority data that varies a lot and that you want to access with secondary keys and map reduce. In the end, I would not use it, lest you face Mongo data loss and, with it, loss of your money.

2 Responses to MongoDB

  1. Great article :)

    You are right that MongoDB _in its default configuration_ does not offer durability, but since 1.8 it has been possible to enable journaling, thus sacrificing some of the speed for durability and greatly reduced risk of corruption.

    See http://www.mongodb.org/display/DOCS/1.8+Release+Notes and http://www.mongodb.org/display/DOCS/Journaling for more info.