It’s only been a year or two since I first discovered the idea of non-relational databases. I realized that in some strange way my brain had been conditioned to think of modeled data in a relational way, even though language after language I was using represented it hierarchically. The relational model has been the accepted status quo of structured data since the 1970s. Who was I to question it? However, I grew more and more excited about the idea of storing my data in the same way I transmit it: as a structured document. The first NoSQL database I read about was MongoDB, but shortly after I heard of many others, including some that walk a fine line between being databases and being in-memory caching layers. I got excited about using MongoDB because it still functioned as a traditional database with dynamic querying, but with the advantages of storing JSON-like documents. However, my employer at the time wasn’t exactly known for trying new technologies. I was stuck with just playing with MongoDB in pet projects.
Fast forward to 2013, and things have changed. I am now employed at eMeals, where I have been given much more technical freedom. While building new services for eMeals, I got to use both DynamoDB and MongoDB against a couple of small real-world problems. Here are a few interesting things I have found:
I originally loved the idea of using DynamoDB because of one very big distinction: DynamoDB technically isn’t a database; it’s a database service. Amazon is responsible for the availability, durability, performance, configuration, optimization and all other manner of minutiae that I didn’t want occupying my mind. I’ve never been a big fan of managing the day-to-day operations of a database, so I liked the idea of taking that task off my plate. I began building some simple services with DynamoDB as the database, and I quickly ran into some limitations. For example, let’s say we are building a simple RESTful API in Node.js and we want to define a POST method to create a new resource. Assume the request body has already been validated. Here is what our DynamoDB example would look like in CoffeeScript with Express:
Sure, it looks simple enough, but what about the 50th and 100th time you have to remodel your hierarchical data into different, but still hierarchical, data? What’s the point of a schema-less database if I’m redefining the schema on every operation? Yes, the answer would be to write a model wrapper that does this automatically on every operation, but that’s still more code I need to write. Furthermore, let’s say we want to query items for a simple search service. DynamoDB only allows you to query against the primary key, or the primary key plus a range key. There are ways to periodically index your data using a separate service like CloudSearch, but then we are quickly losing the initial simplicity of it being a database service.
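To make the remodeling complaint concrete, here is a rough sketch of the kind of model wrapper I mean. It is written in TypeScript so it can run on its own, and it is not the code from our services. It assumes DynamoDB’s low-level item format, where every value is wrapped in a type tag such as `S` (string), `N` (number, sent as a string) or `SS` (string set). The `toDynamoItem` helper and the recipe-shaped body are hypothetical names made up for this illustration.

```typescript
// Typed attribute wrappers used by DynamoDB's low-level item format.
type AttributeValue =
  | { S: string }     // string
  | { N: string }     // number, serialized as a string
  | { SS: string[] }  // string set
  | { NS: string[] }; // number set, each member serialized as a string

type DynamoItem = Record<string, AttributeValue>;

// Values this sketch knows how to translate (flat scalars and sets only).
type PlainValue = string | number | string[] | number[];

// Wrap one plain value in its DynamoDB type tag.
function toAttributeValue(value: PlainValue): AttributeValue {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) };
  if (value.length > 0 && typeof value[0] === "number") {
    return { NS: (value as number[]).map(String) };
  }
  return { SS: value as string[] };
}

// Hypothetical helper: remodel a validated request body into an item.
function toDynamoItem(body: Record<string, PlainValue>): DynamoItem {
  const item: DynamoItem = {};
  for (const key of Object.keys(body)) {
    item[key] = toAttributeValue(body[key]);
  }
  return item;
}

// Example body (made up for illustration).
const exampleItem = toDynamoItem({
  id: "recipe-1",
  servings: 4,
  tags: ["dinner", "beef"],
});
console.log(JSON.stringify(exampleItem));
```

Even this small version has to pick a tag for every field, and a matching function is needed to unwrap items on the way back out. That is the extra layer I would rather not maintain.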
But let’s not throw the baby out with the bathwater. DynamoDB still has some very big advantages. First of all, it’s dynamically performant: you can raise or lower the read and write throughput you provision as your needs change. If you have a particular task with predictable spikes in read or write requirements, then you are set. Second, the range query works well for retrieving event logging data. At AWS re:Invent, someone from the WeatherBug team showed how they used DynamoDB to store lightning strikes with GPS coordinates and timestamps as secondary range indexes.
Ironically, it was a session at AWS re:Invent that initially scared me away from MongoDB. The session “Optimizing MongoDB on AWS” quickly turned into informal arguments among DBA types over which arcane settings would best optimize their MongoDB servers on EC2. I turned to my friend at the end of the session and said, “So far, that was the most compelling argument for using DynamoDB I have heard.” I just didn’t want to play with server configurations like that. To quote a very wise eyewitness: “Ain’t nobody got time for ‘dat.”
However, it turns out MongoDB isn’t quite as difficult as the nerds had me believe, at least not at our scale. MongoDB works as advertised: it auto-shards, and it provides a very simple way to get up and running with replica sets. But so far, my favorite feature of MongoDB is just how easy it is to code against. The very same RESTful POST example becomes:
And if we want to query back an item on the GET method:
So the big win here is not only that we don’t have to remodel our data at the API layer, but also that we have the future flexibility to query our data back against any of its fields.
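As a rough illustration of that flexibility, here is a small TypeScript sketch that turns GET query-string parameters into a MongoDB filter document. It is not the code from our services. The `buildFilter` helper, the `minServings` parameter and the recipe fields are hypothetical, and the driver call in the closing comment assumes the official `mongodb` Node.js driver.

```typescript
// A filter value is either an exact match or a "greater than or equal" range.
type FilterValue = string | number | { $gte: number };
type Filter = Record<string, FilterValue>;

// Hypothetical helper: map query-string parameters onto a filter document.
function buildFilter(query: Record<string, string>): Filter {
  const filter: Filter = {};
  for (const key of Object.keys(query)) {
    // Ignore keys that look like operators so callers cannot inject them.
    if (key.charAt(0) === "$") continue;

    if (key === "minServings") {
      // Hypothetical range parameter, translated to the $gte operator.
      const min = Number(query[key]);
      if (!isNaN(min)) filter["servings"] = { $gte: min };
    } else {
      // Any other field becomes an equality match, no index required.
      filter[key] = query[key];
    }
  }
  return filter;
}

const exampleFilter = buildFilter({ tags: "dinner", minServings: "4" });
console.log(JSON.stringify(exampleFilter));

// With the official driver (assumed), the filter would be passed along as:
//   const docs = await db.collection("recipes").find(exampleFilter).toArray();
```

The stored document keeps the shape it arrived in, so the same field names work in the request body, in the database and in the query.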
So currently I’m putting the finishing touches on our services’ authentication layer, as well as mail and webhook listener services, all using MongoDB to store and retrieve data. I haven’t turned my back on DynamoDB, but at this point it would take a very special kind of problem to make me choose DynamoDB over MongoDB.