Technically Speaking: MongoDB

According to IBM, 90% of the world's data was created in the last two years. In this data age, there's an ever-increasing need for tools and techniques to store and analyze that data. Enter MongoDB, an open source, document-oriented database designed to make data modeling more flexible and data storage simpler.

MongoDB takes a document-based approach to data modeling. This lets developers model their data in whatever way makes sense for their application while still supporting rich queries and good performance. Compare this to relational database systems (such as MySQL and PostgreSQL), which model data as rows and columns in tables, much like an Excel spreadsheet. Alas, not all data sets fit neatly into such rigid structures.

A Brief History

MongoDB is an open source project started by Dwight Merriman and Eliot Horowitz. Dwight, former CTO of DoubleClick (later acquired by Google), and Eliot, a former DoubleClick engineer and founder of ShopWiki, wanted to build distributed systems that could scale.

In 2007, they founded 10gen and began building a platform-as-a-service product similar to Google App Engine, which included a proprietary database. They realized that the database on its own was an interesting product that could meet many of the design goals of modern web applications, and thus MongoDB was born. 10gen, still headquartered in NYC, continues to support and develop MongoDB and employs most of its core contributors.

Fast forward five years: MongoDB is now used in thousands of products around the world, including Made in NYC startups such as foursquare, Thrillist, and Art.sy, as well as larger institutions such as The New York Times and Forbes Media.

A Simplified Data Model

To get a better idea of how the document storage model works, let’s look through the structure and components of a simple blog:

  • Post: title, publish date, text, slug
  • Comments: comment, date
  • Tags: tag, URL
  • Category: category, URL
  • Users: username
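As a rough illustration, the components listed above could be combined into a single nested document. The sketch below uses a plain Python dict to stand in for that document; the field names and sample values are hypothetical, and the result is printed as JSON purely for readability (MongoDB itself stores documents as BSON).

```python
import json

# Hypothetical blog post modeled as ONE nested document.
# Field names and values are illustrative, not a required MongoDB schema.
post = {
    "title": "Hello, MongoDB",
    "slug": "hello-mongodb",
    "publish_date": "2012-06-01T00:00:00Z",
    "text": "MongoDB takes a document-based approach...",
    "author": {"username": "alice"},                                  # Users
    "category": {"category": "Databases", "url": "/category/databases"},
    "tags": [                                                          # Tags (embedded array)
        {"tag": "mongodb", "url": "/tag/mongodb"},
        {"tag": "nosql", "url": "/tag/nosql"},
    ],
    "comments": [                                                      # Comments (embedded array)
        {"comment": "Great overview!", "date": "2012-06-02T09:30:00Z"},
    ],
}

# Everything needed to render the post lives in one structure.
print(json.dumps(post, indent=2))
```

Because tags and comments are nested inside the post, reading the post reads everything needed to display it.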

In a relational database system, you'd typically create a table for each bullet point: one table for posts, another for comments and their dates, and so on, plus linking tables for many-to-many relationships such as posts and tags. Reassembling a single post then means combining these tables with "joins", which add query complexity and can become slow as the data grows.
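The relational layout described above can be sketched with Python's built-in sqlite3 module, standing in for a system such as MySQL or PostgreSQL. The table and column names are hypothetical, and the category and users tables are omitted for brevity. Even this small schema needs two joins just to fetch a post's tags and another join for its comments. How expensive joins actually are depends on indexes and data volume, so treat this as an illustration of the extra work rather than a benchmark.

```python
import sqlite3

# In-memory SQLite database; the schema is a hypothetical relational blog layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, slug TEXT, publish_date TEXT, body TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id),
                       comment TEXT, created_at TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT, url TEXT);
-- Many-to-many link table between posts and tags.
CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag_id INTEGER REFERENCES tags(id));

INSERT INTO posts VALUES (1, 'Hello, MongoDB', 'hello-mongodb', '2012-06-01', '...');
INSERT INTO comments VALUES (1, 1, 'Great overview!', '2012-06-02'), (2, 1, 'Thanks!', '2012-06-03');
INSERT INTO tags VALUES (1, 'mongodb', '/tag/mongodb'), (2, 'nosql', '/tag/nosql');
INSERT INTO post_tags VALUES (1, 1), (1, 2);
""")

# Two joins (posts -> post_tags -> tags) to get one post's tags.
tags = conn.execute("""
    SELECT t.tag FROM posts p
    JOIN post_tags pt ON pt.post_id = p.id
    JOIN tags t ON t.id = pt.tag_id
    WHERE p.slug = ?
    ORDER BY t.tag
""", ("hello-mongodb",)).fetchall()

# Another join (posts -> comments) to get the same post's comments.
comments = conn.execute("""
    SELECT c.comment FROM posts p
    JOIN comments c ON c.post_id = p.id
    WHERE p.slug = ?
    ORDER BY c.created_at
""", ("hello-mongodb",)).fetchall()

conn.close()
print(tags)      # [('mongodb',), ('nosql',)]
print(comments)  # [('Great overview!',), ('Thanks!',)]
```

The queries are split in two on purpose. Joining comments and tags in a single query would return one row per comment/tag combination, which the application would then have to de-duplicate.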

MongoDB offers a simplified data model. Instead of rows in tables, data are stored as documents in collections, encoded in BSON (Binary JSON, a JSON-like format). You communicate with MongoDB through a language driver, such as the Ruby driver, or through the MongoDB JavaScript shell.
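To make "JSON-like" concrete: BSON is a length-prefixed binary encoding of JSON-style documents. The following is a minimal sketch of an encoder for a small subset of BSON types (string, 32-bit integer, boolean, embedded document, array), written from the public BSON specification at bsonspec.org. The function names are hypothetical, and this is not a driver's API. In practice the language driver performs this encoding for you.

```python
import struct

def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator.
    return s.encode("utf-8") + b"\x00"

def _element(name: str, value) -> bytes:
    # Each element is: 1 type byte, the field name as a cstring, then the value.
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        # Type 0x10 = int32; struct raises struct.error outside the int32 range.
        return b"\x10" + _cstring(name) + struct.pack("<i", value)
    if isinstance(value, str):
        # Type 0x02 = string: int32 byte length (incl. trailing NUL), then bytes + NUL.
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + _cstring(name) + encode(value)  # embedded document
    if isinstance(value, list):
        # Arrays are encoded as documents whose keys are "0", "1", "2", ...
        return b"\x04" + _cstring(name) + encode({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

def encode(doc: dict) -> bytes:
    body = b"".join(_element(k, v) for k, v in doc.items())
    # Total length counts the 4-byte length prefix and the trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

Each value carries its own type byte and length prefix. A reader can therefore skip over fields it doesn't need without parsing them, which is one reason BSON is binary rather than plain JSON text.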

The "schema-free" nature of MongoDB has led to faster development cycles for many teams running it in production. The intuition is simple: a blog post's comments and categories can be stored in the same document as the post itself, rather than in separate tables.
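The sketch below illustrates both points without a running server. A hypothetical in-memory "collection" is modeled as a list of dicts whose documents don't share an identical set of fields, and a toy `find` helper (also hypothetical) loosely imitates MongoDB's equality matching. In that matching, a scalar compared against an array field matches if the array contains it, roughly what `db.posts.find({ tags: "mongodb" })` does in the shell. It is an illustration of the idea, not MongoDB's query engine.

```python
# Hypothetical in-memory "collection": documents need not share the same fields.
posts = [
    {"slug": "hello-mongodb", "title": "Hello, MongoDB",
     "tags": ["mongodb", "nosql"],
     "comments": [{"comment": "Great overview!", "username": "alice"}]},
    # A later post adds a "subtitle" field the first one lacks; no migration step.
    {"slug": "schema-design", "title": "Schema Design",
     "subtitle": "Embedding vs. linking",
     "tags": ["mongodb"], "comments": []},
]

def find(collection, **criteria):
    """Return documents where each field equals the value, or, for array
    fields, contains it. A loose imitation of MongoDB equality matching."""
    def matches(doc, field, value):
        actual = doc.get(field)  # a missing field never matches a non-None value
        return actual == value or (isinstance(actual, list) and value in actual)
    return [d for d in collection if all(matches(d, f, v) for f, v in criteria.items())]

# One lookup returns the post together with its embedded comments: no join.
post = find(posts, slug="hello-mongodb")[0]
print(post["comments"][0]["comment"])              # Great overview!
print([p["slug"] for p in find(posts, tags="mongodb")])  # ['hello-mongodb', 'schema-design']
```

The trade-off is that the application, not the database, becomes responsible for keeping document shapes consistent where that matters.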

If you're interested in learning more, download MongoDB and see for yourself how easy it is to get started.