The MEAN Stack: MongoDB, ExpressJS, AngularJS and Node.js

By Valeri Karpov, Kernel Tools engineer at MongoDB and co-founder of the Ascot Project

A few weeks ago, a friend of mine asked me for help with PostgreSQL. As someone who’s been blissfully SQL-free for a year, I was quite curious to find out why he wasn’t just using MongoDB instead. It turns out that he thinks MongoDB is too difficult to use for a quick weekend hack, and this couldn’t be farther from the truth. I just finished my second 24-hour hackathon using Mongo and NodeJS (the FinTech Hackathon co-sponsored by 10gen) and can confidently say that there is no reason to use anything else for your next hackathon or REST API hack.

How to use MongoDB as a pure in-memory DB (Redis style)

The idea

There has been a growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like:

  • a write-heavy cache in front of a slower RDBMS system
  • embedded systems
  • PCI compliant systems where no data should be persisted
  • unit testing where the database should be light and easily cleaned

That would be really neat indeed if it were possible: one could leverage the advanced querying / indexing capabilities of MongoDB without hitting the disk. As you probably know, disk I/O (especially random I/O) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk.

One sweet design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. This means that MongoDB does not know the difference between RAM and disk: it just accesses bytes at offsets in giant arrays representing files, and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification.

How it is done

This is all achieved by using a special type of filesystem called tmpfs. Linux makes it appear as a regular FS, but it is entirely located in RAM (unless it is larger than RAM, in which case it can swap, which can be useful!). I have 32GB of RAM on this server; let’s create a 16GB tmpfs:

# mkdir /ramdata
# mount -t tmpfs -o size=16000M tmpfs /ramdata/
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973924    871792  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000         0  16384000   0% /ramdata

Now let’s start MongoDB with the appropriate settings. smallFiles and noprealloc should be used to reduce the amount of RAM wasted, and they will not affect performance since it’s all RAM-based. nojournal should be used since it does not make sense to keep a journal in this context!

dbpath=/ramdata
nojournal = true
smallFiles = true
noprealloc = true

After starting MongoDB, you will find that it works just fine and the files are as expected in the FS:

# mongo
MongoDB shell version: 2.3.2
connecting to: test
> db.test.insert({a:1})
> db.test.find()
{ "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }

# ls -l /ramdata/
total 65684
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root        5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 15:52 test.0
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root       40 Apr 30 15:52 _tmp

Now let’s add some data and make sure it behaves properly. We will create documents of about 1KB and add 4 million of them:

> str = ""

> aaa = "aaaaaaaaaa"
aaaaaaaaaa
> for (var i = 0; i < 100; ++i) { str += aaa; }

> for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});}
> db.foo.stats()
{
        "ns" : "test.foo",
        "count" : 4000000,
        "size" : 4544000160,
        "avgObjSize" : 1136.00004,
        "storageSize" : 5030768544,
        "numExtents" : 26,
        "nindexes" : 1,
        "lastExtentSize" : 536600560,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 129794000,
        "indexSizes" : {
                "_id_" : 129794000
        },
        "ok" : 1
}

The average document size is 1136 bytes and the data takes up about 5GB of storage. The index on _id takes about 130MB. Now we need to verify something very important: is the data duplicated in RAM, existing both within MongoDB and within the filesystem? Remember that MongoDB does not buffer any data within its own process; instead, data is cached in the FS cache. Let’s drop the FS cache and see what is in RAM:

# echo 3 > /proc/sys/vm/drop_caches 
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6292780   24397096          0       1044    5817368
-/+ buffers/cache:     474368   30215508
Swap:            0          0          0

As you can see there is 6.3GB of used RAM, of which 5.8GB sits in the FS cache (the “cached” column). Why is there still 5.8GB of FS cache even though all caches were just dropped? The reason is that Linux is smart and does not duplicate pages between tmpfs and its cache… Bingo! That means your data exists as a single copy in RAM. Let’s access all documents and verify that RAM usage is unchanged:

> db.foo.find().itcount()
4000000

# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6327988   24361888          0       1324    5818012
-/+ buffers/cache:     508652   30181224
Swap:            0          0          0
# ls -l /ramdata/
total 5808780
-rw-------. 1 root root  16777216 Apr 30 15:52 local.0
-rw-------. 1 root root  16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root         5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root  16777216 Apr 30 16:00 test.0
-rw-------. 1 root root  33554432 Apr 30 16:00 test.1
-rw-------. 1 root root 536608768 Apr 30 16:02 test.10
-rw-------. 1 root root 536608768 Apr 30 16:03 test.11
-rw-------. 1 root root 536608768 Apr 30 16:03 test.12
-rw-------. 1 root root 536608768 Apr 30 16:04 test.13
-rw-------. 1 root root 536608768 Apr 30 16:04 test.14
-rw-------. 1 root root  67108864 Apr 30 16:00 test.2
-rw-------. 1 root root 134217728 Apr 30 16:00 test.3
-rw-------. 1 root root 268435456 Apr 30 16:00 test.4
-rw-------. 1 root root 536608768 Apr 30 16:01 test.5
-rw-------. 1 root root 536608768 Apr 30 16:01 test.6
-rw-------. 1 root root 536608768 Apr 30 16:04 test.7
-rw-------. 1 root root 536608768 Apr 30 16:03 test.8
-rw-------. 1 root root 536608768 Apr 30 16:02 test.9
-rw-------. 1 root root  16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root        40 Apr 30 16:04 _tmp
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973960    871756  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000   5808780  10575220  36% /ramdata

And that verifies it! :)

What about replication?

You probably want to use replication since a server loses its RAM data upon reboot! Using a standard replica set you get automatic failover and more read capacity. If a server is rebooted, MongoDB will automatically rebuild its data by pulling it from another server in the same replica set (a resync). This should be fast enough even with a lot of data and indexes, since all operations are RAM-only :)
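As a minimal sketch (hostnames, port and set name are placeholders), assuming each mongod is started with --replSet inmem and a tmpfs-backed dbpath, the set can be initiated from the shell like this:

rs.initiate({
    _id: "inmem",
    members: [
        { _id: 0, host: "host1:27017" },
        { _id: 1, host: "host2:27017" },
        { _id: 2, host: "host3:27017" }
    ]
})
rs.status()   // check that members reach PRIMARY / SECONDARY state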

It is important to remember that write operations get written to a special collection called oplog, which resides in the local database and takes 5% of the volume by default. In my case the oplog would take 5% of 16GB, which is 800MB. If in doubt, it is safer to choose a fixed oplog size using the oplogSize option. If a secondary server is down for longer than the oplog covers, it will have to be resynced. To set it to 1GB, use:

oplogSize = 1000

What about sharding?

Now that you have all the querying capabilities of MongoDB, what if you want to implement a large service with it? You can use sharding freely to implement a large scalable in-memory store. The config servers (which contain the chunk distribution) should still be disk-based, since their activity is small and rebuilding a cluster from scratch is no fun.
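The sharding setup itself is no different from a disk-backed deployment. A rough sketch against a mongos (database and collection names are made up):

sh.enableSharding("cache")
sh.shardCollection("cache.entries", { _id: 1 })   // pick a shard key that matches your access pattern
sh.status()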

What to watch for

RAM is a scarce resource, and in this case you definitely want the entire data set to fit in RAM. Even though tmpfs can resort to swapping, performance would drop dramatically. To make the best use of the RAM you should consider the following (example commands follow the list):

  • use the usePowerOf2Sizes option to normalize the storage buckets
  • run the compact command or resync the node periodically
  • use a schema design that is fairly normalized (avoid large document growth)
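For example, the first two items map to these shell commands (collection name is a placeholder):

db.runCommand({ collMod: "foo", usePowerOf2Sizes: true })   // normalize record allocation sizes
db.runCommand({ compact: "foo" })                           // defragment; blocks the database while it runs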

Conclusion

Sweet, you can now use MongoDB and all its features as an in-memory, RAM-only store! Its performance should be pretty impressive: during the test with a single thread/core I was achieving 20k writes per second, and it should scale linearly with the number of cores.

NoSQL and Full Text Indexing: Two Trends

On one side:

  1. DataStax with Solr
  2. MapR with LucidWorks Search (nb: Solr)

and on the other side:

  1. Riak Search: Solr-like but a custom proprietary implementation
  2. MongoDB text search: a custom proprietary implementation

I’m not going to argue about the pros and cons of each of these approaches, but I’m sure you already know which of these approaches I’m in favor of.

Original title and link: NoSQL and Full Text Indexing: Two Trends (NoSQL database©myNoSQL)

Trying out MongoDB

MongoDB is one of the databases generally labeled NoSQL. Roughly speaking, it offers functionality and performance somewhere between an RDB (relational database) and a KVS (key-value store): by dropping features such as relations and transactions from the RDB model, it achieves performance approaching that of a KVS. In this post I’ll try out MongoDB through the bundled client.

First, install MongoDB. On OS X, Homebrew is the easiest way to get it.
$ brew install mongodb

Once it’s installed, start the MongoDB server.
$ mongod --config /usr/local/etc/mongod.conf

Next, connect with the client command ‘mongo’.
$ mongo

Once connected, declare which database to use with the ‘use’ command.
> use testdb
switched to db testdb

At this point the database has no content yet, so nothing has actually been created.
> show dbs
admin  (empty)
local  0.078GB

Next, create a “document” (the equivalent of an RDB record) inside a “collection” (the equivalent of an RDB table). The following creates a collection named ‘users’ in the database and inserts a document with ‘name’ and ‘age’ fields into it. As you can see, the content MongoDB handles is JSON (strictly speaking, BSON).
> db.users.insert({"name": "Foo", "age": 10});
WriteResult({ "nInserted" : 1 })

Now that there is some content, the database shows up in the listing.
> show dbs
admin   (empty)
local   0.078GB
testdb  0.078GB

Checking the collections, a new one named ‘users’ has been created. ‘system.indexes’ presumably holds information about the indexes used to speed up lookups.
> db.getCollectionNames()
[ "system.indexes", "users" ]

The inserted document can be inspected with find(). A document identifier (_id) has been added automatically. In RDB terms it’s the primary key, and a surrogate key at that.
> db.users.find()
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }

To have something to work with, let’s add more documents in the same way.
> db.users.insert({"name": "Bar", "age": 20});
WriteResult({ "nInserted" : 1 })
> db.users.insert({"name": "Baz", "age": 30});
WriteResult({ "nInserted" : 1 })
> db.users.find()
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 30 }

find() can narrow down the results like a SQL ‘where’ clause. For example, to select only documents whose name is ‘Bar’, do this:
> db.users.find({"name": "Bar"});
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }

Values don’t have to be hard-coded; various operators let you specify conditions flexibly. For example, to get documents whose age is greater than 15, use the $gt operator like this:
> db.users.find({age: {$gt: 15}});
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 30 }

Regular expressions can also be used, for example to get documents whose name starts with ‘B’.
> db.users.find({ "name": { $regex: /^B/}})
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 30 }

To combine multiple conditions, the $or and $and operators are available. Let’s select users whose age is under 25 or whose name starts with ‘F’.
> db.users.find({ $or: [{ "age": { $lt: 25} }, { "name": { $regex: /^F/ }}]});
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }

To get only specific fields of a document with find(), pass a second argument. Let’s pull out just the names.
> db.users.find({}, {"name": true, "_id": false})
{ "name" : "Foo" }
{ "name" : "Bar" }
{ "name" : "Baz" }

Results selected with find() can be reordered according to a given rule. Let’s display them by age in ascending order.
> db.users.find().sort({"age": 1})
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 30 }

For descending order, specify -1.
> db.users.find().sort({"age": -1})
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 30 }
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 20 }
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }

Combining sort() and limit() returns the top entries after sorting. For example, to pick out the document with the lowest age, do this:
> db.users.find().sort({"age": 1}).limit(1)
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 10 }

The console also supports JavaScript-style operations, so it should be easy to mix in some arithmetic to get, say, the top n% of results, as below (a completed version follows the snippet).
> var cnt = db.users.count()
> Math.floor(cnt / 2)
1 
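For instance, a hypothetical completion of that idea: feed the computed count straight into limit() to return the youngest half of the users.

> var cnt = db.users.count()
> db.users.find().sort({"age": 1}).limit(Math.floor(cnt / 2))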

Next, let’s update the contents of a document. For example, change the age of the ‘Foo’ user to 5. Use update() to change values.
> db.users.update({"name": "Foo"}, {$set: {"age": 5}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({"name": "Foo"})
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "name" : "Foo", "age" : 5 }

The $set operator changes only that field of the existing document. Without it, the matched document is replaced wholesale by the update document itself.
> db.users.update({"name": "Foo"}, {"age": 10})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({"name": "Foo"})
> db.users.find({"age": 10})
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "age" : 10 }

Note that update() by default modifies only the first matching document. If you want to update all matching documents, set ‘multi’ to true in the third argument. For example, let’s change the age of every document with age greater than 5 to 100.
> db.users.update({"age": {$gt: 5}}, {$set: {"age": 100}}, {"multi": true})
WriteResult({ "nMatched" : 2, "nUpserted" : 0, "nModified" : 2 })
> db.users.find()
{ "_id" : ObjectId("544cec8cc5c20be818a66321"), "age" : 5 }
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 100 }
{ "_id" : ObjectId("544cedccc5c20be818a66323"), "name" : "Baz", "age" : 100 }

update() provides other handy operators besides $set. For example, the $inc operator increments a value in the document.
> db.users.update({"name": "Bar"}, {$inc: {"age": 1}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({"name": "Bar"})
{ "_id" : ObjectId("544ceda3c5c20be818a66322"), "name" : "Bar", "age" : 101 }

Documents that are no longer needed can be deleted with remove(). For example, to delete all documents, do this:
> db.users.remove({})
WriteResult({ "nRemoved" : 3 })
> db.users.find()

If you’re not sure how to do something, just look at help.
> help
> db.help()
That about covers the basic operations.

To be honest, I have a feeling that the cases where MongoDB really fits are more limited than you might expect. Part of that comes from my own view that enforcing relations at the application layer is not realistic, and a big part is that MongoDB itself provides neither relations between documents nor transactions spanning multiple documents. Its schemaless nature, which frees you from defining a schema up front the way an RDB requires, is in practice an easy breeding ground for inconsistencies; despite the relaxed impression it gives, it demands that users rigorously manage the things the system will not guarantee for them. That said, for domains that need neither relations nor transactions, the first that comes to mind being workloads that accumulate data over time, it seems like a good match. For example, it looks like a good place to store events generated by an application. Next I’d like to try driving it from the various language bindings.

Schema Design for Social Inboxes in MongoDB

Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application when determining the best design. In this post, we’ll explore three approaches for using MongoDB when creating social inboxes or message timelines.

If you’re building a social network, like Twitter for example, you need to design a schema that is efficient for users viewing their inbox, as well as users sending messages to all their followers. The whole point of social media, after all, is that you can connect in real time.

There are several design considerations for this kind of application:

  • The application needs to support a potentially large volume of reads and writes.
  • Reads and writes are not uniformly distributed across users. Some users post much more frequently than others, and some users have many, many more followers than others.
  • The application must provide a user experience that is instantaneous.
  • Edit 11/6: The application will have little to no user deletions of data (a follow-up blog post will include information about user deletions and historical data)

Because we are designing an application that needs to support a large volume of reads and writes, we will be using a sharded collection for the messages. All three designs include the concept of “fan out,” which refers to distributing the work across the shards in parallel:

  1. Fan out on Read
  2. Fan out on Write
  3. Fan out on Write with Buckets

Each approach presents trade-offs, and you should use the design that is best for your application’s requirements.

The first design you might consider is called Fan Out on Read. When a user sends a message, it is simply saved to the inbox collection. When any user views their own inbox, the application queries for all messages that include the user as a recipient. The messages are returned in descending date order so that users can see the most recent messages.

To implement this design, create a sharded collection called inbox, specifying the from field as the shard key, which represents the address sending the message. You can then add a compound index on the to field and the sent field. Once the document is saved into the inbox, the message is effectively sent to all the recipients. With this approach sending messages is very efficient.
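A minimal sketch of that setup in the shell (the namespace and message fields are illustrative, not the exact schema from the post):

// shard on the sender, compound index on recipient and send time
sh.shardCollection("social.inbox", { from: 1 })
db.inbox.ensureIndex({ to: 1, sent: 1 })

// sending a message is a single insert
db.inbox.insert({ from: "joe", to: ["alice", "bob"], sent: new Date(), message: "hi!" })

// viewing an inbox queries by recipient, newest first (routed to all shards)
db.inbox.find({ to: "alice" }).sort({ sent: -1 })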

Viewing an inbox, on the other hand, is less efficient. When a user views their inbox, the application issues a find command based on the to field, sorted by sent. Because the inbox collection uses from as its shard key, messages are grouped by sender across the shards. In MongoDB, queries that are not based on the shard key are routed to all shards. Therefore, each inbox view is routed to all shards in the system. As the system scales and many users go to view their inbox, all queries are routed to all shards. This design does not scale as well as one in which each query is routed to a single shard.

With the “Fan Out on Read” method, sending a message is very efficient, but viewing the inbox is less efficient.

Fan out on Read is very efficient for sending messages, but less efficient for reading messages. If the majority of your application consists of users sending messages, but very few go to read what anyone sends them — let’s call it an anti-social app — then this design might work well. However, for most social apps there are more requests by users to view their inbox than there are to send messages.

Fan out on Write takes a different approach that is more optimized for viewing inboxes. This time, instead of sharding our inbox collection on the sender, we shard on the message recipient. In this way, when we go to view an inbox the queries can be routed to a single shard, which will scale very well. Our message document is the same, but now we save a copy of the message for every recipient.
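A comparable sketch for Fan out on Write (names are again illustrative; a followers collection is assumed):

// shard on the recipient so each inbox view targets a single shard
sh.shardCollection("social.inbox", { recipient: 1 })
db.inbox.ensureIndex({ recipient: 1, sent: 1 })

// sending fans out one copy of the message per follower
db.followers.find({ user: "joe" }).forEach(function (f) {
    db.inbox.insert({ recipient: f.follower, from: "joe", sent: new Date(), message: "hi!" });
})

// viewing an inbox is routed to a single shard
db.inbox.find({ recipient: "alice" }).sort({ sent: -1 })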

With the “Fan Out on Write” method, viewing the inbox is efficient, but sending messages consumes more resources.

In practice we might implement the saving of messages asynchronously. Imagine two celebrities quickly exchange messages at a high-profile event - the system could quickly be saturated with millions of writes. By saving a first copy of their message, then using a pool of background workers to write copies to all followers, we can ensure the two celebrities can exchange messages quickly, and that followers will soon have their own copies. Furthermore, we could maintain a last-viewed date on the user document to ensure they have accessed the system recently - zombie accounts probably shouldn’t get a copy of the message, and for users that haven’t accessed their account recently we could always resort to our first design - Fan out on Read - to repopulate their inbox. Subsequent requests would then be fast again.

At this point we have improved the design for viewing inboxes by routing each inbox view to a single shard. However, each message in the user’s inbox will produce a random read operation. If each inbox view produces 50 random reads, then it only takes a relatively modest number of concurrent users to potentially saturate the disks. Fortunately we can take advantage of the document data model to further optimize this design to be even more efficient.

Fan out on Write with Buckets refines the Fan Out on Write design by “bucketing” messages together into documents of 50 messages ordered by time. When a user views their inbox the request can be fulfilled by reading just a few documents of 50 messages each instead of performing many random reads. Because read time is dominated by seek time, reducing the number of seeks can provide a major performance improvement to the application. Another advantage to this approach is that there are fewer index entries.

To implement this design we create two collections, an inbox collection and a user collection. The inbox collection uses two fields for the shard key, owner and sequence, which hold the owner’s user id and a sequence number (i.e. which 50-message “bucket” document in their inbox). The user collection contains simple user documents for tracking the total number of messages in each user’s inbox. Since we will probably need to show the total number of messages for a user in a variety of places in our application, this is a nice place to maintain the count instead of calculating it for each request. Our message document is the same as in the prior examples.

To send a message we iterate through the list of recipients as we did in the Fan out on Write example, but we also take another step to increment the count of total messages in the inbox of the recipient, which is maintained on the user document. Once we know the count of messages, we know the “bucket” in which to add the latest message. As these messages reach the 50 item threshold, the sequence number increments and we begin to add messages to the next “bucket” document. The most recent messages will always be in the “bucket” document with the highest sequence number. Viewing the most recent 50 messages for a user’s inbox is at most two reads; viewing the most recent 100 messages is at most three reads.
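A sketch of the bucketed write path (the bucket size of 50 follows the text; the field names, counter arithmetic and followers collection are assumptions):

db.followers.find({ user: "joe" }).forEach(function (f) {
    // bump the recipient's total message count and read back the new value
    var count = db.users.findAndModify({
        query:  { user_id: f.follower },
        update: { $inc: { msg_count: 1 } },
        upsert: true,
        new:    true
    }).msg_count;

    // 50 messages per bucket: counts 1-50 land in bucket 0, 51-100 in bucket 1, and so on
    var sequence = Math.floor((count - 1) / 50);

    db.inbox.update(
        { owner: f.follower, sequence: sequence },
        { $push: { messages: { from: "joe", sent: new Date(), message: "hi!" } } },
        { upsert: true }
    );
})

// the most recent 100 messages are at most the two highest-sequence buckets
db.inbox.find({ owner: "alice" }).sort({ sequence: -1 }).limit(2)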

Normally a user’s entire inbox will exist on a single shard. However, it is possible that a few user inboxes could be spread across two shards. Because our application will probably page through a user’s inbox, it is still likely that every query for these few users will be routed to a single shard.

Fan out on Write with Buckets is generally the most scalable of these three designs for social inbox applications. Every design presents different trade-offs. In this case viewing a user’s inbox is very efficient, but writes are somewhat more complex, and more disk space is consumed. For many applications these are the right trade-offs to make.

Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

A pretty interesting post by John Page:

This blog post describes a number of techniques in MongoDB for efficiently finding documents that have a number of attributes similar to a supplied query without being an exact match. This concept of “fuzzy” searching allows users to avoid the risk of failing to find important information due to slight differences in how it was entered.

In the presented incarnation, these 4 solutions are MongoDB-specific, but some of them could easily be generalized.

Original title and link: Efficient techniques for fuzzy and partial matching in MongoDB (NoSQL database©myNoSQL)

Crittercism: Scaling To Billions Of Requests Per Day On MongoDB

This is a guest post by Mike Chesnut, Director of Operations Engineering at Crittercism. This June, Mike will present at MongoDB World on how Crittercism scaled to 30,000 requests/second (and beyond) on MongoDB.

MongoDB is capable of scaling to meet your business needs — that is why its name is based on the word humongous. This doesn’t mean there aren’t some growing pains you’ll encounter along the way, of course. At Crittercism, we’ve had a huge amount of growth over the past 2 years and have hit some bumps in the road along the way, but we’ve also learned some important lessons that can hopefully be of use to others.

Background

Crittercism provides the world’s first and leading Mobile Application Performance Management (mAPM) solution. Our SDK is embedded in tens of thousands of applications, and used by nearly a billion users worldwide. We collect performance data such as error reporting, crash diagnostics details, network breadcrumbs, device/carrier/OS statistics, and user behavior. This data is largely unstructured and varies greatly among different applications, versions, devices, and usage patterns.

Storing all of this data in MongoDB allows us to collect raw information that may be of use in a variety of ways to our customers, while also providing the analytics necessary to summarize the data down to digestible, actionable metrics.

As our request volume has grown, so too has our data footprint; over the course of 18 months our daily request volume increased over 40x:

Our primary MongoDB cluster now houses over 20TB of data, and getting to this point has helped us learn a few things along the way.

Routing

The MongoDB documentation suggests that the most common topology is to include a router — a mongos process — on each client system. We started doing this and it worked well for quite a while.

As the number of front-end application servers in production grew from the order of 10s to several 100s, we found that we were creating heavy load via hundreds (or sometimes thousands) of connections between our mongos routers and our mongod shard servers. This meant that whenever chunk balancing occurred — something that is an integral part of maintaining a well-balanced, sharded MongoDB cluster — the chunk location information that is stored in the config database took a long time to propagate. This is because every mongos router needs to have a full picture of where in the cluster all of the chunks reside.

So what did we learn? We found that we could alleviate this issue by consolidating mongos routers onto a few hosts. Our production infrastructure is in AWS, so we deployed 2 mongos servers per availability zone. This gave us redundancy per AZ, as well as the shortest network path possible from the clients to the mongos routers. We were concerned about adding an additional hop to the request path, but using Chef to configure all of our clients to only talk to the mongos routers in their own AZ helped minimize this issue.

Making this topology change greatly reduced the number of open connections to our mongod shards, which we were able to measure using MMS, without a noticeable reduction in application performance. At the same time, there were several improvements to MongoDB that made both the mongos updates and the internal consistency checks more efficient in general. Combined with the new infrastructure this meant that we could now balance chunks throughout our cluster without causing performance problems while doing so.

Shard Replacement

Another scenario we’ve encountered is the need to dynamically replace mongod servers in order to migrate to larger shards. Again following recommended deployment best practices, we deploy MongoDB onto server instances utilizing large RAID10 arrays running XFS. We use m2.4xlarge instances in AWS with 16 disks. We’ve used basic Linux mdadm for performance, but at the expense of flexibility in disk configuration. As a result, when we are ready to allocate more capacity to our shards, we need to perform a migration procedure that can sometimes take several days. This not only means that we need to plan ahead appropriately, but also that we need to be aware of the full process in order to monitor it and react when things go wrong.

We start with a replica set where all replicas are at approximately the same disk utilization. We first create a new server instance with more disk allocated to it, and add it to this replica set with rs.add().

The new replica will enter the STARTUP2 state and remain there for a long time (in our case, usually 2-3 days) while it first clones data, then catches up via oplog replication and builds indexes. During this time, index builds will often stop the replication process (note that this behavior is set to change in MongoDB 2.6), and so the replication lag time does not strictly decrease the whole time — it will steadily decrease for a while, then an index build will cause it to pause and start to fall further behind. Once the index build completes the replication will resume. It’s worth noting that while index builds occur, mongostat and anything else that requires a read lock will be blocked as well.

Eventually the replica will enter the SECONDARY state and will be fully functional. At this point we can rs.stepDown() one of the old replicas, shut down the mongod process running on it, and then remove it from the replica set via rs.remove(), making the server ready for retirement.
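Condensed into shell commands, the procedure for a single member looks roughly like this (hostnames are placeholders, not our actual configuration):

// from the shard's replica set: add the new, larger instance
rs.add("new-shard-host:27017")

// poll until the new member moves from STARTUP2 to SECONDARY (the multi-day part)
rs.status()

// once it is healthy, retire an old member: step down the primary if needed,
// stop its mongod process, then remove it from the set
rs.stepDown()
rs.remove("old-shard-host:27017")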

We then repeat the process for each member of the replica set, until all have been replaced with the new instances with larger disks.

While this process can be time-consuming and somewhat tedious, it allows for a graceful way to grow our database footprint without any customer-facing impact.

Conclusion

Operating MongoDB at scale — as with any other technology — requires some knowledge that you can gain from documentation, and some that you have to learn from experience. By being willing to experiment with different strategies such as those shown above, you can discover flexibility that may not have been previously obvious. Consolidating the mongos router tier was a big win for Crittercism’s Ops team in terms of both performance and manageability, and developing the above described migration procedure has enabled us to continue to grow to meet our needs without affecting our service or our customers.

See how Crittercism, Stripe, Carfax, Intuit, Parse and Sailthru are building the next generation of applications at MongoDB World. Register now and join the MongoDB Community in New York City this June.
How to speed up MongoDB Map Reduce by 20x

Analytics is becoming an increasingly important topic with MongoDB since it is in use for more and more large, critical projects. People are tired of using different software to do analytics (Hadoop being pretty involved), which typically requires a massive transfer of data that can be costly.

MongoDB offers 2 ways to analyze data in-place: Map Reduce and the Aggregation Framework. MR is extremely flexible and easy to pick up. It works well with sharding and allows for a very large output. MR was heavily improved in MongoDB v2.4 by the JavaScript engine swap from SpiderMonkey to V8. The chief complaint about it is that it is quite slow, especially compared to the Aggregation Framework (which uses C++). Let’s see if we can squeeze some juice out of it.

The exercise

Let’s insert 10 million documents containing a single integer value between 0 and 1 million. This means that on average 10 documents have the same value.

> for (var i = 0; i < 10000000; ++i){ db.uniques.insert({ dim0: Math.floor(Math.random()*1000000) });}
> db.uniques.findOne()
{ "_id" : ObjectId("51d3c386acd412e22c188dec"), "dim0" : 570859 }
> db.uniques.ensureIndex({dim0: 1})
> db.uniques.stats()
{
        "ns" : "test.uniques",
        "count" : 10000000,
        "size" : 360000052,
        "avgObjSize" : 36.0000052,
        "storageSize" : 582864896,
        "numExtents" : 18,
        "nindexes" : 2,
        "lastExtentSize" : 153874432,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 576040080,
        "indexSizes" : {
                "_id_" : 324456384,
                "dim0_1" : 251583696
        },
        "ok" : 1
}

From here we want to get the count of unique values. This can be done easily with the following MR job:

> db.runCommand(
{ mapreduce: "uniques", 
map: function () { emit(this.dim0, 1); }, 
reduce: function (key, values) { return Array.sum(values); }, 
out: "mrout" })
{
        "result" : "mrout",
        "timeMillis" : 1161960,
        "counts" : {
                "input" : 10000000,
                "emit" : 10000000,
                "reduce" : 1059138,
                "output" : 999961
        },
        "ok" : 1
}

As you can see in the output, it takes about 1200 seconds (tested on an EC2 M3 instance). There are 10 million maps, about 1 million reduces, and 999961 documents in the output. The result looks like this:

> db.mrout.find()
{ "_id" : 1, "value" : 10 }
{ "_id" : 2, "value" : 5 }
{ "_id" : 3, "value" : 6 }
{ "_id" : 4, "value" : 10 }
{ "_id" : 5, "value" : 9 }
{ "_id" : 6, "value" : 12 }
{ "_id" : 7, "value" : 5 }
{ "_id" : 8, "value" : 16 }
{ "_id" : 9, "value" : 10 }
{ "_id" : 10, "value" : 13 }
...

Using sorting

I’ve outlined in a previous post how beneficial using a sort can be for MR. It is a very poorly understood feature. In this case, processing the input unsorted means that the MR engine will get the values in random order and will not have the opportunity to reduce at all in RAM. Instead it will have to write all the documents back to disk in a temporary collection, only to read them back later in sorted order and reduce them. Let’s see if using a sort helps:

> db.runCommand(
{ mapreduce: "uniques", 
map: function () { emit(this.dim0, 1); }, 
reduce: function (key, values) { return Array.sum(values); }, 
out: "mrout", 
sort: {dim0: 1} })
{
        "result" : "mrout",
        "timeMillis" : 192589,
        "counts" : {
                "input" : 10000000,
                "emit" : 10000000,
                "reduce" : 1000372,
                "output" : 999961
        },
        "ok" : 1
}

That’s a big help indeed! We’re down to 192s which is already a 6x improvement. The number of reduces is about the same but now they are done in RAM before the results are written to disk.

Using multiple threads

MongoDB will not multithread a single MR job; it will only multithread separate jobs. But with multi-core CPUs it could be very advantageous to parallelize the job within a single server, Hadoop style. What we really need is to subdivide the input into several chunks and spin up one MR job for each chunk. Maybe the data set has an easy way to get split, but otherwise the undocumented splitVector command enables you to very quickly find split points:

> db.runCommand({splitVector: "test.uniques", keyPattern: {dim0: 1}, maxChunkSizeBytes: 32000000})
{
    "timeMillis" : 6006,
	"splitKeys" : [
		{
			"dim0" : 18171
		},
		{
			"dim0" : 36378
		},
		{
			"dim0" : 54528
		},
		{
			"dim0" : 72717
		},
…
		{
			"dim0" : 963598
		},
		{
			"dim0" : 981805
		}
	],
	"ok" : 1
}

This command only takes about 5s to find the split points over 10m documents; that’s fast! So now we just need a way to create multiple MR jobs. From an application server it would be pretty easy using multiple threads and a query with $gt / $lt for the MR command. From the shell, one can use the ScopedThread object, which works as follows:

> var t = new ScopedThread(mapred, 963598, 981805)
> t.start()
> t.join()

So now we can put together some quick JS code which will spawn 4 threads (as many as cores), wait and display the results:

> var res = db.runCommand({splitVector: "test.uniques", keyPattern: {dim0: 1}, maxChunkSizeBytes: 32 *1024 * 1024 })
> var keys = res.splitKeys
> keys.length
39
> var mapred = function(min, max) { 
return db.runCommand({ mapreduce: "uniques", 
map: function () { emit(this.dim0, 1); }, 
reduce: function (key, values) { return Array.sum(values); }, 
out: "mrout" + min, 
sort: {dim0: 1}, 
query: { dim0: { $gte: min, $lt: max } } }) }
> var numThreads = 4
> var inc = Math.floor(keys.length / numThreads) + 1
> threads = []; for (var i = 0; i < numThreads; ++i) { var min = (i == 0) ? 0 : keys[i * inc].dim0; var max = (i * inc + inc >= keys.length) ? MaxKey : keys[i * inc + inc].dim0 ; print("min:" + min + " max:" + max); var t = new ScopedThread(mapred, min, max); threads.push(t); t.start() }
min:0 max:274736
min:274736 max:524997
min:524997 max:775025
min:775025 max:{ "$maxKey" : 1 }
connecting to: test
connecting to: test
connecting to: test
connecting to: test
> for (var i in threads) { var t = threads[i]; t.join(); printjson(t.returnData()); }
{ 
        "result" : "mrout0",
        "timeMillis" : 205790,
        "counts" : {
                "input" : 2750002,
                "emit" : 2750002,
                "reduce" : 274828,
                "output" : 274723
        },
        "ok" : 1
}
{ 
        "result" : "mrout274736",
        "timeMillis" : 189868,
        "counts" : {
                "input" : 2500013,
                "emit" : 2500013,
                "reduce" : 250364,
                "output" : 250255
        },
        "ok" : 1
} 
{
        "result" : "mrout524997",
        "timeMillis" : 191449,
        "counts" : {
                "input" : 2500014,
                "emit" : 2500014,
                "reduce" : 250120,
                "output" : 250019
        },
        "ok" : 1
}
{
        "result" : "mrout775025",
        "timeMillis" : 184945,
        "counts" : {
                "input" : 2249971,
                "emit" : 2249971,
                "reduce" : 225057,
                "output" : 224964
        },
        "ok" : 1
}

The 1st thread does a bit more work than the others, but it still amounts to about 190s per thread… which means this is not faster than 1 thread! That is curious, since using ‘top’ you can see all cores working to some extent.

Using multiple databases

The issue is that there is too much lock contention between the threads. MR is not very altruistic when locking (it yields every 1000 reads), and since MR jobs do a lot of writing too, the threads end up waiting on each other. Since MongoDB has individual locks per database, let’s try using a different output db for each thread:

> var mapred = function(min, max) { 
return db.runCommand({ mapreduce: "uniques", 
map: function () { emit(this.dim0, 1); }, 
reduce: function (key, values) { return Array.sum(values); }, 
out: { replace: "mrout" + min, db: "mrdb" + min }, 
sort: {dim0: 1}, 
query: { dim0: { $gte: min, $lt: max } } }) }
> threads = []; for (var i = 0; i < numThreads; ++i) { var min = (i == 0) ? 0 : keys[i * inc].dim0; var max = (i * inc + inc >= keys.length) ? MaxKey : keys[i * inc + inc].dim0 ; print("min:" + min + " max:" + max); var t = new ScopedThread(mapred, min, max); threads.push(t); t.start() }
min:0 max:274736
min:274736 max:524997
min:524997 max:775025
min:775025 max:{ "$maxKey" : 1 }
connecting to: test
connecting to: test
connecting to: test
connecting to: test
> for (var i in threads) { var t = threads[i]; t.join(); printjson(t.returnData()); }
...
{ 
        "result" : {
                "db" : "mrdb274736",
                "collection" : "mrout274736"
        },
        "timeMillis" : 105821,
        "counts" : {
                "input" : 2500013,
                "emit" : 2500013,
                "reduce" : 250364,
                "output" : 250255
        },
        "ok" : 1
}
...

That’s more like it! We are now down to 100s, which is about a 2x improvement compared to a single thread. Not as good as hoped, but still good. Here I have 4 cores so I only get 2x; an 8-core CPU would give you 4x, etc.

Using the pure JavaScript mode

There is something very interesting that comes up when splitting the input data between threads: each thread now only has about 250,000 unique keys to output as opposed to 1m. This means that we can make use of the “pure JS mode” that can be turned on using jsMode:true. When on, MongoDB will not translate objects back and forth between JS and BSON as it does the processing; instead it reduces everything within an internal JS dictionary, which has a limit of 500,000 unique keys. Let’s see if that helps:

> var mapred = function(min, max) { 
return db.runCommand({ mapreduce: "uniques", 
map: function () { emit(this.dim0, 1); }, 
reduce: function (key, values) { return Array.sum(values); }, 
out: { replace: "mrout" + min, db: "mrdb" + min }, 
sort: {dim0: 1}, 
query: { dim0: { $gte: min, $lt: max } }, 
jsMode: true }) }
> threads = []; for (var i = 0; i < numThreads; ++i) { var min = (i == 0) ? 0 : keys[i * inc].dim0; var max = (i * inc + inc >= keys.length) ? MaxKey : keys[i * inc + inc].dim0 ; print("min:" + min + " max:" + max); var t = new ScopedThread(mapred, min, max); threads.push(t); t.start() }
min:0 max:274736
min:274736 max:524997
min:524997 max:775025
min:775025 max:{ "$maxKey" : 1 }
connecting to: test
connecting to: test
connecting to: test
connecting to: test
> for (var i in threads) { var t = threads[i]; t.join(); printjson(t.returnData()); }
...
{ 
        "result" : {
                "db" : "mrdb274736",
                "collection" : "mrout274736"
        },
        "timeMillis" : 70507,
        "counts" : {
                "input" : 2500013,
                "emit" : 2500013,
                "reduce" : 250156,
                "output" : 250255
        },
        "ok" : 1
}
...

We are now down to 70s, getting there! The jsMode can really help, especially when objects have many fields. Here there is a single number field and it still helped by 30%.

Improvement in MongoDB v2.6

Very early in v2.6 development we got rid of a piece of code that sets an optional “args” parameter on every JS function call. This was neither standard nor used, but it was kept for legacy reasons (see SERVER-4654). Let’s pull MongoDB from the master Git repository, compile it and run the test again:

...
{ 
        "result" : {
                "db" : "mrdb274736",
                "collection" : "mrout274736"
        },
        "timeMillis" : 62785,
        "counts" : {
                "input" : 2500013,
                "emit" : 2500013,
                "reduce" : 250156,
                "output" : 250255
        },
        "ok" : 1
}
...

There is definitely an improvement there since we are down to 60s, so about 10-15%. This change also improved the overall heap consumption of the JS engine.

Conclusion

Looking back, we started at 1200s and ended at 60s for the same MR job, which represents a 20x improvement! This improvement should be available to most use cases, even if some of the tricks are not ideal (e.g. using multiple output dbs / collections). Nevertheless this can give people ideas on how to speed up their MR jobs, and hopefully some of those features will be made easier to use in the future. One ticket will make the splitVector command more available, and another ticket will improve running multiple MR jobs on the same database. Cheers!

A tuts+ guide to MongoDB for people familiar with SQL and relational databases:

We will start by mapping the basic relational concepts like table, row and column, then move on to discuss indexing and joins. We will then look over SQL queries and discuss their corresponding MongoDB database queries.

By the end of it you’ll probably not be able to convert your app to MongoDB, but at the next meetup or hackathon you’ll have an idea of what those Mongo guys are talking about.

Original title and link: Mapping relational databases terms and SQL to MongoDB (NoSQL database©myNoSQL)

Schema Design for Time Series Data in MongoDB

This is a post by Sandeep Parikh, Solutions Architect at MongoDB and Kelly Stirman, Director of Products at MongoDB.

Data as Ticker Tape

New York is famous for a lot of things, including ticker tape parades.

For decades the most popular way to track the price of stocks on Wall Street was through ticker tape, the earliest digital communication medium. Stocks and their values were transmitted via telegraph to a small device called a “ticker” that printed onto a thin roll of paper called “ticker tape.” While out of use for over 50 years, the idea of the ticker lives on in scrolling electronic tickers at brokerage walls and at the bottom of most news networks, sometimes two, three and four levels deep.

Today there are many sources of data that, like ticker tape, represent observations ordered over time. For example:

  • Financial markets generate prices (we still call them “stock ticks”).
  • Sensors measure temperature, barometric pressure, humidity and other environmental variables.
  • Industrial fleets such as ships, aircraft and trucks produce location, velocity, and operational metrics.
  • Status updates on social networks.
  • Calls, SMS messages and other signals from mobile devices.
  • Systems themselves write information to logs.

This data tends to be immutable, large in volume, ordered by time, and is primarily aggregated for access. It represents a history of what happened, and there are a number of use cases that involve analyzing this history to better predict what may happen in the future or to establish operational thresholds for the system.

Time Series Data and MongoDB

Time series data is a great fit for MongoDB. There are many examples of organizations using MongoDB to store and analyze time series data. Here are just a few:

  • Silver Spring Networks, the leading provider of smart grid infrastructure, analyzes utility meter data in MongoDB.
  • EnerNOC analyzes billions of energy data points per month to help utilities and private companies optimize their systems, ensure availability and reduce costs.
  • Square maintains a MongoDB-based open source tool called Cube for collecting timestamped events and deriving metrics.
  • Server Density uses MongoDB to collect server monitoring statistics.
  • Appboy, the leading platform for mobile relationship management, uses MongoDB to track and analyze billions of data points on user behavior.
  • Skyline Innovations, a solar energy company, stores and organizes meteorological data from commercial scale solar projects in MongoDB.
  • One of the world’s largest industrial equipment manufacturers stores sensor data from fleet vehicles to optimize fleet performance and minimize downtime.

In this post, we will take a closer look at how to model time series data in MongoDB by exploring the schema of a tool that has become very popular in the community: MongoDB Management Service (MMS). MMS helps users manage their MongoDB systems by providing monitoring, visualization and alerts on over 100 database metrics. Today the system monitors over 25k MongoDB servers across thousands of deployments. Every minute thousands of local MMS agents collect system metrics and ship the data back to MMS. The system processes over 5B events per day, and over 75,000 writes per second, all on less than 10 physical servers for the MongoDB tier.

Schema Design and Evolution

How do you store time series data in a database? In relational databases the answer is somewhat straightforward; you store each event as a row within a table. Let’s say you were monitoring the amount of system memory used per second. In that example you would have a table and rows that looked like the following:


If we map that storage approach to MongoDB, we would end up with one document per event:

{
  timestamp: ISODate("2013-10-10T23:06:37.000Z"),
  type: "memory_used",
  value: 1000000
},
{
  timestamp: ISODate("2013-10-10T23:06:38.000Z"),
  type: "memory_used",
  value: 15000000
}

While this approach is valid in MongoDB, it doesn’t take advantage of the expressive nature of the document model. Let’s take a closer look at how we can refine the model to provide better performance for reads and to improve storage efficiency.

The Document-Oriented Design

A better schema approach looks like the following, which is not exactly what MMS uses but will help illustrate the key concepts. Let’s call it the document-oriented design:

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: "memory_used",
  values: {
    0: 999999,
    …  
    37: 1000000,
    38: 1500000,
    … 
    59: 2000000
  }
}

We store multiple readings in a single document: one document per minute. To further improve the efficiency of the schema, we can isolate repeating data structures. In the ```timestamp_minute``` field we capture the minute that identifies the document, and for each memory reading we store a new value in the ```values``` sub-document. Because we are storing one value per second, we can simply represent each second as fields 0 - 59.

More Updates than Inserts

In any system there may be tradeoffs regarding the efficiency of different operations, such as inserts and updates. For example, in some systems updates are implemented as copies of the original record written out to a new location, which requires updating of indexes as well. One of MongoDB’s core capabilities is the in-place update mechanism: field-level updates are managed in place as long as the size of the document does not grow significantly. By avoiding rewriting the entire document and index entries unnecessarily, far less disk I/O is performed. Because field-level updates are efficient, we can design for this advantage in our application: with the document-oriented design there are many more updates (one per second) than inserts (one per minute).

For example, if you wanted to maintain a count in your application, MongoDB provides a handy operator that increments or decrements a field. Instead of reading a value into your application, incrementing, then writing the value back to the database, you can simply increase the field using $inc:

```{ $inc: { pageviews: 1 } }```

This approach has a number of advantages: first, the increment operation is atomic - multiple threads can safely increment a field concurrently using $inc. Furthermore, this approach is more efficient for disk operations, requires less data to be sent over the network and requires fewer round trips by omitting the need for any reads. Those are three big wins that result in a more simple, more efficient and more scalable system. The same advantages apply to the use of the $set operator.

The document-oriented design has several benefits for writing and reading. As previously stated, writes can be much faster as field-level updates because instead of writing a full document we’re sending a much smaller delta update that can be modeled like so:

db.metrics.update(
  { 
    timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    type: "memory_used"
  }, 
  {$set: {"values.59": 2000000 } }
)

With the document-oriented design reads are also much faster. If you needed an hour’s worth of measurements using the first approach you would need to read 3600 documents, whereas with this approach you would only need to read 60 documents. Reading fewer documents has the benefit of fewer disk seeks, and with any system fewer disk seeks usually results in significantly better performance.

A natural extension to this approach would be to have documents that span an entire hour, while still keeping the data resolution per second:

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: 999999,
    1: 1000000, 
    …,
    3598: 1500000,
    3599: 2000000
  }
}

One benefit to this approach is that we can now access an hour’s worth of data using a single read. However, there is one significant downside: to update the last second of any given hour MongoDB would have to walk the entire length of the “values” object, taking 3600 steps to reach the end. We can further refine the model a bit to make this operation more efficient:

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
  }
}
db.metrics.update(
  { 
    timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
    type: "memory_used"
  }, 
  {$set: {"values.59.59": 2000000 } }
)

MMS Implementation

In MMS users have flexibility to view monitoring data at varying levels of granularity. These controls appear at the top of the monitoring page:

These controls inform the schema design for MMS, and how the data needs to be displayed. In MMS, different resolutions have corresponding range requirements - for example, if you specify that you want to analyze monitoring data at the granularity of “1 hr” instead of “1 min” then the ranges also become less granular, changing from hours to days, weeks and months:

To satisfy this approach in a scalable manner and keep data retention easy to manage, MMS organizes monitoring data to be very efficient for reads by maintaining copies at varying degrees of granularity. The document model allows for efficient use of space, so the tradeoff is very reasonable, even for a system as large as MMS. As data ages out, collections that are associated with ranges of time are simply dropped, which is a very efficient operation. Collections are created to represent future ranges of time, and these will eventually be dropped as well. This cycle maintains a rolling window of history associated with the functionality provided by MMS.

In addition, to support the “avg/sec” display option the system also tracks the number of samples collected and the sum of all readings for each metric similar to the following example:

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: {
    0: 999999,
    …  
    37: 1000000,
    38: 1500000,
    … 
    59: 1800000
  }
}

The fields “num_samples” and “total_samples” are updated as new readings are applied to the document:

db.metrics.update(
  { 
    timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    type: "memory_used"
  }, 
  {
    $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 }
  }
)

Computing the average/sec is straightforward and requires no counting or processing, just a single read to retrieve the data and a simple application-level operation to compute the average. Note that with this model we assume a consistent cadence of measurements - one per second - that we can simply aggregate at the top of the document to report a rolled-up average for the whole minute. Other models are possible that would support inconsistent measurements and flexible averages over different time frames.
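As a minimal sketch of that application-level step (the document shape follows the example above):

var doc = db.metrics.findOne({
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: "memory_used"
});
var avgPerSec = doc.total_samples / doc.num_samples;  // rolled-up average for the whole minute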

Another optimization used in MMS is preallocating all documents for the upcoming time period; MMS never causes an existing document to grow or be moved on disk. A background task within the MMS application performs inserts of empty “shell” documents including the subdocument schema but with all zeroes for the upcoming time periods before they are recorded. With this approach fields are always incremented or set without ever growing the document in size, which eliminates the possibility of moving the document and the associated overhead. This is a major performance win and another example of ensuring in-place updates within the document-oriented design.
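A preallocation step might look roughly like this sketch (field names follow the examples above; the zero-filled values object is built up front so later $set and $inc operations never grow the document):

// build a values subdocument with one zeroed slot per second
var values = {};
for (var s = 0; s < 60; s++) { values[s] = 0; }

// insert the empty "shell" document for an upcoming minute
db.metrics.insert({
  timestamp_minute: ISODate("2013-10-10T23:07:00.000Z"),
  type: "memory_used",
  num_samples: 0,
  total_samples: 0,
  values: values
});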

Conclusion

MongoDB offers many advantages for storing and analyzing time series data, whether it’s stock ticks, tweets or MongoDB metrics. If you are using MongoDB for time series data analysis, we want to hear about your use case. Please continue the conversation by commenting on this post with your story.

More Information

This post was updated in December 2014 to include additional resources and updated links.

MongoDB intro and cheat sheet

These are my notes on the basics of MongoDB, the leading document-based NoSQL database. With this product JavaScript gets into the database as well, which is one of the reasons it became so popular in the modern web development stack. Last year MongoDB acquired WiredTiger, a data management company whose technology will power a new storage engine in the upcoming MongoDB 3 (although it will not be the default engine). This is particularly interesting as the performance benefits are expected to be quite impressive.

Forgive the formatting glitches and the overall unpolished state of these notes (always refer to the pretty good official documentation)!


The mongo shell (mongo) version has to match the MongoDB server (mongod) version to avoid issues. Useful commands right off the bat:

  • help
  • help keys

The shell has a variable called db that holds the current database:

> db
test

When the cursor is on a closing (or opening) parenthesis, bracket or brace, the shell highlights the matching opening (or closing) one in dark blue. If you leave an expression unbalanced, the shell prints "..." and waits for more input.

How to import data into MongoDB

One quick way if you have an existing file is:

mongoimport --type csv --headerline -d databaseName -c collectionName data.csv

For an existing MongoDB dump you can use the mongorestore CLI utility.

  • CRUD operations

db.people.insert(javascriptObject) inserts a document into the people collection of the current database (test in this case).

The "_id" field is the UID of the document; it's a primary key (IT IS IMMUTABLE) and gets automatically added if you don't specify "_id". The value is constructed with the ObjectId type, taking into account the current time, an id for the machine constructing the ObjectId, the process ID and a counter that's global to the process constructing the ObjectId. So it is intended to be a globally unique id, and the probability of collisions is close to 0.

The find method

(analogous to SELECT in SQL)

db.people.findOne()     selects a single (essentially arbitrary) document from the collection
db.people.findOne(whereCriteria, fieldsToReturn)     e.g.
    db.people.findOne({"name": "john"})
    db.people.findOne({"name": "john"}, {"name": true, "_id": false})

    same for db.people.find()

conditionals

GREATER THAN: $gt     db.scores.find( {score: {$gt : 95} } )

LESS THAN: $lt     db.scores.find( {score: {$lt : 95} } )

86 <= score < 95 (the order of the conditions doesn't matter and the comma acts like an AND clause; this only holds when the query uses $ operators, otherwise the BSON representation is different and order matters):     db.scores.find( {score: {$lt : 95, $gte: 86} } )

Conditionals can also be applied to strings (case-sensitive lexicographic comparison based on UTF-8 byte order, so it is only correct for POSIX/ASCII locales, at least in MongoDB 2.6; this might change in the future).

conditionals are strongly and dynamically typed: range comparisons don’t span data types (if you look for strings you don’t get back numbers).

$exists     db.people.find( {name: { $exists: true }} )

$type (specified as the BSON numeric encoding, e.g. 2 for strings)     db.people.find( {name: { $type: 2 }} )

Regular expressions (MongoDB uses the PCRE library), usually not very efficient:     db.people.find( {name: { $regex: "a" }} )

Combining multiple queries

$or     db.scores.find( {$or:[  {“score”: { $lt:50 }} , {“score”: { $gt:80 }}  ] })

$and     used infrequently and usually less performant

Watch out for parsing intricacies: in db.scores.find( { score : { $gt : 50 }, score : { $lt : 60 } } ) the second occurrence of score replaces the first, so this means "find all documents with score < 60".
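
To express the intended range (50 < score < 60), put both conditions in a single sub-document:

db.scores.find( { score : { $gt : 50, $lt : 60 } } )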

The find query also looks inside arrays automatically, but without recursion (only the top level of the array is inspected). E.g. if favorites is an array: db.whatever.find({favorites: "pretzels"})

$all     the field must contain all of the listed elements (possibly more)
$in      the field must match at least one of the listed elements
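
For example, reusing the favorites array from above ("beer" is just an illustrative value):

db.whatever.find( { favorites: { $all: ["pretzels", "beer"] } } )   // contains both values
db.whatever.find( { favorites: { $in:  ["pretzels", "beer"] } } )   // contains at least one of them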

Query nested documents

Use dot notation     Like the dollar sign $, the dot . is another special character in MongoDB

    Suppose a simple e-commerce product catalog called catalog with documents that look like this:

{
  product : "Super Duper-o-phonic",
  price : 100000000000,
  reviews : [
    { user : "fred", comment : "Great!", rating : 5 },
    { user : "tom", comment : "I agree with Fred, somewhat!", rating : 4 }
  ],
  …
}

Write a query that finds all products that cost more than 10,000 and that have a rating of 5 or better: db.catalog.find( {"price":{$gt: 10000}, "reviews.rating":{$gte:5}} )

Cursors

cur = db.people.find(); while (cur.hasNext())  printjson(cur.next());

cur.limit(5);     only return 5 documents

cur.sort( {name: -1} )     print in reverse order by name (ASCII lexicographic ordering)

cur.sort( {name: -1} ).limit(3).skip(2)     skip the first two and return only 3 results     The order should be Sort first, Skip second and Limit third

All these are server-side, processed by mongod

Counting

db.scores.count({"type":"exam"})     Returns the number of results

Updating documents

db.people.update( { name: "Smith" } , { name : "Thompson", salary: 50000})     finds the document with name "Smith" and replaces it with name "Thompson" and salary 50000. Be careful: this changes the whole document (if the selected document had other fields, they don't carry over to the updated version). Wholesale replacement. If you don't want to replace the entire document, you have to use update operators like the following (full list at http://docs.mongodb.org/manual/reference/operator/update/#id1):

$set     fine grain control on updates for fields

$inc     increment numeric values by a defined increment step (if the field doesn't exist it will get created)
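
A quick sketch of both operators, reusing the earlier people example (the values are made up):

db.people.update( { name : "Thompson" }, { $set : { salary : 60000 } } )   // change just one field
db.people.update( { name : "Thompson" }, { $inc : { salary : 1000 } } )    // increment it by 1000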

$unset     remove a field and its value     db.users.update( { "_id":"jimmy"  }, { $unset: {interests:1}  }  )

Manipulating arrays

$push     db.friends.update( { _id : "Mike" }, { $push : { interests : "skydiving" } } );

$pop     db.friends.update( { _id : "Mike" }, { $pop : { interests : -1 } } );

$pushAll     adds more elements to the array, can add duplicates     db.friends.update( { _id : "Mike" }, { $pushAll: { interests : [ "skydiving" , "skiing" ] } } );

$pull     removes the value from the array no matter the location

$pullAll     removes any occurrence of any of the specified values from the array

$addToSet     acts like a push but if the item already exists it does nothing     db.friends.update( { _id : "Mike" }, { $addToSet : { interests : "skydiving" } } );

Update and insert simultaneously

db.foo.update( { username : 'bar' }, { '$set' : { 'interests': [ 'cat' , 'dog' ] } } , { upsert : true } );

results in: { "_id" : ObjectId("507b78232e8dfde94c149949"), "interests" : [ "cat", "dog" ], "username" : "bar" }

The empty WHERE-like selector matches everything (both in update and find). e.g. { }

By default the update method only updates the first matching document. To update all documents you have to provide the option { multi: true } (at least in the shell and JavaScript driver).
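
For example, setting a hypothetical flag on every document in the collection:

db.people.update( { }, { $set : { reviewed : true } }, { multi : true } )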

In MongoDB there's a single thread for each operation being executed. However, write operations that affect more than one document can be interleaved with other operations: writes are not isolated transactions when they touch multiple documents. But atomicity is guaranteed for individual documents.

Delete documents

remove method

db.people.remove( { } )     removes all documents one by one. Doesn't delete the indexes if present. The multi option is not needed here.
db.people.remove( { score: {$lt:60}  }  )     removes only the matching documents

db.people.drop()     removes the whole collection at once (faster); indexes are dropped too

The Node.js MongoDB driver

The ObjectId has a different representation here than in the mongo shell.

db.collection('grades').findOne(query, callback);

Projections

Only the 'grade' field will be returned: db.collection('grades').find({}, {'grade':1, '_id':0}, callback);

Order of operations

The driver will reorder the methods as follows: Sort, Skip, Limit. Inside sort, the order in which you list the fields determines the sorting priority. That's why with the driver you should pass sort an array of [field, direction] pairs (bracket notation) rather than an object (brace notation), otherwise the ordering can be unpredictable.
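
A small sketch with the Node.js driver (the student field and the collection contents are hypothetical): sort by grade ascending, then by student descending:

db.collection('grades')
  .find({})
  .sort([ ['grade', 1], ['student', -1] ])
  .skip(2)
  .limit(3)
  .toArray(function (err, docs) {
    if (err) throw err;
    console.log(docs); // at most 3 documents, after skipping the first 2
  });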

Updates

In-place updates are done with the $set operator. You can’t mix $set with other “normal” fields as MongoDB doesn’t know if you meant to do an in-place update or a replacement. Replacement updates don’t reorder the fields as a result.

save

The save method on a collection takes the document as input, looks up the _id field and if there isn’t one, save will insert a new doc. If there is one, it will act as an upsert (with the query on the _id). It doesn’t reorder the fields as a result.

findAndModify

This method atomically updates and returns the (first) document. If instead you use update and then find, in between the two the document could have been modified by other clients. If you want the new (updated) document to be returned, pass var options = {'new' : true};

db.collection('test').findAndModify(query, sort, update, options, callback);

The following call to findAndModify will add the "dropped" field to the homework document with the lowest grade and call the given callback with the resulting document: db.collection('homeworks').findAndModify({}, [[ 'grade' , 1 ]], { '$set' : { 'dropped' : true } }, { 'new' : true }, callback);

remove

.remove(query, function(err, removed){} ); //where removed is the number of deleted docs

The following remove calls would definitely remove all the documents in the collection 'foo', regardless of its contents:

db.collection('foo').remove(callback);
db.collection('foo').remove({ 'x' : { '$nin' : [] } }, callback);
db.collection('foo').remove({}, callback);

Query examples

Find the state with the lowest temperature having a wind direction between 180 and 360 degrees:

db.data.find( { "Wind Direction": {$gte:180, $lte:360} }, {"State":1, "_id":0} ).sort({"Temperature":1}).limit(1)

  • Schema design

In the relational world, you design the db in a way that is application agnostic, using the third normal form. In MongoDB, the design is application-centric and schema choices depend on the use case and data access patterns.

Mongo doesn’t natively support joins, constraints and transactions (there is support for atomicity though).

Embed the data and pre-join when you can.

There are no foreign-key constraints so you have to check for consistency inside your app. You can mostly solve this by embedding documents.

The most common approaches for living without transactions are:

  1. restructure your app’s code so that you’re working within a single document, to take advantage of MongoDB’s atomic operations within the document.
  2. implement locking in your app’s code.
  3. just tolerate a little bit of inconsistency (common across popular web apps that don’t deal with public safety, banking, etc.)

Consider when deciding whether to embed or not:

  • Frequency of access
  • Size of items (and the risk of exceeding the 16MB document limit)

For one-to-many relationships where the "many" side is large (like city:person), it's usually best to use multiple collections: in this case two collections, people and city.

If it’s the “one to few” kind of relationship (like in blog post:comments), embedding documents should be best. Embedding is ok from the many to the one side of the relationship.

For a many-to-many relationship, it's usually a "few to few" kind of relation. So for a books:authors relation, the best approach is to use two collections and link them (for example, a books array inside each author document), as in the sketch below. You could embed the book information inside the authors collection, but this could lead to data duplication and update anomalies.
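
A minimal sketch of that linking approach in the mongo shell (collection names and values are made up):

// authors reference books by _id instead of embedding the full book documents
db.authors.insert({ _id : "tolkien", name : "J. R. R. Tolkien", books : [1, 2] })
db.books.insert({ _id : 1, title : "The Hobbit" })
db.books.insert({ _id : 2, title : "The Lord of the Rings" })

// all books by one author: read the author, then query books by the linked ids
var author = db.authors.findOne({ _id : "tolkien" });
db.books.find({ _id : { $in : author.books } })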

In MongoDB the multikey index is what makes embedding documents efficient.

Especially when data is stored on high-latency spinning disks, embedding a document improves read performance by exploiting locality.

How to represent trees in MongoDB

Given the following typical document for an e-commerce category hierarchy collection called categories:

{ _id: 34, name : "Snorkeling", parent_id: 12, ancestors: [12, 35, 90] }

This query will find all descendants of the snorkeling category: db.categories.find({ancestors:34})

Performance

You don't want to do a full table scan, or in MongoDB's case a collection scan, as it kills performance. That's where indexing comes in. Indexes in MongoDB are ordered lists of keys. Although this speeds up reads, it slows down writes because indexes need to be updated when inserting new documents, so you want to create indexes on what you're most likely to query. If for example you have a compound index on (a, b, c), you can query on a or on a,b but not on b alone or c alone. If you query on a,c the index is used only for a. Every collection has an index on _id by default.

To create an index: db.yourcollection.ensureIndex({field:1}); //1 means the order is ascending

To list the indexes of the database (MongoDB 2.x stores them in a system collection): db.system.indexes.find();

To see a list of current indexes: db.yourcollection.getIndexes();

To drop (delete) an index: db.yourcollection.dropIndex({'field':1});

To check whether a certain query is leveraging an index you can append .explain() to the query.

Multikey indexes

Indexes on arrays are multikey indexes; you can’t create combined indexes where more than one field is an array.

Suppose we have a collection foo that has an index created as follows: db.foo.ensureIndex({a:1, b:1})

db.foo.insert({a:["apples","oranges"], b:"grapes"})
db.foo.insert({a:"grapes", b:"oranges"})
db.foo.insert({a:"grapes", b:[8,9,10]})
db.foo.insert({a:[1,2,3], b:[5,6,7]})

Only the first three are valid inserts (because of the multikey index restriction). Note that multikey indexes can be larger than the original documents, thus taking a significant amount of space. As a consequence of this the overhead for updates can be greater as well.

Unique indexes

db.yourcollection.ensureIndex({field:1}, {unique:true})

This ensures that the index is unique, i.e. no two documents can have the same value in the indexed field. If you try to insert a duplicate you get a duplicate key error back.

Sparse indexes

Imagine your collection is made of the following documents:

{a:1, b:2}
{a:3, b:5, c:6}
{a:4, b:5}

The first and third documents are both missing the c key (which counts as null), so you can't create a unique index on c: the two nulls would collide. That's where sparse indexes come into play. You can create one with: db.yourcollection.ensureIndex({field:1}, {unique:true, sparse:true})

If you sort on that index, documents missing that field will not be returned. Because of this, since MongoDB 2.6 the sparse index is not used by default when sorting. You can force the query by appending .hint({field:1}) The hint command tells the database what index to use. To tell MongoDB not to use an index but the usual cursor, you can use .hint({$natural:1})

Index creation

By default index creation happens in the foreground: faster but it blocks writers. For production systems, you can run it in the background which is non-blocking but slower.
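
For example:

db.yourcollection.ensureIndex({ field : 1 }, { background : true })   // non-blocking index build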

To explore the query plan you can append .explain() to the query. For example, to see whether or not an index was used, look at the "cursor" field in the output: if it says BasicCursor then no index was used, while BtreeCursor means MongoDB leveraged an index to perform the query. MongoDB can also use the index for just part of the query plan, e.g. just for the sort and not for the find itself. The nscanned field reports the number of scanned index items, while nscannedObjects is the number of scanned documents.

Example output of .explain() where the query scanned 10,000,000 documents (nscannedObjects field), returning 100,000 (n field) in 5.2 seconds (a full scan was performed as no index was used):

"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 100000,
"nscannedObjects" : 10000000,
"nscanned" : 10000000,
"nscannedObjectsAllPlans" : 10000000,
"nscannedAllPlans" : 10000000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 7,
"nChunkSkips" : 0,
"millis" : 5151,
"indexBounds" : {

}

Suppose an index was created with: db.foo.ensureIndex({a:1, b:1, c:1})

Then the following queries will use the index:
db.foo.find({a:3})
db.foo.find({c:1}).sort({a:1, b:1}) //only on the sort part

But the following queries will not:
db.foo.find({b:3, c:4})
db.foo.find({c:1}).sort({a:-1, b:1})

The last one is because it's mixing ascending and descending order for two fields. Note that the query db.foo.find({c:1}).sort({a:-1}) would have worked on the sort part because MongoDB will just reverse the index automatically.

For next-level performance, indexes should be kept in memory.

Collection size and stats: db.yourcollection.stats()
Total index size: db.yourcollection.totalIndexSize()

Index selectivity is important for performance, the more selective an index is the better. In practical terms this means creating the index on the field that has the biggest value range.

Geospatial indexes

A prerequisite is to have a field that stores coordinates, e.g. 'location': [x,y]. To create a 2D geospatial index in MongoDB you can use: ensureIndex({'location':'2d', type:1}) (adding type:1 makes it a compound index, which is optional).

To query it you can use the following, with x being the longitude and y the latitude: db.yourcollection.find({location:{$near:[x,y]}}) The db will output results in increasing distance. You can add .limit(10) to return just the first 10 results.

MongoDB uses (part of) the GeoJSON format. GeoJSON is used inside your location field (the name is arbitrary). To create a spherical (3D) geospatial index you can use (type is optional; it assumes the points are on the Earth's surface): ensureIndex({'location':'2dsphere', type:1})

Example. With a sample document like this:

{ "_id" : { "$oid" : "535471aaf28b4d8ee1e1c86f" }, "store_id" : 8, "loc" : { "type" : "Point", "coordinates" : [ -37.47891236119904, 4.488667018711567 ] } }

The following query will find docs within 1,000,000 meters of the location at longitude=-130 and latitude=39:

db.stores.find({ loc:{ $near: { $geometry: { type: "Point", coordinates: [-130, 39]}, $maxDistance:1000000 } } })

Full text search

It is available since MongoDB 2.6. To enable it (on the words field in this case) you have to add an index using: db.yourcollection.ensureIndex({'words':'text'})

To perform a query: db.yourcollection.find({$text:{$search:'your keyword'}})

The search is case insensitive, applies a logical OR between the search terms and ignores some common stopwords (like "the"). It is possible to .sort() the results using the $meta operator on textScore; the results are ranked so the top result is usually what you're looking for.
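
A sketch of a ranked text query (projecting the score with $meta so it can be sorted on):

db.yourcollection.find(
  { $text : { $search : 'your keyword' } },
  { score : { $meta : 'textScore' } }
).sort({ score : { $meta : 'textScore' } })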

Logging and Profiling

MongoDB automatically logs slow queries (>100ms).

There are three levels for the profiler. Level 0 means the profiler is off, level 1 means log slow queries, level 2 means log all queries. For example, to set level 0 you can add the following while starting mongod: --profile 0. Add the following to specify the threshold in milliseconds (only queries slower than this are logged): --slowms 5

To see what profiling level is being used: db.getProfilingLevel()

With the slowms parameter as well: db.getProfilingStatus()

To set the profiling level to 1 and the slowms parameter to 4 (log queries that take more than 4ms to complete): db.setProfilingLevel(1,4)

To see the profiling: db.system.profile.find()

Example. This query looks in the system profile collection for all queries that took longer than one second, ordered by timestamp descending: db.system.profile.find({millis:{$gt:1000}}).sort({ts:-1})

Mongotop and Mongostat

The mongotop command is similar in spirit to the UNIX top utility: it shows where the server is spending its time, reporting read/write activity per namespace.

Mongostat is similar to iostat. You can look at the "idx miss" column to see whether or not your indexes are in memory (what percent of index accesses miss memory).

Sharding

It means splitting up a collection among multiple servers (horizontal scaling). In MongoDB this is achieved through a router called mongos that coordinates between several mongod instances or replica sets.

mongos is usually hosted on the same machine as your application, and you typically run several mongos instances. Your app then communicates with mongos using a shard key (one or more fields of your doc). An insert must include the whole shard key. For the update, remove and find commands, if mongos doesn't receive a shard key it will broadcast the operation to all the shards; if you specify it you'll get better performance, as only one of the servers/replica sets will be used.
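
As a rough sketch of how a collection gets sharded from the shell (the shop database, orders collection and customer_id key are made-up names; commands are run through mongos):

use shop
sh.enableSharding("shop")                                // allow sharding for the database
db.orders.ensureIndex({ customer_id : 1 })               // there must be an index starting with the shard key
sh.shardCollection("shop.orders", { customer_id : 1 })   // shard the collection on customer_id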

  • The aggregation framework (GROUP BY queries)

An aggregation query that will find the number of products by category of a collection that has the form:

{ "_id" : ObjectId("50b1aa983b3d0043b51b2c52"), "name" : "Nexus 7", "category" : "Tablets", "manufacturer" : "Google", "price" : 199 }

Query:

db.products.aggregate([{$group:{"_id":"$category", "num_products":{"$sum":1}}}])

The aggregation framework has a pipeline of operations, much like UNIX pipelines. Match, Group, Skip, Limit, Sort, Project and Unwind are all part of it. You can stack pipeline stages, for example in what's called a double $group stage. Aggregation can limit performance in a sharded environment, so for large datasets you might want to take the data out of MongoDB and perform the calculations in Hadoop.

The group operation is similar to an upsert, as it creates new documents if they don't exist or updates existing ones (the num_products count, for example), running through the input collection one doc at a time.

You can create a compound _id key with as many custom fields as you want. Having a document inside the _id key is just fine, as long as it's unique in the collection.
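
For example, a small sketch of a compound _id grouping (state plus city) on the zips collection used in the examples below:

db.zips.aggregate([
  { $group : { _id : { state : "$state", city : "$city" }, pop : { $sum : "$pop" } } }
])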

$sum operator

Suppose we have a collection of populations by postal code. The postal codes are in the _id field, and are therefore unique. Documents look like this:

{ "city" : "CLANTON", "loc" : [ -86.642472, 32.835532 ], "pop" : 13990, "state" : "AL", "_id" : "35045" }

The following is an aggregation query to sum up the population (pop) by state and put the result in a field called population: db.zips.aggregate([{"$group":{"_id":"$state", "population":{$sum:"$pop"}}}])

$avg operator

As before but averaging the population for every state: db.zips.aggregate([{"$group":{"_id":"$state", "population":{$avg:"$pop"}}}])

$addToSet operator

db.zips.aggregate([{"$group":{"_id":"$city", "postal_codes":{"$addToSet":"$_id"}}}])

Returns the postal codes that cover each city; "postal_codes" will be an array. "_id" and "$_id" are different: $_id here is used to go through the _id values of the source documents (the _id field contains the zip code). Those docs have the form: { "city" : "PALO ALTO", "loc" : [ -122.149685, 37.444324 ], "pop" : 15965, "state" : "CA", "_id" : "94301" }

$push operator

This is like addToSet but it doesn’t guarantee that the item will be added only once. It gets pushed anyway.

$max and $min operators

db.zips.aggregate([{"$group":{"_id":"$state", "pop":{$max:"$pop"}}}])

Double $group stages

db.fun.aggregate([{$group:{_id:{a:"$a", b:"$b"}, c:{$max:"$c"}}}, {$group:{_id:"$_id.a", c:{$min:"$c"}}}])

The second group stage is applied to the output of the first group stage, specifically grouping on the a field of _id (_id:"$_id.a").

$project operator

This is essentially a filtering stage to preprocess your data and/or save time and memory for large queries, but it's mostly used to reshape the query results (changing a field name to lower case, etc.): db.zips.aggregate([{$project:{_id:0, city:{$toLower:"$city"}, pop:1, state:1, zip:"$_id"}}])

$match operator

A single match phase that filters for zipcodes with greater than 100,000 people: db.zips.aggregate([{$match:{pop:{$gt:100000}}}])

$sort

By default $sort uses a memory-based sorting method, but it has a limit of 100MB for any given pipeline stage; for larger sets you have to enable the disk-based sort. You can sort before or after the grouping stage.
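
In MongoDB 2.6+ the disk-based sort is enabled per query with the allowDiskUse option; a small sketch on the zips collection:

db.zips.aggregate(
  [ { $group : { _id : "$state", population : { $sum : "$pop" } } },
    { $sort : { population : -1 } } ],
  { allowDiskUse : true }                     // let large stages spill to disk
)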

$skip and $limit

Unlike the find queries, in the aggregation framework order matters. So here you usually skip first and limit second.

$unwind

When you have "prejoined" data, like in an array, that you want to unroll in order to perform operations (like grouping), you can use $unwind. The $push operator will reverse the effects of $unwind. You can also do a double unwind if there is more than one array (the reverse is two $push operations).
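
A small sketch with a hypothetical posts collection whose documents carry a tags array, counting how many posts use each tag:

db.posts.aggregate([
  { $unwind : "$tags" },                               // one output document per tag
  { $group : { _id : "$tags", count : { $sum : 1 } } } // then group the unrolled docs
])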

Application engineering

Fault-tolerance and availability in MongoDB is achieved through a replica set of 3 or more nodes. When the primary node goes down, a new primary gets elected among the remaining nodes. When the old primary comes back online, it will now act as a secondary node. Types of replica set nodes:

  • Regular
  • Arbiter. This node gets to vote in a leader election and has no data on it. You can use one node as an arbiter in a 3-nodes replica set if you want to use the two more powerful servers as regular working nodes.
  • Delayed. It stays a configured amount of time behind the primary and cannot become the primary node (priority set to 0).
  • Hidden. It is not visible to client applications and cannot become the primary node (priority set to 0).

Write consistency

There's only a single primary at any given time, and your writes always go to the primary. In the default configuration reads also go to the primary, which means you have strong consistency of reads with respect to writes (you can't read stale data). You can allow reads to go to secondary nodes, but then you may read stale data (the lag between two nodes is not guaranteed, as the replication is asynchronous). When a failure occurs, you can't write until the new leader (the new primary) is elected. So MongoDB does not offer eventual consistency (popular in other NoSQL databases) in its default configuration. The following command will allow you to read from a secondary, when issued from the mongo shell: rs.slaveOk()

The oplog

MongoDB's oplog is not a file but a capped collection (a fixed-size collection that loops when it fills up), located in the 'local' database and named 'oplog.rs'. It is used to support replication: it contains a trail of the operations applied to the replica set, so that the secondaries can query the primary's oplog to sync changes.

Sharding

This is the way MongoDB is scaled horizontally. Shards (individual servers or, most often, replica sets) split a collection's data between them. Queries are distributed among the shards via the mongos router. At least one config server (usually 3 in production) is needed as well. Sharding here is done using a range-based approach with a shard key. There has to be an index that starts with the shard key (not a multi-key or unique index). The shard key is immutable, and Mongo cannot enforce unique indexes on a sharded collection other than the shard key itself. Any update that does not contain the shard key will be sent to all shards. Tips on how to choose a shard key: it should have sufficient cardinality (lots of values), and it shouldn't be something that increases monotonically.