MongoDB Schema Design and Common Practices

Installation

Exhaustive documentation: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

Mongo executables are installed into /usr/bin/, and database files are stored under /data/db/ by default.

Log file location: /var/log/mongodb/mongodb.log

Terminology

Administration

Run sudo service mongodb start/stop/restart, or simply run mongod to start the MongoDB process.

Run mongo to enter the MongoDB console.

> show dbs  #Show all databases
> db             #Show current db
> db.help()   #View commands related with db

# "Create" a new database named foo_db
> use foo_db  # Mongo creates this DB lazily: it only really exists once something is saved into one of its collections.
# Create a new document into a collection (implicitly)
> db.user_profiles.save({ first_name: "Wayne", last_name: "Ye" })
# Explicitly create a new collection
> db.createCollection("user_profile")
# Query the collection
> db.user_profiles.find()
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "first_name" : "Wayne", "last_name" : "Ye" }
> db.user_profiles.save({ first_name: "Wendy", last_name: "Shen", gender: "Female" })
> db.user_profiles.find()
# Update
> db.collection.update( { field: value1 }, { $set: { field1: value2 } } ); 
# View status
> db.stats()
> db.mycol.stats()
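
The update form above uses $set, which changes only the named fields. In the shell of this era, passing a plain document as the second argument instead replaces the whole document apart from _id. The following is a minimal in-memory sketch of that difference in plain JavaScript; applySet and applyReplacement are hypothetical helpers (not MongoDB APIs) and only handle top-level fields.

```javascript
// In-memory illustration only; these helpers are hypothetical, not MongoDB APIs.

// Mimics update(query, { $set: fields }): only the named fields change.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

// Mimics update(query, replacement): everything except _id is replaced.
function applyReplacement(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

const profile = { _id: 1, first_name: "Wendy", last_name: "Shen", gender: "Female" };

const afterSet = applySet(profile, { last_name: "Ye" });
const afterReplace = applyReplacement(profile, { last_name: "Ye" });

console.log(afterSet);     // first_name and gender are still present
console.log(afterReplace); // only _id and last_name remain
```

This is why forgetting $set in an update is a common way to lose fields by accident.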

# Query subdocument using Dot Notation
> db.demo.insert({ "Items": [ { "Name": "Milk Powder", "Price": 9.9 }, { "Name": "Toy Car", "Price": 26 } ] })
> db.demo.find({ "Items.Price": { $gt: 20 } })
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "Items" : [  {  "Name" : "Milk Powder",  "Price" : 9.9 },  {  "Name" : "Toy Car",  "Price" : 26 } ] }
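
Note that the query above returns the whole document, including the "Milk Powder" item that costs less than 20: a document matches as soon as any element of the array satisfies the condition. Below is a small in-memory sketch of that matching rule in plain JavaScript. hasItemPricedAbove is a hypothetical helper, and the second document is invented for contrast.

```javascript
// Hypothetical helper: mimics find({ "Items.Price": { $gt: min } }) in memory.
function hasItemPricedAbove(doc, min) {
  return Array.isArray(doc.Items) && doc.Items.some(item => item.Price > min);
}

const demo = [
  { _id: 1, Items: [{ Name: "Milk Powder", Price: 9.9 }, { Name: "Toy Car", Price: 26 }] },
  { _id: 2, Items: [{ Name: "Pencil", Price: 1.5 }] } // invented document with no match
];

const matches = demo.filter(doc => hasItemPricedAbove(doc, 20));
console.log(matches.map(doc => doc._id)); // [ 1 ]
console.log(matches[0].Items.length);     // 2: the whole array comes back
```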

Batch administration from JavaScript
http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
http://docs.mongodb.org/manual/core/server-side-javascript/#running-js-scripts-in-mongo-on-mongod-host
http://docs.mongodb.org/manual/tutorial/store-javascript-function-on-server/
http://docs.mongodb.org/manual/reference/method/ (Mongo shell JS references)

mongo localhost:27017/mydb db_schema.js

load("scripts/myjstest.js") OR load("/data/db/scripts/myjstest.js")
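
As an illustration, a script such as db_schema.js might look roughly like the sketch below. The collection names and the index are assumptions made for this example; the shell methods used (db.getCollectionNames(), db.createCollection(), db.getCollection() and ensureIndex()) are the ones from the shell reference linked above.

```javascript
// db_schema.js (hypothetical contents): create collections and an index if missing.
function setupSchema(db) {
  var wanted = ["user_profiles", "blogs"]; // assumed collection names
  var existing = db.getCollectionNames();
  var created = [];
  wanted.forEach(function (name) {
    if (existing.indexOf(name) === -1) {
      db.createCollection(name);
      created.push(name);
    }
  });
  // Assumed index: look up profiles by last name.
  db.getCollection("user_profiles").ensureIndex({ last_name: 1 });
  return created;
}

// The mongo shell defines a global `db`; anywhere else this guard skips the call.
if (typeof db !== "undefined") {
  setupSchema(db);
}
```

Wrapping the work in a function that takes db as a parameter keeps the script easy to try against a stub object before running it on a real database.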

Schema Design

Embedding (de-normalize data)

Store two related pieces of data in a single document.

When:

There is a "contains" relationship between entities.
There is a "one-to-many" relationship, and the "many" objects always appear inline with the "one".

Example 1: Blog with comments

Denormalized blog with comments

db.blogs.insert({
  _id: 1,
  title: "Investigation on MongoDB",
  content: "some investigation contents",
  post_date: new Date(),  // stored as a date; Date.now() would store a plain number
  permalink: "http://foo.bar/investigation_on_mongodb",
  comments: [
    { content: "Gorgeous post!!!", nickname: "Scott", email: "foo@bar.org", timestamp: "1377742184305" },
    { content: "Splendid article!!!", nickname: "Guthrie", email: "foo@bar.org", timestamp: "1377742184305" }
  ]
});
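
With this layout, a single lookup such as db.blogs.findOne({ _id: 1 }) returns the post together with its comments, and a new comment can be appended with the $push update operator. The sketch below imitates both operations in memory with plain JavaScript. findPost and pushComment are hypothetical helpers, and the lookup counter merely stands in for round trips to the server.

```javascript
// In-memory stand-in for the blogs collection; helper names are hypothetical.
const blogs = new Map();
blogs.set(1, {
  _id: 1,
  title: "Investigation on MongoDB",
  comments: [{ content: "Gorgeous post!!!", nickname: "Scott" }]
});

let lookups = 0;
function findPost(id) {
  lookups += 1; // one lookup stands in for one round trip
  return blogs.get(id);
}

// Roughly what update({ _id: id }, { $push: { comments: comment } }) does.
function pushComment(id, comment) {
  blogs.get(id).comments.push(comment);
}

pushComment(1, { content: "Splendid article!!!", nickname: "Guthrie" });
const post = findPost(1);
console.log(post.comments.length, lookups); // 2 1
```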

Example 2: Dishes and Chefs

Normalized Dishes and Chefs

db.dishes.insert({
  _id: 1,
  name: "Kong Bao Ji Ding",
  price: 5.5,
  rate: 4.5,
  chefs: [ "Flora Zhang", "Cristina Wang" ]
});

db.chefs.insert({
  _id: 1,
  name: "Flora Zhang",
  age: 32,
  avatar: "http://www.gravatar.com/avatar.php?gravatar_id=dc654756c7c",
  dishes: [ "Kong Bao Ji Ding", "Knight Zhang Beef", "Ma Po Tou Fu" ]
});

Benefits:
Better performance for read operations
Request and retrieve related data in a single database operation.

Referencing (normalize data)

Store references between two documents to indicate a relationship between the data represented in each document.

When:

Embedding would duplicate data without providing enough read-performance benefit to outweigh the cost of the duplication.
You need to represent more complex many-to-many relationships.
You need to model large hierarchical data sets.

Benefits:

Separation of Concerns
Data model independent of logic

Drawback:
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server.

Example 3: Books and publishers

Books and publishers

db.books.insert({
  _id: 1,
  name: "MongoDB Applied Design Patterns",
  price: 35,
  rate: 5,
  author: "Rick Copeland",
  ISBN: "1449340040",
  publisher_id: 1,
  reviews: [
    { isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
    { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
  ]
  }
);
  
db.publishers.insert({
  _id: 1,
  name: "Packtpub INC",
  address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
  telephone: "+44 0121 265 6484",
  }
);
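
Resolving the publisher of a book therefore takes two steps: fetch the book, then fetch the publisher by the stored publisher_id. Below is a minimal in-memory sketch of that follow-up lookup in plain JavaScript. findOne here is a hypothetical local helper that only imitates equality queries, and the counter stands in for round trips to the server.

```javascript
// In-memory stand-ins for the two collections; helper names are hypothetical.
const books = [{ _id: 1, name: "MongoDB Applied Design Patterns", publisher_id: 1 }];
const publishers = [{ _id: 1, name: "Packtpub INC" }];

let roundTrips = 0;
function findOne(collection, query) {
  roundTrips += 1; // each call stands in for one request to the server
  const found = collection.find(doc =>
    Object.keys(query).every(key => doc[key] === query[key]));
  return found || null;
}

// Step 1: fetch the book. Step 2: follow its reference to the publisher.
const book = findOne(books, { name: "MongoDB Applied Design Patterns" });
const publisher = findOne(publishers, { _id: book.publisher_id });

console.log(publisher.name, roundTrips); // Packtpub INC 2
```

Compare this with the embedded reviews in the same book document, which arrive with the first lookup at no extra cost.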

Advanced Features

Multikey-Indexes

MongoDB can index a key inside subdocuments stored in an array; this is called a multikey index. Using the "Books and publishers" example above, the reviewer key of each review can be indexed like this:

db.books.ensureIndex({ "reviews.reviewer": 1 })  // ensureIndex() is deprecated in favor of createIndex() since MongoDB 3.0
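
Conceptually, a multikey index holds one entry per array element, so a book with two reviews is reachable through two reviewer values. The sketch below models that idea in plain JavaScript with a Map. It is a conceptual illustration only, not how MongoDB stores its indexes; buildMultikeyIndex is a hypothetical helper and the second book is invented.

```javascript
// Conceptual model only: one index entry per array element.
function buildMultikeyIndex(docs, arrayField, key) {
  const index = new Map(); // key value -> Set of document _ids
  for (const doc of docs) {
    for (const element of doc[arrayField] || []) {
      const value = element[key];
      if (!index.has(value)) index.set(value, new Set());
      index.get(value).add(doc._id);
    }
  }
  return index;
}

const books = [
  { _id: 1, reviews: [{ reviewer: "Dick" }, { reviewer: "Xiaoshen" }] },
  { _id: 2, reviews: [{ reviewer: "Xiaoshen" }] } // invented second book
];

const byReviewer = buildMultikeyIndex(books, "reviews", "reviewer");
console.log([...byReviewer.get("Xiaoshen")]); // [ 1, 2 ]
console.log([...byReviewer.get("Dick")]);     // [ 1 ]
```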

Aggregation framework

A MongoDB aggregation is a series of operators (stages) applied to a collection. Each stage is a JavaScript object with a single property: the operator name, whose value is that operator's options. The core of the aggregation framework is the aggregation pipeline, which is modeled on the concept of data processing pipelines: documents flow through the stages in order, and each stage transforms them.

Aggregation was introduced in MongoDB 2.2. The table below compares SQL concepts with the corresponding aggregation operators:

SQL Terms, Functions, and Concepts	MongoDB Aggregation Operators
WHERE	$match
GROUP BY	$group
HAVING	$match
SELECT	$project
ORDER BY	$sort
LIMIT	$limit
SUM()	$sum
COUNT()	$sum
join	No direct corresponding operator in the 2.x releases covered here (MongoDB 3.2 and later add $lookup); however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.
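
To make the mapping concrete, here is a sketch of how one SQL-style question could be written as a pipeline over the "Books and publishers" data. The SQL in the comment is only an approximate equivalent, since reviews are embedded in books rather than held in a table of their own.

```javascript
// Roughly: SELECT reviewer, COUNT(*) AS n FROM reviews
//          WHERE isUseful GROUP BY reviewer ORDER BY n DESC LIMIT 5
const pipeline = [
  { $unwind: "$reviews" },                                   // one document per review
  { $match: { "reviews.isUseful": true } },                  // WHERE
  { $group: { _id: "$reviews.reviewer", n: { $sum: 1 } } },  // GROUP BY + COUNT(*)
  { $sort: { n: -1 } },                                      // ORDER BY n DESC
  { $limit: 5 }                                              // LIMIT 5
];

// Each stage is an object with exactly one key: the operator name.
const stageNames = pipeline.map(stage => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

In the shell, such an array could be passed to db.books.aggregate(pipeline).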

For example, still using the above "Books and publishers" example, imagine I want to query "a specific reviewer with the book(s) he/she reviewed". I can do this:

> db.books.aggregate({ $unwind: "$reviews" }, { $match: { "reviews.reviewer": "Xiaoshen"} })
{
"result" : [
  {
   "_id" : 1,
   "name" : "MongoDB Applied Design Patterns",
   "price" : 35,
   "rate" : 5,
   "author" : "Rick Copeland",
   "ISBN" : "1449340040",
   "publisher_id" : 1,
   "reviews" : {
    "isUseful" : true,
    "content" : "Cool book!",
    "reviewer" : "Xiaoshen",
    "timestamp" : "1377742184305"
   }
  }
],
"ok" : 1
}
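
To see why the result holds a single review object instead of the original array, it may help to imitate the two stages in memory. The sketch below is plain JavaScript with hypothetical unwind and matchReviewer helpers (not MongoDB APIs): $unwind emits one copy of the book per review, and $match then keeps only the copies whose review is by the requested reviewer.

```javascript
// Hypothetical in-memory versions of the two stages, for illustration only.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const element of doc[field] || []) {
      // Copy the document, replacing the array with a single element.
      out.push(Object.assign({}, doc, { [field]: element }));
    }
  }
  return out;
}

function matchReviewer(docs, reviewer) {
  return docs.filter(doc => doc.reviews.reviewer === reviewer);
}

const books = [{
  _id: 1,
  name: "MongoDB Applied Design Patterns",
  reviews: [
    { content: "Cool book!", reviewer: "Dick" },
    { content: "Cool book!", reviewer: "Xiaoshen" }
  ]
}];

const unwound = unwind(books, "reviews");
const result = matchReviewer(unwound, "Xiaoshen");
console.log(unwound.length, result.length, result[0].reviews.reviewer); // 2 1 Xiaoshen
```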

Aggregation introduction: http://docs.mongodb.org/manual/applications/aggregation/

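One caveat: map-reduce and other server-side JavaScript (such as $where) run on a JavaScript VM, which is V8 as of MongoDB 2.4. Although V8 is fast, it cannot compete with a natively compiled and optimized C/C++ implementation. The aggregation pipeline itself is implemented natively in C++, so prefer it over map-reduce where it fits. Refer to: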
http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries

Common Practices

Denormalize data when it is frequently read together (one-to-one, one-to-many).
Normalize data when both entities are frequently queried separately, or when embedding would duplicate too much data.
Reduce collection size by using short field names as a convention. Field names are stored in every document, so this saves disk space and memory over time.
Avoid using DBRef! In most cases a manual reference (storing the referenced document's _id) is enough.
Always test queries with .explain() to check that you’re hitting the right index.
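
The short-field-name practice above can be estimated with a quick calculation. The sketch below uses JSON length as a rough proxy; real documents are stored as BSON, but the difference in field-name length carries over because every document stores its own field names. The short names fn and ln and the collection size are assumptions made for this example.

```javascript
// Rough proxy: JSON length, not BSON size. Short field names are hypothetical.
const longNames = { first_name: "Wayne", last_name: "Ye" };
const shortNames = { fn: "Wayne", ln: "Ye" };

const perDocSaving =
  JSON.stringify(longNames).length - JSON.stringify(shortNames).length;
const documents = 1000000; // assumed collection size

console.log(perDocSaving, perDocSaving * documents); // 15 15000000
```

Whether roughly 15 MB per million documents is worth the loss in readability is a judgment call for each application.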