MongoDB Schema Design and Common Practices

Installation

Exhaustive documentation: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

Mongo executables are installed into /usr/bin/, and database files are stored under /data/db/

Log file location: /var/log/mongodb/mongodb.log

Terminology


Administration

Run sudo service mongodb start/stop/restart, or simply run mongod to start the MongoDB process.
Run mongo to enter the MongoDB console.
> show dbs  #Show all databases
> db             #Show current db
> db.help()   #View commands related to db

# "Create" a new database named foo_db
# Mongo creates this DB only virtually; it is really created once we save something into one of its collections.
> use foo_db
# Create a new document in a collection (the collection is created implicitly)
> db.user_profiles.save({ first_name: "Wayne", last_name: "Ye" })
# Explicitly create a new collection
> db.createCollection("user_profile")
# Query the collection
> db.user_profiles.find()
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "first_name" : "Wayne", "last_name" : "Ye" }
> db.user_profiles.save({ first_name: "Wendy", last_name: "Shen", gender: "Female" })
> db.user_profiles.find()
# Update
> db.collection.update( { field: value1 }, { $set: { field1: value2 } } );
# View database and collection statistics
> db.stats()
> db.mycol.stats()
# Query subdocument using Dot Notation
> db.demo.insert({ "Items": [ { "Name": "Milk Powder", "Price": 9.9 }, { "Name": "Toy Car", "Price": 26 } ] })
> db.demo.find({ "Items.Price": { $gt: 20 } })
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "Items" : [  {  "Name" : "Milk Powder",  "Price" : 9.9 },  {  "Name" : "Toy Car",  "Price" : 26 } ] }
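
Note that the query above returns the whole document, including the "Milk Powder" item that costs less than 20. A dotted path into an array matches when at least one element satisfies the condition. The following is a minimal plain-JavaScript sketch of that matching rule, not MongoDB code: the in-memory demo array, its second document, and the helper findByItemPriceGt are all hypothetical and exist only for illustration.

```javascript
// Hypothetical in-memory stand-in for the demo collection.
// The second document is invented for contrast.
var demo = [
  { Items: [ { Name: "Milk Powder", Price: 9.9 }, { Name: "Toy Car", Price: 26 } ] },
  { Items: [ { Name: "Pencil", Price: 1.5 } ] }
];

// Emulates find({ "Items.Price": { $gt: min } }):
// keep a document when at least one array element satisfies the condition.
function findByItemPriceGt(docs, min) {
  return docs.filter(function (doc) {
    return doc.Items.some(function (item) {
      return item.Price > min;
    });
  });
}

var matched = findByItemPriceGt(demo, 20);
console.log(matched.length);          // number of matching documents
console.log(matched[0].Items.length); // the whole Items array comes back, not only the matching element
```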
 
Batch administration from JavaScript
http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
http://docs.mongodb.org/manual/core/server-side-javascript/#running-js-scripts-in-mongo-on-mongod-host
http://docs.mongodb.org/manual/tutorial/store-javascript-function-on-server/ 
http://docs.mongodb.org/manual/reference/method/  (Mongo shell JS references)

mongo localhost:27017/mydb db_schema.js

Or

load("scripts/myjstest.js") OR load("/data/db/scripts/myjstest.js")
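
As a rough idea of what such a script file might contain, here is a minimal sketch. The file contents and the function name seedUserProfiles are assumptions made for illustration; the only things taken from the shell itself are the global db handle, getCollection(), insert(), count() and print(). Wrapping the work in a function that receives db keeps the script easy to try out against a stub outside the shell.

```javascript
// Hypothetical contents of db_schema.js: seeds a collection and reports its size.
// `db` is the global database handle that the mongo shell provides.
function seedUserProfiles(db) {
  var profiles = db.getCollection("user_profiles");
  profiles.insert({ first_name: "Wayne", last_name: "Ye" });
  profiles.insert({ first_name: "Wendy", last_name: "Shen", gender: "Female" });
  return profiles.count();
}

// Run automatically only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  print("user_profiles now holds " + seedUserProfiles(db) + " documents");
}
```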

Schema Design

Embedding (de-normalize data)

Store two related pieces of data in a single document.

When:

  • There is a "contains" relationship between entities.
  • There is a "one-to-many" relationship, and the "many" objects always appear inline with the "one".
Example 1: Blog with comments

Denormalized blog with comments

db.blogs.insert({
 _id: 1,
 title: "Investigation on MongoDB",
 content: "some investigation contents",
 post_date: Date.now(),
 permalink: "http://foo.bar/investigation_on_mongodb",
 comments: [
   { content: "Gorgeous post!!!", nickname: "Scott", email: "foo@bar.org", timestamp: "1377742184305" },
   { content: "Splendid article!!!", nickname: "Guthrie", email: "foo@bar.org", timestamp: "1377742184305" }
 ]
});
Example 2: Dishes and Chefs

Normalized Dishes and Chefs

db.dishes.insert({
 _id: 1,
 name: "Kong Bao Ji Ding",
 price: 5.5,
 rate: 4.5,
 chefs: [ "Flora Zhang", "Cristina Wang" ]
});

db.chefs.insert({
 _id: 1,
 name: "Flora Zhang",
 age: 32,
 avatar: "http://www.gravatar.com/avatar.php?gravatar_id=dc654756c7c",
 dishes: [ "Kong Bao Ji Ding", "Knight Zhang Beef", "Ma Po Tou Fu" ]
});

 

Benefits:

  • Better performance for read operations
  • Request and retrieve related data in a single database operation

Referencing (normalize data)

Store references between two documents to indicate a relationship between the data represented in each document.

When:

  • Embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • You need to represent more complex many-to-many relationships.
  • You need to model large hierarchical data sets.

Benefits:

  • Separation of Concerns
  • Data model independent of logic

Drawback:
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server. 

Example 3: Books and publishers

Books and publishers

db.books.insert({
 _id: 1,
 name: "MongoDB Applied Design Patterns",
 price: 35,
 rate: 5,
 author: "Rick Copeland",
 ISBN: "1449340040",
 publisher_id: 1,
 reviews: [
   { isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
   { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
 ]
 }
);
  
db.publishers.insert({
 _id: 1,
 name: "Packtpub INC",
 address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
 telephone: "+44 0121 265 6484",
 }
);
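
To make the round-trip cost of referencing concrete, the sketch below resolves publisher_id with a second lookup. It is plain JavaScript over in-memory arrays, not MongoDB code: the trimmed documents, the helper findOneIn and the roundTrips counter are hypothetical and stand in for queries sent to the server. The embedded reviews, by contrast, would arrive together with the book in the first lookup.

```javascript
// Trimmed in-memory copies of the two collections above.
var books = [
  { _id: 1, name: "MongoDB Applied Design Patterns", publisher_id: 1 }
];
var publishers = [
  { _id: 1, name: "Packtpub INC" }
];

// Counts how many lookups (stand-ins for server round trips) were needed.
var roundTrips = 0;

// Hypothetical helper: returns the first document whose fields equal the query's fields.
function findOneIn(collection, query) {
  roundTrips += 1;
  var hit = collection.filter(function (doc) {
    return Object.keys(query).every(function (key) {
      return doc[key] === query[key];
    });
  })[0];
  return hit || null;
}

// First lookup: the book. Second lookup: follow the reference to the publisher.
var book = findOneIn(books, { name: "MongoDB Applied Design Patterns" });
var publisher = findOneIn(publishers, { _id: book.publisher_id });

console.log(book.name + " is published by " + publisher.name + " (" + roundTrips + " lookups)");
```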

Advanced Features

Multikey Indexes

Mongo supports indexing keys inside subdocuments, including subdocuments stored in an array. Taking the "Books and publishers" collections above, we can ask Mongo to index the reviewer key like this:

db.books.ensureIndex({ "reviews.reviewer": 1 })
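
Conceptually, an index on a key inside an array holds one entry per array element, each pointing back to the document that owns it. The sketch below models that idea in plain JavaScript. It is only an illustration under simplifying assumptions: a real index is a B-tree maintained by the server, and the second book as well as the helper buildReviewerIndex are hypothetical.

```javascript
// Trimmed copies of book documents; book 2 is hypothetical.
var books = [
  { _id: 1, reviews: [ { reviewer: "Dick" }, { reviewer: "Xiaoshen" } ] },
  { _id: 2, reviews: [ { reviewer: "Xiaoshen" } ] }
];

// Conceptual model of an index on "reviews.reviewer":
// one entry per distinct array value, each pointing back to the owning documents.
function buildReviewerIndex(docs) {
  var index = {};
  docs.forEach(function (doc) {
    doc.reviews.forEach(function (review) {
      var ids = index[review.reviewer] || (index[review.reviewer] = []);
      // A document is listed once per key, even if the value repeats in its array.
      if (ids.indexOf(doc._id) === -1) {
        ids.push(doc._id);
      }
    });
  });
  return index;
}

var reviewerIndex = buildReviewerIndex(books);
console.log(reviewerIndex["Xiaoshen"]); // ids of the books reviewed by Xiaoshen
```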

Aggregation framework

A MongoDB aggregation is a series of special operators applied to a collection. Each operator is a JavaScript object with a single property: the operator name, whose value is an options object. The core of the aggregation framework is the aggregation pipeline, which is modeled on the concept of data processing pipelines: documents enter the pipeline and each stage transforms them before passing them on to the next.

Aggregation was introduced in Mongo version 2.2. The table below compares SQL terms with the corresponding MongoDB aggregation operators:

SQL Terms, Functions, and Concepts    MongoDB Aggregation Operators
WHERE                                 $match
GROUP BY                              $group
HAVING                                $match
SELECT                                $project
ORDER BY                              $sort
LIMIT                                 $limit
SUM()                                 $sum
COUNT()                               $sum
JOIN                                  No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.
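
The table maps both SUM() and COUNT() to $sum. The difference lies only in what gets summed: a field value for SUM(), the constant 1 for COUNT(). Here is a minimal plain-JavaScript sketch of that grouping logic; the orders data and the helper groupSum are hypothetical and are not part of MongoDB.

```javascript
// Hypothetical data, not taken from the collections in this article.
var orders = [
  { customer: "A", amount: 10 },
  { customer: "B", amount: 5 },
  { customer: "A", amount: 7 }
];

// Emulates { $group: { _id: "$<keyField>", total: { $sum: <value> } } }:
// accumulates valueOf(doc) per distinct value of keyField.
function groupSum(docs, keyField, valueOf) {
  var totals = {};
  docs.forEach(function (doc) {
    var key = doc[keyField];
    totals[key] = (totals[key] || 0) + valueOf(doc);
  });
  return totals;
}

// Like SUM(amount) ... GROUP BY customer: sum a field.
var sumByCustomer = groupSum(orders, "customer", function (doc) { return doc.amount; });
// Like COUNT(*) ... GROUP BY customer: sum the constant 1.
var countByCustomer = groupSum(orders, "customer", function () { return 1; });

console.log(sumByCustomer, countByCustomer);
```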

For example, still using the above "Books and publishers" example, imagine I want to query "a specific reviewer with the book(s) he/she reviewed". I can do this:

> db.books.aggregate({ $unwind: "$reviews" }, { $match: { "reviews.reviewer": "Xiaoshen"} })
{
 "result" : [
  {
   "_id" : 1,
   "name" : "MongoDB Applied Design Patterns",
   "price" : 35,
   "rate" : 5,
   "author" : "Rick Copeland",
   "ISBN" : "1449340040",
   "publisher_id" : 1,
   "reviews" : {
    "isUseful" : true,
    "content" : "Cool book!",
    "reviewer" : "Xiaoshen",
    "timestamp" : "1377742184305"
   }
  }
 ],
 "ok" : 1
}
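
Notice that in the result "reviews" is a single object rather than an array: $unwind emits one document per array element, and $match then filters those documents. The two stages can be emulated in plain JavaScript as below. This is only a sketch over a trimmed in-memory copy of the book; unwindReviews and matchReviewer are hypothetical helpers, not MongoDB functions.

```javascript
// Trimmed in-memory copy of the book document from Example 3.
var books = [
  {
    _id: 1,
    name: "MongoDB Applied Design Patterns",
    reviews: [
      { isUseful: true, content: "Cool book!", reviewer: "Dick" },
      { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen" }
    ]
  }
];

// Emulates { $unwind: "$reviews" }: one output document per array element,
// with the array replaced by that single element.
function unwindReviews(docs) {
  var out = [];
  docs.forEach(function (doc) {
    doc.reviews.forEach(function (review) {
      var copy = {};
      Object.keys(doc).forEach(function (key) { copy[key] = doc[key]; });
      copy.reviews = review;
      out.push(copy);
    });
  });
  return out;
}

// Emulates { $match: { "reviews.reviewer": name } } on unwound documents.
function matchReviewer(docs, name) {
  return docs.filter(function (doc) { return doc.reviews.reviewer === name; });
}

var unwound = unwindReviews(books);
var result = matchReviewer(unwound, "Xiaoshen");
console.log(JSON.stringify(result, null, 1));
```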

Aggregation introduction: http://docs.mongodb.org/manual/applications/aggregation/

One caveat: JavaScript-based operations such as map-reduce, group() and $where run on the embedded JavaScript VM (V8 since MongoDB version 2.4). Although V8 is very fast, it cannot compete with a natively compiled and optimized C/C++ implementation, which is what the aggregation pipeline uses. Refer to:
http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries

Common Practices 

  • Denormalize data when it is frequently read together (one-to-one, one-to-many)
  • Normalize data when both entities are frequently queried separately, or when there is too much data duplication
  • Reduce collection size by always using short field names as a convention. This will help you save memory over time.
  • Avoid using DBRef! Why
  • Always test queries with .explain() to check that you’re hitting the right index.
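
The short-field-name advice follows from the fact that every document stores its own field names. The sketch below gives a rough feel for the saving. It uses the JSON text length as a stand-in for the stored size, which is an approximation (MongoDB stores BSON, not JSON), and the abbreviated names fn and ln are an arbitrary, hypothetical convention.

```javascript
// The same profile with descriptive and with abbreviated field names.
var verbose = { first_name: "Wayne", last_name: "Ye" };
var compact = { fn: "Wayne", ln: "Ye" };

// Rough size estimate: length of the JSON text (an approximation of the stored size).
function approxSize(doc) {
  return JSON.stringify(doc).length;
}

var savedPerDoc = approxSize(verbose) - approxSize(compact);
console.log("approx. characters saved per document: " + savedPerDoc);
console.log("approx. characters saved over 1,000,000 documents: " + savedPerDoc * 1000000);
```

The saving comes at the price of readability, so it is usually worth mapping short names back to descriptive ones in the application layer.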

Useful resources

The ultimate manual
http://docs.mongodb.org/manual/ 

Great article explaining the differences between MongoDB and well-known relational databases:
http://docs.mongodb.org/manual/reference/sql-comparison/

Data Modeling Considerations for MongoDB Applications
http://docs.mongodb.org/manual/core/data-modeling/MongoDB-sharding-guide.pdf 

Serialize Documents with the CSharp Driver
http://docs.mongodb.org/ecosystem/tutorial/serialize-documents-with-the-csharp-driver/ 

Advanced Schema Design: Inboxes
http://www.slideshare.net/jrosoff/mongodb-advanced-schema-design-inboxes 

Sharding and MongoDB
http://docs.mongodb.org/master/MongoDB-sharding-guide.pdf

MongoDB Operations Best Practices
http://info.10gen.com/rs/10gen/images/10gen-MongoDB_Operations_Best_Practices.pdf 

Sharding and Replica Sets Illustrated
http://www.kchodorow.com/blog/2010/08/09/sharding-and-replica-sets-illustrated/
