MongoDB Schema Design and Common Practices
Installation
Exhaustive documentation: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
Mongo executables will be installed into /usr/bin/, and database files will be stored under /data/db/
Log file location: /var/log/mongodb/mongodb.log
Terminology
Administration
> db #Show current db
> db.help() # View commands related to db
# "Create" a new database named foo_db
> use foo_db # Mongo creates this DB lazily: it only really exists once something is saved into one of its collections.
# Create a new document into a collection (implicitly)
> db.user_profiles.save({ first_name: "Wayne", last_name: "Ye" })
# Explicitly create a new collection
> db.createCollection("user_profile")
# Query the collection
> db.user_profiles.find()
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "first_name" : "Wayne", "last_name" : "Ye" }
> db.user_profiles.save({ first_name: "Wendy", last_name: "Shen", gender: "Female" })
> db.user_profiles.find()
# Update
> db.collection.update( { field: value1 }, { $set: { field1: value2 } } );
# View status
> db.stats()
> db.mycol.stats()
# Query on a field of the subdocuments embedded in an array
> db.demo.insert({ "Items": [ { "Name": "Milk Powder", "Price": 9.9 }, { "Name": "Toy Car", "Price": 26 } ] })
> db.demo.find({ "Items.Price": { $gt: 20 } })
{ "_id" : ObjectId("5216fa545b4a83d66587d397"), "Items" : [ { "Name" : "Milk Powder", "Price" : 9.9 }, { "Name" : "Toy Car", "Price" : 26 } ] }
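The query above matches a document when at least one element of Items has a Price greater than 20, and it returns the whole document, which is why "Milk Powder" still appears in the result. A minimal in-memory sketch of that matching rule in plain JavaScript follows. The helper name anyItemPriceGreaterThan and the second sample document are made up for illustration and are not part of MongoDB:

```javascript
// In-memory model of the filter { "Items.Price": { $gt: threshold } }.
// Hypothetical helper, not a MongoDB API.
function anyItemPriceGreaterThan(doc, threshold) {
  const items = Array.isArray(doc.Items) ? doc.Items : [];
  // A document matches as soon as ONE array element satisfies the condition.
  return items.some(function (item) {
    return item.Price > threshold;
  });
}

const demo = [
  { Items: [ { Name: "Milk Powder", Price: 9.9 }, { Name: "Toy Car", Price: 26 } ] },
  { Items: [ { Name: "Pencil", Price: 1.5 } ] } // invented sample, never matches
];

// The whole matching document is kept, including its cheaper items.
const matched = demo.filter(function (doc) {
  return anyItemPriceGreaterThan(doc, 20);
});
console.log(JSON.stringify(matched));
```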
Batch administration from JavaScript
http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
http://docs.mongodb.org/manual/core/server-side-javascript/#running-js-scripts-in-mongo-on-mongod-host
http://docs.mongodb.org/manual/tutorial/store-javascript-function-on-server/
http://docs.mongodb.org/manual/reference/method/ (Mongo shell JS references)
mongo localhost:27017/mydb db_schema.js
Or
load("scripts/myjstest.js") OR load("/data/db/scripts/myjstest.js")
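As a rough idea of what a script such as db_schema.js might contain (the original script is not shown, so everything below is an assumption), the sketch wraps its work in a function that receives the database handle. The index on last_name is only an illustrative choice. createIndex is the current name of the helper; shells from the 2.x era call it ensureIndex with the same argument.

```javascript
// Hypothetical contents of db_schema.js.
function applySchema(database) {
  // Creating an index also creates the collection if it does not exist yet.
  return database.user_profiles.createIndex({ last_name: 1 });
}

// The mongo shell defines a global `db`; outside the shell this guard skips the call.
if (typeof db !== "undefined") {
  applySchema(db);
}
```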
Schema Design
Embedding (de-normalize data)
Store two related pieces of data in a single document.
When:
- There is a "contains" relationship between entities.
- There is a "one-to-many" relationship, and the "many" objects always appear inline with the "one".
Example 1: Blog with comments
Denormalized blog with comments
db.blogs.insert({
_id: 1,
title: "Investigation on MongoDB",
content: "some investigation contents",
post_date: Date.now(),
permalink: "http://foo.bar/investigation_on_mongodb",
comments: [
{ content: "Gorgeous post!!!", nickname: "Scott", email: "foo@bar.org", timestamp: "1377742184305" },
{ content: "Splendid article!!!", nickname: "Guthrie", email: "foo@bar.org", timestamp: "1377742184305" }
]}
);
Example 2: Dishes and Chefs
Normalized Dishes and Chefs
db.dishes.insert({
_id: 1,
name: "Kong Bao Ji Ding",
price: 5.5,
rate: 4.5,
cheves: [ "Flora Zhang", "Cristina Wang" ]
}
);
db.cheves.insert({
_id: 1,
name: "Flora Zhang",
age: 32,
avatar: "http://www.gravatar.com/avatar.php?gravatar_id=dc654756c7c",
dishes: [ "Kong Bao Ji Ding", "Knight Zhang Beef", "Ma Po Tou Fu" ]
}
);
Benefits:
- Better performance for read operations
- Request and retrieve related data in a single database operation.
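A small sketch of what "a single database operation" means for the blog example. The collection handle is passed in as a parameter because the collection name is up to you. findOne and update with $push are standard shell helpers (newer shells prefer updateOne), while the two function names are made up here:

```javascript
// `posts` is a collection handle such as db.<collection>.
function loadPostWithComments(posts, postId) {
  // One read returns the post together with its embedded comments array.
  return posts.findOne({ _id: postId });
}

function addComment(posts, postId, comment) {
  // $push appends to the embedded array in a single update.
  return posts.update({ _id: postId }, { $push: { comments: comment } });
}
```

Because the comments live inside the post, neither function needs a second query to another collection.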
Referencing (normalize data)
Store references between two documents to indicate a relationship between the data represented in each document.
When:
- Embedding would duplicate data without providing enough read-performance benefit to outweigh the cost of the duplication.
- You need to represent more complex many-to-many relationships.
- You need to model large hierarchical data sets.
Benefits:
- Separation of concerns
- Data model independent of logic
Drawback:
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server.
Example 3: Books and publisher
Books and publishers
db.books.insert({
_id: 1,
name: "MongoDB Applied Design Patterns",
price: 35,
rate: 5,
author: "Rick Copeland",
ISBN: "1449340040",
publisher_id: 1,
reviews: [
{ isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
{ isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
]
}
);
db.publishers.insert({
_id: 1,
name: "Packtpub INC",
address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
telephone: "+44 0121 265 6484",
}
);
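The drawback described above shows up directly in this example: reading a book together with its publisher takes two round trips. A minimal sketch, assuming the two collections are reachable as books and publishers on the database handle; the function name is hypothetical:

```javascript
function findBookWithPublisher(database, bookId) {
  // First round trip: the book, which only stores publisher_id.
  const book = database.books.findOne({ _id: bookId });
  if (!book) {
    return null;
  }
  // Second round trip: resolve the reference by hand.
  const publisher = database.publishers.findOne({ _id: book.publisher_id });
  return { book: book, publisher: publisher };
}
```

In the shell this would be called as findBookWithPublisher(db, 1).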
Advanced Features
Multikey-Indexes
MongoDB supports indexing keys inside subdocuments, including subdocuments stored in an array. Taking the "Books and publishers" example above, MongoDB can be told to index the reviewer key of each review, which produces a multikey index.
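A sketch of how such an index could be declared (an assumption, not necessarily the author's exact command): the index key is the dotted path reviews.reviewer. createIndex is the current helper name; 2.x shells use ensureIndex with the same argument. The wrapper function name is made up.

```javascript
function indexReviewers(database) {
  // "reviews" is an array of subdocuments, so this becomes a multikey index:
  // one index entry per reviewer value found in the array.
  return database.books.createIndex({ "reviews.reviewer": 1 });
}
```

In the shell: indexReviewers(db), or simply db.books.createIndex({ "reviews.reviewer": 1 }).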
Aggregation framework
A MongoDB aggregation is a series of special operators applied to a collection. Each operator is a JavaScript object with a single property: the operator name, whose value is an options object. The core of the aggregation framework is the aggregation pipeline, a data aggregation mechanism modeled on the concept of data processing pipelines.
The aggregation framework was introduced in MongoDB 2.2. The table below compares SQL aggregation concepts with their MongoDB counterparts:
SQL Terms, Functions, and Concepts | MongoDB Aggregation Operators |
---|---|
WHERE | $match |
GROUP BY | $group |
HAVING | $match |
SELECT | $project |
ORDER BY | $sort |
LIMIT | $limit |
SUM() | $sum |
COUNT() | $sum |
join | No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document. |
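To make the mapping concrete, here is a sketch of how a SQL query such as SELECT publisher_id, COUNT(*) AS book_count, SUM(price) AS total_price FROM books WHERE rate >= 4 GROUP BY publisher_id HAVING COUNT(*) > 1 ORDER BY total_price DESC LIMIT 5 might translate into pipeline stages. The query itself is invented for illustration; the field names come from the "Books and publishers" example.

```javascript
const booksPerPublisher = [
  { $match: { rate: { $gte: 4 } } },        // WHERE rate >= 4
  { $group: {                               // GROUP BY publisher_id
      _id: "$publisher_id",
      book_count: { $sum: 1 },              // COUNT(*)
      total_price: { $sum: "$price" }       // SUM(price)
  } },
  { $match: { book_count: { $gt: 1 } } },   // HAVING COUNT(*) > 1
  { $sort: { total_price: -1 } },           // ORDER BY total_price DESC
  { $limit: 5 }                             // LIMIT 5
];
```

In the shell it would be run with db.books.aggregate(booksPerPublisher). Note that $match appears twice: before $group it plays the role of WHERE, after $group the role of HAVING.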
For example, still using the "Books and publishers" example above, imagine I want to find a specific reviewer together with the book(s) he or she reviewed. I can do this:
> db.books.aggregate({ $unwind: "$reviews" }, { $match: { "reviews.reviewer": "Xiaoshen"} })
{
"result" : [
{
"_id" : 1,
"name" : "MongoDB Applied Design Patterns",
"price" : 35,
"rate" : 5,
"author" : "Rick Copeland",
"ISBN" : "1449340040",
"publisher_id" : 1,
"reviews" : {
"isUseful" : true,
"content" : "Cool book!",
"reviewer" : "Xiaoshen",
"timestamp" : "1377742184305"
}
}
],
"ok" : 1
}
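Note that reviews in the result is a single object rather than an array: $unwind emits one copy of the document per array element, and $match then keeps only the copies whose review was written by Xiaoshen. A plain JavaScript sketch of that behaviour on in-memory data follows. The unwind function is a simplified stand-in, not MongoDB's implementation, and the sample document is trimmed from the example above:

```javascript
// Simplified stand-in for { $unwind: "$<field>" } applied to an array of documents.
function unwind(docs, field) {
  const out = [];
  docs.forEach(function (doc) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    values.forEach(function (value) {
      // Copy the document, replacing the array with one of its elements.
      const copy = Object.assign({}, doc);
      copy[field] = value;
      out.push(copy);
    });
  });
  return out;
}

const books = [{
  _id: 1,
  name: "MongoDB Applied Design Patterns",
  reviews: [
    { content: "Cool book!", reviewer: "Dick" },
    { content: "Cool book!", reviewer: "Xiaoshen" }
  ]
}];

// Equivalent of the { $match: { "reviews.reviewer": "Xiaoshen" } } stage.
const byXiaoshen = unwind(books, "reviews").filter(function (doc) {
  return doc.reviews.reviewer === "Xiaoshen";
});
console.log(byXiaoshen.length, byXiaoshen[0].reviews.reviewer);
```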
Aggregation introduction: http://docs.mongodb.org/manual/applications/aggregation/
One caveat: the older aggregation helpers, map-reduce and group(), run on the JavaScript VM (V8 as of MongoDB 2.4). Although V8 is very fast, it cannot compete with natively compiled and optimized C/C++ code, which is what the aggregation pipeline itself uses. See:
http://stackoverflow.com/questions/2599943/mongodbs-performance-on-aggregation-queries
Common Practices
- Denormalize data when frequently read together (one-to-one, one-to-many)
- Normalize data when both entities are frequently queried on their own, or when embedding would duplicate too much data
- Reduce collection size by always using short field names as a convention. This will help you save memory over time.
- Avoid using DBRef!
- Always test queries with .explain() to check that you’re hitting the right index.
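What "hitting the right index" looks like depends on the server version. As a hedged sketch: on MongoDB 3.0 and later, explain() output contains a queryPlanner.winningPlan tree whose nodes carry a stage name, and a COLLSCAN stage means the whole collection is scanned. The helper below (a made-up name) simply walks whatever explain() returned and looks for that stage. Older 2.x shells report a cursor field such as BasicCursor instead, which this sketch does not handle.

```javascript
// Returns true if any nested object in the explain output has stage "COLLSCAN".
function hasCollectionScan(node) {
  if (node === null || typeof node !== "object") {
    return false;
  }
  if (node.stage === "COLLSCAN") {
    return true;
  }
  // Recurse into child plans (inputStage, inputStages, and so on).
  return Object.keys(node).some(function (key) {
    return hasCollectionScan(node[key]);
  });
}
```

In the shell it could be used as hasCollectionScan(db.books.find({ "reviews.reviewer": "Xiaoshen" }).explain()).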
Useful resources
The ultimate manual
http://docs.mongodb.org/manual/
A great article explaining the differences between MongoDB and well-known relational databases:
http://docs.mongodb.org/manual/reference/sql-comparison/
Data Modeling Considerations for MongoDB Applications
http://docs.mongodb.org/manual/core/data-modeling/MongoDB-sharding-guide.pdf
Serialize Documents with the CSharp Driver
http://docs.mongodb.org/ecosystem/tutorial/serialize-documents-with-the-csharp-driver/
Advanced Schema Design: Inboxes
http://www.slideshare.net/jrosoff/mongodb-advanced-schema-design-inboxes
Sharding and Mongo DB
http://docs.mongodb.org/master/MongoDB-sharding-guide.pdf
MongoDB Operations Best Practices
http://info.10gen.com/rs/10gen/images/10gen-MongoDB_Operations_Best_Practices.pdf
Sharding and Replica Sets Illustrated
http://www.kchodorow.com/blog/2010/08/09/sharding-and-replica-sets-illustrated/