Data Modeling
Good schema design is the foundation of a performant application. In this lesson you will learn to model relationships, choose between embedding and referencing data, and implement robust validation.
Embedding vs Referencing
In MongoDB, you have two main ways to represent relationships between data: embedding related data inside a document, or referencing it from a separate document.
When to Embed (Denormalize)
- Data is frequently accessed together
- One-to-few relationships (1 user : few addresses)
- Data does not change frequently
- You need atomic updates on related data
When to Reference (Normalize)
- Data is accessed independently
- One-to-many or many-to-many relationships
- Data changes frequently
- Document size would exceed MongoDB's 16 MB per-document limit
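To make the trade-off concrete, the sketch below stores the same user-to-address relationship both ways as plain JavaScript objects. The id strings, sample addresses, and the addressesFor helper are made up for illustration; real documents would use ObjectId values and a database query.

```javascript
// Embedded: addresses live inside the user document.
// One read returns everything, and one write updates it atomically.
const embeddedUser = {
  _id: 'u1',
  name: 'John Doe',
  addresses: [
    { street: '123 Main St', city: 'New York' },
    { street: '456 Oak Ave', city: 'Los Angeles' }
  ]
};

// Referenced: each address is its own document that points back to the user.
const referencedUser = { _id: 'u1', name: 'John Doe' };
const addressCollection = [
  { _id: 'a1', user: 'u1', street: '123 Main St', city: 'New York' },
  { _id: 'a2', user: 'u1', street: '456 Oak Ave', city: 'Los Angeles' },
  { _id: 'a3', user: 'u2', street: '9 Elm St', city: 'Boston' }
];

// Hypothetical helper: reading the referenced form needs a second lookup
// (a second query in MongoDB).
function addressesFor(userId, addresses) {
  return addresses.filter((address) => address.user === userId);
}

const embeddedCities = embeddedUser.addresses.map((a) => a.city);
const referencedCities = addressesFor(referencedUser._id, addressCollection)
  .map((a) => a.city);

console.log(embeddedCities);
console.log(referencedCities);
```

Both forms yield the same cities; the difference is how many lookups it takes and which document has to change when an address is edited.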
Embedding Documents
Embedding stores related data in the same document. Here's a user with embedded addresses:
const mongoose = require('mongoose');

// Define embedded address schema (sub-document)
const addressSchema = new mongoose.Schema({
  // Street address line
  street: { type: String, required: true },
  // City name
  city: { type: String, required: true },
  // State or province
  state: String,
  // Postal/ZIP code
  zipCode: { type: String, required: true },
  // Country with default value
  country: { type: String, default: 'USA' },
  // Whether this is the primary address
  isPrimary: { type: Boolean, default: false }
});

// Main user schema with embedded addresses
const userSchema = new mongoose.Schema({
  name: { type: String, required: true },
  // 'unique' creates a unique index; it is not a validator
  email: { type: String, required: true, unique: true },
  // Array of embedded address documents
  // Each element follows addressSchema structure
  addresses: [addressSchema],
  // Timestamp when user was created
  createdAt: { type: Date, default: Date.now }
});

// Keep a local reference so the examples that follow can use User directly
const User = mongoose.model('User', userSchema);
module.exports = User;

// Create user with embedded addresses
// (await needs an async function or an ES module)
const user = await User.create({
  name: 'John Doe',
  email: 'john@example.com',
  addresses: [
    {
      street: '123 Main St',
      city: 'New York',
      state: 'NY',
      zipCode: '10001',
      isPrimary: true
    },
    {
      street: '456 Oak Ave',
      city: 'Los Angeles',
      state: 'CA',
      zipCode: '90001'
    }
  ]
});

// Add new address to existing user (userId is that user's _id)
// Update operations skip schema validation unless runValidators is set
await User.findByIdAndUpdate(userId, {
  $push: {
    addresses: {
      street: '789 Pine Rd',
      city: 'Chicago',
      state: 'IL',
      zipCode: '60601'
    }
  }
}, { runValidators: true });

// Update specific embedded document by its _id
// The positional $ operator targets the array element matched by the filter
await User.findOneAndUpdate(
  { _id: userId, 'addresses._id': addressId },
  { $set: { 'addresses.$.city': 'Brooklyn' } }
);

// Remove embedded document
await User.findByIdAndUpdate(userId, {
  $pull: { addresses: { _id: addressId } }
});
Referencing Documents
References store only the ObjectId, requiring a separate query to fetch related data:
const mongoose = require('mongoose');

const postSchema = new mongoose.Schema({
  // Post title - required field
  title: { type: String, required: true },
  // Post content body
  content: { type: String, required: true },
  // Reference to User model - stores ObjectId
  // 'ref' tells Mongoose which model to use for population
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',
    required: true
  },
  // Array of references for many-to-many relationship
  // A post can have multiple tags
  tags: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: 'Tag'
  }],
  // Comments as embedded documents (hybrid approach)
  comments: [{
    // Reference to user who made the comment
    user: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
    // Comment text content
    text: { type: String, required: true },
    // When the comment was created
    createdAt: { type: Date, default: Date.now }
  }],
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('Post', postSchema);
Population - Fetching Referenced Data
Population automatically replaces ObjectId references with actual documents:
// Basic population - replace author ObjectId with user document
const post = await Post.findById(postId).populate('author');
// Now post.author is the full user object, not just an ID
console.log(post.author.name); // 'John Doe'

// Populate with field selection - only get specific fields
const postWithAuthor = await Post.findById(postId)
  .populate('author', 'name email'); // only name and email

// Populate multiple fields at once
const postWithTags = await Post.findById(postId)
  .populate('author', 'name')
  .populate('tags', 'name color');

// Populate a path inside embedded documents - the user on each comment
const postWithCommenters = await Post.findById(postId)
  .populate({
    path: 'comments.user', // populate user in each comment
    select: 'name avatar' // only these fields
  });

// Advanced population with match and select
// (assumes the User schema has an isActive field)
const posts = await Post.find()
  .populate({
    path: 'author',
    match: { isActive: true }, // non-matching authors become null; the post is still returned
    select: 'name email -_id' // exclude _id
  });

// Sort the populated documents - here, each post's tags by name
// This does not reorder the posts; chain .sort() on the query for that
const postsWithSortedTags = await Post.find()
  .populate({
    path: 'tags',
    options: { sort: { name: 1 } }
  });
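Conceptually, populate() runs an extra query for the referenced ids and stitches the results into the parent documents. The in-memory sketch below imitates that for a single author path. The populateAuthors helper and the sample arrays are hypothetical stand-ins for illustration, not Mongoose internals.

```javascript
// Sample data standing in for the users and posts collections.
const users = [
  { _id: 'u1', name: 'John Doe', isActive: true },
  { _id: 'u2', name: 'Jane Roe', isActive: false }
];
const posts = [
  { _id: 'p1', title: 'First', author: 'u1' },
  { _id: 'p2', title: 'Second', author: 'u2' },
  { _id: 'p3', title: 'Third', author: 'u1' }
];

// Hypothetical helper: mimic populate('author') with an optional match filter.
function populateAuthors(postDocs, userDocs, match = () => true) {
  // Step 1: collect the distinct ids to fetch (one $in query in practice).
  const ids = new Set(postDocs.map((post) => post.author));
  // Step 2: "query" the users once and index the results by _id.
  const byId = new Map(
    userDocs
      .filter((user) => ids.has(user._id) && match(user))
      .map((user) => [user._id, user])
  );
  // Step 3: replace each id with its document, or null when nothing matched.
  return postDocs.map((post) => ({
    ...post,
    author: byId.get(post.author) ?? null
  }));
}

const populated = populateAuthors(posts, users);
const activeOnly = populateAuthors(posts, users, (user) => user.isActive);

console.log(populated[0].author.name);
console.log(activeOnly[1].author);
```

The second call shows why a match filter leaves gaps rather than dropping posts: the parent documents are already fetched, so a non-matching author can only be replaced with null.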
Virtual Population
Virtuals allow you to populate without storing references in both directions:
const mongoose = require('mongoose');

const authorSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true }
}, {
  // Enable virtuals in JSON and Object output
  toJSON: { virtuals: true },
  toObject: { virtuals: true }
});

// Virtual field 'posts' - not stored in database
// Dynamically populated based on Post.author reference
authorSchema.virtual('posts', {
  ref: 'Post', // Model to populate from
  localField: '_id', // Field in Author
  foreignField: 'author' // Field in Post that references Author
});

module.exports = mongoose.model('Author', authorSchema);

// Usage: Get author with all their posts
const author = await Author.findById(authorId)
  .populate('posts'); // posts is virtual, populated dynamically
console.log(author.posts); // Array of all posts by this author
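A populate virtual describes a reverse lookup: find the child documents whose foreignField equals the parent's localField. The sketch below performs that lookup in memory. The attachVirtual helper and the sample data are hypothetical and only illustrate the matching rule.

```javascript
const authors = [
  { _id: 'a1', name: 'John Doe' },
  { _id: 'a2', name: 'Jane Roe' }
];
const postDocs = [
  { _id: 'p1', title: 'First', author: 'a1' },
  { _id: 'p2', title: 'Second', author: 'a1' }
];

// Hypothetical helper: attach the children whose foreignField equals the
// parent's localField, without storing anything on the parent itself.
function attachVirtual(parents, children, { name, localField, foreignField }) {
  return parents.map((parent) => ({
    ...parent,
    [name]: children.filter(
      (child) => child[foreignField] === parent[localField]
    )
  }));
}

const withPosts = attachVirtual(authors, postDocs, {
  name: 'posts',
  localField: '_id',
  foreignField: 'author'
});

console.log(withPosts[0].posts.map((post) => post.title));
console.log(withPosts[1].posts.length);
```

The stored author objects never gain a posts array, which is the point of the virtual: the relationship is kept in one place (the post) and derived on demand.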
Schema Validation
Mongoose provides powerful built-in and custom validation:
const mongoose = require('mongoose');

const productSchema = new mongoose.Schema({
  // Name: required with length constraints
  name: {
    type: String,
    required: [true, 'Product name is required'],
    minlength: [2, 'Name must be at least 2 characters'],
    maxlength: [100, 'Name cannot exceed 100 characters'],
    trim: true
  },
  // Price: required with minimum value
  price: {
    type: Number,
    required: [true, 'Price is required'],
    min: [0, 'Price cannot be negative']
  },
  // Sale price: custom validator to ensure it's less than price
  salePrice: {
    type: Number,
    validate: {
      // Validator function - 'this' refers to the document on save/validate
      // (use a regular function, not an arrow function, so 'this' is bound)
      validator: function(value) {
        // Only validate if salePrice is provided
        return value == null || value < this.price;
      },
      message: 'Sale price must be less than regular price'
    }
  },
  // Category: must be one of predefined values
  category: {
    type: String,
    enum: {
      values: ['electronics', 'clothing', 'books', 'home'],
      message: '{VALUE} is not a valid category'
    }
  },
  // Email: regex pattern validation
  contactEmail: {
    type: String,
    match: [
      /^\S+@\S+\.\S+$/,
      'Please provide a valid email address'
    ]
  },
  // SKU: custom async validator (check uniqueness)
  // Note: this check can race with concurrent writes;
  // a unique index on sku is the reliable guard
  sku: {
    type: String,
    validate: {
      validator: async function(value) {
        // Check if SKU already exists (excluding current doc)
        const count = await mongoose.model('Product')
          .countDocuments({ sku: value, _id: { $ne: this._id } });
        return count === 0;
      },
      message: 'SKU must be unique'
    }
  },
  // Stock: integer validation using custom validator
  stock: {
    type: Number,
    default: 0,
    validate: {
      validator: Number.isInteger,
      message: 'Stock must be a whole number'
    }
  }
});

// Register the model so mongoose.model('Product') in the SKU validator resolves
module.exports = mongoose.model('Product', productSchema);
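Custom validators are ordinary functions, so their logic can be exercised without a database connection. The sketch below copies the sale price rule and the email pattern from the schema above and calls them directly. The salePriceIsValid name and the stand-in product object are introduced here for illustration only.

```javascript
// The salePrice rule as a standalone function; `this` is the document.
function salePriceIsValid(value) {
  return value == null || value < this.price;
}

// The same pattern used for contactEmail in the schema.
const emailPattern = /^\S+@\S+\.\S+$/;

// Stand-in for a product document, bound as `this` via call().
const product = { price: 100 };

console.log(salePriceIsValid.call(product, 80));        // true
console.log(salePriceIsValid.call(product, 120));       // false
console.log(salePriceIsValid.call(product, undefined)); // true
console.log(Number.isInteger(2.5));                     // false
console.log(emailPattern.test('sales@example.com'));    // true
console.log(emailPattern.test('not-an-email'));         // false
```

Because the rule reads this.price, it has to be a regular function. An arrow function would not receive the document as this.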
Middleware (Hooks)
Middleware functions run at specific points in the document lifecycle:
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  name: String,
  email: String,
  password: String,
  slug: String,
  // Soft-delete flag used by the find hook below
  isDeleted: { type: Boolean, default: false },
  updatedAt: Date
});

// PRE middleware - runs BEFORE the operation
// 'save' hook runs before document.save()
// Mongoose 5+ continues when the function returns (or its promise resolves);
// callback-style hooks call next() instead
userSchema.pre('save', function() {
  // 'this' refers to the document being saved
  // Generate slug from name (name is optional, so guard against undefined)
  if (this.name) {
    this.slug = this.name.toLowerCase().replace(/\s+/g, '-');
  }
  // Update timestamp
  this.updatedAt = new Date();
});

// PRE middleware for find queries
// Runs before any query whose name starts with 'find'
userSchema.pre(/^find/, function() {
  // 'this' is the query object
  // Automatically exclude soft-deleted documents
  this.find({ isDeleted: { $ne: true } });
});

// POST middleware - runs AFTER the operation
userSchema.post('save', function(doc) {
  // 'doc' is the saved document
  console.log(`User ${doc.name} was saved`);
});

// PRE middleware for delete - cleanup related data
// Document.remove() was removed in Mongoose 7, so hook doc.deleteOne() instead
userSchema.pre('deleteOne', { document: true, query: false }, async function() {
  // Delete all posts by this user when user is deleted
  await mongoose.model('Post').deleteMany({ author: this._id });
});
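The slug logic in the pre-save hook only replaces whitespace, so punctuation and accented characters pass straight through. A slightly more defensive version might look like the sketch below. The slugify helper is hypothetical and written here for illustration; it could be called from a pre-save hook in place of the inline expression.

```javascript
// Hypothetical helper: turn a name or title into a URL-friendly slug.
function slugify(text) {
  return String(text ?? '')
    .normalize('NFKD')                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')      // collapse everything else into one hyphen
    .replace(/^-+|-+$/g, '');         // strip leading and trailing hyphens
}

console.log(slugify('John Doe'));
console.log(slugify('  Hello,   World!  '));
console.log(slugify('Crème Brûlée'));
```

Slugs produced this way are not guaranteed to be unique, so two documents with the same title would still need a suffix or a uniqueness check.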
Knowledge Check
1. When should you embed documents rather than use references?
2. What does the populate() method do?
3. When does a 'pre save' middleware run?
Build a Blog Data Model
Design a complete blog system with proper relationships:
- Author model with name, email, bio, and virtual 'posts' field
- Post model with title, content, author (ref), tags (refs), and embedded comments
- Tag model with name and color
- Add pre-save middleware to generate slugs from titles
- Create an API endpoint that returns posts with populated author and tags