Day 8

Data Modeling

Good schema design is the foundation of performant applications. Today you will learn to model relationships, choose between embedding and referencing data, and implement robust validation.

Embedding vs Referencing

In MongoDB, you have two main ways to represent relationships between data:

When to Embed (Denormalize)

  • Data is frequently accessed together
  • One-to-few relationships (1 user : few addresses)
  • Data does not change frequently
  • You need atomic updates on related data

When to Reference (Normalize)

  • Data is accessed independently
  • One-to-many or many-to-many relationships
  • Data changes frequently
  • Embedding would push a document past MongoDB's 16 MB size limit
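
As a rough illustration of the two shapes, the sketch below writes the same kind of data both ways as plain JavaScript objects, with no Mongoose involved. The `approxBytes` and `fitsInOneDocument` helpers are hypothetical, and they measure JSON text, which only approximates the BSON size that MongoDB actually checks against the limit.

```javascript
// Embedded shape: the addresses live inside the user document.
const embeddedUser = {
  _id: 'u1',
  name: 'John Doe',
  addresses: [
    { street: '123 Main St', city: 'New York', zipCode: '10001' },
    { street: '456 Oak Ave', city: 'Los Angeles', zipCode: '90001' }
  ]
};

// Referenced shape: posts live in their own collection and point back
// to the user by id, so the user document does not grow with each post.
const referencedUser = { _id: 'u1', name: 'John Doe' };
const posts = [
  { _id: 'p1', title: 'First post', author: 'u1' },
  { _id: 'p2', title: 'Second post', author: 'u1' }
];

// Hypothetical helpers: a ballpark size estimate based on JSON text.
// BSON encodes values differently, so this is an approximation only.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

function fitsInOneDocument(doc) {
  return approxBytes(doc) < MAX_DOCUMENT_BYTES;
}

console.log(approxBytes(embeddedUser), fitsInOneDocument(embeddedUser));
```

A check like this is mostly useful as an early warning for arrays that grow without bound, which is usually the signal to switch from embedding to referencing.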

Embedding Documents

Embedding stores related data in the same document. Here's a user with embedded addresses:

models/UserWithAddresses.js
const mongoose = require('mongoose');

// Define embedded address schema (sub-document)
const addressSchema = new mongoose.Schema({
  // Street address line
  street: { type: String, required: true },
  // City name
  city: { type: String, required: true },
  // State or province
  state: String,
  // Postal/ZIP code
  zipCode: { type: String, required: true },
  // Country with default value
  country: { type: String, default: 'USA' },
  // Whether this is the primary address
  isPrimary: { type: Boolean, default: false }
});

// Main user schema with embedded addresses
const userSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  // Array of embedded address documents
  // Each element follows addressSchema structure
  addresses: [addressSchema],
  // Timestamp when user was created
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('User', userSchema);

Working with Embedded Documents
// Create user with embedded addresses
const user = await User.create({
  name: 'John Doe',
  email: 'john@example.com',
  addresses: [
    {
      street: '123 Main St',
      city: 'New York',
      state: 'NY',
      zipCode: '10001',
      isPrimary: true
    },
    {
      street: '456 Oak Ave',
      city: 'Los Angeles',
      state: 'CA',
      zipCode: '90001'
    }
  ]
});

// Add new address to existing user
await User.findByIdAndUpdate(userId, {
  $push: {
    addresses: {
      street: '789 Pine Rd',
      city: 'Chicago',
      state: 'IL',
      zipCode: '60601'
    }
  }
});

// Update specific embedded document by its _id
await User.findOneAndUpdate(
  { _id: userId, 'addresses._id': addressId },
  { $set: { 'addresses.$.city': 'Brooklyn' } }
);

// Remove embedded document
await User.findByIdAndUpdate(userId, {
  $pull: { addresses: { _id: addressId } }
});
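
The positional `$` in `'addresses.$.city'` stands for the first array element that matched the query filter. The following is a plain-JavaScript sketch of that behavior, with no database involved; the `setOnFirstMatch` helper and the sample ids are hypothetical.

```javascript
// Hypothetical helper that mimics what `addresses.$.<field>` targets:
// only the FIRST element satisfying the predicate is changed.
function setOnFirstMatch(items, predicate, changes) {
  const index = items.findIndex(predicate);
  if (index === -1) return false;        // nothing matched, nothing updated
  Object.assign(items[index], changes);  // mutate only that one element
  return true;
}

const user = {
  _id: 'u1',
  addresses: [
    { _id: 'a1', city: 'New York' },
    { _id: 'a2', city: 'Los Angeles' }
  ]
};

const updated = setOnFirstMatch(
  user.addresses,
  (address) => address._id === 'a2',
  { city: 'Brooklyn' }
);

console.log(updated, user.addresses.map((address) => address.city));
```

If the filter could match several elements, only the first one is touched, which is why the example in the lesson filters on the unique `addresses._id`.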

Referencing Documents

References store only the ObjectId, requiring a separate query to fetch related data:

models/Post.js
const mongoose = require('mongoose');

const postSchema = new mongoose.Schema({
  // Post title - required field
  title: { type: String, required: true },
  // Post content body
  content: { type: String, required: true },
  // Reference to User model - stores ObjectId
  // 'ref' tells Mongoose which model to use for population
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',
    required: true
  },
  // Array of references for many-to-many relationship
  // A post can have multiple tags
  tags: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: 'Tag'
  }],
  // Comments as embedded documents (hybrid approach)
  comments: [{
    // Reference to user who made the comment
    user: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
    // Comment text content
    text: { type: String, required: true },
    // When the comment was created
    createdAt: { type: Date, default: Date.now }
  }],
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('Post', postSchema);
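
Because `tags` is an array of ids, the many-to-many link can be followed in either direction. The plain-JavaScript sketch below uses hypothetical in-memory data to show both lookups. In Mongoose the first one corresponds to a filter such as `Post.find({ tags: tagId })`, which matches documents whose array contains that value.

```javascript
// In-memory stand-ins for the Tag and Post collections (sample data).
const tags = [
  { _id: 't1', name: 'mongodb' },
  { _id: 't2', name: 'nodejs' }
];
const posts = [
  { _id: 'p1', title: 'Schemas', tags: ['t1', 't2'] },
  { _id: 'p2', title: 'Servers', tags: ['t2'] }
];

// Direction 1: all posts carrying a given tag (array membership test).
function postsWithTag(tagId) {
  return posts.filter((post) => post.tags.includes(tagId));
}

// Direction 2: the tag documents for one post, in the order stored on it.
// Ids that no longer resolve to a tag are dropped.
function tagsOfPost(post) {
  const byId = new Map(tags.map((tag) => [tag._id, tag]));
  return post.tags.map((id) => byId.get(id)).filter(Boolean);
}

console.log(postsWithTag('t2').map((post) => post.title)); // [ 'Schemas', 'Servers' ]
console.log(tagsOfPost(posts[1]).map((tag) => tag.name));  // [ 'nodejs' ]
```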

Population - Fetching Referenced Data

Population replaces ObjectId references with the documents they point to, using an additional query that Mongoose runs for you:

Using populate()
// Basic population - replace author ObjectId with user document
const post = await Post.findById(postId).populate('author');
// Now post.author is the full user object, not just an ID
console.log(post.author.name);  // 'John Doe'

// Populate with field selection - only get specific fields
const postWithAuthor = await Post.findById(postId)
  .populate('author', 'name email');  // only name and email

// Populate multiple fields at once
const postWithTags = await Post.findById(postId)
  .populate('author', 'name')
  .populate('tags', 'name color');

// Populate a path inside embedded documents - the user of each comment
const postWithCommenters = await Post.findById(postId)
  .populate({
    path: 'comments.user',       // populate user in each comment
    select: 'name avatar'        // only these fields (assumes User has an avatar field)
  });

// Advanced population with match and select
// Posts are still returned when the author does not match;
// their author field is set to null instead
const posts = await Post.find()
  .populate({
    path: 'author',
    match: { isActive: true },   // assumes User has an isActive field
    select: 'name email -_id'   // exclude _id
  });

// Sort the populated documents with options
// This orders the tags inside each post; it does not reorder the posts
const postsWithSortedTags = await Post.find()
  .populate({
    path: 'tags',
    options: { sort: { name: 1 } }
  });
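
Conceptually, population is a second lookup followed by a substitution. The sketch below imitates that in plain JavaScript over in-memory arrays. `populateAuthor` is a hypothetical helper, not a Mongoose API, and the sample data (including the `isActive` flag) is assumed for illustration.

```javascript
// In-memory stand-ins for two collections (hypothetical sample data).
const users = [
  { _id: 'u1', name: 'John Doe', isActive: true },
  { _id: 'u2', name: 'Jane Roe', isActive: false }
];
const posts = [
  { _id: 'p1', title: 'Hello', author: 'u1' },
  { _id: 'p2', title: 'World', author: 'u2' }
];

// Hypothetical helper: replace each post's author id with the matching
// user, or with null when no user passes the optional `match` filter.
function populateAuthor(postList, userList, match = () => true) {
  // Step 1: collect the distinct ids that need resolving.
  const ids = new Set(postList.map((post) => post.author));
  // Step 2: one lookup for all of them (the "second query").
  const byId = new Map(
    userList
      .filter((user) => ids.has(user._id) && match(user))
      .map((user) => [user._id, user])
  );
  // Step 3: swap ids for documents without mutating the input.
  return postList.map((post) => ({
    ...post,
    author: byId.get(post.author) ?? null
  }));
}

const populated = populateAuthor(posts, users, (user) => user.isActive);
console.log(populated[0].author.name); // 'John Doe'
console.log(populated[1].author);      // null
```

The second post is still in the result with a `null` author, which mirrors how a populate `match` filters the populated documents rather than the parent documents.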

Virtual Population

Virtuals allow you to populate without storing references in both directions:

models/Author.js
const mongoose = require('mongoose');

const authorSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true }
}, {
  // Enable virtuals in JSON and Object output
  toJSON: { virtuals: true },
  toObject: { virtuals: true }
});

// Virtual field 'posts' - not stored in database
// Dynamically populated from posts whose 'author' field equals this _id
// (assumes Post.author stores Author ids; the Post schema above uses ref: 'User')
authorSchema.virtual('posts', {
  ref: 'Post',              // Model to populate from
  localField: '_id',        // Field in Author
  foreignField: 'author'    // Field in Post that references Author
});

module.exports = mongoose.model('Author', authorSchema);

// Usage: Get author with all their posts
const author = await Author.findById(authorId)
  .populate('posts');  // posts is virtual, populated dynamically

console.log(author.posts);  // Array of all posts by this author
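
The `localField` and `foreignField` pair describes a reverse lookup: find every post whose `author` equals this author's `_id`. Here is a minimal plain-JavaScript sketch of that matching rule, using a hypothetical `withPosts` helper and made-up sample data.

```javascript
// In-memory stand-ins for the Author and Post collections (sample data).
const authors = [
  { _id: 'a1', name: 'John Doe' },
  { _id: 'a2', name: 'Jane Roe' }
];
const posts = [
  { _id: 'p1', title: 'Embedding', author: 'a1' },
  { _id: 'p2', title: 'Referencing', author: 'a1' },
  { _id: 'p3', title: 'Validation', author: 'a2' }
];

// Hypothetical helper: attach a computed `posts` array to a copy of the
// author. Nothing is stored on the author itself, just like a virtual.
function withPosts(author, allPosts) {
  return {
    ...author,
    // localField (_id) on the author is compared with
    // foreignField (author) on each post.
    posts: allPosts.filter((post) => post.author === author._id)
  };
}

const john = withPosts(authors[0], posts);
console.log(john.posts.map((post) => post.title)); // [ 'Embedding', 'Referencing' ]
console.log('posts' in authors[0]);                // false
```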

Schema Validation

Mongoose ships built-in validators (required, min, max, minlength, maxlength, enum, match) and lets you add custom ones:

Comprehensive Validation
const productSchema = new mongoose.Schema({
  // Name: required with length constraints
  name: {
    type: String,
    required: [true, 'Product name is required'],
    minlength: [2, 'Name must be at least 2 characters'],
    maxlength: [100, 'Name cannot exceed 100 characters'],
    trim: true
  },

  // Price: required with minimum value
  price: {
    type: Number,
    required: [true, 'Price is required'],
    min: [0, 'Price cannot be negative']
  },

  // Sale price: custom validator to ensure it's less than price
  salePrice: {
    type: Number,
    validate: {
      // Validator function - 'this' refers to document
      validator: function(value) {
        // Only validate if salePrice is provided
        return value == null || value < this.price;
      },
      message: 'Sale price must be less than regular price'
    }
  },

  // Category: must be one of predefined values
  category: {
    type: String,
    enum: {
      values: ['electronics', 'clothing', 'books', 'home'],
      message: '{VALUE} is not a valid category'
    }
  },

  // Email: regex pattern validation
  contactEmail: {
    type: String,
    match: [
      /^\S+@\S+\.\S+$/,
      'Please provide a valid email address'
    ]
  },

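  // SKU: custom async validator (check uniqueness)
  // Note: this check can race with concurrent saves; a unique index
  // (unique: true) is what actually enforces uniqueness in the database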
  sku: {
    type: String,
    validate: {
      validator: async function(value) {
        // Check if SKU already exists (excluding current doc)
        const count = await mongoose.model('Product')
          .countDocuments({ sku: value, _id: { $ne: this._id } });
        return count === 0;
      },
      message: 'SKU must be unique'
    }
  },

  // Stock: integer validation using custom validator
  stock: {
    type: Number,
    default: 0,
    validate: {
      validator: Number.isInteger,
      message: 'Stock must be a whole number'
    }
  }
});
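
The validator logic above can be exercised on its own, without a database, which makes the rules easy to check. The sketch below copies three of them into standalone form. The names `salePriceIsValid` and `emailPattern` are hypothetical, and the `this` binding is supplied by hand with `.call()` rather than by Mongoose.

```javascript
// Same rule as the salePrice validator: `this` must expose `price`.
function salePriceIsValid(value) {
  return value == null || value < this.price;
}

// Same pattern as the contactEmail `match` rule. It is deliberately loose:
// it only requires something@something.something with no whitespace.
const emailPattern = /^\S+@\S+\.\S+$/;

const product = { price: 100 };

console.log(salePriceIsValid.call(product, 80));        // true
console.log(salePriceIsValid.call(product, 120));       // false
console.log(salePriceIsValid.call(product, undefined)); // true (optional field)

// If `this` has no price, the comparison is against undefined and fails.
console.log(salePriceIsValid.call({}, 50));             // false

console.log(emailPattern.test('john@example.com'));     // true
console.log(emailPattern.test('not-an-email'));         // false

// The stock rule is just Number.isInteger.
console.log(Number.isInteger(5), Number.isInteger(2.5)); // true false
```

The `{}` case is worth noting: a validator that reads `this.price` depends on `this` being the full document, which may not hold when validators run as part of an update operation rather than a `save()`.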

Middleware (Hooks)

Middleware functions run at specific points in the document lifecycle:

Schema Middleware
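const userSchema = new mongoose.Schema({
  name: String,
  email: String,
  password: String,
  slug: String,
  // Soft-delete flag used by the find middleware below
  isDeleted: { type: Boolean, default: false },
  updatedAt: Date
});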

// PRE middleware - runs BEFORE the operation
// 'save' hook runs before document.save()
userSchema.pre('save', function(next) {
  // 'this' refers to the document being saved
  // Generate slug from name (name is optional here, so guard against undefined)
  if (this.name) {
    this.slug = this.name.toLowerCase().replace(/\s+/g, '-');
  }
  // Update timestamp
  this.updatedAt = new Date();
  // Call next() to continue to save
  next();
});

// PRE middleware for find queries
// Runs before any find operation
userSchema.pre(/^find/, function(next) {
  // 'this' is the query object
  // Automatically exclude soft-deleted documents
  this.find({ isDeleted: { $ne: true } });
  next();
});

// POST middleware - runs AFTER the operation
userSchema.post('save', function(doc, next) {
  // 'doc' is the saved document
  console.log(`User ${doc.name} was saved`);
  next();
});

// PRE middleware for document deletion - cleanup related data
// Document remove() was dropped in Mongoose 7; hook doc.deleteOne() instead
userSchema.pre('deleteOne', { document: true, query: false }, async function() {
  // Delete all posts by this user when user is deleted
  await mongoose.model('Post').deleteMany({ author: this._id });
});
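
The slug logic in the pre-save hook only replaces whitespace, so punctuation and stray spaces end up in the slug. One possible refinement is a standalone `slugify` function (a hypothetical helper, not part of Mongoose) that the hook could call. This sketch keeps ASCII letters and digits only, which is an assumption that will not suit every language.

```javascript
// Hypothetical helper: turn arbitrary text into a URL-friendly slug.
function slugify(text) {
  return String(text ?? '')            // tolerate null and undefined
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')       // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, '');          // drop leading and trailing hyphens
}

console.log(slugify('John Doe'));              // 'john-doe'
console.log(slugify('  Hello,   World! '));    // 'hello-world'
console.log(slugify('Data Modeling: Day 8!')); // 'data-modeling-day-8'
console.log(slugify(undefined) === '');        // true
```

Keeping the function separate from the hook means it can be tested directly and reused for other models, such as post titles.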

Knowledge Check

1. When should you embed documents rather than use references?

A. When you have many-to-many relationships
B. When related data changes frequently
C. When data is frequently accessed together and rarely changes
D. When documents might exceed 16MB

2. What does the populate() method do?

A. Creates new documents in referenced collections
B. Replaces ObjectId references with actual documents
C. Embeds documents directly into the schema
D. Validates referenced documents exist

3. When does a 'pre save' middleware run?

A. Before the document is saved to the database
B. After the document is saved to the database
C. When the schema is defined
D. When the model is created

Practice Project

Build a Blog Data Model

Design a complete blog system with proper relationships:

  • Author model with name, email, bio, and virtual 'posts' field
  • Post model with title, content, author (ref), tags (refs), and embedded comments
  • Tag model with name and color
  • Add pre-save middleware to generate slugs from titles
  • Create an API endpoint that returns posts with populated author and tags