Day 8: Data Modeling | Backend Mastery

Good schema design is the foundation of a performant application. Today you will learn to model relationships, choose between embedding and referencing data, and implement robust validation.

Embedding vs Referencing

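In MongoDB, you have two main ways to represent relationships between data: embed the related data inside the parent document, or store it in its own collection and reference it by ObjectId.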

When to Embed (Denormalize)

Data is frequently accessed together
One-to-few relationships (1 user : few addresses)
Data does not change frequently
You need atomic updates on related data

When to Reference (Normalize)

Data is accessed independently
One-to-many or many-to-many relationships
Data changes frequently
Document size would exceed MongoDB's 16 MB per-document limit
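To make the two checklists concrete, here is a minimal sketch of the two document shapes as plain JavaScript objects, with a rough size check against the 16 MB limit. The helper `approxBytes` and the constant `MAX_DOC_BYTES` are illustrative names, not MongoDB APIs, and the size is only an approximation because MongoDB measures BSON rather than JSON.

```javascript
// Embedded shape: addresses live inside the user document,
// so a single read returns the user and all addresses.
const embeddedUser = {
  _id: 'u1',
  name: 'John Doe',
  addresses: [
    { street: '123 Main St', city: 'New York' },
    { street: '456 Oak Ave', city: 'Los Angeles' }
  ]
};

// Referenced shape: the post stores only the author's id.
// The author document lives in its own collection.
const referencedUser = { _id: 'u1', name: 'John Doe' };
const referencedPost = { _id: 'p1', title: 'Hello', author: 'u1' };

// Rough size estimate (JSON bytes, not BSON bytes). Useful for spotting
// embedded arrays that could grow without bound.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

const share = approxBytes(embeddedUser) / MAX_DOC_BYTES;
console.log(`Embedded user uses about ${(share * 100).toFixed(4)}% of the limit`);
```

If an embedded array can keep growing (comments on a popular post, log entries), the estimate climbs with it, which is usually the signal to switch that relationship to references.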

Embedding Documents

Embedding stores related data in the same document. Here's a user with embedded addresses:

models/UserWithAddresses.js

const mongoose = require('mongoose');

// Define embedded address schema (sub-document)
const addressSchema = new mongoose.Schema({
  // Street address line
  street: { type: String, required: true },
  // City name
  city: { type: String, required: true },
  // State or province
  state: String,
  // Postal/ZIP code
  zipCode: { type: String, required: true },
  // Country with default value
  country: { type: String, default: 'USA' },
  // Whether this is the primary address
  isPrimary: { type: Boolean, default: false }
});

// Main user schema with embedded addresses
const userSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  // Array of embedded address documents
  // Each element follows addressSchema structure
  addresses: [addressSchema],
  // Timestamp when user was created
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('User', userSchema);

Working with Embedded Documents

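// Assumes User is the model exported from models/UserWithAddresses.js,
// that userId and addressId hold the _id values of existing documents,
// and that this code runs inside an async function

// Create user with embedded addresses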
const user = await User.create({
  name: 'John Doe',
  email: 'john@example.com',
  addresses: [
    {
      street: '123 Main St',
      city: 'New York',
      state: 'NY',
      zipCode: '10001',
      isPrimary: true
    },
    {
      street: '456 Oak Ave',
      city: 'Los Angeles',
      state: 'CA',
      zipCode: '90001'
    }
  ]
});

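// Add new address to existing user
// Update operations skip schema validation by default;
// runValidators: true checks the pushed address against addressSchema
await User.findByIdAndUpdate(
  userId,
  {
    $push: {
      addresses: {
        street: '789 Pine Rd',
        city: 'Chicago',
        state: 'IL',
        zipCode: '60601'
      }
    }
  },
  { runValidators: true }
);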

// Update specific embedded document by its _id
await User.findOneAndUpdate(
  { _id: userId, 'addresses._id': addressId },
  { $set: { 'addresses.$.city': 'Brooklyn' } }
);

// Remove embedded document
await User.findByIdAndUpdate(userId, {
  $pull: { addresses: { _id: addressId } }
});
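The three update operators above can be pictured as ordinary array operations. The following is an in-memory analogy only, using plain objects and string ids instead of MongoDB documents and ObjectIds; the function names are hypothetical.

```javascript
// In-memory stand-in for a user document.
const userDoc = {
  _id: 'u1',
  addresses: [
    { _id: 'a1', city: 'New York' },
    { _id: 'a2', city: 'Los Angeles' }
  ]
};

// $push: append one element to the array.
function pushAddress(doc, address) {
  doc.addresses.push(address);
}

// Positional $: the filter on 'addresses._id' selects the first matching
// element, and 'addresses.$.city' updates that element only.
function setCityById(doc, addressId, city) {
  const index = doc.addresses.findIndex((a) => a._id === addressId);
  if (index !== -1) doc.addresses[index].city = city;
  return index !== -1;
}

// $pull: remove every element that matches the condition.
function pullAddress(doc, addressId) {
  doc.addresses = doc.addresses.filter((a) => a._id !== addressId);
}

pushAddress(userDoc, { _id: 'a3', city: 'Chicago' });
setCityById(userDoc, 'a1', 'Brooklyn');
pullAddress(userDoc, 'a2');

console.log(userDoc.addresses.map((a) => `${a._id}:${a.city}`).join(', '));
// prints "a1:Brooklyn, a3:Chicago"
```

The difference in the real database is that each operator is applied atomically to the stored document, which is one of the reasons to embed data that must change together.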

Referencing Documents

References store only the ObjectId, requiring a separate query to fetch related data:

models/Post.js

const mongoose = require('mongoose');

const postSchema = new mongoose.Schema({
  // Post title - required field
  title: { type: String, required: true },
  // Post content body
  content: { type: String, required: true },
  // Reference to User model - stores ObjectId
  // 'ref' tells Mongoose which model to use for population
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',
    required: true
  },
  // Array of references for many-to-many relationship
  // A post can have multiple tags
  tags: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: 'Tag'
  }],
  // Comments as embedded documents (hybrid approach)
  comments: [{
    // Reference to user who made the comment
    user: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
    // Comment text content
    text: { type: String, required: true },
    // When the comment was created
    createdAt: { type: Date, default: Date.now }
  }],
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('Post', postSchema);

Population - Fetching Referenced Data

Population automatically replaces ObjectId references with actual documents:

Using populate()

// Basic population - replace author ObjectId with user document
const post = await Post.findById(postId).populate('author');
// Now post.author is the full user object, not just an ID
console.log(post.author.name);  // 'John Doe'

// Populate with field selection - only get specific fields
const postWithAuthor = await Post.findById(postId)
  .populate('author', 'name email');  // only name and email (plus _id)

// Populate multiple fields at once
const postWithTags = await Post.findById(postId)
  .populate('author', 'name')
  .populate('tags', 'name color');

// Populate a path inside embedded documents
const postWithComments = await Post.findById(postId)
  .populate({
    path: 'comments.user',       // populate user in each comment
    select: 'name avatar'        // only these fields
  });

// Population with match
// Posts are still returned when the author fails the match;
// their author field is set to null instead
const posts = await Post.find()
  .populate({
    path: 'author',
    match: { isActive: true },   // only populate active authors
    select: 'name email -_id'   // exclude _id
  });

// Population with options
// sort orders the populated documents inside each post (here, its tags);
// it does not change the order of the posts themselves
const postsWithSortedTags = await Post.find()
  .populate({
    path: 'tags',
    options: { sort: { name: 1 } }
  });
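Conceptually, populate() collects the referenced ids, fetches the matching documents in a separate lookup, and swaps each id for its document. The sketch below imitates those steps on in-memory arrays. It is an illustration under simplified assumptions (string ids, a hypothetical `populateAuthor` helper), not Mongoose's actual implementation.

```javascript
// Hypothetical in-memory "collections"; ids are plain strings for brevity.
const users = [
  { _id: 'u1', name: 'John Doe', email: 'john@example.com' },
  { _id: 'u2', name: 'Jane Roe', email: 'jane@example.com' }
];
const posts = [
  { _id: 'p1', title: 'First', author: 'u1' },
  { _id: 'p2', title: 'Second', author: 'u2' },
  { _id: 'p3', title: 'Third', author: 'u1' },
  { _id: 'p4', title: 'Orphan', author: 'u9' } // author no longer exists
];

function populateAuthor(postDocs, userCollection, fields) {
  // Step 1: collect the distinct ids to look up.
  const ids = new Set(postDocs.map((p) => p.author));

  // Step 2: one lookup for all ids (stands in for a single $in query).
  const byId = new Map(
    userCollection.filter((u) => ids.has(u._id)).map((u) => [u._id, u])
  );

  // Step 3: swap each id for its document, keeping only the selected fields.
  return postDocs.map((p) => {
    const found = byId.get(p.author);
    const picked = found
      ? Object.fromEntries(fields.map((f) => [f, found[f]]))
      : null; // a reference with no matching document becomes null
    return { ...p, author: picked };
  });
}

const populated = populateAuthor(posts, users, ['name']);
console.log(populated[0].author.name); // prints "John Doe"
console.log(populated[3].author);      // prints "null"
```

Because the lookup is a second round trip, references cost more to read than embedded data, which is the trade-off behind the embed-or-reference decision.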

Virtual Population

Virtuals allow you to populate without storing references in both directions:

models/Author.js

const mongoose = require('mongoose');

const authorSchema = new mongoose.Schema({
  name: { type: String, required: true },
  email: { type: String, required: true }
}, {
  // Enable virtuals in JSON and Object output
  toJSON: { virtuals: true },
  toObject: { virtuals: true }
});

// Virtual field 'posts' - not stored in database
// Dynamically populated based on Post.author reference
authorSchema.virtual('posts', {
  ref: 'Post',              // Model to populate from
  localField: '_id',        // Field in Author
  foreignField: 'author'    // Field in Post that references Author
});

module.exports = mongoose.model('Author', authorSchema);

// Usage: Get author with all their posts
const author = await Author.findById(authorId)
  .populate('posts');  // posts is virtual, populated dynamically

console.log(author.posts);  // Array of all posts by this author
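The localField/foreignField pairing works like a reverse lookup: find every child whose foreignField equals the parent's localField. A minimal in-memory sketch of that matching follows; the `virtualPopulate` helper and its `as` option are hypothetical names used only for this illustration.

```javascript
const authors = [
  { _id: 'a1', name: 'Ann' },
  { _id: 'a2', name: 'Bo' },
  { _id: 'a3', name: 'Cy' } // has no posts
];
const allPosts = [
  { _id: 'p1', title: 'One', author: 'a1' },
  { _id: 'p2', title: 'Two', author: 'a1' },
  { _id: 'p3', title: 'Three', author: 'a2' }
];

function virtualPopulate(parents, children, { localField, foreignField, as }) {
  // Group the children by the value of their foreignField.
  const groups = new Map();
  for (const child of children) {
    const key = child[foreignField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(child);
  }

  // Attach the matching group to a copy of each parent.
  // The original parent objects are left unchanged, mirroring the fact
  // that a virtual is never written to the database.
  return parents.map((parent) => ({
    ...parent,
    [as]: groups.get(parent[localField]) || []
  }));
}

const withPosts = virtualPopulate(authors, allPosts, {
  localField: '_id',
  foreignField: 'author',
  as: 'posts'
});

console.log(withPosts[0].posts.map((p) => p.title).join(', '));
// prints "One, Two"
```

The author documents never hold a list of post ids, so adding or deleting a post requires no second write to keep the two sides in sync.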

Schema Validation

Mongoose provides powerful built-in and custom validation:

Comprehensive Validation

const productSchema = new mongoose.Schema({
  // Name: required with length constraints
  name: {
    type: String,
    required: [true, 'Product name is required'],
    minlength: [2, 'Name must be at least 2 characters'],
    maxlength: [100, 'Name cannot exceed 100 characters'],
    trim: true
  },

  // Price: required with minimum value
  price: {
    type: Number,
    required: [true, 'Price is required'],
    min: [0, 'Price cannot be negative']
  },

  // Sale price: custom validator to ensure it's less than price
  salePrice: {
    type: Number,
    validate: {
      // Validator function - 'this' refers to document
      validator: function(value) {
        // Only validate if salePrice is provided
        return value == null || value < this.price;
      },
      message: 'Sale price must be less than regular price'
    }
  },

  // Category: must be one of predefined values
  category: {
    type: String,
    enum: {
      values: ['electronics', 'clothing', 'books', 'home'],
      message: '{VALUE} is not a valid category'
    }
  },

  // Email: regex pattern validation
  contactEmail: {
    type: String,
    match: [
      /^\S+@\S+\.\S+$/,
      'Please provide a valid email address'
    ]
  },

  // SKU: custom async validator (check uniqueness)
  // Note: two concurrent saves can both pass this check, so it only gives
  // a friendly error message; a unique index is what enforces uniqueness
  sku: {
    type: String,
    validate: {
      validator: async function(value) {
        // Check if SKU already exists (excluding current doc)
        const count = await mongoose.model('Product')
          .countDocuments({ sku: value, _id: { $ne: this._id } });
        return count === 0;
      },
      message: 'SKU must be unique'
    }
  },

  // Stock: integer validation using custom validator
  stock: {
    type: Number,
    default: 0,
    validate: {
      validator: Number.isInteger,
      message: 'Stock must be a whole number'
    }
  }
});
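The validators in this schema are ordinary JavaScript checks, so you can try them outside Mongoose to see what they accept. The sketch below exercises the email pattern, the sale price rule, and the integer check on a few sample values. The standalone `isValidSalePrice` function is a rewrite for illustration that takes the price as an argument instead of reading `this.price`.

```javascript
// Same pattern as the contactEmail 'match' rule.
const emailPattern = /^\S+@\S+\.\S+$/;

console.log(emailPattern.test('john@example.com'));     // prints "true"
console.log(emailPattern.test('john@example'));         // prints "false" (no dot after @)
console.log(emailPattern.test('john doe@example.com')); // prints "false" (contains a space)
console.log(emailPattern.test('a@@b.c'));               // prints "true" (pattern is permissive)

// Same rule as the salePrice validator.
function isValidSalePrice(salePrice, price) {
  // A missing sale price is allowed; otherwise it must be strictly lower.
  return salePrice == null || salePrice < price;
}

console.log(isValidSalePrice(undefined, 20)); // prints "true"
console.log(isValidSalePrice(15, 20));        // prints "true"
console.log(isValidSalePrice(20, 20));        // prints "false"

// Same check as the stock validator.
console.log(Number.isInteger(5));   // prints "true"
console.log(Number.isInteger(5.5)); // prints "false"
```

The email pattern only confirms the rough shape of an address, so treat it as a typo filter rather than proof that the address exists.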

Middleware (Hooks)

Middleware functions run at specific points in the document lifecycle:

Schema Middleware

const userSchema = new mongoose.Schema({
  name: String,
  email: String,
  password: String,
  slug: String,
  // Soft-delete flag used by the find middleware below
  isDeleted: { type: Boolean, default: false },
  updatedAt: Date
});

// PRE middleware - runs BEFORE the operation
// 'save' hook runs before document.save()
userSchema.pre('save', function(next) {
  // 'this' refers to the document being saved
  // Generate slug from name (skipped when name is missing)
  if (this.name) {
    this.slug = this.name.toLowerCase().replace(/\s+/g, '-');
  }
  // Update timestamp
  this.updatedAt = new Date();
  // Call next() to continue to save
  next();
});

// PRE middleware for find queries
// Runs before any operation whose name starts with 'find'
// (find, findOne, findOneAndUpdate, ...)
userSchema.pre(/^find/, function(next) {
  // 'this' is the query object
  // Automatically exclude soft-deleted documents
  this.where({ isDeleted: { $ne: true } });
  next();
});

// POST middleware - runs AFTER the operation
userSchema.post('save', function(doc, next) {
  // 'doc' is the saved document
  console.log(`User ${doc.name} was saved`);
  next();
});

// PRE middleware for document deletion - cleanup related data
// Mongoose 7 removed document.remove(), so this hooks document.deleteOne()
// It runs for user.deleteOne(), not for User.deleteOne() or findByIdAndDelete()
userSchema.pre('deleteOne', { document: true, query: false }, async function() {
  // Delete all posts by this user when user is deleted
  // An async hook continues when its promise resolves, so next() is not needed
  await mongoose.model('Post').deleteMany({ author: this._id });
});
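The slug logic inside the pre-save hook is plain string manipulation, so it can be tested on its own before wiring it into a schema. Below, `simpleSlug` reproduces the one-line approach from the hook, and `slugify` is a hypothetical, slightly more defensive variant. It is a sketch rather than a complete solution: it drops non-ASCII letters and does not guarantee unique slugs.

```javascript
// The one-line approach used in the 'save' hook.
function simpleSlug(name) {
  return name.toLowerCase().replace(/\s+/g, '-');
}

// A more defensive variant (hypothetical helper, not part of Mongoose).
function slugify(text) {
  return String(text)
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // collapse every run of other characters into one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

console.log(simpleSlug('John Doe'));        // prints "john-doe"
console.log(simpleSlug(' Hello, World! ')); // prints "-hello,-world!-"
console.log(slugify(' Hello, World! '));    // prints "hello-world"
```

Inside a hook you would call the helper the same way, for example `this.slug = slugify(this.title)` in a post schema's pre-save middleware.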

Video Resources

MongoDB Schema Design Best Practices (MongoDB)

Mongoose Population Explained (Academind)

Knowledge Check

1. When should you embed documents rather than use references?

A. When you have many-to-many relationships

B. When related data changes frequently

C. When data is frequently accessed together and rarely changes

D. When documents might exceed 16 MB

Answer: C. Embedding is ideal when related data is accessed together (reducing queries) and doesn't change often (avoiding update complexity). Reference when data changes often or has many-to-many relationships.

2. What does the populate() method do?

A. Creates new documents in referenced collections

B. Replaces ObjectId references with actual documents

C. Embeds documents directly into the schema

D. Validates referenced documents exist

Answer: B. populate() performs a separate query to fetch referenced documents and replaces ObjectId fields with the full document data, essentially joining data from different collections.

3. When does a 'pre save' middleware run?

A. Before the document is saved to the database

B. After the document is saved to the database

C. When the schema is defined

D. When the model is created

Answer: A. Pre middleware runs before the specified operation. 'pre save' runs before document.save(), so it is commonly used for data transformation, hashing passwords, or generating slugs.

Practice Project

Build a Blog Data Model

Design a complete blog system with proper relationships:

Author model with name, email, bio, and virtual 'posts' field
Post model with title, content, author (ref), tags (refs), and embedded comments
Tag model with name and color
Add pre-save middleware to generate slugs from titles
Create an API endpoint that returns posts with populated author and tags
