Lecture 2: Introduction to MongoDB and the document model

Learning objectives

By the end of this lecture, students should be able to:

  • Explain the data structure of a document model and its key features

  • Explain how data are stored in BSON format

  • Define a collection in a document model

Supplemental materials

MongoDB stores data in a document-oriented model, which is quite different from the table-based model of traditional SQL databases.

Data Storage in MongoDB

  1. Documents and Collections:

    • Documents: In MongoDB, data is stored in documents, which are similar to JSON objects. Each document consists of key-value pairs, where the keys are strings, and the values can be various data types, including strings, numbers, arrays, and even nested documents.

    • Collections: Documents are grouped into collections, which are analogous to tables in SQL databases. Unlike SQL tables, collections do not enforce a fixed schema by default, so each document can have a different structure.

    Example:

    {
       "_id": ObjectId("507f1f77bcf86cd799439011"),
       "name": "John Doe",
       "age": 29,
       "address": {
          "street": "123 Main St",
          "city": "New York",
          "zip": "10001"
       },
       "hobbies": ["reading", "travelling"]
    }
    

    In this document, the address field is a nested document, and the hobbies field is an array.

  2. Flexibility and Schema:

    • MongoDB: A collection does not require a predefined schema. Different documents in the same collection can have different fields, which provides flexibility in handling diverse data.

    • SQL: In contrast, every row in a SQL table must conform to the table's predefined schema, so all rows have the same columns.

    Example (a second document in the same collection as the John Doe document above; it has an email field but no age, address, or hobbies):

    {
       "_id": ObjectId("507f191e810c19729de860ea"),
       "name": "Jane Smith",
       "email": "jane.smith@example.com"
    }
    
  3. Data Types and Relationships:

    • Embedded Documents: MongoDB allows embedding documents within other documents, which is a way to represent one-to-many relationships within a single document.

    • SQL: In SQL, relationships between tables (e.g., one-to-many) are typically handled using foreign keys and JOIN operations.

    Example (One-to-many relationship in MongoDB):

    {
       "_id": ObjectId("5f1f77bcf86cd799439011a0"),
       "name": "Alice",
       "orders": [
          {
             "order_id": 1,
             "product": "Laptop",
             "price": 1200
          },
          {
             "order_id": 2,
             "product": "Phone",
             "price": 800
          }
       ]
    }
    

    Comparison:

    • In MongoDB, related data (e.g., orders) can be embedded within a document, so a single read returns a customer together with their orders and no JOIN is needed. This often makes retrieval faster.

    • In SQL, this would typically require two tables (Customers and Orders) and a JOIN operation to retrieve related data.
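
The contrast between embedding and joining can be sketched in plain Python, with dictionaries standing in for documents or rows and lists standing in for collections or tables. This is only an illustration under that assumption, not MongoDB or SQL code; the names `customers_table`, `orders_table`, `customer_id`, and `orders_for_customer_sql_style` are hypothetical.

```python
# Plain-Python illustration: dicts stand in for documents/rows,
# lists stand in for collections/tables. No database is involved.

# Document model: the orders are embedded in the customer document.
customer_doc = {
    "name": "Alice",
    "orders": [
        {"order_id": 1, "product": "Laptop", "price": 1200},
        {"order_id": 2, "product": "Phone", "price": 800},
    ],
}

# A collection may hold documents with different fields.
customers_collection = [
    customer_doc,
    {"name": "Jane Smith", "email": "jane.smith@example.com"},
]

# Relational model: two "tables" linked by a foreign key (customer_id).
customers_table = [{"customer_id": 1, "name": "Alice"}]
orders_table = [
    {"order_id": 1, "customer_id": 1, "product": "Laptop", "price": 1200},
    {"order_id": 2, "customer_id": 1, "product": "Phone", "price": 800},
]


def orders_for_customer_sql_style(name):
    """Emulate a JOIN: match rows of both tables on customer_id."""
    return [
        order
        for customer in customers_table
        for order in orders_table
        if customer["name"] == name
        and customer["customer_id"] == order["customer_id"]
    ]


# Embedded: one lookup returns the customer and all of their orders.
embedded_total = sum(o["price"] for o in customer_doc["orders"])

# Joined: the same answer needs both tables.
joined_total = sum(o["price"] for o in orders_for_customer_sql_style("Alice"))

# Documents without an "orders" field are still valid members.
names_with_orders = [d["name"] for d in customers_collection if "orders" in d]

print(embedded_total, joined_total, names_with_orders)
```

Both approaches give the same total; the difference is that the embedded form needs one lookup, while the relational form has to combine two tables.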

JSON (JavaScript Object Notation)

  • Text-Based Format: JSON is a lightweight, text-based data interchange format. It’s human-readable and easy to understand, making it ideal for data exchange in web applications.

  • Data Types: JSON supports a limited set of data types, including:

    • Strings

    • Numbers

    • Booleans (true/false)

    • Arrays

    • Objects (key-value pairs)

    • Null

  • Example:

    {
       "name": "John Doe",
       "age": 29,
       "isStudent": false,
       "courses": ["Math", "Science"],
       "address": {
          "street": "123 Main St",
          "city": "New York"
       }
    }
    
  • Usage: JSON is commonly used for data exchange between a client and a server, particularly in web APIs and configurations. It’s supported by virtually all programming languages.
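
As a small sketch of how these types behave in practice, the example document can be parsed with Python's standard `json` module. The `enrollmentDate` value is an illustrative assumption; it is there only to show that JSON has no date type.

```python
import json
from datetime import datetime, timezone

text = """
{
   "name": "John Doe",
   "age": 29,
   "isStudent": false,
   "courses": ["Math", "Science"],
   "address": {"street": "123 Main St", "city": "New York"}
}
"""

person = json.loads(text)

# Each JSON type maps onto a Python type (string -> str, object -> dict, ...).
json_types = {key: type(value).__name__ for key, value in person.items()}
print(json_types)

# JSON has no date type, so serializing a datetime directly fails ...
enrolled = datetime(2024, 9, 2, tzinfo=timezone.utc)
try:
    json.dumps({"enrollmentDate": enrolled})
    date_supported = True
except TypeError:
    date_supported = False

# ... unless the date is first converted to a string.
as_text = json.dumps({"enrollmentDate": enrolled.isoformat()})
print(date_supported, as_text)
```

Once the date has been turned into a string, a reader of the JSON can no longer tell it apart from any other string without extra conventions.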

BSON (Binary JSON)

  • Binary Format: BSON is a binary-encoded serialization of JSON-like documents. It keeps JSON's structure of nested key-value pairs and arrays, but is designed so that databases like MongoDB can store and traverse documents efficiently.

  • Extended Data Types: BSON supports all the JSON data types but also includes additional data types that JSON does not natively support, such as:

    • ObjectId: A unique identifier used by MongoDB.

    • Date: Represents dates and times.

    • Binary Data: Allows storage of binary data, such as images and files.

    • 32-bit/64-bit integers: Specific integer types for efficient storage.

    • Min/Max Keys: Special types that compare lower (MinKey) or higher (MaxKey) than every other BSON value. MongoDB mainly uses them internally.

  • Example: BSON itself is binary and not directly readable, so the document below is written in MongoDB shell notation. It extends the JSON example with an ObjectId and a Date:

    {
       "_id": ObjectId("507f1f77bcf86cd799439011"),
       "name": "John Doe",
       "age": 29,
       "enrollmentDate": ISODate("2024-09-02T00:00:00Z"),
       "isStudent": false,
       "courses": ["Math", "Science"],
       "address": {
          "street": "123 Main St",
          "city": "New York"
       }
    }
    
    • ObjectId: The _id field is an ObjectId, a unique identifier used by MongoDB.

    • Date: The enrollmentDate field is a BSON Date, written here with the shell's ISODate helper. It is stored as a 64-bit integer counting milliseconds since the Unix epoch, rather than as a string.

  • Usage in MongoDB: BSON is used by MongoDB for storage and for communication between drivers and the server because it’s more efficient to parse and traverse at the database level than JSON text. Lengths are stored up front, so a reader can skip over fields without scanning them. It also allows MongoDB to support richer data types.
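
To make the binary layout concrete, here is a minimal hand-written encoder for a small subset of BSON, following the public BSON specification. It is a teaching sketch, not how MongoDB or its drivers are implemented; in practice a driver library does this work. The function names `encode_element` and `encode_document` are hypothetical, and only booleans, integers, strings, dates, and nested documents are covered.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_element(name, value):
    """Encode one key-value pair: type byte, key as a C string, then the value."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # 32-bit integer
        return b"\x12" + key + struct.pack("<q", value)      # 64-bit integer
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Strings carry their byte length (including the trailing zero) up front.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, datetime):
        # Expects a timezone-aware datetime; stored as milliseconds since the epoch.
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return b"\x09" + key + struct.pack("<q", millis)
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def encode_document(doc):
    """A document is: int32 total length, the elements, then a zero byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


hello = encode_document({"hello": "world"})
print(hello)
print(len(hello), len(json.dumps({"hello": "world"})))
```

For `{"hello": "world"}` the encoding is 22 bytes, while the JSON text is 18 characters. The length prefixes and type bytes cost space on small documents; what they buy is that a reader can skip a field without parsing it.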

Key differences between JSON & BSON

| Feature | JSON | BSON | Example |
| --- | --- | --- | --- |
| Data Format | Text-based, human-readable format. | Binary-encoded format, efficient for storage and transport. | - |
| Supported Data Types | Limited to basic types: string, number, object, array, boolean, null. | Supports all JSON types and additional types like ObjectId, Date, Binary, 32-bit/64-bit integers, Min/Max Keys. | - |
| ObjectId | Not natively supported. | Supported as a 12-byte unique identifier. | JSON: N/A; BSON: ObjectId("64f68a4f8a0b8b7f5434e001") |
| Date | Not natively supported, usually represented as a string. | Native support for dates with millisecond precision. | JSON: "1990-05-20T00:00:00Z"; BSON: ISODate("1990-05-20T00:00:00Z") |
| Binary Data | Not supported, requires encoding (e.g., Base64). | Native support for binary data, stored as BinData. | JSON: "Qk0oAQAAAAAAADYAAAAoAAAAEAAA..." (Base64-encoded); BSON: BinData(0, "Qk0oAQAAAAAAADYAAAAoAAAAEAAA...") |
| Integers (32-bit/64-bit) | Only a generic number type, which many parsers read as a floating-point value. | Supports both 32-bit and 64-bit integers for efficient storage. | JSON: 1234567890; BSON: NumberLong(1234567890) (64-bit) |
| Min/Max Keys | Not supported. | Special types that compare lower or higher than all other values, used internally by MongoDB. | JSON: N/A; BSON: MinKey() / MaxKey() |
| Data Size | Text-based, so numbers and binary data are verbose. | Compact for numeric and binary values, but stored lengths and type tags can make small or string-heavy documents larger than the JSON text. | - |
| Readability | Human-readable, easy to read and write. | Not human-readable, designed for efficient machine processing. | - |
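
A few rows of the comparison can be checked with the Python standard library alone. The sketch below assumes the documented ObjectId layout, in which the first 4 of the 12 bytes hold a big-endian creation timestamp in seconds. The helper name `objectid_timestamp` and the 30-byte `payload` are hypothetical.

```python
import base64
from datetime import datetime, timezone


def objectid_timestamp(hex_string):
    """Return the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# ObjectId row: the identifier used in the earlier examples.
created = objectid_timestamp("507f1f77bcf86cd799439011")
print(created.isoformat())

# Binary Data row: JSON text has to carry bytes as Base64,
# which turns every 3 bytes into 4 characters.
payload = bytes(range(30))
encoded = base64.b64encode(payload).decode("ascii")
print(len(payload), len(encoded))

# Integers row: a 64-bit float cannot represent every 64-bit integer exactly,
# so a parser that reads all JSON numbers as floats can silently change them.
big = 2**53 + 1
survives_as_double = float(big) == big
print(survives_as_double)
```

A native 64-bit integer type avoids the last problem, and a native binary type avoids the Base64 overhead.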

Summary

BSON is specifically designed for use in databases like MongoDB, where performance and storage efficiency are critical. JSON, while more universally readable and widely used in web applications, lacks the extended data types and binary efficiency of BSON. MongoDB’s use of BSON allows it to efficiently store and query complex data structures that go beyond what JSON can offer.