Worksheet 3: CRUD operations in MongoDB#

Welcome! In this week, we will practice some basic CRUD (Create, Replace, Update, Delete) operations in MongoDB using Pymongo.

Establish Pymongo connection#

Recall that in worksheet1, we have created a cluster on MongoDB Atlas and connect to it via Pymongo and a credentials JSON file. Let’s reestablish that connection for our exercises.

  • Make sure you use the adsc_3610 conda environment.

  • You might need to copy & paste the credentials_mongodb.json file that you used in worksheet1 to the working directory of worksheet3.

from pymongo import MongoClient # import mongo client to connect
import json # import json to load credentials
import urllib.parse

# load credentials from json file
with open('credentials_mongodb.json') as f:
    login = json.load(f)

# assign credentials to variables
username = login['username']
password = urllib.parse.quote(login['password'])
host = login['host']
url = "mongodb+srv://{}:{}@{}/?retryWrites=true&w=majority".format(username, password, host)
# connect to the database
client = MongoClient(url)
# drop database books and students if they exist
client.drop_database('bookstore')
client.drop_database('students')

To test if your connection has been succesful, let’s try to print out all the databases

# list all databases
client.list_database_names()
['library',
 'sample_airbnb',
 'sample_analytics',
 'sample_geospatial',
 'sample_guides',
 'sample_mflix',
 'sample_restaurants',
 'sample_supplies',
 'sample_training',
 'sample_weatherdata',
 'school',
 'admin',
 'local']

MongoDB VScode extension (optional)#

If you are using VScode, there is a MongoDB extension which provides a handy view of the databases (instead of having to view it on MongoDB Atlas via a browser).

To install the extension, navigate to the extension bar in VS code and search for “mongodb”

After installing the extension, you can now connect to your MongoDB Atlas cluster via the connection string.

To find your connection string, open a browser and navigate to MongoDB Atlas.

  • Under your cluster, click connect

  • Select MongoDB for VScode

  • Copy that URL and paste it in VScode

Once connected, you should now be able to see an overview of your databases. Now you can manage your MongoDB and do programming all within VSCode.

Now that we have establish the connection to our MongoDB cluster, let’s get started with the exercises:

Exercise 1: Import data into MongoDB#

{rubric: accuracy = 5}

Let’s first create a new database called school and store it in an object named db. We then create a new collection called students

# Step 1: Connect to your MongoDB cluster
client = MongoClient(url)

# Step 2: Create a new database named 'school' and store it in a variable named db
db = client.school

# Step 3: Create a collection named 'students' and insert sample documents
students = db.students

Let’s insert a sample student document to our students collection

# Sample document to insert
sample_students = [{"name": "Alice", "age": 20, "major": "Computer Science"},
            {"name": "Bob", "age": 21, "major": "Mathematics"},
            {"name": "Charlie", "age": 22, "major": "Engineering"}]

# Insert the sample document
students.insert_many(sample_students)
InsertManyResult([ObjectId('66f1c50f2a1f550705a4f5d3'), ObjectId('66f1c50f2a1f550705a4f5d4'), ObjectId('66f1c50f2a1f550705a4f5d5')], acknowledged=True)

You should see that the document has been inserted into our students collection, and it was assigned an ObjectId.

Note: Every document in MongoDB must have an ObjectId. If it doesn’t exist, then MongoDB will create a new one automatically

Now we can print out the documents that we just inserted

# Find the document that was just inserted
# The find_one() method returns the first occurrence in the selection.
students.find_one()
{'_id': ObjectId('66e8885dc48457d8d5b1262c'),
 'name': 'Alice',
 'age': 20,
 'major': 'Computer Science',
 'gpa': None}
# Find the document by a specific attribute
students.find_one({"name": "Bob"})
{'_id': ObjectId('66e8885dc48457d8d5b1262d'),
 'name': 'Bob',
 'age': 21,
 'major': 'Mathematics',
 'gpa': None}
# Find all documents in the collection
# The find() method returns all occurrences in the selection.
for student in students.find():
    print(student)
{'_id': ObjectId('66e8885dc48457d8d5b1262c'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8885dc48457d8d5b1262d'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12631'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12632'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12636'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12655'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12656'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265d'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265e'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b12662'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b7'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b8'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5bf'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c0'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c4'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d3'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science'}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d4'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics'}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d5'), 'name': 'Charlie', 'age': 22, 'major': 'Engineering'}

Now if you navigate to your databases in MongoDB Atlas, or using the VSCode MongoDB extension, you should see there’s a new database called school. Within that database, there should be a collection named students, which contains one document about Alice

# Try to find all documents where age is greater than 20
for student in students.find({"age": {"$gt": 20}}):
    print(student)
{'_id': ObjectId('66e8885dc48457d8d5b1262d'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12632'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12636'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12656'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265e'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b12662'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b8'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c0'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c4'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d4'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics'}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d5'), 'name': 'Charlie', 'age': 22, 'major': 'Engineering'}

YOUR TURN#

1.1#

  • Create a new database called bookstore, stored it in a variable called bookstore

  • Create a collection called books, stored it in a variable called books

# Create a new database called `bookstore`, stored it in a variable called `bookstore`
# YOUR CODE HERE
bookstore = client.bookstore # SOLUTION

# Create a collection called `books` 
# YOUR CODE HERE
books = bookstore.books # SOLUTION

Insert the following sample documents in sample_books into the books collection

sample_books = [
    {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald", "genre": "Fiction", "price": 10.99},
    {"title": "To Kill a Mockingbird", "author": "Harper Lee", "genre": "Fiction", "price": 8.99},
    {"title": "A Brief History of Time", "author": "Stephen Hawking", "genre": "Non-Fiction", "price": 15.99}
]

# YOUR CODE HERE

# BEGIN SOLUTION
# Insert the sample_books into the books collection
books.insert_many(sample_books)
# END SOLUTION
InsertManyResult([ObjectId('66f1c50f2a1f550705a4f5d6'), ObjectId('66f1c50f2a1f550705a4f5d7'), ObjectId('66f1c50f2a1f550705a4f5d8')], acknowledged=True)

Print out a book with author as Stephen Hawking

# Print out a book with author as Stephen Hawking
# YOUR CODE HERE
books.find_one({"author": "Stephen Hawking"}) # SOLUTION
{'_id': ObjectId('66f1c50f2a1f550705a4f5d8'),
 'title': 'A Brief History of Time',
 'author': 'Stephen Hawking',
 'genre': 'Non-Fiction',
 'price': 15.99}
# Print out all books with price less than $10
# YOUR CODE HERE
for book in books.find({"price": {"$lt": 10}}): # SOLUTION
    print(book) # SOLUTION
{'_id': ObjectId('66f1c50f2a1f550705a4f5d7'), 'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'genre': 'Fiction', 'price': 8.99}

Run the test below to see if you have done it correctly

# assert bookstore and books exist
assert 'bookstore' in client.list_database_names(), "bookstore database does not exist"
assert 'books' in bookstore.list_collection_names(), "books collection does not exist"
# assert sample_books were inserted
assert books.find_one({"author": "Stephen Hawking"}) is not None, "sample_books were not inserted"
# assert "title": "To Kill a Mockingbird" is in the collection
assert books.find_one({"title": "To Kill a Mockingbird"}) is not None,  "To Kill a Mockingbird is not in the collection"

1.2 Import from JSON#

Imagine we have a dataset called students.json and we would like to import it into the students collection in our MongoDB database.

import json

# Step 1: Load JSON data
with open('students.json', 'r') as file:
    data = json.load(file)
# view what the data looks like (first 5 records)
data[:5]
[{'name': 'Alice Johnson',
  'age': 22,
  'major': 'Computer Science',
  'email': 'alice.johnson@example.com'},
 {'name': 'Bob Smith',
  'age': 24,
  'major': 'Mathematics',
  'email': 'bob.smith@example.com'},
 {'name': 'Charlie Brown',
  'age': 20,
  'major': 'Physics',
  'email': 'charlie.brown@example.com'},
 {'name': 'Diana Prince',
  'age': 21,
  'major': 'History',
  'email': 'diana.prince@example.com'},
 {'name': 'Eve Adams',
  'age': 23,
  'major': 'Biology',
  'email': 'eve.adams@example.com'}]
# Step 2: Insert JSON data into a collection
students.insert_many(data)  # For a list of documents
InsertManyResult([ObjectId('66f1c5102a1f550705a4f5d9'), ObjectId('66f1c5102a1f550705a4f5da'), ObjectId('66f1c5102a1f550705a4f5db'), ObjectId('66f1c5102a1f550705a4f5dc'), ObjectId('66f1c5102a1f550705a4f5dd'), ObjectId('66f1c5102a1f550705a4f5de'), ObjectId('66f1c5102a1f550705a4f5df'), ObjectId('66f1c5102a1f550705a4f5e0'), ObjectId('66f1c5102a1f550705a4f5e1'), ObjectId('66f1c5102a1f550705a4f5e2')], acknowledged=True)
# Find all documents in the collection
# The find() method returns all occurrences in the selection.
for student in students.find():
    print(student)
{'_id': ObjectId('66e8885dc48457d8d5b1262c'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8885dc48457d8d5b1262d'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12631'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12632'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12636'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12655'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12656'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265d'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265e'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b12662'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b7'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b8'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5bf'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c0'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c4'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d3'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science'}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d4'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics'}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d5'), 'name': 'Charlie', 'age': 22, 'major': 'Engineering'}
{'_id': ObjectId('66f1c5102a1f550705a4f5d9'), 'name': 'Alice Johnson', 'age': 22, 'major': 'Computer Science', 'email': 'alice.johnson@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5da'), 'name': 'Bob Smith', 'age': 24, 'major': 'Mathematics', 'email': 'bob.smith@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5db'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5dc'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5dd'), 'name': 'Eve Adams', 'age': 23, 'major': 'Biology', 'email': 'eve.adams@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5de'), 'name': 'Frank Wright', 'age': 22, 'major': 'Chemistry', 'email': 'frank.wright@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5df'), 'name': 'Grace Hopper', 'age': 24, 'major': 'Computer Engineering', 'email': 'grace.hopper@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e0'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e1'), 'name': 'Ivy League', 'age': 23, 'major': 'Literature', 'email': 'ivy.league@example.com'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e2'), 'name': 'Jack Daniels', 'age': 22, 'major': 'Business', 'email': 'jack.daniels@example.com'}

Now it’s your turn, import the books.json into the books collection in the bookstore database in your MongoDB.

# BEGIN SOLUTION
with open('books.json', 'r') as file:
    data = json.load(file)

books.insert_many(data)  # For a list of documents

# END SOLUTION
InsertManyResult([ObjectId('66f1c5102a1f550705a4f5e3'), ObjectId('66f1c5102a1f550705a4f5e4'), ObjectId('66f1c5102a1f550705a4f5e5'), ObjectId('66f1c5102a1f550705a4f5e6'), ObjectId('66f1c5102a1f550705a4f5e7'), ObjectId('66f1c5102a1f550705a4f5e8'), ObjectId('66f1c5102a1f550705a4f5e9'), ObjectId('66f1c5102a1f550705a4f5ea'), ObjectId('66f1c5102a1f550705a4f5eb'), ObjectId('66f1c5102a1f550705a4f5ec')], acknowledged=True)
assert books.find_one({"title": "Pride and Prejudice"}) is not None, "Pride and Prejudice is not in the collection"
assert books.find_one({"isbn": "978-0-06-112008-4"}) is not None, "ISBN 978-0-06-112008-4 is not in the collection"

Exercise 2: Insert & update new field#

Let’s say we want to add a field called gpa to the students collection. Here’s how

# Step 1: Define the new field and its value
new_field = {"gpa": None}

# Step 2: Update all documents to include the new field
students.update_many({}, {"$set": new_field})
UpdateResult({'n': 28, 'electionId': ObjectId('7fffffff00000000000000cb'), 'opTime': {'ts': Timestamp(1727120656, 42), 't': 203}, 'nModified': 13, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1727120656, 42), 'signature': {'hash': b'=\x98\x93T\x06\xbf\xb3|6^H\xbf\x1c\xf6\x97w\x0c\x17k\xa8', 'keyId': 7381891571605569538}}, 'operationTime': Timestamp(1727120656, 42), 'updatedExisting': True}, acknowledged=True)
# check if the new field was added
for student in students.find():
    print(student)
{'_id': ObjectId('66e8885dc48457d8d5b1262c'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8885dc48457d8d5b1262d'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12631'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12632'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12636'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12655'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12656'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265d'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265e'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b12662'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b7'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b8'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5bf'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c0'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c4'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d3'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d4'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d5'), 'name': 'Charlie', 'age': 22, 'major': 'Engineering', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5d9'), 'name': 'Alice Johnson', 'age': 22, 'major': 'Computer Science', 'email': 'alice.johnson@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5da'), 'name': 'Bob Smith', 'age': 24, 'major': 'Mathematics', 'email': 'bob.smith@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5db'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5dc'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5dd'), 'name': 'Eve Adams', 'age': 23, 'major': 'Biology', 'email': 'eve.adams@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5de'), 'name': 'Frank Wright', 'age': 22, 'major': 'Chemistry', 'email': 'frank.wright@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5df'), 'name': 'Grace Hopper', 'age': 24, 'major': 'Computer Engineering', 'email': 'grace.hopper@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5e0'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5e1'), 'name': 'Ivy League', 'age': 23, 'major': 'Literature', 'email': 'ivy.league@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5e2'), 'name': 'Jack Daniels', 'age': 22, 'major': 'Business', 'email': 'jack.daniels@example.com', 'gpa': None}

Let’s try to update the gpa of Jack Daniels to 3.4

# Step 2: Define the filter and the new value
filter = {"name": "Jack Daniels"}
new_value = {"$set": {"gpa": 3.4}}

# Step 3: Update the specific document
students.update_one(filter, new_value)
UpdateResult({'n': 1, 'electionId': ObjectId('7fffffff00000000000000cb'), 'opTime': {'ts': Timestamp(1727120656, 44), 't': 203}, 'nModified': 1, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1727120656, 44), 'signature': {'hash': b'=\x98\x93T\x06\xbf\xb3|6^H\xbf\x1c\xf6\x97w\x0c\x17k\xa8', 'keyId': 7381891571605569538}}, 'operationTime': Timestamp(1727120656, 44), 'updatedExisting': True}, acknowledged=True)
# check Jack Daniels GPA
students.find_one({"name": "Jack Daniels"})
{'_id': ObjectId('66f1c5102a1f550705a4f5e2'),
 'name': 'Jack Daniels',
 'age': 22,
 'major': 'Business',
 'email': 'jack.daniels@example.com',
 'gpa': 3.4}

Now it’s your turn.

  • Insert a field called "publisher" into the books collection with a default value of "Unknown".

  • Update the publisher of the book with the author name is “Ray Bradbury”, to “Penguin Books”

# BEGIN SOLUTION
books.update_many({}, {"$set": {"publisher": "Unknown"}})
books.update_one({"author": "Ray Bradbury"}, {"$set": {"publisher": "Penguin Books"}})
# END SOLUTION
UpdateResult({'n': 1, 'electionId': ObjectId('7fffffff00000000000000cb'), 'opTime': {'ts': Timestamp(1727120656, 58), 't': 203}, 'nModified': 1, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1727120656, 58), 'signature': {'hash': b'=\x98\x93T\x06\xbf\xb3|6^H\xbf\x1c\xf6\x97w\x0c\x17k\xa8', 'keyId': 7381891571605569538}}, 'operationTime': Timestamp(1727120656, 58), 'updatedExisting': True}, acknowledged=True)
# print out the document by Ray Bradbury
# check if the publisher field was added
assert books.find_one({"author": "Ray Bradbury"})["publisher"] == "Penguin Books", "publisher field was not added"

Exercise 3: Replace#

Let’s print out our student Jack Daniels

students.find_one({"name": "Jack Daniels"})
{'_id': ObjectId('66f1c5102a1f550705a4f5e2'),
 'name': 'Jack Daniels',
 'age': 22,
 'major': 'Business',
 'email': 'jack.daniels@example.com',
 'gpa': 3.4}

Let’s say we want to replace the entire document by a new one. Let’s called this new_student

new_student = {
    "name": "John Doe", 
    "age": 25, 
    "major": "History", 
    "email": "john_doe@gmail.com",
    "gpa": 3.1
}

students.replace_one({"name": "Jack Daniels"}, new_student)

# check if Jack Daniels was replaced
students.find_one({"name": "Jack Daniels"})
students.find_one({"name": "John Doe"})
{'_id': ObjectId('66f1c5102a1f550705a4f5e2'),
 'name': 'John Doe',
 'age': 25,
 'major': 'History',
 'email': 'john_doe@gmail.com',
 'gpa': 3.1}

Now it’s your turn.

Replace the book by Stephen Hawking with the new_book

new_book = {
    "title": "The Universe in a Nutshell",
    "author": "Stephen Hawking",
    "genre": "Non-Fiction",
    "price": 12.99,
    "publisher": "Bantam Books"
}

# YOUR CODE HERE
books.replace_one({"author": "Stephen Hawking"}, new_book) # SOLUTION
UpdateResult({'n': 1, 'electionId': ObjectId('7fffffff00000000000000cb'), 'opTime': {'ts': Timestamp(1727120657, 2), 't': 203}, 'nModified': 1, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1727120657, 2), 'signature': {'hash': b'[\x84\x00\xdc\xb3\x1e\xd6\x8b!?\x01\x08\xdfQ\xe9\x195\xdd\x16\xad', 'keyId': 7381891571605569538}}, 'operationTime': Timestamp(1727120657, 2), 'updatedExisting': True}, acknowledged=True)
# print out the document by Stephen Hawking
books.find_one({"author": "Stephen Hawking"}) # SOLUTION
{'_id': ObjectId('66f1c50f2a1f550705a4f5d8'),
 'title': 'The Universe in a Nutshell',
 'author': 'Stephen Hawking',
 'genre': 'Non-Fiction',
 'price': 12.99,
 'publisher': 'Bantam Books'}
assert books.find_one({"title": "The Universe in a Nutshell"}) is not None, "The Universe in a Nutshell is not in the collection"

Exercise 4: Delete#

Let’s say I want to delete all documents in the students collection where their age is above 21

# Step 1: Define the filter to find the documents to be deleted
filter = {"age": {"$gt": 21}}

# Step 2: Delete the documents
result = students.delete_many(filter)

# Step 3: Print the number of documents deleted
print(result.deleted_count, " documents deleted.")

# check if the documents were deleted
for student in students.find():
    print(student)
8  documents deleted.
{'_id': ObjectId('66e8885dc48457d8d5b1262c'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8885dc48457d8d5b1262d'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12631'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12632'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e88869c48457d8d5b12636'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12655'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66e8970fc48457d8d5b12656'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265d'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b1265e'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66e89710c48457d8d5b12662'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b7'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c4df2a1f550705a4f5b8'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5bf'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c0'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c4e02a1f550705a4f5c4'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d3'), 'name': 'Alice', 'age': 20, 'major': 'Computer Science', 'gpa': None}
{'_id': ObjectId('66f1c50f2a1f550705a4f5d4'), 'name': 'Bob', 'age': 21, 'major': 'Mathematics', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5db'), 'name': 'Charlie Brown', 'age': 20, 'major': 'Physics', 'email': 'charlie.brown@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5dc'), 'name': 'Diana Prince', 'age': 21, 'major': 'History', 'email': 'diana.prince@example.com', 'gpa': None}
{'_id': ObjectId('66f1c5102a1f550705a4f5e0'), 'name': 'Hank Pym', 'age': 21, 'major': 'Biochemistry', 'email': 'hank.pym@example.com', 'gpa': None}

Now it’s your turn.

Delete all the documents in the books collection where the title start with letter “T”

Hint: You can use regex expression in the filter. For example, if I want all students start with letter “M”, I would use

filter = {"name": {"$regex": "^M"}}

# Delete all the documents in the `books` collection where the `title` start with letter "T"
# YOUR CODE HERE
# BEGIN SOLUTION
filter = {"title": {"$regex": "^T"}} 
result = books.delete_many(filter) 
# END SOLUTION
# Print the number of documents deleted
print(result.deleted_count, " documents deleted.") # SOLUTION
7  documents deleted.
# check if the documents were deleted
# find all books that start with letter "T"
for book in books.find({"title": {"$regex": "^T"}}): # SOLUTION
    print(book) # SOLUTION
# print all the documents in the collection
for book in books.find():   # SOLUTION
    print(book) # SOLUTION
{'_id': ObjectId('66f1c5102a1f550705a4f5e4'), 'title': '1984', 'author': 'George Orwell', 'published_year': 1949, 'isbn': '978-0-452-28423-4', 'price': 8.99, 'publisher': 'Unknown'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e5'), 'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'published_year': 1813, 'isbn': '978-0-19-953556-9', 'price': 12.99, 'publisher': 'Unknown'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e7'), 'title': 'Moby-Dick', 'author': 'Herman Melville', 'published_year': 1851, 'isbn': '978-0-14-243724-7', 'price': 11.99, 'publisher': 'Unknown'}
{'_id': ObjectId('66f1c5102a1f550705a4f5e8'), 'title': 'War and Peace', 'author': 'Leo Tolstoy', 'published_year': 1869, 'isbn': '978-0-14-044793-4', 'price': 14.99, 'publisher': 'Unknown'}
{'_id': ObjectId('66f1c5102a1f550705a4f5eb'), 'title': 'Fahrenheit 451', 'author': 'Ray Bradbury', 'published_year': 1953, 'isbn': '978-0-7432-4722-2', 'price': 9.49, 'publisher': 'Penguin Books'}
{'_id': ObjectId('66f1c5102a1f550705a4f5ec'), 'title': 'Jane Eyre', 'author': 'Charlotte Brontë', 'published_year': 1847, 'isbn': '978-0-14-243720-9', 'price': 8.49, 'publisher': 'Unknown'}
# assert there are no documents that start with letter "T"
assert books.find_one({"title": {"$regex": "^T"}}) is None, "There are documents that start with letter T"

Exercise 5: Visualize#

Let’s say that we are retriving students data and we want to convert the results to pandas DataFrame

import pandas as pd
# Step 2: Query all documents in the collection
cursor = students.find()

# Step 3: Convert the cursor to a list and then to a DataFrame
students_data = list(cursor)
df = pd.DataFrame(students_data)

# Print the DataFrame to verify
df.head()
_id name age major gpa email
0 66e8885dc48457d8d5b1262c Alice 20 Computer Science None NaN
1 66e8885dc48457d8d5b1262d Bob 21 Mathematics None NaN
2 66e88869c48457d8d5b12631 Charlie Brown 20 Physics None charlie.brown@example.com
3 66e88869c48457d8d5b12632 Diana Prince 21 History None diana.prince@example.com
4 66e88869c48457d8d5b12636 Hank Pym 21 Biochemistry None hank.pym@example.com

Now let’s create a simple bar chart of students’ age

import matplotlib.pyplot as plt
# create a histogram of students age
df['age'].plot(kind='hist', title='Age Distribution of Students')
<Axes: title={'center': 'Age Distribution of Students'}, ylabel='Frequency'>
../../../_images/bdea02369d47d0263df2f8ee63e8c644e0e27224ce21ab23c8930c7711488986.png

Now it’s your turn

  • Convert the books collection into a pd.Dataframe called df_books

  • Create a histogram of books’ prices

# Convert the `books` collection into a pd.Dataframe called `df_books`
# YOUR CODE HERE
# BEGIN SOLUTION
cursor = books.find()
books_data = list(cursor)
df_books = pd.DataFrame(books_data)
df_books
# END SOLUTION
_id title author published_year isbn price publisher
0 66f1c5102a1f550705a4f5e4 1984 George Orwell 1949 978-0-452-28423-4 8.99 Unknown
1 66f1c5102a1f550705a4f5e5 Pride and Prejudice Jane Austen 1813 978-0-19-953556-9 12.99 Unknown
2 66f1c5102a1f550705a4f5e7 Moby-Dick Herman Melville 1851 978-0-14-243724-7 11.99 Unknown
3 66f1c5102a1f550705a4f5e8 War and Peace Leo Tolstoy 1869 978-0-14-044793-4 14.99 Unknown
4 66f1c5102a1f550705a4f5eb Fahrenheit 451 Ray Bradbury 1953 978-0-7432-4722-2 9.49 Penguin Books
5 66f1c5102a1f550705a4f5ec Jane Eyre Charlotte Brontë 1847 978-0-14-243720-9 8.49 Unknown
# visualize the distribution of book prices
df_books['price'].plot(kind='hist', title='Price Distribution of Books') # SOLUTION
<Axes: title={'center': 'Price Distribution of Books'}, ylabel='Frequency'>
../../../_images/e8987763d432ab839faaff00c04ad957aeb9cec805dcbc82ed43365ecfadeffc.png
# assert the dataframe was created
assert df_books is not None, "df_books does not exist"
# assert the dataframe has the correct columns
assert "title" in df_books.columns, "title column is missing"
assert "author" in df_books.columns, "author column is missing"
assert "price" in df_books.columns, "price column is missing"

Submission instructions#

{rubric: mechanics = 5}

  • Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.

  • Commit and push your notebook to the github repo

  • Double check your notebook is rendered properly on Github and you can see all the outputs clearly