ADSC 3610 - Database Systems in Applied Data Science II - Syllabus
This course introduces students to the fundamentals of non-relational databases, with a focus on MongoDB as the main platform. We will cover data modeling, CRUD operations, and advanced querying techniques using MongoDB and its integration in Python via PyMongo. The course then transitions to PySpark, covering distributed data processing, DataFrames, Spark SQL, and machine learning applications. In the final section, the focus shifts to data governance, ethics, and security, addressing key topics such as data quality management, privacy, legal compliance, and the ethical implications of AI.
Learning objectives
Understand the fundamental concepts of non-relational databases and their differences from relational databases.
Gain proficiency in using MongoDB for data storage, retrieval, and manipulation.
Learn to design and implement efficient data models in MongoDB.
Master the basics of PySpark for big data processing and analytics.
Perform data wrangling and transformation tasks using PySpark.
Understand the principles of distributed computing and how they apply to PySpark.
Grasp the key principles of data governance, compliance, and privacy, ensuring responsible and secure data management.
Explore and address the ethical implications of AI and machine learning.
Schedule & office hours
Lectures are held every Tue, Thu, and Fri from 9:30 to 10:20 AM in computer lab OM 1241 in the Old Main building.
Office hours: Thu 11:30 AM-12:30 PM in OM 1232. Please book your timeslot here
Communication
For any course-related questions, such as questions about lectures, assignments, exams, or course logistics, please post them in the discussion forum on Moodle.
For any individual questions, such as academic concessions, deadline extensions, personal circumstances, etc., please email me at lnguyen[at]tru[dot]ca
Response time: I will do my best to reply to your inquiries as soon as possible during normal working hours (9AM-5PM Mon-Fri). If you send me a message outside of regular working hours, please expect a response on the next working day.
Lectures
Please find the list of lectures and the required readings below. This list will be updated frequently, so please check back for the latest version.
Week no. | Week day | Date | Topic |
---|---|---|---|
1 | Tue | Sep 3 | Introduction to NoSQL databases, READINGS |
1 | Thu | Sep 5 | Introduction to MongoDB and the document model, READINGS |
1 | Fri | Sep 6 | Setting up MongoDB Atlas & connecting to a MongoDB database, READINGS |
2 | Tue | Sep 10 | Data modelling 1: types of data relationships, embedding, referencing MongoDB rule, READINGS |
2 | Thu | Sep 12 | Data modelling 2: Schema design patterns, READINGS |
3 | Tue | Sep 17 | MongoDB CRUD operations: Insert, Find, Replace, and Delete |
3 | Thu | Sep 19 | MongoDB CRUD operations: Modifying query results, MongoDB Aggregation in Python |
4 | Tue | Sep 24 | MongoDB ACID transactions |
4 | Thu | Sep 26 | MongoDB Atlas search, Atlas Vector Search for LLMs |
5 | Tue | Oct 1 | Part 1: MongoDB review |
5 | Thu | Oct 3 | MIDTERM 1: Covers material from weeks 1-4 |
6 | Tue | Oct 8 | Introduction to PySpark, Working with RDDs |
6 | Thu | Oct 10 | Introduction to DataFrames in PySpark |
7 | Tue | Oct 15 | Basic DataFrame operations |
7 | Thu | Oct 17 | Spark SQL |
8 | Tue | Oct 22 | DataFrame optimization techniques, Working with Structured Streaming |
8 | Thu | Oct 24 | Spark streaming continued |
9 | Tue | Oct 29 | MLflow |
9 | Thu | Oct 31 | Course recap |
10 | Tue | Nov 5 | MIDTERM 2: Covers material from weeks 5-9 |
10 | Thu | Nov 7 | MID-TERM BREAK |
11 | Tue | Nov 12 | Data privacy overview management |
11 | Thu | Nov 14 | Data privacy, Data security fundamentals |
12 | Tue | Nov 19 | Bias and fairness in machine learning |
12 | Thu | Nov 21 | Mitigating bias and improving fairness in machine learning |
13 | Tue | Nov 26 | AI and ethics |
13 | Thu | Nov 28 | Course conclusion |
Assessment overview
You are responsible for the following deliverables, which will determine your course grade:
Assignment | Note | Weight | Date |
---|---|---|---|
Weekly worksheet x 10 | Open-book. Released on Mondays | 20% | Sundays, 11:59 PM |
Midterm 1 | Closed-book, Moodle timed-quiz, 45 minutes | 25% | Thu, Oct 3, 9:30 AM |
Midterm 2 | Closed-book, Moodle timed-quiz, 90 minutes | 25% | Tue, Nov 5, 9:30 AM |
Exam | Closed-book, Moodle timed-quiz, 90 minutes | 30% | Dec 4, 2:00 PM |
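As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name `course_grade`, the component keys, and the assumption that each component is scored on a 0-100 scale (with the worksheet score being the average of the ten worksheets) are illustrative only and are not part of the official grading procedure.

```python
from typing import Mapping

# Weights in percent, taken from the assessment table (they sum to 100).
WEIGHTS = {
    "worksheets": 20,
    "midterm_1": 25,
    "midterm_2": 25,
    "final_exam": 30,
}


def course_grade(scores: Mapping[str, float]) -> float:
    """Combine component scores (assumed 0-100 each) into a course grade.

    Hypothetical helper: `scores` must contain one entry per key in WEIGHTS.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "worksheets": 90,
        "midterm_1": 80,
        "midterm_2": 70,
        "final_exam": 85,
    }
    print(course_grade(example))
```

For instance, under these assumptions a student with 90 on worksheets, 80 and 70 on the midterms, and 85 on the final exam would end up with a weighted grade of 81.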
Weekly assignments
Every week, you will have a worksheet that is worth 2%. These low-stakes assignments consist of multiple choice questions and small exercises that help you consolidate your understanding of the materials and serve as a formative assessment.
The worksheet will be distributed via GitHub Classroom every Monday (except during midterm weeks), and the deadline for each worksheet is the Sunday of the same week at 11:59 PM.
Mid-terms & exam
There will be two midterms and one final exam in this course.
The midterms are closed-book, and they will take place on Moodle for the duration of 45 minutes. They will consist of a mix of multiple choice, fill in the blank, and open-ended questions.
Midterm 1 will cover the materials on MongoDB. Midterm 2 will be an individual project covering the materials on Spark.
The final exam is closed-book and will take place on Moodle for the duration of 90 minutes. It will consist of a mix of multiple choice, fill in the blank, and open-ended questions.
The final exam will cover all the materials in the course.
Attendance, late assignments, academic concessions, academic accommodations
Attendance
A registered student who does not attend the first two events (e.g., lectures/labs/ etc.) of their course(s) and who has not made prior arrangements acceptable to the instructor(s) may, at the discretion of the instructor(s), be considered to have withdrawn from the course(s) and have their course registration(s) deleted.
Please refer to TRU’s attendance policy. In addition, we will take attendance during class via Moodle’s QR code.
In the CS department, you need at least 75% attendance to pass any course.
Academic concessions
If you encounter situations that may impede your ability to meet course requirements, such as illness, family emergencies, or other significant life events, please notify the instructor at least 24 hours before the deadline. Academic concessions, including extensions or alternative assessments, will be considered on a case-by-case basis. You may be required to provide documentation to support your request. Concession requests made after the deadline has passed will likely be refused.
Late Assignments
Assignments are expected to be submitted on time. Late submissions will incur a penalty of 25% per day, up to a maximum of 75%. After 3 days, late assignments will no longer be accepted and will receive a grade of zero. Extensions may be granted in exceptional circumstances, provided that you contact the instructor before the deadline.
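The late policy can be worked through with a short Python sketch. This is only an illustration under two assumptions that the policy text does not spell out: any started 24-hour period counts as a full day late, and the penalty is taken as a fraction of the score you earned. The helper name `late_adjusted_score` is hypothetical.

```python
import math

PENALTY_PER_DAY = 0.25  # 25% per day late, as stated in the policy
MAX_LATE_DAYS = 3       # after 3 days, the assignment receives zero


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Return the score after applying the late penalty (hypothetical helper)."""
    if hours_late <= 0:
        return raw_score  # submitted on time, no penalty
    # Assumption: any started 24-hour period counts as a full day late.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_LATE_DAYS:
        return 0.0  # no longer accepted
    # Assumption: the penalty is a fraction of the earned score.
    return raw_score * (1 - PENALTY_PER_DAY * days_late)


if __name__ == "__main__":
    for hours in (0, 5, 30, 72, 73):
        print(hours, late_adjusted_score(80, hours))
```

Under these assumptions, a worksheet that earned 80 and was submitted 5 hours late would be recorded as 60, and one submitted more than 72 hours late would be recorded as 0.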
Accessibility
Students registered with Accessibility Services who require accommodations must provide their Letter of Accommodation to the instructor as soon as possible. This letter will outline the necessary accommodations to ensure an equitable learning environment. Please ensure that this is done early in the term to facilitate timely arrangements.
Policy on the use of generative AI
Please refer to TRU’s guidelines on the use of generative AI tools such as ChatGPT or Copilot in this course.
https://libguides.tru.ca/artificialintelligence