ADSC 3910 - Applied Data Science Integrated Practice 2#
This course provides students with the opportunity to apply and practice their data science skills by developing an end-to-end data science project. The course follows a project-based format, consisting of multiple milestones designed to guide students in creating a comprehensive data science pipeline. We will focus on the application of non-relational database concepts, leveraging MongoDB for data modeling and management. The course will also emphasize the use of distributed data processing with PySpark, and the application of vector search and Retrieval-Augmented Generation (RAG) pipelines for advanced data analysis.
Learning objectives#
By the end of this course, students are expected to be able to:
Identify an interesting data science question for which data are available or obtainable.
Define the scope of a possible solution, create a proposal and a timeline with key milestones.
Create a fully reproducible data science pipeline, packaged in a GitHub repository.
Utilize MongoDB as a database system.
Utilize PySpark in processing and analyzing the data.
Implement and evaluate vector search techniques and Retrieval-Augmented Generation (RAG) pipelines.
Work collaboratively as a team to create, document, and present (using written, oral, and visual means) the process and results from a solution to a data science problem.
Schedule & office hours#
Tuesday & Thursday 11:00 - 12:15 OM 1325
Office hour: Tuesday 1-2 PM on Microsoft Teams. Join via this link
Communication#
For any course-related questions (lectures, assignments, exams, course logistics), please post them in the discussion forum on Moodle.
For any individual matters, such as academic concessions, deadline extensions, or personal circumstances, please email me at lnguyen[at]tru[dot]ca
Response time: I will try my best to reply to your inquiries as soon as possible during normal working hours (9 AM-5 PM, Mon-Fri). If you send me a message outside of regular working hours, please expect a response on the next working day.
Lectures#
| Week no. | Week day | Date | Topic | Readings |
|---|---|---|---|---|
| 1 | Thu | Sep 4 | Course introduction & teamwork contract | |
| 2 | Tue | Sep 9 | Lecture 1: Intro to GitHub for data science reproducibility | |
| 2 | Thu | Sep 11 | Lecture 2: Intro to GitHub Copilot | - |
| 3 | Tue | Sep 16 | Teamwork time | - |
| 3 | Thu | Sep 18 | Milestone 1 presentations | - |
| 4 | Tue | Sep 23 | Lecture 3: Managing virtual environments | |
| 4 | Thu | Sep 25 | Lecture 4: Project management with GitHub Copilot | - |
| 5 | Tue | Sep 30 | BREAK: Truth and Reconciliation | - |
| 5 | Thu | Oct 2 | Lecture 5: How to deliver a technical presentation | - |
| 6 | Tue | Oct 7 | Teamwork time | - |
| 6 | Thu | Oct 9 | Milestone 2 presentations | - |
| 7 | Tue | Oct 14 | - | |
| 7 | Thu | Oct 16 | - | |
| 8 | Tue | Oct 21 | - | |
| 8 | Thu | Oct 23 | - | |
| 9 | Tue | Oct 28 | Teamwork time | - |
| 9 | Thu | Oct 30 | Milestone 3 presentations | - |
| 10 | Tue | Nov 4 | - | |
| 10 | Thu | Nov 6 | - | |
| 11 | Tue | Nov 11 | BREAK: Remembrance Day | - |
| 11 | Thu | Nov 13 | Milestone 4 presentations | - |
| 12 | Tue | Nov 18 | - | |
| 12 | Thu | Nov 20 | - | |
| 13 | Tue | Nov 25 | Teamwork time | - |
| 13 | Thu | Nov 27 | Milestone 5 presentations | - |
Assessment overview#
| Assessment component | Weight | Deadline |
|---|---|---|
| Teamwork contract | 5% | Sep 9, 2025, 11:59 PM |
| Milestone 1: Dataset selection & project proposal | 10% | Sep 18, 2025, 11:59 PM |
| Peer review of Milestone 1 (completed individually) | 5% | Sep 29, 2025, 11:59 PM |
| Milestone 2: Database schema & EDA results | 10% | Oct 9, 2025, 11:59 PM |
| Peer review of Milestone 2 (completed individually) | 5% | Oct 9, 2025, 11:59 PM |
| Milestone 3: Vector embeddings implementation | 10% | Oct 30, 2025, 11:59 PM |
| Peer review of Milestone 3 (completed individually) | 5% | Oct 30, 2025, 11:59 PM |
| Milestone 4: RAG retrieval implementation | 10% | Nov 13, 2025, 11:59 PM |
| Peer review of Milestone 4 (completed individually) | 5% | Nov 20, 2025, 11:59 PM |
| Milestone 5: Final data product (presentation) | 15% | Nov 27, 2025, 11:59 PM |
| Final report (completed individually) | 15% | Dec 10, 2025, 11:59 PM |
| Teamwork reflection | 5% | Dec 10, 2025, 11:59 PM |
Attendance, late assignments, academic concessions, academic accommodation#
Attendance#
A registered student who does not attend the first two events (e.g., lectures, labs, etc.) of their course(s) and who has not made prior arrangements acceptable to the instructor(s) may, at the discretion of the instructor(s), be considered to have withdrawn from the course(s) and have their course registration(s) deleted.
Students who arrive more than 5 minutes late will be recorded as absent.
Missing more than 30% of class sessions will result in automatic failure of the course.
Attendance accounts for 10% of your final grade.
Each student gets three "free passes" for any reason (e.g., illness, family matters, commuting issues) without penalty.
Please refer to TRU's attendance policy. In addition, we will take attendance during class via Moodle's QR code.
Academic Concessions#
If circumstances (e.g., illness, family emergency, significant life event) may prevent you from meeting course requirements:
Notify the instructor at least 24 hours before the deadline.
Requests are considered case-by-case; you may be asked for documentation.
Possible accommodations:
Deadline extensions
Alternative assessments
Requests made after the deadline are usually refused.
Late Assignments#
Penalty: -25% per day, up to -75% total.
After 3 days late, work is not accepted (grade = 0).
Extensions are granted only for exceptional cases if requested before the deadline.
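To make the arithmetic of the late penalty concrete, here is a minimal sketch in Python. It rests on assumptions not stated in the policy: the penalty is taken as a fraction of the score you earned, and any partial day late counts as a full day. The function name `late_adjusted_score` is hypothetical and used only for illustration; the instructor's actual calculation may differ.

```python
import math


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Illustrative late-penalty calculation (assumptions noted in comments)."""
    if hours_late <= 0:
        # Submitted on time: no penalty.
        return raw_score
    # Assumption: a partial day late is rounded up to a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > 3:
        # After 3 days late, work is not accepted (grade = 0).
        return 0.0
    # 25% per day, capped at 75% total.
    penalty = min(0.25 * days_late, 0.75)
    # Assumption: the penalty is a fraction of the earned score.
    return raw_score * (1 - penalty)


if __name__ == "__main__":
    for hours in (0, 5, 30, 72, 73):
        print(hours, late_adjusted_score(80, hours))
```

Under these assumptions, a submission that earned 80 and arrived 30 hours late counts as two days late and keeps half of its score.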
Accessibility#
Students registered with Accessibility Services who require accommodations must provide their Letter of Accommodation to the instructor as soon as possible. This letter will outline the necessary accommodations to ensure an equitable learning environment. Please ensure that this is done early in the term to facilitate timely arrangements.
Policy on the use of generative AI#
I am a proponent of the responsible and ethical use of AI in education. You are welcome and encouraged to use any AI tools to support your learning process as you see fit. However, you are responsible for critically evaluating and verifying any AI-generated content you use. I also encourage you to explicitly acknowledge the use of AI in your work.
Suggested format template: AI Acknowledgement: This assignment was completed with assistance from [AI tool name, version, and provider]. The AI was used for [specific purpose, e.g., generating code snippets, summarizing readings, checking grammar]. All AI-generated content was reviewed, verified, and edited by me.
Please refer to TRU's guideline on the use of generative AI for more information.