ADSC 3910 - Applied Data Science Integrated Practice 2#

This course provides students with the opportunity to apply and practice their data science skills by developing an end-to-end data science project. The course follows a project-based format, consisting of multiple milestones designed to guide students in creating a comprehensive data science pipeline. We will focus on the application of non-relational database concepts, leveraging MongoDB for data modeling and management. The course will also emphasize the use of distributed data processing with PySpark, and the application of vector search and Retrieval-Augmented Generation (RAG) pipelines for advanced data analysis.

Learning objectives#

By the end of this course, students are expected to be able to:

  • Identify an interesting data science question for which data are available or obtainable.

  • Define the scope of a possible solution, create a proposal and a timeline with key milestones.

  • Create a fully reproducible data science pipeline, packaged in a GitHub repository.

  • Utilize MongoDB as a database system.

  • Utilize PySpark to process and analyze the data.

  • Implement and evaluate vector search techniques and Retrieval-Augmented Generation (RAG) pipelines.

  • Work collaboratively as a team to create, document, and present (using written, oral, and visual means) the process and results from a solution to a data science problem.

Schedule & office hours#

  • Tuesday & Thursday 11:00 - 12:15 OM 1325

Office hour: Tuesday 1-2 PM on Microsoft Teams. Join via this link

Communication#

  • For any course-related questions, such as lectures, assignments, exams, or course logistics, please post them in the discussion forum on Moodle.

  • For any individual matters, such as academic concessions, deadline extensions, or personal circumstances, please email me at lnguyen[at]tru[dot]ca

  • Response time: I will try my best to reply to your inquiries as soon as possible during normal working hours (9 AM-5 PM, Mon-Fri). If you send me a message outside of regular working hours, please expect a response on the next working day.

Lectures#

| Week no. | Week day | Date | Topic | Readings |
|---|---|---|---|---|
| 1 | Thu | Sep 4 | Course introduction & teamwork contract | Syllabus, Teamwork contract |
| 2 | Tue | Sep 9 | Lecture 1: Intro to GitHub for data science reproducibility | Chapter 1, Chapter 4, Chapter 5 |
| 2 | Thu | Sep 11 | Lecture 2: Intro to GitHub Copilot | - |
| 3 | Tue | Sep 16 | Teamwork time | - |
| 3 | Thu | Sep 18 | Milestone 1 presentations | - |
| 4 | Tue | Sep 23 | Lecture 3: Managing virtual environments | Chapter 7, Chapter 8, Chapter 8.5 |
| 4 | Thu | Sep 25 | Lecture 4: Project management with GitHub Copilot | - |
| 5 | Tue | Sep 30 | BREAK: Truth and Reconciliation | - |
| 5 | Thu | Oct 2 | Lecture 5: How to deliver a technical presentation | - |
| 6 | Tue | Oct 7 | Teamwork time | - |
| 6 | Thu | Oct 9 | Milestone 2 presentations | - |
| 7 | Tue | Oct 14 | | - |
| 7 | Thu | Oct 16 | | - |
| 8 | Tue | Oct 21 | | - |
| 8 | Thu | Oct 23 | | - |
| 9 | Tue | Oct 28 | Teamwork time | - |
| 9 | Thu | Oct 30 | Milestone 3 presentations | - |
| 10 | Tue | Nov 4 | | - |
| 10 | Thu | Nov 6 | | - |
| 11 | Tue | Nov 11 | BREAK: Remembrance Day | - |
| 11 | Thu | Nov 13 | Milestone 4 presentations | - |
| 12 | Tue | Nov 18 | | - |
| 12 | Thu | Nov 20 | | - |
| 13 | Tue | Nov 25 | Teamwork time | - |
| 13 | Thu | Nov 27 | Milestone 5 presentations | - |

Assessment overview#

| Assessment component | Weight | Deadline |
|---|---|---|
| Teamwork contract | 5% | Sep 9, 2025, 11:59 PM |
| Milestone 1: Dataset selection & project proposal | 10% | Sep 18, 2025, 11:59 PM |
| Peer review of Milestone 1 (completed individually) | 5% | Sep 29, 2025, 11:59 PM |
| Milestone 2: Database schema & EDA results | 10% | Oct 9, 2025, 11:59 PM |
| Peer review of Milestone 2 (completed individually) | 5% | Oct 9, 2025, 11:59 PM |
| Milestone 3: Vector embeddings implementation | 10% | Oct 30, 2025, 11:59 PM |
| Peer review of Milestone 3 (completed individually) | 5% | Oct 30, 2025, 11:59 PM |
| Milestone 4: RAG retrieval implementation | 10% | Nov 13, 2025, 11:59 PM |
| Peer review of Milestone 4 (completed individually) | 5% | Nov 20, 2025, 11:59 PM |
| Milestone 5: Final data product (presentation) | 15% | Nov 27, 2025, 11:59 PM |
| Final report (completed individually) | 15% | Dec 10, 2025, 11:59 PM |
| Teamwork reflection | 5% | Dec 10, 2025, 11:59 PM |

Attendance, late assignments, academic concessions, academic accommodation#

Attendance#

A registered student who does not attend the first two events (e.g., lectures, labs, etc.) of their course(s) and who has not made prior arrangements acceptable to the instructor(s) may, at the discretion of the instructor(s), be considered to have withdrawn from the course(s) and have their course registration(s) deleted.

  • Students who arrive more than 5 minutes late will be recorded as absent.

  • Missing more than 30% of class sessions will result in automatic failure of the course.

  • Attendance accounts for 10% of your final grade.

  • Each student gets three "free passes" for any reason (e.g., illness, family matters, commuting issues) without penalty.

Please refer to TRU's attendance policy. In addition, we will take attendance during class via Moodle's QR code.

Academic Concessions#

If circumstances (e.g., illness, family emergency, significant life event) may prevent you from meeting course requirements:

  • Notify the instructor at least 24 hours before the deadline.

  • Requests are considered case-by-case; you may be asked for documentation.

  • Possible accommodations:

    • Deadline extensions

    • Alternative assessments

  • Requests made after the deadline are usually refused.

Late Assignments#

  • Penalty: −25% per day, up to −75% total.

  • After 3 days late, work is not accepted (grade = 0).

  • Extensions are granted only for exceptional cases if requested before the deadline.
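
The penalty rule above can be sketched in a few lines of Python. This is only an illustration, not the official grading procedure. It assumes two things the policy does not state: that any started 24-hour period after the deadline counts as a full late day, and that the penalty is deducted in percentage points from a score out of 100. The function names `late_days` and `apply_late_penalty` are hypothetical.

```python
import math
from datetime import datetime

PENALTY_PER_DAY = 25  # percentage points per late day (from the policy above)
MAX_LATE_DAYS = 3     # after 3 days late, work is not accepted


def late_days(deadline: datetime, submitted: datetime) -> int:
    """Count late days.

    ASSUMPTION: any started 24-hour period after the deadline counts as a full day.
    """
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


def apply_late_penalty(raw_score: float, days_late: int) -> float:
    """Return the score out of 100 after the late penalty.

    ASSUMPTION: the penalty is subtracted as percentage points and floored at 0.
    """
    if days_late <= 0:
        return raw_score
    if days_late > MAX_LATE_DAYS:
        return 0.0  # work submitted more than 3 days late is not accepted
    return max(0.0, raw_score - PENALTY_PER_DAY * days_late)


# Example: Milestone 1 is due Sep 18, 2025, 11:59 PM and is submitted Sep 20, 8:00 AM.
deadline = datetime(2025, 9, 18, 23, 59)
submitted = datetime(2025, 9, 20, 8, 0)
days = late_days(deadline, submitted)
print(days, apply_late_penalty(80.0, days))
```

Under these assumptions, the example submission counts as 2 days late, so a raw score of 80 becomes 30. If in doubt about how your own submission would be treated, ask the instructor before the deadline.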

Accessibility#

Students registered with Accessibility Services who require accommodations must provide their Letter of Accommodation to the instructor as early in the term as possible. This letter outlines the accommodations needed to ensure an equitable learning environment, and providing it early allows timely arrangements.

Policy on the use of generative AI#

I am a proponent of the responsible and ethical use of AI in education. You are welcome and encouraged to use any AI tools to support your learning process as you see fit. However, you are responsible for critically evaluating and verifying any AI-generated content you use. I also encourage you to explicitly acknowledge the use of AI in your work.

Suggested format template: AI Acknowledgement: This assignment was completed with assistance from [AI tool name, version, and provider]. The AI was used for [specific purpose, e.g., generating code snippets, summarizing readings, checking grammar]. All AI-generated content was reviewed, verified, and edited by me.

Please refer to TRU's guideline on the use of generative AI for more information.