🎯 Milestone 5: Final Data Product (Presentation + Final Report)

Milestone 5 is your final deliverable. You will produce:

  1. A 20-minute final presentation designed for an audience who is seeing your project for the first time, with no prior technical background required.

  2. A final written report (maximum 2,000 words) summarizing your entire project. You may combine, revise, or reuse content from previous milestones, but the final report must read as a polished, cohesive document.

  3. A finalized GitHub repository containing your full data product and reproducible workflow.


📌 Deliverables Overview

| # | Deliverable | Due Date |
|---|---|---|
| 1 | Final Presentation (20 min + 5 min Q&A) | Nov 27th, 9:30 AM |
| 2 | Final Written Report (≤ 2,000 words) | Dec 10th, 11:59 PM |
| 3 | Final GitHub Repository | Dec 10th, 11:59 PM |


🗣️ 1. Final Presentation (20 Minutes)

Your presentation must be accessible to an audience unfamiliar with your project. Assume they have never seen your repository, have not read your milestones, and know nothing about your dataset.

Presentation Requirements

A. Introduce the Problem Clearly

  • What real-world problem are you solving?

  • Why does this problem matter?

  • Who is the intended user or beneficiary?

B. Dataset Overview

  • Source, size, format

  • Important characteristics or features

  • Any data cleaning or preparation required

C. Explain Your Data Pipeline (High-Level)

Use simple, intuitive explanations:

  • How MongoDB was used (aggregation pipelines, filtering, transformations)

  • How RAG & LangChain were integrated

    • Embedding models used

    • Retrieval strategy (vector store, similarity search, filters)

    • Prompt design or chain structure

    • Query transformation techniques

  • A system architecture diagram

Avoid code-level details. Do NOT show screenshots of your code in the presentation!

D. Present Your Final Data Product

Examples may include:

  • A RAG application

  • Analytical insights

  • Dashboards

  • Reports or summaries

Explain:

  • What the product does

  • Why it is useful

  • What problem it solves

E. Demonstrate Value

  • What did your project produce that did not exist before?

  • What insights or capabilities emerged from your analysis or RAG pipeline?

F. Reflection (Lessons Learned)

  • What worked well?

  • What was challenging?

  • What would you improve with more time?

Presentation Logistics

  • 20-minute presentation + 5 minutes Q&A

  • All group members must speak

  • Recommended: 10–14 slides

  • Focus on clarity and storytelling


📝 2. Final Written Report (≤ 2,000 Words)

You may reuse text from previous milestones, but the final report must feel unified and polished.

Required Sections

1. Introduction & Problem Statement

  • Context and motivation

  • Project goals

2. Data Description

  • Data source and structure

  • Schema and key variables

  • Cleaning and preprocessing steps

3. Methodology

MongoDB

  • Aggregation pipelines

  • Query transformations

  • Filtering and feature preparation
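As an illustration of what this subsection might document, the sketch below builds an aggregation pipeline as plain Python data. The field names (`year`, `category`, `rating`) and the helper `build_pipeline` are hypothetical placeholders, not part of the assignment; substitute your own schema. With PyMongo, a list like this would typically be passed to `collection.aggregate(pipeline)`.

```python
from typing import Any


def build_pipeline(min_year: int, top_n: int) -> list[dict[str, Any]]:
    """Build a filter -> group -> sort -> limit aggregation pipeline.

    Field names (year, category, rating) are hypothetical placeholders.
    """
    return [
        # Filtering: keep only documents from min_year onward.
        {"$match": {"year": {"$gte": min_year}}},
        # Transformation: average rating and document count per category.
        {
            "$group": {
                "_id": "$category",
                "avg_rating": {"$avg": "$rating"},
                "n_docs": {"$sum": 1},
            }
        },
        # Rank categories by average rating, highest first.
        {"$sort": {"avg_rating": -1}},
        # Keep only the top_n categories.
        {"$limit": top_n},
    ]


pipeline = build_pipeline(min_year=2020, top_n=5)
```

Keeping the pipeline in a function with parameters, rather than as an inline literal, makes it easier to describe in the report and to cover with a unit test.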

RAG & LangChain

  • Embedding model used

  • Vector store & indexing

  • Retrieval strategies

  • Prompt engineering

  • Chain or agent design

  • Architecture diagram (recommended)
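To make the retrieval step concrete in your report, it can help to show the idea behind similarity search in a few lines. The sketch below is a library-free illustration, not LangChain's API: the three-dimensional vectors are made-up stand-ins for real embeddings, and `retrieve` is a hypothetical helper that ranks stored documents by cosine similarity to a query vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: list[float],
    store: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[str]:
    """Return the k document texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings", made up for illustration only.
store = [
    ("doc about pricing", [1.0, 0.0, 0.0]),
    ("doc about shipping", [0.0, 1.0, 0.0]),
    ("doc about refunds", [0.7, 0.7, 0.0]),
]

top_docs = retrieve([1.0, 0.1, 0.0], store, k=2)
```

In a real pipeline the vector store performs this ranking for you; the point of the sketch is only to explain what "top-k similarity search" means when you describe your retrieval strategy.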

4. Results / Final Product

  • Outputs, responses, or insights

  • Visualizations

  • Evidence the pipeline works

5. Reproducibility Notes

  • Environment setup

  • API key / secrets configuration

  • Execution order

  • How to re-run everything end-to-end
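One common way to handle secrets, shown here as a minimal sketch rather than a required approach, is to read them from environment variables and fail early with a clear message when one is missing. The helper `get_required_env` and the variable name in the usage comment are hypothetical.

```python
import os


def get_required_env(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name}; see the README for setup steps."
        )
    return value


# Hypothetical usage (the variable name is an assumption, not a requirement):
# mongo_uri = get_required_env("MONGODB_URI")
```

Documenting the required variable names in the report and README, without their values, lets someone else re-run the project without any credentials being committed to the repository.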

6. Discussion

  • Interpretation of results

  • Limitations

  • Challenges

7. Conclusion & Future Work

  • Summary

  • Opportunities for further improvement

8. References

  • Cite datasets, libraries, relevant research


📁 3. Final GitHub Repository

Your repository must reflect professional standards for data science and RAG development.

A. Project Structure (25%)

  • Organized folders (src/, notebooks/, configs/, prompts/, tests/)

  • snake_case naming

  • A complete README.md including:

    • Project overview

    • Environment setup

    • Secrets configuration

    • Re-running instructions

    • Final app or analysis instructions

B. Code Quality (25%)

  • Modular, readable Python code

  • No hard-coded values

  • Config-driven architecture

  • Clear commits and git history
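As a rough sketch of what "config-driven" can look like, the example below keeps tunable values in a JSON document and loads them into a typed, immutable object instead of scattering literals through the code. The class `RetrievalConfig`, its fields, and the model name are hypothetical examples, not required names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Settings that would otherwise be hard-coded in the pipeline."""

    embedding_model: str
    top_k: int
    chunk_size: int


def load_config(raw_json: str) -> RetrievalConfig:
    """Parse a JSON string into a RetrievalConfig.

    Unknown or missing keys raise TypeError, which surfaces config typos early.
    """
    data = json.loads(raw_json)
    return RetrievalConfig(**data)


# In a real project this text would come from a file such as configs/retrieval.json
# (a hypothetical path); it is inlined here to keep the example self-contained.
example_json = '{"embedding_model": "example-embedding-model", "top_k": 4, "chunk_size": 500}'
config = load_config(example_json)
```

Functions then accept `config` as an argument, so changing `top_k` or the embedding model means editing one file rather than hunting through modules.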

C. Reproducible Environment (25%)

  • environment.yml or requirements.txt included

  • Step-by-step environment setup

  • Full reproduction instructions

D. MongoDB + RAG & LangChain Application (15%)

  • Clear, meaningful use of MongoDB

  • Proper integration of:

    • Embeddings

    • Vector stores

    • Retrievers

    • Prompts

    • Chains

E. Unit Tests (10%)

  • Test files corresponding to main modules

  • Tests for correctness and edge cases
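For a sense of scale, a test file can be quite small. The sketch below pairs a hypothetical helper, `chunk_text`, with pytest-style test functions covering one normal case and two edge cases; in a real repository the helper would live in a module under src/ and the tests in a matching file under tests/.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def test_chunks_cover_whole_text():
    # Correctness: chunks keep their order and lose no characters.
    chunks = chunk_text("abcdefg", 3)
    assert chunks == ["abc", "def", "g"]
    assert "".join(chunks) == "abcdefg"


def test_empty_text_gives_no_chunks():
    # Edge case: empty input produces an empty list, not [""].
    assert chunk_text("", 3) == []


def test_non_positive_chunk_size_is_rejected():
    # Edge case: invalid configuration fails fast instead of looping.
    try:
        chunk_text("abc", 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for chunk_size=0")
```

Functions named `test_*` like these are collected automatically when you run `pytest` from the repository root.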


👥 4. Equal Group Contributions

Your individual grade may vary based on:

  • Commit History and Contributions: Your individual contribution will be assessed from your GitHub commit history and the amount of code you contributed. Active participation throughout the project is essential, and your grade will be proportional to your level of contribution.

    • If your contribution is minimal (e.g., few commits), your grade for this project will reflect that.

  • Peer Evaluations: At the end of the project, every team member will submit a peer evaluation form. This evaluation, along with your commit history, will be used to assess your individual contribution. Failure to contribute adequately to the group project can result in a failing grade.