🎯 Milestone 5: Final Data Product (Presentation + Final Report)

Milestone 5 is your final deliverable. You will produce:

A 20-minute final presentation designed for an audience who is seeing your project for the first time, with no prior technical background required.
A final written report (maximum 2,000 words) summarizing your entire project. You may combine, revise, or reuse content from previous milestones, but the final report must read as a polished, cohesive document.
A finalized GitHub repository containing your full data product and reproducible workflow.

📌 Deliverables Overview#

#	Deliverable	Due Date
1	Final Presentation (20 min + 5 min Q&A)	Nov 27th, 9:30 AM
2	Final Written Report (≤ 2,000 words)	Dec 10th, 11:59 PM
3	Final GitHub Repository	Dec 10th, 11:59 PM

🗣️ 1. Final Presentation (20 Minutes)#

Your presentation must be accessible to an audience unfamiliar with your project. Assume they have never seen your repository, have not read your milestones, and know nothing about your dataset.

Presentation Requirements#

A. Introduce the Problem Clearly#

What real-world problem are you solving?
Why does this problem matter?
Who is the intended user or beneficiary?

B. Dataset Overview#

Source, size, format
Important characteristics or features
Any data cleaning or preparation required

C. Explain Your Data Pipeline (High-Level)#

Use simple, intuitive explanations:

How MongoDB was used (aggregation pipelines, filtering, transformations)
How RAG & LangChain were integrated
- Embedding models used
- Retrieval strategy (vector store, similarity search, filters)
- Prompt design or chain structure
- Query transformation techniques
A system architecture diagram

Avoid code-level details. Do NOT show screenshots of your code in the presentation!

D. Present Your Final Data Product#

Examples may include:

A RAG application
Analytical insights
Dashboards
Reports or summaries

Explain:

What the product does
Why it is useful
What problem it solves

E. Demonstrate Value#

What did your project produce that did not exist before?
What insights or capabilities emerged from your analysis or RAG pipeline?

F. Reflection (Lessons Learned)#

What worked well?
What was challenging?
What would you improve with more time?

Presentation Logistics#

20-minute presentation + 5 minutes Q&A
All group members must speak
Recommended: 10–14 slides
Focus on clarity and storytelling

📝 2. Final Written Report (≤ 2,000 Words)#

You may reuse text from previous milestones, but the final report must feel unified and polished.

Required Sections#

1. Introduction & Problem Statement#

Context and motivation
Project goals

2. Data Description#

Data source and structure
Schema and key variables
Cleaning and preprocessing steps

3. Methodology#

MongoDB#

Aggregation pipelines
Query transformations
Filtering and feature preparation
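
As one illustration of what the aggregation-pipeline part of the methodology might describe, the sketch below builds a pipeline as plain Python data. The field names (`category`, `rating`) and the helper `build_top_categories_pipeline` are hypothetical and should be replaced with your own schema; the stage operators (`$match`, `$group`, `$sort`, `$limit`) are standard MongoDB stages. With PyMongo, a list like this would typically be passed to `collection.aggregate(...)`.

```python
from typing import Any


def build_top_categories_pipeline(min_rating: float, top_n: int) -> list[dict[str, Any]]:
    """Return aggregation stages: filter, group, sort, then limit."""
    return [
        # Filtering: keep only documents at or above the rating threshold.
        {"$match": {"rating": {"$gte": min_rating}}},
        # Transformation: one output document per category, with a count and a mean.
        {"$group": {
            "_id": "$category",
            "n_docs": {"$sum": 1},
            "avg_rating": {"$avg": "$rating"},
        }},
        # Largest groups first, then keep only the top_n groups.
        {"$sort": {"n_docs": -1}},
        {"$limit": top_n},
    ]


pipeline = build_top_categories_pipeline(min_rating=4.0, top_n=5)
# Print the stage names in order to show the shape of the pipeline.
print([next(iter(stage)) for stage in pipeline])
```

Keeping the pipeline in a function with parameters, rather than inline literals, also makes it easier to describe in the report and to reuse from a config file.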

RAG & LangChain#

Embedding model used
Vector store & indexing
Retrieval strategies
Prompt engineering
Chain or agent design
Architecture diagram (recommended)
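
To make the retrieval step concrete for readers of the report, here is a minimal sketch of similarity search, assuming embeddings are already available as lists of floats. It uses only the standard library and a toy in-memory dictionary in place of a real embedding model and vector store, so it is not LangChain's API; the names `retrieve_top_k` and `toy_store` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; a real embedding model produces far longer vectors.
toy_store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}

for doc_id, score in retrieve_top_k([1.0, 0.1, 0.0], toy_store, k=2):
    print(doc_id, round(score, 3))
```

A production vector store usually replaces the exhaustive scan with an approximate index, but the ranking idea is the same.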

4. Results / Final Product#

Outputs, responses, or insights
Visualizations
Evidence the pipeline works

5. Reproducibility Notes#

Environment setup
API key / secrets configuration
Execution order
How to re-run everything end-to-end
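
One common way to document secrets configuration is to read keys from environment variables and fail early with a readable message. The sketch below assumes that approach; the variable name `MONGODB_URI` and the helper `require_env` are hypothetical, so use whatever names your README specifies.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name!r}; see the README for setup steps."
        )
    return value


# Hypothetical variable name; never commit the actual value to the repository.
try:
    mongodb_uri = require_env("MONGODB_URI")
    print("MongoDB connection string loaded.")
except RuntimeError as err:
    print(err)
```

Failing at startup, instead of deep inside a pipeline run, makes it easier for someone else to reproduce the project from a clean machine.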

6. Discussion#

Interpretation of results
Limitations
Challenges

7. Conclusion & Future Work#

Summary
Opportunities for further improvement

8. References#

Cite datasets, libraries, relevant research

📁 3. Final GitHub Repository#

Your repository must reflect professional standards for data science and RAG development.

A. Project Structure (25%)#

Organized folders (src/, notebooks/, configs/, prompts/, tests/)
snake_case naming for files, functions, and variables
A complete README.md including:
- Project overview
- Environment setup
- Secrets configuration
- Re-running instructions
- Final app or analysis instructions

B. Code Quality (25%)#

Modular, readable Python code
No hard-coded values
Config-driven architecture
Clear commits and git history
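
As a rough sketch of what "config-driven" can look like in practice, the example below loads settings into a typed object instead of scattering literals through the code. The class `RetrievalConfig`, its fields, and the path mentioned in the comment are hypothetical; YAML or TOML would work equally well if your project already uses them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Settings that would otherwise be hard-coded across modules."""
    collection_name: str
    embedding_model: str
    top_k: int


def load_config(text: str) -> RetrievalConfig:
    """Parse a JSON document into a config; missing or extra keys raise TypeError."""
    return RetrievalConfig(**json.loads(text))


# In a repository this text would live in a file such as configs/retrieval.json
# (hypothetical path) and be read with pathlib.Path(...).read_text().
example_json = (
    '{"collection_name": "reviews", '
    '"embedding_model": "example-embedding-model", '
    '"top_k": 4}'
)

config = load_config(example_json)
print(config.top_k)
```

Changing an experiment then means editing one config file and committing it, which also leaves a clear trail in the git history.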

C. Reproducible Environment (25%)#

environment.yml or requirements.txt included
Step-by-step environment setup
Full reproduction instructions

D. MongoDB + RAG & LangChain Application (25%)#

Clear, meaningful use of MongoDB
Proper integration of:
- Embeddings
- Vector stores
- Retrievers
- Prompts
- Chains
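
The way these components fit together can be sketched in plain Python. This is not LangChain's API: the retriever and the model are stand-in callables, and `PROMPT_TEMPLATE`, `build_prompt`, and `run_chain` are hypothetical names chosen for illustration. The point is only the flow from question, to retrieved passages, to prompt, to model.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(question: str, passages: list[str]) -> str:
    """Fill the template with the retrieved passages, one bullet per passage."""
    context = "\n".join(f"- {p}" for p in passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def run_chain(
    question: str,
    retriever: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve passages for the question, build the prompt, and call the model."""
    passages = retriever(question)
    return llm(build_prompt(question, passages))


# Stand-ins so the sketch runs without any external service.
def stub_retriever(question: str) -> list[str]:
    return ["Passage one about the topic.", "Passage two about the topic."]


def stub_llm(prompt: str) -> str:
    return f"(stub answer; prompt was {len(prompt)} characters long)"


print(run_chain("What does the dataset cover?", stub_retriever, stub_llm))
```

Because the retriever and model are passed in as arguments, each piece can be tested on its own, which is one way to provide evidence that the integration works.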

👥 4. Equal Group Contributions#

Your individual grade may vary based on:

Commit History and Contributions: Your individual contribution to the project will be assessed based on your GitHub commit history and the amount of code you contributed. Active participation throughout the project is essential, and your grade will be proportional to your level of contribution.
- If your contribution is minimal (e.g., few commits), your grade for this project will reflect that.
Peer Evaluations: At the end of the project, every team member will submit a peer evaluation form. This evaluation, along with your commit history, will be used to assess your individual contribution. Failure to contribute adequately to the group project can result in a failing grade.