Milestone 2: Exploratory Data Analysis (EDA)

1. Deliverable Description

The goal of this milestone is to conduct and present your Exploratory Data Analysis (EDA). You should demonstrate that you have explored, cleaned, and structured your dataset, and that you are prepared to move into deeper analysis/modeling. There are three deliverables:

Group Presentation – Showcase your EDA results (due Oct 9)
Written EDA Report – Around 500 words (2 pages) (due Oct 12)
Peer Reviews – Review other groups’ EDA reports individually (due Oct 16)

2. Presentation Guidelines

Time: Maximum 8 minutes + 2 minutes for Q&A
Slides: Maximum 6 slides
Participation: All team members must contribute
Content:
- Show EDA results with clear visuals (charts, summaries, distributions)
- Highlight MongoDB Atlas database usage (screenshots, schema, or queries)
- If applicable, include an example vector search query and results
- Summarize key insights and how they shape the next stage of your project

3. Suggested 6-Slide Template

Slide 1: Title & Team

Project title
Team member names
Course name & date
Visual/logo related to your dataset

Slide 2: Dataset Recap

Source and type of data (structured/unstructured)
Size and characteristics (# records, features, text length, etc.)
Example snippet of raw data

Slide 3: MongoDB Atlas Setup

Screenshot of MongoDB Atlas dashboard with your dataset
Overview of schema (collections, documents, nested fields)
Example query (aggregation pipeline, filters, etc.)
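As one possible illustration of the "example query" item, the sketch below builds a small aggregation pipeline that filters, groups, and sorts. The field names (`year`, `category`, `rating`) and the database and collection names in the comments are hypothetical placeholders, not part of the assignment; substitute your own schema.

```python
from typing import Any


def build_category_summary_pipeline(min_year: int) -> list[dict[str, Any]]:
    """Build a filter -> group -> sort -> limit pipeline.

    Assumes hypothetical fields `year`, `category`, and `rating`.
    """
    return [
        # Keep only documents at or after the given year.
        {"$match": {"year": {"$gte": min_year}}},
        # Count documents and average the rating per category.
        {
            "$group": {
                "_id": "$category",
                "count": {"$sum": 1},
                "avg_rating": {"$avg": "$rating"},
            }
        },
        # Largest categories first, top 10 only.
        {"$sort": {"count": -1}},
        {"$limit": 10},
    ]


pipeline = build_category_summary_pipeline(2020)

# To run against Atlas (requires pymongo and your own connection string):
# from pymongo import MongoClient
# client = MongoClient("<your Atlas connection string>")
# for row in client["my_database"]["my_collection"].aggregate(pipeline):
#     print(row)
```

A pipeline like this fits on a slide next to a screenshot of its output, which makes it easy to show both the query and what it returned.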

Slide 4: EDA Results

Descriptive statistics (counts, missing values, distributions)
Key patterns found (correlations, trends, anomalies)
Clear visualizations (histograms, word clouds, bar charts, etc.)
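The descriptive statistics listed above can be computed in many ways. The following is a minimal sketch using only the Python standard library, assuming your records have already been loaded as a list of dictionaries; the `sample` data and the field names `category` and `rating` are made up for illustration.

```python
from collections import Counter
from statistics import mean, median


def summarize_field(records, field):
    """Return total count, missing count, and basic stats for a numeric field.

    A value counts as missing if the key is absent or its value is None.
    """
    values = [r.get(field) for r in records]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present) if present else None,
        "median": median(present) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical sample records standing in for your dataset.
sample = [
    {"category": "a", "rating": 4.0},
    {"category": "b", "rating": 2.0},
    {"category": "a", "rating": None},
    {"category": "a"},
]

rating_summary = summarize_field(sample, "rating")
category_counts = Counter(r["category"] for r in sample)

print(rating_summary)
print(category_counts.most_common())
```

The resulting counts can feed directly into a bar chart or histogram with whichever plotting library your team prefers.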

Slide 5: Vector Search Demo (if applicable)

Brief explanation of embeddings/vector search
Example query (e.g., “Find similar documents to this text…”)
Screenshot or sample output showing results and relevance scores
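For teams that include the vector search demo, a query might look roughly like the sketch below, which uses the Atlas `$vectorSearch` aggregation stage and projects the relevance score. The index name `vector_index`, the fields `embedding` and `title`, and the toy three-dimensional query vector are assumptions for illustration; a real query vector must come from the same embedding model, with the same dimensionality, as the stored vectors.

```python
def build_vector_search_pipeline(query_vector, limit=5):
    """Build a $vectorSearch pipeline that returns titles with scores.

    Assumes a hypothetical Atlas Vector Search index named "vector_index"
    defined on a hypothetical field "embedding".
    """
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                # Consider more candidates than results to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                # Relevance score assigned by Atlas Vector Search.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Toy vector for illustration only; use your embedding model's output.
vector_pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)

# Run with pymongo in the same way as any other aggregation:
# results = list(collection.aggregate(vector_pipeline))
```

Showing the top few results with their scores, alongside the query text, is usually enough to convey whether the ranking looks sensible.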

Slide 6: Key Insights & Next Steps

Main takeaways from your EDA
How these insights refine your research question/approach
Planned methods for the next milestone

4. Written EDA Report Components

Your report should include:

4.1. Introduction

Recap of your dataset and project goals
Why EDA matters for your dataset (structured vs. unstructured data, challenges, etc.)

4.2. Data Exploration

Summary statistics and descriptive analysis
Visualizations to highlight distributions, relationships, or anomalies
Screenshots or code snippets from MongoDB queries used for EDA

4.3. MongoDB Atlas Demonstration

Describe how the dataset is stored and queried
Example aggregation pipelines, filters, or schema exploration
Note any preprocessing steps performed in MongoDB
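One way to approach the schema exploration mentioned above is to count, inside MongoDB, how many documents lack each field. The sketch below builds such a pipeline; the field names `rating` and `title` are hypothetical. It relies on `$ifNull`, which treats an absent field and an explicit null the same way, so both are counted as missing.

```python
def build_missing_count_pipeline(fields):
    """Build a single $group stage counting missing values per field.

    A field is counted as missing when it is absent or null.
    Field names are assumed to be simple top-level names.
    """
    group = {"_id": None, "total": {"$sum": 1}}
    for field in fields:
        is_missing = {"$eq": [{"$ifNull": [f"${field}", None]}, None]}
        # Add 1 for each document where the field is missing, else 0.
        group[f"missing_{field}"] = {"$sum": {"$cond": [is_missing, 1, 0]}}
    return [{"$group": group}]


# Hypothetical field names; replace with fields from your collection.
missing_pipeline = build_missing_count_pipeline(["rating", "title"])

# With pymongo, the single result document holds all the counts:
# summary = next(collection.aggregate(missing_pipeline))
```

Reporting these counts in a small table gives readers a quick view of data completeness before any cleaning is applied.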

4.4. Vector Search Example (optional/if relevant)

Explain the embedding model used (if applicable)
Show a query and its top results
Discuss how vector search can support your project goals
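When explaining embeddings in the report, it may help to show what "similar" means numerically. The sketch below ranks toy two-dimensional vectors by cosine similarity, one common similarity measure; it is a conceptual illustration only, not how Atlas executes a search, and the document names and vectors are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, docs):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy embeddings; real embeddings have hundreds of dimensions.
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.0], toy_docs)

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```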

4.5. Key Findings & Implications

Insights gained from EDA that shape the next stage
Challenges encountered (e.g., missing data, schema complexity)
Adjustments to your project plan based on findings

4.6. References

Any tools, libraries, or methods cited

4.7. Format

Around 500 words
Write the report in Markdown or Jupyter Notebook format and save it as a .md or .ipynb file in your GitHub repository
Use tables, figures, and screenshots where helpful

5. Peer-review Activity

Each student will be assigned 2 EDA reports to review.
For each report, provide half a page of feedback in GitHub Issues.
Peer review accounts for 5% of your grade.

Your feedback will be graded on:

Preparedness – Deep understanding of the EDA and its results
Constructiveness – Actionable, supportive suggestions
Professionalism – Respectful and clear tone

Submission Instructions:

Go to the GitHub repository of the group whose report you are reviewing.
Create a GitHub Issue using the peer feedback template.
Fill in your review and submit.