Milestone 2: Exploratory Data Analysis (EDA)

1. Deliverable Description

The goal of this milestone is to conduct and present your Exploratory Data Analysis (EDA). You should demonstrate that you have explored, cleaned, and structured your dataset, and that you are prepared to move into deeper analysis/modeling. There are three deliverables:

  1. Group Presentation – Showcase your EDA results (due Oct 9)

  2. Written EDA Report – Around 500 words (2 pages) (due Oct 12)

  3. Peer Reviews – Review other groups’ EDA reports individually (due Oct 16)


2. Presentation Guidelines

  • Time: Maximum 8 minutes, plus 2 minutes for Q&A

  • Slides: Maximum 6 slides

  • Participation: All team members must contribute

  • Content:

    • Show EDA results with clear visuals (charts, summaries, distributions)

    • Highlight MongoDB Atlas database usage (screenshots, schema, or queries)

    • If applicable, include an example vector search query and results

    • Summarize key insights and how they shape the next stage of your project


3. Suggested 6-Slide Template

Slide 1: Title & Team

  • Project title

  • Team member names

  • Course name & date

  • Visual/logo related to your dataset

Slide 2: Dataset Recap

  • Source and type of data (structured/unstructured)

  • Size and characteristics (# records, features, text length, etc.)

  • Example snippet of raw data

Slide 3: MongoDB Atlas Setup

  • Screenshot of MongoDB Atlas dashboard with your dataset

  • Overview of schema (collections, documents, nested fields)

  • Example query (aggregation pipeline, filters, etc.)

Slide 4: EDA Results

  • Descriptive statistics (counts, missing values, distributions)

  • Key patterns found (correlations, trends, anomalies)

  • Clear visualizations (histograms, word clouds, bar charts, etc.)

Slide 5: Vector Search Demo (if applicable)

  • Brief explanation of embeddings/vector search

  • Example query (e.g., “Find similar documents to this text…”)

  • Screenshot or sample output showing results and relevance scores

Slide 6: Key Insights & Next Steps

  • Main takeaways from your EDA

  • How these insights refine your research question/approach

  • Planned methods for the next milestone


4. Written EDA Report Components

Your report should include:

4.1. Introduction

  • Recap of your dataset and project goals

  • Explain the importance of EDA for your dataset (structured vs unstructured, challenges, etc.)

4.2. Data Exploration

  • Summary statistics and descriptive analysis

  • Visualizations to highlight distributions, relationships, or anomalies

  • Screenshots or code snippets from MongoDB queries used for EDA
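
As an illustration only, the kind of summary statistics and missing-value counts listed above could be computed along these lines. This is a minimal sketch using the Python standard library; the sample records and the field names `rating` and `review` are hypothetical placeholders, not part of the assignment, and your own dataset will need its own fields.

```python
from statistics import mean, median

# Hypothetical sample records; in practice these would come from your dataset.
records = [
    {"title": "A", "rating": 4.0, "review": "Great product"},
    {"title": "B", "rating": None, "review": "Too expensive for what it does"},
    {"title": "C", "rating": 2.5},
    {"title": "D", "rating": 5.0, "review": ""},
]


def summarize(records, numeric_field, text_field):
    """Return record counts, missing-value tallies, and simple statistics."""
    # A numeric value counts as missing when the field is absent or None.
    values = [r[numeric_field] for r in records if r.get(numeric_field) is not None]
    # A text value counts as missing when the field is absent, None, or empty.
    texts = [r[text_field] for r in records if r.get(text_field)]
    return {
        "n_records": len(records),
        "missing_numeric": len(records) - len(values),
        "missing_text": len(records) - len(texts),
        "mean": mean(values),
        "median": median(values),
        "mean_text_words": mean(len(t.split()) for t in texts),
    }


summary = summarize(records, "rating", "review")
print(summary)
```

A table built from a dictionary like `summary` is usually easier to read in a short report than raw printed output.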

4.3. MongoDB Atlas Demonstration

  • Describe how the dataset is stored and queried

  • Example aggregation pipelines, filters, or schema exploration

  • Note any preprocessing steps performed in MongoDB
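
One possible shape for an example aggregation pipeline is sketched below. It only builds the pipeline as a plain Python list, so it runs without a database connection. The helper name `build_field_profile_pipeline` and the field names `category` and `price` are assumptions made for illustration; substitute the fields from your own schema.

```python
def build_field_profile_pipeline(group_field, numeric_field):
    """Build an aggregation pipeline that profiles one numeric field per group."""
    return [
        # Keep only documents that have the grouping field.
        {"$match": {group_field: {"$exists": True}}},
        {"$group": {
            "_id": f"${group_field}",
            "n_docs": {"$sum": 1},
            "avg_value": {"$avg": f"${numeric_field}"},
            # Count documents where the numeric field is null or absent.
            "n_missing": {"$sum": {"$cond": [
                {"$eq": [{"$ifNull": [f"${numeric_field}", None]}, None]},
                1,
                0,
            ]}},
        }},
        # Show the largest groups first and keep the output short.
        {"$sort": {"n_docs": -1}},
        {"$limit": 10},
    ]


# Hypothetical field names, for illustration only.
pipeline = build_field_profile_pipeline("category", "price")

# With PyMongo and a connected collection, the pipeline could then be run as:
#   for row in collection.aggregate(pipeline):
#       print(row)
```

Keeping the pipeline in a small function like this makes it easy to paste the same query into both the report and the presentation slide.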

4.4. Vector Search Example (optional/if relevant)

  • Explain the embedding model used (if applicable)

  • Show a query and its top results

  • Discuss how vector search can support your project goals
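
For groups that include a vector search example, a query could look roughly like the sketch below, which builds an Atlas `$vectorSearch` stage followed by a projection of the relevance score. The helper name, the index name `vector_index`, the `embedding` and `title` fields, and the toy three-dimensional query vector are all assumptions for illustration; a real query vector comes from your embedding model and must match the dimensions of your index.

```python
def build_vector_search_pipeline(query_vector, index_name, path, limit=5):
    """Build a pipeline that returns the top matches with their scores."""
    return [
        {"$vectorSearch": {
            "index": index_name,          # name of the Atlas Vector Search index
            "path": path,                 # document field that stores embeddings
            "queryVector": list(query_vector),
            # Consider more candidates than are returned; Atlas caps this at 10000.
            "numCandidates": min(limit * 20, 10000),
            "limit": limit,
        }},
        {"$project": {
            "_id": 0,
            "title": 1,                   # hypothetical display field
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]


# Toy query vector and hypothetical names, for illustration only.
vs_pipeline = build_vector_search_pipeline(
    [0.1, 0.2, 0.3], index_name="vector_index", path="embedding", limit=3
)

# With PyMongo and a connected collection, this could be run as:
#   results = list(collection.aggregate(vs_pipeline))
```

Reporting the `score` field next to each result makes it easier to discuss how relevant the top matches actually are.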

4.5. Key Findings & Implications

  • Insights gained from EDA that shape the next stage

  • Challenges encountered (e.g., missing data, schema complexity)

  • Adjustments to your project plan based on findings

4.6. References

  • Any tools, libraries, or methods cited

4.7. Format

  • Around 500 words

  • Write the report in Markdown or Jupyter Notebook format and save it as a .md or .ipynb file in your GitHub repository

  • Use tables, figures, and screenshots where helpful


5. Peer-review Activity

  • Each student will be assigned 2 EDA reports to review.

  • For each report, provide about half a page of feedback as a GitHub Issue.

  • Peer review accounts for 5% of your grade.

Your feedback will be graded on:

  • Preparedness – Deep understanding of the EDA and its results

  • Constructiveness – Actionable, supportive suggestions

  • Professionalism – Respectful and clear tone

Submission Instructions:

  1. Go to the GitHub repository of the group you are reviewing.

  2. Create a GitHub Issue using the peer feedback template.

  3. Fill in your review and submit.