✅ Reproducibility Checklist

This checklist ensures your RAG pipeline project is organized, reproducible, and easy to review.
Complete each item ✅ before submission.

🗂️ 1. Directory Structure

Goal: Keep your repository clean and predictable so others can navigate easily.

Expected structure:

├── src/
│   ├── pipeline.py
│   ├── retriever.py
│   └── utils.py
├── prompts/
│   ├── system_prompt_v1.txt
│   ├── human_prompt_v1.txt
│   └── README.md
├── data/
│   └── (dataset or ingestion scripts)
├── logs/
│   └── (LangSmith or trace logs)
├── notebooks/
│   └── (jupyter notebooks)
├── environment.yml
├── .env.example
└── README.md

✅ Checklist

[ ] Folder structure follows the example above
[ ] Each folder has a clear purpose (no mixed or temporary files)
[ ] No random or unused files (e.g., final_v2_copy.ipynb)
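A quick way to confirm the first checklist item is to test for the expected paths from a script. The sketch below is only an illustration: the function name missing_paths and the EXPECTED_PATHS list are hypothetical, and the list should be adjusted to match your own repository.

```python
from pathlib import Path

# Paths taken from the expected structure above, relative to the repository root.
# Adjust this list if your project layout differs.
EXPECTED_PATHS = [
    "src/pipeline.py",
    "src/retriever.py",
    "src/utils.py",
    "prompts/README.md",
    "data",
    "logs",
    "notebooks",
    "environment.yml",
    ".env.example",
    "README.md",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Directory structure looks complete.")
```

Run it from the repository root; an empty result means every expected file and folder is present.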

🧾 2. README.md

Goal: Help anyone clone your repo, set up the environment, and reproduce results.

✅ Checklist

[ ] Includes project overview and team info
[ ] Step-by-step Conda setup instructions
[ ] Clear run command or notebook usage example
[ ] Includes example query and expected output
[ ] Documents reproducibility (environment, prompts, logs)

🧠 3. Prompts in Separate Files

Goal: Prompts must not be hard-coded. Store them in prompts/ and load dynamically.

✅ Checklist

[ ] All system/human prompts stored as .txt, .yaml, or .json files
[ ] Prompts are read into the code (not embedded directly)
[ ] File names follow versioned naming convention (e.g., system_prompt_v1.txt)

✅ Do

system_prompt = load_prompt("prompts/system_prompt_v1.txt")

🚫 Don’t

system_prompt = "You are a helpful assistant that..."
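The load_prompt call in the "Do" example is not defined anywhere in this checklist. One possible implementation, assuming plain-text prompt files encoded as UTF-8, is sketched here; your project may prefer a YAML or JSON loader instead.

```python
from pathlib import Path


def load_prompt(path):
    """Read a prompt file and return its text without surrounding whitespace.

    Hypothetical helper: assumes a plain-text, UTF-8 encoded prompt file.
    """
    prompt_path = Path(path)
    if not prompt_path.is_file():
        # Fail early with a clear message instead of running with an empty prompt.
        raise FileNotFoundError(f"Prompt file not found: {prompt_path}")
    return prompt_path.read_text(encoding="utf-8").strip()


# Usage, from the repository root:
# system_prompt = load_prompt("prompts/system_prompt_v1.txt")
```

Raising on a missing file keeps a mistyped path from silently producing a pipeline that runs without its system prompt.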

🧭 4. Prompt Documentation

Goal: Explain how each prompt is used and what placeholders it expects.

✅ Checklist

[ ] prompts/README.md included
[ ] Each prompt file described (role, variables like {query} or {context})
[ ] Version or date noted in prompt file or header comment

Example prompts/README.md:

system_prompt_v1.txt  – defines the assistant’s role and tone; no placeholders
human_prompt_v1.txt   – template for the user query and retrieved context; expects {query} and {context}
eval_prompt_v1.txt    – optional evaluation prompt
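Documented placeholders are easier to trust when the code checks them. The sketch below uses Python's standard string.Formatter to list the placeholders in a template and to refuse to render when a value is missing. The helper names placeholders and render, and the sample template text, are assumptions made for illustration.

```python
from string import Formatter


def placeholders(template):
    """Return the set of placeholder names, such as {query}, used in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template, **values):
    """Fill a template, failing loudly if a placeholder has no value."""
    missing = placeholders(template) - set(values)
    if missing:
        raise KeyError(f"Missing values for placeholders: {sorted(missing)}")
    return template.format(**values)


# Hypothetical template text; the real one would live in the prompts/ folder.
human_template = "Context:\n{context}\n\nQuestion: {query}"
print(sorted(placeholders(human_template)))  # ['context', 'query']
print(render(human_template, context="(retrieved passages)", query="What is RAG?"))
```

The same placeholders function can be used to generate or verify the variable list in prompts/README.md.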

⚙️ 5. Conda Environment File

Goal: Allow anyone to recreate the same environment with matching Python and library versions.

✅ Checklist

[ ] environment.yml exists in repo root
[ ] Specifies exact Python version (e.g., python=3.10.14)
[ ] Includes all key dependencies (langchain, openai, pymongo, python-dotenv, etc.)
[ ] Team tested environment creation on a clean machine

✅ Do

name: rag-pipeline
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10.14
  - langchain=0.3.2
  - openai=1.50.1
  - pymongo=4.10.1
  - python-dotenv=1.0.1
  - jupyter

🚫 Don’t

dependencies:
  - python
  - langchain
  - openai

(Too vague: without pinned versions, each install resolves to whatever is current, so environments drift apart.)
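To catch unpinned packages before submission, a small script can scan environment.yml for entries without a version. This is a deliberately simple, line-based sketch that avoids a YAML dependency; the function name unpinned_dependencies is hypothetical, and it only understands plain "- name=version" entries.

```python
def unpinned_dependencies(environment_yml_text):
    """Return dependency entries that carry no '=' version pin.

    Line-based on purpose: it handles simple '- name=version' entries under
    'dependencies:'. Entries in a nested 'pip:' list are checked the same way.
    """
    unpinned = []
    in_dependencies = False
    for raw_line in environment_yml_text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if not line.startswith((" ", "-")):
            # A top-level key starts or ends the dependencies section.
            in_dependencies = line.strip() == "dependencies:"
            continue
        stripped = line.strip()
        if in_dependencies and stripped.startswith("- "):
            spec = stripped[2:].strip()
            if "=" not in spec and not spec.endswith(":"):
                unpinned.append(spec)
    return unpinned


example = """\
name: rag-pipeline
channels:
  - defaults
dependencies:
  - python=3.10.14
  - langchain=0.3.2
  - jupyter
"""
print(unpinned_dependencies(example))  # ['jupyter']
```

Applied to the "Do" example above, the check flags jupyter, which is the one entry there without a version.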

🔑 6. Environment Variables

Goal: Keep credentials out of the repository, but show others how to configure them.

✅ Checklist

[ ] .env.example file provided with placeholder keys
[ ] Real .env file excluded via .gitignore
[ ] README explains how to copy .env.example to .env

Example .env.example:

OPENAI_API_KEY=your_api_key_here
MONGODB_URI=your_mongo_connection_here
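Because .env.example already lists every required key, it can double as a startup check. The sketch below reads the key names from that file and reports any that are unset in the current environment. The function names are hypothetical, and it assumes the real .env has already been loaded, for example with load_dotenv from python-dotenv.

```python
import os


def required_keys(env_example_text):
    """Collect variable names from KEY=value lines, skipping comments and blanks."""
    keys = []
    for line in env_example_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_variables(env_example_text, environ=os.environ):
    """Return required variables that are unset or empty in the environment."""
    return [key for key in required_keys(env_example_text) if not environ.get(key)]


example_text = "OPENAI_API_KEY=your_api_key_here\nMONGODB_URI=your_mongo_connection_here\n"
print(required_keys(example_text))  # ['OPENAI_API_KEY', 'MONGODB_URI']
```

Calling missing_variables at the top of the pipeline gives a reviewer a clear message about which keys to add, rather than an authentication error later on.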

🧩 7. Naming Conventions

Goal: Maintain clarity, consistency, and traceability.

✅ Checklist

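| Category   | Convention                     | Example                               |
|------------|--------------------------------|---------------------------------------|
| Code files | lowercase with underscores     | `pipeline.py`, `query_transformer.py` |
| Notebooks  | prefix with milestone or step  | `03_rag_pipeline.ipynb`               |
| Prompts    | include role + version         | `system_prompt_v1.txt`                |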

✅ Do

system_prompt_v1.txt
trace_2025-10-26.json

🚫 Don’t

finalprompt2.txt
log_newest.json
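The conventions above can be expressed as regular expressions and checked automatically. The patterns below are one interpretation, not a fixed rule: the two-digit notebook prefix and the trace_YYYY-MM-DD.json form are assumptions inferred from the examples, and the names PATTERNS, follows_convention, and trace_filename are hypothetical.

```python
import re
from datetime import date

# One regular expression per category; tighten or loosen these to suit your team.
PATTERNS = {
    "code": re.compile(r"[a-z][a-z0-9_]*\.py"),
    "notebook": re.compile(r"\d{2}_[a-z0-9_]+\.ipynb"),
    "prompt": re.compile(r"[a-z]+_prompt_v\d+\.(txt|yaml|json)"),
    "trace": re.compile(r"trace_\d{4}-\d{2}-\d{2}\.json"),
}


def follows_convention(category, filename):
    """Return True if the whole filename matches the pattern for its category."""
    return PATTERNS[category].fullmatch(filename) is not None


def trace_filename(day=None):
    """Build a dated trace filename such as trace_2025-10-26.json."""
    day = day or date.today()
    return f"trace_{day.isoformat()}.json"


print(follows_convention("prompt", "system_prompt_v1.txt"))  # True
print(follows_convention("prompt", "finalprompt2.txt"))      # False
print(trace_filename(date(2025, 10, 26)))                    # trace_2025-10-26.json
```

Generating log names with a helper like trace_filename avoids ambiguous names such as log_newest.json, because the date is always part of the file name.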

🏁 Final Check Before Submission

[ ] Directory structure clean and consistent
[ ] README complete and tested
[ ] Prompts in separate, documented files
[ ] Conda environment file includes versions
[ ] .env.example present, .env excluded
[ ] Naming conventions followed throughout
