✅ Reproducibility Checklist

This checklist ensures your RAG pipeline project is organized, reproducible, and easy to review.
Complete each item ✅ before submission.


🗂️ 1. Directory Structure

Goal: Keep your repository clean and predictable so others can navigate easily.

Expected structure:

├── src/
│   ├── pipeline.py
│   ├── retriever.py
│   └── utils.py
├── prompts/
│   ├── system_prompt_v1.txt
│   ├── human_prompt_v1.txt
│   └── README.md
├── data/
│   └── (dataset or ingestion scripts)
├── logs/
│   └── (LangSmith or trace logs)
├── notebooks/
│   └── (Jupyter notebooks)
├── environment.yml
├── .env.example
├── README.md

✅ Checklist

  • [ ] Folder structure follows the example above

  • [ ] Each folder has a clear purpose (no mixed or temporary files)

  • [ ] No random or unused files (e.g., final_v2_copy.ipynb)
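
The structure check can be automated. The following is a minimal sketch, not part of the original checklist: the function name `missing_paths` and the `EXPECTED_PATHS` list are illustrative, and the list only covers the top-level entries shown in the expected structure.

```python
from pathlib import Path

# Top-level entries taken from the expected structure above.
# Adjust this list if your project layout differs.
EXPECTED_PATHS = [
    "src", "prompts", "data", "logs", "notebooks",
    "environment.yml", ".env.example", "README.md",
]


def missing_paths(repo_root: str) -> list[str]:
    """Return the expected top-level entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_PATHS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Missing:", missing if missing else "nothing")
```

Run it from the repository root; an empty result means every expected folder and file is present. It does not detect stray or temporary files, which still need a manual look.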


🧾 2. README.md

Goal: Help anyone clone your repo, set up the environment, and reproduce results.

✅ Checklist

  • [ ] Includes project overview and team info

  • [ ] Step-by-step Conda setup instructions

  • [ ] Clear run command or notebook usage example

  • [ ] Includes example query and expected output

  • [ ] Documents reproducibility (environment, prompts, logs)


🧠 3. Prompts in Separate Files

Goal: Prompts must not be hard-coded. Store them in prompts/ and load dynamically.

✅ Checklist

  • [ ] All system/human prompts stored as .txt, .yaml, or .json files

  • [ ] Prompts are read into the code (not embedded directly)

  • [ ] File names follow versioned naming convention (e.g., system_prompt_v1.txt)

✅ Do

system_prompt = load_prompt("prompts/system_prompt_v1.txt")

🚫 Don’t

system_prompt = "You are a helpful assistant that..."
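
The checklist does not define `load_prompt`, so the helper below is only one possible version, assuming prompts are plain UTF-8 text files. The function body and its error message are illustrative, not a required implementation.

```python
from pathlib import Path


def load_prompt(path: str) -> str:
    """Read a prompt file and return its text without surrounding whitespace.

    Hypothetical helper: the checklist only requires that prompts are
    loaded from files, not this exact function.
    """
    prompt_file = Path(path)
    if not prompt_file.is_file():
        # Fail loudly so a missing or misnamed prompt is caught early.
        raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
    return prompt_file.read_text(encoding="utf-8").strip()
```

Failing on a missing file, rather than falling back to a default string, keeps a hard-coded prompt from slipping back into the code.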

🧭 4. Prompt Documentation

Goal: Explain how each prompt is used and what placeholders it expects.

✅ Checklist

  • [ ] prompts/README.md included

  • [ ] Each prompt file described (role, variables like {query} or {context})

  • [ ] Version or date noted in prompt file or header comment

Example prompts/README.md:

system_prompt_v1.txt  – defines the assistant’s role and tone
human_prompt_v1.txt   – template for the user query and retrieved context; expects {query} and {context}
eval_prompt_v1.txt    – optional evaluation prompt
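
To keep the documented placeholders in sync with the prompt files, you can list the variables a template actually uses. This is a small sketch using only the standard library; the template text and the `placeholders` function are made up for illustration and assume prompts use Python `str.format` style fields such as {query}.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the names of the {placeholders} used in a prompt template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Hypothetical template; a real one would be loaded from prompts/.
human_template = "Context:\n{context}\n\nQuestion: {query}"

print(sorted(placeholders(human_template)))  # ['context', 'query']

# Filling the template with the retrieved context and the user query.
prompt = human_template.format(context="...retrieved passages...", query="What is RAG?")
```

Comparing this set against what prompts/README.md lists is a quick way to spot a renamed or forgotten variable.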

⚙️ 5. Conda Environment File

Goal: Allow anyone to recreate the same environment with matching Python and library versions.

✅ Checklist

  • [ ] environment.yml exists in repo root

  • [ ] Specifies exact Python version (e.g., python=3.10.14)

  • [ ] Includes all key dependencies (langchain, openai, pymongo, python-dotenv, etc.)

  • [ ] Team tested environment creation on a clean machine

✅ Do

name: rag-pipeline
channels:
  - defaults
dependencies:
  - python=3.10.14
  - langchain=0.3.2
  - openai=1.50.1
  - pymongo=4.10.1
  - python-dotenv=1.0.1
  - jupyter

🚫 Don’t

dependencies:
  - python
  - langchain
  - openai

(Too vague: versions are missing.)
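
A quick way to catch unpinned entries is to scan environment.yml for dependencies without a version. The sketch below is deliberately naive and its name `unpinned_dependencies` is illustrative: it assumes a flat dependency list like the examples above (no nested pip: section) and does plain line matching instead of real YAML parsing.

```python
def unpinned_dependencies(yaml_text: str) -> list[str]:
    """Return entries under 'dependencies:' that have no '=' version pin.

    Naive line-based parsing that assumes a flat list; use a real YAML
    parser for anything more complex.
    """
    unpinned = []
    in_dependencies = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line.startswith((" ", "-")):
            # A new top-level key such as 'name:' or 'channels:'.
            in_dependencies = stripped == "dependencies:"
            continue
        if in_dependencies and stripped.startswith("- "):
            package = stripped[2:].strip()
            if "=" not in package:
                unpinned.append(package)
    return unpinned


if __name__ == "__main__":
    with open("environment.yml", encoding="utf-8") as f:
        print("Unpinned:", unpinned_dependencies(f.read()))
```

Note that this check would also flag the bare jupyter entry in the "Do" example, so decide as a team whether tooling like that needs a pin as well.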


🔑 6. Environment Variables

Goal: Keep credentials out of the repository, but show others how to configure them.

✅ Checklist

  • [ ] .env.example file provided with placeholder keys

  • [ ] Real .env file excluded via .gitignore

  • [ ] README explains how to copy .env.example to .env

Example .env.example:

OPENAI_API_KEY=your_api_key_here
MONGODB_URI=your_mongo_connection_here
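
It helps to fail early when a key is missing or still holds its placeholder value. The sketch below uses only the standard library and assumes the variables are already in the process environment, for example after calling `load_dotenv()` from python-dotenv at startup. The function name and the "your_" placeholder check are assumptions based on the .env.example shown above.

```python
import os

# Keys taken from the .env.example above.
REQUIRED_VARS = ["OPENAI_API_KEY", "MONGODB_URI"]


def missing_env_vars(required=REQUIRED_VARS, environ=None) -> list[str]:
    """Return required variables that are unset, empty, or still a 'your_...' placeholder."""
    env = os.environ if environ is None else environ
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Assumption: placeholder values follow the 'your_..._here' pattern.
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Set these variables in .env first: {', '.join(missing)}")
```

The optional `environ` argument exists only so the check can be tested with a plain dictionary instead of the real environment.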

🧩 7. Naming Conventions

Goal: Maintain clarity, consistency, and traceability.

✅ Checklist

| Category | Convention | Example |
| --- | --- | --- |
| Code files | lowercase with underscores | pipeline.py, query_transformer.py |
| Notebooks | prefix with milestone or step | 03_rag_pipeline.ipynb |
| Prompts | include role + version | system_prompt_v1.txt |

✅ Do

system_prompt_v1.txt
trace_2025-10-26.json

🚫 Don’t

finalprompt2.txt
log_newest.json
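
Generating names in code keeps them consistent without manual effort. The sketch below is illustrative: `trace_filename`, `is_valid_prompt_name`, and the regular expression are one interpretation of "role + version" and of the dated trace name in the "Do" example, not rules set by this checklist.

```python
import re
from datetime import date

# Assumed pattern: <role>_prompt_v<number> with one of the allowed extensions.
PROMPT_NAME = re.compile(r"[a-z]+_prompt_v\d+\.(txt|yaml|json)")


def trace_filename(day: date) -> str:
    """Build a dated trace log name such as trace_2025-10-26.json."""
    return f"trace_{day.isoformat()}.json"


def is_valid_prompt_name(filename: str) -> bool:
    """Check a prompt file name against the assumed role + version pattern."""
    return PROMPT_NAME.fullmatch(filename) is not None


print(trace_filename(date(2025, 10, 26)))            # trace_2025-10-26.json
print(is_valid_prompt_name("system_prompt_v1.txt"))  # True
print(is_valid_prompt_name("finalprompt2.txt"))      # False
```

ISO dates (YYYY-MM-DD) sort correctly by name, which is why they work better than labels like "newest".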

🏁 Final Check Before Submission

  • [ ] Directory structure clean and consistent

  • [ ] README complete and tested

  • [ ] Prompts in separate, documented files

  • [ ] Conda environment file includes versions

  • [ ] .env.example present, .env excluded

  • [ ] Naming conventions followed throughout