Reproducibility Checklist
This checklist ensures your RAG pipeline project is organized, reproducible, and easy to review.
Complete each item before submission.
1. Directory Structure
Goal: Keep your repository clean and predictable so others can navigate easily.
Expected structure:
├── src/
│   ├── pipeline.py
│   ├── retriever.py
│   └── utils.py
├── prompts/
│   ├── system_prompt.txt
│   ├── human_prompt.txt
│   └── README.md
├── data/
│   └── (dataset or ingestion scripts)
├── logs/
│   └── (LangSmith or trace logs)
├── notebooks/
│   └── (Jupyter notebooks)
├── environment.yml
├── .env.example
└── README.md
Checklist
[ ] Folder structure follows the example above
[ ] Each folder has a clear purpose (no mixed or temporary files)
[ ] No random or unused files (e.g., final_v2_copy.ipynb)
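The structure check can be automated. The following is a minimal sketch, not part of the original checklist: the `EXPECTED_PATHS` list and the `missing_paths` helper are illustrative names, and the paths are copied from the expected structure above.

```python
from pathlib import Path

# Paths copied from the expected structure; adjust if your layout differs.
EXPECTED_PATHS = [
    "src/pipeline.py",
    "src/retriever.py",
    "src/utils.py",
    "prompts/system_prompt.txt",
    "prompts/human_prompt.txt",
    "prompts/README.md",
    "data",
    "logs",
    "notebooks",
    "environment.yml",
    ".env.example",
    "README.md",
]


def missing_paths(repo_root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Directory structure matches the checklist.")
```

Running the script from the repository root lists anything that is absent. It does not detect stray or unused files, so that item still needs a manual review.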
2. README.md
Goal: Help anyone clone your repo, set up the environment, and reproduce results.
Checklist
[ ] Includes project overview and team info
[ ] Step-by-step Conda setup instructions
[ ] Clear run command or notebook usage example
[ ] Includes example query and expected output
[ ] Documents reproducibility (environment, prompts, logs)
3. Prompts in Separate Files
Goal: Prompts must not be hard-coded. Store them in prompts/ and load dynamically.
Checklist
[ ] All system/human prompts stored as .txt, .yaml, or .json files
[ ] Prompts are read into the code (not embedded directly)
[ ] File names follow versioned naming convention (e.g., system_prompt_v1.txt)
Do:
system_prompt = load_prompt("prompts/system_prompt_v1.txt")
Don't:
system_prompt = "You are a helpful assistant that..."
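The checklist does not define the `load_prompt` helper used in the Do example. One possible implementation, assuming plain-text prompt files encoded as UTF-8, is sketched here:

```python
from pathlib import Path


def load_prompt(path: str) -> str:
    """Read a prompt file and return its text without surrounding whitespace."""
    prompt_file = Path(path)
    if not prompt_file.is_file():
        # Fail early with a clear message instead of sending an empty prompt.
        raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
    return prompt_file.read_text(encoding="utf-8").strip()


# Usage, matching the Do example:
# system_prompt = load_prompt("prompts/system_prompt_v1.txt")
```

Prompts stored as .yaml or .json would need a parsing step on top of this; the sketch covers only the .txt case.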
4. Prompt Documentation
Goal: Explain how each prompt is used and what placeholders it expects.
Checklist
[ ] prompts/README.md included
[ ] Each prompt file described (role, variables like {query} or {context})
[ ] Version or date noted in prompt file or header comment
Example prompts/README.md:
system_prompt.txt: defines the assistant's role and tone
human_prompt.txt: template for user query and retrieved context
eval_prompt.txt: optional evaluation prompt
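Documented placeholders can also be checked in code. The sketch below assumes the prompt files use Python `str.format` style fields such as {query} and {context}; the helper names and the sample template text are hypothetical.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the named placeholders (e.g. {query}) found in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill_prompt(template: str, **values: str) -> str:
    """Fill a template, failing loudly if a placeholder has no value."""
    missing = placeholders(template) - set(values)
    if missing:
        raise KeyError(f"Missing placeholder values: {sorted(missing)}")
    return template.format(**values)


# Hypothetical template text; the real one would be read from prompts/human_prompt.txt.
template = "Context:\n{context}\n\nQuestion: {query}"
print(sorted(placeholders(template)))
print(fill_prompt(template, context="...", query="What is RAG?"))
```

Comparing the output of `placeholders` with the variables listed in prompts/README.md is a quick way to catch documentation that has drifted from the prompt files.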
5. Conda Environment File
Goal: Allow anyone to recreate the same environment with matching Python and library versions.
Checklist
[ ] environment.yml exists in repo root
[ ] Specifies exact Python version (e.g., python=3.10.14)
[ ] Includes all key dependencies (langchain, openai, pymongo, python-dotenv, etc.)
[ ] Team tested environment creation on a clean machine
Do:
name: rag-pipeline
channels:
  - defaults
dependencies:
  - python=3.10.14
  - langchain=0.3.2
  - openai=1.50.1
  - pymongo=4.10.1
  - python-dotenv=1.0.1
  - jupyter
Don't:
dependencies:
  - python
  - langchain
  - openai
(Too vague: versions are missing.)
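A rough way to spot unpinned packages is to scan the dependency lines of environment.yml. This sketch uses simple line matching rather than a YAML parser, so it is an approximation, and the `unpinned_dependencies` helper is a hypothetical name.

```python
import re

# Matches a conda dependency line such as "- python=3.10.14" or "- jupyter".
DEP_LINE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)\s*(=.*)?$")


def unpinned_dependencies(environment_yml_text: str) -> list[str]:
    """Return names listed under 'dependencies:' without a version."""
    unpinned = []
    in_dependencies = False
    for line in environment_yml_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and not stripped.startswith("-"):
            # A key line such as "channels:" or "dependencies:".
            in_dependencies = stripped == "dependencies:"
            continue
        match = DEP_LINE.match(line)
        if in_dependencies and match and match.group(2) is None:
            unpinned.append(match.group(1))
    return unpinned


if __name__ == "__main__":
    sample = "dependencies:\n  - python\n  - langchain=0.3.2\n"
    print(unpinned_dependencies(sample))
```

Applied to the Do example, the sketch would flag only jupyter, which is listed there without a version. Applied to the Don't example, it would flag every entry.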
6. Environment Variables
Goal: Keep credentials out of the repository, but show others how to configure them.
Checklist
[ ] .env.example file provided with placeholder keys
[ ] Real .env file excluded via .gitignore
[ ] README explains how to copy .env.example to .env
Example .env.example:
OPENAI_API_KEY=your_api_key_here
MONGODB_URI=your_mongo_connection_here
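At startup, the pipeline can verify that the keys from .env.example have real values. The sketch below uses only the standard library and assumes the variables are already in the process environment, for example after calling `load_dotenv()` from python-dotenv. The `missing_env_keys` helper and the rule that values starting with `your_` are placeholders are assumptions for illustration.

```python
import os

# Keys copied from the .env.example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MONGODB_URI")


def missing_env_keys(env=None) -> list[str]:
    """Return required keys that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        # Assumption: untouched placeholders start with "your_".
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Set these variables in .env:", ", ".join(missing))
```

Failing early with the names of the missing keys is usually easier to debug than an authentication error deep inside the pipeline.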
7. Naming Conventions
Goal: Maintain clarity, consistency, and traceability.
Checklist
| Category | Convention | Example |
|---|---|---|
| Code files | lowercase with underscores | |
| Notebooks | prefix with milestone or step | |
| Prompts | include role + version | |
Do:
system_prompt_v1.txt
trace_2025-10-26.json
Don't:
finalprompt2.txt
log_newest.json
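The conventions can be expressed as regular expressions and checked automatically. The patterns below are inferred from the Do examples and the convention column, not specified by the checklist, so treat them as assumptions to adapt.

```python
import re

# Assumed patterns, inferred from the Do examples above.
PROMPT_NAME = re.compile(r"^[a-z]+(_[a-z]+)*_v\d+\.(txt|yaml|json)$")
TRACE_NAME = re.compile(r"^trace_\d{4}-\d{2}-\d{2}\.json$")
CODE_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*\.py$")


def follows_convention(filename: str, pattern: re.Pattern) -> bool:
    """Return True if the file name matches the given naming pattern."""
    return pattern.match(filename) is not None


if __name__ == "__main__":
    for name in ("system_prompt_v1.txt", "finalprompt2.txt"):
        print(name, follows_convention(name, PROMPT_NAME))
    for name in ("trace_2025-10-26.json", "log_newest.json"):
        print(name, follows_convention(name, TRACE_NAME))
```

The notebook convention (prefix with milestone or step) is left out because the checklist gives no example to infer a pattern from.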
Final Check Before Submission
[ ] Directory structure clean and consistent
[ ] README complete and tested
[ ] Prompts in separate, documented files
[ ] Conda environment file includes versions
[ ] .env.example present, .env excluded
[ ] Naming conventions followed throughout