# How to connect MongoDB Atlas with LangChain

## 1. Connect to your MongoDB cluster

- Create a file called `credentials_mongodb.json` in the same directory as this notebook.
- Fill in your own cluster host, username, and password:
```
{
    "host": "cluster0.XXX.mongodb.net",
    "username": "XXX",
    "password": "XXX"
}
```

Run the following cell to establish a connection to your MongoDB cluster.

In [1]:
from pymongo import MongoClient  # client used to connect to MongoDB
import json  # used to load the credentials file
import urllib.parse  # used to percent-encode the credentials

# load credentials from json file
with open('credentials_mongodb.json') as f:
    login = json.load(f)

# assign credentials to variables; percent-encode username and password
# so that reserved characters such as '@', ':' or '/' do not break the URI
username = urllib.parse.quote_plus(login['username'])
password = urllib.parse.quote_plus(login['password'])
host = login['host']
url = "mongodb+srv://{}:{}@{}/?retryWrites=true&w=majority".format(username, password, host)

# connect to the database
client = MongoClient(url)
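
The credentials in the connection string have to be percent-encoded, otherwise a password containing characters such as `@` or `/` produces an invalid URI. A minimal sketch of that step in isolation follows; `build_mongo_uri` is a hypothetical helper name used only for illustration (it is not part of pymongo), and the values are placeholders.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username: str, password: str, host: str) -> str:
    """Build a mongodb+srv URI with percent-encoded credentials.

    Hypothetical helper for illustration; not part of pymongo.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


# Placeholder values: the '@' and '/' in the password are encoded as %40 and %2F.
example_uri = build_mongo_uri("demo_user", "p@ss/word", "cluster0.XXX.mongodb.net")
print(example_uri)
```

Once the client exists, `client.admin.command("ping")` is a common way to confirm that the cluster is actually reachable before going further.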

Select your database and collection, for example:

In [4]:
database = client['demo_vector_db']
collection = database['reviews']
collection.find_one()  # check connection

{'_id': 2,
 'product': 'coffee',
 'text': 'Espresso had a strong flavor but the aftertaste was harsh.'}

## 2. Load your API keys

First, set up API keys for LangSmith and Google AI Studio.

1. Create an empty `.env` file in the same working directory as this notebook

Copy the following into the `.env` file:

```
# Environment variables for LangSmith
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

# add google api key for testing
export GOOGLE_API_KEY="..."
```

2. Create a LangSmith account and API key
- Go to [LangSmith](https://smith.langchain.com).
- Sign in with your GitHub account.
- Open Settings → API Keys.
- Click Create API key (personal) and copy it.
- Paste the API key into the `.env` file.

3. Get a Gemini API key
- Visit [Google AI Studio](https://aistudio.google.com) → Get API Key and sign in.
- Click Create API key and copy it.
- Paste it in the `.env` file

Once your `.env` file is set up, run the cell below to load the API keys into this notebook's environment.

In [8]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

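# Map the LangSmith variables to the names LangChain reads
os.environ['LANGCHAIN_TRACING_V2'] = os.getenv('LANGSMITH_TRACING', 'false')
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = os.environ['LANGSMITH_API_KEY']  # KeyError if missing from .env

# GOOGLE_API_KEY is already in the environment after load_dotenv(); fail early if it is missing
if not os.getenv('GOOGLE_API_KEY'):
    raise RuntimeError('GOOGLE_API_KEY is not set; check your .env file')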
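
A missing key in `.env` tends to surface later as a confusing authentication error, so it can help to check all required variables in one place. The sketch below assumes a hypothetical helper named `require_env` and uses a placeholder variable (`DEMO_API_KEY`) so that it runs without a real `.env` file.

```python
import os


def require_env(names: list[str]) -> dict[str, str]:
    """Return the requested environment variables, failing early if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Placeholder variable so the sketch is self-contained;
# in the notebook the values would come from load_dotenv().
os.environ["DEMO_API_KEY"] = "placeholder"
demo_keys = require_env(["DEMO_API_KEY"])
print(list(demo_keys))
```

In the notebook, the call would be `require_env(["LANGSMITH_API_KEY", "GOOGLE_API_KEY"])` right after `load_dotenv()`.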

## 3. Create your vector embeddings

In [6]:
# 🧩 Imports
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import GoogleGenerativeAIEmbeddings 
from langchain.chat_models import init_chat_model
from langchain_mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient


In [9]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai")



In [12]:
# create embedding for all documents in the collection
docs = collection.find({})
for doc in docs:
    text = doc['text']
    vector = embeddings.embed_query(text)
    collection.update_one({'_id': doc['_id']}, {'$set': {'embedding': vector}})

collection.find_one()  # check the embedding field has been added

{'_id': 2,
 'product': 'coffee',
 'text': 'Espresso had a strong flavor but the aftertaste was harsh.',
 'embedding': [0.005555155221372843,
  -0.004266100004315376,
  0.0005982645670883358,
  -0.07797335088253021,
  -0.009961280040442944,
  ...]}
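
The loop above sends one embedding request per document. For larger collections, a batched variant reduces the number of calls. The sketch below is self-contained and uses a stand-in embedder; it assumes the real embedder exposes `embed_documents(texts)`, as LangChain embedding classes do. The helper names `batched` and `embed_in_batches` are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def embed_in_batches(
    docs: Iterable[dict],
    embed_documents: Callable[[list[str]], list[list[float]]],
    batch_size: int = 16,
) -> list[tuple]:
    """Return (_id, vector) pairs, calling the embedder once per batch."""
    results = []
    for batch in batched(docs, batch_size):
        vectors = embed_documents([doc["text"] for doc in batch])
        results.extend((doc["_id"], vec) for doc, vec in zip(batch, vectors))
    return results


# Stand-in embedder so the sketch runs without an API key:
# it maps each text to a one-element vector holding its length.
def fake_embed_documents(texts: list[str]) -> list[list[float]]:
    return [[float(len(t))] for t in texts]


sample_docs = [
    {"_id": 1, "text": "latte"},
    {"_id": 2, "text": "espresso"},
    {"_id": 3, "text": "tea"},
]
pairs = embed_in_batches(sample_docs, fake_embed_documents, batch_size=2)
print(pairs)
```

In the notebook, `embeddings.embed_documents` would replace the stand-in, and each `(_id, vector)` pair would be written back with `collection.update_one` as in the cell above.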

## 4. Create a vector store

In [16]:
# Now let's create a vectorstore
vector_store = MongoDBAtlasVectorSearch(
    collection=collection, # use the collection we created above
    embedding=embeddings, # use the embedding model we created above
    index_name="vector_index_1", # name of the index
    relevance_score_fn="cosine"
)
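
The vector store refers to an Atlas Vector Search index named `vector_index_1`, which has to exist on the collection already; this notebook does not show how it was created. The sketch below builds an index definition that would match the `embedding` field written earlier. The dimension count of 3072 is an assumption, so check `len(vector)` on one of your stored embeddings and use that value. `vector_index_definition` is a hypothetical helper name.

```python
import json


def vector_index_definition(path: str, num_dimensions: int, similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for a single vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# Assumption: 3072 dimensions; replace with the length of your own vectors.
definition = vector_index_definition("embedding", num_dimensions=3072)
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Atlas UI when creating a Vector Search index with the JSON editor. The `similarity` value should agree with the `relevance_score_fn` passed to the vector store.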

In [21]:
retriever = vector_store.as_retriever()
retriever.invoke("Give me a salad")

[Document(id='4', metadata={'_id': 4, 'product': 'salad'}, page_content='Fresh greens and a light dressing—very refreshing.'),
 Document(id='3', metadata={'_id': 3, 'product': 'pizza'}, page_content='Crispy crust, generous toppings, and the sauce was tangy.'),
 Document(id='5', metadata={'_id': 5, 'product': 'cake'}, page_content='Moist chocolate cake with rich frosting. A perfect dessert.'),
 Document(id='1', metadata={'_id': 1, 'product': 'coffee'}, page_content='The latte was smooth and not too bitter. Loved it!')]
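
The salad review ranks first because its embedding is closest to the query embedding under cosine similarity, the measure selected with `relevance_score_fn="cosine"`. As a rough illustration of what that measure computes (Atlas performs the actual search server-side), here is a plain-Python sketch with toy vectors:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors: same direction, orthogonal, and opposite direction.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Because the measure depends only on direction, not on vector length, a short review and a long one can score equally if they point the same way in embedding space.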

## 5. Create your prompt template

In [22]:
# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
from langchain_core.prompts import ChatPromptTemplate  # needed for the prompt templates
from langchain_core.output_parsers import StrOutputParser

prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives 
    | llm 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [25]:
# get the questions
questions = generate_queries.invoke({"question": "What is something sweet and fruity?"})


In [26]:
questions

['Describe a dessert or snack that is both sugary and berry-like.',
 'Can you list some foods or drinks that have a sweet and fruit-derived flavor profile?',
 "I'm looking for items characterized by a sugary taste and fruit essence. What comes to mind?",
 'Suggest a treat or beverage that combines sweetness with natural fruit flavors.',
 'What are some delightful edibles or potables known for their blend of sweetness and fruitiness?']
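
Splitting the model output on `"\n"` works here because the model returned exactly five clean lines. If the model adds blank lines or stray whitespace, the list would contain empty queries that are then sent to the retriever. A slightly more defensive parser, sketched under the assumption that one query per line is still the expected format (`parse_queries` is a hypothetical name):

```python
def parse_queries(text: str) -> list[str]:
    """Split model output into one query per line, dropping blanks and surrounding whitespace."""
    lines = (line.strip() for line in text.split("\n"))
    return [line for line in lines if line]


# Example with a blank line and trailing whitespace in the raw output.
raw_output = "What is sweet?\n\n  What tastes fruity?  \n"
print(parse_queries(raw_output))
```

It could replace the lambda at the end of `generate_queries`, since a plain function is accepted in the same position of the chain.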

In [27]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve: generate query variants, run the retriever on each, then deduplicate
retrieval_chain = generate_queries | retriever.map() | get_unique_union
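
`get_unique_union` deduplicates through a `set`, so the order in which documents were retrieved is not preserved. If the ranking matters, an order-preserving variant keyed on the document id is one alternative. The sketch below uses a small stand-in `Doc` type instead of LangChain's `Document` so that it runs on its own; it assumes each document carries a unique `id`, as the retriever output above suggests.

```python
from collections import namedtuple

# Stand-in for LangChain's Document, for illustration only.
Doc = namedtuple("Doc", ["id", "page_content"])


def unique_union_ordered(documents: list[list]) -> list:
    """Flatten a list of result lists, keeping the first occurrence of each document id."""
    seen = set()
    unique = []
    for sublist in documents:
        for doc in sublist:
            if doc.id not in seen:
                seen.add(doc.id)
                unique.append(doc)
    return unique


# Two result lists that overlap on document '4'.
results = [
    [Doc("4", "salad review"), Doc("3", "pizza review")],
    [Doc("5", "cake review"), Doc("4", "salad review")],
]
merged = unique_union_ordered(results)
print([doc.id for doc in merged])
```

This version also avoids the serialization round trip through `dumps` and `loads`.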

In [28]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
# create runnable chain

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)
question = "What is something sweet and fruity?"
final_rag_chain.invoke({"question":question})



"Based on the reviews I have, none of the products are described as both sweet and fruity.\n\nHere's what the reviews mention:\n*   **Cake:** A moist chocolate cake with rich frosting (sweet, but chocolate, not fruity).\n*   **Pizza:** Crispy crust, generous toppings, and tangy sauce.\n*   **Coffee:** An espresso with a strong, harsh flavor, and a smooth, not too bitter latte.\n*   **Salad:** Fresh greens and a light, refreshing dressing.\n\nIt seems like we don't have any fruity options in these reviews!"