
Build RAG Application (LangChain)


1. Purpose

Ground LLM responses in fact. LLMs hallucinate. RAG (Retrieval Augmented Generation) solves this by fetching relevant context from your private docs and forcing the LLM to "Answer using only the provided context."


2. When to Use / When Not to Use

Use This Workflow When

  • Building a "Chat with your PDF" feature.
  • Creating an internal Knowledge Base bot.
  • Answers must be backed by verifiable citations.

Do NOT Use This Workflow When

  • Fine-tuning behavior is needed (Use LoRA).
  • The context window fits all data (Just stuff the context).
  • Questions require global reasoning across all docs ("Summarize all 10k files"). RAG is for specific lookup.

3. Inputs

Required Inputs

  • [[DOCUMENT_SOURCE]]: PDFs, Text files, Notion API.
  • [[LLM_MODEL]]: GPT-4, Claude 3, Llama 3.
  • [[VECTOR_DB]]: Pinecone, Chroma, FAISS, PGVector.

4. Outputs

  • Ingestion Pipeline: Load -> Split -> Embed -> Store.
  • Retrieval Chain: Query -> Search -> Prompt -> Answer.

5. Preconditions

  • API Keys (OpenAI / Anthropic / Pinecone).
  • Python environment with langchain installed.

6. Procedure

Phase 1: Ingestion (Indexing)

  1. Action: Load Documents.

    • Expected Output: List of Document objects.
    • Notes: Use LangChain Loaders (PyPDFLoader, WebBaseLoader).
  2. Action: Split Text (Chunking).

    • Expected Output: Chunks of ~500-1000 tokens with overlap.
    • Notes: Crucial step. Too small = missing context. Too big = confuses retrieval. Use RecursiveCharacterTextSplitter.
  3. Action: Embed & Store.

    • Expected Output: Vectors in [[VECTOR_DB]].
    • Notes: Use a fast Embedding Model (e.g., text-embedding-3-small).
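The chunking step above can be sketched without any dependencies. This is a minimal illustration of fixed-size character windows with overlap, not the actual `RecursiveCharacterTextSplitter`, which additionally prefers splitting on paragraph and sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows with overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: the real
    splitter tries paragraph, then sentence, then character boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

The overlap means the tail of each chunk repeats at the head of the next, so a sentence cut at a boundary still appears whole in one of the two chunks.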

Phase 2: Retrieval Chain

  1. Action: Define Prompt Template.

    • Expected Output: "Context: {context}. Question: {question}. Answer:".
  2. Action: Construct Chain.

    • Expected Output: create_retrieval_chain linking Retriever and LLM.
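The Query -> Search -> Prompt flow can be illustrated with a toy in-memory retriever. The `embed` function here is a hypothetical stand-in (bag-of-words counts) for a real embedding model, and the cosine ranking mimics what the vector store does; in LangChain the equivalent wiring is `create_retrieval_chain`:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stub embedding: bag-of-words token counts. A real pipeline would
    # call an embedding model such as text-embedding-3-small.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt template from step 1.
    context = "\n".join(retrieve(query, docs))
    return f"Context: {context}. Question: {query}. Answer:"
```

The final prompt is what gets sent to [[LLM_MODEL]]; the "Stuff" strategy simply concatenates all retrieved chunks into `{context}`.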

Phase 3: Evaluation

  1. Action: Test Faithfulness.
    • Expected Output: Verify answer comes from the context, not pre-training knowledge.
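A crude, automatable proxy for faithfulness is token overlap between the answer and the retrieved context. This heuristic is a sketch only; serious evaluation would use an LLM-as-judge or a framework such as RAGAS:

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    Low scores suggest the model drew on pre-training knowledge rather
    than the supplied context. A rough proxy, not a real evaluation.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

In practice you would set a threshold (e.g., flag answers scoring below 0.5 for manual review); the threshold value is an assumption to tune per corpus.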

7. Quality Gates

  • [ ] Chunk Overlap: Overlap exists (e.g., 100 chars) to prevent cutting sentences mid-thought.
  • [ ] Prompt Guardrails: System prompt explicitly forbids making up info.
  • [ ] Latency: Retrieval + Generation < 5s (Stream output if longer).
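The latency gate can be enforced with a thin timing wrapper around the pipeline's entry point. `ask` here is the hypothetical query function your chain exposes (see the Paste Prompt); the 5-second budget comes from the gate above:

```python
import time


def timed_ask(ask, query: str, budget_s: float = 5.0):
    """Call ask(query) and report whether it met the latency budget.

    Returns (answer, elapsed_seconds, within_budget). If within_budget
    is False, switch the chain to streaming output instead.
    """
    start = time.perf_counter()
    answer = ask(query)
    elapsed = time.perf_counter() - start
    return answer, elapsed, elapsed < budget_s
```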

8. Failure Handling

Irrelevant Context

  • Symptoms: LLM says "I don't know" or gives wrong info because the Retriever found bad chunks.
  • Recovery: Tune Chunk Size/Count (k). Use Hybrid Search (Keyword + Semantic). Re-rank results.
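Hybrid search blends a keyword signal with the semantic one, which rescues queries containing exact terms (IDs, error codes) that embeddings handle poorly. A minimal sketch, with the semantic scorer passed in as a callable and `alpha` as an assumed tuning weight:

```python
def keyword_score(query: str, doc: str) -> float:
    # Keyword component: fraction of query terms present in the document.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def hybrid_rank(query: str, docs: list[str], semantic_score, alpha: float = 0.5) -> list[str]:
    """Rank docs by a weighted blend of semantic and keyword scores.

    alpha weights the semantic side; 1 - alpha weights exact-term matches.
    semantic_score(query, doc) -> float is supplied by your vector store.
    """
    scored = [
        (alpha * semantic_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Production stacks typically use BM25 for the keyword leg and reciprocal rank fusion instead of a linear blend; this sketch only shows the shape of the combination.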

Hallucination

  • Symptoms: LLM ignores the retrieved context and fabricates answers.
  • Recovery: Reduce Temperature to 0. Strengthen System Prompt instructions ("Say 'I don't know' if unclear").

9. Paste Prompt

TIP

One-Click Agent Invocation: Copy the prompt below, replace placeholders, and paste into your agent.

text
Role: Act as a GenAI Engineer.
Task: Execute the RAG implementation workflow.

## Objective
Build RAG pipeline for [[DOCUMENT_SOURCE]] using [[VECTOR_DB]].

## Inputs
- **Model**: [[LLM_MODEL]]

## Procedure
Execute the following phases:

1. **Ingest**:
   - Helper: Load Docs using LC Loader.
   - Helper: Chunk with `RecursiveCharacterTextSplitter` (1000/200).
   - Helper: Upsert to [[VECTOR_DB]].

2. **Chain**:
   - Create Retriever (k=5).
   - Create "Stuff" Documents Chain (`create_stuff_documents_chain`).
   - Create retrieval chain (`create_retrieval_chain`).

3. **Interact**:
   - Expose `ask(query)` function.

## Quality Gates
- [ ] Splitter preserves sentence structure.
- [ ] Temperature = 0.
- [ ] Source citations included in response.

## Constraints
- Output: Python Code.
- Library: LangChain v0.1+.

## Command
Write the Ingestion script.
