Build RAG Application (LangChain)
1. Purpose
Ground LLM responses in fact. LLMs hallucinate. RAG (Retrieval-Augmented Generation) mitigates this by fetching relevant context from your private docs and instructing the LLM to "Answer using only the provided context."
2. When to Use / When Not to Use
Use This Workflow When
- Building a "Chat with your PDF" feature.
- Creating an internal Knowledge Base bot.
- Answers must include verifiable citations.
Do NOT Use This Workflow When
- Fine-tuning behavior is needed (Use LoRA).
- The context window fits all data (Just stuff the context).
- Questions require global reasoning across all docs ("Summarize all 10k files"). RAG is for specific lookup.
3. Inputs
Required Inputs
- [[DOCUMENT_SOURCE]]: PDFs, Text files, Notion API.
- [[LLM_MODEL]]: GPT-4, Claude 3, Llama 3.
- [[VECTOR_DB]]: Pinecone, Chroma, FAISS, PGVector.
4. Outputs
- Ingestion Pipeline: Load -> Split -> Embed -> Store.
- Retrieval Chain: Query -> Search -> Prompt -> Answer.
5. Preconditions
- API Keys (OpenAI / Anthropic / Pinecone).
- Python environment with `langchain` installed.
6. Procedure
Phase 1: Ingestion (Indexing)
Action: Load Documents.
- Expected Output: List of `Document` objects.
- Notes: Use LangChain loaders (`PyPDFLoader`, `WebBaseLoader`).
Action: Split Text (Chunking).
- Expected Output: Chunks of ~500-1000 tokens with overlap.
- Notes: Crucial step. Too small = missing context. Too big = confuses retrieval. Use `RecursiveCharacterTextSplitter`.
Action: Embed & Store.
- Expected Output: Vectors in [[VECTOR_DB]].
- Notes: Use a fast embedding model (e.g. `text-embedding-3-small`).
Phase 2: Retrieval Chain
Action: Define Prompt Template.
- Expected Output: "Context: {context}. Question: {question}. Answer:".
Action: Construct Chain.
- Expected Output: `create_retrieval_chain` linking Retriever and LLM.
Phase 3: Evaluation
- Action: Test Faithfulness.
- Expected Output: Verify answer comes from the context, not pre-training knowledge.
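A crude, dependency-free sketch of a faithfulness check. Production evaluations typically use an LLM judge (e.g. RAGAS-style faithfulness scoring); treat the word-overlap heuristic and the 0.5 threshold below as illustrative assumptions only:

```python
import re

def is_grounded(answer: str, chunks: list[str], threshold: float = 0.5) -> bool:
    """Heuristic: the share of answer words that also appear in the
    retrieved chunks must exceed `threshold`."""
    answer_words = set(re.findall(r"[a-z']+", answer.lower()))
    context_words = set(re.findall(r"[a-z']+", " ".join(chunks).lower()))
    if not answer_words:
        return False
    return len(answer_words & context_words) / len(answer_words) >= threshold

print(is_grounded(
    "Refunds are allowed within thirty days.",
    ["Our policy: refunds are allowed within thirty days of purchase."],
))
```

An answer drawn from pre-training knowledge tends to use vocabulary absent from the retrieved chunks, which this check flags; it will not catch subtle paraphrased fabrications.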
7. Quality Gates
- [ ] Chunk Overlap: Overlap exists (e.g., 100 chars) to prevent cutting sentences mid-thought.
- [ ] Prompt Guardrails: System prompt explicitly forbids making up info.
- [ ] Latency: Retrieval + Generation < 5s (Stream output if longer).
8. Failure Handling
Irrelevant Context
- Symptoms: LLM says "I don't know" or gives wrong info because the Retriever found bad chunks.
- Recovery: Tune Chunk Size/Count (k). Use Hybrid Search (Keyword + Semantic). Re-rank results.
Hallucination
- Symptoms: LLM ignores the retrieved context and fabricates an answer.
- Recovery: Reduce Temperature to 0. Strengthen System Prompt instructions ("Say 'I don't know' if unclear").
9. Paste Prompt
TIP
One-Click Agent Invocation: copy the prompt below, replace the placeholders, and paste it into your agent.
Role: Act as a GenAI Engineer.
Task: Execute the RAG implementation workflow.
## Objective
Build RAG pipeline for [[DOCUMENT_SOURCE]] using [[VECTOR_DB]].
## Inputs
- **Model**: [[LLM_MODEL]]
## Procedure
Execute the following phases:
1. **Ingest**:
- Helper: Load Docs using LC Loader.
- Helper: Chunk with `RecursiveCharacterTextSplitter` (1000/200).
- Helper: Upsert to [[VECTOR_DB]].
2. **Chain**:
- Create Retriever (k=5).
- Create "Stuff" Documents Chain.
   - Create retrieval chain via `create_retrieval_chain`.
3. **Interact**:
- Expose `ask(query)` function.
## Quality Gates
- [ ] Splitter preserves sentence structure.
- [ ] Temperature = 0.
- [ ] Source citations included in response.
## Constraints
- Output: Python Code.
- Library: LangChain v0.1+.
## Command
Write the Ingestion script.