📚 RAG Engineering
🎓 Page Overview
This page provides in-depth coverage of Retrieval-Augmented Generation (RAG), from chunking strategies to retrieval optimization and citation management.
Level: Core
Solves: Building production-ready RAG systems with high retrieval quality and proper grounding
🎯 RAG Architecture Overview
Standard RAG Pipeline
RAG Components
| Component | Purpose | Key Decisions |
|---|---|---|
| Chunking | Split documents into retrievable units | Size, overlap, semantic boundaries |
| Embedding | Convert text to dense vectors | Model selection, dimension trade-offs |
| Vector Store | Index and search embeddings | Index type, hybrid search capability |
| Retrieval | Find relevant chunks | Top-K, similarity threshold |
| Reranking | Re-score retrieved results | Model choice, latency budget |
| Generation | Produce grounded answers | Context window management, citation |
📦 Chunking Strategies
Chunking Methods Comparison
| Method | Description | Best For |
|---|---|---|
| Fixed Size | Split by character/token count | Uniform documents, simple setup |
| Sentence-based | Split on sentence boundaries | Well-structured prose |
| Paragraph-based | Split on paragraph breaks | Documents with clear structure |
| Semantic | Split on topic changes | Mixed content, technical docs |
| Recursive | Hierarchical splitting | Complex documents, code |
Chunking Parameters
```python
# Example chunking configuration
CHUNK_CONFIG = {
    "chunk_size": 512,        # tokens
    "chunk_overlap": 50,      # tokens (~10% of chunk_size)
    "separators": ["\n\n", "\n", ". ", " "],
    "length_function": "tiktoken",
    "keep_separator": True,
}
```
Chunk Size Trade-offs
| Chunk Size | Retrieval Precision | Context Richness | Embedding Quality |
|---|---|---|---|
| Small (128-256) | High | Low | High |
| Medium (512-1024) | Medium | Medium | Medium |
| Large (1024-2048) | Low | High | Lower |
💡 Best Practice
Start with 512 tokens with 10% overlap. Adjust based on retrieval quality metrics.
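A fixed-size chunker with overlap can be sketched in a few lines. This is a minimal illustration, not a production implementation: it operates on a pre-tokenized list (in practice you would tokenize with something like tiktoken, per the config above), and the function name `chunk_tokens` is made up for this example.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into fixed-size chunks with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks

# Toy example: 10 single-character "tokens", chunk_size=4, overlap=1
print(chunk_tokens(list("abcdefghij"), chunk_size=4, chunk_overlap=1))
```

Note the `break` after the window reaches the end of the input, which prevents a trailing chunk that is pure overlap.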
Advanced Chunking: Hierarchical
Benefits:
- Parent chunks provide broader context
- Child chunks enable precise retrieval
- Can dynamically expand context when needed
🔍 Retrieval Optimization
Retrieval Strategies
| Strategy | Description | Use Case |
|---|---|---|
| Dense Retrieval | Semantic similarity via embeddings | Conceptual matching |
| Sparse Retrieval | BM25/TF-IDF keyword matching | Exact term matching |
| Hybrid | Combine dense + sparse | Best of both worlds |
| Multi-Vector | Multiple embeddings per chunk | Diverse query types |
Hybrid Search Implementation
Fusion Formula (Reciprocal Rank Fusion):

score(d) = Σ_i 1 / (k + rank_i(d))

where rank_i(d) is the rank of document d in result list i, and k is a smoothing constant (commonly 60).
Query Transformation
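The RRF formula above translates directly to code. This is a minimal sketch: `rrf_fuse` is an illustrative name, and the doc ids are toy values standing in for real dense and sparse result lists.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: combine several ranked result lists.
    rankings: list of lists of doc ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it contains
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d1", "d2", "d3"]   # e.g. embedding similarity results
sparse = ["d3", "d1", "d4"]   # e.g. BM25 results
print(rrf_fuse([dense, sparse]))
```

Documents ranked highly by both retrievers (here d1 and d3) float to the top, which is exactly the "best of both worlds" behavior hybrid search is after.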
| Technique | Description | When to Use |
|---|---|---|
| Query Expansion | Add synonyms and related terms | Broad topic coverage |
| HyDE | Generate hypothetical answer, embed that | Improved semantic matching |
| Multi-Query | Generate query variations | Diverse result coverage |
| Step-back | Generate more abstract query | Conceptual understanding |
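The Multi-Query technique from the table can be sketched as: generate variations, retrieve for each, and merge with deduplication. The variation generator and retriever below are stubs standing in for an LLM call and a vector store; `multi_query_retrieve` and the toy index are assumptions for illustration.

```python
def multi_query_retrieve(query, generate_variations, retrieve, n=3):
    """Multi-Query: retrieve with several query variations and merge
    results, deduplicating while preserving first-seen order."""
    queries = [query] + generate_variations(query, n)
    seen, merged = set(), []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Stubs standing in for an LLM variation generator and a vector store
variations = lambda q, n: [f"{q} tutorial", f"{q} overview"][:n]
index = {"rag": ["d1", "d2"], "rag tutorial": ["d2", "d3"], "rag overview": ["d4"]}
print(multi_query_retrieve("rag", variations, lambda q: index.get(q, [])))
```

In practice the merged list would then be fed to a reranker or fused with RRF rather than used in first-seen order.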
🎯 Reranking
Reranker Types
| Type | Latency | Quality | Cost |
|---|---|---|---|
| Cross-Encoder | High | Highest | High |
| ColBERT | Medium | High | Medium |
| LLM-based | Very High | Variable | Highest |
| Hybrid Score | Low | Medium | Low |
Reranking Pipeline
Implementation Considerations
```python
# Reranking configuration
RERANK_CONFIG = {
    "initial_k": 50,          # retrieve more candidates for reranking
    "final_k": 5,             # return top results after rerank
    "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "batch_size": 32,
    "score_threshold": 0.5,   # optional filtering
}
```
📝 Context Assembly
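Applying such a configuration is a simple pipeline: score the initial candidates, filter by the optional threshold, and keep the top final_k. The sketch below mocks the cross-encoder with a dictionary lookup; in a real system `score_fn` would batch-call the reranker model named in the config.

```python
RERANK_CONFIG = {"initial_k": 50, "final_k": 5, "score_threshold": 0.5}

def rerank(candidates, score_fn, config):
    """Score retrieved candidates, filter by the optional threshold,
    and return the top final_k documents."""
    scored = [(doc, score_fn(doc)) for doc in candidates[:config["initial_k"]]]
    threshold = config.get("score_threshold")
    if threshold is not None:
        scored = [(d, s) for d, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [d for d, _ in scored[:config["final_k"]]]

# Toy score_fn standing in for a cross-encoder call
scores = {"a": 0.9, "b": 0.3, "c": 0.7, "d": 0.55}
print(rerank(["a", "b", "c", "d"], scores.get, RERANK_CONFIG))
```

Here "b" falls below the 0.5 threshold and is dropped even though final_k would have allowed it, which is the intended behavior of threshold filtering.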
Context Window Management
Context Ordering Strategies
| Strategy | Description | Impact |
|---|---|---|
| Relevance First | Most relevant at top | Best for short contexts |
| Lost in Middle | Relevant at start and end | Mitigates attention bias |
| Chronological | Time-ordered | Good for temporal reasoning |
| Hierarchical | Structured by importance | Good for complex topics |
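The "Lost in Middle" strategy from the table can be implemented by alternately placing chunks at the front and back of the context, so the least relevant material ends up in the middle where attention is weakest. The function name below is an illustrative choice.

```python
def lost_in_middle_order(chunks_by_relevance):
    """Reorder chunks (most relevant first) so the most relevant sit at the
    start and end of the context, pushing the least relevant to the middle."""
    ordered = [None] * len(chunks_by_relevance)
    left, right = 0, len(chunks_by_relevance) - 1
    for i, chunk in enumerate(chunks_by_relevance):
        if i % 2 == 0:
            ordered[left] = chunk   # even-ranked chunks fill from the front
            left += 1
        else:
            ordered[right] = chunk  # odd-ranked chunks fill from the back
            right -= 1
    return ordered

# Chunks labeled by relevance rank, 1 = most relevant
print(lost_in_middle_order([1, 2, 3, 4, 5]))
```

The two most relevant chunks (1 and 2) end up at the first and last positions, with rank 5 buried in the middle.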
Context Compression
```markdown
## Original Context (500 tokens)
[Long detailed passage about machine learning...]

## Compressed Context (150 tokens)
Key points: ML involves training models on data.
Main approaches: supervised, unsupervised, reinforcement learning.
Critical consideration: data quality and quantity.
```
📎 Citation & Grounding
Citation Patterns
| Pattern | Example | Use Case |
|---|---|---|
| Inline | "The answer is X [1]" | Academic style |
| Section-based | "According to Section 3.2..." | Document reference |
| Verbatim Quote | "As stated: 'exact quote'" | Legal/compliance |
| Confidence-tagged | "X (confidence: high)" | Uncertainty awareness |
Grounding Verification
Citation Policy Template
```markdown
## Citation Requirements
1. **Every factual claim** must reference a source document
2. **Format**: Use [Source: document_id, page X] inline
3. **Unsupported claims**: Prefix with "Based on general knowledge:"
4. **Conflicting sources**: Present both views with sources
5. **No source found**: Explicitly state "No relevant source found"
```
📊 Evaluation Metrics
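A policy like this can be machine-checked. The sketch below flags sentences that carry neither the inline citation format nor the general-knowledge prefix from the template; the naive sentence split on ". " and the function name are simplifying assumptions, and production checkers typically use an LLM judge or NLI model on top of this kind of structural check.

```python
import re

# Matches the policy's inline format: [Source: document_id, page X]
CITATION = re.compile(r"\[Source:\s*[^,\]]+(?:,\s*page\s*\d+)?\]")
GENERAL = "Based on general knowledge:"

def unsupported_sentences(answer):
    """Return sentences that carry neither an inline citation nor the
    explicit general-knowledge prefix required by the policy."""
    sentences = [s.strip() for s in answer.split(". ") if s.strip()]
    return [s for s in sentences
            if not CITATION.search(s) and not s.startswith(GENERAL)]

answer = ("RAG grounds answers in retrieved text [Source: doc_12, page 3]. "
          "Based on general knowledge: transformers use attention. "
          "Chunk overlap improves recall")
print(unsupported_sentences(answer))  # flags only the uncited last sentence
```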
Retrieval Metrics
| Metric | Formula | Target |
|---|---|---|
| Recall@K | Relevant in top K / Total relevant | > 0.8 |
| Precision@K | Relevant in top K / K | > 0.6 |
| MRR | Mean Reciprocal Rank | > 0.7 |
| NDCG | Normalized DCG | > 0.75 |
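The first three metrics in the table are straightforward to compute directly. A minimal sketch, assuming retrieved results come as ordered lists of doc ids and relevance judgments as sets:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top K results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top K results that are relevant."""
    return len(set(retrieved[:k]) & set(relevant)) / k

def mrr(queries):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit.
    queries: list of (retrieved, relevant) pairs, one per query."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(queries)

retrieved, relevant = ["d1", "d9", "d3", "d7"], {"d3", "d7", "d8"}
print(recall_at_k(retrieved, relevant, 4))  # 2 of 3 relevant docs found
print(mrr([(retrieved, relevant)]))         # first relevant hit at rank 3
```

NDCG additionally requires graded relevance labels and a log-discounted gain, so it is omitted from this sketch.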
End-to-End Metrics
| Metric | Description | Measurement |
|---|---|---|
| Faithfulness | Answer grounded in context | LLM-as-judge |
| Answer Relevance | Answer addresses query | LLM-as-judge |
| Context Relevance | Retrieved context quality | Embedding similarity |
| Groundedness | Claims supported by sources | Citation verification |
📋 RAG Engineering Checklist
Design Phase
- [ ] Define retrieval quality targets (Recall@K, MRR)
- [ ] Choose chunking strategy based on document types
- [ ] Select embedding model (dimension, domain fit)
- [ ] Design hybrid search weights
Implementation Phase
- [ ] Implement chunking with overlap handling
- [ ] Set up vector store with proper indexing
- [ ] Add reranking layer
- [ ] Implement citation extraction
Production Phase
- [ ] Monitor retrieval metrics continuously
- [ ] Set up A/B testing for prompt variations
- [ ] Implement feedback loop for relevance
- [ ] Regular re-indexing schedule
🔗 Cross-References
- 📎 Data Modeling - Vector DB - Vector database design patterns
- 📎 LLM Evaluation - RAG evaluation frameworks
- 📎 ML Monitoring - Drift detection for embeddings
- 📎 Prompting Patterns - Context integration in prompts
📚 Further Reading
- "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" - Lewis et al.
- "LlamaIndex Documentation" - RAG implementation patterns
- "LangChain RAG Guide" - Production RAG patterns