Fine-Tune LLM (LoRA)
1. Purpose
Specialize a generic genius (GPT/Llama) into a domain expert (Legal/Medical). LoRA freezes the main model and trains tiny "adapter" layers, cutting VRAM usage dramatically (often around 90%) and mitigating "Catastrophic Forgetting".
2. When to Use / When Not to Use
Use This Workflow When
- Base model fails to follow a specific format (JSON, SQL).
- Domain language is unique (Medical jargon).
- You have high-quality example pairs.
Do NOT Use This Workflow When
- You just need to add knowledge (Use RAG). Fine-tuning is for Behavior, RAG is for Facts.
- You have < 100 examples (Prompt engineering is better).
- You want to teach the model "Reasoning" (Very hard to fine-tune, requires RLHF).
3. Inputs
Required Inputs
- [[BASE_MODEL]]: e.g., `meta-llama/Llama-2-7b-hf`.
- [[DATASET_PATH]]: JSONL file with `{"instruction": "...", "output": "..."}`.
- [[OUTPUT_ADAPTER]]: Directory to save weights.
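A quick sanity check on [[DATASET_PATH]] before training saves a wasted run; a minimal sketch (the `validate_jsonl` helper and its error messages are illustrative, the field names follow the schema above):

```python
import json

def validate_jsonl(path):
    """Check that every line parses and has non-empty instruction/output fields."""
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            record = json.loads(line)  # raises on malformed JSON
            for key in ("instruction", "output"):
                if not record.get(key, "").strip():
                    raise ValueError(f"line {i}: missing or empty '{key}'")

# One record in the expected format:
sample = {"instruction": "Summarize the clause.", "output": "The clause limits liability."}
print(json.dumps(sample))
```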
4. Outputs
- Adapter: `adapter_model.bin` (~100MB).
- Merged Model: (Optional) Base + Adapter fused for faster inference.
5. Preconditions
- GPU with adequate VRAM (e.g., A10G or local RTX 3090/4090).
- HuggingFace Token (if using Gated models).
- `peft`, `transformers`, `bitsandbytes` installed.
6. Procedure
Phase 1: Setup
Action: Quantize Base Model.
- Expected Output: Model loaded in 4-bit (QLoRA) using `BitsAndBytesConfig`.
- Notes: Crucial for memory. A 7B model fits on 16GB VRAM.
Action: Prepare LoRA Config.
- Expected Output: `LoraConfig(r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"])`.
- Notes: `r` is rank. Higher = more parameters to train, but diminishing returns.
Phase 2: Training
Action: Tokenize Data.
- Expected Output: Inputs padded to consistent length (e.g., 512 or 2048).
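Prompt assembly before tokenization might look like this (the instruction/response template is an assumption, not part of the workflow; pair the result with `tokenizer(text, truncation=True, padding="max_length", max_length=512)`):

```python
def format_example(example):
    """Render one {"instruction", "output"} record into a single training text."""
    # Template is illustrative; use whatever format the base model expects.
    return (
        "### Instruction:\n" + example["instruction"].strip() + "\n\n"
        "### Response:\n" + example["output"].strip()
    )

text = format_example(
    {"instruction": "Return the user as JSON.", "output": '{"name": "Ana"}'}
)
# Then tokenize: tokenizer(text, truncation=True, padding="max_length", max_length=512)
```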
Action: Run Trainer.
- Expected Output: Training Loop. Loss should decrease.
- Notes: Watch for Overfitting (Loss goes to 0, Validation Loss goes up).
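The overfitting check above (training loss keeps falling while validation loss rises) can be automated; a framework-agnostic sketch over plain loss lists (the helper name and window size are hypothetical):

```python
def overfitting_alert(train_losses, val_losses, window=3):
    """Flag the classic overfit signature: train loss falling, val loss rising."""
    if len(train_losses) < 2 * window or len(val_losses) < 2 * window:
        return False  # not enough history to compare early vs late averages
    def mean(xs):
        return sum(xs) / len(xs)
    train_falling = mean(train_losses[-window:]) < mean(train_losses[:window])
    val_rising = mean(val_losses[-window:]) > mean(val_losses[:window])
    return train_falling and val_rising

# Healthy run: both curves decline.
print(overfitting_alert([2.0, 1.5, 1.2, 1.0, 0.9, 0.8],
                        [2.1, 1.7, 1.5, 1.4, 1.3, 1.2]))  # False
# Overfit: train loss heads to 0 while validation loss climbs.
print(overfitting_alert([2.0, 1.0, 0.5, 0.2, 0.1, 0.05],
                        [2.1, 1.8, 1.7, 1.8, 2.0, 2.3]))  # True
```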
Phase 3: Validation
Action: Inference Test.
- Expected Output: Compare `Base Model` vs `Base + Adapter` on a test prompt.
7. Quality Gates
- [ ] Loss Convergence: Training loss steadily declined.
- [ ] Format Compliance: Output strictly follows desired format (if training for format).
- [ ] Safety: Model didn't unlearn safety guardrails (Run basic red-teaming).
8. Failure Handling
OOM (Out of Memory)
- Symptoms: CUDA OOM error.
- Recovery: Reduce Batch Size (try 1 with Gradient Accumulation). Reduce Context Length. Use 4-bit loading.
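The recovery levers above map onto `TrainingArguments` roughly like this (a config sketch; the exact values are illustrative, and the effective batch size stays at batch × accumulation steps):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="[[OUTPUT_ADAPTER]]",
    per_device_train_batch_size=1,    # smallest possible per-step batch
    gradient_accumulation_steps=16,   # effective batch = 1 x 16
    gradient_checkpointing=True,      # recompute activations to save VRAM
    learning_rate=2e-4,
    logging_steps=10,
)
```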
Collapse (Gibberish Output)
- Symptoms: Model outputs repeating loops or `NaN` loss.
- Recovery: Learning Rate too high; reduce by 10x. Check dataset quality (Garbage In, Garbage Out).
9. Paste Prompt
TIP
One-Click Agent Invocation: Copy the prompt below, replace the placeholders, and paste it into your agent.
text
Role: Act as an LLM Engineer.
Task: Execute the LoRA Fine-Tuning workflow.
## Objective
Fine-tune [[BASE_MODEL]] on [[DATASET_PATH]].
## Inputs
- **Output**: [[OUTPUT_ADAPTER]]
## Procedure
Execute the following phases:
1. **Load**:
- `AutoModelForCausalLM` with `load_in_4bit=True`.
- Apply `prepare_model_for_kbit_training`.
2. **Config**:
- Define `LoraConfig`. Target `q_proj`, `v_proj`.
- Define `TrainingArguments` (lr=2e-4, batch=1).
3. **Train**:
- Use `SFTTrainer` (Supervised Fine-tuning).
- Save adapter.
## Quality Gates
- [ ] Gradient Checkpointing enabled (Save VRAM).
- [ ] Validation Set included.
- [ ] Loss logged to Tensorboard/W&B.
## Constraints
- Output: Python Training Script.
- Library: HF PEFT + Transformers.
## Command
Write the training script.