Label Data with Active Learning

1. Objective

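To label training data efficiently by having the current model select the most uncertain or informative samples for human annotation, then retraining on the accumulated labels.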

2. When to use / When not to use

When to use:

When a large pool of unlabeled samples is available but the human labeling budget is limited.

When not to use:

When too little data is available to form an unlabeled pool or to train and evaluate a model.

3. Inputs (Required/Optional)

Required:

Training Data
Model Config

Optional:

Relevant documentation

4. Outputs (Artifacts)

Model Artifact: The model checkpoint retrained on the accumulated labeled data.
Metrics Report: Accuracy/F1 and bias-check results from the evaluation phase.

5. Operating Modes

Mode	Description	Verification Level
Fast	Focus on speed, minimal validation.	Basic syntax/lint checks only.
Standard	Balanced approach.	Unit tests and standard linting.
Deep	Comprehensive analysis and optimization.	Full test suite, performance profiling, security scan.

6. Constraints & Guardrails

No Broken Builds: Ensure all changes pass the build process.
Code Style: Strictly adhere to the project's linting and formatting rules.
Security: Do not introduce new vulnerabilities; sanitize all inputs.
Performance: Avoid O(n^2) or worse complexity unless strictly necessary and documented.
Testing: Maintain or improve code coverage; do not degrade it.

7. Procedure

Phase 1: Data Prep

Clean data.
Split train/test.
Normalize features.
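
The three data prep steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed implementation: the helper names (`clean`, `train_test_split`, `fit_scaler`, `transform`) and the toy data are assumptions made for the example. The scaler statistics are computed on the training split only, so no information from the test split leaks into training.

```python
import random
from statistics import mean, pstdev


def clean(rows):
    """Drop rows that contain missing values (None)."""
    return [row for row in rows if all(x is not None for x in row)]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows):
    """Compute per-column mean and standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize each feature with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: ten complete rows plus one row with a missing value.
raw = [[float(i), float(2 * i + 1)] for i in range(10)] + [[None, 3.0]]
data = clean(raw)
train, test = train_test_split(data, test_fraction=0.2, seed=42)
means, stds = fit_scaler(train)          # fit on train only
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

In a real project the same steps would usually come from the data library already in use; the point here is only the order of operations: clean, split, then fit the normalization on the training split.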

Phase 2: Training

Configure model.
Run training loop.
Log metrics.
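
As a rough sketch of the configure, train, and log steps above, the example below trains a tiny logistic regression with per-sample gradient updates and records the average loss per epoch. The model choice, the `config` keys, and the function names are assumptions for illustration only; the source does not specify a model type.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, config):
    """Train with per-sample gradient steps; return weights, bias, loss history."""
    w = [0.0] * len(X[0])
    b = 0.0
    history = []
    for epoch in range(config["epochs"]):
        total_loss = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Binary cross-entropy, with a small epsilon to keep log() finite.
            total_loss -= yi * math.log(p + 1e-12) + (1 - yi) * math.log(1 - p + 1e-12)
            err = p - yi
            w = [wj - config["lr"] * err * xj for wj, xj in zip(w, xi)]
            b -= config["lr"] * err
        history.append(total_loss / len(X))  # logged metric: mean loss per epoch
    return w, b, history


# Hypothetical configuration and toy data.
config = {"lr": 0.5, "epochs": 200}
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b, history = train_logistic(X, y, config)
```

Keeping the loss history as a plain list makes it easy to write into the metrics report later or to check that training is actually converging.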

Phase 3: Evaluation

Calculate accuracy/F1.
Check bias.
Save artifact.
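
The evaluation steps above could look something like the sketch below. It computes accuracy and F1 for binary labels, then uses the accuracy gap between groups as one simple bias check. The group attribute, the function names, and the report layout are assumptions for this example; the source does not say which bias check to run. The report is serialized to JSON so it can be stored next to the model artifact.

```python
import json


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def accuracy_by_group(y_true, y_pred, groups):
    """Per-group accuracy, used here as a simple bias check."""
    totals, hits = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (1 if t == p else 0)
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical predictions and group labels.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["a", "a", "b", "b", "b", "b"]

report = binary_metrics(y_true, y_pred)
per_group = accuracy_by_group(y_true, y_pred, groups)
report["accuracy_by_group"] = per_group
report["accuracy_gap"] = max(per_group.values()) - min(per_group.values())

# Serialized metrics report, ready to be written to disk.
report_json = json.dumps(report, indent=2, sort_keys=True)
```

A large `accuracy_gap` suggests the model performs unevenly across groups and may warrant targeted labeling in the next round.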

8. Quality Gates (Definition of Done)

[ ] Code compiles/runs without errors.
[ ] All new components include identical or improved test coverage.
[ ] No new linting errors or warnings introduced.
[ ] Documentation updated (inline and external).
[ ] Security scan passes (no high/critical severities).

9. Failure Modes & Recovery

Failure Mode	Recovery Action
Build Failure	Check error logs, revert recent changes, verify dependencies.
Test Failure	Isolate failing test, debug logic, or update test if requirements changed.
Linting Error	Run auto-formatter and manually fix remaining issues.
Merge Conflict	Rebase on main, resolve conflicts manually, run tests again.

10. Copy-Paste Prompt


Role: Act as a Senior ML Engineer.
Task: Execute the Label Data with Active Learning workflow.

## Objective & Scope
- **Goal**: Efficiently label training data by selecting the most uncertain/informative samples.
- **Scope**: Model Inference, Uncertainty Sampling, Labeling Interface, and Retraining loop.

## Inputs
- [ ] UNLABELED_POOL: Dataset of unlabeled samples.
- [ ] MODEL: Current model (if exists) or Cold Start heuristic.
- [ ] BUDGET: Number of samples to label (e.g., 100).

## Output Artifacts
- [ ] Labeled Dataset Batch
- [ ] Improved Model Checkpoint

## Execution Steps
1. **Select**
   - Run inference on pool. Calculate Uncertainty (Entropy/Margin). Select top BUDGET samples.
2. **Label**
   - Present samples to annotator (Human-in-the-loop). Store labels.
3. **Train**
   - Retrain model on accumulated labeled data. Evaluate improvement.

## Quality Gates
- [ ] High-uncertainty samples selected.
- [ ] Annotations verified for quality.
- [ ] Model performance improved after retraining.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Cost**: Optimize for minimal human labeling effort.
- **Bias**: Balance uncertainty sampling with random sampling so the labeled set does not over-concentrate on a narrow region of the data.

## Command
Now execute this workflow step-by-step.
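
The Select step of the prompt above, together with its Bias constraint, can be sketched as follows. This is a minimal illustration, assuming the model's inference output is a list of class-probability vectors, one per pool sample. The names `select_batch` and `random_fraction` are hypothetical and not part of the prompt. Entropy ranks the pool, and a fraction of the budget is reserved for random picks so the batch is not made up solely of the most uncertain samples.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of a probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the top two class probabilities; lower means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_batch(pool_probs, budget, random_fraction=0.2, seed=0):
    """Return pool indices: the highest-entropy samples plus a few random ones."""
    n_random = int(budget * random_fraction)
    n_uncertain = budget - n_random
    # Rank the whole pool from most to least uncertain.
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    chosen = ranked[:n_uncertain]
    remaining = ranked[n_uncertain:]
    # Fill the rest of the budget with random samples from what is left.
    rng = random.Random(seed)
    chosen += rng.sample(remaining, min(n_random, len(remaining)))
    return chosen


# Hypothetical model outputs for a six-sample unlabeled pool.
pool_probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4],
              [0.99, 0.01], [0.55, 0.45], [0.8, 0.2]]
batch = select_batch(pool_probs, budget=4, random_fraction=0.25)
```

Here the first three indices in `batch` are the highest-entropy samples and the fourth is drawn at random from the rest. Swapping `entropy` for `margin` (and sorting in ascending order) gives margin-based sampling instead.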
