
Convert Model to ONNX


1. Objective

The objective of this workflow is to decouple the model from the training framework. PyTorch is excellent for training but heavy for deployment. ONNX (Open Neural Network Exchange) acts as a universal intermediate representation. Converting to ONNX lets you run models on ONNX Runtime (ORT), which is often significantly faster for inference (commonly 2-5x on CPU) and supports hardware acceleration backends (TensorRT, OpenVINO) with minimal code changes.


2. Context & Scope

In Scope

This workflow covers Tracing/Scripting the model (PyTorch), Converting Sklearn pipelines, Verifying numerical correctness, and optimizing the ONNX graph.

Assumption: You have a trained model in memory.

Out of Scope

  • Quantization: Converting Float32 to Int8 is a separate optimization step (See "Quantize Model"). Ideally, you convert to ONNX then quantize.
  • Training: ONNX is primarily for Inference.

3. When to Use / When Not to Use

Use This Workflow When

  • Deploying to Edge Devices (Mobile/IoT).
  • Deploying a model trained in PyTorch into a C#/.NET or Java environment.
  • You need faster CPU inference (ORT is highly optimized for AVX512).

Do NOT Use This Workflow When

  • The model uses dynamic, data-dependent control flow that ONNX doesn't support (rare, but happens with complex loops).
  • You are debugging the model (ONNX graphs lose Python stack traces).

4. Inputs (Required/Optional)

Required Inputs

| Input | Description | Format | Example |
| --- | --- | --- | --- |
| MODEL | The source object. | Object | torch_model |
| INPUT_SAMPLE | Dummy data for tracing. | Tensor | torch.randn(1, 3, 224, 224) |

Optional Inputs

| Input | Description | Default | Condition |
| --- | --- | --- | --- |
| OPSET | ONNX opset version. | 17 | Newer is generally better; check Runtime support. |

5. Outputs (Artifacts)

| Artifact | Format | Destination | Quality Criteria |
| --- | --- | --- | --- |
| Model File | .onnx | Disk | Valid graph (Netron check). |
| Benchmark | Text | Report | Latency comparison. |

6. Operating Modes

Fast Mode

Timebox: 30 minutes. Scope: Simple export. Details: Use torch.onnx.export with default settings and a static input shape.

🎯 Standard Mode (Default)

Timebox: 2 hours. Scope: Dynamic shapes. Details: Configure "dynamic axes" so the model can accept variable batch sizes or sequence lengths. Verify outputs match PyTorch outputs within 1e-5 tolerance.

🔬 Deep Mode

Timebox: 1 day. Scope: Graph optimization. Details: Use onnxruntime-tools to fuse layers (Conv+BN), simplify the graph, and tune for specific Execution Providers (e.g., CUDAExecutionProvider).


7. Constraints & Guardrails

Technical Constraints

  • Opset Mismatch: PyTorch may emit an operator that an older opset (e.g., opset 11) does not define. Use the highest opset supported by your deployment target's runtime.
  • Custom Layers: Custom C++/CUDA layers in PyTorch will not export to ONNX unless you write a custom exporter for them.

Security & Privacy

NOTE

Serialization: ONNX is a protobuf format. It is generally safer than Pickle, but malformed files can still exploit parser vulnerabilities.

Compliance

  • Version Lock: Record the PyTorch version and ONNX version used. Discrepancies often cause "Invalid Graph" errors.

8. Procedure

Phase 1: Preparation & Tracing

Objective: Prepare the model and confirm it traces cleanly.

1. Set the model to eval mode: model.eval().
2. Create a dummy input: dummy_input = torch.randn(...).

Note: Tracing runs the input through the model and records the executed operations. If the forward pass contains logic like if x > 0:, tracing freezes whichever branch the dummy input happens to take. Use scripting (torch.jit.script) if the control flow must remain dynamic.

Verify: Model runs on dummy input without error.
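A minimal sketch of this preparation step, using a toy model as a stand-in for MODEL (the architecture here is illustrative only):

```python
import torch
import torch.nn as nn

# Toy stand-in for your trained model (illustrative architecture)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()  # freeze Dropout/BatchNorm behavior before tracing

dummy_input = torch.randn(1, 3, 224, 224)  # INPUT_SAMPLE
with torch.no_grad():
    out = model(dummy_input)  # the forward pass must succeed before export
```

If this forward pass raises, fix the model before attempting export; the exporter will only surface the same error less clearly.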

Phase 2: Export

Objective: Export the traced graph to ONNX.

Define Dynamic Axes (Crucial for Batching):

```python
dynamic_axes = {
    'input': {0: 'batch_size'},
    'output': {0: 'batch_size'}
}
```

Export:

```python
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=OPSET,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes=dynamic_axes
)
```

Verify: model.onnx created.

Phase 3: Validation & Inference

Objective: Prove correctness.

1. Load with ORT: session = onnxruntime.InferenceSession("model.onnx").
2. Run inference: ort_out = session.run(None, {"input": numpy_input}).
3. Compare: np.testing.assert_allclose(torch_out, ort_out, rtol=1e-03, atol=1e-05).

Verify: Assertion passes; results match within tolerance.


9. Technical Considerations

HuggingFace Optimum: If converting Transformers (BERT/GPT), do NOT use raw torch.onnx. Use the optimum library: ORTModelForSequenceClassification.from_pretrained(..., export=True). It handles the complex export configuration for you.

Sklearn: Use skl2onnx. It's robust for classic ML (Forests/SVMs).

Visualization: Drag the .onnx file into Netron.app. Inspect the graph. Look for disconnected nodes or weird subgraphs.


10. Quality Gates (Definition of Done)

Checklist

  • [ ] Model exported to ONNX.
  • [ ] Dynamic Batch size works.
  • [ ] Numerical Output matches original.
  • [ ] Latency check (< Original).

Validation

| Criterion | Method | Threshold |
| --- | --- | --- |
| Accuracy | Max abs error | < 1e-4 |
| Speed | ORT inference | Faster than or equal to Torch |

11. Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery Action |
| --- | --- | --- |
| Unsupported Op | ExportError: Aten op X not supported. | Update the opset version, or rewrite the PyTorch code to use standard ops (e.g., replace specialized interpolation). |
| Static Shape | Inference fails when batch=2. | dynamic_axes was omitted during export. Re-export with dynamic axes. |
| Accuracy Loss | Outputs differ significantly. | Check for fp16 mismatch. Call model.eval() before export to freeze Dropout/BatchNorm behavior. |

12. Copy-Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

```text
Role: Act as a Senior Edge AI Engineer.
Task: Execute the Convert Model to ONNX workflow.

## Objective & Scope
- **Goal**: Decouple model from training framework for optimized, portable inference.
- **Scope**: Tracing/Scripting, Exporting to ONNX, Validation, and Dynamic Shapes.

## Inputs
- [ ] MODEL: Source PyTorch/TF object.
- [ ] INPUT_SAMPLE: Dummy tensor for tracing.
- [ ] OPSET: ONNX Opset version (default 17).

## Output Artifacts
- [ ] ONNX Model File (.onnx)
- [ ] Verification Report (Text)

## Execution Steps
1. **Configure**
   - Set model to eval mode. Define dynamic axes for batching/sequence length.
2. **Export**
   - Run export (torch.onnx or optimum). Handle control flow if needed.
3. **Verify**
   - Load with ONNX Runtime. Assert output matches original model (tolerance 1e-4).

## Quality Gates
- [ ] Model exported.
- [ ] Dynamic Batching works.
- [ ] Numerical Output matches original.
- [ ] Latency is equal or better.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Technical**: Validate Opset compatibility.
- **Reliability**: Ensure graph validity (no disconnected nodes).

## Command
Now execute this workflow step-by-step.
```

Appendix: Change Log

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |
