
Convert Model to ONNX


1. Objective

The objective of this workflow is to decouple the model from the training framework. PyTorch is excellent for training but heavy for deployment. ONNX (Open Neural Network Exchange) acts as a universal intermediate representation. Converting to ONNX lets you run models on ONNX Runtime (ORT), which is often significantly faster for inference (commonly 2-5x on CPU) and supports hardware acceleration backends (TensorRT, OpenVINO) with minimal code changes.


2. Context & Scope

In Scope

This workflow covers Tracing/Scripting the model (PyTorch), Converting Sklearn pipelines, Verifying numerical correctness, and optimizing the ONNX graph.

Assumption: You have a trained model in memory.

Out of Scope

  • Quantization: Converting Float32 to Int8 is a separate optimization step (See "Quantize Model"). Ideally, you convert to ONNX then quantize.
  • Training: ONNX is primarily for Inference.

3. When to Use / When Not to Use

Use This Workflow When

  • Deploying to Edge Devices (Mobile/IoT).
  • Deploying a model trained in PyTorch into a C#/.NET or Java environment.
  • You need faster CPU inference (ORT is highly optimized for AVX512).

Do NOT Use This Workflow When

  • The model uses dynamic, data-dependent control flow that ONNX doesn't support (rare, but happens with complex loops).
  • You are debugging the model (ONNX graphs lose Python stack traces).

4. Inputs (Required/Optional)

Required Inputs

| Input | Description | Format | Example |
| --- | --- | --- | --- |
| MODEL | The source object. | Object | torch_model |
| INPUT_SAMPLE | Dummy data for tracing. | Tensor | torch.randn(1, 3, 224, 224) |

Optional Inputs

| Input | Description | Default | Condition |
| --- | --- | --- | --- |
| OPSET | ONNX opset version. | 17 | Newer is generally better; check Runtime support. |

5. Outputs (Artifacts)

| Artifact | Format | Destination | Quality Criteria |
| --- | --- | --- | --- |
| Model File | .onnx | Disk | Valid graph (Netron check). |
| Benchmark | Text | Report | Latency comparison. |

6. Operating Modes

Fast Mode

Timebox: 30 minutes. Scope: Simple export. Details: Use torch.onnx.export with default settings and a static input shape.

🎯 Standard Mode (Default)

Timebox: 2 hours. Scope: Dynamic shapes. Details: Configure "dynamic axes" so the model can accept variable batch sizes or sequence lengths. Verify outputs match PyTorch outputs within 1e-5 tolerance.

🔬 Deep Mode

Timebox: 1 day. Scope: Graph optimization. Details: Use onnxruntime-tools to fuse layers (Conv+BN), simplify the graph, and tune for specific Execution Providers (e.g., CUDAExecutionProvider).


7. Constraints & Guardrails

Technical Constraints

  • Opset Mismatch: PyTorch may emit an operator that an older opset (e.g., opset 11) does not define. Use the highest opset supported by your deployment target's runtime.
  • Custom Layers: Custom C++/CUDA layers in PyTorch will not export to ONNX unless you write a custom exporter for them.

Security & Privacy

NOTE

Serialization: ONNX is a protobuf format. It is generally safer than Pickle, but malformed files can still exploit parser vulnerabilities.

Compliance

  • Version Lock: Record the PyTorch version and ONNX version used. Discrepancies often cause "Invalid Graph" errors.

8. Procedure

Phase 1: Preparation & Tracing

Objective: Prepare the model and confirm it traces cleanly.

1. Set the model to eval mode: model.eval().
2. Create a dummy input: dummy_input = torch.randn(...).

Note: Tracing runs the input through the model and records the executed operations. If the forward pass contains logic like if x > 0:, tracing freezes whichever branch the dummy input happens to take. Use scripting (torch.jit.script) if the control flow must remain dynamic.

Verify: Model runs on dummy input without error.
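A minimal sketch of this preparation step, using a toy model as a stand-in for MODEL (the architecture here is illustrative only):

```python
import torch
import torch.nn as nn

# Toy stand-in for your trained model (illustrative architecture)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()  # freeze Dropout/BatchNorm behavior before tracing

dummy_input = torch.randn(1, 3, 224, 224)  # INPUT_SAMPLE
with torch.no_grad():
    out = model(dummy_input)  # the forward pass must succeed before export
```

If this forward pass raises, fix the model before attempting export; the exporter will only surface the same error less clearly.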

Phase 2: Export

Objective: Export the traced graph to ONNX.

Define Dynamic Axes (Crucial for Batching):

```python
dynamic_axes = {
    'input': {0: 'batch_size'},
    'output': {0: 'batch_size'}
}
```

Export:

```python
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=OPSET,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes=dynamic_axes
)
```

Verify: model.onnx created.

Phase 3: Validation & Inference

Objective: Prove correctness.

1. Load with ORT: session = onnxruntime.InferenceSession("model.onnx").
2. Run inference: ort_out = session.run(None, {"input": numpy_input}).
3. Compare: np.testing.assert_allclose(torch_out, ort_out, rtol=1e-03, atol=1e-05).

Verify: Assertion passes; results match within tolerance.


9. Technical Considerations

HuggingFace Optimum: If converting Transformers (BERT/GPT), do NOT use raw torch.onnx. Use the optimum library: ORTModelForSequenceClassification.from_pretrained(..., export=True). It handles the complex export configuration for you.

Sklearn: Use skl2onnx. It's robust for classic ML (Forests/SVMs).

Visualization: Drag the .onnx file into Netron.app. Inspect the graph. Look for disconnected nodes or weird subgraphs.


10. Quality Gates (Definition of Done)

Checklist

  • [ ] Model exported to ONNX.
  • [ ] Dynamic Batch size works.
  • [ ] Numerical Output matches original.
  • [ ] Latency check (< Original).

Validation

| Criterion | Method | Threshold |
| --- | --- | --- |
| Accuracy | Max abs error | < 1e-4 |
| Speed | ORT inference | Faster than or equal to Torch |

11. Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery Action |
| --- | --- | --- |
| Unsupported Op | ExportError: Aten op X not supported. | Update the opset version, or rewrite the PyTorch code to use standard ops (e.g., replace specialized interpolation). |
| Static Shape | Inference fails when batch=2. | dynamic_axes was omitted during export. Re-export with dynamic axes. |
| Accuracy Loss | Outputs differ significantly. | Check for fp16 mismatch. Call model.eval() before export to freeze Dropout/BatchNorm behavior. |

12. Copy-Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

```text
Role: Act as a Senior Edge AI Engineer.
Task: Execute the Convert Model to ONNX workflow.

## Objective & Scope
- **Goal**: Decouple model from training framework for optimized, portable inference.
- **Scope**: Tracing/Scripting, Exporting to ONNX, Validation, and Dynamic Shapes.

## Inputs
- [ ] MODEL: Source PyTorch/TF object.
- [ ] INPUT_SAMPLE: Dummy tensor for tracing.
- [ ] OPSET: ONNX Opset version (default 17).

## Output Artifacts
- [ ] ONNX Model File (.onnx)
- [ ] Verification Report (Text)

## Execution Steps
1. **Configure**
   - Set model to eval mode. Define dynamic axes for batching/sequence length.
2. **Export**
   - Run export (torch.onnx or optimum). Handle control flow if needed.
3. **Verify**
   - Load with ONNX Runtime. Assert output matches original model (tolerance 1e-4).

## Quality Gates
- [ ] Model exported.
- [ ] Dynamic Batching works.
- [ ] Numerical Output matches original.
- [ ] Latency is equal or better.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Technical**: Validate Opset compatibility.
- **Reliability**: Ensure graph validity (no disconnected nodes).

## Command
Now execute this workflow step-by-step.
```

Appendix: Change Log

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |
