Creating, editing, and merging ONNX pipelines

This post was originally published by Maurits Kaptein at Towards Data Science - Medium.


Visualizing a simple image processing pipeline (image by author).

ONNX is an amazingly useful format for storing Data Science / AI artifacts for version control and deployment. We are happy to share sclblonnx, a Python package that enables easy editing and augmenting of ONNX graphs.

Over the last year at Scailable we have been using ONNX heavily as a tool for storing Data Science / AI artifacts: an ONNX graph effectively specifies all the operations that need to be carried out on input data to generate a desired output. As such, ONNX can be used to store not only complex AI/ML models (as popular ML frameworks like TensorFlow and PyTorch allow you to do out of the box), but also all the pre- and post-processing that is necessary to deploy the trained model in a specific context. Thus, we use ONNX as a tool for version control of DS / AI models and pipelines. Furthermore, once a pipeline is available in ONNX, it can be deployed easily and efficiently.

Because at Scailable we use ONNX often, and because our use of ONNX models/pipelines almost always extends (far) beyond simply storing a fitted model in a single environment to use it in that exact same environment later on, we often find ourselves wanting to inspect, alter, test, or merge existing ONNX graphs. For example, we often add image resizing to an existing vision model so that the resulting ONNX pipeline can be put into production for cameras with different resolutions. However, the existing onnx.helper API is, in our view, a bit challenging to use. Thus, internally we are developing (and continuously trying to improve) a higher-level API for the manipulation of ONNX graphs. Today we are open-sourcing our current version; please do download, explore, append, or submit issues / feature requests.

In the remainder of this article I will provide an overview of the sclblonnx Python package, which is intended to make manually manipulating ONNX graphs easy.

In its bare essence, the sclblonnx package provides a number of high-level utility functions to deal with ONNX graphs. We try to use a consistent syntax which looks as follows:

# Import the package
import sclblonnx as so
# Assuming we have a graph object g:
g = so.FUNCTION(g, ...)

Thus, we provide a number of functions that operate on a graph (and often alter it) and return an updated version of the graph. Common functions are:

  • add_node(g, node): Add a node to an existing graph (and yeah, obviously you can also delete_node(g, node)).
  • add_input(g, input): Add a new input to an existing graph. You can also delete or change inputs.
  • add_output(g, output): Add a new output to an existing graph.
  • add_constant(g, constant): Add a constant to a graph.
  • clean(g): Clean up the graph; this is rather important as often exported graphs are bloated or not fully consistent.
  • check(g): Check whether the graph is valid, can be run, and can be deployed using Scailable (the latter check can be turned off).
  • display(g): Visually inspect the graph using Netron.
  • merge(g1, g2, outputs, inputs): Merge two (sub) graphs into a single graph. E.g., add preprocessing to a trained model.

Note that the ONNX graph is not the only thing that is stored when you export a model to ONNX from your favorite training tools: what is stored is an ONNX model (the content of the .onnx file), which contains the graph and a description of the software / versions used to generate the graph. Thus, once you open a model using sclblonnx, the package will extract the graph, and if you use the package to store the opened graph to .onnx again, even without editing it, the stored model will differ from the original because it will now be generated by the sclblonnx package.

The easiest way to introduce the package is by example; we have provided a number of them with the package itself. The first example creates a super simple ONNX graph that adds two numbers.

First, let’s create an empty graph:

g = so.empty_graph()

Now that we have an empty graph g, we can start adding nodes, inputs, and outputs to it:

# Add a node to the graph.
n1 = so.node('Add', inputs=['x1', 'x2'], outputs=['sum'])
g = so.add_node(g, n1)
# Add inputs:
g = so.add_input(g, 'x1', "FLOAT", [1])
g = so.add_input(g, 'x2', "FLOAT", [1])
# And, add an output.
g = so.add_output(g, 'sum', "FLOAT", [1])

That’s it really, we have just created our first functioning ONNX graph. However, we might like to clean, check, and try it:

# NumPy is needed to build the example inputs.
import numpy as np
# First, let's clean the graph (not really necessary here)
g = so.clean(g)
# Next, let's see if it passes all checks:
so.check(g)
# Display the graph
so.display(g)
# Evaluate the graph:
example = {
    "x1": np.array([1.2]).astype(np.float32),
    "x2": np.array([2.5]).astype(np.float32)
}
result = so.run(g,
                inputs=example,
                outputs=["sum"])
print(result)

The last line prints 3.7, which is the expected sum of 1.2 and 2.5.
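As a quick sanity check on that result, the same addition can be reproduced directly in NumPy with float32 inputs. This is only a reference computation outside ONNX, not something sclblonnx does for you; the variable names are illustrative.

```python
import numpy as np

# Same inputs as in the example above, cast to 32-bit floats.
x1 = np.array([1.2]).astype(np.float32)
x2 = np.array([2.5]).astype(np.float32)

# Element-wise addition, which is what the ONNX Add node computes.
reference = x1 + x2

print(reference, reference.dtype)
```

Comparing the graph's output against such a reference with np.allclose is safer than comparing with ==, since float32 values are only approximations of 1.2 and 3.7.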

Finally, we can store the model:

so.graph_to_file(g, "filename.onnx")

Perhaps more useful than creating an ONNX graph that adds two numbers from scratch is merging two existing (potentially complex) ONNX graphs; merging two or more graphs is how one creates a pipeline.

Merging is relatively easy using sclblonnx (although admittedly there might be edge cases that we have not considered yet; one of our motivations for opening up the package is to get it tested beyond our own use cases: please do submit an issue if you find any problem and do feel free to submit changes). Here is an example:

# Open the graphs. 
sg1 = so.graph_from_file("resize-image-450x600-300x400.onnx")
sg2 = so.graph_from_file("check-container.onnx")
# Merge the two graphs
g = so.merge(sg1, sg2, outputs=["small_image"], inputs=["in"])

The code above opens two existing graphs. The first has small_image as output, while the second has in as input. Jointly, these two graphs a) resize an image, and b) check whether the container in the image is empty. You can find the working example here.
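To make the idea of connecting outputs to inputs concrete, here is a toy sketch on plain Python dictionaries. It is not how sclblonnx implements merge; the function merge_toy, the dictionary layout, and the names image, result, and CheckContainer are all hypothetical, and real ONNX graphs carry much more information (types, shapes, initializers).

```python
def merge_toy(g1, g2, outputs, inputs):
    """Connect g1's `outputs` to g2's `inputs`, pairwise, in a toy graph."""
    if len(outputs) != len(inputs):
        raise ValueError("outputs and inputs must have the same length")

    # Map each connected input name of g2 to the g1 output that feeds it.
    rename = dict(zip(inputs, outputs))

    nodes = [dict(node) for node in g1["nodes"]]
    for node in g2["nodes"]:
        nodes.append({
            "op": node["op"],
            # Rewire g2's nodes to read from g1's outputs.
            "inputs": [rename.get(name, name) for name in node["inputs"]],
            "outputs": list(node["outputs"]),
        })

    return {
        # Connected g2 inputs are no longer inputs of the merged graph.
        "inputs": g1["inputs"] + [n for n in g2["inputs"] if n not in rename],
        # Connected g1 outputs become internal edges of the merged graph.
        "outputs": [n for n in g1["outputs"] if n not in outputs] + g2["outputs"],
        "nodes": nodes,
    }


# Hypothetical stand-ins for the two graphs in the example above.
resize = {
    "inputs": ["image"],
    "outputs": ["small_image"],
    "nodes": [{"op": "Resize", "inputs": ["image"], "outputs": ["small_image"]}],
}
check = {
    "inputs": ["in"],
    "outputs": ["result"],
    "nodes": [{"op": "CheckContainer", "inputs": ["in"], "outputs": ["result"]}],
}

pipeline = merge_toy(resize, check, outputs=["small_image"], inputs=["in"])
print(pipeline["inputs"], pipeline["outputs"])
print(pipeline["nodes"][1]["inputs"])
```

In this sketch the merged graph exposes only the first graph's input and the second graph's output; the small_image edge between them becomes internal to the pipeline.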

The sclblonnx package makes it easy to edit, alter, check, and merge ONNX graphs. The package is still under development; if you find any issues please do share them, and please do submit any improvements. Again, we feel ONNX is a great tool for storing and managing data processing pipelines that need to be used across devices / platforms. We hope the sclblonnx package helps to improve the utility of ONNX.

Enjoy!

It’s good to note my own involvement here: I am a professor of Data Science at the Jheronimus Academy of Data Science and one of the cofounders of Scailable. Thus, no doubt, I have a vested interest in Scailable; I have an interest in making it grow such that we can finally bring AI to production and deliver on its promises. The opinions expressed here are my own.
