It’s early morning and the sun is shining, but where is the birdsong? The new bird feeder should be filled with seeds, but it’s empty, and a happy squirrel is scurrying up a nearby tree with the stolen goods. Unfortunately, most modern bird feeders have not been able to prevent this common problem. By bringing the bird feeder into the 21st century, we can examine how deep learning helps keep birdseed for the birds.
In the following, we will explore how to design an image classification solution using the Deep Learning Workbench (DL Workbench). This tool allows us to evaluate, fine-tune, compare, and visualize deep learning model performance on different Intel architecture configurations. For the evaluations, we can import pre-trained models from OpenVINO™ Toolkit Pre-Trained Models (Open Model Zoo) or upload a custom-trained model.
Using our squirrel-versus-bird example, we will examine the basic functionality for image classification using the DL Workbench. We will demonstrate how to access the DL Workbench, evaluate a sample of our dataset on pretrained models, and upload our own Keras model. We will accomplish all of this without installing applications, downloading packages, or configuring development environments on our home machines, by using Intel DevCloud for the Edge.
The first step in designing our squirrel detection device is to assemble a dataset containing images of squirrels and birds. Luckily, we can leverage publicly available datasets and will not need to take and label our own images. For this example, we will take bird images from the Caltech_Birds_2011 dataset and extract squirrel images from the Animals-10 dataset.
The Caltech_Birds_2011 dataset contains images for 200 bird species. We can narrow the images we need to species that are likely to visit our bird feeder. The Animals-10 dataset contains one class for squirrels, but the images show a diverse representation of squirrel species.
Before we can use our dataset to train a classification model, we need to standardize the size and dimensions of our images. For this example, we have standardized the image to 64 x 64 pixels. This is an arbitrarily small image size; however, using this size will make it more accessible and less time-consuming to train the model, if you would like to repeat the actions as you read along. The dataset extraction and standardization can be found in GitHub. The repository also contains a pickle file with the training and validation sets.
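To make the standardization step concrete, here is a minimal, dependency-free sketch of nearest-neighbor resizing to 64 x 64. It is only an illustration and is not the code from the repository; the function name `resize_nearest` and the synthetic 128 x 96 source image are assumptions made for this example. In practice, an imaging library would normally do this work.

```python
def resize_nearest(pixels, new_width, new_height):
    """Resize a row-major grid of pixels with nearest-neighbor sampling."""
    old_height = len(pixels)
    old_width = len(pixels[0])
    resized = []
    for y in range(new_height):
        # Map each output row back to its corresponding source row.
        src_y = min(old_height - 1, y * old_height // new_height)
        row = []
        for x in range(new_width):
            # Map each output column back to its corresponding source column.
            src_x = min(old_width - 1, x * old_width // new_width)
            row.append(pixels[src_y][src_x])
        resized.append(row)
    return resized


# Hypothetical 128 x 96 RGB image filled with a simple gradient.
source = [[(x % 256, y % 256, 0) for x in range(128)] for y in range(96)]
standardized = resize_nearest(source, 64, 64)
print(len(standardized), len(standardized[0]))
```

Note that forcing every image to a square changes the aspect ratio of non-square sources, which is one of the trade-offs of choosing a single fixed input size.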
The dataset we created uses the ImageNet standard for the label file. This is in the annotation.txt file in the validation image folder. This file follows a comma-separated values (CSV) format as follows:
The ImageNet annotation is one of the supported label formats in the DL Workbench. Pascal Visual Object Classes (Pascal VOC), Common Objects in Context (COCO), Common Semantic Segmentation, and Common Super-Resolution annotation formats can also be used. More information on supported formats can be found in the DL Workbench documentation.
Accessing the DL Workbench through Intel DevCloud
Before getting started with the DL Workbench, users need to create an account with Intel DevCloud for the Edge at DevCloud.intel.com/edge.
After creating an account, the DL Workbench can be accessed by selecting “Deep Learning Workbench” from the “Advanced” menu.
Figure 1: Screenshot of how to access the Deep Learning Workbench from the DevCloud homepage
This will load a Jupyter Notebook that will be used to create a new instance of the DL Workbench. Jupyter Notebook is a web application that contains blocks for executing Python code. This notebook contains a code block for executing the Deep Learning Workbench launcher.
Figure 2: The code block contents for running the Deep Learning Workbench launcher
This block of Python code can be run by clicking the “Play” button at the top of the tab, or by pressing Shift+Enter while the code block is selected. This will populate a “Start Application” button below the code block. Clicking this button will create a DL Workbench session. This can take as long as one minute. Once the session has finished loading, a link will appear underneath the button. Selecting the link will create a new web browser tab with the DL Workbench.
Creating a New Configuration
The DL Workbench landing page lists the model configurations that are saved to the user’s profile. With no currently saved models, we can create a new configuration by clicking the “Create” button in the top left corner. This will lead to a new page where we can select the details for our configuration. We need to select a model, choose a target environment, and select a dataset.
To begin, we will import a model from Intel Open Model Zoo. There are numerous models for image classification that have been trained on the ImageNet dataset. AlexNet is a well-known convolutional neural network (CNN) model for image classification. We can select this model, then import it to use it in our configuration. AlexNet is a Caffe model and needs to be converted into the OpenVINO™ intermediate format. After hitting the “Import” button, we will be given a prompt to select the precision for the model conversion. The default precision is FP32 (floating-point 32).
After importing a model, we need to select a hardware environment. This is not a virtual machine emulating the hardware. Our model will be loaded onto a physical piece of the selected hardware that will run our experiments.
This is where we need to make a design decision for our squirrel detection device. What computing hardware will we use? Using the DL Workbench, we can compare the performance of different design solutions so that we can make a selection that best fits our needs.
For the sake of our example, we consider two CPU implementations. First, we will imagine we are attaching a camera to a PC and will run our model against an Intel Core CPU. Second, we will look at a low-powered solution on an Intel Atom® processor as if we were creating a mobile edge device.
Let’s start by looking at the PC implementation. We need to select “Intel Core” as the Base Platform from the drop-down menu. This will populate a list of the available Core CPU environments. For our first experiment we will select the Intel® Core™ i7-8665UE Processor.
Figure 3: Screenshot showing how to select a hardware environment
Finally, we need to select a dataset to run against the model we selected. There are two options for selecting a dataset. We can generate a dataset of images consisting of Gaussian noise. This would allow us to benchmark the performance metrics of a configuration without the need to supply “real” data. However, since the generated dataset does not have annotations, we will not be able to receive an accuracy percentage. Currently, the online version of the DL Workbench does not report accuracy after running an experiment; however, the model can be tested on single samples.
For our experiments, we have the SquirrelVsBird data. We can import our dataset by pressing the “Import” button in the “Validation Dataset” section. On the “Import Validation Dataset” page, clicking the “Choose File” button will open a file browser menu where we can select a zip file containing the validation images and an annotation.txt file with the image labels. We can adjust the dataset name in the text field before importing it into the DL Workbench. Once imported, the dataset will be retained for use in the future.
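The packaging step described above can be sketched with the Python standard library. This is a hedged illustration, not the script used for the article: the function name `package_validation_set`, the file names, the class indices (0 for bird, 1 for squirrel), and the choice to store files at the archive root are all assumptions. The delimiter is a parameter because the exact annotation syntax should be checked against the DL Workbench documentation.

```python
import tempfile
import zipfile
from pathlib import Path


def package_validation_set(image_dir, labels, zip_path, delimiter=","):
    """Write annotation.txt next to the images and zip the folder contents."""
    image_dir = Path(image_dir)
    annotation = image_dir / "annotation.txt"
    lines = [f"{name}{delimiter}{class_id}" for name, class_id in sorted(labels.items())]
    annotation.write_text("\n".join(lines) + "\n")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(image_dir.iterdir()):
            if path.is_file():
                # Assumption: files are stored at the archive root.
                archive.write(path, arcname=path.name)
    return zip_path


# Hypothetical example: two placeholder files stand in for real images.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "validation"
    folder.mkdir()
    labels = {"bird_0001.jpg": 0, "squirrel_0001.jpg": 1}
    for name in labels:
        (folder / name).write_bytes(b"placeholder")
    archive_path = package_validation_set(folder, labels, Path(tmp) / "SquirrelVsBird.zip")
    with zipfile.ZipFile(archive_path) as archive:
        archive_names = sorted(archive.namelist())
        annotation_text = archive.read("annotation.txt").decode()
print(archive_names)
print(annotation_text)
```

Reading the archive back before uploading is a cheap way to confirm that every image named in annotation.txt is actually present in the zip file.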
Now that we have selected a model, environment platform, and a dataset, we can click the “Create” button at the bottom of the page to launch the first execution of our configuration.
Discussion of Results and Visualizations
Launching the first execution runs the validation set against the model with a single stream with a batch of one image. Under the “Analyze” tab, a summary table shows the number of parallel streams and the number of images in a batch. Additional experiments can be run by selecting different stream and batch values in the “Perform” tab’s “Explore Inference Configuration” menu.
Scrolling through this dashboard, the DL Workbench provides several metrics that help us evaluate the performance of our model. The primary measurements are throughput and latency. Throughput is the number of tasks that are completed in a period of time. Latency is the amount of time it takes to complete a single task. In our case, this task consists of inferring if an image is a bird or a squirrel.
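To illustrate the two definitions, the following sketch times a stand-in task and derives both numbers from the same run. It does not reproduce how the DL Workbench measures performance; `benchmark` and `fake_inference` are hypothetical names, and the "inference" is just a sum over pixel values so that the example runs anywhere.

```python
import time


def benchmark(task, samples):
    """Return (mean latency in ms, throughput in tasks per second)."""
    latencies = []
    start = time.perf_counter()
    for sample in samples:
        t0 = time.perf_counter()
        task(sample)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    mean_latency_ms = 1000 * sum(latencies) / len(latencies)
    throughput = len(samples) / elapsed
    return mean_latency_ms, throughput


def fake_inference(image):
    # Hypothetical stand-in for a model call: sums the pixel values.
    return sum(image)


images = [list(range(64 * 64 * 3)) for _ in range(50)]
latency_ms, tasks_per_second = benchmark(fake_inference, images)
print(f"latency: {latency_ms:.3f} ms, throughput: {tasks_per_second:.1f} tasks/s")
```

With one task running at a time, throughput is bounded by the inverse of the latency. Parallel streams and larger batches are the usual ways to raise throughput beyond that bound, typically at the cost of per-task latency.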
Considering throughput and latency will help us decide what hardware should be selected and how it should be optimized. If we prioritize throughput, we are looking to optimize the amount of data we are processing. If we put a greater emphasis on latency, we will optimize how fast we can make a single inference. For our squirrel detection project, we would be working with a single stream of data and likely want to quickly identify a squirrel and take an action to prevent the theft of our birdseed. This means we should emphasize latency.
Reviewing the remaining graphics on the page gives more information about the performance of different layers of our network. These show where in the network the execution time is being spent. This information can help determine how optimizations are affecting the different layers in our network.
Compare Configurations for the Same Model
After reviewing the metrics for our Core CPU experiments, we want to compare the performance of an edge solution. To do this, we need to create another experiment using Atom hardware. We can start a new experiment by clicking the “Create” button at the top of the page. This will create a new configuration with the model and dataset preset. We can then select “Atom” from the hardware environment drop-down and select a device for executing our experiment. Clicking the “Create” button at the bottom of the page will run a one-batch, one-stream experiment. With two experiments, we can compare the execution metrics.
Figure 4: Comparison table between Atom (Target idc008u2g) and Core (idc016ai7) experiments using a trained AlexNet model with the SquirrelVsBird dataset.
Viewing the table, we can see the throughput and latency of our two experiments. As expected, the Core CPU outperforms the Atom CPU. Clicking the “Compare” button adjacent to the “Projects” header launches a new page providing more detailed comparisons between experiments. After selecting our two configurations, the page displays side-by-side graphics showing the throughput and latency for each. Additionally, under the “Kernel-Level Performance” tab, we can see the different layer execution times between the two configurations.
Depending on the precision we need in our application, the latency of the Atom configuration may be acceptable. To improve the latency, we could also look at optimization features; however, if we look at the model, AlexNet is trained on the ImageNet dataset and is capable of inferring 1000 classes. Since we are only interested in detecting birds and squirrels, we could create our own model. This model can be imported into the DL Workbench and executed on the Intel hardware configurations.
Importing a Custom Model
The DL Workbench supports importing models in different formats. For this project, a Convolutional Neural Network (CNN) was trained on the SquirrelVsBird dataset we have assembled. The model was created in Intel DevCloud following the TensorFlow-Keras tutorial. The Jupyter Notebook and model files are publicly available in GitHub.
Figure 5: Screenshot of the parameters for importing an original TensorFlow Keras model
Starting from the “Create Configuration” page in the DL Workbench, we can import a custom model by first clicking on the “Import” button, then selecting the “Original Model” tab. To load the SquirrelVsBird model, we select “TensorFlow” from the “Framework” drop-down menu. Selecting TensorFlow Version 2.x will display an “Is Keras Model” checkbox. After clicking the checkbox, the “Select” button will allow us to select the H5 file containing our Keras model.
Importing the model will automatically convert it to OpenVINO™’s Intermediate Representation (IR) format. For this, we need to set additional parameters. In this menu, we can select the floating-point precision and set the color space and the input and output parameters. Adjusting the precision affects the performance of the model. Keep in mind that lower floating-point precision can improve model efficiency but this may come at the cost of accuracy. The current recommendation is floating-point 16. In supported models, this can be optimized to integer 8 in the performance tuning section of the DL Workbench.
For color space, our model was trained using Red Green Blue (RGB) images. Selecting the incorrect color space will affect how the DL Workbench evaluates our model.
The inputs and outputs can also be configured during the conversion to IR format. Typically, we want to keep the default values that the DL Workbench has extracted from our model file. The first dimension may not be recognized during the import. In this case, the value should match what was used during training. For the SquirrelVsBird model, we are using one 64 x 64 pixel RGB image, so the input shape will be 1 x 64 x 64 x 3. More information about configuring the parameters can be found in the tool tips and in the DL Workbench documentation.
Figure 6: Screenshot of the parameters for converting an original model into OpenVINO™'s intermediate representation (IR)
Once the model is converted, we can run the same experiments we conducted with the AlexNet model we retrieved from Intel Open Model Zoo.
Evaluating a Custom Model
With our custom SquirrelVsBird model imported, we can repeat the previous steps to run the experiments on the same Intel Core and Intel Atom® CPU hardware we used before. Our custom model outperforms the trained AlexNet model on both hardware configurations. The AlexNet model is larger and more generalized than our custom model. AlexNet takes 227 x 227 images as input, compared to the 64 x 64 image size of the custom model. Similarly, AlexNet has 1000 output classes compared to two for the SquirrelVsBird model. The custom model also has fewer convolutional layers: three compared to AlexNet’s five.
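A quick back-of-the-envelope calculation shows how much smaller the custom model's input is. The sketch below assumes three-channel RGB input for both models and AlexNet's commonly cited 227 x 227 input size; `input_values` is a hypothetical helper written for this example.

```python
def input_values(height, width, channels=3):
    """Number of values in a single image tensor."""
    return height * width * channels


alexnet_values = input_values(227, 227)
custom_values = input_values(64, 64)
print(alexnet_values, custom_values, round(alexnet_values / custom_values, 2))
```

Under these assumptions, each AlexNet input carries roughly 12.6 times as many values as a SquirrelVsBird input. Input size is only one factor in execution time, so this ratio should not be read as a predicted speedup.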
Comparing the two hardware environment experiments for the custom model, we see the Core processor environment significantly outperforms the Atom configuration, with time to classify a single image at less than one millisecond. The Atom implementation performs the inference on one image in just under seven milliseconds.
Figure 7: Comparison table between Core (idc016ai7) and Atom (Target idc008u2g) experiments using an original model with the SquirrelVsBird dataset—the third experiment in the table uses INT8 precision on the Atom hardware configuration
We can retrieve more information about our model’s performance using the DL Workbench’s Model Visualizer. The “Performance Summary”, “Precision-Level Performance”, and “Kernel-Level Performance” tabs on the “Projects” page provide execution metrics for the model by the layers in the network. In the “Kernel-Level Performance” menu, the Model Visualizer provides a color-coded graph of our model’s network. Under the “Coloring” drop-down menu, we can select “By Execution Time” to color code the graphic to show where our model is spending the most time when performing inference on a sample.
For the SquirrelVsBird model running on the Atom CPU, the second convolution layer is colored red, indicating that this layer took a longer time to complete than the other layers in the network. This graphic is a valuable tool for visualizing how optimizations are affecting the individual layer runtimes when tuning the model. Additional information on the layer execution is present adjacent to the graphic.
Figure 8: Screenshot of the Model Visualizer for the SquirrelVsBird model running on an Atom CPU
Since we are interested in a possible edge device solution for our squirrel-vs-bird detection system, we may want to look for ways to further optimize our model to reduce latency. One of the methods provided in the DL Workbench is reducing the representation to INT8.
INT8 representation refers to the data structure being used to store the values in our network. When we imported our original model, we selected a floating-point 32 precision. This means that values in our network will be stored in a floating-point format in a 32-bit structure. For an INT8 optimization, the floating-point value is converted to an integer, and the data size is reduced to 8 bits. This reduces the memory and computational resources needed to perform inference tasks with the model. Although the INT8 representation improves performance, it may negatively affect the model’s accuracy. More information on how the DL Workbench implements INT8 representation is available in the DL Workbench documentation.
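The idea of mapping floating-point values to 8-bit integers can be sketched in a few lines. This is a generic symmetric linear quantization scheme chosen for illustration; it is an assumption, not a description of how the DL Workbench or OpenVINO™ performs INT8 calibration. The function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to the INT8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
print(q_weights)
print([round(r, 3) for r in restored])
```

Each value now fits in one byte instead of four, but small values such as 0.003 collapse to zero. This rounding error is the source of the possible accuracy loss mentioned above.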
Converting an experiment to use INT8 in the DL Workbench is simple. We can create a new experiment for our original model Atom implementation by selecting the previous Atom experiment and clicking the “Perform” tab. This will display the “Optimize Performance” menu with the INT8 option. Selecting this option and clicking the “Optimize” button will run the experiment with the INT8 configuration.
Figure 9: Screenshot of the “Optimize Performance” menu from the “Perform” tab on the “Projects” page.
Comparing the results of the INT8 optimization with the previous Atom experiment, we can see that changing the precision resulted in reducing the latency by half and doubling the throughput. This still does not reach the metrics of the Core experiment; however, we can see that this optimization may be useful for improving performance, particularly when considering an edge implementation.
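Halving the latency and doubling the throughput go hand in hand for a one-stream, one-batch experiment. The sketch below uses an idealized relation in which each stream completes one batch per latency period; `ideal_throughput` is a hypothetical helper, and the 7.0 ms baseline is an illustrative round number rather than a measured result.

```python
def ideal_throughput(latency_ms, batch_size=1, streams=1):
    """Images per second if every stream finishes one batch per latency period."""
    return streams * batch_size * 1000.0 / latency_ms


baseline_ms = 7.0  # Illustrative round number, not a measured result.
for label, latency in (("baseline", baseline_ms), ("halved", baseline_ms / 2)):
    print(f"{label}: {latency:.1f} ms -> {ideal_throughput(latency):.1f} images/s")
```

Real measurements will deviate from this relation because of scheduling and data-transfer overhead, especially once multiple streams or larger batches are used.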
For our squirrel-vs-bird problem, using our lightweight custom model reduced the time needed for the inference task compared to the AlexNet implementation. For hardware design, we lose performance on the Atom CPU; however, the latency and throughput are sufficient to solve our problem. This allows us to make other considerations for our final implementation. We can look at cost, the infrastructure where our device will be installed, and the need for mobility to design an ideal solution.
Packing and Exporting Configurations
Now that we’ve run some experiments, we are ready to decide on the best hardware for our device. Once we select our hardware configuration, we can create a deployment package that can be installed on the target device. With the experiment configuration selected on the Projects page, as we did with the INT8 optimization, we select the “Create Deployment Package” menu from the “Perform” tab. This menu provides selectable parameters that are used to create a package. Clicking the “Pack” button will create a tar.gz file and initialize a download.
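After downloading, it can be useful to check what the tar.gz file contains before copying it to the target device. The sketch below lists the files in an archive using the standard library. Because the real package layout is not described here, it builds a tiny stand-in archive first; the archive name and the file names inside it are hypothetical placeholders.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_package(path):
    """Return the sorted names of regular files stored in a tar.gz archive."""
    with tarfile.open(path, "r:gz") as package:
        return sorted(member.name for member in package.getmembers() if member.isfile())


# Build a tiny stand-in archive so the sketch runs without a real package.
with tempfile.TemporaryDirectory() as tmp:
    package_path = Path(tmp) / "deployment_package.tar.gz"
    with tarfile.open(package_path, "w:gz") as package:
        for name in ("model/squirrel_vs_bird.xml", "model/squirrel_vs_bird.bin"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            package.addfile(info, io.BytesIO(payload))
    contents = list_package(package_path)
print(contents)
```

For a real download, `list_package` would simply be pointed at the path of the file produced by the “Pack” button.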
Figure 10: Screenshot of the 'Create Deployment Package' menu from the 'Perform' tab on the 'Projects' page
Conclusion and Other Functions of DL Workbench
Using the DL Workbench, we evaluated how to design and select a hardware configuration for a squirrel and bird classification problem. We covered how to import a trained model from Intel Open Model Zoo and an original model, how to run and compare hardware experiments, how to optimize a model using INT8 precision conversion, and how to export a deployment package. After all of this, we have only scratched the surface of what can be done in the platform. The DL Workbench supports other deep learning applications, such as object detection and style transfer, as well as more optimization methods and more hardware configurations.
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, Intel Atom®, Intel Core and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.