Bring your own dataset and retrain a TensorFlow model with OpenVINO™ toolkit

Stephanie_Maluso · ‎06-30-2022

Authors: Stewart Christie and Ragesh Hajela

Machine learning requires existing data: not the data our application will process when it runs, but data to learn from. You need a lot of real data, and the more the better, because the more examples you provide, the better the computer should be able to learn. Data is the most crucial ingredient in making algorithm training possible, and it explains why machine learning has become so popular in recent years.

The OpenVINO™ Notebooks repo on GitHub is a collection of ready-to-run Jupyter Notebooks for learning and experimenting with the OpenVINO™ toolkit. The notebooks introduce OpenVINO™ basics and teach developers how to leverage our APIs for optimized deep learning inference. One of the examples is a notebook titled 301-tensorflow-training-openvino.ipynb, in which you train a model using TensorFlow* and then run it both in native TensorFlow and with the OpenVINO™ toolkit. It is an end-to-end deep learning training tutorial that borrows the open source code from the TensorFlow image classification tutorial and shows how to train the model and then convert it to OpenVINO™ Intermediate Representation (IR). It uses the tf_flowers dataset, which includes about 3,700 photos of flowers.

But how would you modify this notebook to retrain the same model with a different dataset? Say that, instead of flowers, you want to use a fruits dataset, retrain the same model, and run inference on the new data. What changes would be required?

Let’s pick an interesting new dataset: Fruits-360, a dataset of images of fruits and vegetables. Contrary to what the name suggests, it does not contain 360 kinds of fruit; it is a collection of 131 classes of fruits and vegetables. The dataset was created by mounting each fruit or vegetable on a stick that was rotated 360 degrees, hence the name. Check out the research paper to understand how the dataset images were post-processed and prepared for training and testing.
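Before retraining, it can help to confirm how many classes and images the downloaded dataset actually contains. The following is a minimal sketch, assuming the dataset unpacks into one sub-folder per class; the folder and class names used in the demo are hypothetical stand-ins, and `summarize_dataset` is an illustrative helper, not part of the notebook.

```python
import tempfile
from pathlib import Path
from typing import Dict

# File extensions counted as images (assumption: adjust to your dataset).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_dataset(root: Path) -> Dict[str, int]:
    """Map each class sub-folder of `root` to the number of images in it."""
    summary = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return summary


# Demo on a tiny stand-in directory tree (hypothetical class names and counts).
with tempfile.TemporaryDirectory() as tmp:
    training_dir = Path(tmp) / "Training"
    for class_name, count in {"Cauliflower": 3, "Red Potato": 2}.items():
        class_dir = training_dir / class_name
        class_dir.mkdir(parents=True)
        for i in range(count):
            (class_dir / f"{i}.jpg").write_bytes(b"")  # empty placeholder files
    demo_summary = summarize_dataset(training_dir)
    print(f"{len(demo_summary)} classes, {sum(demo_summary.values())} images")
```

Pointing `summarize_dataset` at the real training folder should report the number of classes the retrained model needs to output.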

To set the baseline, launch the existing notebook and run inference on a sample flower image.

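from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image

# compiled_model, output_layer, pre_process_image and class_names are
# expected to be defined in earlier cells of the notebook.
inp_img_url = "https://upload.wikimedia.org/wikipedia/commons/4/48/A_Close_Up_Photo_of_a_Dandelion.jpg"
# The image at inp_img_url must already be saved at this path.
file_path = Path("output") / "A_Close_Up_Photo_of_a_Dandelion.jpg"

# Pre-process the image and get it ready for inference
input_image = pre_process_image(file_path)

# Run inference on the input image
res = compiled_model([input_image])[output_layer]
score = tf.nn.softmax(res[0])

# Show the results
image = Image.open(file_path)
plt.imshow(image)
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence.".format(
        class_names[np.argmax(score)], 100 * np.max(score)
    )
)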
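
The last steps of the cell above turn the raw model output into a label and a confidence figure: a softmax converts the scores into probabilities, and the arg-max picks the most likely class. As a rough illustration of that arithmetic only, here is a plain-Python sketch; the logits are made-up values, and `softmax` and `top_prediction` are illustrative helpers rather than the notebook's code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: Sequence[float], names: Sequence[str]) -> Tuple[str, float]:
    """Return the most likely class name and its confidence in percent."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best], 100 * probs[best]


# The five tf_flowers classes, with hypothetical logits for one image.
flower_classes = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
example_logits = [0.1, 2.0, 1.0, -1.0, 0.3]

label, confidence = top_prediction(example_logits, flower_classes)
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence.".format(
        label, confidence
    )
)
```

The same logic carries over unchanged to a retrained model; only the list of class names and the length of the score vector differ.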

Here is the classification result for this sample image of a flower:

To use the Fruits-360 dataset instead, the Jupyter notebook needs some obvious modifications along with a few technical adaptations. This table shows the first-level changes required; one of the most notable differences is the image size, since the Fruits-360 dataset consists of 100x100-pixel images.
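
One way to keep such dataset-specific values in a single place is a small configuration object. The sketch below is an assumption-laden illustration, not the notebook's code: `DatasetConfig` and `load_training_split` are hypothetical names, the directory paths are placeholders, and the flowers values (180x180 images, 5 classes, batch size 32) are taken from the TensorFlow image classification tutorial. TensorFlow is imported lazily so the configuration can be inspected without it installed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetConfig:
    """Dataset-specific settings that change when swapping datasets."""
    data_dir: str       # placeholder path to the class sub-folders
    img_height: int
    img_width: int
    num_classes: int
    batch_size: int = 32

    @property
    def input_shape(self) -> Tuple[int, int, int]:
        # Height, width, and three colour channels (RGB).
        return (self.img_height, self.img_width, 3)


# Assumed values from the TensorFlow flowers tutorial.
FLOWERS = DatasetConfig("flower_photos", 180, 180, 5)
# Fruits-360 values from this article; the path is a placeholder.
FRUITS = DatasetConfig("fruits-360/Training", 100, 100, 131)


def load_training_split(cfg: DatasetConfig, validation_split: float = 0.2, seed: int = 123):
    """Load the training subset of the images under cfg.data_dir."""
    import tensorflow as tf  # lazy import: only needed when actually loading data

    return tf.keras.utils.image_dataset_from_directory(
        cfg.data_dir,
        validation_split=validation_split,
        subset="training",
        seed=seed,
        image_size=(cfg.img_height, cfg.img_width),
        batch_size=cfg.batch_size,
    )


print("flowers input shape:", FLOWERS.input_shape)
print("fruits input shape:", FRUITS.input_shape)
```

The same height and width would likely also need to be used wherever the notebook hard-codes the input size, such as image pre-processing and model conversion.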

Apply all the necessary code changes, run inference on the newly trained model, and try classifying any fruit or vegetable, e.g. the Red Potato or Cauliflower images below from the dataset.

To see all the code changes and learn the steps required to accomplish this, take the free 20-minute tutorial titled “BYOD- Retrain a Tensorflow Model Using Your Data Set”. It’s hosted on the Intel Learning portal, which offers many courses on AI. Just sign up: there is a complete curriculum for newcomers to OpenVINO™, a 10-part learning plan, along with many newly launched training courses. Happy learning!

Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site.

No product or component can be absolutely secure.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.