[Q&A] Pre-trained Models and Training Extensions

Zoe_C_Intel · ‎03-10-2021

Last week, we hosted a live webinar on using pre-trained models and fine-tuning/re-training them with Training Extensions on GitHub: https://github.com/openvinotoolkit/training_extensions

The webinar was on: "Take the Stress Out of Going from Training to Inferencing with the OpenVINO™ toolkit"

Here's a copy of the presentation. Here are the answers to the questions we received during the webinar!

Q: What is OpenVINO used for?

A: It is used for optimizing your models and then giving you the flexibility to deploy them on a variety of hardware options, such as CPUs, Movidius VPUs, FPGAs and other accelerators.

Q: Do you have pre-trained models using 3D data?

A: Yes, there’s a great example for Segmentation of Thoracic Organs using Pixel Shuffle on GitHub.

Q: Is it possible to reach an accuracy of 1 if you do a lot of re-training and fine-tuning?

A: We have never seen this. Even for a human being, the accuracy will never be 1 for reasonably complicated problems, and any deep learning computer vision system is bounded by the annotation accuracy. A model can be 100% accurate only if the annotation is 100% accurate, and that in turn requires humans to be able to label the data with 100% accuracy. For a trivially simple task, such as recognizing whether a single pixel is black or white, it might be possible; you can try that.

Q: How do you figure out if the dataset is big enough?

A: You have to export, evaluate and visualize. You have to iterate to learn from your actual data in the field. It depends on your use case and your business.

First, having a good validation dataset is critical. Make sure the validation dataset covers the necessary use cases and that its samples are distinct and diverse enough. Essentially, when the accuracy on this dataset is high enough for your needs, you know you have enough data. This is the general idea.

Second, as classes become more visually complex, you need more data. For example, a license plate is a fairly simple rectangular object with good contrast against the background, so it is fairly easy to detect and recognize and needs less data to learn than a face, which has more complicated features. A person as an entire figure is even harder to detect, because the shapes vary. Therefore, the more visually complicated your classes are, the more data you typically need. A good starting point is thousands of samples for simpler classes and tens of thousands for more complicated classes.
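To make the "evaluate on a good validation dataset" step a bit more concrete, here is a minimal sketch in plain Python. It assumes you already have the ground-truth labels and the model's predictions for the validation set as two lists; the function names and the 0.9 target are hypothetical, not part of Training Extensions. Classes that fall below the target are the ones that most likely need more data.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's validation samples predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    # Number of validation samples per ground-truth class.
    totals = Counter(true_labels)
    # Number of correctly predicted samples per ground-truth class.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


def classes_below_target(true_labels, predicted_labels, target=0.9):
    """Return the sorted class names whose accuracy is below `target` (hypothetical threshold)."""
    accuracy = per_class_accuracy(true_labels, predicted_labels)
    return sorted(cls for cls, acc in accuracy.items() if acc < target)


if __name__ == "__main__":
    # Toy labels for illustration only, not real results.
    truth = ["plate", "plate", "face", "face", "person", "person"]
    preds = ["plate", "plate", "face", "person", "face", "person"]
    print(per_class_accuracy(truth, preds))
    print(classes_below_target(truth, preds))
```

Looking at accuracy per class rather than one overall number shows which classes are under-represented, which matches the point that more complex classes need more data.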

Hope you found the webinar insightful! If you have any other questions, feel free to follow up on this thread.

DarkHorse · ‎03-31-2021

Hi @Zoe_C_Intel ,

Do you have any link or guideline on creating a labels file for different pre-trained models, especially for object detection or object classification?

So far I haven't found any link or guidelines on how to create one.

Thanks.

ilya-krylov · ‎04-06-2021

Hi @DarkHorse

Classification labels are taken from the dataset structure: https://github.com/openvinotoolkit/training_extensions/tree/develop/models/image_classification/model_templates/custom-classification#3-prepare-data
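As a rough illustration of "labels taken from the dataset structure", here is a sketch that assumes the common ImageNet-style layout with one subfolder per class (please check the linked README for the exact layout it expects). The helper names and the one-label-per-line file format are hypothetical, and the alphabetical ordering is an assumption, so verify the class order against your trained model.

```python
from pathlib import Path


def labels_from_folders(dataset_root):
    """Return class names derived from the subfolder names of `dataset_root`.

    Assumes one subfolder per class; plain files in the root are ignored.
    Alphabetical order is an assumption, not something the toolkit guarantees.
    """
    root = Path(dataset_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def write_labels_file(labels, output_path):
    """Write one label per line (a hypothetical labels-file format)."""
    Path(output_path).write_text("\n".join(labels) + "\n", encoding="utf-8")
```

For example, calling `labels_from_folders` on a training folder that contains `cat/` and `dog/` subfolders gives `["cat", "dog"]`, which `write_labels_file` can then save next to the model.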

Object detection labels are taken from the 'classes' parameter: https://github.com/openvinotoolkit/training_extensions/tree/develop/models/object_detection/model_templates/custom-object-detection#6-training

Annotations for image classification and object detection can be made using CVAT (https://github.com/openvinotoolkit/cvat). The corresponding dataset formats are ImageNet 1.0 and COCO 1.0.
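If you export object detection annotations in COCO format, the class names are stored in the "categories" list of the annotation JSON. Here is a minimal sketch for reading them; the function name and the file path in the usage note are hypothetical, and ordering by category id is an assumption you should check against how you pass the 'classes' parameter.

```python
import json


def coco_class_names(annotation_path):
    """Return category names from a COCO annotation file, ordered by category id."""
    with open(annotation_path, encoding="utf-8") as f:
        coco = json.load(f)
    # Each COCO category entry has at least an "id" and a "name".
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]
```

Calling `coco_class_names("annotations/instances_train.json")` (a hypothetical path) returns the list of class names found in that file.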
