I'm testing how much faster inference can get.
I've already tested compression algorithms using Intel NNCF.
When I checked PyTorch-related information on the web, I found that preprocessing the input can also make inference a bit faster. Two examples are introduced there.
1). max_length:
Limits the input length to max_length to make the input data lighter.
from transformers import BertTokenizer

MAX_LENGTH = 512
tokenizer = BertTokenizer.from_pretrained("hoge_pretrain")
data = tokenizer.encode_plus(
    TEXT,
    add_special_tokens=True,
    max_length=MAX_LENGTH,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
2). do_not_pad
This method can be used when inferring with batch_size == 1. Normally, padding of the input data is required for batch inference, but with batch_size == 1 inference can run without padding.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hoge_pretrain")
data = tokenizer.encode_plus(
    TEXT,
    add_special_tokens=True,
    max_length=512,
    padding="do_not_pad",
    truncation=True,
    return_tensors="pt",
)
These methods are for text/language-related inference.
Do you know whether any similar preprocessing technique exists for inference in image classification tasks?
Best regards!
Hi Timosy,
Thanks for sharing this information with us.
For preprocessing in OpenVINO™, we usually resize the input image, convert the colour format, convert U8 to FP32 precision, and change the layout. You can refer to Optimize Preprocessing for more details.
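For illustration, a minimal sketch of those typical steps with the PrePostProcessor API could look like the one below; the model path, colour formats, and device name are placeholders, so please adapt them to your own model.

from openvino.runtime import Core, Layout, Type
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm, ColorFormat

core = Core()
model = core.read_model("model.xml")  # placeholder IR path

ppp = PrePostProcessor(model)
# Describe the tensor the application will actually provide:
# U8 data, BGR colour order, NHWC layout, arbitrary spatial size.
ppp.input().tensor() \
    .set_element_type(Type.u8) \
    .set_color_format(ColorFormat.BGR) \
    .set_spatial_dynamic_shape() \
    .set_layout(Layout('NHWC'))
# Tell OpenVINO what the model itself expects.
ppp.input().model().set_layout(Layout('NCHW'))
# Preprocessing steps: resize to the model's spatial size, convert colour,
# and convert U8 to FP32; the NHWC -> NCHW layout change is added implicitly.
ppp.input().preprocess() \
    .resize(ResizeAlgorithm.RESIZE_LINEAR) \
    .convert_color(ColorFormat.RGB) \
    .convert_element_type(Type.f32)
model = ppp.build()

compiled_model = core.compile_model(model, 'CPU')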
In addition, using model caching can help speed things up by minimizing the model's read and load time, because the application can load the saved file and does not need to perform the preprocessing steps again. You can refer to Use Case - Integrate and Save Preprocessing Steps Into IR for more information.
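As a hedged example, model caching can be switched on like this (the cache directory, IR path, and device name are assumptions):

from openvino.runtime import Core

core = Core()
# Point the runtime at a cache folder: the first compile_model() call saves
# the compiled blob there, and later runs load it instead of recompiling.
core.set_property({"CACHE_DIR": "./model_cache"})

model = core.read_model("model.xml")  # placeholder IR path
# Caching takes effect on devices that support compiled-model import/export.
compiled_model = core.compile_model(model, "CPU")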
Regards,
Peh
Thanks for the useful information above.
I have another question. I'm currently testing an INT8 model compressed with NNCF. Is it possible to feed integer data (an image or whatever) to the compressed model instead of float data? If that is possible, inference should get even faster, though accuracy might drop.
Does a function to handle such a conversion exist in the OpenVINO framework?
I tried the preprocessing below with the INT8 model.
# https://docs.openvino.ai/2022.1/openvino_docs_OV_UG_Preprocessing_Overview.html
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Layout, Type

ppp = PrePostProcessor(ir_model)
# No index/name is needed if the model has one input.
# N=1, C=3, H=224, W=224
ppp.input().model().set_layout(Layout('NCHW'))
# Mean/scale normalization left out; this is just a speed test.
# ppp.input().preprocess() \
#     .mean([0.5029, 0.4375, 0.3465]) \
#     .scale([0.2818, 0.2659, 0.2629])
# First define the data type of the input tensor.
ppp.input().tensor().set_element_type(Type.u8)
# Then define the preprocessing step.
# ppp.input().preprocess().convert_element_type(Type.f32)
ppp.input().preprocess().convert_element_type(Type.u8)
# Model expects shape {1, 3, 480, 640}; convert the NHWC layout to NCHW.
ppp.input().preprocess().convert_layout([0, 3, 1, 2])
print(f'Dump preprocessor: {ppp}')
# Apply the preprocessing steps to the model.
ir_model = ppp.build()
The dump I got is as follows:
Dump preprocessor: Input "input.0":
User's input tensor: {1,2048,2048,3}, [N,H,W,C], u8
Model's expected tensor: {1,3,2048,2048}, [N,C,H,W], f32
Pre-processing steps (2):
convert type (u8): ({1,2048,2048,3}, [N,H,W,C], u8) -> ({1,2048,2048,3}, [N,H,W,C], u8)
convert layout (0,3,1,2): ({1,2048,2048,3}, [N,H,W,C], u8) -> ({1,3,2048,2048}, [N,C,H,W], u8)
Implicit pre-processing steps (1):
convert type (f32): ({1,3,2048,2048}, [N,C,H,W], u8) -> ({1,3,2048,2048}, [N,C,H,W], f32)
The inference speed did not improve.
Am I mistaken about something?
Best regards
Hi Timosy,
First and foremost, we don't have this conversion option in the Model Optimizer.
Next, preprocessing in OpenVINO™ is meant to make the input data fit the neural network model's input tensor exactly; it is not used to increase the inference speed. It is recommended to use model caching if inference speed is critical for you.
Regards,
Peh
Thanks for your comments. I'd like to confirm my understanding.
Simply speaking, there is currently no support for feeding an "INT" tensor into an INT8 model to make inference faster than with a general "Float" tensor input.
Is this correct?
Hi Timosy,
Yes, you are correct. This GitHub discussion might be useful.
During quantization, the process inserts an operation called FakeQuantize into the model graph.
During runtime, these FakeQuantize layers convert the input to the convolution layer into INT8. For example, if the next convolutional layer has INT8 weights, then the input to that layer will also be converted to INT8. The precision further on, however, depends on the next operation: if the next operation requires a full-precision format, then the inputs will be reconverted to full precision during runtime.
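If you would like to check this on your own model, a small sketch like the one below (the quantized IR path is a placeholder) lists the FakeQuantize operations that were inserted:

from openvino.runtime import Core

core = Core()
model = core.read_model("quantized_model.xml")  # placeholder path to the quantized IR

# Each FakeQuantize node marks a point where activations or weights are
# quantized to low precision at runtime.
fq_ops = [op for op in model.get_ops() if op.get_type_name() == "FakeQuantize"]
print(f"FakeQuantize operations found: {len(fq_ops)}")
for op in fq_ops[:5]:
    print(op.get_friendly_name())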
Regards,
Peh
Hi Timosy,
This thread will no longer be monitored since we have provided answers and suggestions. If you need any additional information from Intel, please submit a new question.
Regards,
Peh