Hi there, I am on video 5 of Intel Edge AI Foundations, and there is a question: What inference parameters improve when using an INT8 model instead of an FP32 model?
I have attempted this question many times and have been unable to find the solution.
Also, it doesn't show the correct answer; it only shows the same explanation.
Please help me clear this up. Also, is there somewhere the solutions are posted? In the future I would refer to that rather than asking here, as it takes a lot of time to get a single answer.
Moreover, I can't go back, and the quiz video keeps popping up every time.
Thanks, please reply.
After quantizing a model, the accuracy usually drops (a bit); it will not improve, even with accuracy-aware quantization.
With the much lower resolution/bit-width of INT8 compared to FP32, the "speed" of inference (i.e. the throughput) is usually much higher.
Similarly for latency: it is usually lower, as an INT8 model is much smaller and faster to load, and with the lower resolution/bit-depth the time until the first inference result is available is shorter.
I would vote for the answers "Inference Speed" and "Inference Latency".
Have you already tried this combination?
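
If you want to see the throughput/latency difference for yourself, here is a minimal benchmarking sketch using OpenVINO's Python API. This is a sketch under assumptions, not a definitive implementation: it assumes an OpenVINO 2022.x-or-later install, a model with a single static-shaped input, and the hypothetical paths model_fp32.xml and model_int8.xml standing in for your own FP32 and quantized IR files.

```python
import time
import numpy as np
from openvino.runtime import Core  # assumes OpenVINO 2022.x+ Python API

def average_latency(model_xml, device="CPU", n_runs=100):
    """Compile a model and return the average per-inference latency in seconds."""
    core = Core()
    compiled = core.compile_model(model_xml, device)
    request = compiled.create_infer_request()
    input_port = compiled.input(0)
    # Random dummy input matching the model's (static) input shape.
    # Quantized IRs typically still take float32 input; the INT8 math happens inside.
    dummy = np.random.rand(*input_port.shape).astype(np.float32)
    request.infer({input_port: dummy})  # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        request.infer({input_port: dummy})
    return (time.perf_counter() - start) / n_runs

# "model_fp32.xml" / "model_int8.xml" are hypothetical placeholder paths.
fp32 = average_latency("model_fp32.xml")
int8 = average_latency("model_int8.xml")
print(f"FP32: {fp32 * 1000:.2f} ms/inference")
print(f"INT8: {int8 * 1000:.2f} ms/inference")
print(f"Speed-up: {fp32 / int8:.2f}x")
```

OpenVINO also ships a benchmark_app command-line tool that reports throughput and latency for a given model directly, which is handy for a quick comparison without writing any code.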
