Hi there, I am working through video 5 of Intel Edge AI Foundations.
There is a question: "What inference parameters improve when using an INT8 model instead of an FP32 model?"
I have attempted this question many times and I am unable to find the solution.
Also, it doesn't show the correct answer; it only shows the same explanation each time.
Please help me clear this up. One more thing: where can we find the solutions, so that in future I can refer to them instead of asking here? It takes a lot of time to find a single solution.
Moreover, I can't go back, and the quiz video keeps popping up every time.
Thanks, please reply.
After quantizing a model, the accuracy usually drops (a bit); it will not improve, not even with accuracy-aware quantization.
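
To give a rough picture of why accuracy tends to drop slightly, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The weight values and the helper names `quantize_int8` and `dequantize` are made up for illustration; real toolkits use more elaborate schemes, so this only shows the kind of rounding error involved.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Illustrative helper only, not the API of any particular toolkit.
    """
    max_abs = max(abs(v) for v in values)
    # One scale factor for the whole tensor; avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the INT8 codes back to approximate float values."""
    return [x * scale for x in q]


# Hypothetical FP32 weights, chosen only for demonstration.
weights = [0.8213, -0.4137, 0.0052, 0.3019, -1.2700]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The largest difference between original and restored weights.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("INT8 codes:", q)
print("scale:", scale)
print("max rounding error:", max_error)
```

Each restored weight can differ from the original by up to half a quantization step (`scale / 2`). Small deviations like these are where the slight accuracy loss comes from.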
With the much lower resolution/bit-width of INT8 compared to FP32, the "speed" of inference (i.e. the throughput) is usually much higher.
The same goes for latency: it is usually lower, because an INT8 model is much smaller and faster to load, and with the lower resolution/bit-depth the first inference result is available sooner.
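
As a back-of-the-envelope check on the "much smaller" part, the sketch below compares the raw weight storage of FP32 and INT8. The parameter count is a hypothetical number picked for illustration, and real model files also contain metadata and possibly some non-quantized layers, so actual savings may differ.

```python
import struct

# Standard sizes in bytes: a 32-bit float versus an 8-bit signed integer.
FP32_BYTES = struct.calcsize("<f")  # 4
INT8_BYTES = struct.calcsize("<b")  # 1

# Hypothetical number of weights in a model (assumption for illustration).
num_params = 1_000_000

fp32_size = num_params * FP32_BYTES
int8_size = num_params * INT8_BYTES
ratio = fp32_size / int8_size

print(f"FP32 weights: {fp32_size / 1e6:.1f} MB")
print(f"INT8 weights: {int8_size / 1e6:.1f} MB")
print(f"Size ratio:   {ratio:.0f}x")
```

Under these assumptions the weights need about a quarter of the storage, which means less data to load and to move through memory during inference.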
I would vote for the answers "Inference Speed" and "Inference Latency".
Have you already tried this combination?