Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

I can't find a solution


Hi there, I am on video 5 of the Intel Edge AI Foundations course.

There is a question: "Which inference parameters improve when using an INT8 model instead of an FP32 model?"

I have attempted this question many times and am unable to find the solution.

Also, it doesn't show the correct answer; it only shows the same explanation each time.

Please help me clear this up. Also, is there a place where solutions are posted? In the future I would rather refer to that than spend a lot of time searching here for a single answer.

Moreover, I can't go back, and the quiz video keeps popping up every time.


Thanks, please reply.

1 Reply

After quantizing a model, the accuracy usually drops (a bit); it will not improve, not even with accuracy-aware quantization.

With the much lower resolution/bit-width of INT8 compared to FP32, the "speed" of inference (i.e. the throughput) is usually much higher.

Similarly for latency: the latency is usually lower, since an INT8 model is much smaller and faster to load, and with the reduced resolution/bit-depth the time until the first inference result is available is shorter.
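The size difference is easy to see with a quick NumPy sketch (illustrative only, not OpenVINO-specific; the tensor shape is an arbitrary example):

```python
import numpy as np

# The same hypothetical weight tensor stored as FP32 vs INT8.
weights_fp32 = np.random.rand(1024, 1024).astype(np.float32)
weights_int8 = np.random.randint(-128, 128, (1024, 1024), dtype=np.int8)

# INT8 uses 1 byte per value instead of 4, so the model is 4x smaller,
# loads faster, and moves less data through memory during inference.
print(weights_fp32.nbytes)  # 4194304
print(weights_int8.nbytes)  # 1048576
print(weights_fp32.nbytes // weights_int8.nbytes)  # 4
```

This only shows the memory footprint; the actual throughput gain comes from the hardware executing INT8 arithmetic faster than FP32.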

I would vote for the answers "Inference Speed" and "Inference Latency".

Have you already tried this combination?