I have a trained ONNX model that needs to be quantized to INT8, but I want the last fully connected layers to stay in FP32 or FP16. How can I choose specific layers to quantize (or not to quantize)?
PS: When I was working with NNCF, I just used the ignored_scopes parameter. Is there something similar in Workbench?