Solved: Data size after pruning using NNCF

timosy · 09-02-2022

Regarding this post:

https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/difference-between-sparce-and-pruning/m-p/1409037#M28160

> .... such as filter pruning. The important advantage of this method is that it is generic and does not require special HW instructions. Currently, two filter pruning techniques are supported:

and a paper: https://arxiv.org/pdf/2002.08679.pdf

> Filter pruning ... NNCF also supports structured pruning for convolutional neural networks in the form of filter pruning.

Does this mean that by applying structured pruning I can get a smaller model that runs inference faster? However, when reading the NNCF pruning documentation: https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md

I did not find any mention of structured pruning itself, or of how much it speeds up inference. Could you comment on structured pruning? Is it available?


Zulkifli_Intel · 09-05-2022

Hello Timosy,

Thank you for reaching out to us.

We are checking this with our development team and will get back to you when we receive feedback.

Sincerely,

Zulkifli

timosy · 09-05-2022

Thanks for your help.

Zulkifli_Intel · 09-11-2022

Hi Timosy,

Thank you for your patience.

This documentation explains filter pruning and how it can help optimize the model and speed up inference. We do not have the table/graph that shows the inference speed before and after applying the compression method, since the speed varies depending on the hardware and the model.

Sincerely,

Zulkifli
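As a rough, hardware-independent proxy for the speed question, one can count multiply-accumulate operations (MACs). The plain-Python sketch below uses hypothetical helpers (`conv_macs`, `pruned_pair_macs`) and illustrative layer sizes (256 -> 384 -> 384 channels, 3x3 kernels, 13x13 feature maps). It shows that removing half of one convolution's filters halves the MACs of that layer and of the next layer's input side; measured speedups on real hardware are usually smaller than the MAC reduction.

```python
def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of a dense 2D convolution (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def pruned_pair_macs(c_in: int, c_mid: int, c_out: int, k: int, hw: int,
                     keep_ratio: float) -> int:
    """MACs of two stacked convs when only `keep_ratio` of the first conv's
    filters (= the second conv's input channels) are physically kept."""
    c_kept = int(round(c_mid * keep_ratio))
    first = conv_macs(c_in, c_kept, k, hw, hw)    # fewer output channels
    second = conv_macs(c_kept, c_out, k, hw, hw)  # fewer input channels
    return first + second


# Illustrative sizes only: 256 -> 384 -> 384 channels, 3x3 kernels, 13x13 maps.
dense = pruned_pair_macs(256, 384, 384, 3, 13, keep_ratio=1.0)
pruned = pruned_pair_macs(256, 384, 384, 3, 13, keep_ratio=0.5)
print(f"dense:  {dense:,} MACs")
print(f"pruned: {pruned:,} MACs ({pruned / dense:.0%} of dense)")
```

Note that this only holds once the channels are actually removed from the network; zeroed-out filters still cost the same number of MACs.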

timosy · 09-11-2022

Thanks for the additional information.

According to the Filter Pruning section of the page you linked, "Filter Pruning" (structured pruning) is indeed supported. That means a model can be shrunk with this method and run somewhat faster.

I will try "Filter Pruning" again and check it more carefully.

My previous usage may have been incorrect.

timosy · 09-11-2022

I tested the config below, which uses filter pruning:

# classes, g_batch_size and image_size are variables defined elsewhere in the script
nncf_config_pruning_dict = {
    "model": "testnet",
    "num_classes": classes,
    "batch_size": g_batch_size,
    "pretrained": True,
    "epochs": 100,
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "optimizer": {
        "type": "SGD",
        "base_lr": 0.1,
        "weight_decay": 1e-4,
        "schedule_type": "multistep",
        "steps": [20, 40, 60, 80],
        "optimizer_params": {
            "momentum": 0.9,
            "nesterov": True,
        },
    },
    "compression": [
        {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.6,
                "pruning_steps": 15,
                "filter_importance": "geometric_median",
            },
        }
    ],
}
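For intuition about the `exponential` schedule in this config, here is a minimal sketch assuming the pruning level rises from `pruning_init` (0.1) to `pruning_target` (0.6) over `pruning_steps` (15) epochs, with the fraction of kept filters decaying exponentially in between. The function name is hypothetical and this illustrates the idea rather than reproducing NNCF's scheduler code; details such as warm-up epochs may differ.

```python
import math


def exponential_pruning_level(epoch: int, init: float = 0.1,
                              target: float = 0.6, steps: int = 15) -> float:
    """Pruning level after `epoch` epochs (hypothetical sketch).

    The kept fraction (1 - level) decays exponentially from (1 - init) at
    epoch 0 to (1 - target) at epoch `steps`, then stays flat.
    """
    epoch = min(max(epoch, 0), steps)                 # clamp to schedule range
    k = math.log((1 - init) / (1 - target)) / steps   # decay rate
    return 1 - (1 - init) * math.exp(-k * epoch)


for e in (0, 5, 10, 15, 20):
    print(e, round(exponential_pruning_level(e), 3))
```

The level grows fastest in the first epochs and flattens toward the target, which leaves later epochs for fine-tuning at a nearly constant pruning level.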

The files I got before and after pruning look like this:

-rwxrwxrwx 1 user user 242031227 Aug 31 19:59 model_fp32.onnx
-rwxrwxrwx 1 user user    900032 Sep 12 13:26 model_fp32_layer2.bin
-rwxrwxrwx 1 user user       940 Sep 12 13:26 model_fp32_layer2.mapping
-rwxrwxrwx 1 user user     11266 Sep 12 13:26 model_fp32_layer2.xml
-rwxrwxrwx 1 user user 242031626 Sep 12 13:18 model_prun.onnx
-rwxrwxrwx 1 user user    900032 Sep 12 13:27 model_prun_layer2.bin
-rwxrwxrwx 1 user user      1012 Sep 12 13:27 model_prun_layer2.mapping
-rwxrwxrwx 1 user user     11362 Sep 12 13:27 model_prun_layer2.xml

It seems the files are not compressed. Did I do something wrong?
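A quick sanity check on the listing above: assuming the `_layer2` model contains only the first two convolutions of the AlexNet_A network posted later in this thread, and that the IR weights are stored as FP16 (as in the Model Optimizer output shown later), the weight count reproduces the 900032-byte `.bin` size exactly. That is consistent with nothing having been removed: according to the NNCF log later in the thread, the first convolution is skipped by default and a layer whose output sets the model output dimensions is not pruned. `conv_params` is a hypothetical helper.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Number of weights (plus biases) in a Conv2d layer."""
    return c_out * c_in * k * k + (c_out if bias else 0)


# First two convolutions of AlexNet_A from this thread.
conv1 = conv_params(3, 96, 14)    # 56544 parameters
conv2 = conv_params(96, 256, 4)   # 393472 parameters
total = conv1 + conv2

BYTES_PER_FP16 = 2
print(total, total * BYTES_PER_FP16)  # 450016 900032

# Hypothetical: if half of conv1's 96 filters were physically removed,
# conv1 shrinks on its output side and conv2 on its input side.
half = conv_params(3, 48, 14) + conv_params(48, 256, 4)
print(half * BYTES_PER_FP16)  # 450272
```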

Zulkifli_Intel · 09-12-2022

Hi Timosy,

We are currently investigating this issue and will get back to you with the finding.

Sincerely,

Zulkifli

Hari_B_Intel · 09-13-2022

Hi timosy,

Could you provide the model that you are trying to compress with filter pruning? That would help us understand the model's layers and the cause of the issue.

If you could provide some detail on your compression, that would be helpful for us to further investigate the issue you are facing.

Thank you

timosy · 09-13-2022

Thanks for your kind help.

My model is actually a simple AlexNet, shown below. What I wanted to check first was whether filter/structured pruning works at all with a simple model: whether the file gets smaller and whether inference gets faster.

The truncated model listed in my previous message is cut right after the second convolution layer, so it is also quite a simple model.

The maximum file size I can upload here is 100 MB, so I cannot upload the model. But if AlexNet can be compressed with your filter/structured pruning configuration, that alone would be very useful information for me. Ideally, I'd like to test filter/structured pruning + quantization on AlexNet.

import torch
from torch import nn


class AlexNet_A(nn.Module):
    def __init__(self, num_classes: int = 1000, dropout: float = 0.5):
        super(AlexNet_A, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d( 3,  96, kernel_size=14, stride=4, padding=0),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=3, stride=2, ceil_mode=True),
            nn.Conv2d(96, 256, kernel_size=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # forward() was not posted; assumed to follow the usual AlexNet flow
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # (N, 256 * 6 * 6)
        return self.classifier(x)
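For reference, the spatial size of the feature map entering `avgpool` can be traced with the standard output-size formulas. The sketch below uses hypothetical helpers in plain Python and follows PyTorch's floor rounding (and its `ceil_mode` correction for the `AvgPool2d` layer) for a 224x224 input and for the 2048x2048 input shown in the Model Optimizer log later in this thread; `AdaptiveAvgPool2d((6, 6))` then brings either one to 6x6, which is why the first `Linear` layer takes `256 * 6 * 6` inputs.

```python
import math


def conv_out(size: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Output size of Conv2d / MaxPool2d with floor rounding."""
    return (size + 2 * pad - k) // stride + 1


def pool_out_ceil(size: int, k: int, stride: int, pad: int = 0) -> int:
    """Output size of a pooling layer with ceil_mode=True; the last window
    must start inside the input (or the left padding)."""
    out = math.ceil((size + 2 * pad - k) / stride) + 1
    if (out - 1) * stride >= size + pad:
        out -= 1
    return out


def features_hw(size: int) -> int:
    """Spatial size after AlexNet_A.features for a square input."""
    size = conv_out(size, 14, stride=4)   # Conv2d(3, 96, 14, stride=4)
    size = pool_out_ceil(size, 3, 2)      # AvgPool2d(3, 2, ceil_mode=True)
    size = conv_out(size, 4, pad=2)       # Conv2d(96, 256, 4, padding=2)
    size = conv_out(size, 3, stride=2)    # MaxPool2d(3, 2)
    for _ in range(3):
        size = conv_out(size, 3, pad=1)   # three 3x3 convs, padding=1
    return conv_out(size, 3, stride=2)    # MaxPool2d(3, 2)


print(features_hw(224), features_hw(2048))  # 6 63
```

One consequence for pruning: because the classifier sees `256 * 6 * 6` flattened inputs, each filter removed from the last convolution removes 36 input columns of the first `Linear` layer.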

Hari_B_Intel · 09-19-2022

Hi timosy

Thank you for the information you provided. We are still investigating this and will get back to you very soon.

Thank you

dlyakhov · 09-19-2022

Hi @timosy,
Thank you for your interest in NNCF!
Sorry for the inconvenience; there is a gap in our documentation.
The NNCF filter pruning algorithm only puts zeros into the parameters of the convolutional and linear layers of a model; it doesn't reduce the size of the model. To actually remove channels/rows from the model, you need to apply the additional nGraph pruning transformation. To do this, add the `--transform=Pruning` argument to the Model Optimizer command line when converting the model to IR. Ref: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-relnotes.html

Please try this argument and reach out to us if anything goes wrong.
Best Regards,
Daniil Liakhov, NNCF team member
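To make the masking vs. removal distinction concrete, here is a toy sketch in plain Python; it is not NNCF or OpenVINO code and all names are hypothetical. Masking keeps every weight (just set to zero), so the parameter count and serialized size stay the same; physically removing a filter also removes the matching input channel of the next layer, which is roughly what the `--transform=Pruning` step described above is meant to do at conversion time.

```python
from typing import List, Set

Filter = List[float]  # one output filter, flattened channel by channel


def mask_filters(weights: List[Filter], pruned: Set[int]) -> List[Filter]:
    """Masking: zero the pruned filters but keep every weight in place."""
    return [[0.0] * len(f) if i in pruned else list(f)
            for i, f in enumerate(weights)]


def remove_filters(w_a: List[Filter], w_b: List[Filter], pruned: Set[int],
                   k_area: int):
    """Physical removal: drop the pruned output filters of layer A and the
    matching input channels of the next layer B (k_area values per channel)."""
    keep = [i for i in range(len(w_a)) if i not in pruned]
    new_a = [list(w_a[i]) for i in keep]
    new_b = [[v for c in keep for v in f[c * k_area:(c + 1) * k_area]]
             for f in w_b]
    return new_a, new_b


def n_params(weights: List[Filter]) -> int:
    return sum(len(f) for f in weights)


# Toy layers with 1x1 kernels (k_area = 1): A maps 2 -> 4 channels,
# B maps A's 4 channels -> 3 channels. Filters 1 and 3 of A are pruned.
w_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
w_b = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
pruned = {1, 3}

masked_a = mask_filters(w_a, pruned)
print(n_params(w_a), n_params(masked_a))      # 8 8: same parameter count
small_a, small_b = remove_filters(w_a, w_b, pruned, k_area=1)
print(n_params(small_a), n_params(small_b))   # 4 6: was 8 and 12
```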

timosy · 09-19-2022

Thanks for your kind comment.

I tested it, but unfortunately it does not seem to work.

The model size does not change after converting to IR.

The packages I'm using are:

onnx                 1.11.0
onnxruntime          1.9.0
opencv-python        4.5.5.64
openvino             2022.1.0
openvino-dev         2022.1.0
openvino-telemetry   2022.1.1

The configuration and the log output when applying NNCF are as follows:

    "compression": [
       {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.8,
                "pruning_steps": 15,
                "filter_importance": "geometric_median"
            }
       }
    ]


INFO:nncf:Please, provide execution parameters for optimal model initialization
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[0] by AlexNet/Sequential[features]/NNCFConv2d[0]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[3] by AlexNet/Sequential[features]/NNCFConv2d[3]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[6] by AlexNet/Sequential[features]/NNCFConv2d[6]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[8] by AlexNet/Sequential[features]/NNCFConv2d[8]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[10] by AlexNet/Sequential[features]/NNCFConv2d[10]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[1] by AlexNet/Sequential[classifier]/NNCFLinear[1]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[4] by AlexNet/Sequential[classifier]/NNCFLinear[4]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[6] by AlexNet/Sequential[classifier]/NNCFLinear[6]

INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0] can't be pruned, because some nodes should't be pruned, error messages for this nodes: ignored adding Weight Pruner in: AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0 because this scope is one of the first convolutions.

INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[3]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[6]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[8]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[10]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[1]/linear_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[4]/linear_0] will be pruned together.

INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[6]/linear_0] can't be pruned, because some nodes should't be pruned, error messages for this nodes: ignored adding Weight Pruner in: AlexNet/Sequential[classifier]/NNCFLinear[6]/linear_0 because this scope is convolution with output which directly affects model output dimensions.

INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[3]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[6]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[8]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[10]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[classifier]/NNCFLinear[1]/linear_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[classifier]/NNCFLinear[4]/linear_0
INFO:nncf:Computing filter importance scores and binary masks...

NNCF ONNX model exported.

Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      model_nncf.onnx
        - Path for generated IR:        ./
        - IR output name:       model_nncf
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        32
        - Input shapes:         [1, 3, 2048, 2048]
        - Source layout:        Not specified
        - Target layout:        Not specified
        - Layout:       Not specified
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP16
        - Enable fusing:        True
        - User transformations:         Pruning
        - Reverse input channels:       False
        - Enable IR generation for fixed input shape:   False
        - Use the transformations config file:  None

I also checked "mo --help": it lists "--transform", but there is no mention of Pruning.

  --data_type {FP16,FP32,half,float}
     Data type for all intermediate tensors and weights. If original 
     model is in FP32 and --data_type=FP16 is specified, all model 
     weights and biases are compressed to FP16.
  --transform TRANSFORM
     Apply additional transformations. Usage: "--transform transformation_name1[args],transformation_name2..." 
     where [args] is key=value pairs separated by semicolon. Examples: "--transform LowLatency2" or
      "--transform LowLatency2[use_const_initializer=False]" or 
      "--transform "MakeStateful[param_res_names={'input_name_1':'output_name_1','input_name_2':'output_name_2'}]"" 
     Available transformations: "LowLatency2", "MakeStateful"
....

Am I missing something?
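The NNCF log above states two default skip rules: "one of the first convolutions" is not pruned, and a layer whose output "directly affects model output dimensions" is not pruned. The sketch below applies just those two rules to a simple chain of layers (hypothetical helper; NNCF's real logic works on groups of dependent nodes and covers more cases). With the full AlexNet it reproduces the six layers listed in the log; with a model cut right after the second convolution, nothing is left to prune unless the first convolution is allowed, which is what the NNCF `prune_first_conv` option mentioned at the end of this thread controls.

```python
from typing import List


def prunable_layers(layers: List[str], prune_first_conv: bool = False) -> List[str]:
    """Keep only layers that pass two simplified skip rules (chain models only)."""
    result = []
    for i, name in enumerate(layers):
        if i == 0 and not prune_first_conv:
            continue  # "one of the first convolutions"
        if i == len(layers) - 1:
            continue  # output "directly affects model output dimensions"
        result.append(name)
    return result


full = ["features.0", "features.3", "features.6", "features.8",
        "features.10", "classifier.1", "classifier.4", "classifier.6"]
truncated = ["features.0", "features.3"]  # cut right after the 2nd conv

print(prunable_layers(full))
print(prunable_layers(truncated))                          # []
print(prunable_layers(truncated, prune_first_conv=True))   # ['features.0']
```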

timosy · 09-19-2022

Was the message I posted here 6 hours ago removed?

dlyakhov · 09-20-2022

I got your previous message: you couldn't run the pruning transformation.
Please try this example:

# Assume you have openvino_dev==2022.1 and nncf installed
# Go to examples dir
cd nncf/examples/torch/classification
# Export pruned model
python main.py --config configs/pruning/resnet50_imagenet_accuracy_aware.json --mode export --to-onnx resnet50_pruned.onnx --cpu-only
# Convert without pruning
mo --input_model resnet50_pruned.onnx -o not_pruned
# Convert with pruning
mo --input_model resnet50_pruned.onnx --transform=Pruning -o pruned
# Check IR sizes
du -h
# my output is 
# ...
# 89M     ./pruned
# 98M     ./not_pruned

My openvino version:

$ pip freeze | grep openvino
openvino==2022.1.0
openvino-dev==2022.1.0
openvino-telemetry==2022.1.1
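To compare the two IR output folders from Python instead of `du -h`, a minimal sketch could look like this; `dir_size_bytes` and `compare_ir_dirs` are hypothetical helpers, and the default folder names match the `-o` arguments in the commands above. `du -h` reports disk usage in blocks, so its numbers can differ slightly from the summed file sizes computed here.

```python
from pathlib import Path


def dir_size_bytes(path: str) -> int:
    """Sum of the sizes of all regular files under `path` (apparent size)."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def compare_ir_dirs(pruned_dir: str = "pruned",
                    baseline_dir: str = "not_pruned") -> float:
    """Print both folder sizes and return pruned / baseline."""
    pruned = dir_size_bytes(pruned_dir)
    baseline = dir_size_bytes(baseline_dir)
    if baseline == 0:
        raise ValueError(f"no files found under {baseline_dir!r}")
    print(f"{pruned_dir}: {pruned / 2**20:.1f} MiB, "
          f"{baseline_dir}: {baseline / 2**20:.1f} MiB")
    return pruned / baseline
```

Calling `compare_ir_dirs()` from the folder where `mo` was run prints both sizes and returns their ratio.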

timosy · 09-20-2022

Thank you for the demonstration.

Finally, I was able to apply Pruning + Quantization to my model as well, and I could see that the model data was compressed.

I actually had to set "prune_first_conv" to True, because my model is cut at a shallow layer.

I appreciate your help, and I'm closing this question.
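For reference, here is where `prune_first_conv` goes: a sketch of the `compression` section from earlier in the thread with the flag added under `params` (written as a Python dict like the config above; in a JSON file it would be `true`). The second entry is an assumption: a `quantization` algorithm with default settings, since the exact quantization settings used here were not posted.

```python
# Filter-pruning section from earlier in the thread, with prune_first_conv
# enabled so that the first convolution may also be pruned.
compression = [
    {
        "algorithm": "filter_pruning",
        "pruning_init": 0.1,
        "params": {
            "schedule": "exponential",
            "pruning_target": 0.8,
            "pruning_steps": 15,
            "filter_importance": "geometric_median",
            "prune_first_conv": True,
        },
    },
    # Assumption: quantization with default settings, listed after pruning.
    {"algorithm": "quantization"},
]
```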

Zulkifli_Intel · 09-28-2022

Hi Timosy,

This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.

Sincerely,

Zulkifli
