Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Data size after pruning using NNCF

timosy
New Contributor I
3,810 Views

Related to this post: 

https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/difference-between-sparce-and-pruning/m-p/1409037#M28160

> .... such as filter pruning. The important advantage of this method is that it is generic and does not require special HW instructions. Currently, two filter pruning techniques are supported:

 

and a paper: https://arxiv.org/pdf/2002.08679.pdf

> Filter pruning ... NNCF also supports structured pruning for convolutional neural networks in the form of filter pruning.

 

Does this mean that, by applying structured pruning, I can get a smaller model that allows faster inference? However, when reading these instructions on NNCF pruning: https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md

I did not find any mention of structured pruning itself, or of how much of an advantage structured pruning gives for inference speed... Could you please comment on structured pruning? Is it available?
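For concreteness, my understanding is that structured (filter) pruning removes whole convolution filters so the layers themselves shrink; a toy PyTorch illustration of the idea (not NNCF code, just the effect I expect):

# Toy illustration: structured/filter pruning removes whole output channels
# from a convolution, so the layer itself becomes smaller.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3)               # original layer: 64 filters
keep = torch.arange(48)                              # suppose 48 filters survive
pruned = nn.Conv2d(3, len(keep), kernel_size=3)      # smaller layer: 48 filters
pruned.weight.data = conv.weight.data[keep].clone()  # copy the surviving filters
pruned.bias.data = conv.bias.data[keep].clone()
print(conv.weight.shape, "->", pruned.weight.shape)  # [64, 3, 3, 3] -> [48, 3, 3, 3]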

 

 

0 Kudos
1 Solution

15 Replies
Zulkifli_Intel
Moderator
3,765 Views

Hello Timosy,

Thank you for reaching out to us.

 

We are checking this with our development team and will get back to you when we receive feedback.

 

Sincerely,

Zulkifli 


0 Kudos
timosy
New Contributor I
3,758 Views
0 Kudos
Zulkifli_Intel
Moderator
3,718 Views

Hi Timosy,

Thank you for your patience.

  

This documentation explains filter pruning and how it can help optimize the model and speed up inference. We do not have a table/graph showing the inference speed before and after applying the compression method, since the speed varies depending on the hardware and the model.

 

Sincerely,

Zulkifli


0 Kudos
timosy
New Contributor I
3,712 Views

Thanks for your kind additional information.

 

According to the Filter Pruning section of the page you linked, "Filter Pruning" (structured pruning) is certainly mentioned as supported. That means a model can be shrunk with this method and get a bit faster.

I will try to use/check "Filter Pruning" more.

My previous usage might have been improper.

 

 

0 Kudos
timosy
New Contributor I
3,710 Views

I tested the config file below, which uses filter pruning

 

    nncf_config_pruning_dict = {
    "model": "testnet",
    "num_classes": classes,
    "batch_size": g_batch_size,
    "pretrained": True,
    "epochs": 100,
    "input_info": {"sample_size": [1, 3, image_size, image_size] },
    "optimizer": {
        "type": "SGD",
        "base_lr": 0.1,
        "weight_decay": 1e-4,
        "schedule_type": "multistep",
        "steps": [
            20,
            40,
            60,
            80
        ],
        "optimizer_params":
        {
            "momentum": 0.9,
            "nesterov": True
        }
    },
    "compression": [
       {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.6,
                 "pruning_steps": 15,
                "filter_importance": "geometric_median"
            }
       }
    ]
    }
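
For context, I apply this config via the standard NNCF PyTorch flow, roughly like the sketch below (`model` is my network; API names as in NNCF 2.x):

# Sketch of the standard NNCF (PyTorch) flow for the config dict above;
# `model` is assumed to be the torch.nn.Module being pruned.
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

nncf_config = NNCFConfig.from_dict(nncf_config_pruning_dict)
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... fine-tune `compressed_model`, calling compression_ctrl.scheduler.epoch_step()
# once per epoch so the pruning level follows the exponential schedule ...

compression_ctrl.export_model("model_prun.onnx")  # export the pruned model to ONNX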

 

and the files I got before/after pruning are something like 

 

-rwxrwxrwx 1 user user 242031227 Aug 31 19:59 model_fp32.onnx
-rwxrwxrwx 1 user user    900032 Sep 12 13:26 model_fp32_layer2.bin
-rwxrwxrwx 1 user user       940 Sep 12 13:26 model_fp32_layer2.mapping
-rwxrwxrwx 1 user user     11266 Sep 12 13:26 model_fp32_layer2.xml
-rwxrwxrwx 1 user user 242031626 Sep 12 13:18 model_prun.onnx
-rwxrwxrwx 1 user user    900032 Sep 12 13:27 model_prun_layer2.bin
-rwxrwxrwx 1 user user      1012 Sep 12 13:27 model_prun_layer2.mapping
-rwxrwxrwx 1 user user     11362 Sep 12 13:27 model_prun_layer2.xml

 

It seems the files are not compressed... Did I make a mistake somewhere?

 

0 Kudos
Zulkifli_Intel
Moderator
3,687 Views

Hi Timosy,

 

We are currently investigating this issue and will get back to you with the findings.


Sincerely,

Zulkifli


Hari_B_Intel
Moderator
3,669 Views

Hi timosy,


Could you provide us with the model that you are trying to compress with filter pruning, so that we can understand the model's layers and the cause of the issue?

If you could provide some details on your compression setup, that would help us investigate the issue you are facing further.


Thank you


0 Kudos
timosy
New Contributor I
3,649 Views

 

 

Thanks for your kind help.

 

My model is actually the simple AlexNet model shown below, because what I wanted to check first was whether the filter/structured pruning function works at all: does the file get compressed, and does inference get faster, with a simple model?

 

The cut model that I listed in my message above corresponds to a model where the cut was made just after the 2nd convolution layer. It's also quite a simple model.

 

Since the maximum file size I can upload here is <100MB, I cannot upload it. But if AlexNet can be compressed with your filter/structured pruning configuration, that will already be great information for me. Eventually, I'd like to test filter/structured pruning + quantization for AlexNet (see the combined config sketched after the class definition below).

 

 

 

 

import torch
import torch.nn as nn


class AlexNet_A(nn.Module):
    def __init__(self, num_classes: int = 1000, dropout: float = 0.5):
        super(AlexNet_A, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=14, stride=4, padding=0),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=3, stride=2, ceil_mode=True),
            nn.Conv2d(96, 256, kernel_size=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard AlexNet-style forward pass
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
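
And for the pruning + quantization test mentioned above, my understanding is that NNCF accepts several algorithms in the "compression" list; a minimal sketch of such a combined config (values are illustrative, not tuned):

# Sketch: combining filter pruning and quantization in one NNCF config.
# Every algorithm listed under "compression" is applied.
nncf_config_combined_dict = {
    "input_info": {"sample_size": [1, 3, 224, 224]},  # adjust to the real input size
    "compression": [
        {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.6,
                "filter_importance": "geometric_median",
            },
        },
        {"algorithm": "quantization"},
    ],
}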

 

 

 

 

 

0 Kudos
Hari_B_Intel
Moderator
3,546 Views

Hi timosy,


Thank you for the information you provided. We are still investigating this and will get back to you very soon.


Thank you




dlyakhov
Employee
3,532 Views

Hi @timosy,
thank you for your interest in NNCF!
Sorry for the inconvenience; we have a gap in our documentation.
The NNCF filter pruning algorithm only puts zeros inside the convolutional and linear layer parameters of a model and doesn't reduce the size of the model. To actually remove channels/rows from the model, one needs to apply the additional nGraph pruning transformation. To do this, the extra argument `--transform=Pruning` should be added to the Model Optimizer command line during model conversion to IR. Ref: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-relnotes.html 
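You can verify what the algorithm did by checking the exported ONNX model for all-zero filters; a small sketch, assuming the `onnx` and `numpy` packages are installed (the file name is taken from your listing above):

# Count all-zero convolution filters in the NNCF-exported ONNX model.
# The tensor sizes stay the same until `mo --transform=Pruning` removes them.
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("model_prun.onnx")  # file name from the listing above
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.ndim == 4:  # Conv weight layout: [out_ch, in_ch, kH, kW]
        flat = w.reshape(w.shape[0], -1)
        zeroed = int(np.sum(~flat.any(axis=1)))  # filters that are entirely zero
        print(f"{init.name}: {zeroed}/{w.shape[0]} filters zeroed")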

Please try this argument and reach out to us in case anything goes wrong.
Best Regards,
Daniil Liakhov, NNCF team member

timosy
New Contributor I
3,473 Views

Thanks for your kind comment.

I tested it, but unfortunately it does not seem to work...

The model size does not change after converting to the IR model.

The packages I'm using are 

 

onnx                 1.11.0
onnxruntime          1.9.0
opencv-python        4.5.5.64
openvino             2022.1.0
openvino-dev         2022.1.0
openvino-telemetry   2022.1.1

 

and the configuration and messages from applying NNCF are as follows:

 

    "compression": [
       {
            "algorithm": "filter_pruning",
            "pruning_init": 0.1,
            "params": {
                "schedule": "exponential",
                "pruning_target": 0.8,
                "pruning_steps": 15,
                "filter_importance": "geometric_median"
            }
       }
    ]


INFO:nncf:Please, provide execution parameters for optimal model initialization
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[0] by AlexNet/Sequential[features]/NNCFConv2d[0]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[3] by AlexNet/Sequential[features]/NNCFConv2d[3]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[6] by AlexNet/Sequential[features]/NNCFConv2d[6]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[8] by AlexNet/Sequential[features]/NNCFConv2d[8]
INFO:nncf:Wrapping module AlexNet/Sequential[features]/Conv2d[10] by AlexNet/Sequential[features]/NNCFConv2d[10]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[1] by AlexNet/Sequential[classifier]/NNCFLinear[1]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[4] by AlexNet/Sequential[classifier]/NNCFLinear[4]
INFO:nncf:Wrapping module AlexNet/Sequential[classifier]/Linear[6] by AlexNet/Sequential[classifier]/NNCFLinear[6]

INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0] can't be pruned, because some nodes should't be pruned, error messages for this nodes: ignored adding Weight Pruner in: AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0 because this scope is one of the first convolutions.

INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[3]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[6]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[8]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[features]/NNCFConv2d[10]/conv2d_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[1]/linear_0] will be pruned together.
INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[4]/linear_0] will be pruned together.

INFO:nncf:Group of nodes [AlexNet/Sequential[classifier]/NNCFLinear[6]/linear_0] can't be pruned, because some nodes should't be pruned, error messages for this nodes: ignored adding Weight Pruner in: AlexNet/Sequential[classifier]/NNCFLinear[6]/linear_0 because this scope is convolution with output which directly affects model output dimensions.

INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[3]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[6]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[8]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[features]/NNCFConv2d[10]/conv2d_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[classifier]/NNCFLinear[1]/linear_0
INFO:nncf:Adding Weight Pruner in scope: AlexNet/Sequential[classifier]/NNCFLinear[4]/linear_0
INFO:nncf:Computing filter importance scores and binary masks...

NNCF ONNX model exported.

 

Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      model_nncf.onnx
        - Path for generated IR:        ./
        - IR output name:       model_nncf
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        32
        - Input shapes:         [1, 3, 2048, 2048]
        - Source layout:        Not specified
        - Target layout:        Not specified
        - Layout:       Not specified
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP16
        - Enable fusing:        True
        - User transformations:         Pruning
        - Reverse input channels:       False
        - Enable IR generation for fixed input shape:   False
        - Use the transformations config file:  None

I also checked "mo --help" and found "--transform", but there is no explanation of Pruning.

 

  --data_type {FP16,FP32,half,float}
     Data type for all intermediate tensors and weights. If original 
     model is in FP32 and --data_type=FP16 is specified, all model 
     weights and biases are compressed to FP16.
  --transform TRANSFORM
     Apply additional transformations. Usage: "--transform transformation_name1[args],transformation_name2..." 
     where [args] is key=value pairs separated by semicolon. Examples: "--transform LowLatency2" or
      "--transform LowLatency2[use_const_initializer=False]" or 
      "--transform "MakeStateful[param_res_names={'input_name_1':'output_name_1','input_name_2':'output_name_2'}]"" 
     Available transformations: "LowLatency2", "MakeStateful"
....

 

Did I make a mistake somewhere?

0 Kudos
timosy
New Contributor I
3,511 Views

The message I wrote here 6 hours ago was removed??

0 Kudos
dlyakhov
Employee
3,496 Views

I got your previous message: you couldn't run the pruning transformation.
Please try this example:

 

# Assume you have openvino_dev==2022.1 and nncf installed
# Go to examples dir
cd nncf/examples/torch/classification
# Export pruned model
python main.py --config configs/pruning/resnet50_imagenet_accuracy_aware.json --mode export --to-onnx resnet50_pruned.onnx --cpu-only
# Convert without pruning
mo --input_model resnet50_pruned.onnx -o not_pruned
# Convert with pruning
mo --input_model resnet50_pruned.onnx --transform=Pruning -o pruned
# Check IR sizes
du -h
# my output is 
# ...
# 89M     ./pruned
# 98M     ./not_pruned

 

 My openvino version:

$ pip freeze | grep openvino
openvino==2022.1.0
openvino-dev==2022.1.0
openvino-telemetry==2022.1.1
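
To compare the inference speed of the two IRs (the original question), benchmark_app from openvino-dev is the rigorous tool; a quick Python sketch for OpenVINO 2022.1 is below (the IR name follows from the `mo` calls above, since Model Optimizer keeps the input model's name):

# Rough latency comparison of the pruned vs. not-pruned IRs produced above.
import time
import numpy as np
from openvino.runtime import Core

core = Core()
for tag in ("not_pruned", "pruned"):
    model = core.read_model(f"{tag}/resnet50_pruned.xml")
    compiled = core.compile_model(model, "CPU")
    request = compiled.create_infer_request()
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # ResNet-50 input
    request.infer({0: data})  # warm-up
    start = time.perf_counter()
    for _ in range(50):
        request.infer({0: data})
    print(tag, f"{(time.perf_counter() - start) / 50 * 1000:.2f} ms/iteration")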
timosy
New Contributor I
3,486 Views

Thank you for the demonstration.

Finally, I could also apply pruning + quantization to my model.
I could see that the model data was compressed.

I actually had to set "prune_first_conv" to True because I'm cutting at a shallow layer; a sketch of where the flag goes is below.
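
For anyone hitting the same "one of the first convolutions" message in the NNCF log, the flag sits in the filter_pruning section (a minimal sketch of the pruning-related keys only; other values are illustrative):

# Sketch: enabling pruning of the first convolution, which NNCF skips by default.
pruning_section = {
    "algorithm": "filter_pruning",
    "pruning_init": 0.1,
    "params": {
        "schedule": "exponential",
        "pruning_target": 0.6,
        "filter_importance": "geometric_median",
        "prune_first_conv": True,  # allow pruning the first conv layer
    },
}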

I appreciate your help, and I will close this question.

0 Kudos
Zulkifli_Intel
Moderator
3,279 Views

Hi Timosy,


This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.


Sincerely,

Zulkifli


0 Kudos