Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

NNCF training-time compression not working

LFan
Employee

Hello,

 

I am looking for help with NNCF training-time compression on a U-Net model. I have a U-Net model trained in PyTorch, and I tried to use NNCF to apply filter pruning during training so that we can speed up model inference.

 

During the NNCF training, the "current pruning level" slowly increases toward the target value of 0.30, but the FLOPs reduction is always 0. It also prints "WARNING:nncf:Binary masks are identity matrix, please check your config." during training, which makes me doubt that NNCF is actually doing anything.

Finally, I converted the model to OpenVINO and compared the pruned model with the original unpruned model (both in OV); there seems to be no speed improvement, while the output accuracy has decreased.

Could you provide any suggestions on what the problem may be? Any help is much appreciated.
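
For reference, below is a minimal check I can run after the fine-tuning loop to see whether any convolution filters were actually zeroed out. This is only a sketch: it assumes the pruning masks are already baked into the weights of the NNCF-wrapped model (nncf_model from the code snippet below), and I am not sure whether that actually happens before export.

import torch.nn as nn

def count_zeroed_filters(model: nn.Module) -> None:
    # a pruned filter should show up as an output channel whose weights are all zero
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            weight = module.weight.detach()
            per_filter_sum = weight.abs().flatten(1).sum(dim=1)
            n_zeroed = int((per_filter_sum == 0).sum())
            print(f"{name}: {n_zeroed}/{weight.shape[0]} filters zeroed")

count_zeroed_filters(nncf_model)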


==========================

Output from compression_stats.filter_pruning
==========================

Current pruning level: 0.00
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 1/10, Train Batch 1/325, Loss: 0.04538079723715782
--
Current pruning level: 0.00
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 2/10, Train Batch 1/325, Loss: 0.045010462403297424
--
Current pruning level: 0.00
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 3/10, Train Batch 1/325, Loss: 0.0421537347137928
--
Current pruning level: 0.10
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 4/10, Train Batch 1/325, Loss: 0.045686956495046616
--
Current pruning level: 0.15
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 5/10, Train Batch 1/325, Loss: 0.045078352093696594
--
Current pruning level: 0.21
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 6/10, Train Batch 1/325, Loss: 0.05612863600254059
--
Current pruning level: 0.25
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 7/10, Train Batch 1/325, Loss: 0.0534798726439476
--
Current pruning level: 0.30
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 8/10, Train Batch 1/325, Loss: 0.05307655408978462
--
Current pruning level: 0.30
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 9/10, Train Batch 1/325, Loss: 0.05456303432583809
--
Current pruning level: 0.30
Target pruning level: 0.30
FLOPs reduction due to pruning: 0.00
Epoch 10/10, Train Batch 1/325, Loss: 0.05304049700498581


============================
Code snippet of the NNCF training: 
============================

 
        # NNCF setup
        print("LF: self.densityLayoutNumPixels = ", self.densityLayoutNumPixels) # = 128

        nncf_config_dict = {
            "input_info": [
                {
                    "sample_size": [1, 1, 1, 1], # input shape required for model tracing
                },
                {
                    "sample_size": [1, 1, self.densityLayoutNumPixels, self.densityLayoutNumPixels], # input shape required for model tracing
                }
            ],
            "compression": [
                {
                    "algorithm": "filter_pruning",
                    "pruning_init": 0.1,  #weights initially considered for pruning
                    "params": {
                        "pruning_target": 0.3,  # aim to prune 30% of the weights
                        "pruning_steps": numEpochsPruning,
                        "schedule": "exponential",
                        "num_init_steps": 3
                    }
                },
            ]
        }
        nncf_config = NNCFConfig.from_dict(nncf_config_dict)
       
        init_dataloader = ThruSlitInitializingDataLoader(joinedDataLoader)
       
        nncf_config = register_default_init_args(nncf_config, init_dataloader)

        compression_ctrl, nncf_model = create_compressed_model(unet_model, nncf_config)

        for epoch in range(numEpochsToRun):
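            # advance the pruning schedule by one epoch (this updates the current pruning level for this epoch)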
            compression_ctrl.scheduler.epoch_step()
            nncf_model.train()
           
            # Print pruning statistics
            compression_stats = compression_ctrl.statistics()
            filter_pruning_stats = compression_stats.filter_pruning

            if filter_pruning_stats:
                print("Filter Pruning Statistics:")
                print(f"Current pruning level: {filter_pruning_stats.current_pruning_level:.2f}")
                print(f"Target pruning level: {filter_pruning_stats.target_pruning_level:.2f}")
                print(f"FLOPs reduction due to pruning: {filter_pruning_stats.prune_flops:.2f}")
            else:
                print("No filter pruning statistics available.")

            last_training_loss = 0
            training_loss_per_step = []
            # the last minibatch size can differ, so track each for weighting loss average for epoch
            minibatch_size_per_step = []

            for batch_idx, (inputs, targets, inputFileNames) in enumerate(joinedDataLoader):
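                # per-batch step of the pruning scheduler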
                compression_ctrl.scheduler.step()
                # unsqueeze(1) keeps the batch dimension (e.g., 32) first and adds a 1-channel dimension at index 1
                inputTensors = [inTensor.unsqueeze(1).to(device) for inTensor in inputs]
                targetTensors = [targetTensor.unsqueeze(1).to(device) for targetTensor in targets]
               
                outputs = nncf_model(inputTensors[0]) if bSingleInputTensor else nncf_model(inputTensors[0],inputTensors[1])
               
                losses = {}
                tt=0
                for output_name, output in outputs.items():
                    losses[output_name] = lossDict[output_name](output, targetTensors[tt])
                    tt+=1
                       
                total_loss = sum(losses.values())
               
                optimizer.zero_grad()
                total_loss.backward()
                optimizer.step()
               
                print(f'Epoch {epoch+1}/{numEpochsToRun}, Train Batch {batch_idx+1}/{len(joinedDataLoader)}, Loss: {total_loss.item()}')
                last_training_loss = total_loss.item()
                training_loss_per_step.append(last_training_loss)
                minibatch_size_per_step.append(float(inputTensors[0].shape[0]))
               
            nncf_model.eval()
            val_loss = 0
            if self.bTurnOnValidationMonitoring:
                with torch.no_grad():
                    for inputs, targets, _ in joinedMonitorDataLoader: # unused variable is inputFileNames
                        inputTensors = [inTensor.unsqueeze(1).to(device) for inTensor in inputs]
                        targetTensors = [targetTensor.unsqueeze(1).to(device) for targetTensor in targets]
                       
                        outputs = nncf_model(inputTensors[0]) if bSingleInputTensor else nncf_model(inputTensors[0],inputTensors[1])
                       
                        losses = {}
                        tt=0
                        for output_name, output in outputs.items():
                            losses[output_name] = lossDict[output_name](output, targetTensors[tt])
                            tt+=1
                       
                        total_loss = sum(losses.values())
                        val_loss += total_loss.item() * inputTensors[0].size(0)
                       
                val_loss /= len(joinedMonitorDataLoader.dataset)
                print(f'Epoch {epoch+1}/{numEpochsToRun}, Validation loss: {val_loss}')
               
            avg_training_loss_for_epoch = 0.0
            total_dataset_size = 0
            for stepLoss, minibatchSize in zip(training_loss_per_step, minibatch_size_per_step):
                avg_training_loss_for_epoch += stepLoss * minibatchSize
                total_dataset_size += minibatchSize
            avg_training_loss_for_epoch /= float(total_dataset_size)

            csv_logger.log(epoch, avg_training_loss_for_epoch, val_loss)
print("****** NNCF training finetuning with pruning is completed ******")
Zulkifli_Intel
Moderator

Hi LFan,

Thank you for reaching out.

 

If possible, can you share the steps you used to compress the model? Also, please share the OpenVINO version and the inference results before and after pruning.

 

 

Regards,

Zul


LFan
Employee

Hi Zul,

My OpenVINO version is 2024.1.0 and my NNCF version is 2.16.0.

Here are the NNCF compression steps I took (the code snippet is included in the first post):

Starting from an already-trained U-Net model:

  1. NNCF Configuration:

    • An NNCF configuration dictionary is created, specifying:
      • Input shapes for model tracing.
      • Compression algorithm "filter_pruning" with parameters like pruning target, schedule, and number of steps.
  2. Initialization DataLoader: an initializing DataLoader is created to prepare data for NNCF.

  3. Register Initialization Args:

    • The initialization data is registered with the NNCF configuration using register_default_init_args.
  4. Create Compressed Model:

    • The model is wrapped with NNCF using create_compressed_model, which returns the compressed model (nncf_model) and compression_ctrl.
  5. Training with Pruning:

    • Ran 10 epochs; for each epoch:
      • The pruning scheduler is updated via compression_ctrl.scheduler.epoch_step() each epoch and compression_ctrl.scheduler.step() each batch.
      • The model is trained on batches of data, computing losses and updating weights.
      • Pruning statistics (e.g., current pruning level, FLOPs reduction) are printed.
  6. Model Conversion to OpenVINO:

    • After training, the pruned model is converted to OpenVINO format using ov.convert_model.
    • A pruning transformation is applied to the OpenVINO model.
    • The converted model is saved as .xml and .bin files for deployment (a simplified sketch of this conversion step is shown right after this list).
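
Here is roughly what that conversion step does. The file names are placeholders, the example input shapes are the ones from the NNCF config, and I am using the apply_pruning_transformation offline pass (from openvino._offline_transformations, as far as I know) to physically remove the zeroed filters; please let me know if there is a more appropriate way to apply the pruning transformation.

import torch
import openvino as ov
from openvino._offline_transformations import apply_pruning_transformation

nncf_model.eval()
# example inputs match the "input_info" shapes in the NNCF config (128 = densityLayoutNumPixels)
example_input = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 128, 128))

ov_model = ov.convert_model(nncf_model, example_input=example_input)

# remove the zeroed filters from the graph so that pruning can actually reduce FLOPs
apply_pruning_transformation(ov_model)

ov.save_model(ov_model, "unet_pruned.xml")  # also writes unet_pruned.bin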


I am happy to share all relevant scripts if needed.

Looking at the pruning statistics (also in the previous post), it seems that no actual FLOPs reduction happened during pruning (0.0).

I then tested the inference time of the original model and the pruned model on 100 inputs; the mean values are very similar between the two models:

Inference time OpenVINO 99 0.0089114286
Inference time OpenVINO Pruned 99 0.0088372314

I also tested the pruned model on the large dataset (>1000 images) and it showed no change in runtime.
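
The timing comparison was done with a simple synchronous inference loop along these lines (a simplified sketch; the model paths and dummy inputs here are placeholders):

import time
import numpy as np
import openvino as ov

core = ov.Core()

def mean_infer_time(xml_path, n_runs=100):
    compiled = core.compile_model(xml_path, "CPU")
    # dummy inputs with the same shapes as in the NNCF config
    inputs = {
        compiled.input(0): np.ones((1, 1, 1, 1), dtype=np.float32),
        compiled.input(1): np.ones((1, 1, 128, 128), dtype=np.float32),
    }
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        compiled(inputs)
        times.append(time.perf_counter() - start)
    return float(np.mean(times))

print("Inference time OpenVINO       ", mean_infer_time("unet_original.xml"))
print("Inference time OpenVINO Pruned", mean_infer_time("unet_pruned.xml"))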

 
I've been struggling with this for the past few weeks and have made little progress. Would it be possible to schedule a meeting to discuss this further? That would probably be a more effective way to identify the problem.
 
Thank you in advance for your assistance.
 
Best regards,
Li

 

 

 

Zulkifli_Intel
Moderator

Hi LFan,

 

It would be great if you could share all the necessary files. I'll check on my side and refer this issue to the development team if needed.

 

 

Regards,

Zul

