Direct Gradient Extraction in OpenVINO IR Format

Xanph · ‎08-04-2024

Context:

A convolutional network predicts whether motion is occurring in a live stream. It runs on a lightweight (resource-constrained) edge device.

Problem & Goal:

I need the model to pinpoint where in the frame the 'motion' (class) that influenced its prediction is occurring.

Targeted Possible Solution:

Use the gradients of the motion score with respect to the third convolution layer's feature maps to generate Grad-CAM visualisations (a heatmap over the image showing where the motion class is activated).
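For reference, the Grad-CAM step itself is small once a feature map and a matching gradient array are available. The following is a minimal NumPy sketch for a single frame, not code from the project: the function name `grad_cam`, the (H, W, C) layout, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap for one frame.

    feature_maps, gradients: arrays of shape (H, W, C), where gradients
    holds d(score)/d(feature_maps). The layout is assumed for illustration.
    """
    # Channel weights: average each channel's gradient over the spatial axes
    weights = gradients.mean(axis=(0, 1))                        # (C,)
    # Weighted sum of the channels gives one coarse heatmap
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # (H, W)
    # Keep only regions that push the score up
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] so the map can be overlaid on the frame
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam

# Toy data standing in for a real feature map and its gradients
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(4, 6, 8))
toy_gradients = rng.normal(size=(4, 6, 8))
heatmap = grad_cam(toy_features, toy_gradients)
print(heatmap.shape)
```

The heatmap has the feature map's spatial resolution, so it would still need resizing to the frame size before being drawn over the image.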

---

The problem, however, is that OpenVINO IR does not expose gradients directly (as far as my research shows). That is understandable, because the format is designed for efficient inference.

As a result, the original Keras model has been modified to expose the third convolution layer as a second output. So we now have a binary output for motion and a feature map output from the third conv layer.

I then need to approximate the gradients numerically with a central difference scheme. This requires perturbing the input and running inference twice to estimate the gradient, like so:

(python)

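import numpy as np

# input_layer and dense_output_layer are assumed to be defined elsewhere
# (the model's input port and its prediction output port).
# output_index is currently unused.
def numerical_gradient(infer_request, input_data, output_index, h=1e-3):
    grad = np.zeros_like(input_data)
    for i in range(input_data.size):
        input_data_plus_h = np.copy(input_data)
        input_data_minus_h = np.copy(input_data)
        input_data_plus_h.flat[i] += h
        input_data_minus_h.flat[i] -= h

        # Run inference with perturbed inputs. Copy each result, because
        # .data is a view of the output tensor that the next infer() reuses.
        infer_request.infer(inputs={input_layer: input_data_plus_h})
        f_x_plus_h = infer_request.get_tensor(dense_output_layer).data.copy()
        infer_request.infer(inputs={input_layer: input_data_minus_h})
        f_x_minus_h = infer_request.get_tensor(dense_output_layer).data.copy()

        # Sequence output of shape (1, T, 1): use the last time step
        if f_x_plus_h.ndim == 3:
            f_x_plus_h = f_x_plus_h[0, -1, 0]

        if f_x_minus_h.ndim == 3:
            f_x_minus_h = f_x_minus_h[0, -1, 0]

        # Central difference: (f(x + h) - f(x - h)) / (2h)
        grad.flat[i] = (f_x_plus_h - f_x_minus_h) / (2 * h)
    return grad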

So now the problem is that this implementation involves three inference steps, for each frame inside an LSTM sequence of 50, on a live stream - not very efficient:

First Inference: The initial inference to get the prediction.
Second Inference: A perturbed input with a small positive perturbation for gradient calculation.
Third Inference: A perturbed input with a small negative perturbation for gradient calculation.
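Note that the loop in numerical_gradient above perturbs one input element at a time, so the two perturbed inferences are repeated for every element of the input. A rough tally, assuming the input shape (50, 140, 250, 3) from the model definition later in this thread and one baseline inference for the prediction (the variable names are illustrative):

```python
import math

# Assumed input shape: (SEQUENCE_LENGTH, height, width, channels)
input_shape = (50, 140, 250, 3)
n_elements = math.prod(input_shape)

# One baseline inference for the prediction, plus the perturbed ones
central_inferences = 1 + 2 * n_elements   # f(x + h) and f(x - h) per element
forward_inferences = 1 + n_elements       # f(x + h) per element, baseline reused

print(f"input elements:     {n_elements:,}")
print(f"central difference: {central_inferences:,} inferences")
print(f"forward difference: {forward_inferences:,} inferences")
```

Either way the count scales with the number of input elements, which is the main cost to weigh on an edge device.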

As an alternative, I could try a forward finite difference, which reuses the initial prediction and needs only one perturbed inference per step, but this still isn't ideal for an edge device with limited GPU resources.
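A minimal sketch of the forward-difference variant, written against a plain Python callable rather than an OpenVINO infer request so that it runs on its own. The names `forward_difference_gradient` and `toy_score` are hypothetical, and `toy_score` merely stands in for "run inference and return the motion score":

```python
import numpy as np

def forward_difference_gradient(f, x, h=1e-3):
    """Estimate df/dx with (f(x + h) - f(x)) / h, one element at a time."""
    x = np.asarray(x, dtype=np.float64)
    baseline = float(f(x))          # doubles as the normal prediction
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step.flat[i] += h
        # Only one extra evaluation per element; the baseline is reused
        grad.flat[i] = (float(f(x_step)) - baseline) / h
    return baseline, grad

def toy_score(x):
    # Stand-in for the model: the exact gradient is 2 * x
    return np.sum(x ** 2)

x = np.array([1.0, -2.0, 0.5])
score, grad = forward_difference_gradient(toy_score, x)
print(score, grad)
```

The trade-off is accuracy: the forward scheme's error shrinks in proportion to h, while the central scheme's shrinks in proportion to h squared.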

I wanted to see if there are any other suggestions on how I can reach my goal of getting the model to state where the motion class is appearing in the scene. Alternatively, I can fall back on background subtraction to get the positions.

I also appreciate that there's quite a lot of math in this, which I am learning on the spot too!

Many thanks,

Xanph

(Using OpenVINO Nightly)

Vipin_S_Intel · ‎08-05-2024

Hi Flynn, could you please provide us with the following details?

The exact name and build version of the Intel® Toolkit you’re using.
The operating system and its build version.
Whether the product has been installed.
A detailed explanation of your query, along with a screenshot if possible.

To assist you further, we would require these details.

Iffa_Intel · ‎08-20-2024

Hi,

If you still need help with this issue, please clarify and share the following:

1. Your model framework (e.g. TensorFlow, ONNX, etc.)
2. Is the model a custom model? If so, could you elaborate on the custom part?
3. Your conversion commands
4. Relevant model files
5. Which OpenVINO sample app did you use for inferencing? (If custom, please share the specific code.)
6. Is your issue that you are not satisfied with the model's inference results and wish to improve them?

Cordially,

Iffa

Xanph · ‎03-29-2025

Apologies for the delay Iffa.

I'm using TensorFlow 2.19 now, but was using 2.17 at the time. Below is the Sequential model, built with the Keras API:

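import tensorflow as tf
from tensorflow.keras.layers import (
    Activation, BatchNormalization, Bidirectional, Conv3D, Dense,
    Flatten, LSTM, MaxPool3D, TimeDistributed,
)
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

model = Sequential(name=f"{model_version}")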

model.add(Conv3D(64, kernel_size=3, input_shape=(SEQUENCE_LENGTH, 140, 250, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPool3D(pool_size=(1, 2, 2)))

model.add(Conv3D(128, kernel_size=3, padding='same'))
model.add(Activation('relu'))
model.add(MaxPool3D(pool_size=(1, 2, 2)))

model.add(Conv3D(128, kernel_size=3, padding='same', name="last_conv3d"))
model.add(Activation('relu'))
model.add(MaxPool3D(pool_size=(1, 2, 2)))

model.add(TimeDistributed(Flatten()))

# See pattern recognition from frame 0 -> 49 and 49 -> 0.
model.add(Bidirectional(LSTM(32, return_sequences=True)))

model.add(BatchNormalization())

model.add(Dense(1, activation='sigmoid', dtype='float32'))

model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.AUC()]
)

Code used for converting the model. At the time I was using OpenVINO 2024 nightly:

model_name = f"{version}"
model_path = f'models/best_keras/{model_name}.h5'
model = keras.models.load_model(model_path)

# Save the model in the SavedModel format
saved_model_dir = f'models/tensorflow/{model_name}_tf'
model.save(saved_model_dir, save_format='tf')

# Convert the SavedModel to OpenVINO IR format with multiple outputs
ir_model = ov.convert_model(
    saved_model_dir,
    input={"conv3d_input": [1, SEQUENCE_LENGTH, 140, 250, 3]},
    output=["last_conv3d", "dense"]
)

# Save the converted IR model
output_dir = "models/intermediate_representation"
ov.save_model(ir_model, f"{output_dir}/ir_{version}.xml")

As for the answer to number 5, I was just using the OpenVINO package directly for inference.

For 6, my goal is to extract the Grad-CAM from the last_conv3d layer of the model, in addition to the last dense layer (the prediction).
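For sizing the heatmap, the shape of the last_conv3d output can be worked out from the model definition above. This is a small sketch assuming SEQUENCE_LENGTH is 50, Keras' default 'valid' padding and stride-equals-pool-size for MaxPool3D, and that the extra output is taken directly at last_conv3d (before its ReLU and pooling):

```python
# Input frame size and sequence length (SEQUENCE_LENGTH assumed to be 50)
seq_len, height, width = 50, 140, 250

# Two MaxPool3D(pool_size=(1, 2, 2)) layers sit before last_conv3d.
# Each halves height and width (floor division) and leaves time untouched;
# the 'same'-padded convolutions do not change the spatial size.
h, w = height, width
for _ in range(2):
    h, w = h // 2, w // 2

last_conv_shape = (seq_len, h, w, 128)   # 128 filters in last_conv3d
upscale = (height / h, width / w)        # factor to map the heatmap back

print(last_conv_shape)
print(upscale)
```

Each heatmap cell therefore covers roughly a 4 x 4 pixel patch of the input frame, which bounds how precisely Grad-CAM can localise the motion.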

Many thanks, and I now have notifications turned on

Iffa_Intel · ‎08-29-2024

Hi,

Thank you for your question. If you need any additional information from Intel, please submit a new question as Intel is no longer monitoring this thread.

Cordially,

Iffa
