Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

The "gvaaudiodetect" element reports inference events that are lower than the threshold set

NikhilP
Beginner

Hi,

I am using DLStreamer version 2021.4.X. We are running an audio pipeline with the aclnet model, with the confidence threshold set to 0.8:

location=/home/ubuntu/Work/inputVideo/how_are_you_doing.wav ! decodebin ! audioresample ! audioconvert ! audio/x-raw, channels=1,format=S16LE,rate=16000 ! audiomixer output-buffer-duration=100000000 ! gvaaudiodetect model=/home/ubuntu/Work/public/aclnet/FP32/aclnet.xml model-proc=/home/ubuntu/Work/model_proc/aclnet.json threshold=0.8 sliding-window=0.2 ! gvametaconvert ! gvametapublish file-format=json-lines ! fakesink

I see that inference events are reported, but some are below 0.8 (the first two events in the results, with confidence 0.7 and 0.67). Is this a bug? I expect only events above the 0.8 threshold to be reported.

The results are below; the input clip is attached as a zip file.

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
{"channels":1,"events":[{"detection":{"confidence":0.7,"label":"Can opening","label_id":35,"segment":{"end_timestamp":1000000000,"start_timestamp":0}},"end_timestamp":1000000000,"event_type":"Can opening","start_timestamp":0}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":0.67,"label":"Cow","label_id":4,"segment":{"end_timestamp":1200000000,"start_timestamp":200000000}},"end_timestamp":1200000000,"event_type":"Cow","start_timestamp":200000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":0.99,"label":"Speech","label_id":53,"segment":{"end_timestamp":1400000000,"start_timestamp":400000000}},"end_timestamp":1400000000,"event_type":"Speech","start_timestamp":400000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":1600000000,"start_timestamp":600000000}},"end_timestamp":1600000000,"event_type":"Speech","start_timestamp":600000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":1800000000,"start_timestamp":800000000}},"end_timestamp":1800000000,"event_type":"Speech","start_timestamp":800000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":2000000000,"start_timestamp":1000000000}},"end_timestamp":2000000000,"event_type":"Speech","start_timestamp":1000000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":2200000000,"start_timestamp":1200000000}},"end_timestamp":2200000000,"event_type":"Speech","start_timestamp":1200000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":2400000000,"start_timestamp":1400000000}},"end_timestamp":2400000000,"event_type":"Speech","start_timestamp":1400000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":2600000000,"start_timestamp":1600000000}},"end_timestamp":2600000000,"event_type":"Speech","start_timestamp":1600000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":2800000000,"start_timestamp":1800000000}},"end_timestamp":2800000000,"event_type":"Speech","start_timestamp":1800000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":1.0,"label":"Speech","label_id":53,"segment":{"end_timestamp":3000000000,"start_timestamp":2000000000}},"end_timestamp":3000000000,"event_type":"Speech","start_timestamp":2000000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":0.99,"label":"Speech","label_id":53,"segment":{"end_timestamp":3200000000,"start_timestamp":2200000000}},"end_timestamp":3200000000,"event_type":"Speech","start_timestamp":2200000000}],"rate":16000}
{"channels":1,"events":[{"detection":{"confidence":0.99,"label":"Speech","label_id":53,"segment":{"end_timestamp":3400000000,"start_timestamp":2400000000}},"end_timestamp":3400000000,"event_type":"Speech","start_timestamp":2400000000}],"rate":16000}
Got EOS from element "pipeline0".
Execution ended after 0:00:00.051476798
Setting pipeline to NULL ...
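As a workaround while the threshold question is open, the json-lines output from gvametapublish can be post-filtered in a small script. The sketch below is illustrative, not part of DLStreamer; the function name `filter_events` is hypothetical, and the field layout simply matches the output shown above:

```python
import json

def filter_events(lines, threshold=0.8):
    """Keep only detection events whose confidence meets the threshold.

    Messages whose events are all below the threshold are dropped entirely.
    """
    kept = []
    for line in lines:
        msg = json.loads(line)
        events = [e for e in msg.get("events", [])
                  if e["detection"]["confidence"] >= threshold]
        if events:
            msg["events"] = events
            kept.append(msg)
    return kept

# Two messages mimicking the output above: one below and one above 0.8.
sample = [
    '{"channels":1,"events":[{"detection":{"confidence":0.7,"label":"Can opening","label_id":35,"segment":{"end_timestamp":1000000000,"start_timestamp":0}},"end_timestamp":1000000000,"event_type":"Can opening","start_timestamp":0}],"rate":16000}',
    '{"channels":1,"events":[{"detection":{"confidence":0.99,"label":"Speech","label_id":53,"segment":{"end_timestamp":1400000000,"start_timestamp":400000000}},"end_timestamp":1400000000,"event_type":"Speech","start_timestamp":400000000}],"rate":16000}',
]
print(len(filter_events(sample)))  # prints 1: only the 0.99 "Speech" message survives
```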


  

4 Replies
NikhilP
Beginner

OK, so maybe I see the problem.

The model_proc (JSON) file for the aclnet model has a "threshold" for each individual output label, and each of these is set to 0.5, so maybe that is what is taking effect. If you have a different theory, do let me know.

Thank you,

Nikhil

Peh_Intel
Moderator

Hi Nikhil,

 

Yes, you are right. The threshold defined in the model_proc file takes effect.

 

For your information, the labels under "output_postproc" in the model_proc must be either strings or objects with index, label and threshold fields.

 

The threshold value set when launching gvaaudiodetect takes effect only if the model_proc contains a plain array of label strings.

 

Example:

"output_postproc": [
    {
        "layer_name": "output",
        "converter": "audio_labels",
        "labels": [
            "Dog",
            "Rooster",
            "Pig"
        ]
    }
]

 

I have attached the model_proc file as well (it needs to be renamed with a .json extension). The output results from these two different model_proc files are the same; they differ only in the way the threshold value is set.
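For contrast, the object form of the labels (where each label carries its own threshold, overriding the element's threshold property) would look roughly like the sketch below; the indices and threshold values here are illustrative, not taken from the actual aclnet model_proc:

```json
"output_postproc": [
    {
        "layer_name": "output",
        "converter": "audio_labels",
        "labels": [
            { "index": 0, "label": "Dog", "threshold": 0.5 },
            { "index": 1, "label": "Rooster", "threshold": 0.5 },
            { "index": 2, "label": "Pig", "threshold": 0.5 }
        ]
    }
]
```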

 

 

Regards,

Peh

 

NikhilP
Beginner

Hi Peh,

 

The aclnet JSON you attached hands control over to the threshold setting of "gvaaudiodetect", which is what I was looking for.

Thank you, this helps!


Regards,

Nikhil

Peh_Intel
Moderator

Hi Nikhil,


I am glad that I was able to help.


This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.



Regards,

Peh

