This model is trained with cross-entropy (softmax) loss. As you may know from SphereFace and similar papers, this results in embeddings that are not discriminative enough (i.e. intra-class distances are not small enough and inter-class distances are not large enough), so in theory this model is not well suited to this problem setting. In practice it might still work with some level of accuracy, although we have not run such experiments. In any case, the training code for this model is available at https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/action_recognition so you can try re-training it with a more appropriate loss (e.g. AM-Softmax, triplet loss, etc.).
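For reference, here is a minimal sketch of what an AM-Softmax head could look like in PyTorch. It is not taken from the linked training code: the class name `AMSoftmaxLoss`, its constructor arguments, and the default `margin=0.35` and `scale=30.0` are assumptions for illustration, so you would need to adapt it to however the toolkit exposes its embeddings and labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AMSoftmaxLoss(nn.Module):
    """Additive-margin softmax loss (hypothetical helper, not part of the toolkit)."""

    def __init__(self, embedding_dim, num_classes, margin=0.35, scale=30.0):
        super().__init__()
        self.margin = margin  # assumed default; tune for your data
        self.scale = scale    # assumed default; tune for your data
        # One learnable class-center vector per class.
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class centers.
        cosine = F.linear(F.normalize(embeddings, dim=1),
                          F.normalize(self.weight, dim=1))
        # Subtract the margin from the target-class cosine only.
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).to(cosine.dtype)
        logits = self.scale * (cosine - self.margin * one_hot)
        # Ordinary cross entropy on the scaled, margin-adjusted logits.
        return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    loss_fn = AMSoftmaxLoss(embedding_dim=8, num_classes=4)
    embeddings = torch.randn(5, 8, requires_grad=True)  # stand-in for model output
    labels = torch.tensor([0, 1, 2, 3, 0])
    loss = loss_fn(embeddings, labels)
    loss.backward()
    print(loss.item())
```

The idea is to replace the final classification layer and its cross-entropy loss with this module during training, then use the normalized embeddings with cosine distance at inference time.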
Let us know if you find anything interesting!