Is a SharedMatrix implicitly synchronized by hppWait, i.e., is hppiGetMatrixData conceptually unnecessary in this case? In particular, I wonder whether this still holds when CPU/GPU copies cannot be omitted (due to the device type / extension used).
The reference manual (ipp_async_manual.pdf) states that "All Intel® IPP Asynchronous C/C++ library functions, except for setup and release functions, are asynchronous." However, this does not settle the question above for me. For example, the "ipp_async_sobel" source code example calls hppWait but never calls hppiGetMatrixData (as of the December 2013 Preview of Intel IPP). Is it conceptually more correct to call hppiGetMatrixData even when it performs no actual work because of "zero copy"?
All hppiMatrix instances of all types (an ordinary matrix created with hppiCreateMatrix, a Zero-Copy matrix created with hppiCreateSharedMatrix, and a virtual matrix) that are used in a pipeline are synchronized by the hppWait call at the end of the pipeline. This means that the host-side buffer associated with the hppiMatrix instance in the hppiCreate* call will contain updated values if the matrix was used as a destination by some call in the pipeline. A hppiGetMatrixData call is needed only when the corresponding hppiMatrix was created as a virtual matrix and therefore has no associated host-side buffer.
The manual states: "Similarly, during execution, the application should treat any data buffers used by these functions as not available (if they are destinations) or locked (if they are sources). The behavior is undefined if the application modifies any of the data buffers before the synchronization is complete." So once the synchronization is complete, the data buffers can be used again and will contain up-to-date data.
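To make the buffer rule concrete, here is a minimal sketch of the same pattern in plain standard C++. It is only an analogy and does not use the IPP Asynchronous API: launch_invert and invert_and_wait are hypothetical names, std::async stands in for an asynchronous pipeline stage, and the future's wait() stands in for hppWait. Real hppi* calls and their signatures should be taken from the manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical stand-in for one asynchronous pipeline stage (NOT an IPP call).
// It fills a caller-owned host buffer on another thread and returns at once.
std::future<void> launch_invert(const std::vector<std::uint8_t>& src,
                                std::vector<std::uint8_t>& dst) {
    assert(src.size() == dst.size());
    return std::async(std::launch::async, [&src, &dst] {
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = static_cast<std::uint8_t>(255 - src[i]);
        }
    });
}

// Launches the stage, synchronizes, and only then hands back the destination.
std::vector<std::uint8_t> invert_and_wait(const std::vector<std::uint8_t>& src) {
    std::vector<std::uint8_t> dst(src.size(), 0);
    std::future<void> done = launch_invert(src, dst);
    // Between launch and wait: treat src as locked and dst as not available.
    done.wait();  // plays the role that hppWait plays in the answer above
    // After the wait, the host buffer holds up-to-date data;
    // no separate "get data" step is needed for a buffer the caller owns.
    return dst;
}
```

Calling invert_and_wait on a small buffer returns the inverted values only after the wait has completed, which mirrors why a matrix backed by a host-side buffer needs no extra hppiGetMatrixData call after hppWait.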