具体的profile调用图如下:
可以看见compute很快,但是构造函数很慢。
nvidia官网看到几篇类似的帖子,但是没有讲明白怎么解决的:
opencv上的参考文档:
https://docs.opencv.org/3.4/d9/d88/group__cudaarithm__arithm.html#gadea99cb15a715c983bcc2870d65a2e78https://devtalk.nvidia.com/default/topic/1014986/gpu-accelerated-libraries/opencv-dft-vs-gpu-dft-performance-/
OpenCV dft vs. gpu::dft Performance https://devtalk.nvidia.com/default/topic/1020341/transfer-data-cpu-gpu-is-an-issue-/Transfer data CPU/GPU is an issue.. ========================================================采用类的方式,避免频繁初始化(但是未验证数据是否准确),性能有所提升,但是仍然比CPU版本的慢。 cv::Ptr<cv::cuda::DFT> dft_handle = cv::cuda::createDFT(d_mul.size(), 0);
dft_handle->compute(d_mul, d_complex_result, stream);