The CPU/GPU code compiled for the CPU is taken as the 1.0 reference. GTX 295 numbers are for a single GPU. These tests were run on the cudatest computer. The options ASENS, TILT, MKOW, ANGW, and LONG were disabled in the CPU/GPU version to match the configuration of the original and assembly versions. i3mcml achieves a comparable level of performance.
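The normalization to a 1.0 reference can be illustrated with a minimal sketch. It assumes that relative performance is the ratio of the reference run time to each version's run time, so larger values mean faster; the notes do not state the exact definition. The function name, the dictionary keys, and the run times below are hypothetical placeholders, not measurements from these tests.

```python
def relative_performance(runtimes_s, reference):
    """Return each version's speed relative to the reference build.

    Assumption: performance is reference_time / version_time, so the
    reference build itself comes out as exactly 1.0.
    """
    ref_time = runtimes_s[reference]
    return {name: ref_time / t for name, t in runtimes_s.items()}


# Placeholder run times in seconds (illustrative only, not measured values).
example_runtimes_s = {"cpu_reference": 100.0, "gpu_build": 2.0}
example = relative_performance(example_runtimes_s, "cpu_reference")

for name, factor in example.items():
    print(f"{name}: {factor:.1f}")
```

With this convention, every entry in a table can be read as a speedup factor over the CPU build of the CPU/GPU code.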
The relative performance of the OpenCL and CUDA versions of the code is summarized in this table. The OpenCL version is on average ~30% slower.
|