CUDA’s programming model differs significantly from single-threaded CPU code, and even from the parallel code that some programmers began writing for GPUs before CUDA. In a single-threaded model, the CPU fetches a single instruction stream that operates serially on the data. A superscalar CPU may route the instruction stream through multiple pipelines, but there’s still only one instruction stream, and the degree of instruction parallelism is severely limited by data and resource dependencies. Even the best four-, five-, or six-way superscalar CPUs struggle to average 1.5 instructions per cycle, which is why superscalar designs rarely venture beyond four-way pipelining. Single-instruction multiple-data (SIMD) extensions permit many CPUs to extract some data parallelism from the code, but the practical limit is usually three or four operations per cycle.
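To make the dependency limit concrete, here is a minimal C++ sketch of two loops from the single-threaded world described above. The function names `scale_add` and `running_sum` are hypothetical, chosen only for illustration. In the first loop every iteration is independent, which is the kind of work SIMD extensions (and, on a far larger scale, data-parallel hardware) can spread across lanes. In the second loop, as written, each iteration needs the result of the previous one, so the additions form a serial chain. Whether a particular compiler actually vectorizes the first loop depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: independent iterations.
// out[i] depends only on x[i] and y[i], so no iteration waits on another.
std::vector<float> scale_add(float a,
                             const std::vector<float>& x,
                             const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}

// Hypothetical example: loop-carried dependency.
// Each iteration reads the accumulator written by the previous one,
// so the additions in this form must happen one after another.
std::vector<float> running_sum(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i];
        out[i] = acc;
    }
    return out;
}
```

Calling `scale_add(2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` yields `{12, 24, 36, 48}`, and `running_sum(x)` yields `{1, 3, 6, 10}`. Both loops do one addition per element, yet only the first exposes parallelism that the hardware can exploit without restructuring the algorithm.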