Summary

To summarize, this chapter introduced programming concepts in CUDA C and showed how parallel computing can be done using CUDA. CUDA programs run efficiently and in parallel on any NVIDIA GPU, so CUDA is both efficient and scalable. The CUDA API functions, which extend ANSI C for parallel data computation, were discussed in detail. Using a simple two-variable addition example, we covered how to call device code from host code via a kernel call, how to configure kernel launch parameters, and how to pass parameters to a kernel. It was also shown that CUDA guarantees neither the order in which blocks or threads run nor which block is assigned to which multiprocessor in hardware. Vector operations, which exploit the parallel-processing capabilities of the GPU and CUDA, were then discussed; performing vector operations on the GPU can improve throughput drastically compared to the CPU. In the last section, common communication patterns used in parallel programming were discussed in detail. We have not yet covered memory architecture, how threads can communicate with one another in CUDA, or what to do when one thread needs another thread's data. So, in the next chapter, we will discuss memory architecture and thread synchronization in detail.
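As a quick recap of the ideas above (kernel call, launch configuration, parameter passing, and vector addition), here is a minimal sketch; the kernel and variable names are illustrative, not the chapter's exact listings:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device code: each thread adds one pair of elements.
// Which block runs where, and in what order, is not guaranteed.
__global__ void vectorAdd(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

int main(void) {
    const int N = 1024;
    size_t bytes = N * sizeof(int);
    int h_a[N], h_b[N], h_c[N];
    for (int i = 0; i < N; ++i) { h_a[i] = i; h_b[i] = 2 * i; }

    // Allocate device memory and copy inputs from host to device.
    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Kernel call from host code; <<<blocks, threads per block>>>
    // is the launch configuration, followed by kernel parameters.
    vectorAdd<<<4, 256>>>(d_a, d_b, d_c, N);

    // Copy the result back and verify one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("h_c[10] = %d\n", h_c[10]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Because all N additions are independent, the map pattern shown here scales across however many multiprocessors the GPU provides, which is the source of the throughput gain over a sequential CPU loop.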