- Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA
- Bhaumik Vaidya
CUDA program structure
We saw a very simple Hello, CUDA! program earlier that showcased some important concepts related to CUDA programs. A CUDA program is a combination of functions that are executed either on the host or on the GPU device. Functions that do not exhibit parallelism are executed on the CPU, and functions that exhibit data parallelism are executed on the GPU. The compiler segregates these functions during compilation: as seen in the previous chapter, functions meant for execution on the device are defined with the __global__ keyword and compiled by the NVCC compiler, while normal C host code is compiled by the host C compiler. CUDA code is essentially ANSI C code with the addition of a few keywords needed to exploit data parallelism.
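The following is a minimal sketch (not taken verbatim from the book) of how host and device code coexist in a single .cu file: the __global__ qualifier marks the function that NVCC compiles for the GPU, while main() is ordinary host code handled by the host compiler.

```cpp
#include <stdio.h>

// Device code: compiled by NVCC for execution on the GPU.
__global__ void helloKernel(void)
{
    printf("Hello from the GPU!\n");
}

// Host code: compiled by the host C/C++ compiler.
int main(void)
{
    helloKernel<<<1, 1>>>();   // launch the kernel with 1 block of 1 thread
    cudaDeviceSynchronize();   // wait for the GPU to finish before exiting
    return 0;
}
```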
So, in this section, a simple two-variable addition program is used to explain important concepts related to CUDA programming, such as kernel calls, passing parameters from the host to kernel functions on the device, the configuration of kernel launch parameters, the CUDA APIs needed to exploit data parallelism, and how memory allocation is performed on the host and the device. A sketch of such a program follows.
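Below is a hedged sketch of a two-variable addition program of the kind described above; the names gpuAdd, d_c, and h_c are illustrative choices, not necessarily the book's. It shows the kernel definition, passing scalar parameters by value from host to device, device memory allocation with cudaMalloc, the <<<blocks, threads>>> launch configuration, and copying the result back to the host with cudaMemcpy.

```cpp
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: adds two integers on the device and writes the result to device memory.
__global__ void gpuAdd(int a, int b, int *d_c)
{
    *d_c = a + b;
}

int main(void)
{
    int h_c;          // result on the host
    int *d_c = NULL;  // pointer to the result in device memory

    // Allocate space for one int on the device.
    cudaMalloc((void **)&d_c, sizeof(int));

    // Launch one block with one thread; scalar arguments are passed by value.
    gpuAdd<<<1, 1>>>(1, 4, d_c);

    // Copy the result from device memory back to host memory.
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

    printf("1 + 4 = %d\n", h_c);

    cudaFree(d_c);    // release device memory
    return 0;
}
```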