
CUDA API functions

The variable addition program used several functions and keywords that are unfamiliar to regular C or C++ programmers: __global__, cudaMalloc, cudaMemcpy, and cudaFree. This section explains each of them in turn:

  • __global__: This is one of three function qualifier keywords, along with __device__ and __host__. It declares a function as a kernel, that is, a function that executes on the device when it is called from the host. Keep in mind that a kernel is launched from host code and must have a void return type. If you want a function to execute on the device and to be called from a kernel or from another device function, use the __device__ keyword instead. The __host__ keyword defines host functions, which execute on the host and can only be called from other host functions, just like normal C functions. By default, all functions in a program are host functions. The __host__ and __device__ keywords can be used together on the same function. The compiler then generates two versions of that function: one executes on the host, and the other executes on the device.

 

  • cudaMalloc: This is similar to the malloc function used in C for dynamic memory allocation. It allocates a memory block of the specified size, in bytes, on the device. The syntax of cudaMalloc, with an example, is as follows:
cudaMalloc(void ** d_pointer, size_t size)
Example: cudaMalloc((void**)&d_c, sizeof(int));

In the preceding example, cudaMalloc allocates a device memory block the size of one integer variable and stores the address of that block in the pointer d_c. The function's own return value is an error code of type cudaError_t, not the pointer.

  • cudaMemcpy: This function is similar to the memcpy function in C. It copies a block of memory from one location to another, where each location can be on the host or on the device. It has the following syntax:
cudaMemcpy ( void * dst_ptr, const void * src_ptr, size_t size, enum cudaMemcpyKind kind )
Example: cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

This function takes four arguments. The first and second are the destination pointer and the source pointer, each of which points to a host or device memory location. The third argument is the number of bytes to copy, and the last argument indicates the direction of the copy: host to device, device to device, host to host, or device to host. Be careful to make this direction agree with the pointers passed as the first two arguments. In the example, a block the size of one integer is copied from the device to the host, with the device pointer d_c as the source and the address of the host variable h_c as the destination.

  • cudaFree: This is similar to the free function available in C. The syntax of cudaFree is as follows:
cudaFree ( void * d_ptr )
Example: cudaFree(d_c);

It frees the device memory pointed to by d_ptr. In the example, it frees the memory location pointed to by d_c. Make sure that any pointer passed to cudaFree, such as d_c here, was allocated with cudaMalloc.
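The three memory functions described above are typically used together as allocate, copy, and free. The following is a minimal sketch of that sequence, not the original variable addition program: the helper check and the variables h_in and h_out are hypothetical names introduced for illustration, and the sketch assumes a CUDA-capable device and compilation with nvcc.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): stop with a message
// if a CUDA runtime call returns anything other than cudaSuccess.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    int h_in = 42;     // source value in host memory (illustrative)
    int h_out = 0;     // destination in host memory
    int *d_c = NULL;   // holds a device address after cudaMalloc

    // Allocate room for one int on the device; the address is written into d_c.
    check(cudaMalloc((void **)&d_c, sizeof(int)), "cudaMalloc");

    // Host to device: the destination is the device pointer, the source is a host address.
    check(cudaMemcpy(d_c, &h_in, sizeof(int), cudaMemcpyHostToDevice),
          "cudaMemcpy (host to device)");

    // Device to host: the pointer order is reversed to match the direction flag.
    check(cudaMemcpy(&h_out, d_c, sizeof(int), cudaMemcpyDeviceToHost),
          "cudaMemcpy (device to host)");

    // Release the block obtained from cudaMalloc.
    check(cudaFree(d_c), "cudaFree");

    printf("h_out = %d\n", h_out);
    return (h_out == h_in) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Each of these functions returns a cudaError_t, so checking the result after every call is a reasonable habit, even in small programs.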

CUDA provides many other keywords and functions over and above the existing ANSI C functions. Only the three functions discussed in this section, together with the __global__ keyword, will be used frequently, which is why they are covered here. For more details, refer to the CUDA programming guide.
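To illustrate the three qualifiers described at the start of this section, here is a small sketch under stated assumptions. The function names add, twice, and gpuCompute are hypothetical and do not come from the variable addition program. The sketch assumes a CUDA-capable device and compilation with nvcc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled twice: once for the host and once for the device.
__host__ __device__ int add(int a, int b) { return a + b; }

// Device-only helper: callable from a kernel or another device function.
__device__ int twice(int x) { return 2 * x; }

// Kernel: runs on the device, launched from the host, returns void.
__global__ void gpuCompute(int a, int b, int *d_c) {
    *d_c = twice(add(a, b));
}

int main(void) {
    int h_c = 0;       // result copied back to the host
    int *d_c = NULL;   // device memory for the kernel's result

    if (cudaMalloc((void **)&d_c, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Launch one block with one thread.
    gpuCompute<<<1, 1>>>(1, 4, d_c);

    // Pick up launch errors, then copy the result back; the copy waits
    // for the kernel on the default stream to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_c);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // add() is also callable directly on the host; twice() is not.
    printf("host add(1, 4) = %d, device result = %d\n", add(1, 4), h_c);
    return 0;
}
```

Calling twice from main would be rejected by the compiler, because a __device__ function has no host version.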
