CUDA API functions

The variable addition program used several keywords and functions that are unfamiliar to regular C or C++ programmers: __global__, cudaMalloc, cudaMemcpy, and cudaFree. This section explains each of them in turn:

  • __global__: It is one of three function qualifier keywords, along with __device__ and __host__. It declares a function as a kernel: a function that executes on the device and is launched from the host. A kernel must return void. Kernels are normally launched from host code; launching one from device code is possible only on devices that support dynamic parallelism. If you want a function to execute on the device and be called from other device code (a kernel or another device function), use the __device__ keyword instead. The __host__ keyword defines host functions, which can only be called from other host functions, just like normal C functions. By default, every function without a qualifier is a host function. __host__ and __device__ can be used together on the same function. In that case the compiler generates two versions of the function: one that executes on the host and one that executes on the device.
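The three qualifiers described above can be seen side by side in a minimal sketch. The names square, add_one, compute, and compute_on_device are hypothetical and are not taken from the variable addition program; error checking is omitted to keep the example short.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Compiled for both host and device, so either side can call it.
__host__ __device__ int square(int x) { return x * x; }

// Device-only helper: callable from kernels and other device functions.
__device__ int add_one(int x) { return x + 1; }

// Kernel: launched from the host, executes on the device, returns void.
__global__ void compute(int x, int *d_out) { *d_out = add_one(square(x)); }

// Hypothetical host wrapper that launches the kernel with one block of
// one thread and copies the single result back to the host.
int compute_on_device(int x) {
    int h_out = 0;
    int *d_out = NULL;
    cudaMalloc((void **)&d_out, sizeof(int));
    compute<<<1, 1>>>(x, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling add_one directly from host code would be rejected by the compiler, while square can be called from either side.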

 

  • cudaMalloc: It is similar to the malloc function used in C for dynamic memory allocation. It allocates a memory block of the requested size, in bytes, on the device. The syntax of cudaMalloc with an example is as follows:
cudaMalloc(void ** d_pointer, size_t size)
Example: cudaMalloc((void**)&d_c, sizeof(int));

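In the preceding example, cudaMalloc allocates a block of device memory the size of one integer and stores its address in the pointer d_c. The address is written through the first argument, which is why &d_c is passed; the function's own return value is an error code, not the pointer.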
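Because cudaMalloc reports failure through its return value, it can be worth checking that value before using the pointer. The following is a minimal sketch, assuming a hypothetical helper named alloc_device_ints that is not part of the original program:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates room for `count` ints on the device.
// Returns the device pointer, or NULL if the allocation failed.
int *alloc_device_ints(size_t count) {
    int *d_ptr = NULL;
    cudaError_t err = cudaMalloc((void **)&d_ptr, count * sizeof(int));
    if (err != cudaSuccess) {
        // cudaGetErrorString turns the error code into readable text.
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return NULL;
    }
    return d_ptr;
}
```

The returned pointer refers to device memory, so host code should pass it to kernels or to cudaMemcpy rather than dereference it.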

  • cudaMemcpy: This function is similar to the memcpy function in C. It copies a block of memory from one location to another, where each location can be on the host or on the device. It has the following syntax:
cudaMemcpy ( void * dst_ptr, const void * src_ptr, size_t size, enum cudaMemcpyKind kind )
Example: cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

This function has four arguments. The first and second arguments are the destination pointer and the source pointer, each of which points to a host or device memory location. The third argument is the number of bytes to copy, and the last argument is the direction of the copy: host to device, device to device, host to host, or device to host. Be careful to match this direction with the pointers passed as the first two arguments. In the example, a block of one integer is copied from the device to the host, so the device pointer d_c is the source and the address of the host variable h_c is the destination.
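The pairing of direction and pointer order described above can be sketched as a round trip through device memory. The function name round_trip and the buffers d_a and d_b are hypothetical, and error checking is left out for brevity:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: sends one int to the device, copies it between two
// device buffers, and brings it back to the host.
int round_trip(int value) {
    int h_in = value;
    int h_out = 0;
    int *d_a = NULL;
    int *d_b = NULL;
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));

    // Destination comes first, source second, in every call.
    cudaMemcpy(d_a, &h_in, sizeof(int), cudaMemcpyHostToDevice);   // host -> device
    cudaMemcpy(d_b, d_a, sizeof(int), cudaMemcpyDeviceToDevice);   // device -> device
    cudaMemcpy(&h_out, d_b, sizeof(int), cudaMemcpyDeviceToHost);  // device -> host

    cudaFree(d_a);
    cudaFree(d_b);
    return h_out;
}
```

If the direction constant does not match the actual pointers, the call is expected to fail or to copy the wrong data, which is why the two are kept visibly aligned here.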

  • cudaFree: It is similar to the free function available in C. The syntax of cudaFree is as follows:
cudaFree ( void * d_ptr )
Example: cudaFree(d_c);

It frees the device memory pointed to by d_ptr. In the example code, it frees the memory location pointed to by d_c. Only pass cudaFree a pointer that was allocated with cudaMalloc; memory obtained from malloc on the host must still be released with free.
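The rule that each allocator has its own matching release call can be illustrated with a short sketch. The function alloc_and_release is a hypothetical helper written for this illustration:

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: allocates a host buffer and a device buffer of
// `count` ints, then releases each with its matching function.
// Returns true only if every step succeeded.
bool alloc_and_release(size_t count) {
    int *h_buf = (int *)malloc(count * sizeof(int));  // host memory
    if (h_buf == NULL) return false;

    int *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, count * sizeof(int));
    if (err != cudaSuccess) {
        free(h_buf);
        return false;
    }

    err = cudaFree(d_buf);  // device memory is released with cudaFree
    d_buf = NULL;           // avoid keeping a dangling device pointer
    free(h_buf);            // host memory is released with free
    return err == cudaSuccess;
}
```

Like free, cudaFree accepts a null pointer and performs no operation in that case.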

CUDA provides many other keywords and functions beyond those of ANSI C. We will use these three functions most frequently, which is why they are discussed in this section. For more details, refer to the CUDA programming guide.