
Thread-related properties

As seen in earlier sections, blocks and threads can be multidimensional, so it is useful to know how many threads and blocks can be launched in parallel in each dimension. There are also limits on the number of threads per multiprocessor and the number of threads per block, given by the maxThreadsPerMultiProcessor and maxThreadsPerBlock properties. These limits are very important when you configure kernel launch parameters. If you launch more threads per block than the device allows, the kernel launch fails and the kernel does not run. The failure is reported only if you check for it, for example with cudaGetLastError(). The maximum number of threads per block in each dimension is given by maxThreadsDim, and the maximum number of blocks per grid in each dimension is given by maxGridSize. Both are arrays of three values, holding the maximum in the x, y, and z dimensions respectively. The following code snippet shows how to print the thread-related properties from CUDA code:

// device_Property is the cudaDeviceProp structure filled earlier
// with cudaGetDeviceProperties(&device_Property, device)
printf(" Maximum number of threads per multiprocessor: %d\n",
       device_Property.maxThreadsPerMultiProcessor);
printf(" Maximum number of threads per block: %d\n",
       device_Property.maxThreadsPerBlock);
// Per-dimension limits: index 0 is x, 1 is y, 2 is z
printf(" Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
       device_Property.maxThreadsDim[0],
       device_Property.maxThreadsDim[1],
       device_Property.maxThreadsDim[2]);
printf(" Max dimension size of a grid size (x,y,z): (%d, %d, %d)\n",
       device_Property.maxGridSize[0],
       device_Property.maxGridSize[1],
       device_Property.maxGridSize[2]);
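To make the launch limits concrete, here is a minimal sketch that checks a launch configuration against maxThreadsPerBlock, maxThreadsDim, and maxGridSize before launching, and then deliberately launches with one thread too many per block to show that the runtime rejects the launch instead of running the kernel. The helper launchConfigFits and the kernel fillIndex are hypothetical names chosen for illustration; the error reporting uses the standard cudaGetLastError and cudaGetErrorString runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if the grid/block shape respects the
// limits reported in cudaDeviceProp (maxThreadsPerBlock, maxThreadsDim, maxGridSize).
bool launchConfigFits(const cudaDeviceProp &prop, dim3 grid, dim3 block)
{
    unsigned long long threadsPerBlock =
        (unsigned long long)block.x * block.y * block.z;
    if (threadsPerBlock > (unsigned long long)prop.maxThreadsPerBlock)
        return false;
    // Each block dimension has its own limit (x, y, z)
    if (block.x > (unsigned)prop.maxThreadsDim[0] ||
        block.y > (unsigned)prop.maxThreadsDim[1] ||
        block.z > (unsigned)prop.maxThreadsDim[2])
        return false;
    // Each grid dimension has its own limit as well
    if (grid.x > (unsigned)prop.maxGridSize[0] ||
        grid.y > (unsigned)prop.maxGridSize[1] ||
        grid.z > (unsigned)prop.maxGridSize[2])
        return false;
    return true;
}

// Hypothetical example kernel: writes each thread's global index.
__global__ void fillIndex(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main(void)
{
    int device = 0;
    cudaDeviceProp prop;
    cudaGetDevice(&device);
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        printf("Could not read device properties\n");
        return 1;
    }

    const int n = 1 << 20;
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    // Largest block size the device allows, then enough blocks to cover n
    dim3 block(prop.maxThreadsPerBlock);
    dim3 grid((n + block.x - 1) / block.x);
    if (launchConfigFits(prop, grid, block)) {
        fillIndex<<<grid, block>>>(d_out, n);
        printf("Launch with %u threads per block: %s\n", block.x,
               cudaGetErrorString(cudaGetLastError()));
    }

    // Deliberately exceed the limit: the runtime reports an error
    // from the launch and the kernel never executes
    dim3 tooBig(prop.maxThreadsPerBlock + 1);
    printf("Config with %u threads per block fits: %s\n", tooBig.x,
           launchConfigFits(prop, grid, tooBig) ? "yes" : "no");
    fillIndex<<<grid, tooBig>>>(d_out, n);
    printf("Launch with %u threads per block: %s\n", tooBig.x,
           cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_out);
    return 0;
}
```

Checking the configuration on the host before launching is a design choice rather than a requirement; the runtime performs the same validation, but an explicit check lets you pick a smaller block size instead of discovering the problem after the launch.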

There are many other properties available in the cudaDeviceProp structure; see the CUDA Programming Guide for details of the others. The output of all the preceding code sections combined, executed on an NVIDIA GeForce 940MX GPU with CUDA 9.0, is as follows:

You might ask why you should care about device properties at all. If multiple GPU devices are present, they help you choose the device with the most multiprocessors. If your kernel needs close interaction with the CPU, you might want it to run on an integrated GPU that shares system memory with the CPU. These properties also tell you the maximum number of blocks and threads per block available on your device, which you need when configuring kernel parameters. As one example of using device properties, suppose your application requires double-precision floating-point operations. Not all GPU devices support them. The following code finds a device that supports double-precision floating-point operations and sets it as the device for your application:

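#include <cstdio>       // printf
#include <cstring>      // memset
#include <cuda_runtime.h>

// Main program: select a device that supports double precision
int main(void)
{
    int device;
    cudaDeviceProp device_property;

    // Report the device currently in use
    cudaGetDevice(&device);
    printf("ID of device: %d\n", device);

    // Describe the required properties: compute capability 1.3,
    // the first to support double-precision floating point
    memset(&device_property, 0, sizeof(cudaDeviceProp));
    device_property.major = 1;
    device_property.minor = 3;

    // Ask the runtime for the device that best matches these properties
    cudaChooseDevice(&device, &device_property);
    printf("ID of device which supports double precision is: %d\n", device);

    // Make it the device used by subsequent CUDA calls
    cudaSetDevice(device);
    return 0;
}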

This code uses two fields of the cudaDeviceProp structure, major and minor, which together give the device's compute capability and therefore tell you whether it supports double-precision operations. Double precision is supported on devices of compute capability 1.3 and higher, that is, when major is greater than 1, or when major equals 1 and minor is at least 3. The program therefore fills the device_property structure with these two values. CUDA also provides the cudaChooseDevice API, which searches the devices in the system and returns the one whose properties best match the requested ones. That device is then selected for your application with the cudaSetDevice API. Because cudaChooseDevice returns the best match, which is not necessarily an exact match, you may prefer to loop over all devices yourself when more than one is present, reading each device's properties and checking major and minor explicitly.
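The explicit loop can be sketched as follows, assuming you want to check every device yourself rather than rely on cudaChooseDevice. It also applies the earlier advice of preferring the device with the most multiprocessors and prints whether each device is integrated. The helper name supportsDoublePrecision and the tie-break rule (most multiprocessors wins) are illustrative choices, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: double precision requires compute capability 1.3
// or higher, i.e. major > 1, or major == 1 with minor >= 3.
bool supportsDoublePrecision(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    int best = -1;
    int bestSMs = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;  // skip devices whose properties cannot be read
        printf("Device %d: %s, compute capability %d.%d, "
               "%d multiprocessors, integrated: %d\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, prop.integrated);
        // Among double-capable devices, prefer the most multiprocessors
        if (supportsDoublePrecision(prop.major, prop.minor) &&
            prop.multiProcessorCount > bestSMs) {
            best = dev;
            bestSMs = prop.multiProcessorCount;
        }
    }

    if (best < 0) {
        printf("No device supports double precision\n");
        return 1;
    }
    cudaSetDevice(best);  // subsequent CUDA calls use this device
    printf("Selected device %d\n", best);
    return 0;
}
```

Unlike cudaChooseDevice, this loop never selects a device that fails the requirement; if none qualifies, the program says so instead of falling back to a closest match.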

Though trivial, this section is important: device properties tell you which applications your GPU device can support and which it cannot.
