
Thread-related properties

As seen in earlier sections, blocks and threads can be multidimensional, so it is useful to know how many threads and blocks can be launched in each dimension. There are also limits on the number of threads per multiprocessor and the number of threads per block, which are given by the maxThreadsPerMultiProcessor and maxThreadsPerBlock properties. These limits matter when you configure kernel launch parameters: if you request more threads per block than maxThreadsPerBlock allows, the kernel launch fails with an error instead of running. The maximum number of threads per block in each dimension is given by maxThreadsDim, and the maximum number of blocks per grid in each dimension is given by maxGridSize. Both are arrays of three values, holding the maximum in the x, y, and z dimensions respectively. The following code snippet shows how to print the thread-related properties from CUDA code:

printf(" Maximum number of threads per multiprocessor: %d\n",
       device_Property.maxThreadsPerMultiProcessor);
printf(" Maximum number of threads per block: %d\n",
       device_Property.maxThreadsPerBlock);
printf(" Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
       device_Property.maxThreadsDim[0],
       device_Property.maxThreadsDim[1],
       device_Property.maxThreadsDim[2]);
printf(" Max dimension size of a grid size (x,y,z): (%d, %d, %d)\n",
       device_Property.maxGridSize[0],
       device_Property.maxGridSize[1],
       device_Property.maxGridSize[2]);
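
One way to put these limits to work is to validate a launch configuration on the host before launching a kernel. The sketch below is an illustration and not part of the original program. launchConfigFits is a hypothetical helper, written as a template so that it accepts any structure exposing maxThreadsPerBlock, maxThreadsDim, and maxGridSize, which includes a cudaDeviceProp filled in by cudaGetDeviceProperties. Dim3 and ThreadLimits are stand-ins (assumptions) for CUDA's dim3 and for the relevant cudaDeviceProp fields, so that the check can be tried without the CUDA headers.

```cpp
// Stand-in for CUDA's dim3 (assumption: lets the helper compile as plain C++).
struct Dim3 {
    unsigned int x, y, z;
};

// Stand-in mirroring the three cudaDeviceProp fields used below (assumption:
// only needed when trying the helper without a GPU).
struct ThreadLimits {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
};

// Hypothetical helper: returns true if the block and grid sizes respect the
// limits in `prop`. Props may be cudaDeviceProp or any struct that has the
// same three fields.
template <typename Props>
bool launchConfigFits(const Props& prop, Dim3 block, Dim3 grid)
{
    const unsigned int b[3] = {block.x, block.y, block.z};
    const unsigned int g[3] = {grid.x, grid.y, grid.z};
    unsigned long long threadsPerBlock = 1;

    for (int i = 0; i < 3; ++i) {
        // A dimension of zero is never a valid launch size.
        if (b[i] == 0 || g[i] == 0) return false;
        // Per-dimension block limit (maxThreadsDim).
        if (b[i] > static_cast<unsigned int>(prop.maxThreadsDim[i])) return false;
        // Per-dimension grid limit (maxGridSize).
        if (g[i] > static_cast<unsigned int>(prop.maxGridSize[i])) return false;
        // Accumulate in 64 bits so the product cannot overflow.
        threadsPerBlock *= b[i];
    }

    // Total threads per block limit (maxThreadsPerBlock).
    return threadsPerBlock <=
           static_cast<unsigned long long>(prop.maxThreadsPerBlock);
}
```

With the CUDA runtime, the same helper could be called as launchConfigFits(device_Property, Dim3{256, 1, 1}, Dim3{64, 1, 1}) once device_Property has been filled in. Note that a block can pass every per-dimension check and still be rejected, because the product of its three dimensions must also stay within maxThreadsPerBlock.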

There are many other properties available in the cudaDeviceProp structure. You can check the CUDA programming guide for details of other properties. The output from all preceding code sections combined and executed on the NVIDIA GeForce 940MX GPU and CUDA 9.0 is as follows:

One question you might ask is why you should be interested in knowing the device properties. One answer is that, if multiple GPU devices are present, the properties let you choose the one best suited to your application, for example the device with the most multiprocessors. If the kernel in your application needs close interaction with the CPU, you might want it to run on an integrated GPU that shares system memory with the CPU. The properties also tell you the maximum number of blocks and the maximum number of threads per block available on your device, which helps you configure kernel launch parameters. As one example of using device properties, suppose you have an application that requires double-precision floating-point operations. Not all GPU devices support them. To find a device that supports double-precision floating-point operations and set that device for your application, the following code can be used:

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>
// Main Program
int main(void)
{
    int device;
    cudaDeviceProp device_property;
    // Query the ID of the device that is currently active
    cudaGetDevice(&device);
    printf("ID of device: %d\n", device);
    // Zero the structure, then request compute capability 1.3
    memset(&device_property, 0, sizeof(cudaDeviceProp));
    device_property.major = 1;
    device_property.minor = 3;
    // Get the ID of the device that best matches the requested properties
    cudaChooseDevice(&device, &device_property);
    printf("ID of device which supports double precision is: %d\n", device);
    cudaSetDevice(device);
    return 0;
}

This code uses two properties of the cudaDeviceProp structure, major and minor, which together give the compute capability of the device. The CUDA documentation tells us that a device supports double-precision operations if its compute capability is 1.3 or higher, that is, if major is greater than 1, or if major is 1 and minor is at least 3. So, the program's device_property structure is filled with these two values. CUDA also provides the cudaChooseDevice API, which returns the ID of the device whose properties most closely match the ones requested. That device is then selected for your application using the cudaSetDevice API. Because cudaChooseDevice returns the closest match rather than a guaranteed one, if more than one device is present in the system you may prefer to iterate over all devices in a for loop and check major and minor for each of them.
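
The compute capability test described above can also be written out explicitly for the case where several devices are checked in a loop. The following is a minimal sketch under stated assumptions: supportsDoublePrecision and pickDoublePrecisionDevice are hypothetical helper names, and DeviceInfo is a stand-in holding the major, minor, and multiProcessorCount values that cudaGetDeviceProperties would report for each device index. Among the devices of compute capability 1.3 or higher, it picks the one with the most multiprocessors.

```cpp
#include <vector>

// Stand-in for the cudaDeviceProp fields used here (assumption: in a real
// program these values would be copied from cudaGetDeviceProperties).
struct DeviceInfo {
    int major;                // compute capability, major number
    int minor;                // compute capability, minor number
    int multiProcessorCount;  // number of multiprocessors on the device
};

// True if the compute capability is 1.3 or higher.
bool supportsDoublePrecision(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

// Returns the index of the double-precision-capable device with the most
// multiprocessors, or -1 if no device qualifies. On a tie, the device with
// the lower index is kept.
int pickDoublePrecisionDevice(const std::vector<DeviceInfo>& devices)
{
    int best = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        const DeviceInfo& d = devices[i];
        if (!supportsDoublePrecision(d.major, d.minor)) {
            continue;  // skip devices below compute capability 1.3
        }
        if (best < 0 ||
            d.multiProcessorCount > devices[best].multiProcessorCount) {
            best = i;
        }
    }
    return best;
}
```

In a CUDA program, the vector could be filled by calling cudaGetDeviceCount and then cudaGetDeviceProperties for each index, and a non-negative result passed to cudaSetDevice. Returning -1 makes the "no suitable device" case explicit, which cudaChooseDevice alone does not report.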

Though simple, the checks in this section are important for finding out which applications your GPU device can support and which it cannot.
