
Global memory

All blocks have read and write access to global memory. This memory is slow, but it can be accessed from anywhere in your device code. Caching is used to speed up access to global memory. All memory allocated using cudaMalloc is global memory. The following simple example demonstrates how you can use global memory from your program:

#include <stdio.h>
#include <cuda_runtime.h>

#define N 5

// Kernel: each thread writes its own thread ID into global memory
__global__ void gpu_global_memory(int *d_a)
{
    d_a[threadIdx.x] = threadIdx.x;
}

int main(int argc, char **argv)
{
    int h_a[N];
    int *d_a;

    // Allocate global memory on the device
    cudaMalloc((void **)&d_a, sizeof(int) * N);
    // Copy the host array to device global memory
    cudaMemcpy((void *)d_a, (void *)h_a, sizeof(int) * N, cudaMemcpyHostToDevice);

    // Launch one block of N threads
    gpu_global_memory <<<1, N>>> (d_a);

    // Copy the results back to host memory for printing
    cudaMemcpy((void *)h_a, (void *)d_a, sizeof(int) * N, cudaMemcpyDeviceToHost);

    printf("Array in Global Memory is: \n");
    for (int i = 0; i < N; i++)
    {
        printf("At Index: %d --> %d \n", i, h_a[i]);
    }

    // Free device global memory
    cudaFree(d_a);
    return 0;
}

This code demonstrates how you can write to global memory from your device code. The memory is allocated using cudaMalloc from the host code, and a pointer to this array is passed as a parameter to the kernel function. The kernel function populates this memory chunk with the values of the thread IDs. The array is then copied back to host memory for printing. The result is shown as follows:
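Since each thread writes its own thread ID at its own index, the program should print the following for N = 5:

Array in Global Memory is: 
At Index: 0 --> 0 
At Index: 1 --> 1 
At Index: 2 --> 2 
At Index: 3 --> 3 
At Index: 4 --> 4 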

As we are using global memory, this operation will be slow. There are advanced concepts to speed up this operation, which will be explained later on. In the next section, we will explain local memory and registers, which are unique to each thread.
