- Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA
- Bhaumik Vaidya
Global memory
All blocks have read and write access to global memory. This memory is slow, but it can be accessed from anywhere in your device code. Caching is used to speed up accesses to global memory. All memory allocated using cudaMalloc is global memory. The following simple example demonstrates how you can use global memory from your program:
#include <stdio.h>

#define N 5

// Kernel: each thread writes its own thread ID into global memory
__global__ void gpu_global_memory(int *d_a)
{
    d_a[threadIdx.x] = threadIdx.x;
}

int main(int argc, char **argv)
{
    int h_a[N];
    int *d_a;

    // Allocate N integers in device global memory
    cudaMalloc((void **)&d_a, sizeof(int) * N);

    // Launch one block of N threads
    gpu_global_memory<<<1, N>>>(d_a);

    // Copy the results back to host memory for printing
    cudaMemcpy((void *)h_a, (void *)d_a, sizeof(int) * N, cudaMemcpyDeviceToHost);

    printf("Array in Global Memory is: \n");
    for (int i = 0; i < N; i++)
    {
        printf("At Index: %d --> %d \n", i, h_a[i]);
    }

    cudaFree(d_a);
    return 0;
}
This code demonstrates how you can write to global memory from your device code. The memory is allocated using cudaMalloc from the host code, and a pointer to this array is passed as a parameter to the kernel function. The kernel function populates this memory chunk with the thread ID values, which are then copied back to host memory for printing. Since each thread writes its own index, the program prints:

Array in Global Memory is: 
At Index: 0 --> 0 
At Index: 1 --> 1 
At Index: 2 --> 2 
At Index: 3 --> 3 
At Index: 4 --> 4 

As we are using global memory, this operation will be slow. There are advanced concepts to speed up this operation, which will be explained later on. In the next section, we will explain local memory and registers, which are private to each thread.
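In practice, each of the CUDA runtime calls above returns an error code that the example silently discards. As a hedged sketch (not code from the book), the same program can be wrapped with error checking; the CHECK macro and its name are my own illustration, built only on the standard cudaError_t and cudaGetErrorString runtime API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 5

// Hypothetical helper (not from the book): abort with a readable
// message when a CUDA runtime call does not return cudaSuccess.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void gpu_global_memory(int *d_a)
{
    d_a[threadIdx.x] = threadIdx.x;
}

int main(void)
{
    int h_a[N];
    int *d_a;

    CHECK(cudaMalloc((void **)&d_a, sizeof(int) * N));
    gpu_global_memory<<<1, N>>>(d_a);
    CHECK(cudaGetLastError());  // catch launch-configuration errors
    // cudaMemcpy synchronizes, so it also surfaces kernel execution errors
    CHECK(cudaMemcpy(h_a, d_a, sizeof(int) * N, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_a));

    printf("Array in Global Memory is: \n");
    for (int i = 0; i < N; i++)
    {
        printf("At Index: %d --> %d \n", i, h_a[i]);
    }
    return 0;
}
```

A kernel launch itself returns no error code, which is why cudaGetLastError is queried immediately after the launch; the subsequent blocking cudaMemcpy then reports any error that occurred while the kernel ran.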