Wednesday, October 21, 2009

What is CUDA (GPU Programming) ?

The graphics cards we use for gaming and visual enhancement have two basic components: a Graphics Processing Unit (GPU) and off-chip DRAM. GPUs are designed for compute-intensive jobs, where CPUs are too slow. CPUs, on the other hand, are designed for control flow and data caching, where GPUs perform poorly.

GPUs in general have a highly parallel architecture; some of NVIDIA's GPUs have 240 cores per processor (compare this with modern CPUs: 2, 4 or 8 cores). With such a parallel architecture, GPUs provide an excellent computational platform, not only for graphical applications but for any application with significant data parallelism. The GPU is thus not limited to use as a graphics engine; it is a parallel computing architecture capable of performing floating-point operations at rates approaching a teraflop (a trillion operations per second).

People have long realized the potential of GPUs for computationally heavy tasks, and general-purpose computation on GPUs (GPGPU) has been an active area for years. However, life before NVIDIA's Compute Unified Device Architecture (CUDA) was extremely difficult for the programmer, who had to express computations through graphics APIs such as OpenGL or Direct3D, which also made learning slow. CUDA solved these problems by providing a hardware abstraction that hides the inner details of the GPU, freeing the programmer from the burden of learning graphics programming.

CUDA is the C language with a few extensions for processing on GPUs. The user writes C code, and the compiler splits it into two portions: one is delivered to the CPU (which is best for sequential, control-heavy tasks), while the other, involving the extensive calculations, is delivered to the GPU(s), which execute it in parallel. Because C is a familiar programming language, CUDA has a gentle learning curve, and it is becoming a favorite tool for accelerating various applications. NVIDIA's CUDA SDK is being employed in a plethora of fields, from computational finance to neural networks and fuzzy logic to simulations for nanotechnology.
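To make the CPU/GPU split concrete, here is a minimal sketch of a complete CUDA program that adds two vectors (the kernel name vecAdd and the 256-thread block size are illustrative choices, not anything mandated by CUDA). Everything in main runs on the CPU; the __global__ function and the triple-bracket launch are the C extensions that hand work to the GPU.

```
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* __global__ marks a function that runs on the GPU (the "device")
   but is launched from ordinary CPU (the "host") code. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* unique index per thread */
    if (i < n)
        c[i] = a[i] + b[i];                         /* one element per thread */
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);   /* allocate in the GPU's off-chip DRAM */
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    /* The <<<blocks, threads>>> launch syntax is one of CUDA's extensions
       to C: here, n/256 blocks of 256 parallel threads each. */
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);      /* prints 3.0 */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Note how the CPU code only moves data and launches the kernel; the million additions all happen on the GPU, one per thread.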

CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:

· Scattered reads – code can read from arbitrary addresses in memory.

· It is high level – basically an extension to the C language, so it is much
easier to learn than traditional GPGPU programming.

· Shared memory – CUDA exposes a fast shared-memory region (16 KB per
multiprocessor) that can be shared amongst the threads of a block. This can
be used as a user-managed cache, enabling higher bandwidth than is possible
using texture lookups (a short sketch follows this list).

· Faster downloads and readbacks to and from the GPU

· Full support for integer and bitwise operations
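As promised above, here is a minimal sketch of shared memory used as a user-managed cache (the kernel name blockSum and the 256-thread block size are illustrative assumptions): each block stages its slice of the input in on-chip shared memory and cooperatively reduces it to one partial sum, rather than having every thread re-read slow off-chip DRAM.

```
#include <cuda_runtime.h>

/* Each 256-thread block sums its 256-element slice of 'in' using the
   fast on-chip shared memory as a scratchpad, writing one partial sum
   per block to 'out'. */
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float cache[256];        /* lives in on-chip shared memory */

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                    /* wait for all writes to finish */

    /* Tree-style reduction: threads cooperate through the shared array
       instead of each going back to global (off-chip) memory. */
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];     /* one result per block */
}
```

A host would launch this as blockSum<<<numBlocks, 256>>>(d_in, d_out, n) and then sum the per-block results; the point of the sketch is that the inner loop touches only the 16 KB on-chip region.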

In short, CUDA lets you exploit these tiny supercomputers, i.e. the GPUs that ship with your graphics card, and accelerate your applications significantly, sometimes by a factor of 100 or more, depending on how smartly you exploit the GPU's resources.