NPTEL: NOC: GPU Architectures and Programming (Computer Science and Engineering)

Coordinator: Prof. Soumyajit Dey


Lecture 1 - Review of basic COA w.r.t. performance

Lecture 2 - Review of basic COA w.r.t. performance

Lecture 3 - Review of basic COA w.r.t. performance

Lecture 4 - Review of basic COA w.r.t. performance

Lecture 5 - Intro to GPU architectures

Lecture 6 - Intro to GPU architectures

Lecture 7 - Intro to GPU architectures

Lecture 8 - Intro to GPU architectures

Lecture 9 - Intro to CUDA programming

Lecture 10 - Intro to CUDA programming (Continued...)

Lecture 11 - Intro to CUDA programming (Continued...)

Lecture 12 - Intro to CUDA programming (Continued...)

Lecture 13 - Multi-dimensional mapping of dataspace; Synchronization

Lecture 14 - Multi-dimensional mapping of dataspace; Synchronization (Continued...)

Lecture 15 - Multi-dimensional mapping of dataspace; Synchronization (Continued...)

Lecture 16 - Warp Scheduling and Divergence

Lecture 17 - Warp Scheduling and Divergence (Continued...)

Lecture 18 - Warp Scheduling and Divergence (Continued...)

Lecture 19 - Memory Access Coalescing

Lecture 20 - Memory Access Coalescing (Continued...)

Lecture 21 - Memory Access Coalescing (Continued...)

Lecture 22 - Memory Access Coalescing (Continued...)

Lecture 23 - Memory Access Coalescing (Continued...)

Lecture 24 - Memory Access Coalescing (Continued...)

Lecture 25 - Memory Access Coalescing (Continued...)

Lecture 26 - Memory Access Coalescing (Continued...)

Lecture 27 - Memory Access Coalescing (Continued...)

Lecture 28 - Optimizing Reduction Kernels

Lecture 29 - Optimizing Reduction Kernels (Continued...)

Lecture 30 - Optimizing Reduction Kernels (Continued...)

Lecture 31 - Optimizing Reduction Kernels (Continued...)

Lecture 32 - Optimizing Reduction Kernels (Continued...)

Lecture 33 - Optimizing Reduction Kernels (Continued...)

Lecture 34 - Optimizing Reduction Kernels (Continued...)

Lecture 35 - Kernel Fusion, Thread and Block Coarsening

Lecture 36 - Kernel Fusion, Thread and Block Coarsening (Continued...)

Lecture 37 - Kernel Fusion, Thread and Block Coarsening (Continued...)

Lecture 38 - Kernel Fusion, Thread and Block Coarsening (Continued...)

Lecture 39 - Kernel Fusion, Thread and Block Coarsening (Continued...)

Lecture 40 - Kernel Fusion, Thread and Block Coarsening (Continued...)

Lecture 41 - OpenCL - Runtime System

Lecture 42 - OpenCL - Runtime System (Continued...)

Lecture 43 - OpenCL - Runtime System (Continued...)

Lecture 44 - OpenCL - Runtime System (Continued...)

Lecture 45 - OpenCL - Runtime System (Continued...)

Lecture 46 - OpenCL - Runtime System (Continued...)

Lecture 47 - OpenCL - Runtime System (Continued...)

Lecture 48 - OpenCL - Heterogeneous Computing

Lecture 49 - OpenCL - Heterogeneous Computing (Continued...)

Lecture 50 - OpenCL - Heterogeneous Computing (Continued...)

Lecture 51 - OpenCL - Heterogeneous Computing (Continued...)

Lecture 52 - OpenCL - Heterogeneous Computing (Continued...)

Lecture 53 - OpenCL - Heterogeneous Computing (Continued...)

Lecture 54 - Efficient Neural Network Training/Inferencing

Lecture 55 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 56 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 57 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 58 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 59 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 60 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 61 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 62 - Efficient Neural Network Training/Inferencing (Continued...)

Lecture 63 - Efficient Neural Network Training/Inferencing (Continued...)