NPTEL NOC: Introduction to Parallel Programming in OpenMP (Computer Science and Engineering)

Co-ordinator: Dr. Yogish Sabharwal


Lecture 1 - Introduction to Parallel Programming

Lecture 2 - Parallel Architectures and Programming Models

Lecture 3 - Pipelining

Lecture 4 - Superpipelining and VLIW

Lecture 5 - Memory Latency

Lecture 6 - Cache and Temporal Locality

Lecture 7 - Cache, Memory bandwidth and Spatial Locality

Lecture 8 - Intuition for Shared and Distributed Memory architectures

Lecture 9 - Shared and Distributed Memory architectures

Lecture 10 - Interconnection networks in Distributed Memory architectures

Lecture 11 - OpenMP: A parallel Hello World Program

Lecture 12 - Program with Single thread

Lecture 13 - Program Memory with Multiple threads and Multi-tasking

Lecture 14 - Context Switching

Lecture 15 - OpenMP: Basic thread functions

Lecture 16 - OpenMP: About OpenMP

Lecture 17 - Shared Memory Consistency Models and the Sequential Consistency Model

Lecture 18 - Race Conditions

Lecture 19 - OpenMP: Scoping variables and some race conditions

Lecture 20 - OpenMP: thread private variables and more constructs

Lecture 21 - Computing sum: first attempt at parallelization

Lecture 22 - Manual distribution of work and critical sections

Lecture 23 - Distributing for loops and reduction

Lecture 24 - Vector-Vector operations (Dot product)

Lecture 25 - Matrix-Vector operations (Matrix-Vector Multiply)

Lecture 26 - Matrix-Matrix operations (Matrix-Matrix Multiply)

Lecture 27 - Introduction to tasks

Lecture 28 - Task queues and task execution

Lecture 29 - Accessing variables in tasks

Lecture 30 - Completion of tasks and scoping variables in tasks

Lecture 31 - Recursive task spawning and pitfalls

Lecture 32 - Understanding LU Factorization

Lecture 33 - Parallel LU Factorization

Lecture 34 - Locks

Lecture 35 - Advanced Task handling

Lecture 36 - Matrix Multiplication using tasks

Lecture 37 - The OpenMP Shared Memory Consistency Model

Lecture 38 - Applications: finite element method

Lecture 39 - Applications: deep learning

Lecture 40 - Introduction to MPI and basic calls

Lecture 41 - MPI calls to send and receive data

Lecture 42 - MPI calls for broadcasting data

Lecture 43 - MPI non-blocking calls

Lecture 44 - Application: distributed histogram update

Lecture 45 - MPI collectives and MPI broadcast

Lecture 46 - MPI gathering and scattering collectives

Lecture 47 - MPI reduction and Alltoall collectives

Lecture 48 - Discussion on MPI collectives design

Lecture 49 - Characterization of interconnects

Lecture 50 - Linear arrays, 2D mesh, and torus

Lecture 51 - d-dimensional torus

Lecture 52 - Hypercube

Lecture 53 - Trees and cliques

Lecture 54 - Hockney model

Lecture 55 - Broadcast and Reduce with recursive doubling

Lecture 56 - Scatter and Gather with recursive doubling

Lecture 57 - Reduce-scatter and Allgather with recursive doubling

Lecture 58 - Discussion of message sizes in analysis

Lecture 59 - Revisiting Reduce-scatter on 2D mesh

Lecture 60 - Reduce-scatter and Allreduce on the Hypercube

Lecture 61 - Alltoall on the Hypercube

Lecture 62 - Lower bounds

Lecture 63 - Pipeline based algorithm for Allreduce

Lecture 64 - An improved algorithm for Alltoall on the Hypercube using E-cube routing

Lecture 65 - Pipeline based algorithm for Broadcast

Lecture 66 - Introduction to parallel graph algorithms

Lecture 67 - Breadth-First Search (BFS) using matrix algebra

Lecture 68 - BFS Shared memory parallelization using OpenMP

Lecture 69 - Distributed memory settings and data distribution

Lecture 70 - Distributed BFS algorithm

Lecture 71 - Performance considerations

Lecture 72 - Prim's Algorithm

Lecture 73 - OpenMP based shared memory parallelization for MST

Lecture 74 - MPI based distributed memory parallelization for MST

Lecture 75 - Sequential Algorithm Adaptation from Prim's

Lecture 76 - Parallelization Strategy for Prim's algorithm

Lecture 77 - Dry run with the parallel strategy

Lecture 78 - Johnson's algorithm with 1D data distribution

Lecture 79 - Speedup analysis on a grid graph

Lecture 80 - Floyd's algorithm for all-pairs shortest paths

Lecture 81 - Floyd's algorithm with 2D data distribution

Lecture 82 - Adaptation to transitive closures

Lecture 83 - Parallelization strategy for connected components

Lecture 84 - Analysis for parallel connected components