OpenMP is a set of C/C++ pragmas (or FORTRAN equivalents) which provide the programmer with a high-level front-end interface that gets translated into calls to threads (or other similar entities). The key phrase here is "high-level"; the goal is to better enable the programmer to "think parallel," relieving him or her of the burden and distraction of setting up and coordinating threads. For example, a single OpenMP directive can mark a loop for parallel execution, as sketched below.
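A minimal sketch, assuming an array a of length N (both are placeholders, not part of the original text):
// One directive is enough to ask the compiler to run the iterations on multiple threads.
#pragma omp parallel for
for (int i = 0; i < N; i++)
{
    a[i] = 2 * a[i];   // each iteration is independent, so the threads can share the work
}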
OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them. The threads then run in parallel, with the runtime environment allocating threads to different processors. The section of code that is meant to run in parallel is marked accordingly, with a compiler directive that will cause the threads to form before the section is executed.
Each thread has an id attached to it which can be obtained using a function (called omp_get_thread_num()). The thread id is an integer, and the master thread has an id of 0. After the execution of the parallelized code, the threads join back into the master thread, which continues onward to the end of the program.
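A small sketch of this fork-join behaviour (the number and order of the lines printed depends on the machine and on settings such as OMP_NUM_THREADS):
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        // Each thread prints its own id; the master thread has id 0.
        printf("Hello from thread %d\n", omp_get_thread_num());
    }   // implicit join: all threads merge back into the master thread here
    printf("Back in the master thread only.\n");
    return 0;
}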
By default, each thread executes the parallelized section of code independently. Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
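For instance, the omp for work-sharing construct splits the iterations of a loop across the team instead of every thread repeating the whole loop; a minimal sketch, again assuming an array a of length N:
#pragma omp parallel
{
    // Without a work-sharing construct, every thread would execute the full loop.
    // The omp for construct divides the iterations among the threads of the team
    // (data parallelism); constructs such as omp sections express task parallelism.
    #pragma omp for
    for (int i = 0; i < N; i++)
    {
        a[i] = a[i] + 1;
    }
}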
Core elements
The core elements of OpenMP are the constructs for thread creation, workload distribution (work sharing), data-environment management, thread synchronization, user-level runtime routines and environment variables.
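As a small illustration of the last two items, the runtime routines omp_get_max_threads(), omp_set_num_threads() and omp_get_num_threads() query or set the team size, and the environment variable OMP_NUM_THREADS provides the default; a sketch:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    // Query and change the number of threads via user-level runtime routines.
    printf("Default maximum threads: %d\n", omp_get_max_threads());
    omp_set_num_threads(4);          // request 4 threads for subsequent parallel regions

    #pragma omp parallel
    {
        #pragma omp single           // only one thread of the team executes this
        printf("Team size: %d\n", omp_get_num_threads());
    }
    return 0;
}
Running the program as OMP_NUM_THREADS=8 ./a.out changes the default reported on the first line.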
Thread creation
The pragma omp parallel is used to fork additional threads to carry out the work enclosed in the construct in parallel. The original thread will be denoted as master thread with thread ID 0.
OpenMP & Reduction
A reduction simplifies something complex into something which is understandable and uncomplicated. Therefore, we can argue that the following examples are reductions:
sum = a[0] + a[1] + ... + a[9]
product = a[0] * a[1] * ... * a[9]
They reduce a long expression into a simpler one.
Loops
In C++, we can write the previous examples with for loops:
sum = 0; for (auto i = 0; i < 10; i++) { sum += a[i]; }
and
product = 1; for (auto i = 0; i < 10; i++) { product *= a[i]; }
OpenMP and reduction
OpenMP specializes in the parallelization of for loops, but parallelizing the previous for loops is tricky. There is a shared variable (sum / product) which is modified in every iteration. If we are not careful when parallelizing such for loops, we might introduce data races.
For example, if we parallelized the previous for loops by simply adding
#pragma omp parallel for
we would introduce a data race, because multiple threads could try to update the shared variable at the same time.
But for loops which represent a reduction are quite common. Therefore, OpenMP has a special reduction clause which can express the reduction of a for loop.
In order to specify the reduction in OpenMP, we must provide
an operation (+, *, or another supported operator)
and a reduction variable (sum / product). This variable holds the result of the computation.
For example, the following for loop
sum = 0;
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
can be parallelized with the reduction clause where the operation is equal to + and the reduction variable is equal to sum:
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
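The same clause works with other operators. For instance, the earlier product loop can be parallelized with a multiplicative reduction (a sketch, assuming the same array a):
product = 1;
#pragma omp parallel for shared(a) reduction(*: product)
for (auto i = 0; i < 10; i++)
{
    product *= a[i];   // each thread multiplies into its own local copy, initialized to 1
}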
Implementation with Reduction Clause
How does OpenMP parallelize a for loop declared with a reduction clause? OpenMP creates a team of threads and then shares the iterations of the for loop between the threads. Each thread has its own local copy of the reduction variable. The thread modifies only the local copy of this variable. Therefore, there is no data race. When the threads join together, all the local copies of the reduction variable are combined into the global shared variable.
For example, let us parallelize the following for loop
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 9; i++)
{
    sum += a[i];
}
and let there be three threads in the team of threads. Each thread has sumloc, which is a local copy of the reduction variable. The threads then perform the following computations
Thread 1
sumloc_1 = a[0] + a[1] + a[2]
Thread 2
sumloc_2 = a[3] + a[4] + a[5]
Thread 3
sumloc_3 = a[6] + a[7] + a[8]
In the end, when the threads join together, OpenMP reduces the local copies into the shared reduction variable:
sum = sumloc_1 + sumloc_2 + sumloc_3
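To make this concrete, here is a hedged sketch of roughly what the reduction clause arranges, written out by hand with a per-thread local copy and a critical section for the final combine (an actual OpenMP implementation may combine the copies differently):
sum = 0;
#pragma omp parallel shared(sum, a)
{
    auto sumloc = 0;                 // each thread's private copy, initialized like the + reduction
    #pragma omp for
    for (auto i = 0; i < 9; i++)
    {
        sumloc += a[i];              // no data race: only the local copy is modified
    }
    #pragma omp critical
    sum += sumloc;                   // combine the local copies into the shared variable one thread at a time
}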
Example (C program): Display "Hello, world." using multiple threads.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    printf("Hello, world.\n");
    return 0;
}
Use the flag -fopenmp to compile using GCC:
$ gcc -fopenmp hello.c -o hello
Output on a computer with two cores, and thus two threads:
Hello, world.
Hello, world.
Fig: An illustration of multithreading, where the master thread forks off a number of threads which execute blocks of code in parallel.