Implementation of parallel reduction using Min, Max, Sum Operations.
Experiment No: HPC-01
TITLE: Implement parallel reduction using Min, Max, Sum operations.
Problem Statement: To study and implement a directive-based parallel programming model.
Requirements (HW/SW): PC, gcc, OpenMP, Ubuntu system, C/C++.
Theory:
OpenMP:
OpenMP is a set of C/C++ pragmas (or FORTRAN equivalents) that provide the programmer with a high-level front-end interface, which gets translated into calls to threads (or other similar entities). The key phrase here is "high-level"; the goal is to better enable the programmer to "think parallel," relieving him or her of the burden and distraction of setting up and coordinating threads. For example, the OpenMP directive #pragma omp parallel marks a block of code to be executed by a team of threads.
OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them. The threads then run in parallel, with the runtime environment allocating threads to different processors. The section of code that is meant to run in parallel is marked with a compiler directive that causes the threads to be formed before the section is executed.
Each thread has an ID attached to it which can be obtained with the function omp_get_thread_num(). The thread ID is an integer, and the master thread has an ID of 0. After the execution of the parallelized code, the threads join back into the master thread, which continues onward to the end of the program.
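For illustration, a minimal sketch that prints each thread's ID (compile with g++ -fopenmp; the number of threads and the output order depend on the system):
#include <cstdio>
#include <omp.h>
int main()
{
    // Every thread in the team prints its own ID; the master thread prints 0.
    #pragma omp parallel
    printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}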
By default, each thread executes the parallelized section of code independently. Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
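For illustration, a minimal sketch of the for work-sharing construct, which divides the loop iterations among the threads of the team (the array and loop bound are illustrative):
#include <cstdio>
#include <omp.h>
int main()
{
    int a[8];
    // The iterations are split among the threads; each thread
    // executes its allocated part of the loop.
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        a[i] = i * i;
    for (int i = 0; i < 8; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}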
Core elements
The core elements of OpenMP are the constructs for thread creation, workload distribution (work sharing), data-environment management, thread synchronization, user-level runtime routines, and environment variables. The pragma omp parallel is used to fork additional threads to carry out the work enclosed in the construct in parallel. The original thread is denoted the master thread, with thread ID 0.
OpenMP & Reduction
A reduction simplifies something complex into something which is understandable and uncomplicated. Therefore, we can argue that the following example is a reduction:
sum = a[0] + a[1] + ... + a[n-1]
It simplifies the long expression into something shorter: sum. A similar argument shows that the following example is also a reduction:
product = a[0] * a[1] * ... * a[n-1]
It reduces the long expression into the simpler one.
Loops
In C++, we can write the previous examples with for loops, as sketched below.
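A minimal sketch of the two loops (the array a and its length n are assumed to be defined elsewhere):
// Serial reduction loops: each iteration folds one more element
// into the running result.
int sum = 0;
for (int i = 0; i < n; i++)
{
    sum += a[i];
}
int product = 1;
for (int i = 0; i < n; i++)
{
    product *= a[i];
}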
OpenMP and reduction
OpenMP is specialized in the parallelization of for loops. But a parallelization of the previous for loops is tricky. There is a shared variable (sum / product / reduction) which is modified in every iteration. If we are not careful when parallelizing such for loops, we might introduce data races.
For example, if we parallelized the previous for loops by simply adding #pragma omp parallel for, we would introduce a data race, because multiple threads could try to update the shared variable at the same time.
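A minimal sketch of the incorrect parallelization (sum += a[i] is a read-modify-write on a shared variable, so updates from different threads can be lost):
int sum = 0;
// WRONG: data race on the shared variable sum
#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    sum += a[i];
}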
But for loops which represent a reduction are quite common. Therefore, OpenMP has the special reduction clause which can express the reduction of a for loop.
In order to specify the reduction in OpenMP, we must provide:
- an operation (+ / * / ∘), and
- a reduction variable (sum / product / reduction). This variable holds the result of the computation.
For example, the following for loop
sum = 0;
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
can be parallelized with the reduction clause, where the operation is equal to + and the reduction variable is equal to sum:
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
Implementation with Reduction Clause
How does OpenMP parallelize a for loop declared with a reduction clause? OpenMP creates a team of threads and then shares the iterations of the for loop between the threads. Each thread has its own local copy of the reduction variable. The thread modifies only its local copy of this variable. Therefore, there is no data race. When the threads join together, all the local copies of the reduction variable are combined into the global shared variable.
For example, let us parallelize the following for loop
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 9; i++)
{
    sum += a[i];
}
and let there be three threads in the team. Each thread has sumloc, which is a local copy of the reduction variable. The threads then perform the following computations:
- Thread 1: sumloc_1 = a[0] + a[1] + a[2]
- Thread 2: sumloc_2 = a[3] + a[4] + a[5]
- Thread 3: sumloc_3 = a[6] + a[7] + a[8]
In the end, when the threads join together, OpenMP reduces the local copies to the shared reduction variable:
sum = sumloc_1 + sumloc_2 + sumloc_3
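For illustration, a minimal hand-written equivalent of what the reduction clause does behind the scenes: each thread accumulates into a private sumloc, and a critical section combines the local copies into the shared variable (a sketch of the mechanism, not the actual OpenMP implementation):
int sum = 0;
#pragma omp parallel shared(sum, a)
{
    int sumloc = 0;          // each thread's local copy of the reduction variable
    #pragma omp for
    for (int i = 0; i < 9; i++)
    {
        sumloc += a[i];      // no data race: sumloc is private to the thread
    }
    #pragma omp critical
    sum += sumloc;           // combine the local copies into the shared variable
}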
Example (C program): Display "Hello, world." using multiple threads.
#include <stdio.h>
#include <omp.h>
int main(void)
{
    #pragma omp parallel
    printf("Hello, world.\n");
    return 0;
}
Use the flag -fopenmp to compile using GCC:
$ gcc -fopenmp hello.c -o hello
Output on a computer with two cores, and thus two threads:
Hello, world.
Hello, world.
Fig: Multithreading, where the master thread forks off a number of threads which execute blocks of code in parallel.
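For completeness, a minimal sketch of a program for this experiment that computes the Min, Max and Sum of an array with reduction clauses (assuming OpenMP 3.1 or newer, which added the min and max reduction operators for C/C++; the input values are illustrative; compile with g++ -fopenmp):
#include <cstdio>
#include <omp.h>
int main()
{
    int a[] = {4, 1, 7, 3, 9, 2, 8, 5, 6};   // illustrative input
    const int n = 9;
    int sum = 0;
    int minval = a[0];
    int maxval = a[0];
    // Each thread keeps private copies of sum, minval and maxval;
    // OpenMP combines them with +, min and max when the threads join.
    #pragma omp parallel for reduction(+: sum) reduction(min: minval) reduction(max: maxval)
    for (int i = 0; i < n; i++)
    {
        sum += a[i];
        if (a[i] < minval) minval = a[i];
        if (a[i] > maxval) maxval = a[i];
    }
    printf("Sum = %d\nMin = %d\nMax = %d\n", sum, minval, maxval);
    return 0;
}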
Input :
Output :
Conclusion: We have implemented parallel reduction using Min, Max, and Sum operations.