
Implementation of parallel reduction using Min, Max, Sum Operations.



Experiment No: HPC-01
TITLE: - Implement parallel reduction using Min, Max, Sum Operations.


Problem Statement: - To study and implement a directive-based parallel programming model.
Objective:
  • To study OpenMP as a parallel programming model.
Requirements (Hw/Sw): PC, gcc, OpenMP, Ubuntu system, C/C++.
Theory:-
OpenMP:
OpenMP is a set of C/C++ pragmas (with FORTRAN equivalents) which provide the programmer with a high-level front-end interface that gets translated into calls to threads (or other similar entities). The key phrase here is "high-level": the goal is to better enable the programmer to "think parallel," relieving him/her of the burden and distraction of setting up and coordinating threads.
OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them. The threads then run in parallel, with the runtime environment allocating threads to different processors.
The section of code that is meant to run in parallel is marked accordingly, with a compiler directive that will cause the threads to form before the section is executed.
Each thread has an id attached to it which can be obtained using a function (called omp_get_thread_num()). The thread id is an integer, and the master thread has an id of 0. After the execution of the parallelized code, the threads join back into the master thread, which continues onward to the end of the program.
By default, each thread executes the parallelized section of code independently. Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
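As a brief illustration of the fork-join behaviour and thread ids described above, consider the following minimal sketch; each thread prints its id, and the thread reporting id 0 is the master thread:

#include <stdio.h>
#include <omp.h>
int main(void)
{
    /* fork a team of threads; each thread prints its own id */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* id of the calling thread */
        printf("Hello from thread %d\n", id);
        if (id == 0)
            printf("Master thread, total threads = %d\n", omp_get_num_threads());
    }
    /* implicit join: only the master thread continues from here */
    return 0;
}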

Core elements
The core elements of OpenMP are the constructs for thread creation, workload distribution (work sharing), data-environment management, thread synchronization, user-level runtime routines and environment variables.
Thread creation
The pragma omp parallel is used to fork additional threads to carry out the work enclosed in the construct in parallel. The original thread is denoted the master thread and has thread ID 0.
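Combining thread creation with work sharing, the sketch below (array contents are chosen arbitrarily for illustration) forks a team of threads and divides the loop iterations among them:

#include <stdio.h>
#include <omp.h>
int main(void)
{
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* illustrative data */
    /* fork threads and share the loop iterations among them */
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
    {
        a[i] *= 2;   /* each iteration touches a distinct element, so no data race */
        printf("iteration %d handled by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}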


  • OpenMP & Reduction



A reduction combines many values into a single value. Therefore, we can argue that the following example is a reduction
sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9]
It simplifies the long expression
a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9]
into something shorter – sum. A similar argument shows that the following example is also a reduction
product = a[0] * a[1] * a[2] * a[3] * a[4] * a[5] * a[6] * a[7] * a[8] * a[9]
It reduces the long expression into the simpler one.

Loops

In C++, we can write the previous examples with for loops:
sum = 0;
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
and
product = 1;
for (auto i = 0; i < 10; i++)
{
    product *= a[i];
}
OpenMP and reduction
OpenMP specializes in the parallelization of for loops. But parallelizing the previous for loops is tricky: there is a shared variable (sum or product) which is modified in every iteration. If we are not careful when parallelizing such for loops, we might introduce data races.
For example, if we parallelized the previous for loops by simply adding
#pragma omp parallel for 
we would introduce a data race, because multiple threads could try to update the shared variable at the same time.
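For illustration, a sketch of such an incorrect parallelization is shown below (array contents are arbitrary); because all threads update the shared variable sum without synchronization, the final value is unpredictable:

#include <stdio.h>
#include <omp.h>
int main(void)
{
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};   /* illustrative data */
    int sum = 0;
    /* WRONG: sum is shared and updated by every thread without synchronization,
       so updates can be lost (a data race) */
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        sum += a[i];
    }
    printf("sum = %d (may differ from the correct value 55)\n", sum);
    return 0;
}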
But for loops which represent a reduction are quite common. Therefore, OpenMP has the special reduction clause which can express the reduction of a for loop.
In order to specify the reduction in OpenMP, we must provide
  • an operation (e.g. +, *, min, max)
  • and a reduction variable (e.g. sum or product). This variable holds the result of the computation.
For example, the following for loop
sum = 0;
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}
can be parallelized with the reduction clause where the operation is equal to + and the reduction variable is equal to sum:
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 10; i++)
{
    sum += a[i];
}

Implementation with Reduction Clause

How does OpenMP parallelize a for loop declared with a reduction clause? OpenMP creates a team of threads and then shares the iterations of the for loop between the threads. Each thread has its own local copy of the reduction variable. The thread modifies only the local copy of this variable; therefore, there is no data race. When the threads join together, all the local copies of the reduction variable are combined into the global shared variable.
For example, let us parallelize the following for loop
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (auto i = 0; i < 9; i++)
{
    sum += a[i];
}
and let there be three threads in the team of threads. Each thread has sumloc, which is a local copy of the reduction variable. The threads then perform the following computations
  • Thread 1
sumloc_1 = a[0] + a[1] + a[2]
  • Thread 2
sumloc_2 = a[3] + a[4] + a[5]
  • Thread 3
sumloc_3 = a[6] + a[7] + a[8]
In the end, when the threads join together, OpenMP combines the local copies into the shared reduction variable
sum = sumloc_1 + sumloc_2 + sumloc_3
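The effect of the reduction clause can also be imitated by hand. The sketch below (the names sumloc and a, and the array contents, are illustrative) gives each thread its own local accumulator and combines the local copies under a critical section when the threads join:

#include <stdio.h>
#include <omp.h>
int main(void)
{
    int a[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};   /* illustrative data */
    int sum = 0;
    #pragma omp parallel
    {
        int sumloc = 0;                 /* each thread's private copy */
        /* share the iterations; every thread adds into its own sumloc */
        #pragma omp for
        for (int i = 0; i < 9; i++)
            sumloc += a[i];
        /* combine the local copies into the shared variable, one thread at a time */
        #pragma omp critical
        sum += sumloc;
    }
    printf("sum = %d\n", sum);   /* prints 45 */
    return 0;
}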




Example (C program): Display "Hello, world." using multiple threads.
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    printf("Hello, world.\n");
    return 0;
}
Use flag -fopenmp to compile using GCC:
$ gcc -fopenmp hello.c -o hello
Output on a computer with two cores, and thus two threads:
Hello, world.
Hello, world.
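The number of threads can also be set explicitly through the OMP_NUM_THREADS environment variable before running the program, for example:
$ OMP_NUM_THREADS=4 ./hello
which would print the message four times.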



Fig: Multithreading, where the master thread forks off a number of threads which execute blocks of code in parallel.




Input :
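A minimal sketch of a program performing the Min, Max and Sum reductions of this experiment is given below; the array contents and size are assumptions chosen for illustration, and the min/max reduction operators require OpenMP 3.1 or newer:

#include <stdio.h>
#include <omp.h>
int main(void)
{
    int a[10] = {4, 9, 1, 7, 3, 8, 2, 6, 5, 10};   /* illustrative data */
    int sum = 0;
    int min_val = a[0];
    int max_val = a[0];

    /* Sum reduction: each thread keeps a local sum, combined on join */
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < 10; i++)
        sum += a[i];

    /* Min and Max reductions: local minima/maxima are combined when the threads join */
    #pragma omp parallel for reduction(min: min_val) reduction(max: max_val)
    for (int i = 0; i < 10; i++)
    {
        if (a[i] < min_val) min_val = a[i];
        if (a[i] > max_val) max_val = a[i];
    }

    printf("Sum = %d\n", sum);
    printf("Min = %d\n", min_val);
    printf("Max = %d\n", max_val);
    return 0;
}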



Output :


 PARALLEL REDUCTION





Conclusion: We have implemented parallel reduction using Min, Max and Sum operations.
