Implementation of parallel algorithm for Vector addition and matrix multiplication
- Get link
- X
- Other Apps
Experiment
No: HPC-04
TITLE:
- Implementation
of parallel algorithm for Vector addition and matrix multiplication
Problem
Statement: - To
study and implementation of directive based parallel programming
model.
Objective:
-
To study open MP a parallel programming model.
Requirements
(Hw/Sw): PC, gcc,
openMP tool, Ubuntu system,C/C++.
Theory:-
OpenMP:
OpenMP
is a set of C/C++ pragmas (or FORTRAN equivalents) which provide the
programmer a high-level front-end interface which get translated as
calls to threads (or other similar entities). The key phrase here is
"higher-level"; the goal is to better enable the programmer
to "think parallel," alleviating him/her of the burden and
distraction of dealing with setting up and coordinating threads. For
example, the OpenMP directive.
OpenMP
is an implementation of multithreading,
a method of parallelizing whereby a master
thread (a series of instructions executed consecutively) forks
a specified number of slave
threads and the system divides a task among them. The threads then
run parallely,
with the runtime
environment
allocating threads to different processors.
Vector
Addition
Vector
addition is the operation of adding two or more vectors together into
a vector
sum.
The
so-called parallelogram law gives the rule for vector addition of two
or more vectors. For two vectors A and B, the vector sum A+B is
obtained by placing them head to tail and drawing the vector from the
free tail to the free head. In Cartesian coordinates, vector addition
can be performed simply by adding the corresponding components of the
vectors, so if
A=(a_1,a_2,...,a_n)
and B=(b_1,b_2,...,b_n),
A+B=(a_1+b_1,a_2+b_2,...,a_n+b_n).
OpenMP
on shared memory multicore machines creates threads
that execute concurrently. The creation of these threads is implicit
and built by the compiler when you insert special directives in the C
code called pragmas.
The code that begins executing main() is considered thread 0. At
certain points in the code, you can designate that more threads
should be used in parallel and exucute concurrently. This is called
forking
threads.
In
the code below, you will see this pragma, which is implicitly forking
the threads to complete the computation on equal chunks of the
orginal array:
#pragma
omp parallel for shared(a, b, c) private(i) schedule(static, 2)
The
shared
keyword indicates that the arrays are shared in the same memory space
for all threads, and the private
keyword indicates that each thread will have its own copy of the
index counter i that it will increment.
-
Data Scoping
-
private (variable list)
-Specifies
variables local to each thread
-
firstprivate(variable list)
-Similar
to the private
-Private
variables are initialized to variable value before the parallel
directive
-
shared (variable list)
-Specifies
variables that are shared among all the threads
-
default (data scoping specifier)
-Default
data scoping specifiermay be shared or none
Parallel
Matrix Multiplication
-
A matrix is a set of numerical and non-numerical data arranged in a fixed number of rows and column. Matrix multiplication is an important multiplication design in parallel computation.
-
The
definition
of matrix multiplication
is that if C
= AB
for an n
× m
matrix A
and an m
× p
matrix B,
then C
is an n
× p
matrix with entries
.
From
this, a simple algorithm can be constructed which loops over the
indices i from 1 through n and j from 1 through
p, computing the above using a nested loop:
The OpenMP-enabled parallel code exploits coarse grain parallelism, which makes use of the cores available in a multicore machine. Basically, we have parallelized the outermost loop which drives the accesses to the result matrix a in the first dimension. Thus, work-sharing among the thrads is such that different threads will calculate different rows of the result matrix a.
The OpenMP-enabled parallel code exploits coarse grain parallelism, which makes use of the cores available in a multicore machine. Basically, we have parallelized the outermost loop which drives the accesses to the result matrix a in the first dimension. Thus, work-sharing among the thrads is such that different threads will calculate different rows of the result matrix a.
- Note that different threads will write different parts of the result in the array a, so we dont’t get any problems during the parallel execution. Note that accesses to matrices b and c are read-only and do not introduce any problems either. In the parallelization strategy, we share all the matrices and everything will work fine;
- in addtion. all the indices of the loops are privatized because every thread needs to do it’s own iterations of the loops.
-
Input: matrices A and B
-
Let C be a new matrix of the appropriate size
-
For i from 1 to n:
-
For j from 1 to p:
-
Let sum = 0
-
For k from 1 to m:
-
Set sum ← sum + Aik × Bkj
-
-
Set Cij ← sum
-
-
Return
Input
:
Output
:
Conclusion:
concurrency
FORTRAN
MPI
multithreading
OpenMp
parallel matrix multiplication
parallel programming model
parallele algorithm
vector
- Get link
- X
- Other Apps
Comments
Post a Comment