Implementation of parallel algorithm for Vector addition and matrix multiplication


Experiment No: HPC-04
TITLE: - Implementation of parallel algorithm for Vector addition and matrix multiplication



Problem Statement: - To study and implementation of directive based parallel programming model.
Objective:
  • To study open MP a parallel programming model.
Requirements (Hw/Sw): PC, gcc, openMP tool, Ubuntu system,C/C++.
Theory:-
OpenMP:
OpenMP is a set of C/C++ pragmas (or FORTRAN equivalents) which provide the programmer a high-level front-end interface which get translated as calls to threads (or other similar entities). The key phrase here is "higher-level"; the goal is to better enable the programmer to "think parallel," alleviating him/her of the burden and distraction of dealing with setting up and coordinating threads. For example, the OpenMP directive.
OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them. The threads then run parallely, with the runtime environment allocating threads to different processors.
Vector Addition


Vector addition is the operation of adding two or more vectors together into a vector sum.
The so-called parallelogram law gives the rule for vector addition of two or more vectors. For two vectors A and B, the vector sum A+B is obtained by placing them head to tail and drawing the vector from the free tail to the free head. In Cartesian coordinates, vector addition can be performed simply by adding the corresponding components of the vectors, so if
A=(a_1,a_2,...,a_n) and B=(b_1,b_2,...,b_n),
A+B=(a_1+b_1,a_2+b_2,...,a_n+b_n).














OpenMP on shared memory multicore machines creates threads that execute concurrently. The creation of these threads is implicit and built by the compiler when you insert special directives in the C code called pragmas. The code that begins executing main() is considered thread 0. At certain points in the code, you can designate that more threads should be used in parallel and exucute concurrently. This is called forking threads.






In the code below, you will see this pragma, which is implicitly forking the threads to complete the computation on equal chunks of the orginal array:
#pragma omp parallel for shared(a, b, c) private(i) schedule(static, 2)
The shared keyword indicates that the arrays are shared in the same memory space for all threads, and the private keyword indicates that each thread will have its own copy of the index counter i that it will increment.
  • Data Scoping
  • private (variable list)
-Specifies variables local to each thread
  • firstprivate(variable list)
-Similar to the private
-Private variables are initialized to variable value before the parallel directive
  • shared (variable list)
-Specifies variables that are shared among all the threads
  • default (data scoping specifier)
-Default data scoping specifiermay be shared or none


Parallel Matrix Multiplication
  • A matrix is a set of numerical and non-numerical data arranged in a fixed number of rows and column. Matrix multiplication is an important multiplication design in parallel computation.
The definition of matrix multiplication is that if C = AB for an n × m matrix A and an m × p matrix B, then C is an n × p matrix with entries
.
From this, a simple algorithm can be constructed which loops over the indices i from 1 through n and j from 1 through p, computing the above using a nested loop:








The OpenMP-enabled parallel code exploits coarse grain parallelism, which makes use of the cores available in a multicore machine. Basically, we have parallelized the outermost loop which drives the accesses to the result matrix a in the first dimension. Thus, work-sharing among the thrads is such that different threads will calculate different rows of the result matrix a

  • Note that different threads will write different parts of the result in the array a, so we dont’t get any problems during the parallel execution. Note that accesses to matrices b and c are read-only and do not introduce any problems either. In the parallelization strategy, we share all the matrices and everything will work fine;
  •  in addtion. all the indices of the loops are privatized because every thread needs to do it’s own iterations of the loops.


  • Input: matrices A and B
  • Let C be a new matrix of the appropriate size
  • For i from 1 to n:
  • For j from 1 to p:
    • Let sum = 0
    • For k from 1 to m:
      • Set sum ← sum + Aik × Bkj
    • Set Cij ← sum
  • Return


Input :


Output :






Conclusion: 

Comments

Popular Posts