Parallel Computer Systems, VT12


Assignment 1: FractSMP

In this assignment you will be using parallel computers to compute fractals. You must solve this assignment individually.

Implementation

The assignment is divided into three parts.

Part 1

Make a sequential implementation that generates Mandelbrot fractals. The algorithm is described, besides on earlier years web page (introduction lecture), on several web pages. Input (given as command line arguments):
  • Xmax, Xmin, Ymax, Ymin
  • Resolution (pixels in x and y, should be given as 2 values, e.g., 1200x800)
  • maxiter
  • T (threshold) - maximum distance from origo in the complex plane
The output shall be a picture in ppm-format (P6-variant).

Part 2

Make two parallelizations of the implementation above using OpenMP and pthreads (tutorial 1, tutorial 2), respectively. Added to the input is now:
  • Number of threads
Use some kind of dynamic load balancing (some automatic support in OpenMP) This means that the distribution of the work should be dynamic when the program discovers that some computations takes longer than others. What you want is that every thread in the program uses its processor 100% (from the beginning) and that all threads finishes at the same time (at least close to) even though it is unknown how long a computation will take when you start the iteration.

Describe how your parallelizations are supposed to work. Is it sensible parallelizations? Are there better ways to do it? Are there performance differences between the two implementations? Which did you find easier to do, why?

Part 3

With the help of  "magic box" you can speed the calculation of certain areas up. Look at the picture to the right. We want to compute the whole picture. After the computation of the border points, we find that some of these are different. The Picture is then split into four new pictures (the blue lines). Each of the new pictures are then computed by this recursive schema. If a picture has a border where not all points have the same value, split the picture, else if all border points have the same value we assume that all points inside the picture have the same value as the border points. Look at the green squares. For three of them we see that all border ponts have the same value and we fill them with the same colour, the fourth we split.

There are some risks with magic box also. What if the value of all border points of the whole picture had been the same? Then we would not have seen the fractal at all!

Observe that if only one thread runs the 'magic box' part it is not likely that you will fulfil the requirements for good load balancing (especially in the beginning).

Add more input to the program:

  • Use "magic box" or not
  • Smallest box size (below this limit no recursion occurs, just compute all individual values inside the box)

The parallelization may be done with either OpenMP or Pthreads.

Describe how your parallelization is supposed to work. Is it a sensible parallelization? Is there a better way to do it?

Test runs

You should use one node on Akka (1-8 cores) for your tests. Then you are free to include tests from other machines as well.

The programs should allow an arbitrary number of threads, even though you need only to use up to eight cores in the tests.

One simple way to measure the time is to use the function gettimeofday() in your program (see man page). In my_wtime.c is a function you can use by calling it before and after the part you want to time (and use the difference as the time spent).

More information

For general information about running programs on the clusters at HPC2N, please see the user guides and other web pages. (www.hpc2n.umu.se)

See also the lecture about assignment 1.

OpenMP

At least the Portland, Intel and gcc compilers at HPC2N (on Akka) have support for OpenMP. The option to pgcc (the Portland compiler) is -mp, the option to icc (the Intel compiler) is -openmp and the option to gcc is -fopenmp
Do not forget to add the right modules, i.e., module add pgi or module add intel-compiler, respectively. You can specify the number of threads the program should use by setting an environment variable called OMP_NUM_THREADS.

Submitting jobs

Here you find sample submitfile for OpenMP. and pthreads. Remember to submit your jobs from the pfs file system.

Report

The report should be well written and at least include:
  • How you have solved the load balancing
  • Performance graphs
  • Speed-up graphs
  • How do you expect you implementation to work with more threads?
  • Limitations, problems
  • Deadline: 13.00 on February 20, 2012




http://www.cs.umu.se/kurser/5DV011/VT12/assignments/A1/ass1.html
Responsible for the page: Mikael Rännar
Last changed 2012 Jan 23