Parallel computing 1.11.2010 Exercise 1 No X-exercises this week. Read the questions carefully, answer to all parts of each question. Draw a picture of each exercise. 1) Design a manual message-passing sorting algorithm for a set of persons and A4 papers. Each paper has a name according to which it is sorted. At the beginning, all papers are in a stack by the first person. At the end, the sorted papers should be returned to the first person. A person can perform local operations and give a stack of one or more papers for the neighboring person. The recipient need not to be ready to accept the stack (post-pox messaging). No person can move from his/hers position. Assume that you have a) 10 persons, b) 100 persons located in a single row. Assume that you have A) 100 papers, B) 10 000 papers. Design the sorting algorithm (as fast as possible) to do the real-world sorting (in both a and b sitting arrangements and both A and B paper piles.). Estimate the time needed (in seconds) for all (Aa, Ab, Ba, Bb) cases. Hint: try to avoid situations when only one person is doing something useful. 2) Modify the previous case b) algorithm so that instead of a single row, the persons are located in a 10 by 10 matrix, and are able to pass the papers in 4 directions. Take advantage of the reduced diameter of the human network. Estimate the time again for both A and B cases. 3) Design an algorithm for 10 000 papers and 1000 persons. Estimate again the time required. Persons can move freely and you can (have to) define how they are located. Notice, however, the physical space requirements for 1000 persons. Hint: try to avoid situations when only few people are doing something elaborate. 4) Lets design a 500 GFLOPS Network of Workstations (NoW) -cluster out of commodity PC-hardware. Find out component prices from some web-store (http://www.verkkokauppa.com/, http://www.multitronic.fi/, etc.). We'll measure CPU computing power by the inaccurate formula 1 GHz = 1 GFLOPS. Cores of multicore processors are counted as separate. Try to build as cheap as possible 500 GFLOPS computer. E.g., 42 * 4 * 3 GHz is enough. Graphical co-processors (GPUs) won't count (yet). The computers must be connected (network devices) and bootable (motherboard, memory, power, network). After completing the minimal system, try to estimate what good it would be? What applications it could be used for? What limitations it has? What would be the bottlenecs? How to improve it? 5) Let us now assume that our algorithm is inefficient by factor of (sqrt(P). Then 42 * 4 * 3 GHz / sqrt(168)= 38.88 GFLOPS (ie, not enough). Now the more powerful processors will gain with respect to the cheaper ones. Again, try to build as cheap as possible 500 GFLOPS computer and estimate the usability of it.