file_id.diz
Welcome to the Interactive Parallel Fractal Demo!
Introduction
This program is a demonstration of a graphics application utilizing both high-performance code optimized for the PowerPC and parallelized code, and is a product of the AppleSeed project, which you can find out about at:
http://exodus.physics.ucla.edu/appleseed/appleseed.html
This Demo achieves over 190 MegaFlops on a single G3/300, and well over 1000 MegaFlops on eight G3/266's. It is also interactive, allowing the user, given the hardware, to steer over 1 GFlop of computational power. This demonstration is an example of using the MacMPI library for interprocessor communication.
Requirements: a Power Macintosh or an AppleTalk network of Power Macintoshes running System 8 or higher.
Recommended: a 10BaseT or faster AppleTalk network of Power Macs each running at a similar speed.
Running in parallel
There are two ways to run this code in parallel: Manually and Automatically. The Manual procedure is as follows:
1. On each computer, turn on Program Linking: Go to the File Sharing control panel. Start Program Linking. Close File Sharing.
2. On each computer, allow guests to link: Go to the Users & Groups control panel. Double-click on Guest. Use the pop-up menu in the window that appears to select Sharing. If "Allow guests to link to this computer" is not checked, check it. Close Users & Groups.
3. Set up the non-zero nodes: Copy the Demo to each of the machines you want to run in parallel. There will be one machine, node zero, where you will operate the demo. On all the machines that will not be node zero, create a text file (e.g., with SimpleText) named nodelist with one filled line at the top followed by blank lines, like this:
ppc_link
Make sure the string in the first line of every nodelist file is identical. Place the Demo and the nodelist file in the same folder on each computer. Start the Demo on each computer. Each should say "Calling MPI_Init…" at this time.
4. Set up node zero: In the same folder as the Demo on node zero, create one last nodelist file. Make the first line identical to that of the other nodelist files, followed by a list of the names of the other computers, one node per line, plus trailing blank lines, like this:
ppc_link
uclapic1
uclapic2
uclapic3
uclapic4
Run the Demo on node zero. This copy should start MacMPI, talk to the other computers, then run a parallel job.
Congratulations! You've just run a code in parallel!
The Automatic method uses the Launch Den Mother and Launch Puppy, which are available for download from the AppleSeed web site above. After following the installation described in the Launch Den Mother/Puppy README, the LDM and LP will do steps 3 and 4 above for you automatically.
(Note: Through MacMPI, these nodelist files tell the Demo which node it is and what it will do. If the Demo does not find a nodelist file, it will assume it is working alone and start as a single node.)
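For illustration, the nodelist convention described above can be sketched as follows. The function name `count_remote_nodes` and the in-memory array of lines are hypothetical stand-ins; the real parsing happens inside MacMPI:

```c
#include <assert.h>

/* Count the remote nodes listed in a nodelist: the first line names the
 * program link, each following non-blank line names one node, and blank
 * lines are ignored.  (A hypothetical helper for illustration only; the
 * Demo itself relies on MacMPI to read this file.) */
static int count_remote_nodes(const char *lines[], int nlines) {
    int count = 0;
    for (int i = 1; i < nlines; i++)     /* skip the link-name line */
        if (lines[i][0] != '\0')         /* each non-blank line = one node */
            count++;
    return count;
}
```

With this reading, node zero's file in the example above lists four other nodes, while the other machines' files list none, which is why those copies simply wait to be contacted.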
Operating the Interactive Fractal Demo
Operation of the Fractal Demo is quite simple. To navigate through the fractal, click on the image to make it zoom in where you clicked. Hold down option while clicking to zoom out.
The default zoom factor is two, which you can change via the Zoom Factor menu. You may also select the zoom factor on the fly by holding the mouse button down until a pop-up menu appears, from which you can select a zoom factor.
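The click-to-zoom behavior amounts to simple coordinate arithmetic. Here is a minimal sketch; the names `zoom_view`, `cx`, `cy`, and `scale` are illustrative, not taken from the Demo's source:

```c
#include <assert.h>

/* Re-center and zoom the complex-plane view on a click.  (cx, cy) is the
 * view center and scale is the view half-width; (px, py) is the clicked
 * point in the same coordinates.  A factor greater than 1 zooms in; to
 * zoom out (option-click), pass 1/factor instead. */
static void zoom_view(double *cx, double *cy, double *scale,
                      double px, double py, double factor) {
    *cx = px;            /* the clicked point becomes the new center */
    *cy = py;
    *scale /= factor;    /* shrink the view half-width to zoom in */
}
```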
The Maximum Count setting controls how many iterations the code will compute for a pixel before giving up on that pixel and drawing it black, so if you want to reduce the amount of black in the image, increase this value.
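The role of Maximum Count can be seen in a generic escape-time loop like the one below. This is a plain-C sketch of the standard Mandelbrot iteration, not the Demo's hand-tuned PowerPC assembly:

```c
#include <assert.h>

/* Classic escape-time loop: iterate z <- z*z + c until |z| exceeds 2 or
 * max_count iterations have been spent.  Pixels that never escape within
 * max_count iterations are the ones drawn black. */
static int escape_count(double cr, double ci, int max_count) {
    double zr = 0.0, zi = 0.0;
    for (int n = 0; n < max_count; n++) {
        if (zr * zr + zi * zi > 4.0)   /* |z| > 2: the orbit escapes */
            return n;
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
    }
    return max_count;                  /* gave up: this pixel is black */
}
```

Raising max_count lets borderline pixels escape (and take a color) instead of being written off as black, which is exactly why increasing Maximum Count reduces the black area.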
The Color Speed value refers to how quickly the color changes. Each pixel in the image took a certain number of iterations to compute. To increase the amount of color change per iteration, increase the Color Speed value. (Note: this code also interpolates between iterations so that all the color transitions will be as smooth as a 24-bit display allows.)
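The interpolation mentioned in the note can be sketched as a per-channel linear blend between two palette entries. This is illustrative only; the Demo's actual palette and color-speed mapping are its own:

```c
#include <assert.h>

/* Blend one 8-bit color channel between two adjacent palette entries.
 * frac is the fractional part of the interpolated iteration count, so
 * neighboring pixels shade smoothly into one another on a 24-bit display
 * instead of jumping a whole palette step per iteration. */
static unsigned char lerp_channel(unsigned char a, unsigned char b, double frac) {
    return (unsigned char)(a + frac * (b - a) + 0.5);   /* +0.5 rounds to nearest */
}
```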
The Gallery is a list of settings that will have the Demo produce a variety of fractal images. There are nine different fractal spaces available here, each with its own settings. You may choose from this list, or explore the spaces on your own.
Troubleshooting
The typical problem that occurs is that one node fails to start or fails to communicate because it has a faulty cable, etc. It is recommended to experiment with your network of Power Macs, trying one at a time, then a pair at a time to confirm that each node is working properly and can communicate with other nodes. (Note: the diagnostic information usually reported by MacMPI has been eliminated to keep the program under 64k.)
Other notes about the Demo
When the code runs in parallel, each node is assigned its own section of the image to compute. Node zero computes and displays its own data during computation. The other nodes compute their pixels on their own, and send their data to node zero for display only when they have completely finished their computations.
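One common way to assign each node its own section is an even strip decomposition by scan line, sketched below. This is a standard technique and only an assumption about the Demo; its actual partitioning may differ in detail:

```c
#include <assert.h>

/* Even strip decomposition: node `rank` of `nprocs` computes the scan
 * lines [start, end).  Integer arithmetic spreads any remainder rows
 * across the nodes so the strips tile the image exactly. */
static void my_rows(int height, int nprocs, int rank, int *start, int *end) {
    *start = rank * height / nprocs;
    *end   = (rank + 1) * height / nprocs;
}
```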
During computation, the Demo code counts how many total iterations it did and how long it took to do those iterations. It uses this information, with a known number of floating-point operations per iteration (I manually counted them in the assembly) for each fractal space, to calculate the average Flop rate. The timing includes everything necessary to generate the image: iteration computation, iteration counting, interprocessor message time, memory copies, iteration-to-color translation, iteration interpolation, drawing to the screen, and, if necessary, translating between bit depths. However, the Flop count includes only the floating-point multiplies, adds, and subtracts inside the iteration loop, even though there is a lot more going on than just those Flops.
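The rate calculation itself is simple arithmetic, as in the sketch below. The 10 Flops/iteration used in the usage note is a made-up stand-in; the real per-iteration counts were taken by hand from the assembly for each fractal space:

```c
#include <assert.h>

/* Average Flop rate: total iterations times the known Flop count per
 * iteration, divided by the total wall-clock time for the image. */
static double flop_rate(double total_iters, double flops_per_iter, double seconds) {
    return total_iters * flops_per_iter / seconds;
}
```

For example, 1,000,000 iterations at a (hypothetical) 10 Flops each, done in 2 seconds, would average 5 MegaFlops.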
This demo will work with Power Macs of differing speeds, though not always optimally. Since node zero does the most work (computing AND checking for all incoming messages AND drawing them to the screen), you may want to make the fastest machine node zero. If the speeds differ too much (e.g., a G3/400 and a 6100/60), it may be better to drop the slowest computer. A similar problem develops if the computational load varies from processor to processor. Typically, the black areas take the longest to compute, so if one processor is assigned an area that happens to have many black pixels, it will slow down completion of the image. The presets have been chosen to minimize this effect.
This demo can also show how important message time versus computation time is, a fundamental issue in all parallel processing. For short problems (e.g. Zoomed Out settings), the code doesn't perform well because most of the total time is taken up in message time. Many of the other presets, however, take much more CPU time, so message time becomes less of a factor. But increasing the number of nodes reduces the total time to compute the problem, increasing the emphasis on message time. This code demonstrates that tradeoff.
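The tradeoff above can be captured in a toy timing model: computation time shrinks as nodes are added, but a per-image message cost does not. The constants below are illustrative, not measured from the Demo:

```c
#include <assert.h>

/* Toy model: total time = computation divided across N nodes, plus a
 * fixed message overhead that adding nodes does not reduce. */
static double total_time(double t_compute, double t_message, int nodes) {
    return t_compute / nodes + t_message;
}
```

With 8 seconds of computation and 1 second of message overhead, eight nodes take the time from 9 s to 2 s, a 4.5x speedup rather than 8x; for a short problem the fixed message time dominates and the speedup nearly vanishes, which is exactly what the Zoomed Out settings show.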
©1999 by Dean Dauger 990430