In Section , I related our efforts to build a fast program to test Waring's conjecture for pyramidal numbers. At that point, my code was fast enough that it could complete the job in a few weeks running in the background on a desktop workstation. This option did not appeal to my supercomputing colleague, however.
``Why don't we do it in parallel?'' he suggested. ``After all, you have an outer loop doing the same type of calculation on each integer from 1 to 1,000,000,000. I can split this range of numbers into different intervals and run each one of these on a different processor. Watch, it will be easy.''
He set to work trying to do our computations on an Intel IPSC-860 hypercube using 32 nodes, with 16 megabytes of memory per node. However, instead of getting answers, over the next few weeks I was treated to a regular stream of e-mail about system reliability:
Still, eventually, he rose to the challenge. Waiting until the machine was stable, he locked out 16 processors (half the computer), divided the integers from 1 to 1,000,000,000 into 16 equal-sized intervals, and ran each interval on its own processor. He spent the next day fending off angry users who couldn't get their work done because of our rogue job. The instant the first processor completed analyzing the numbers from 1 to 62,500,000, he announced to all the people yelling at him that the other processors would soon follow.
But they didn't. He failed to realize that the time to test each integer increased as the numbers got larger. After all, it would take longer to test whether 1,000,000,000 could be expressed as the sum of three pyramidal number than it would for 100. Thus at slower and slower intervals, each new processor would announce its completion. Because of the architecture of the hypercube, he couldn't return any of the processors until our entire job was completed. Eventually, half the machine and most of its users were held hostage by one, final interval.
When the job finally completed, the numbers were passed on to the Nobel Prize winner who had requested them. It turns out he had been curious about the problem because his father had made the conjecture back in 1928. There had never been a more important scientific reason to justify the computation in the first place. Indeed, no results from the computation ever appeared in print.
What conclusions can be drawn from this? Before devoting heroic efforts to solve a problem efficiently, make sure that it really needs to be solved, and solved quickly. If you are going to parallelize a problem, be sure to balance the load carefully among the processors. Proper load balancing, using either back-of-the-envelope calculations or the partition algorithm of Section , would have significantly reduced the time we needed the machine, and his exposure to the wrath of his colleagues.