Input description: A set of n items.
Problem description: Arrange the items in increasing order.
Discussion: Sorting is the fundamental algorithmic problem in computer science. Learning the different sorting algorithms is like learning scales for a musician. Sorting is the first step in solving a host of other algorithm problems, as discussed in Section . Indeed, ``when in doubt, sort'' is one of the first rules of algorithm design.
Sorting is also used to illustrate the standard paradigms of algorithm design. The result is that most programmers are familiar with many different sorting algorithms, which sows confusion as to which should be used for a given application. The following criteria can help you decide:
If you have more than 100 items to sort, it is important to use an -time algorithm, like heapsort, quicksort, or mergesort. If you have more than 1,000,000 items to sort, you probably need an external-memory algorithm that minimizes disk access. Both types of algorithms are discussed below.
Occasionally, ties need to be broken by their initial position in the data set. If the 5th and 27th items of the initial data set share the same key in such a case, the 5th item must be before the 27th in the final order. A stable sorting algorithm preserves the original ordering in case of ties. Most of the quadratic-time sorting algorithms are stable, while many of the algorithms are not. If it is important that your sort be stable, it is probably better to explicitly use the initial position as a secondary key rather than trust the stability of your implementation.
The simplest approach to external sorting loads the data into a B-tree (see Section ) and then does an in-order traversal of the tree to read the keys off in sorted order. Other approaches are based on mergesort. Files containing portions of the data are sorting using a fast internal sort, and then these files are merged in stages using 2- or k-way merging. Complicated merging patterns based on the properties of the external storage device can be used to optimize performance.
The best general-purpose sorting algorithm is quicksort (see Section ), although it requires considerable tuning effort to achieve maximum performance. Indeed, you are probably better off using a library function instead of doing it yourself. A poorly written quicksort will likely run more slowly than a poorly written heapsort.
If you are determined to implement your own quicksort, use the following heuristics, which make a big difference in practice:
Before you get started, see Bentley's article on building a faster quicksort [Ben92b].
Implementations: Pascal implementations of all the primary sorting algorithms are available from [MS91]. See Section for details. Timing comparisons show an optimized version of quicksort to be the winner.
Bare bones implementations of all basic sorting algorithms, in C and Pascal, appear in [GBY91]. Most notable is the inclusion of implementations of external memory sorting algorithms. Sedgewick includes similar sort routine fragments in C++. See Section for details.
XTango (see Section ) is an algorithm animation system for UNIX and X-windows, which includes animations of all the basic sorting algorithms, including bubblesort, heapsort, mergesort, quicksort, radix sort, and shellsort. Many of these are quite interesting to watch. Indeed, sorting is the canonical problem for algorithm animation.
Algorithm 410 [Cha71] and Algorithm 505 [Jan76] of the Collected Algorithms of the ACM are Fortran codes for sorting. The latter is an implementation of Shellsort on linked lists. Both are available from Netlib (see Section ).
C language implementations of Shellsort, quicksort, and heapsort appear in [BR95]. The code for these algorithms is printed in the text and available on disk for a modest fee.
A bare bones implementation of heapsort in Fortran from [NW78] can be obtained in Section . A bare bones implementation of heapsort in Mathematica from [Ski90] can be obtained in Section .
Notes: Knuth [Knu73b] is the best book that has been written on sorting and indeed is the best book that will ever be written on sorting. It is now almost twenty-five years old, and a revised edition is promised, but it remains fascinating reading. One area that has developed since Knuth is sorting under presortedness measures. A newer and noteworthy reference on sorting is [GBY91], which includes pointers to algorithms for partially sorted data and includes implementations in C and Pascal for all of the fundamental algorithms.
Expositions on the basic internal sorting algorithms appear in every algorithms text, including [AHU83, Baa88, CLR90, Man89]. Treatments of external sorting are rarer but include [AHU83]. Heapsort was first invented by Williams [Wil64]. Quicksort was invented by Hoare [Hoa62], with careful analysis and implementation by Sedgewick [Sed78]. Von Neumann is credited with having produced the first implementation of mergesort, on the EDVAC in 1945. See Knuth for a full discussion of the history of sorting, dating back to the days of punched-card tabulating machines.
Sorting has a well-known lower bound under the algebraic decision tree model [BO83]. Determining the exact number of comparisons required for sorting n elements, for small values of n, has generated considerable study. See [Aig88, Raw92] for expositions.
This lower-bound does not hold under different models of computation. Fredman and Willard [FW93] present an algorithm for sorting under a model of computation that permits arithmetic operations on keys. Under a similar model, Thorup [Tho96] developed a priority queue supporting operations, implying an sorting algorithm.
Related Problems: Dictionaries (see page ), searching (see page ), topological sorting (see page ).