Input description: A set S of n keys, a query key q.
Problem description: Where is q in S?
Discussion: Searching means different things to different people. Searching for the global maximum or minimum of a function is the problem of unconstrained optimization and is discussed in Section . Chess playing programs search for the best move to make next by using alpha-beta minimax search, which is an exhaustive search of the possible moves using a variation of backtracking (see Section ).
Here we consider the simple task of searching for a key in a list or in an array, which is a fundamental problem associated with information retrieval. Dictionary data structures maintain efficient access to sets of keys under insertion and deletion and are discussed in Section . Typical dictionaries include binary trees and hash tables.
We treat searching here as a problem distinct from dictionaries because simpler and more efficient solutions emerge when our primary interest is static searching. These little data structures can yield large performance improvements when properly employed in an innermost loop. Also, several of the ideas from list and array searching, such as binary search and self-organization, apply to other problems and justify our attention.
There are two basic approaches to array searching: sequential search and binary search. Both are simple, yet have interesting and subtle variations. In sequential search, we simply start from the front of our list or array of keys and compare each successive item against the key until we find a match or reach the end. In binary search, we start with a sorted array of keys. To search for key q, we compare q to the middle key . If q is before , it must reside in the top half of our set; if not, it must reside in the bottom half of our set. By repeating this process on the correct half, we find the key in a total of comparisons. This is a big win over the n/2 comparisons we expect with sequential search. See Section for more on binary search.
Sequential search is the simplest algorithm, and likely to be fastest on up to 10-20 elements. For such tiny problems, forget about binary search. Beyond 50-100 elements, there is no question that binary search will be more efficient than sequential search, even factoring in the cost of the sorting (assuming multiple queries). Other issues do come into play, however, particularly in identifying the proper variant of the algorithm:
However, preordering the list to exploit a skewed access pattern requires knowing the access pattern in advance. For many applications, it can be difficult to obtain such information. Far easier are self-organizing lists, where the order of the keys changes in response to the queries. The simplest and best self-organizing scheme is move-to-front; that is, we move the most recently searched-for key from its current position to the front of the list. Popular keys keep getting boosted to the front, while unsearched-for keys drift towards the back of the list. There is no need to keep track of the frequency of access; just move the keys on demand. Self-organizing lists also exploit locality of reference, since accesses to a given key are likely to occur in clusters. Any key will be maintained near the top of the list during a cluster of accesses, even if other keys have proven more popular in the past.
Self-organization can extend the useful size range of sequential search. However, you should switch to binary search beyond 50-100 elements.
Such a one-sided binary search finds the target at position p+l using at most comparisons, so it is faster than binary search when l < < n, yet it can never be much worse. One-sided binary search is particularly useful in unbounded search problems, such as in numerical root finding.
Although interpolation search is an appealing idea, we caution against it for three reasons: First, you have to work very hard to optimize your search algorithm before you can hope for a speedup over binary search. Second, even if you get lucky and beat binary search, it is unlikely to be by enough to have justified the exercise. Third, your program will be much less robust and efficient when the distribution changes, such as when your application gets put to work on French words instead of English.
Implementations: The basic sequential and binary search algorithms are simple enough to implement that you should likely do them yourself. Still, the routines described below may be useful as models.
Gonnet and Baeza-Yates provides code fragments in C and Pascal for sequential, binary, and interpolation search, as well as for related dictionary structures. LEDA (see Section ) provides a sorted array data type in C++ that supports binary search. Many textbooks include implementations of binary search, including [MS91]. See Section for details.
Notes: Mehlhorn and Tsakalidis [MT90b] give a through survey of the state-of-the-art in modern data structures. Knuth [Knu73a] provides a detailed analysis and exposition on all fundamental search algorithms and dictionary data structures but omits such modern data structures as red-black and splay trees. Gonnet and Baeza-Yates [GBY91] provide detailed references and experimental results for a wide variety of search algorithms.
Manber [Man89] provides an interesting discussion of variations of binary search, including one-sided binary search and searching for an index in A where .
In linear interpolation search on an array of sorted numbers, the next position probed is given by
where q is the query numerical key and S the sorted numerical array. If the keys are drawn independently from a uniform distribution, the expected search time is [YY76]. Expositions on interpolation search include [Raw92].
Nonuniform access patterns can be exploited in binary search trees by structuring them so that popular keys are located near the root, thus minimizing search time. Dynamic programming can be used to construct such optimal search trees in time [Knu73b]. Expositions include [AHU74]. Splay trees are self-organizing tree structures, as discussed in Section .
Related Problems: Dictionaries (see page ), sorting (see page ).