next up previous contents index CD CD Algorithms
Next: Range Search Up: Computational Geometry Previous: Voronoi Diagrams

Nearest Neighbor Search



Input description: A set S of n points in d dimensions; a query point q.

Problem description: Which point in S is closest to q?

Discussion: The need to quickly find the nearest neighbor to a query point arises in a variety of geometric applications. The classic example in two dimensions is designing a system to dispatch emergency vehicles to the scene of a fire.    Once the dispatcher learns the location of the fire, she uses a map to find the firehouse closest to this point so as to minimize transportation delays. This situation occurs in any application mapping customers to service providers.   

Nearest-neighbor search is also important in classification. Suppose we are given a collection of data about people (say age, height, weight, years of education, sex, and income level) each of whom has been labeled as Democrat or Republican. We seek a classifier to decide which way a different person is likely to vote. Each of the people in our data set is represented by a party-labeled point in d-dimensional space. A simple classifier can be built by assigning to the new point the party affiliation of its nearest neighbor.  

   Such nearest-neighbor classifiers are widely used, often in high-dimensional spaces. The vector-quantization method of image compression partitions an image into tex2html_wrap_inline30049 pixel regions. This method uses a predetermined library of several thousand tex2html_wrap_inline30051 pixel tiles and replaces each image region by the most similar library tile. The most similar tile is the point in 64-dimensional space that is closest to the image region in question. Compression is achieved by reporting the identifier of the closest library tile instead of the 64 pixels, at some loss of image fidelity.

Issues arising in nearest-neighbor search include:

The nearest neighbor graph on a set S of n points links each vertex to its nearest neighbor. This graph is a subgraph of the Delaunay triangulation and so can be computed in tex2html_wrap_inline30057 . This is quite a bargain since it takes tex2html_wrap_inline30059 time just to discover the closest pair of points in S.   

As a lower bound, the closest pair problem in one dimension reduces to sorting. In a sorted set of numbers, the closest pair corresponds to two numbers that lie next to each other in sorted order, so we need only check which is the minimum gap between the n-1 adjacent pairs. The limiting case of this occurs when the closest pair are distance zero apart, meaning that the elements are not unique.  

Implementations: Ranger is a tool for visualizing and experimenting with nearest-neighbor and orthogonal-range queries in high-dimensional data sets, using multidimensional search trees. Four different search data structures are supported by Ranger: naive kd-trees, median kd-trees, nonorthogonal kd-trees, and the vantage point tree.    

For each of these, Ranger supports queries in up to 25 dimensions under any Minkowski metric. It includes generators for a variety of point distributions in arbitrary dimensions. Finally, Ranger provides a number of features to aid in visualizing multidimensional data, best illustrated by the accompanying video [MS93]. To identify the most appropriate projection at a glance, Ranger provides a tex2html_wrap_inline30061 matrix of all two-dimensional projections of the data set. Ranger is written in C, using Motif. It runs on Silicon Graphics and HP workstations and is available in the algorithm repository tex2html_wrap_inline30063 algorith.

See Section gif for a complete collection of Voronoi diagram implementations. In particular, LEDA (see Section gif) provides an implementation of 2D Voronoi diagrams in C++, as well as planar point location to make effective use of them for nearest-neighbor search.    

A Pascal implementation of the divide-and-conquer algorithm for finding the closest pair of points in a set of n points appears in [MS91]. See Section gif.

Notes: The best reference on kd-trees and other spatial data structures is two volumes by Samet [Sam90b, Sam90a], where all major variants are developed in substantial detail. Good expositions on finding the closest pair of points in the plane [BS76] include [CLR90, Man89]. These algorithms use a divide-and-conquer approach instead of just selecting from the Delaunay triangulation.

A recent development in higher-dimensional nearest-neighbor search is algorithms that quickly produce a point that if not the nearest neighbor lies provably close to the query point.   A sparse weighted graph structure is built from the data set, and the nearest neighbor is found by starting at a random point and greedily walking in the graph towards the query point. The closest point found over several random trials is declared the winner. Similar data structures hold promise for other problems in high-dimensional spaces. See [AM93, AMN tex2html_wrap_inline32724 94].

Related Problems: Kd-trees (see page gif), Voronoi diagrams (see page gif), range search (see page gif).      

next up previous contents index CD CD Algorithms
Next: Range Search Up: Computational Geometry Previous: Voronoi Diagrams

Mon Jun 2 23:33:50 EDT 1997