        Next: Point Location Up: Computational Geometry Previous: Nearest Neighbor Search

## Range Search Input description: A set S of n points in and a query region Q.

Problem description: Which points from S lie within Q?

Discussion: Range search problems arise in database and geographic information system (GIS) applications. Any data object with d numerical fields, such as person with height, weight, and income, can be modeled as a point in d-dimensional space. A range query describes a region in space and asks for all points or the number of points in the region. For example, asking for all people with income between \$0 and \$10,000, with height between 6'0'' and 7'0'', and weight between 50 and 140 lbs. defines a box containing people whose body and wallets are both thin.

The difficulty of a range search problem depends on several factors:

• How many range queries are you going to perform? - The simplest approach to range search tests each of the n points one by one against the query polygon Q. This works just fine if the number of queries will be small. Algorithms to test whether a point is within a given polygon are presented in Section .
• What shape is your query polygon? - The easiest type of regions to query against are axis-parallel rectangles, because the inside/outside test reduces to verifying whether each coordinate lies within a prescribed range. The figure above illustrates such an orthogonal range query.

If you are querying against a nonconvex polygon, it will pay to partition your polygon into convex pieces or (even better) triangles and then query the point set against each one of the pieces. This is because testing whether a point is inside a convex polygon can be done more quickly and easily than for arbitrary polygons. Algorithms for such convex decompositions are discussed in Section .

• How many dimensions? - The best general-purpose approach to range queries builds a kd-tree on the point set, as discussed in Section . To perform a query, a depth-first traversal of the kd-tree is performed, with each tree node expanded only if the associated rectangle intersects the query region. For sufficiently large or misaligned query regions, the entire tree might have to be traversed, but in general; kd-trees lead to an efficient solution. Although algorithms with efficient worst-case performance are known in two dimensions, kd-trees are likely to work even better in the plane. In higher-dimensions, kd-trees provide the only viable solution to the problem.
• Can I just count the number of points in a region, or do I have to identify them? - For many applications it suffices to count the number of points in a region instead of returning them. Harkening back to the introductory example, we may want to know whether there are more thin/poor people or rich/fat ones. The need to find the densest or emptiest region in space often naturally arises, and the problem can be solved by counting range queries.

A data structure for efficiently answering such aggregate range queries can be based on the dominance ordering of the point set. A point x is said to dominate point y if y lies both below and to the left of x. Let DOM(p) be a function that counts the number of points in S that are dominated by p. The number of points m in the orthogonal rectangle defined by and is given by The second additive term corrects for the points for the lower left-hand corner that have been subtracted away twice.

To answer arbitrary dominance queries efficiently, partition the space into rectangles by drawing a horizontal and vertical line through each of the n points. The set of points dominated is identical for each point in each rectangle, so the dominance count of the lower left-hand corner of each rectangle can precomputed, stored, and reported for any query point within it. Queries reduce to binary search and thus take time. Unfortunately, this data structure takes quadratic space. However, the same idea can be adapted to kd-trees to create a more space-efficient search structure.

Implementations: LEDA (see Section ) provides excellent support for maintaining planar subdivisions in C++. In particular, it supports orthogonal range queries in time, where n is the complexity of the subdivision and k is the number of points in the rectangular region.

Ranger is a tool for visualizing and experimenting with nearest-neighbor and orthogonal-range queries in high-dimensional data sets, using multidimensional search trees. Four different search data structures are supported by Ranger: naive kd-trees, median kd-trees, nonorthogonal kd-trees, and the vantage point tree. For each of these, Ranger supports queries in up to 25 dimensions under any Minkowski metric. It is available in the algorithm repository.

A bare bones implementation in C of orthogonal range search using kd-trees appears in [GBY91]. Sedgewick [Sed92] provides code fragments of the grid method for orthogonal range search in C++. See Section for details on both of them.

Notes: Good expositions on data structures with worst-case performance for orthogonal-range searching [Wil85] include [PS85]. An exposition on kd-trees for orthogonal range queries in two dimensions appears in [PS85]. Their worst-case performance can be very bad; [LW77] describes an instance in two dimensions requiring time to report that a rectangle is empty.

The problem becomes considerably more difficult for nonorthogonal range queries, where the query region is not an axis-aligned rectangle. For half-plane intersection queries, time and linear space suffice [CGL85]; for range searching with simplex query regions (such as a triangle in the plane), lower bounds preclude efficient worst-case data structures. See [Yao90] for a discussion.

Related Problems: Kd-trees (see page ), point location (see page ),        Next: Point Location Up: Computational Geometry Previous: Nearest Neighbor Search

Algorithms
Mon Jun 2 23:33:50 EDT 1997