Input description: A set of items , where item i has size and value . A knapsack capacity C.
Problem description: Find the subset that maximizes the value of given that ; i.e. all the items fit in a knapsack of size C.
Discussion: The knapsack problem arises whenever there is resource allocation with financial constraints. Given a fixed budget, how do you select what things you should buy. Everything has a cost and value, so we seek the most value for a given cost. The term knapsack problem invokes the image of the backbacker who is constrained by a fixed-size knapsack and so must fill it only with the most useful items.
The typical formulation in practice is the 0/1 knapsack problem, where each item must be put entirely in the knapsack or not included at all. Objects cannot be broken up arbitrarily, so its not fair taking one can of coke from a six-pack or opening the can to take just a sip. It is this 0/1 property that makes the knapsack problem hard, for a simple greedy algorithm finds the optimal selection whenever we are allowed to subdivide objects arbitrarily. For each item, we could compute its ``price per pound'', and take as much of the most expensive item until we have it all or the knapsack is full. Repeat with the next most expensive item, until the knapsack is full. Unfortunately, this 0/1 constraint is usually inherent in most applications.
Issues that arise in selecting the best algorithm include:
An important special case of constant ``price-per-pound'' knapsack is the integer partition problem, presented in cartoon form in Figure .
Figure: Integer partition is a variant of the Knapsack problem
Here, we seek to partition the elements of S into two sets A and B such that , or alternately make the difference as small as possible. Integer partition can be thought of as bin packing with two equal-sized bins or knapsack with a capacity of half the total weight, so all three problems are closely related and NP-complete.
The constant `price-per-pound' knapsack problem is often called the subset sum problem, because given a set of numbers, we seek a subset that adds up to a specific target number, i.e. the capacity of our knapsack.
The algorithm works as follows: Let S' be a set of items, and let C[i,S'] be true if and only if there is a subset of S' whose size adds up exactly to i. For the empty set, is false for . One by one we add a new item to S' and update the affected values of C[i,S']. Observe that iff either C[i,S'] or is true, since either we use in our subset or we don't. By performing n sweeps through all C elements, one for each , , and updating the array, we identify which sums of sizes can be realized. The knapsack solution is the largest realizable size. In order to reconstruct the winning subset, we must store the name of the item number that turned C[i] from false to true, for each , and then scan backwards through the array.
The dynamic programming formulation described above ignored the values of the items. To generalize the algorithm, add a field to each element of the array to store the value of the best subset to date summing up to i. We now update not only when C[i] turns from false to true, but when the sum of the cost of plus the cost of is better than the previous cost of C[i].
When the knapsack capacity gets too large for dynamic programming, exact solutions can be found using integer programming or backtracking. A 0/1 integer variable is used to denote whether item i is present in the optimal subset. We maximize given the constraint that . Algorithms and implementations of integer and linear programming are discussed in Section .
When exact solutions prove too costly to compute, heuristics should be used. The simple greedy heuristic inserts items according to the maximum `price per pound' rule, described above. Often this heuristic solution is close to optimal, but it can be arbitrarily bad depending upon the problem instance. The ``price per pound'' rule can also be used to reduce the size of the problem instance in exhaustive search-based algorithms by eliminating ``cheap but heavy'' objects from future consideration.
Another heuristic is based on scaling. Dynamic programming works well if the capacity of the knapsack is a reasonably small integer, say . But what if we have a problem with capacity ? We scale down the sizes of all items by a factor of , round the size down to an integer, and then use dynamic programming on the scaled items. Scaling works well in practice, especially when the range of sizes of items is not too large.
Implementations: Martello and Toth's book [MT90a] comes with a disk of Fortran implementations of a variety of knapsack algorithms. This is likely the best source of code currently available.
Algorithm 632 [MT85] of the Collected Algorithms of the ACM is a Fortran code for the 0/1 knapsack problem, with the twist that it supports multiple knapsacks. See Section .
Pascal implementations of several knapsack algorithms, including backtracking and a refined greedy algorithm, are provided in [SDK83]. See Section for details.
Notes: Martello and Toth's book [MT90a] and survey article [MT87] are the standard references on the knapsack problem, including most theoretical and experimental results. An excellent exposition on integer programming approaches to knapsack problems appears in [SDK83]. See [FP75a] for a computational study of algorithms for 0-1 knapsack problems.
A polynomial-time approximation scheme is an algorithm that approximates the optimal solution of a problem in time polynomial in both its size and the approximation factor . This very strong condition implies a smooth tradeoff between running time and approximation quality. Good expositions on the polynomial-time approximation scheme [IK75] for knapsack and subset sum includes [Baa88, CLR90, GJ79, Man89].
The first algorithm for generalized public key encryption by Merkle and Hellman [MH78] was based on the hardness of the knapsack problem. See [Sch94] for an exposition.
Related Problems: Bin packing (see page ), integer programming (see page ).