Quicksort
Quicksort is popular because it is not difficult to implement, works well for a variety of different kinds of input data, and is substantially faster than any other sorting method in typical applications. It is in-place (uses only a small auxiliary stack), requires time proportional to N log N on the average to sort N items, and has an extremely short inner loop.
§ The basic algorithm.
Quicksort is a divide-and-conquer method for sorting. It works by partitioning an array into two parts, then sorting the parts independently.
The crux of the method is the partitioning process, which rearranges the array to make the following three conditions hold:
- The entry a[j] is in its final place in the array, for some j.
- No entry in a[lo] through a[j-1] is greater than a[j].
- No entry in a[j+1] through a[hi] is less than a[j].
We achieve a complete sort by partitioning, then recursively applying the method to the subarrays. Quicksort as implemented here is a randomized algorithm, because it randomly shuffles the array before sorting it.
Partitioning.
To complete the implementation, we need to implement the partitioning method. We use the following general strategy: First, we arbitrarily choose a[lo] to be the partitioning item—the one that will go into its final position. Next, we scan from the left end of the array until we find an entry that is greater than (or equal to) the partitioning item, and we scan from the right end of the array until we find an entry less than (or equal to) the partitioning item.
The two items that stopped the scans are out of place in the final partitioned array, so we exchange them. When the scan indices cross, all that we need to do to complete the partitioning process is to exchange the partitioning item a[lo] with the rightmost entry of the left subarray (a[j]) and return its index j.
Quicksort.
Quick.java is an implementation of quicksort, using the partitioning method described above.
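Quick.java itself is not reproduced here. As a rough guide, the following is a minimal sketch of the strategy described above: shuffle, partition around a[lo], then recur on the two subarrays. The class name QuickSketch and the helper names shuffle() and exch() are assumptions made for this illustration and need not match the names used in Quick.java.

```java
import java.util.Random;

// Hypothetical sketch class; not the Quick.java referenced in the text.
final class QuickSketch {
    private static final Random RANDOM = new Random();

    // Sort a[] into ascending order.
    static <T extends Comparable<T>> void sort(T[] a) {
        shuffle(a);                    // random order makes the worst case unlikely
        sort(a, 0, a.length - 1);
    }

    // Sort the subarray a[lo..hi].
    private static <T extends Comparable<T>> void sort(T[] a, int lo, int hi) {
        if (hi <= lo) return;          // 0 or 1 items: nothing to do
        int j = partition(a, lo, hi);  // a[j] is now in its final place
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Rearrange a[lo..hi] so that a[lo..j-1] <= a[j] <= a[j+1..hi]; return j.
    private static <T extends Comparable<T>> int partition(T[] a, int lo, int hi) {
        T v = a[lo];                   // partitioning item
        int i = lo, j = hi + 1;        // left and right scan indices
        while (true) {
            // scan from the left, stopping at an item >= v (or at the right end)
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            // scan from the right, stopping at an item <= v (or at the left end)
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;         // scan indices have crossed
            exch(a, i, j);             // both items are out of place: exchange them
        }
        exch(a, lo, j);                // put v into its final position
        return j;
    }

    // Rearrange a[] into uniformly random order (Fisher-Yates shuffle).
    private static <T> void shuffle(T[] a) {
        for (int i = a.length - 1; i > 0; i--)
            exch(a, i, RANDOM.nextInt(i + 1));
    }

    private static <T> void exch(T[] a, int i, int j) {
        T t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

A call such as QuickSketch.sort(words) on a String[] or Integer[] array sorts it in place. The bounds tests (i == hi) and (j == lo) keep the scans inside the subarray.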
Implementation details.
There are several subtle issues with respect to implementing quicksort that are reflected in this code and worthy of mention.
- Partitioning in place. If we use an extra array, partitioning is easy to implement, but not so much easier that it is worth the extra cost of copying the partitioned version back into the original.
- Staying in bounds. If the smallest item or the largest item in the array is the partitioning item, we have to take care that the pointers do not run off the left or right ends of the array, respectively.
- Preserving randomness. The random shuffle puts the array in random order. Since it treats all items in the subarrays uniformly, Quick.java has the property that its two subarrays are also in random order. This fact is crucial to the algorithm's predictability. An alternative way to preserve randomness is to choose a random item for partitioning within partition().
- Terminating the loop. Properly testing whether the pointers have crossed is a bit trickier than it might seem at first glance. A common error is to fail to take into account that the array might contain other keys with the same value as the partitioning item.
- Handling items with keys equal to the partitioning item's key. It is best to stop the left scan for items with keys greater than or equal to the partitioning item's key and the right scan for items less than or equal to the partitioning item's key. Even though this policy might seem to create unnecessary exchanges involving items with keys equal to the partitioning item's key, it is crucial to avoiding quadratic running time in certain typical applications.
- Terminating the recursion. A common mistake in implementing quicksort involves not ensuring that one item is always put into position, then falling into an infinite recursive loop when the partitioning item happens to be the largest or smallest item in the array.
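To make the point about equal keys concrete, here is a small experiment sketch that counts compares when every key in the array is the same. The class EqualKeysDemo, its compare counter, and the stopOnEqual flag are all hypothetical names introduced for this illustration. The shuffle is omitted because it has no effect on an array of equal keys, and int keys are used to keep the code short.

```java
// Hypothetical experiment class; names are illustrative only.
final class EqualKeysDemo {
    static long compares;              // number of key compares performed

    private static boolean less(int x, int y)        { compares++; return x < y; }
    private static boolean lessOrEqual(int x, int y) { compares++; return x <= y; }

    // Quicksort on a[lo..hi]. If stopOnEqual is true, both scans stop on keys
    // equal to the partitioning key; otherwise both scans skip over them.
    static void sort(int[] a, int lo, int hi, boolean stopOnEqual) {
        if (hi <= lo) return;
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            if (stopOnEqual) {
                while (less(a[++i], v)) if (i == hi) break;
                while (less(v, a[--j])) if (j == lo) break;
            } else {
                while (lessOrEqual(a[++i], v)) if (i == hi) break;
                while (lessOrEqual(v, a[--j])) if (j == lo) break;
            }
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;   // partitioning item into place
        sort(a, lo, j - 1, stopOnEqual);
        sort(a, j + 1, hi, stopOnEqual);
    }

    // Count the compares used to sort n equal keys under the given policy.
    static long countCompares(int n, boolean stopOnEqual) {
        int[] a = new int[n];          // all keys are zero
        compares = 0;
        sort(a, 0, n - 1, stopOnEqual);
        return compares;
    }
}
```

Both policies sort correctly, but on equal keys they behave very differently. When the scans stop on equal keys, the scan indices meet near the middle and the subarrays are balanced, so the compare count grows roughly like N log N. When the scans skip equal keys, each partition peels off a single item and the count grows roughly like N^2. Comparing countCompares(2000, true) with countCompares(2000, false) shows the gap. Keep n modest in the skipping case, since the recursion depth is then proportional to n.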
Proposition.
Quicksort uses ~2 N ln N compares (and one-sixth that many exchanges) on the average to sort an array of length N with distinct keys.
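The compare count can be motivated by the standard recurrence argument, sketched here under the assumptions that partitioning a subarray of length N costs N + 1 compares and that each of the N keys is equally likely to be the partitioning item. The symbol C_N (average number of compares for N distinct keys) is notation introduced for this sketch.

```latex
% C_N: average number of compares to sort N distinct keys, with C_0 = C_1 = 0
C_N = N + 1 + \frac{1}{N} \sum_{k=1}^{N} \left( C_{k-1} + C_{N-k} \right)

% multiply by N, then subtract the same identity written for N - 1
N C_N - (N-1) C_{N-1} = 2N + 2 C_{N-1}

% rearrange and divide by N (N+1)
\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}

% telescope and approximate the harmonic sum by a logarithm
C_N \sim 2 (N+1) \left( \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{N+1} \right) \sim 2 N \ln N
```

Since 2 N ln N is about 1.39 N lg N, the average number of compares is only about 39 percent higher than in the best case, where every partition splits the array exactly in half.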
Proposition.
Quicksort uses ~N^2/2 compares in the worst case, but random shuffling protects against this case.
The standard deviation of the running time is about 0.65 N, so the running time tends to the average as N grows and is unlikely to be far from the average. The probability that quicksort will use a quadratic number of compares when sorting a large array on your computer is much less than the probability that your computer will be struck by lightning!
Improvements.
Quicksort was invented in 1960 by C. A. R. Hoare, and it has been studied and refined by many people since that time.
- Cutoff to insertion sort. As with mergesort, it pays to switch to insertion sort for tiny arrays. The optimum value of the cutoff is system-dependent, but any value between 5 and 15 is likely to work well in most situations.
- Median-of-three partitioning. A second easy way to improve the performance of quicksort is to use the median of a small sample of items taken from the array as the partitioning item. Doing so will give a slightly better partition, but at the cost of computing the median. It turns out that most of the available improvement comes from choosing a sample of size 3 (and then partitioning on the middle item).
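The two improvements above can be combined as in the following sketch. The class name QuickImproved, the helper medianOf3(), and the cutoff value of 10 are assumptions chosen for illustration (10 lies in the range suggested above). The sketch uses int keys and omits the random shuffle for brevity, so it does not carry the probabilistic guarantee of the shuffled version.

```java
// Hypothetical sketch; CUTOFF = 10 is an assumed value, not a tuned one.
final class QuickImproved {
    private static final int CUTOFF = 10;

    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {   // tiny subarray: use insertion sort
            insertionSort(a, lo, hi);
            return;
        }
        // Sample the first, middle, and last items; partition on their median.
        int m = medianOf3(a, lo, lo + (hi - lo) / 2, hi);
        exch(a, lo, m);
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j] < a[j - 1]; j--)
                exch(a, j, j - 1);
    }

    // Return the index of the median of a[i], a[j], and a[k].
    private static int medianOf3(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return (a[i] < a[k]) ? k : i;
        } else {
            if (a[k] < a[j]) return j;
            return (a[k] < a[i]) ? k : i;
        }
    }

    // Partition a[lo..hi] around a[lo]; return the final index of that item.
    private static int partition(int[] a, int lo, int hi) {
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void exch(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Exchanging the median into a[lo] lets the partitioning code stay unchanged. The best cutoff is system-dependent, so it is worth measuring a few values rather than trusting the constant used here.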
§ Visualization.
QuickBars.java visualizes quicksort with median-of-3 partitioning and cutoff for small subarrays.
§ Entropy-optimal sorting.
Arrays with large numbers of duplicate sort keys arise frequently in applications. In such applications, there is potential to reduce the time of the sort from linearithmic to linear.
One straightforward idea is to partition the array into three parts, one each for items with keys smaller than, equal to, and larger than the partitioning item's key. Accomplishing this partitioning was a classical programming exercise popularized by E. W. Dijkstra as the Dutch National Flag problem, because it is like sorting an array with three possible key values, which might correspond to the three colors on the flag.
Dijkstra's solution is based on a single left-to-right pass through the array that maintains a pointer lt such that the entries in a[lo..lt-1] are less than v (the partitioning item), a pointer gt such that the entries in a[gt+1..hi] are greater than v, and a pointer i such that the entries in a[lt..i-1] are equal to v and the entries in a[i..gt] are not yet examined.
Starting with i equal to lo, we process a[i] using the 3-way compare provided by the Comparable interface, handling the three possible cases:
- a[i] less than v: exchange a[lt] with a[i] and increment both lt and i
- a[i] greater than v: exchange a[i] with a[gt] and decrement gt
- a[i] equal to v: increment i
Quick3way.java is an implementation of this method.
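For readers without Quick3way.java at hand, the three cases can be sketched as follows. The class name Quick3waySketch and the helper exch() are assumptions made for this illustration. The shuffle uses the standard library idiom Collections.shuffle(Arrays.asList(a)), which rearranges an object array in place.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical sketch class; not the Quick3way.java referenced in the text.
final class Quick3waySketch {
    static <T extends Comparable<T>> void sort(T[] a) {
        Collections.shuffle(Arrays.asList(a));   // shuffles a[] in place
        sort(a, 0, a.length - 1);
    }

    private static <T extends Comparable<T>> void sort(T[] a, int lo, int hi) {
        if (hi <= lo) return;
        T v = a[lo];                   // partitioning item
        int lt = lo, i = lo, gt = hi;
        // Invariant: a[lo..lt-1] < v, a[lt..i-1] == v, a[gt+1..hi] > v,
        // and a[i..gt] is not yet examined.
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);  // a[i] less than v
            else if (cmp > 0) exch(a, i, gt--);    // a[i] greater than v
            else              i++;                 // a[i] equal to v
        }
        // Now a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi].
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static <T> void exch(T[] a, int i, int j) {
        T t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Items equal to the partitioning item are never touched again after the pass, which is why arrays with many duplicate keys sort quickly. In the greater-than case i is not incremented, because the item exchanged in from a[gt] has not yet been examined.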
Proposition.
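Quicksort with 3-way partitioning is entropy-optimal: the number of compares it uses is within a constant factor of the lower bound for any compare-based sort, a bound determined by the Shannon entropy of the key frequencies.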
§ Visualization.
Quick3wayBars.java visualizes quicksort with 3-way partitioning.