;; Randomized Quicksort ;; Quicksort has very unfortunate performance characteristics. It's ;; very fast on randomized data, and uses very little extra memory, ;; but if there's too much order in its input, it becomes a quadratic ;; algorithm which blows the stack. ;; Such inputs are very rare, but very easy to find (there's glory for you!) ;; Luckily there's an easy fix, which is why Quicksort's a very popular sorting algorithm. ;; Here's Quicksort as we left it: (defn swap [v i1 i2] (assoc v i2 (v i1) i1 (v i2))) (defn ^:dynamic partition-by-pivot [v start end] (loop [v v i (inc start) j (inc start)] ;; this extra loop just sets up the indices i and j (if (> j end) [(dec i) (swap v start (dec i))] (if (< (v start) (v j)) (recur v i (inc j)) (recur (swap v i j) (inc i) (inc j)))))) (defn ^:dynamic qsort ([v start end] (if (> (+ 1 start) end) v ;; short array, nothing to do (let [[index newv] (partition-by-pivot v start end) ;; otherwise partition leftsorted (qsort newv start (dec index))] ;; and sort the left half (qsort leftsorted (inc index) end))))) (defn quicksort [v] (let [vec (into [] v)] (qsort vec 0 (dec (count vec))))) ;; Potentially a very fast sorting algorithm, which apart from its recursion stack ;; needs no extra memory. ;; But although it reliably looks good on random input: (time (first (quicksort (shuffle (range 2048))))) "Elapsed time: 435.670347 msecs" ;; If you feed it an array which is already in order: (time (first (quicksort (range 2048)))) "Elapsed time: 2413.521649 msecs" ;; performance drops off horribly (time (first (quicksort (range 4096)))) ;; and the stack blows. ;; The underlying problem is that when the data has a structure which makes the pivots ;; far from the medians of the arrays to be partitioned, the recursion tree unbalances, and ;; rather than doing roughly log 2048=11 recursions, quicksort does 2048 recursions. ;; If only we could get the usual performance characteristics of quicksort on random data ;; on the sort of almost-sorted data that we often encounter in practice. ;; One way to do this, of course, would be to shuffle the data before sorting it. ;; The chances of hitting a bad case with properly shuffled data are astronomically small. ;; It would work, but isn't that an odd thing to do? And shuffling is just as difficult as sorting. ;; Turns out we don't have to. If we pick the pivot at random, then we get the same performance. ;; There are still bad cases, but if the random number generator is uncorrelated with the data to ;; be sorted, we'll never hit one in a million years. ;; How shall we do that? Turns out it's dead easy to frig ;; partition-by-pivot. We just swap the first element of the array ;; with a randomly chosen one before we pick the pivot: (defn ^:dynamic partition-by-pivot [v start end] (let [randomel (+ start (rand-int (- end start))) v (swap v start randomel)] ;; swap first element with randomly chosen one (loop [v v i (inc start) j (inc start)] (if (> j end) [(dec i) (swap v start (dec i))] (if (< (v start) (v j)) (recur v i (inc j)) (recur (swap v i j) (inc i) (inc j))))))) (time (first (quicksort (range 2048)))) "Elapsed time: 666.069407 msecs" ;; The bad cases are still around. But they're very hard to find now. ;; It occurs to me that one way to blow up randomized quicksort might ;; be to use a pseudo-random number generator and a well chosen ;; algorithm to shuffle the data, and then use the same PRNG to ;; quicksort it. So if you've got a clever adversary who knows what ;; you mean by random he should be able to make your quicksorts ;; explode. ;; We're something like a factor of twenty away from clojure's built in sort. ;; That's really not bad for an unoptimized algorithm. (time (first (sort (shuffle (range 2048))))) "Elapsed time: 33.509778 msecs" ;; It should be possible to speed our quicksort up quite a bit (especially if ;; we restrict ourselves to arrays of integers). ;; I leave this as an exercise for the reader.

## Thursday, July 11, 2013

### Randomized Quicksort

### Quicksort

;; Quicksort (shuffle (range 20)) ;; [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] (def original [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19]) ;; The fundamental notion in quicksort is partition-by-pivot ;; The first thing we need to do is to choose the 'pivot'. ;; The obvious choice is the first element of our array, 7 ;; So our fundamental operation will be to reorder the array so that everything less than ;; 7 comes before it, and everything greater than 7 comes after it. ;; It's a fair bet that the ability to swap elements will come in handy: (defn swap [v i1 i2] (assoc v i2 (v i1) i1 (v i2))) (swap [1 2 3] 0 1) ;-> [2 1 3] (swap original 1 19) ;-> [7 19 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 13] ;; Here's a function to partition-by-pivot, taking the first element of the array as ;; the pivot. ;; To understand what it's doing, notice that it leaves the pivot ;; alone at the beginning of the array, and maintains this pattern: ;; [ pivot | less than pivot | more than pivot | unscanned ] ;; in the array as it scans through: ;; Once the scan is ended we can put the pivot in its rightful place with a single swap. (defn ^:dynamic partition-by-pivot [v i j] (if (= j (count v)) ;; if the scan is ended (swap v 0 (dec i)) ;; put the pivot in the right place (if (< (v 0) (v j)) ;; otherwise compare elements (partition-by-pivot v i (inc j)) ;; and move the boundaries/swap elements as appropriate (partition-by-pivot (swap v i j) (inc i) (inc j))))) ;; to preserve the pattern ;; i and j are the indices of the two moving boundaries in the vector. ;; We don't need an index for the pivot since it stays at the front. ;; Actions having a way of speaking louder than words, so: (require 'clojure.tools.trace) (clojure.tools.trace/dotrace [partition-by-pivot] (partition-by-pivot original 1 1)) ;; TRACE t1304: (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 1) ;; TRACE t1305: | (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 2) ;; TRACE t1306: | | (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 1 3) ;; TRACE t1307: | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 4) ;; TRACE t1308: | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 5) ;; TRACE t1309: | | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 6) ;; TRACE t1310: | | | | | | (partition-by-pivot [7 2 12 13 15 14 16 1 3 0 9 18 17 10 11 8 4 6 5 19] 2 7) ;; TRACE t1311: | | | | | | | (partition-by-pivot [7 2 1 13 15 14 16 12 3 0 9 18 17 10 11 8 4 6 5 19] 3 8) ;; TRACE t1312: | | | | | | | | (partition-by-pivot [7 2 1 3 15 14 16 12 13 0 9 18 17 10 11 8 4 6 5 19] 4 9) ;; TRACE t1313: | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 10) ;; TRACE t1314: | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 11) ;; TRACE t1315: | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 12) ;; TRACE t1316: | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 13) ;; TRACE t1317: | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 14) ;; TRACE t1318: | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 15) ;; TRACE t1319: | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 14 16 12 13 15 9 18 17 10 11 8 4 6 5 19] 5 16) ;; TRACE t1320: | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 16 12 13 15 9 18 17 10 11 8 14 6 5 19] 6 17) ;; TRACE t1321: | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 12 13 15 9 18 17 10 11 8 14 16 5 19] 7 18) ;; TRACE t1322: | | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 5 13 15 9 18 17 10 11 8 14 16 12 19] 8 19) ;; TRACE t1323: | | | | | | | | | | | | | | | | | | | (partition-by-pivot [7 2 1 3 0 4 6 5 13 15 9 18 17 10 11 8 14 16 12 19] 8 20) ;; TRACE t1323: | | | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1322: | | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1321: | | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1320: | | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1319: | | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1318: | | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1317: | | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1316: | | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1315: | | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1314: | | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1313: | | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1312: | | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1311: | | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1310: | | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1309: | | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1308: | | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1307: | | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1306: | | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1305: | => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; TRACE t1304: => [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; Notice that in the result: ;; [ 5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19] ;; (which is much better ordered than the original) ;; the pivot element 7 is in its correct place (element 7) already. ;; It need never move again. ;; We'd now like to repeat the partition on the two parts of the array on either side of 7 ;; And to do this, we need to know the index where the pivot ended up. ;; Let's firstly modify our function so that it returns the final index of the pivot (defn ^:dynamic partition-by-pivot [v i j] (if (= j (count v)) [(dec i) (swap v 0 (dec i))] ;; just returning the index as well (if (< (v 0) (v j)) (partition-by-pivot v i (inc j)) (partition-by-pivot (swap v i j) (inc i) (inc j))))) (partition-by-pivot original 1 1) ;-> [7 [5 2 1 3 0 4 6 7 13 15 9 18 17 10 11 8 14 16 12 19]] ;; Now we want to partition-by-pivot only the array from 0 to 6, so ;; let's further modify our routine so that it can partition just a ;; part of the array (defn ^:dynamic partition-by-pivot [v start end] (loop [v v i (inc start) j (inc start)] ;; this extra loop just sets up the indices i and j (if (> j end) [(dec i) (swap v start (dec i))] (if (< (v start) (v j)) (recur v i (inc j)) (recur (swap v i j) (inc i) (inc j)))))) ;; So now we can partition subarrays with respect to their first elements: (partition-by-pivot [7 13 12 2 15 14 16 1 3 0 19 18 17 10 11 8 4 6 5 9] 0 19) ;-> [7 [5 2 1 3 0 4 6 7 13 15 19 18 17 10 11 8 14 16 12 9]] (partition-by-pivot ['_ 13 12 2 15 14 16 1 3 0 19 18 17 10 11 8 4 6 5 '_] 1 18) ;-> [12 [_ 5 12 2 1 3 0 10 11 8 4 6 13 15 14 16 19 18 17 _]] (partition-by-pivot ['_ '_ '_ '_ '_ 14 16 1 3 0 19 '_ '_ '_ '_ '_ '_ '_ '_ '_] 5 10) ;-> [8 [_ _ _ _ _ 0 1 3 14 16 19 _ _ _ _ _ _ _ _ _]] ;; Once we've got this function, it's obvious how to write quicksort: ;; To quicksort, partition by a pivot and then quicksort both halves (defn ^:dynamic qsort ([v start end] (if (> (+ 1 start) end) v ;; short array, nothing to do (let [[index newv] (partition-by-pivot v start end) ;; otherwise partition leftsorted (qsort newv start (dec index))] ;; and sort the left half (qsort leftsorted (inc index) end))))) ;; and then the right half ;; This makes the interface a bit more user-friendly (defn quicksort [v] (let [vec (into [] v)] (qsort vec 0 (dec (count vec))))) (quicksort original) ;-> [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 16 18 19] ;; ta-daaah... (clojure.tools.trace/dotrace [qsort partition-by-pivot] (quicksort original)) ;; So how fast is quicksort? ;; Well... (time (first (quicksort (shuffle (range 64))))) "Elapsed time: 4.112325 msecs" (time (first (quicksort (shuffle (range 128))))) "Elapsed time: 9.163108 msecs" (time (first (quicksort (shuffle (range 256))))) "Elapsed time: 32.184819 msecs" (time (first (quicksort (shuffle (range 1024))))) "Elapsed time: 135.90394 msecs" (time (first (quicksort (shuffle (range 2048))))) "Elapsed time: 298.712637 msecs" (time (first (quicksort (shuffle (range 4096))))) "Elapsed time: 733.392983 msecs" (time (first (quicksort (shuffle (range 8192))))) "Elapsed time: 1461.434799 msecs" (time (first (quicksort (shuffle (range 16384))))) "Elapsed time: 3530.573197 msecs" ;; But.... (time (first (quicksort (range 64)))) "Elapsed time: 4.902854 msecs" (time (first (quicksort (range 128)))) "Elapsed time: 14.469927 msecs" (time (first (quicksort (range 256)))) "Elapsed time: 36.480616 msecs" (time (first (quicksort (range 512)))) "Elapsed time: 146.244439 msecs" (time (first (quicksort (range 1024)))) "Elapsed time: 526.623253 msecs" (time (first (quicksort (range 2048)))) "Elapsed time: 2131.757814 msecs" (time (first (quicksort (range 4096)))) ;; stack overflow!! (time (first (quicksort (range 8192)))) ;; And also: (time (first (quicksort (reverse (range 64))))) "Elapsed time: 15.704443 msecs" (time (first (quicksort (reverse (range 128))))) "Elapsed time: 57.875302 msecs" (time (first (quicksort (reverse (range 256))))) "Elapsed time: 170.641883 msecs" (time (first (quicksort (reverse (range 512))))) "Elapsed time: 658.467259 msecs" (time (first (quicksort (reverse (range 1024))))) "Elapsed time: 2569.858778 msecs" (time (first (quicksort (reverse (range 2048))))) "Elapsed time: 11171.441353 msecs" (time (first (quicksort (reverse (range 4096))))) ;; stack overflow!! (time (first (quicksort (reverse (range 8192))))) ;; It looks as though it's linear-ish for random input, but quadratic (with a vast recursion depth) for at least some cases. ;; And those cases are easy to find!

Subscribe to:
Posts (Atom)