;; A futile design process ;; The other day, I found myself wanting a running average of an infinite sequence ;; Here is an example of such a sequence: (def integers (iterate inc 0)) ;; At first, I took the easy way out by just calculating it (defn runavg [sq] (fn [n] (/ (reduce + (take (inc n) sq)) (inc n)))) ;; So here is the fifth running average ((runavg integers) 5) ;; After a while, I had occasion to make a list of the running averages themselves (def runavgs (map (runavg integers) (iterate inc 0))) (take 5 runavgs) ;; this list proved very useful, but Schlemiel the Painter is at work here: (time (count (take 100 (map (runavg integers) (iterate inc 0))))) ;;12ms (time (count (take 1000 (map (runavg integers) (iterate inc 0))))) ;;1200ms ;; (time (count (take 10000 (map (runavg integers) (iterate inc 0))))) ;;120s ;; (time (count (take 100000 (map (runavg integers) (iterate inc 0))))) ;;3 hours ;; This wasn't a problem for my program as first written, but as I pushed it ;; further this became a major bottleneck, and so I decided to optimize the ;; running average to only look at each element once. (defn runavg ([lst] (runavg 0 0 lst)) ([sum count lst] (if (empty? lst) '() (let [newsum (+ sum (first lst)) newcount (inc count)] (cons (/ newsum newcount) (runavg newsum newcount (rest lst))))))) (runavg '(1 2 3 4 5)) ;; Of course, we can make this work for infinite sequences too just by wrapping ;; the function with lazyseq (defn runavg ([lst] (runavg 0 0 lst)) ([sum count lst] (lazyseq (if (empty? lst) '() (let [newsum (+ sum (first lst)) newcount (inc count)] (cons (/ newsum newcount) (runavg newsum newcount (rest lst)))))))) (take 5 (runavg integers)) ;; Performance, predictably is vastly improved. (time (count (take 100 (runavg integers))))) ;;0.3ms (time (count (take 1000 (runavg integers))))) ;;3ms (time (count (take 10000 (runavg integers))))) ;;30ms (time (count (take 100000 (runavg integers))))) ;;300ms ;; Now, in this new algorithm, we have an accumulator, which stores the sum of ;; the elements so far And another which stores the count so far, and this ;; automatically made me think of reduce: (reduce + (range 1 10)) ;;summing with reduce (reduce (fn[a x] (inc a)) (range 1 10)) ;;counting with reduce ;; Of course we can combine these two reductions (reduce (fn [[sum count] x] [(+ sum x) (inc count)]) [0 0] (range 1 10)) ;; But if we want a running average, as above, then we'd really like to see the ;; state of the accumulator at every step, and also, reduce will blow up on an ;; infinite sequence. ;; And so it occurred to me that I could write a lazy version of reduce, which I called ;; reduceseq (defn reduceseq [f acc sq] (if (empty? sq) (list acc) (let [nextacc (f acc (first sq))] (cons acc (reduceseq f nextacc (rest sq)))))) ;; Now, it's a simple matter to lazy this up: (defn reduceseq [f acc sq] (lazyseq (if (empty? sq) (list acc) (let [nextacc (f acc (first sq))] (cons acc (reduceseq f nextacc (rest sq))))))) ;; Here are some trivial reductions (reduceseq + 0 '(0 1 2 3 4 5)) ;; (0 1 3 6 10 15) (take 5 (reduceseq + 0 integers))) ;; (0 0 1 3 6) (take 5 (reduceseq * 1 (drop 1 integers))) ;; (1 1 2 6 24) ;; A new way to define the factorials is always pleasing, and the laziness makes ;; it pretty efficient! ;; Here's it working with the function and accumulator above (reduceseq (fn [[sum count] x] [(+ sum x) (inc count)]) [0 0] (range 5)) ;;([0 0] [0 1] [1 2] [3 3] [6 4] [10 5]) ;; And here's how we could use it to define runavg (defn runningaverage [sq] (map #(apply / %) (drop 1 (reduceseq (fn [[sum count] x] [(+ sum x) (inc count)]) [0 0] sq)))) (runningaverage (range 5)) ;; (0 1/2 1 3/2 2) (take 10 (runningaverage integers)) ;; (0 1/2 1 3/2 2 5/2 3 7/2 4 9/2) ;; So at this point, I was starting to feel pretty smug. ;; A new version of factorial, and a functional abstraction like map, filter, and reduce that ;; seemed to be more general than any of them, and adapted for infinite sequences besides. ;; I'd never seen anything like that before. ;; And it occurred to me that if I was going to generalize reduce, I'd better deal with ;; the case where the first value of the sequence is used as the initial accumulator too: (defn reduceseq ([f acc sq] (lazyseq (if (empty? sq) (list acc) (let [nextacc (f acc (first sq))] (cons acc (reduceseq f nextacc (rest sq))))))) ([f sq] (reduceseq f (first sq) (rest sq)))) ;; reduceseq should work as much like reduce as possible (reduce + 0 '(1 2 3 4 5)) ;; 15 (reduceseq + 0 '(1 2 3 4 5)) ;; (0 1 3 6 10 15) (reduce + '(1 2 3 4 5)) ;; 15 (reduceseq + '(1 2 3 4 5)) ;; (1 3 6 10 15) (reduce + 0 (range 4)) ;;6 (take 5 (reduceseq + 0 (iterate inc 0))) ;; (0 0 1 3 6) (reduce + 0 '()) ;; 0 (reduceseq + 0 '()) ;; (0) ;; So far so good, but I've violated the principle of least surprise here: (reduce + '()) ;; 0 (reduceseq + '()) ;; (nil) ;; Here's a regression test that will allow me to mess about with the function ;; while preserving its good behaviours (defn testreduceseq [reduceseq] (cond (not= (reduceseq + 0 '(1 2 3 4 5)) '(0 1 3 6 10 15)) "fail" (not= (reduceseq + '(1 2 3 4 5)) '(1 3 6 10 15)) "fail" (not= (take 5 (reduceseq + 0 (iterate inc 0))) '(0 0 1 3 6)) "fail" (not= (reduceseq + 0 '()) '(0)) "fail" (not= (reduceseq + '(1)) '(1)) "fail" (not= (reduceseq + 1 '(1)) '(1 2)) "fail" (not= (nth (reduceseq + (iterate inc 0)) 10000) (* 1/2 10000 10001)) "fail" (not= (reduceseq #(assoc %1 %2 (inc (get %1 %2 0))) {} "aab") '({} {\a 1} {\a 2} {\b 1, \a 2})) "fail" (not= (reduceseq + '()) '(0)) "fix me!!" :else "pass")) (testreduceseq reduceseq) ;; We can slightly shorten the function by noticing that we add the accumulator to ;; the sequence whether the sequence is empty or not. (defn reduceseq ([f acc sq] (lazyseq (cons acc (if (empty? sq) '() (let [nextacc (f acc (first sq))] (reduceseq f nextacc (rest sq))))))) ([f sq] (reduceseq f (first sq) (rest sq)))) (testreduceseq reduceseq) ;; It's for some reason considered better style with the lazy functions to use whenlet and seq ;; instead of if and empty? ;; If we swap the arms of the if statement: (defn reduceseq ([f acc sq] (lazyseq (cons acc (iflet [sq (seq sq)] (let [nextacc (f acc (first sq))] (reduceseq f nextacc (rest sq))) '())))) ([f sq] (reduceseq f (first sq) (rest sq)))) (testreduceseq reduceseq) ;; and then notice that (cons a (iflet b c '())) ;; is equivalent to (cons a (whenlet b c)) ;; then we can shorten the function yet further, and put it into this 'house style' (defn reduceseq ([f acc sq] (lazyseq (cons acc (whenlet [sq (seq sq)] (reduceseq f (f acc (first sq)) (rest sq)))))) ([f sq] (reduceseq f (first sq) (rest sq)))) (testreduceseq reduceseq) ;; This actually looks a bit the wrong way round to me, but it's shorter than ;; the original, so it's probably just a matter of getting used to this style in ;; the same way that I'm used to the SICP style of putting the empty case first. ;; Having put it into the house style, we can fix the bug with the empty list ;; and no accumulator: (defn reduceseq ([f acc sq] (cons acc (lazyseq (whenlet [sq (seq sq)] (reduceseq f (f acc (first sq)) (rest sq)))))) ([f sq] (iflet [sq (seq sq)] (reduceseq f (first sq) (rest sq)) (list (f))))) (testreduceseq reduceseq) ;; And as a final flourish, I thought it would be nice to use the destructuring notation (defn reduceseq ([f acc sq] (cons acc (lazyseq (whenlet [[head & tail] (seq sq)] (reduceseq f (f acc head) tail))))) ([f sq] (iflet [[head & tail] (seq sq)] (reduceseq f head tail) (list (f))))) (testreduceseq reduceseq) ;; That works fine, but actually, the function look more 'noisy' like that than before, ;; so I prefer the penultimate version. ;; Feeling pretty happy with this, I thought about it for a bit, and decided that ;; "reductions" would be a better name for reduceseq ;; So my final version was: (defn reductions ([f acc sq] (cons acc (lazyseq (whenlet [sq (seq sq)] (reductions f (f acc (first sq)) (rest sq)))))) ([f sq] (iflet [sq (seq sq)] (reductions f (first sq) (rest sq)) (list (f))))) "WARNING: reductions already refers to: #'clojure.core/reductions in namespace: user, being replaced by: #'user/reductions" ;; Here's the source of the standard version, which was apparently added in version 1.2: (defn reductions "Returns a lazy seq of the intermediate values of the reduction (as per reduce) of coll by f, starting with init." {:added "1.2"} ([f coll] (lazyseq (iflet [s (seq coll)] (reductions f (first s) (rest s)) (list (f))))) ([f init coll] (cons init (lazyseq (whenlet [s (seq coll)] (reductions f (f init (first s)) (rest s))))))) ;; I suppose that there are some positives to take away from this: ;; At least I know that I am thinking along roughly the same lines as the designers ;; of Clojure.... ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Footnote ;; Why doesn't this work any more? ;; (use 'clojure.contrib.trace) ;; (dotrace (reduceseq) (take 5 (reduceseq + 0 '(1 2 3 4 5))))
Blog Archive

▼
2010
(44)

▼
September
(15)
 Clojure 1.3 : First Impression : Clojure vs Scala ...
 Clojure 1.3 alpha release : A pom.xml to try it ou...
 Macros and Type Hints: Metadata and the Medusa.
 Clojure Faster than Machine Code?
 Graphics like it's 1979: How to get clojure to act...
 An Astonishing Macro of Narayan Singhal
 Clojure Swank Server is Insecure by Default : Use ...
 Clojure is Fast
 Clojure Macro Tutorial (Part III: Syntax Quote)
 Clojure Macro Tutorial (Part II: The Compiler Stri...
 Clojure Macro Tutorial (Part I, Getting the Compil...
 defn vs defmacro: What is a macro? Why is it diffe...
 reduceseq: An Exercise in Laziness and Futility
 Learn and Use the Standard Library
 Understanding RSA Public Key Encryption with Clojure

▼
September
(15)
Monday, September 6, 2010
reduceseq: An Exercise in Laziness and Futility
Subscribe to:
Post Comments (Atom)
(runningaverage (range 5)) ;; (0 1/2 1 3/2 2)
ReplyDelete(map #(/ % 2) (range 5)) ;; (0 1/2 1 3/2 2)
;;i think you actually want
(runningaverage (range 5)) ;; (0 1/2 2/3 3/4 4/5)
maybe i'm misunderstanding what you want out of your running average but to me a running average would be the average of all numbers in the sequence up to the nth number.
Hi Brandon, thanks for the comment!
ReplyDeleteI was trying to calculate something like this:
sequence : 0 1 2 3 4 5 6
running sum : 0 1 3 6 10 15 21
element count : 1 2 3 4 5 6 7
quotients : 0 1/2 3/3 6/4 10/5 15/6 21/7
simplified : 0 1/2 1 3/2 2 5/2 3
(It's roughly the area of a triangle over the base of the triangle, x>1/2x*x/x)
How did you get yours?
whoops. i messed up my logic. i guess i was just confused that the running average of integers is simply n/2
ReplyDeleteThat's crazy, your final function turned out to be identical to the version in Clojure. Just a couple different var names and your signatures are in reverse order  everything else is the same.
ReplyDeleteGiven how flexible Clojure is, it's interesting that there's a clearly idiomatic way that emerges.