Search This Blog

Tuesday, February 5, 2019

The Unexpected Appearance of Schlemiel, the Painter


;; The Unexpected Appearance of Schlemiel, the Painter

;; So, it is clear that my model of how things are done was badly broken

;; I am doing some statistics, one day, and so I define:

;; the average of a finite sequence
(defn average [sq] (/ (reduce + sq) (count sq)))

;; and the square of a number
(defn square [x] (* x x))

;; and a way of forgetting about all the fiddly little digits at the end
(defn twosf   [x]  (float (/ (Math/round (* x 100.0)) 100))) 

;; but for the variance I am a little torn between:
(defn variance-one [sq]
  (let [av (average sq)]
    (average (map #(square (- % av)) sq))))

;; ;

(defn variance-two [sq]
  (let [sqdiff #(square (- % (average sq)))]
    (average (map  sqdiff sq))))

;; and (I have a regrettable weakness for the terse...) 
(defn variance-one-liner [sq] (average (map #(square (- % (average sq))) sq)))

;; but what I am not expecting, is this: 

(let [s (repeatedly 1000 #(rand))]
  (twosf (reduce + s)) ;; just to force the sequence to be generated before timing things
  [(time (twosf (reduce + s)))
   (time (twosf (average  s)))
   (time (twosf (variance-one s)))
   (time (twosf (variance-two s)))
   (time (twosf (variance-one-liner s)))])

;; "Elapsed time: 0.535715 msecs"
;; "Elapsed time: 0.834523 msecs"
;; "Elapsed time: 1.417108 msecs"
;; "Elapsed time: 251.650722 msecs"
;; "Elapsed time: 248.196331 msecs"
;; [496.83 0.5 0.09 0.09 0.09]


;; It seems that all these functions are correct, in the sense that they are producing
;; correct-looking answers, and yet:

;; It seems that variance-one is doing what I expect, running down the sequence twice and ending up
;; taking about twice as long as averaging it.

;; But that the other two are taking hundreds of times longer, possibly because they are
;; re-calculating the average of the sequence every time.

;; I had a nice hour or so, thinking about what was going on here, and why, and wonder if you might
;; enjoy the same thoughts, dear readers.

2 comments:

  1. You're mapping a function that contains average over sq, so it's recomputing the average multiple times, I suspect. Theoretically, the compiler should be able to figure out that average is a pure function with no side effects and only perform that calculation once, and is thus constant for the same inputs, but apparently it isn't doing that. In variance-one you're forcing that calculation into the av temporary, so it's only computed once. Ideally, the compiler would be smart enough to transform variance-two into variance-one automatically.

    ReplyDelete
  2. Dave Roberts is right:
    ```
    user=> (defn variance-two-2 [sq]
    #_=> (let [avg (average sq)
    #_=> sqdiff #(square (- % avg))]
    #_=> (average (map sqdiff sq))))
    user=> (let [s (repeatedly 1000 #(rand))]
    #_=> (twosf (reduce + s)) (time (twosf (variance-two-2 s))))
    "Elapsed time: 0.8499 msecs"
    0.08

    user=> (defn variance-one-liner-2 [sq] (let [avg (average sq)] (average (map #(square (- % avg)) sq))))
    user=> (let [s (repeatedly 1000 #(rand))]
    #_=> (twosf (reduce + s)) (time (twosf (variance-one-liner-2 s))))
    "Elapsed time: 1.070023 msecs"
    0.08
    ```

    ReplyDelete

Followers