Search This Blog

Tuesday, February 5, 2019

The Unexpected Appearance of Schlemiel, the Painter

;; The Unexpected Appearance of Schlemiel, the Painter

;; I was doing some statistics one day, and I defined:

;; the average of a finite sequence
(defn average [sq] (/ (reduce + sq) (count sq)))

;; and the square of a number
(defn square [x] (* x x))

;; and a way of forgetting about all the fiddly little digits at the end
(defn twosf   [x]  (float (/ (Math/round (* x 100.0)) 100))) 

;; but for the variance I was a little torn between:
(defn variance-one [sq]
  (let [av (average sq)]
    (average (map #(square (- % av)) sq)))) 
;; and 
(defn variance-two [sq]
  (let [sqdiff #(square (- % (average sq)))]
    (average (map  sqdiff sq))))

;; and (I have a regrettable weakness for the terse...) 
(defn variance-one-liner [sq] (average (map #(square (- % (average sq))) sq)))

;; but I was surprised when I noticed this: 

(let [s (repeatedly 1000 #(rand))]
  (twosf (reduce + s)) ;; just to force the sequence to be generated before timing things
  [(time (twosf (reduce + s)))
   (time (twosf (average  s)))
   (time (twosf (variance-one s)))
   (time (twosf (variance-two s)))
   (time (twosf (variance-one-liner s)))])

;; "Elapsed time: 0.535715 msecs"
;; "Elapsed time: 0.834523 msecs"
;; "Elapsed time: 1.417108 msecs"
;; "Elapsed time: 251.650722 msecs"
;; "Elapsed time: 248.196331 msecs"
;; [496.83 0.5 0.09 0.09 0.09]

;; It seems that all these functions are correct, in the sense that they are producing
;; correct-looking answers, and yet one of them is orders of magnitude faster.

;; What is going on here, and why?


  1. You're mapping a function that contains average over sq, so it's recomputing the average multiple times, I suspect. Theoretically, the compiler should be able to figure out that average is a pure function with no side effects and only perform that calculation once, and is thus constant for the same inputs, but apparently it isn't doing that. In variance-one you're forcing that calculation into the av temporary, so it's only computed once. Ideally, the compiler would be smart enough to transform variance-two into variance-one automatically.

  2. Dave Roberts is right:
    user=> (defn variance-two-2 [sq]
    #_=> (let [avg (average sq)
    #_=> sqdiff #(square (- % avg))]
    #_=> (average (map sqdiff sq))))
    user=> (let [s (repeatedly 1000 #(rand))]
    #_=> (twosf (reduce + s)) (time (twosf (variance-two-2 s))))
    "Elapsed time: 0.8499 msecs"

    user=> (defn variance-one-liner-2 [sq] (let [avg (average sq)] (average (map #(square (- % avg)) sq))))
    user=> (let [s (repeatedly 1000 #(rand))]
    #_=> (twosf (reduce + s)) (time (twosf (variance-one-liner-2 s))))
    "Elapsed time: 1.070023 msecs"