;; Is Clojure Still Fast ? ;; Once upon a time I wrote a blog post saying that clojure was fast. It still is, and optimizing ;; it is now much easier than it used to be, but it doesn't seem to be *quite* as fast as it once ;; was. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; I optimized a program to solve a differential equation in a ;; simpleminded way: ;; The equation was: dy/dt = f(t,y) where f(t, y) = ty and y=0 when t=0 ;; ( the exact solution is: y=e^(t)+t1 ) ;; Here's a program to solve it using Euler's method (defn f [t y] ( t y)) (defn solveit [t0 y0 h its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (f t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) ;; For every go round the loop we have to: ;; compare its with 0, ;; branch depending on the result, ;; add t0 to h, ;; call f with t0 and y0, ;; multiply h and the result, ;; add that to y0, ;; jump. ;; So if this was an assembly language program that worked the way ;; you'd expect, each loop would take 7 cycles. ;; According to /proc/cpuinfo, my netbook runs at 1.662 GHz. (def *cpuspeed* 1.662) ;; We care about how many cycles each iteration of the solver takes: (defmacro cyclesperit [expr its] `(let [start# (. System (nanoTime)) ret# ( ~@expr (/ 1.0 ~its) ~its ) finish# (. System (nanoTime))] (int (/ (* *cpuspeed* ( finish# start#)) ~its)))) ;; With the program as written, this estimate turns out to have been a little optimistic. ;; The figures in the original post were on a desktop machine that I no longer have, which was more ;; powerful than my current netbook. ;; To two significant figures, the results of the timing expression (cyclesperit (solveit 0.0 1.0) 1000000) ;; are: ;; On my old desktop with clojure 1.2 : 2300 cycles ;; On my netbook with clojure 1.2 : 2800 cycles ;; On my netbook with clojure 1.3 : 2500 cycles ;; On my netbook with clojure 1.4 : 2400 cycles ;; So it looks like my netbook is not only slower in clock speed than my desktop was, but also in ;; terms of cycles/iteration. That's not surprising as the netbook has an Atom processor, optimized ;; for low power rather than for speed. ;; But it also looks as though clojure has been speeding up slightly, which has almost made up for that. ;; I'm also assuming that the JVM itself hasn't changed much since the original blog post. The ;; netbook timings were all done today on the same JVM, but the desktop timings are from a while ;; back, so that might account for some differences. ;; So we're looking at a slowdown of about 300 times over what we could probably ;; achieve coding in assembler or in C with a good optimizing compiler (and of ;; course I'm assuming that floating point operations take one cycle each) ;; This is about the sort of speed that you'd expect from a dynamic language ;; without any optimization or type hinting. ;; In the original blog post I went through a number of faster versions: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; In solveit2 I explicitly typed the loop variables, inlined the function call, and made an ;; internal target for recur. That speeded things up considerably: (defn solveit2 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) ;; Let's time that and see how it goes: (cyclesperit (solveit2 0.0 1.0) 10000000) ;; On my old desktop with clojure 1.2 : 490 cycles ;; On my netbook with clojure 1.2 : 600 cycles ;; On my netbook with clojure 1.3 : 44 cycles ;; On my netbook with clojure 1.4 : 44 cycles ;; Wow! ;; Again we see that my netbook is around 20% slower, but the performance change between clojure 1.2 ;; and clojure 1.3 was incredible. ;; There's now about a factor of 6 between the clojure version and the imaginary assembler program ;; on the imaginary cpu in my head. And all I've done is to declare the types and inline the functions. ;; This is actually so impressive that I want to examine the three things in detail, to see how the ;; three things combine to cause the speedup. As I remember from clojure 1.2, you needed all three ;; changes to see any great difference. In 1.4 you can type hint separately from making an internal loop target. ;; All timings with clojure 1.4.0. (defn f [t y] ( t y)) (defn solveit [t0 y0 h its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (f t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit 0.0 1.0) 10000000) ; 2481 ;; type hints: (defn ^double tf [^double t ^double y] ( t y)) (defn solveit11 [^double t0 ^double y0 ^double h ^long its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h (tf t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit10 0.0 1.0) 10000000) ; 289 ;; Inline f (defn solveit12 [ ^double t0 ^double y0 ^double h ^long its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])) (cyclesperit (solveit12 0.0 1.0) 10000000) ; 44 ;; Internal loop target (defn solveit13 [ ^double t0 ^double y0 ^double h ^long its] (loop [t0 t0, y0 y0, h h, its its] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit13 0.0 1.0) 10000000) ; 44 ;; Original version (defn solveit2 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its 0) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) ;; Let's time that and see how it goes: (cyclesperit (solveit2 0.0 1.0) 10000000) ; 44 ;; This is pretty awesome. Type hints alone are giving us a factor of eight speedup, and inlining ;; the function then gives us another factor of 6. The internal loop target, which was originally a ;; bit of a hack to allow clojure to deduce types doesn't make any difference to the hinted version, ;; but in fact that still works, and gives the compiler enough information to get the same speed. ;; So far I really could not be more impressed. ;; In the original post, I then used a profiler to find out where the loop was running slow, and ;; did some strange things to make it fast: (defn solveit3 [t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (int its)] (if (> its (int 0)) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit3 0.0 1.0) 10000000) ;; On my old desktop with clojure 1.2 : 70 cycles ;; On my netbook with clojure 1.2 : 90 cycles ;; On my netbook with clojure 1.3 : 100 cycles ;; On my netbook with clojure 1.4 : 100 cycles ;; Originally, that (int 0) instead of 0 in the comparison made all the difference, causing a huge ;; and unexpected speedup. ;; The odd thing here is that the same program actually runs slightly slower in clojure 1.3/1.4 than ;; it does in clojure 1.2. Not much, and they're all considerably slower than clojure 1.4's best ;; shot so far. ;; What I think is happening here is that that int cast is taking up unnecessary time in the later clojures, ;; whereas in 1.2 it was the final piece of the puzzle as far as the compiler was concerned. ;; To test this: (defn solveit3a [ t0 y0 h its] (loop [t0 (double t0), y0 (double y0), h (double h), its (long its)] (if (> its (long 0 )) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its]))) (cyclesperit (solveit3a 0.0 1.0) 10000000) ; 44 ;; solveit3a runs at the same 44 cycles/iteration as we've been seeing before. I get the ;; impression that modern clojure prefers longs to ints. ;; The final craziness was to take the (int 0) out of the loop entirely: (defn solveit4 [t0 y0 h its] (let [zero (int 0)] (loop [t0 (double t0) y0 (double y0) h (double h) its (int its)] (if (> its zero) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])))) (cyclesperit (solveit4 0.0 1.0) 100000000) ;; And this is where it gets weird: ;; On my old desktop with clojure 1.2 : 23 cycles ;; On my netbook with clojure 1.2 : 32 cycles ;; On my netbook with clojure 1.3 : 45 cycles ;; On my netbook with clojure 1.4 : 45 cycles ;; Clojure 1.3/1.4 is now running at its usual (fast) speed, despite the ints rather than longs in solveit4 ;; But Clojure 1.2 on this program is able to run about 33% faster than the modern versions. ;; So it actually looks as though whatever awesomeness has caused the phenomenal speedup between ;; clojure 1.2 and 1.3 has actually slightly slowed down the optimized version. ;; I'm imagining that this is something to do with using longs rather than ints. ;; But I haven't done any kind of close profiling to see if I can make 1.4 run any faster than 44 cycles/loop. ;; Someone in a comment on my earlier post said that they were seeing 8 cycles/second on their ;; Macintosh, so it might just be that there's something weird about the Atom processor. ;; At any rate, it looks as though optimizing numeric code in Clojure is now dead easy. You just ;; tell it the types, inline function calls, and that's as good as it gets. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; For those in real need of cycles and willing to take risks to save them, there's: (set! *uncheckedmath* true) ;; Which I don't think has an equivalent in 1.2, which cuts a few cycles off: (defn solveit4 [t0 y0 h its] (let [zero (long 0)] (loop [t0 (double t0) y0 (double y0) h (double h) its (long its)] (if (> its zero) (let [t1 (+ t0 h) y1 (+ y0 (* h ( t0 y0)))] (recur t1 y1 h (dec its))) [t0 y0 h its])))) (cyclesperit (solveit4 0.0 1.0) 100000000) ; 37 ;; I do wonder what is going on here. I'm not sure what checking is here to be turned off. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Incidentally, the following two java programs also produce the numbers that are seen above: ;; public class Euler{ ;; public static void main (String[] args){ ;; double cpuspeed = 1.662; ;; int its = 10000000; ;; double t=0; ;; double y=0; ;; double h=1.0/its; ;; Long start = System.nanoTime(); ;; for(int i = 0; i < its; i++){ ;; y = y+h*(ty); ;; t = t+h; ;; } ;; Long finish = System.nanoTime(); ;; System.out.println("y=" + y + " t=" +t); ;; System.out.println("cycles/iteration: " + ((int) ((cpuspeed * (finish  start)) / its))); ;; } ;; } ;; y=0.3678794227282174 t=0.99999999975017 ;; cycles/iteration: 32 ;; public class Euler{ ;; public static void main (String[] args){ ;; double cpuspeed = 1.662; ;; long its = 10000000; ;; double t=0; ;; double y=0; ;; double h=1.0/its; ;; long start = System.nanoTime(); ;; for(long i = 0; i < its; i++){ ;; y = y+h*(ty); ;; t = t+h; ;; } ;; long finish = System.nanoTime(); ;; System.out.println("y=" + y + " t=" +t); ;; System.out.println("cycles/iteration: " + ((int) ((cpuspeed * (finish  start)) / its))); ;; } ;; } ;; y=0.3678794227282174 t=0.99999999975017 ;; cycles/iteration: 37
Blog Archive

▼
2013
(52)

▼
February
(10)
 Rerum Cognoscere Causas
 Clojure's Reader is Unsafe
 Linear Congruential Random Number Generators
 Packing and Unpacking Numbers : Deterministic Rand...
 Clojure Emacs Eval Paste Keyboard Macros / Generat...
 Mathematics as an Experimental Science
 A Naive Bayesian Classifier
 Clojure is Fast / Is Clojure Still Fast? / Clojure...
 FOR vs MAP
 Runtime Require: Downloading a New Dependency and ...

▼
February
(10)
Friday, February 8, 2013
Clojure is Fast / Is Clojure Still Fast? / Clojure is as Fast as Java
Subscribe to:
Post Comments (Atom)
Running on a MBA (Core i5 1.8 GHz) on Clojure 1.4 and Java (build 1.7.0_05b06). Only chang was setting cpuspeed to 1.8.
ReplyDeleteI got 424 on my first run of the unoptimized version, successive runs drop to a steady state of 103. Killing the nrepl started me at 257 and drops to same steady state.
Also ran solveit4 wuthout setting unchecked math, 11 on the initial run dropping to 7 steady state. yay.
Your benchmarking is too simplistic. https://github.com/hugoduncan/criterium or better yet : http://www.azulsystems.com/presentations/artofjavabenchmarking
ReplyDeleteThe newer versions of Clojure have moved to 64bit arithmetic for good, which is why `int` is giving speedup in 1.2 for you. Now, if your post was annotated with line numbers I could mention those in my comment. Another thing to note is when you run the code with JVM arg "server" it automatically JITinlines functions doing primitive arithmetic stuff, so let the JVM warm up for a while before measuring anything. Criterium is the right tool. Nice runthrough though.
ReplyDelete