Learning Clojure: 2011

Thursday, September 22, 2011

Clojure Setup Tutorial with EMACS: clojure, clojure.contrib, swank, slime, maven and maven-clojure-plugin in a couple of seconds

As of 22nd September 2011, this is still working fine, this time on Fedora 14 (substituting yum for apt-get, but that's the only difference)

------------------------------------------------------------------------------------------------------------------------------------

I've just (15th February 2011) had to set this up again for someone on an Ubuntu 10.10 box.

And to my speechless amazement (16th February 2011), it worked absolutely the same way on Radek Ostrowski (a complete stranger) 's mac at dev8d, thus condemning him to years of suffering at the hands of Aquamacs.

This method is still my favourite, still the one I actually use in practice, and still the easiest, and it's been stable for about a year now.

Since I've just confirmed that it still works, I've moved it back to the top.

Install maven:

$sudo apt-get install maven2

Then create a pom.xml file to tell maven which repositories to use. There's an example below to copy and paste.

Once you've got maven installed, and made a pom.xml file, then:

$mvn clojure:repl

Will start a REPL with clojure 1.2 and clojure-contrib 1.2 on the classpath. The REPL should use jLine to give it editing and history on all platforms.

If you like to use EMACS instead, then

$mvn clojure:swank

Will start a swank server, which you can connect to with M-x slime-connect.

That's it from the clojure side.

To set up emacs, and equip it to talk to the clojure swank server:

$sudo apt-get install emacs

Then install the emacs lisp package archive (see http://tromey.com/elpa/) by evaluating this code:

(cut and paste it into the scratch buffer, put the cursor in the middle, and use M-C-x):

--emacs lisp to install elpa---------------

(let ((buffer (url-retrieve-synchronously
        "http://tromey.com/elpa/package-install.el")))
  (save-excursion
    (set-buffer buffer)
    (goto-char (point-min))
    (re-search-forward "^$" nil 'move)
    (eval-region (point) (point-max))
    (kill-buffer (current-buffer))))

---end of emacs lisp to install elpa-------------

Now use M-x package-list-packages to bring up the list of packages, and use i and then x to mark and then install slime, slime-repl, and clojure-mode.

Then connect emacs to the already running clojure image with M-x slime-connect

That should be it. You should now be at a running clojure 1.2 repl inside emacs.

Here's an example of a pom.xml file that pulls in clojure and clojure-contrib 1.2 . Just cut and paste it.

As well as the essentials, I've also added: the clojars.org repository, where many useful clojure packages live; the maven versions plugin, which helps with keeping everything cutting edge; and jline so that command line repls work better (mvn clojure:repl)

I like to have my startup repls conditioned a little, so if you have a startup script that you always want to run, add this snippet

<replScript>startup.clj</replScript>

</configuration>

to the clojure-maven-plugin section so that when maven starts a repl, the code in startup.clj is loaded as the first action. This is a good place to set print-length and print-level, so that they will be set before the swank server starts, which means that you won't hang emacs by evaluating an infinite sequence.

I also like to require everything on the classpath, so that I can use things like find-doc to find out about everything.

pom.xml files look terrifying, but they're really not. It's just that xml is such a godawful verbose way to write things out.

It contains: the addresses of three repositories which hold vital code; the names and versions of four vital jar files that you need; and the names and versions of two helpful maven plugins.

pom.xml

<project>

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>hello-maven-clojure-swank</artifactId>

  <version>1.0-SNAPSHOT</version>
  <name>hello-maven</name>
  <description>maven, clojure, emacs: together at last</description>

  <repositories>
    <repository>
      <id>clojars</id>
      <url>http://clojars.org/repo/</url>
    </repository>
    <repository>
      <id>clojure</id>
      <url>http://build.clojure.org/releases</url>
    </repository>
    <repository>
      <id>central</id>
      <url>http://repo1.maven.org/maven2</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure-contrib</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
      <version>0.9.94</version>
    </dependency>
    <dependency>
      <groupId>swank-clojure</groupId>
      <artifactId>swank-clojure</artifactId>
      <version>1.2.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
         <groupId>com.theoryinpractise</groupId>
            <artifactId>clojure-maven-plugin</artifactId>
         <version>1.3.3</version>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
          <artifactId>versions-maven-plugin</artifactId>
        <version>1.2</version>
      </plugin>
    </plugins>

  </build>

</project>

Sunday, May 29, 2011

Numerical Integration: Better Refinements?

;; Numerical Integration: Better Refinements?

;; Here are some very simple functions which we might want to test integration
;; methods on:
(defn square  [x] (* x x))
(defn sine    [x] (Math/sin x))
(defn step    [x] (if (< x 1/2) 0.0 1.0))
(defn inverse [x] (/ x))

;; Here are some Newton-Cotes formulae for approximate integration:

(defn trapezium-rule [f a b]
  (* 1/2 (- b a) (+ (f a) (f b))))

(defn simpson-rule [f a b]
  (let [midpoint (+ a (/ (- b a) 2))]
    (* 1/6 (- b a) (+ (f a) (* 4 (f midpoint)) (f b)))))

(defn simpson38-rule [f a b]
  (let [midpoint1 (/ (+ a a b) 3)
        midpoint2 (/ (+ a b b) 3)]
    (* 1/8 (- b a) (+ (f a) (* 3 (f midpoint1)) (* 3 (f midpoint2)) (f b)))))

(defn booles-rule [f a b]
  (let [midpoint1 (/ (+ a a a b) 4)
        midpoint2 (/ (+ a a b b) 4)
        midpoint3 (/ (+ a b b b) 4)]
    (* 1/90 (- b a) (+ (* 7 (f a)) (* 32 (f midpoint1)) (* 12 (f midpoint2)) (* 32 (f midpoint3)) (* 7 (f b))))))

;; We can use any of these rules to get estimate of the integral of a function over an interval:
(simpson-rule inverse 1 3) ; 10/9

;; If we halve the interval and use the rule over both halves, then we can use the rule to
;; get a better estimate by adding the estimates for the half-intervals
(+
 (simpson-rule inverse 1 2)
 (simpson-rule inverse 2 3)) ; 11/10

;; We can guess at the error involved in the estimate by taking the difference
;; between these two estimates, on the basis that splitting the interval usually
;; makes most of the error go away.

(- (simpson-rule inverse 1 3)
   (+
    (simpson-rule inverse 1 2)
    (simpson-rule inverse 2 3))) ; 1/90

;; So we'd expect that the first estimate is out by a bit more than 1/90, and
;; that the second is out by rather less than 1/90

;; For the inverse function, which can be integrated symbolically, we know the
;; true answer:
(- (Math/log 3) (Math/log 1)) ; 1.0986122886681098
(/ 10.0 9) ; 1.1111111111111112
(/ 11.0 10) ; 1.1

;; So the errors are really:
(- 1.0986122 10/9)  ; -0.0124989111111109  ; which is ~ 1/90
(- 1.0986122 11/10) ; -0.00138780000000005 ; which is ~ 1/900

;; This method of guessing the error is deeply suspect, and can go wrong, but I
;; won't go into details.

;; I think it's good enough for our purposes as long as the functions we want to
;; integrate are reasonably well behaved and we take small enough intervals.

;; So we can easily make a function which gives us the more refined of the two
;; estimates, together with a guess as to how close it is to the truth.
(defn approx-with-error[rule f a b]
  (let [guess (rule f a b)
        midpoint (/ (+ a b) 2)
        better-guess (+ (rule f a midpoint) (rule f midpoint b))
        error-estimate (- guess better-guess)
        abs-error-estimate (if (> error-estimate 0) error-estimate (- error-estimate))]
    [better-guess abs-error-estimate]))


;; Let's try it out on a few cases, on the particularly nasty integral of 1/x over [0.01,100]

;; This is the true answer
(- (Math/log 100) (Math/log 0.01)) ; 9.210340371976184

(approx-with-error trapezium-rule inverse 0.01 100) ; [2500.999775019998 2499.000174980002]
;; We guess 2500, and we think we're out by at most 2499, which is just true
(approx-with-error simpson-rule inverse 0.01 100) ; [835.4437770204454 832.5559396728856]
(approx-with-error simpson38-rule inverse 0.01 100) ; [627.4427811912234 624.2442845054817]
(approx-with-error booles-rule inverse 0.01 100) ; [391.7824297523125 388.1576179566068]

;; When we split the interval into two halves [0.01, 50.05] [50.05,100]
(approx-with-error trapezium-rule inverse 0.01 50.05) ; [1252.2495505293746 1250.2503495705255]
(approx-with-error trapezium-rule inverse 50.05 100) ; [0.7072645364881702 0.041486462512828726]

;; Our guess tells us that the great majority of the error is in the first sub interval
;; We might want to refine that first, before bothering with the other one:
;; We'll now split [0.01, 25.025][25.025,50.05][50.05,100]
(approx-with-error trapezium-rule inverse 0.01 25.025) ; [626.6241012183343 624.6256989814656]
(approx-with-error trapezium-rule inverse 25.025 50.05) ; [0.7083333333333333 0.04166666666666663]
(approx-with-error trapezium-rule inverse 50.05 100) ; [0.7072645364881702 0.041486462512828726]

;; Again, one subinterval seems to be responsible for the majority of our errors.

;; We could keep a list of intervals, sorted by the estimated error, and always refine the one
;; with the largest guessed error.

(defn interval->errorstruct [rule f [a b]]
  (let [[guess error-guess] (approx-with-error rule f a b)]
    [error-guess, guess, [a,b]]))

(def errorstructs (map (partial interval->errorstruct trapezium-rule inverse)
                       [[0.01,25.025][25.025 50.05][50.05 100]]))

errorstructs
;; ([624.6256989814656 626.6241012183343     [0.01   25.025]]
;;  [0.04166666666666663 0.7083333333333333  [25.025 50.05]]
;;  [0.041486462512828726 0.7072645364881702 [50.05  100]])

;; And now we need a function to refine the interval with the largest error

(defn refine[rule f errorstructs]
  (let [sortedstructs (reverse (sort errorstructs))
        [_ _ [a b]] (first sortedstructs)
        remains (rest sortedstructs)
        midpoint (/ (+ a b) 2)
        subinterval1 (interval->errorstruct rule f [a midpoint])
        subinterval2 (interval->errorstruct rule f [midpoint b])] 
        (cons subinterval1 (cons subinterval2 remains))))

;; Now with every call to refine, we refine the interval with the largest error estimate
(refine trapezium-rule inverse errorstructs)
;; ([311.93889676733556 313.93570379188156 [0.01 12.5175]]
;;  [0.04159457274964806 0.7079060863675477 [12.5175 25.025]]
;;  [0.04166666666666663 0.7083333333333333 [25.025 50.05]]
;;  [0.041486462512828726 0.7072645364881702 [50.05 100]])


(def successive-trapezium-refinements (iterate (partial refine trapezium-rule inverse) errorstructs))


;; Here's what it looks like after a few iterations
(nth successive-trapezium-refinements 5)
;; ([18.81475746721543 20.764864658796014 [0.01 0.7917187499999999]]
;;  [0.040533241059784286 0.7015625070966475 [0.7917187499999999 1.5734374999999998]]
;;  [0.04109463731966401 0.7049306354620377 [1.5734374999999998 3.136875]]
;;  [0.041379306898238544 0.7066276281534741 [3.136875 6.26375]]
;;  [0.04152264848816445 0.7074793872568832 [6.26375 12.5175]]
;;  [0.04159457274964806 0.7079060863675477 [12.5175 25.025]]
;;  [0.04166666666666663 0.7083333333333333 [25.025 50.05]]
;;  [0.041486462512828726 0.7072645364881702 [50.05 100]])

        
;; We can get our best guess for the whole thing and our total error estimate
;; by reducing this list

(reduce + (map first  (nth successive-trapezium-refinements 5))) ; 19.104035002910425
(reduce + (map second (nth successive-trapezium-refinements 5))) ; 25.708968772954105


;; After a hundred refinements..
(reduce + (map first  (nth successive-trapezium-refinements 100))) ; 0.010431101535137086
(reduce + (map second (nth successive-trapezium-refinements 100))) ; 9.213824736866899

;; After a thousand refinements..
(reduce + (map first  (nth successive-trapezium-refinements 1000))) ; 1.0913238861095381E-4
(reduce + (map second (nth successive-trapezium-refinements 1000))) ; 9.210376750235199

;; That's not bad, (the real answer is 9.210340371976184), but it's running very slowly.

;; We could try with a higher order rule

(def successive-boole-refinements (iterate (partial refine booles-rule inverse) errorstructs))
(reduce + (map first   (nth successive-boole-refinements 1000))) ; 4.420942778526893E-15
(reduce + (map second  (nth successive-boole-refinements 1000))) ; 9.210340371976176

;; In this case, that seems to work very well, but the run time is appalling.

;; The problem is that we have a longer and longer list of intervals at every
;; step, and every step, we have to sort this list. That's an n^2 algorithm,
;; which won't scale well.

;; What we should do here is use a priority queue. Clojure doesn't have an
;; immutable version, although it's possible to fake one with a sorted map.

;; But rather than do that, I'm going to drop out of the functional paradigm
;; altogther, and use the heap implementation from Java in a mutable fashion,
;; looping and popping and adding.

(defn improve-loop [rule f a b count]
  (let [pq (java.util.PriorityQueue. count (comparator (fn[a b](> (first a)(first b)))))]
    (.add pq (interval->errorstruct rule f [a b]))
    (loop [pq pq count count]
      (if (zero? count) pq
          (let [[err val [a b]] (.poll pq)
                midpoint (/ (+ a b) 2)
                aa (interval->errorstruct rule f [a midpoint])
                bb (interval->errorstruct rule f [midpoint b])]
            (doto pq
              (.add aa)
              (.add bb))
            (recur pq (dec count)))))))

;; Now we can do our calculation much faster
(defn integrate [rule f a b count]
  (let [pq (improve-loop rule f a b count)]
    [(reduce + (map first pq))
     (reduce + (map second pq))]))

;; We'll ask for a thousand refinements, and get back the error estimate, and the answer.
(integrate booles-rule inverse 0.01 100 1000) ; [4.455637248046429E-15 9.21034037197618]


;; Let's try the same integral over the very nasty range [0.0000001, 10000000] which caused serious
;; problems for our previous methods.
;; The real answer is
(- (Math/log 10000000) (Math/log 0.00000001)) ; 34.538776394910684

;; And our approximations are:
(integrate booles-rule inverse 0.00000001 10000000 10) ; [3.797743055486256E10 3.797743056542089E10]
(integrate booles-rule inverse 0.00000001 10000000 100) ; [3.3430724324184924E-5 34.53877704296225]
(integrate booles-rule inverse 0.00000001 10000000 1000) ; [4.549938203979309E-11 34.53877639491147]
(integrate booles-rule inverse 0.00000001 10000000 10000) ; [9.361001557239845E-16 34.53877639491065]

;; For the non-stiff integrals that we started playing with, Boole's rule is great:
;; It's exact for quadratics, and several higher powers
(integrate booles-rule square 0 2 10) ; [0 8/3]
(integrate booles-rule (fn[x] (* x x x x)) 0 2 10) ; [0 32/5]
(integrate booles-rule (fn[x] (* x x x x x)) 0 2 10) ; [0 32/3]

;; and very good for higher powers, even with very few refinements
(integrate booles-rule (fn[x] (* x x x x x x)) 0 2 10) ; [969/8589934592 471219269093/25769803776]
(integrate booles-rule (fn[x] (* x x x x x x)) 0 2 20) ; [2127/1099511627776 60316066438099/3298534883328]

;; convergence is great for sine
(integrate booles-rule sine 0 Math/PI 10) ; [1.7383848804897184E-9 1.9999999999725113]
(integrate booles-rule sine 0 Math/PI 100) ; [3.1931922384043077E-15 1.9999999999999991]
(integrate booles-rule sine 0 Math/PI 1000) ; [2.526233413538588E-17 1.999999999999999]
(integrate booles-rule sine 0 Math/PI 10000) ; [6.32722651455846E-18 2.0]


;; But I'm still quite worried about the error estimate that we made. It's only
;; a guess, and it can be a bad guess.  Here are some functions that are
;; deliberately designed to screw things up.

;; This function is extremely vibratey near the origin.
(defn sineinverse[x] (Math/sin (/ x)))

;; The error estimates are clearly wrong here, but the answers seem to settle down to something that looks plausible.

(integrate booles-rule sineinverse 0.001 10 1) ; [0.09752245288534744 3.189170812427795]
(integrate booles-rule sineinverse 0.001 10 10) ; [0.014802407142066881 2.725700351059874]
(integrate booles-rule sineinverse 0.001 10 100) ; [2.666579898821515E-4 2.7262259059929814]
(integrate booles-rule sineinverse 0.001 10 1000) ; [7.84363268117651E-9 2.7262019887881457]
(integrate booles-rule sineinverse 0.001 10 10000) ; [8.311387750656713E-15 2.726201989096135]

;; I'm slightly reassured that if we use the trapezium rule, which should be
;; slower converging but less sensitive to high derivatives, we seem to settle down to the same thing:

(integrate trapezium-rule sineinverse 0.001 10 1) ; [0.3493754261290617 2.9603246206316127]
(integrate trapezium-rule sineinverse 0.001 10 10) ; [0.12037643221528535 2.759621850911819]
(integrate trapezium-rule sineinverse 0.001 10 100) ; [0.011584111090323689 2.728470051290524]
(integrate trapezium-rule sineinverse 0.001 10 1000) ; [7.174438961790802E-4 2.7265014884997414]
(integrate trapezium-rule sineinverse 0.001 10 10000) ; [1.0854830311799172E-5 2.7262023437746654]

;; Since I don't actually know what the integral of sin(1/x) is, I've no idea
;; whether this answer is correct. Since both rules seem to settle down to the
;; same answer, I tend to believe that it is.

;; Here's a weird function, which looks like it should be even worse that sin(1/x) on its own
(defn strange[x] (- (Math/sin (/ x)) (/ (Math/cos (/ x)) x)))

;; In fact it's the derivative of x sin(1/x), so we can calculate the real answer over [0.001, 10]
;; which should be:
(- (* 10 (Math/sin 1/10)) (* 0.001 (Math/sin 1000))) ; 0.9975072869277496

;; Interestingly, the error estimates look sound for this one:
(integrate booles-rule strange 0.001 10 1) ; [109.91706304856582 -108.12753277035351]
(integrate booles-rule strange 0.001 10 10) ; [0.07641821305025362 -1.0276123964492345]
(integrate booles-rule strange 0.001 10 100) ; [0.0798435700032961 1.0469088424961843]
(integrate booles-rule strange 0.001 10 1000) ; [2.0359056110968434E-6 0.9975072871949854]
(integrate booles-rule strange 0.001 10 10000) ; [1.9224976990340685E-12 0.997507286927752]

;; Since we seem to have dealt well with some nasty functions, we might be
;; getting confident in our rule. I know I was!

;; But this innocuous looking function
(defn sine80squared[x] (square (Math/sin (* x 80))))

;; Is a complete nightmare. It looks as though the method is converging well:
(integrate booles-rule sine80squared 0 Math/PI 1) ; [1.1091279850485843E-28 3.7074689566598855E-28]
(integrate booles-rule sine80squared 0 Math/PI 10) ; [0.013089969389960716 0.7853981633974437]
(integrate booles-rule sine80squared 0 Math/PI 100) ; [1.7991469747360833E-12 0.7853981633974478]
(integrate booles-rule sine80squared 0 Math/PI 1000) ; [3.207733520089899E-13 0.7853981633974484]
(integrate booles-rule sine80squared 0 Math/PI 10000) ; [3.1095566849572033E-15 0.7853981633974481]

;; But if we use a different rule, it also seems to converge, but to a completely different answer
(integrate trapezium-rule sine80squared 0 Math/PI 1) ; [3.226319244612108E-28 4.262878793991289E-28]
(integrate trapezium-rule sine80squared 0 Math/PI 10) ; [0.01573134053904405 0.19774924859401588]
(integrate trapezium-rule sine80squared 0 Math/PI 100) ; [5.4528883422580115E-5 0.19634904414812832]
(integrate trapezium-rule sine80squared 0 Math/PI 1000) ; [4.6327740574637914E-7 0.19634954102557595]
(integrate trapezium-rule sine80squared 0 Math/PI 10000) ; [5.200068066397881E-9 0.19634954084967793]

;; In fact both answers are wrong. We can calculate the real integral of this
;; function over the interval [0,pi] which should be:

(/ (Math/PI) 2) ; 1.5707963267948966

;; In fact if we use our much earlier 'divide every interval evenly' algorithm:
(defn iterated-rule [rule f a b N]
  (if (= N 0)
    (rule f a b)
    (let [midpoint (+ a (/ (- b a) 2))]
      (+ (iterated-rule rule f a midpoint (dec N))
         (iterated-rule rule f midpoint b (dec N))))))

;; We very quickly get surprisingly good answers:
(iterated-rule trapezium-rule sine80squared 0 Math/PI 1) ; 1.13079223568638E-28
(iterated-rule trapezium-rule sine80squared 0 Math/PI 2) ; 4.262878793991289E-28
(iterated-rule trapezium-rule sine80squared 0 Math/PI 3) ; 3.5626606693542546E-28
(iterated-rule trapezium-rule sine80squared 0 Math/PI 4) ; 3.6535380881685736E-28
(iterated-rule trapezium-rule sine80squared 0 Math/PI 5) ; 1.5707963267948966
(iterated-rule trapezium-rule sine80squared 0 Math/PI 6) ; 1.570796326794897
(iterated-rule trapezium-rule sine80squared 0 Math/PI 7) ; 1.5707963267948972
(iterated-rule trapezium-rule sine80squared 0 Math/PI 8) ; 1.5707963267948966
(iterated-rule trapezium-rule sine80squared 0 Math/PI 9) ; 1.5707963267948957
(iterated-rule trapezium-rule sine80squared 0 Math/PI 10) ; 1.5707963267948963

;; With Booles' rule the story is the same:
(iterated-rule booles-rule sine80squared 0 Math/PI 1) ; 3.1974110932118413E-28
(iterated-rule booles-rule sine80squared 0 Math/PI 2) ; 3.7074689566598855E-28
(iterated-rule booles-rule sine80squared 0 Math/PI 3) ; 2.2340214425527414
(iterated-rule booles-rule sine80squared 0 Math/PI 4) ; 1.5358897417550046
(iterated-rule booles-rule sine80squared 0 Math/PI 5) ; 1.570796326794895
(iterated-rule booles-rule sine80squared 0 Math/PI 6) ; 1.5707963267948961
(iterated-rule booles-rule sine80squared 0 Math/PI 7) ; 1.5707963267948966
(iterated-rule booles-rule sine80squared 0 Math/PI 8) ; 1.5707963267948968
(iterated-rule booles-rule sine80squared 0 Math/PI 9) ; 1.5707963267948963


;; So the nice rule that we'd come up with, which worked so well for the stiff problem
;; of integrating 1/x near the origin, is completely broken on something that the obvious
;; recursion integrates (suspiciously) well.

;; The problem is that we used an error estimate that we can't trust to control
;; the refinement process.

;; If we guess that a certain interval contains hardly any error, then it will
;; never get refined at all, so we'll never find out that our guess is wrong.

Thursday, May 26, 2011

Numerical Integration: Better Refinements?

;; Numerical Integration: Better Refinements?

;; Here are some Newton-Cotes formulae for approximate integration:

(defn trapezium-rule [f a b]
  (* 1/2 (- b a) (+ (f a) (f b))))

(defn simpson-rule [f a b]
  (let [midpoint (+ a (/ (- b a) 2))]
    (* 1/6 (- b a) (+ (f a) (* 4 (f midpoint)) (f b)))))

(defn simpson38-rule [f a b]
  (let [midpoint1 (/ (+ a a b) 3)
        midpoint2 (/ (+ a b b) 3)]
    (* 1/8 (- b a) (+ (f a) (* 3 (f midpoint1)) (* 3 (f midpoint2)) (f b)))))

(defn booles-rule [f a b]
  (let [midpoint1 (/ (+ a a a b) 4)
        midpoint2 (/ (+ a a b b) 4)
        midpoint3 (/ (+ a b b b) 4)]
    (* 1/90 (- b a) (+ (* 7 (f a)) (* 32 (f midpoint1)) (* 12 (f midpoint2)) (* 32 (f midpoint3)) (* 7 (f b))))))


;; And here is a way to apply them to (power 2 N) subintervals
(defn iterated-rule [rule f a b N]
  (if (= N 0)
    (rule f a b)
    (let [midpoint (+ a (/ (- b a) 2))]
      (+ (iterated-rule rule f a midpoint (dec N))
         (iterated-rule rule f midpoint b (dec N))))))

;; Here are some very simple functions which we might want to test integration
;; methods on:
(defn square  [x] (* x x))
(defn sine    [x] (Math/sin x))
(defn step    [x] (if (< x 1/2) 0.0 1.0))
(defn inverse [x] (/ x))

;; We might notice that with our Newton-Cotes formulae, the change in the
;; estimate when we make a refinement is often around the same size as the
;; actual error.

;; Let's try using that to produce a method where we can say what error we'd
;; like, and if the change from refining the guess is too big, we should refine
;; more. And if we do refine more, we'll ask for half the desired error on each
;; of the two subintervals, which should mean that the total error is at least
;; comparable to the error we asked for!

(defn adaptive-rule-recurse [rule f a b desired-error]
  (let [guess (rule f a b)
        midpoint (/ (+ a b) 2)
        better-guess (+ (rule f a midpoint) (rule f midpoint b))
        error-estimate (- guess better-guess)
        abs-error-estimate (if (> error-estimate 0) error-estimate (- error-estimate))]
    (if (< abs-error-estimate desired-error) better-guess
        (let [half-desired-error (/ desired-error 2)]
          (+ (adaptive-rule-recurse rule f a midpoint half-desired-error)
             (adaptive-rule-recurse rule f midpoint b half-desired-error))))))

(adaptive-rule-recurse trapezium-rule square 0. 2 0.1) ; 2.6875
(adaptive-rule-recurse trapezium-rule square 0. 2 0.01) ; 2.66796875
(adaptive-rule-recurse trapezium-rule square 0. 2 0.001) ; 2.6669921875

(adaptive-rule-recurse trapezium-rule step 0.0001 1 0.1) ; 0.5
(adaptive-rule-recurse trapezium-rule step 0.0001 1 0.01) ; 0.5
(adaptive-rule-recurse trapezium-rule step 0.0001 1 0.001) ; 0.5
(adaptive-rule-recurse trapezium-rule step 0.0001 1 0.0001) ; 0.5

;; Here we're using the trapezium rule on the integral that we were previously unable to get
;; a good answer for: remember that the correct answer is 9.210340371976182

(adaptive-rule-recurse trapezium-rule inverse 0.0001 1 0.1)    ; 9.234002964342716
(adaptive-rule-recurse trapezium-rule inverse 0.0001 1 0.01)   ; 9.211881820961814
(adaptive-rule-recurse trapezium-rule inverse 0.0001 1 0.001)  ; 9.210518109406467
(adaptive-rule-recurse trapezium-rule inverse 0.0001 1 0.0001) ; 9.210358164670637

;; At this point, we're getting much better answers than we ever got before, but they've started
;; taking noticeable time to compute.

;; However, we still retain the option of using the higher order formulae:

(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.1) ; 9.210347324857782
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.01) ; 9.210345994304586
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.001) ; 9.210344014376869
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.0001) ; 9.21034116413936
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.00001) ; 9.210340404907965
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.000001) ; 9.210340376142819
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.0000001) ; 9.210340372376441
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.00000001) ; 9.210340372016345
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.000000001) ; 9.21034037198001
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.0000000001) ; 9.210340371976589
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.00000000001) ; 9.210340371976214
(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.000000000001) ; 9.210340371976185

;; These answers come back instantaneously, even when we're calculating the
;; previously impossible answer to the question to 12 decimal places.

;; But beware! If we ask for too much accuracy, then the effect of floating
;; point noise means that our recursion may never terminate:

(adaptive-rule-recurse booles-rule inverse 0.0001 1 0.0000000000001) ; freezes REPL!


;; At this point, I'm very tempted to say:

(defn integrate [f a b]
  (adaptive-rule-recurse booles-rule f a b 0.00000001))

;; Famously, the integral of 1/x diverges (very slowly) as x gets large
;; which makes it quite difficult to calculate for large intervals without using special tricks.

;; The integral from 1 to 1000000 of 1/x is log 1000000 - log 1

(- (Math/log 1000000) (Math/log 1)) ; 13.815510557964274

;; Let's try calculating that with our new integrate function:

(time
 (integrate (fn[x] (/ 1.0 x)) 1 1000000))

;; "Elapsed time: 293.219062 msecs"
;; 13.81551055800455

;; I think that's pretty good.

;; Unfortunately, we can still lock up the computer by asking the wrong question

(- (Math/log 100000000) (Math/log 1)) ; 18.420680743952367
;; but:
(time
 (integrate (fn[x] (/ 1.0 x)) 1 100000000)) ; Freezes REPL


;; Can you work out what's going on and what we could do about it?

Numerical Integration: Better Rules?

;; Numerical Integration: Better Rules?

;; So far we've found a couple of approximations to 'the area under a graph'.

(defn trapezium-rule [f a b]
  (* 1/2 (- b a) (+ (f a) (f b))))

(defn simpson-rule [f a b]
  (let [midpoint (+ a (/ (- b a) 2))]
    (* 1/6 (- b a) (+ (f a) (* 4 (f midpoint)) (f b)))))

;; And a way of repeatedly splitting intervals and applying the rules to the sub-intervals
;; to produce better and better approximations.
(defn iterated-rule [rule f a b N]
  (if (= N 0)
    (rule f a b)
    (let [midpoint (+ a (/ (- b a) 2))]
      (+ (iterated-rule rule f a midpoint (dec N))
         (iterated-rule rule f midpoint b (dec N))))))


;; For these two functions, which are nice and smooth, with derivatives that
;; don't get too large, these rules produce good approximations
(defn square[x] (* x x))
(defn sine [x] (Math/sin x))

;; For this one, which isn't smooth, the approximations converge slowly:
(defn step [x]  (if (< x 1/2) 0.0 1.0))

;; For this, which is smooth, but which blows up near 0, the approximations are bad
(defn inverse [x] (/ x))


;; One approach which is often taken is to use 'more accurate' rules.

;; The trapezium rule and Simpson's rule are members of a family called 'Newton-Cotes' formulas.
;; The more complicated a Newton-Cotes formula is, the more polynomials it can integrate exactly.

;; But in fact, for nice smooth functions, they tend to be more accurate the
;; more polynomials they can integrate exactly.

;; Here are two more examples, and their estimates for the integral of sine over
;; a half circle (which should be exactly 2):

(defn simpson38-rule [f a b]
  (let [midpoint1 (/ (+ a a b) 3)
        midpoint2 (/ (+ a b b) 3)]
    (* 1/8 (- b a) (+ (f a) (* 3 (f midpoint1)) (* 3 (f midpoint2)) (f b)))))

(simpson38-rule sine 0 Math/PI) ; 2.040524284763495

(defn booles-rule [f a b]
  (let [midpoint1 (/ (+ a a a b) 4)
        midpoint2 (/ (+ a a b b) 4)
        midpoint3 (/ (+ a b b b) 4)]
    (* 1/90 (- b a) (+ (* 7 (f a)) (* 32 (f midpoint1)) (* 12 (f midpoint2)) (* 32 (f midpoint3)) (* 7 (f b))))))

(booles-rule sine 0 Math/PI) ; 1.9985707318238355

;; We can use the same approach to getting better estimates by subdividing: For
;; sine, as well as getting better estimates to start with, these rules have
;; high rates of convergence. It doesn't take many subdivisions before we start
;; to run into the limits of floating point accuracy.

(iterated-rule booles-rule sine 0 Math/PI 0) ; 1.9985707318238355
(iterated-rule booles-rule sine 0 Math/PI 1) ; 1.9999831309459855
(iterated-rule booles-rule sine 0 Math/PI 2) ; 1.9999997524545716
(iterated-rule booles-rule sine 0 Math/PI 3) ; 1.9999999961908446
(iterated-rule booles-rule sine 0 Math/PI 4) ; 1.9999999999407074
(iterated-rule booles-rule sine 0 Math/PI 5) ; 1.999999999999074
(iterated-rule booles-rule sine 0 Math/PI 6) ; 1.9999999999999853
(iterated-rule booles-rule sine 0 Math/PI 7) ; 1.9999999999999993
(iterated-rule booles-rule sine 0 Math/PI 8) ; 1.9999999999999998
(iterated-rule booles-rule sine 0 Math/PI 9) ; 1.9999999999999998


(iterated-rule simpson38-rule sine 0 Math/PI 0) ; 2.040524284763495
(iterated-rule simpson38-rule sine 0 Math/PI 1) ; 2.002009846628558
(iterated-rule simpson38-rule sine 0 Math/PI 2) ; 2.000119386415226
(iterated-rule simpson38-rule sine 0 Math/PI 3) ; 2.000007370036249
(iterated-rule simpson38-rule sine 0 Math/PI 4) ; 2.000000459216732
(iterated-rule simpson38-rule sine 0 Math/PI 5) ; 2.0000000286790867
(iterated-rule simpson38-rule sine 0 Math/PI 6) ; 2.0000000017921
(iterated-rule simpson38-rule sine 0 Math/PI 7) ; 2.000000000112001
(iterated-rule simpson38-rule sine 0 Math/PI 8) ; 2.0000000000069997
(iterated-rule simpson38-rule sine 0 Math/PI 9) ; 2.000000000000438


;; For the step function, however, boole's rule isn't really that much better
;; than the trapezium rule

(iterated-rule booles-rule step 0 1 0) ; 0.5666666666666667
(iterated-rule booles-rule step 0 1 1) ; 0.5388888888888889
(iterated-rule booles-rule step 0 1 2) ; 0.5194444444444445
(iterated-rule booles-rule step 0 1 3) ; 0.5097222222222222
(iterated-rule booles-rule step 0 1 4) ; 0.5048611111111111
(iterated-rule booles-rule step 0 1 5) ; 0.5024305555555555
(iterated-rule booles-rule step 0 1 6) ; 0.5012152777777777
(iterated-rule booles-rule step 0 1 7) ; 0.5006076388888889
(iterated-rule booles-rule step 0 1 8) ; 0.5003038194444445
(iterated-rule booles-rule step 0 1 9) ; 0.5001519097222222

;; Both seem to need double the number of points to halve their error.

(iterated-rule trapezium-rule step 0 1 0) ; 0.5
(iterated-rule trapezium-rule step 0 1 1) ; 0.75
(iterated-rule trapezium-rule step 0 1 2) ; 0.625
(iterated-rule trapezium-rule step 0 1 3) ; 0.5625
(iterated-rule trapezium-rule step 0 1 4) ; 0.53125
(iterated-rule trapezium-rule step 0 1 5) ; 0.515625
(iterated-rule trapezium-rule step 0 1 6) ; 0.5078125
(iterated-rule trapezium-rule step 0 1 7) ; 0.50390625
(iterated-rule trapezium-rule step 0 1 8) ; 0.501953125
(iterated-rule trapezium-rule step 0 1 9) ; 0.5009765625

;; Performance is similarly bad for boole's rule on the inverse function

(iterated-rule booles-rule inverse 0.0001 1 0) ; 779.9400477089192
(iterated-rule booles-rule inverse 0.0001 1 1) ; 391.7824297523124
(iterated-rule booles-rule inverse 0.0001 1 2) ; 198.04899407170583
(iterated-rule booles-rule inverse 0.0001 1 3) ; 101.52648013049377
(iterated-rule booles-rule inverse 0.0001 1 4) ; 53.607079025964396
(iterated-rule booles-rule inverse 0.0001 1 5) ; 29.984599583713347
(iterated-rule booles-rule inverse 0.0001 1 6) ; 18.501553032592827
(iterated-rule booles-rule inverse 0.0001 1 7) ; 13.071085113384171
(iterated-rule booles-rule inverse 0.0001 1 8) ; 10.635953348374398
(iterated-rule booles-rule inverse 0.0001 1 9) ; 9.647557264415854


(iterated-rule trapezium-rule inverse 0.0001 1 0) ; 4999.99995
(iterated-rule trapezium-rule inverse 0.0001 1 1) ; 2500.9997750199977
(iterated-rule trapezium-rule inverse 0.0001 1 2) ; 1251.8327765203333
(iterated-rule trapezium-rule inverse 0.0001 1 3) ; 627.5916420975113
(iterated-rule trapezium-rule inverse 0.0001 1 4) ; 315.8156999790102
(iterated-rule trapezium-rule inverse 0.0001 1 5) ; 160.27209317742054
(iterated-rule trapezium-rule inverse 0.0001 1 6) ; 82.84288624590312
(iterated-rule trapezium-rule inverse 0.0001 1 7) ; 44.46707207824598
(iterated-rule trapezium-rule inverse 0.0001 1 8) ; 25.61044440290146
(iterated-rule trapezium-rule inverse 0.0001 1 9) ; 16.499072595032356


;; So although it seems these higher order Newton-Cotes formulae are much more
;; accurate and faster converging on well behaved functions, they don't seem to
;; help much in integrating anything tricky.

Wednesday, May 25, 2011

Numerical Integration: Harder Functions

;; Numerical Integration: Harder Functions

;; So far we've found a couple of approximations to 'the area under a graph'.

(defn trapezium-rule [f a b]
  (* 1/2 (- b a) (+ (f a) (f b))))

(defn simpson-rule [f a b]
  (let [midpoint (+ a (/ (- b a) 2))]
    (* 1/6 (- b a) (+ (f a) (* 4 (f midpoint)) (f b)))))

;; And a way of repeatedly splitting intervals and applying the rules to the sub-intervals
;; to produce better and better approximations.
(defn iterated-rule [rule f a b N]
  (if (= N 0)
    (rule f a b)
    (let [midpoint (+ a (/ (- b a) 2))]
      (+ (iterated-rule rule f a midpoint (dec N))
         (iterated-rule rule f midpoint b (dec N))))))

;; We know that Simpson's rule is exact for our original function

(defn square[x] (* x x))

(simpson-rule square 0 2) ; 8/3
(iterated-rule simpson-rule square 0 2 1) ; 8/3
(iterated-rule simpson-rule square 0 2 2) ; 8/3

;; How do we do with some other functions?

;; Another function where we can calculate the answer directly is sine
(defn sine [x] (Math/sin x))

;; The integral of sine over a half-turn (i.e. over the interval [0,pi]) is exactly 2.

;; here are the first few approximations with the trapezium rule:
(take 10 (map (partial iterated-rule trapezium-rule sine 0 Math/PI) (iterate inc 0)))
;; (1.9236706937217898E-16 1.5707963267948968 1.8961188979370398
;; 1.9742316019455508 1.993570343772339 1.9983933609701445 1.9995983886400377
;; 1.9998996001842024 1.9999749002350526 1.9999937250705755)

;; After an inauspicious start, the answer is clearly settling down to 2.

;; Simpson's rule also settles down to 2, but it's much quicker.
(take 10 (map (partial iterated-rule simpson-rule sine 0 Math/PI) (iterate inc 0)))
;; (2.094395102393196 2.0045597549844216 2.000269169948388 2.0000165910479364
;; 2.000001033369414 2.0000000645300027 2.000000004032258 2.000000000252003
;; 2.000000000015751 2.000000000000985)

;; So it looks like we're onto a winner so far.

;; Another type of function that doesn't do so well is:

(defn step [x]  (if (< x 1/2) 0 1))

;; It should be fairly obvious that the integral of this over [0,1] is 0.5, but
;; in fact the convergence of the methods is very slow compared to what we had
;; for sine or square.

(take 10 (map (partial iterated-rule trapezium-rule step 0. 1) (iterate inc 0)))
;; (0.5 0.75 0.625 0.5625 0.53125 0.515625 0.5078125 0.50390625 0.501953125 ...

(take 10 (map (partial iterated-rule simpson-rule step 0. 1) (iterate inc 0)))
;; (0.8333333333333336 0.5833333333333335 0.5416666666666667 0.5208333333333335
;; 0.5104166666666667 0.5052083333333335 0.5026041666666667 0.5013020833333335
;; 0.5006510416666667 0.5003255208333335 )

;; Notice how if we make the tiniest possible change to the function, which doesn't change the integral at all:
(defn evil-step [x]  (if (<= x 1/2) 0 1))

;; the approximations are all changed quite a lot, although they do seem to
;; still converge to the correct answer.

(take 10 (map (partial iterated-rule trapezium-rule evil-step 0. 1) (iterate inc 0)))
;; (0.5 0.25 0.375 0.4375 0.46875 0.484375 0.4921875 0.49609375 0.498046875 0.4990234375)
(take 10 (map (partial iterated-rule simpson-rule evil-step 0. 1) (iterate inc 0)))
;; (0.1666666666666667 0.4166666666666668 0.4583333333333335 0.4791666666666668
;; 0.4895833333333335 0.4947916666666668 0.4973958333333335 0.4986979166666668
;; 0.4993489583333335 0.4996744791666668)



;; What about if we're integrating a (moderately) badly behaved function?:

(defn inverse [x] (/ x))

;; the real answer, by devious mathematical trickery is (log a) - (log b)

;; So if we integrate over the region [0.0001, 1], where the values of 1/x are
;; sometimes very large, we should get:

(- (Math/log 1) (Math/log 0.0001)) ; 9.210340371976182

(take 10 (map (partial iterated-rule trapezium-rule inverse 0.0001 1) (iterate inc 0)))
;; (4999.99995 2500.9997750199977 1251.8327765203333 627.5916420975113
;; 315.8156999790102 160.27209317742054 82.84288624590312 44.46707207824598
;; 25.61044440290146 16.499072595032356)

(take 10 (map (partial iterated-rule simpson-rule inverse 0.0001 1) (iterate inc 0)))
;; (1667.9997166933313 835.4437770204453 419.5112639565708 211.89038593950997
;; 108.42422424355732 57.03315060206397 31.67513402236028 19.324901844453297
;; 13.461948659075995 10.812578055293251)

;; It looks like both the rules, after starting off with appalling first
;; guesses, are converging to something, and maybe even to the right answer, but
;; the convergence is very slow.  Remember that to get the tenth element of this
;; sequence we're splitting the interval up into 1024 pieces, and we haven't
;; even got the answer right to one significant figure.

;; Numerical Integration can give very misleading answers if it's used naively.
;; Quite often, the results of a careless numerical simulation will just 'look
;; wrong' to a trained eye, and in those cases it's usually the eye that's in
;; the right.

;; One way to be reassured that your answer is roughly correct is to alter
;; expected degree of accuracy of the method, and to check that the answer
;; doesn't change by much!

Numerical Integration: Better Approximations

;; Numerical Integration: Better Approximations

;; Remember our example function:

(defn square[x] (* x x))

;; We're trying to find ways to calculate the 'area under its graph between 0
;; and 2', or the 'integral over the interval [0,2]'

;; A better estimate of this area under the graph is to imagine a trapezium
;; which has its base corners at (0,0) and (2,0), and the top corners at
;; (0,(square 0)) (2, (square 2)), and calculate the area of that.

;; More generally, if the interval we're interested is [a,b], and the function's
;; values there are fa and fb, then the area of the trapezium will just be:

(defn trapezium [a fa b fb]
  (* 1/2 (- b a) (+ fa fb)))

;; Another way to think about that is that it's the length of the interval
;; multiplied by the average of the values of the function at the ends.

;; So another approximation to the integral of f over [a,b] is:

(defn trapezium-rule [f a b]
  (trapezium a (f a) b (f b)))

(trapezium-rule square 0 2) ; 4

;; We can make another approximation by using the trapezium rule on the two
;; subintervals [0,1] and [1,2] and adding the results

(+ (trapezium-rule square 0 1)
   (trapezium-rule square 1 2)) ; 3

;; And an even better one by splitting those subintervals in half

(+ (trapezium-rule square 0 1/2)
   (trapezium-rule square 1/2 1)
   (trapezium-rule square 1 3/2)
   (trapezium-rule square 3/2 2)) ; 11/4

;; And so on...
(defn iterated-rule [rule f a b N]
  (if (= N 0)
    (rule f a b)
    (let [midpoint (+ a (/ (- b a) 2))]
      (+ (iterated-rule rule f a midpoint (dec N))
         (iterated-rule rule f midpoint b (dec N))))))

;; This converges fairly nicely:
(map (partial iterated-rule trapezium-rule square 0 2) (range 10))
;; (4 3 11/4 43/16 171/64 683/256 2731/1024 10923/4096 43691/16384 174763/65536)

(map (partial - 8/3)
     (map (partial iterated-rule trapezium-rule square 0 2) (range 10)))
;; (-4/3 -1/3 -1/12 -1/48 -1/192 -1/768 -1/3072 -1/12288 -1/49152 -1/196608)

;; We now only need a thousand samples of the function to get the answer
;; accurate to one part in 100000.

;; But an even nicer approximation is Simpson's rule, which involves fitting a
;; parabola to the two end points and the midpoint, and calculating the area
;; under that.

;; That's equivalent to taking a weighted sum of the values of f at the
;; beginning, midpoint, and end of the interval, with weights 1:4:1

(defn simpson-rule [f a b]
  (let [midpoint (+ a (/ (- b a) 2))]
    (* 1/6 (- b a) (+ (f a) (* 4 (f midpoint)) (f b)))))

;; For the square function, which is itself a parabola, this rule actually
;; calculates the area exactly!
(simpson-rule square 0 2) ; 8/3

;; That about wraps it up for numerical approximations to the integrals of
;; quadratics, but they are easy to calculate exactly anyway.

Learning Clojure

About Me

Blog Archive

Search This Blog

Thursday, September 22, 2011

Clojure Setup Tutorial with EMACS: clojure, clojure.contrib, swank, slime, maven and maven-clojure-plugin in a couple of seconds

Sunday, May 29, 2011

Numerical Integration: Better Refinements?

Thursday, May 26, 2011

Numerical Integration: Better Refinements?

Numerical Integration: Better Rules?

Wednesday, May 25, 2011

Numerical Integration: Harder Functions

Numerical Integration: Better Approximations

Followers