;; Rerum Cognoscere Causas ;; A question of epistemology, with reference to the detection of causality, incorporating a slightly smug trick for generating the same random things over and over again. ;; In Al'no'oth, people are rated by their strength (STR) and intelligence (INT) ;; Both are easily determined with simple tests. For instance, a ;; person of STR 11 will win 45% of arm-wrestles against a person of ;; STR 12. ;; The philosophers of Al'no'oth have long suspected that the Gods ;; determine STR and INT by rolling six-sided dice: (def ^:dynamic *randomizer* (java.util.Random. 0)) (defn rand-int [n] (.nextInt *randomizer* n)) (defn D6 [] (inc (rand-int 6))) ;; In fact, the traditional belief is that the Gods use three dice: (defn three-D6 [] (reduce + (for [i (range 3)] (D6)))) ;; But although all unenhanced humans have scores between 3 and 18 in ;; both characteristics, affirming the suspicions of the philosophers, ;; the higher scores are slightly more common than they should be were ;; 3 dice simply added. ;; Another method the Gods might use would be to roll four dice, and ;; to discard the lowest of them: (defn three-from-four-D6 [] (reduce + (drop 1 (sort (for [i (range 4)] (D6)))))) ;; At this point, the philosophers disagree. ;; Those of the first e'dition maintain that all scores are generated using 3 dice. (defn first-edition [] {:str (three-D6) :int (three-D6)}) ;; Those of the second e'dition maintain that, occasionally, by caprice, the Gods use the modified procedure. (defn mixed [] (if (zero? (rand-int 10)) (three-from-four-D6) (three-D6))) (defn second-edition [] {:str (mixed) :int (mixed)}) ;; Those of the third e'dition believe that one person in ten is a ;; "player-character". These characters are awe'some. ;; Awe'someness is not directly detectable, but a player-character has ;; all his characteristics generated using the modified procedure. (defn third-edition [] (if (zero? (rand-int 10)) {:str (three-from-four-D6) :int (three-from-four-D6)} {:str (three-D6) :int (three-D6)})) ;; With huge effort, and a very large number of arm-wrestles and ;; riddling contests, a great sage has tabulated the characteristics ;; of an entire village: (def village (binding [*randomizer* (java.util.Random. 0)] (doall (repeatedly 100 (case (rand-int 3) 0 first-edition 1 second-edition 2 third-edition))))) village ;-> ({:str 13, :int 18} {:str 11, :int 18} {:str 14, :int 15} {:str 6, :int 12} {:str 14, :int 13} {:str 18, :int 10} {:str 15, :int 11} {:str 12, :int 15} {:str 7, :int 8} {:str 16, :int 12} {:str 8, :int 7} {:str 9, :int 14} {:str 10, :int 9} {:str 11, :int 10} {:str 5, :int 10} {:str 7, :int 9} {:str 9, :int 13} {:str 12, :int 9} {:str 13, :int 9} {:str 5, :int 9} {:str 8, :int 13} {:str 9, :int 11} {:str 13, :int 14} {:str 12, :int 14} {:str 12, :int 17} {:str 14, :int 9} {:str 10, :int 11} {:str 18, :int 17} {:str 11, :int 9} {:str 8, :int 9} {:str 15, :int 13} {:str 8, :int 5} {:str 11, :int 9} {:str 10, :int 8} {:str 9, :int 12} {:str 5, :int 11} {:str 10, :int 7} {:str 9, :int 14} {:str 11, :int 9} {:str 11, :int 12} {:str 12, :int 13} {:str 15, :int 9} {:str 12, :int 12} {:str 6, :int 13} {:str 5, :int 4} {:str 12, :int 13} {:str 15, :int 10} {:str 14, :int 14} {:str 11, :int 4} {:str 12, :int 9} {:str 10, :int 12} {:str 7, :int 12} {:str 8, :int 11} {:str 10, :int 10} {:str 9, :int 8} {:str 8, :int 12} {:str 7, :int 9} {:str 13, :int 3} {:str 14, :int 9} {:str 8, :int 9} {:str 10, :int 11} {:str 15, :int 4} {:str 10, :int 11} {:str 8, :int 10} {:str 15, :int 10} {:str 8, :int 13} {:str 12, :int 5} {:str 8, :int 16} {:str 4, :int 8} {:str 10, :int 18} {:str 12, :int 12} {:str 11, :int 10} {:str 12, :int 8} {:str 12, :int 13} {:str 8, :int 12} {:str 9, :int 12} {:str 12, :int 10} {:str 15, :int 10} {:str 8, :int 11} {:str 7, :int 11} {:str 4, :int 8} {:str 12, :int 11} {:str 13, :int 9} {:str 14, :int 13} {:str 5, :int 9} {:str 17, :int 10} {:str 8, :int 13} {:str 9, :int 10} {:str 5, :int 14} {:str 15, :int 12} {:str 13, :int 13} {:str 11, :int 8} {:str 8, :int 6} {:str 12, :int 8} {:str 10, :int 3} {:str 14, :int 9} {:str 15, :int 12} {:str 15, :int 14} {:str 6, :int 10} {:str 16, :int 13}) ;; How should the philosophers reason amongst themselves, should their object be to determine the System of their World?
Search This Blog
Wednesday, February 27, 2013
Rerum Cognoscere Causas
I don't actually know the answer to this question, or even whether it can be answered. Yet.
Monday, February 25, 2013
Clojure's Reader is Unsafe
;; Clojure's Reader is Unsafe ;; dependencies: ;; [org.clojure/clojure "1.4.0"] ;; [org.clojure/tools.reader "0.7.0"] ;; By a weird coincidence this week I wanted to read a file of data ;; coming from a web app, and I was about to use Clojure's reader to ;; do it. I remembered that there was some variable that I needed to ;; bind to stop it executing arbitrary code, so I went googling. ;; And found that there's been a fair bit of debate on just this topic recently. ;; Essentially, Clojure's reader is the thing that turns strings into ;; data structures when you're reading in a program or typing at the ;; REPL. And you have it available programmatically. (read-string "(+ 1 2)") ; (+ 1 2) ;; Not so impressive, you say? ;; Consider: (first "(+ 1 2)") ; \( (second "(+ 1 2)") ; \+ (count "(+ 1 2)") ; 7 (type "(+ 1 2)") ; java.lang.String (type (first "(+ 1 2)")) ; java.lang.Character ;; "(+ 1 2)" is a String, a flat data structure of seven Characters (first (read-string "(+ 1 2)")) ; + (second (read-string "(+ 1 2)")) ; 1 (count (read-string "(+ 1 2)")) ; 3 (type (read-string "(+ 1 2)")) ; clojure.lang.PersistentList (type (first (read-string "(+ 1 2)"))) ; clojure.lang.Symbol ;; Clearly some sort of sea-change has occurred. ;; Even better, the value of a string is a similar string (eval "(+ 1 2)") ; "(+ 1 2)" (identical? (eval "(+ 1 2)") "(+ 1 2)") ; true ;; But the thing that read-string makes is a program which can run (eval (read-string "(+ 1 2)")) ; 3 ;; Another way to make such a program is: (eval (list (quote +) (quote 1) (quote 2))) ; 3 ;; In my innocence, I was going to use this magic to read some data out of a file that had been produced ;; by a web server. (def file "{:username leethaxor :score 5 :alignment :chaotic-evil}") (:username (read-string file)) ;-> leethaxor ;; IT TURNS OUT THAT YOU SHOULD NEVER DO THIS, EVEN THOUGH THE ABILITY ;; TO DO THIS IS ONE OF THE THINGS THAT IS NICE ABOUT LISPS. ;; There are various reasons why: ;; Firstly, note that programs running on a computer can affect that computer ;; If you happen to have a file called precious.txt in the directory ;; where clojure is running, then you are strongly advised not to ;; execute any of the following code. Especially if it contains your ;; first novel and you haven't got round to backing up yet this year. (clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""} (clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 0, :out "", :err ""} ;; Secondly, note that there is special syntax to cause things to happen while ;; reading: (read-string " #=(clojure.java.shell/sh \"rm\" \"precious.txt\") ") ;-> {:exit 0, :out "", :err ""} ;; That did not get turned into a data structure. That got executed. precious.txt is gone. (clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 1, :out "", :err "cat: precious.txt: No such file or directory\n"} ;; Luckily we can rebuild it: (clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""} (clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 0, :out "", :err ""} ;; And thirdly note that not everyone on the internet is on your side: (def file-of-evil " {:username #=(clojure.java.shell/sh \"rm\" \"precious.txt\") :score 5 :alignment :chaotic-evil}") (read-string file-of-evil) ;-> {:username {:exit 0, :out "", :err ""}, :score 5, :alignment :chaotic-evil} ;; This is a thing that you're supposed to guard against when reading data structures. (clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""} ;; It is traditional to do that like this: (binding [*read-eval* false] (read-string "(+ 1 2)")) ;-> (+ 1 2) (binding [*read-eval* false] (read-string file-of-evil)) ; -> RuntimeException EvalReader not allowed when *read-eval* is false. clojure.lang.Util.runtimeException (Util.java:170) ;; And in my innocence, I was about to actually do this, which would have been a mistake: (def string-of-evil "#java.io.FileWriter[\"precious.txt\"]") (spit "precious.txt" "precious content") ;-> nil (slurp "precious.txt") ;-> "precious content" (binding [*read-eval* false] (clojure.core/read-string string-of-evil)) ;-> #<FileWriter java.io.FileWriter@17b2712> (slurp "precious.txt") ;-> "" ;; bugger ;; It turns out that even with *read-eval* bound to false, the Clojure ;; reader can be persuaded to execute arbitrary java constructors, and ;; some of those have side-effects. ;; I believe that that particular loophole is going to be fixed in the ;; next version of clojure, but the general thinking seems to be that ;; the reader is just intrinsically an unsafe thing and that you ;; shouldn't use it to read strings which you aren't sure are friendly. ;; Luckily, there is a library function that does what I thought the ;; reader would do, and that is apparently Absolutely Guaranteed by ;; Act of Parliament not to do bad things: (require 'clojure.tools.reader.edn) (spit "precious.txt" "precious content") ;-> nil (slurp "precious.txt") ;-> "precious content" (clojure.tools.reader.edn/read-string string-of-evil) ;; -> ExceptionInfo No reader function for tag java.io.FileWriter clojure.core/ex-info (core.clj:4227) (clojure.tools.reader.edn/read-string file-of-evil) ;; -> ExceptionInfo No reader function for tag = clojure.core/ex-info (core.clj:4227) ;; But which is still usable for the purposes of good: (clojure.tools.reader.edn/read-string file) ;-> {:username leethaxor, :score 5, :alignment :chaotic-evil} (clojure.tools.reader.edn/read-string "(+ 1 2)") ;-> (+ 1 2) (eval (clojure.tools.reader.edn/read-string "(+ 1 2)")) ; 3 (slurp "precious.txt") ;-> "precious content" ;; So that's Super News! ;; We've got a function, read-string, which can turn strings into data structures while doing no harm whatsoever, ;; and another function, read-string, which can turn strings into data structures while potentially causing arbitrary side effects depending on the content of the strings. ;; Clearly nothing could possibly go wrong with this arrangement, ;; especially since read-string will be fixed in clojure 1.5 so that ;; binding *read-eval* to false around it will make it safe, as far as ;; anyone knows, even though the advice is still never to use it to ;; read untrusted strings, for some reason.
Wednesday, February 20, 2013
Linear Congruential Random Number Generators
;; Linear Congruential Random Number Generators ;; I was wondering how pseudo-random numbers are actually generated. ;; Apparently it's still done as I remember it done on the ZX80, using multiply, add and chop. ;; There was a nice article on Rosetta Code : ;; http://rosettacode.org/wiki/Linear_congruential_generator ;; But it didn't have a clojure example, so I made an account and added one: ;; This code took me a couple of minutes to write, and replicates a couple of classic generators (defn iterator [a b c] (fn[x] (mod (+ (* a x) b) (bit-shift-left 1 c)))) (def bsd (drop 1 (iterate (iterator 1103515245 12345 31) 0))) (def ms (drop 1 (for [x (iterate (iterator 214013 2531011 31) 0)] (bit-shift-right x 16)))) (take 10 bsd) ;-> (12345 1406932606 654583775 1449466924 229283573 1109335178 1051550459 1293799192 794471793 551188310) (take 10 ms) ;-> (38 7719 21238 2437 8855 11797 8365 32285 10450 30612) ;; What's really nice is that the natural expression of this idea in ;; Clojure is (I think) one of the neatest pieces of code on there. ;; Java uses a similar generator, but it takes some proving since it won't let you see the internal state directly: (defn java [seed] (drop 1 (iterate (iterator 0x5DEECE66D 0xB 48) seed ))) (def java-random-ints (map #(unchecked-int (bit-shift-right (long %) 16)) (java 0x5DEECE66DN))) (take 10 java-random-ints) ;-> (-1155484576 -723955400 1033096058 -1690734402 -1557280266 1327362106 -1930858313 502539523 -1728529858 -938301587) ;; We can also get them direct from java: (def java-util-random (let [r (java.util.Random. 0)] (for [i (range)] (.nextInt r)))) (take 10 java-util-random) ;-> (-1155484576 -723955400 1033096058 -1690734402 -1557280266 1327362106 -1930858313 502539523 -1728529858 -938301587) ;; And compare a few: (drop-while zero? (take 100000 (map - java-util-random java-random-ints))) ;-> () ;; Incidentally while messing around here I discovered how to call a protected method of a java class from clojure: (.nextInt (java.util.Random. 0) ) ;-> -1155484576 (.next (java.util.Random. 0) 32) ; java.lang.IllegalArgumentException: No matching method found: next for class java.util.Random ;; It's lying. ;; But I know it's in there somewhere: (defn myrandom [seed] (proxy [java.util.Random][seed] (next [a] (proxy-super next a)))) (def r (myrandom 0)) ;-> #'user/r (.nextInt (myrandom 0)) ;-> -1155484576 (.next (myrandom 0) 32) ;-> -1155484576 ;; Stupid language.
Packing and Unpacking Numbers : Deterministic Randomness
;; Packing and Unpacking Numbers ;; This post is kind of pointless, since while writing it I discovered a better way to do the same thing. ;; But I also think it's kind of cool, so I'm putting it up in case anyone else finds a use for it. ;; Suppose we've got a function which gives random results (I mean, deliberately...) (defn randomfunction [] (list (case (rand-int 2) 0 '+ 1 '-) (rand-int 100) (list (case (rand-int 2) 0 '+ 1 '-) (rand-int 100) (rand-int 100)))) (randomfunction) ;-> (+ 43 (- 71 75)) (randomfunction) ;-> (- 18 (- 13 22)) (randomfunction) ;-> (- 36 (+ 60 3)) (randomfunction) ;-> (+ 56 (+ 34 4)) (randomfunction) ;-> (- 87 (- 61 76)) ;; And it comes to testing it: (= (randomfunction) '(+ 26 (- 69 0))) ;-> false ;; We'd prefer it to be deterministic. ;; So we can separate the concerns thus: (defn deterministicfunction [a b c d e] (list (case a 0 '+ 1 '-) b (list (case c 0 '+ 1 '-) d e))) (defn randomfunction [] (deterministicfunction (rand-int 2) (rand-int 100) (rand-int 2) (rand-int 100) (rand-int 100))) ;; tadaa! (= (deterministicfunction 0 26 1 69 0) '(+ 26 (- 69 0))) ;-> true (randomfunction) ;-> (+ 30 (+ 52 95)) (randomfunction) ;-> (- 97 (- 57 61)) (randomfunction) ;-> (+ 88 (+ 26 15)) ;; Although useful, this is a right royal pain in the neck, and horribly error-prone ;; It occurred to me that it might be made less painful by setting the ;; seed of the random number generator before running tests, but I ;; also had a need to make 'deterministic random' values in the code ;; itself, and having everything hanging off a global variable that ;; any old function occasionally resets struck me as a recipe for ;; disaster. ;; So I thought, well why not pack all the random parameters into one big number: (def << bit-shift-left) ;; If all the random ranges are powers of two then this is easy: (let [[a b c d] [(rand-int 8) (rand-int 4) (rand-int 8) (rand-int 2)]] [[a b c d] (+ (<< a (+ 1 3 2)) (<< b (+ 1 3)) (<< c 1) (<< d 0))]) ; [[2 2 4 0] 168] ; [[3 1 4 0] 216] ; [[5 3 6 0] 380] ;; Looking at the last one ;; 5/3/6/0 is ;; 101/11/110/0 in binary ;; and binary 101111100 is: 2r101111100 ; 380 ;; in decimal ;; To extract the numbers, we can do: (mod 380 2) ; 0 (/ (- 380 (mod 380 2)) 2) ; 190 (mod 190 8) ; 6 (/ (- 190 (mod 190 8)) 8) ; 23 (mod 23 4) ; 3 (/ (- 23 (mod 23 4)) 4) ; 5 (mod 5 8) ; 5 ;; And this motivates the following recursion: (defn ^:dynamic unpack [rl n] (if (empty? rl) '() (let [[rh & rr] rl] (cons (mod n rh) (unpack rr (/ (- n (mod n rh)) rh)))))) ;; Which behaves as I was hoping (unpack '() 0) ;-> () (unpack '() 1) ;-> () (unpack '(8) 1) ;-> (1) (unpack '(8) 9) ;-> (1) (unpack '(8 8) 9) ;-> (1 1) (unpack '(2 8 4 8) 380) ;-> (0 6 3 5) (unpack '(2 8 4 8) 216) ;-> (0 4 1 3) (unpack '(2 8 4 8) 168) ;-> (0 4 2 2) ;; And that then motivates the following packing function (defn ^:dynamic pack [rlist vlist] (if (empty? rlist) 0 (let [[rh & rr] rlist [vh & vr] vlist] (+ (* rh (pack rr vr)) vh)))) ;; Which also works (pack '()()) ;-> 0 (pack '(8) '(1)) ;-> 1 (pack '(8 8) '(1 1)) ;-> 9 (pack '(2 8 4 8) '(0 6 3 5)) ;-> 380 (pack '(2 8 4 8) '(0 4 1 3)) ;-> 216 (pack '(2 8 4 8) '(0 4 2 2)) ;-> 168 ;; But the cool thing is that it works whether or not the ranges are powers of two: (use 'clojure.tools.trace) (let [a (for [i (range (inc (rand-int 7)))] (rand-int 30)) b (for [i a] (rand-int i))] (clojure.tools.trace/dotrace [unpack pack] (unpack a (pack a b)))) ;; TRACE t3191: (pack (12 23 5 5 8 3) (4 9 3 4 0 0)) ;; TRACE t3192: | (pack (23 5 5 8 3) (9 3 4 0 0)) ;; TRACE t3193: | | (pack (5 5 8 3) (3 4 0 0)) ;; TRACE t3194: | | | (pack (5 8 3) (4 0 0)) ;; TRACE t3195: | | | | (pack (8 3) (0 0)) ;; TRACE t3196: | | | | | (pack (3) (0)) ;; TRACE t3197: | | | | | | (pack nil nil) ;; TRACE t3197: | | | | | | => 0 ;; TRACE t3196: | | | | | => 0 ;; TRACE t3195: | | | | => 0 ;; TRACE t3194: | | | => 4 ;; TRACE t3193: | | => 23 ;; TRACE t3192: | => 538 ;; TRACE t3191: => 6460 ;; TRACE t3198: (unpack (12 23 5 5 8 3) 6460) ;; TRACE t3199: | (unpack (23 5 5 8 3) 538) ;; TRACE t3200: | | (unpack (5 5 8 3) 23) ;; TRACE t3201: | | | (unpack (5 8 3) 4) ;; TRACE t3202: | | | | (unpack (8 3) 0) ;; TRACE t3203: | | | | | (unpack (3) 0) ;; TRACE t3204: | | | | | | (unpack nil 0) ;; TRACE t3204: | | | | | | => () ;; TRACE t3203: | | | | | => (0) ;; TRACE t3202: | | | | => (0 0) ;; TRACE t3201: | | | => (4 0 0) ;; TRACE t3200: | | => (3 4 0 0) ;; TRACE t3199: | => (9 3 4 0 0) ;; TRACE t3198: => (4 9 3 4 0 0) ;; (4 9 3 4 0 0) ;; Which I found slightly smug-making ;; So now we can say: (defn deterministicfunction [big-rand] (let [[a b c d e] (unpack [2 100 1 100 100] big-rand)] (list (case a 0 '+ 1 '-) b (list (case c 0 '+ 1 '-) d e)))) (defn randomfunction [] (deterministicfunction (rand-int (* 2 100 2 100 100)))) ;; Behold the many benefits of this scheme: ;; We can enumerate all possible varieties (deterministicfunction 1) ;-> (- 0 (+ 0 0)) (deterministicfunction 2) ;-> (+ 1 (+ 0 0)) (deterministicfunction 3) ;-> (- 1 (+ 0 0)) ;; We can generate examples using a single number (deterministicfunction 10345778979) ;-> (- 89 (+ 94 88)) ;; Or we can get random examples (randomfunction) ;-> (+ 99 (+ 72 8)) ;; If we are not desperately fussed about the statistical properties of things ;; we do not need to worry about the exact range of the random number at all: ;; I think that this will produce only tiny biases: (defn randomfunction [] (deterministicfunction (rand-int (<< 1 30)))) ;; And it still looks pretty random to me (randomfunction) ;-> (+ 76 (+ 10 87)) (randomfunction) ;-> (+ 76 (+ 81 3)) (randomfunction) ;-> (- 27 (+ 18 63))
Tuesday, February 19, 2013
Clojure Emacs Eval Paste Keyboard Macros / Generating your Regression Tests Automatically while Writing your Functions
;; Clojure Emacs Eval Paste Keyboard Macros / ;; Generating your Regression Tests Automatically while Writing your Functions ;; There's a phenomenally useful feature of emacs when using clojure ;; and nrepl where you can evaluate the result of an ;; expression and paste the result directly into a buffer. ;; However my attempts to use it in keyboard macros have always led to frustration. ;; I asked about it on stack overflow: ;; http://stackoverflow.com/questions/14959155/how-can-i-make-emacs-keyboard-macros-work-properly-when-pasting-the-results-of-c#14960321 ;; and a nice chap called sds helped me come up with the following work-around: ;; Say I've defined the function: (defn << [a b] (if (zero? b) a (<< (* 2 a) (dec b)))) ;; And now I want to check that it works: ;; I evaluate the following two forms in emacs lisp (in the *scratch* buffer) (defun clojure-eval-paste-test () (interactive) (next-line) (move-beginning-of-line nil) (insert "(is= ") (forward-sexp) (insert " ") (nrepl-eval-last-expression 't) (sleep-for 0.1) (insert ")")) (global-set-key [f5] #'clojure-eval-paste-test) ;; Now I type: (<< 1 0) (<< 1 1) (<< 1 2) (<< 2 0) (<< 2 2) (<< 2 3) ;; And then I move point to just above the first line, and press f5 six times, and I get: (is= (<< 1 0) 1) (is= (<< 1 1) 2) (is= (<< 1 2) 4) (is= (<< 2 0) 2) (is= (<< 2 2) 8) (is= (<< 2 3) 16) ;; Which both reassures me that it does work, and provides the skeleton of a regression test for it! ;; There are more elegant ways of doing this, presumably, one being to ;; do as sds suggests, and code up the function using lower level ;; calls rather than the (sleep-for 0.1) thing that will obviously go ;; wrong if the eval takes too long. ;; And another way would be to fix it so that nrepl-eval-last-expression actually works in keyboard macros. ;; But this will do for now, and gets rid of a papercut bug that's been annoying me for ages.
Monday, February 18, 2013
Mathematics as an Experimental Science
;; Mathematics as an experimental science ;; Beta-Bernoulli distribution / Machine Learning ;; I'm reading Kevin Murphy's Machine Learning: A Probabilistic Perspective. ;; As one of the exercises in chapter three, I've just proved that: ;; "For the Beta-Bernoulli model, the probability of the data is the ;; ratio of the normalizing constants of the prior distribution and ;; the posterior distribution". ;; What I think this actually means is: ;; If you start with Coins which are biased, and the bias is chosen ;; uniformly (so that for instance a fair coin is just as likely as a ;; coin which comes up heads 1/3 of the time, or a coin which always ;; comes up heads). ;; And you pick out the fair ones by tossing them all 20 times, and ;; throw away all the ones that don't come up heads exactly 10 times ;; and tails 10 times. ;; Then, because your test has not quite guaranteed you fair coins, ;; but only coins which are quite a bit fairer than you started with: ;; You are ever so slightly more likely to see three heads (or three ;; tails) in three tosses than you would be if the coins were truly ;; fair. ;; In fact the bias is now the same as if the coins had been chosen ;; from the Beta(11,11) distribution. ;; And so the chances of getting three heads in a row is not, as you ;; might have naively expected, one in eight, ;; But rather B(14,11)/B(11,11), where B is the Beta function ;; And that works out to be G(14)G(11)G(22)/G(25)G(11)G(11), where G is the Gamma function. ;; And that is 13!10!21!/24!10!10!, where ! is the factorial function ;; And that is (11*12*13)/(22*23*24), where / and * need no introduction if you have got this far in the post. ;; And that is: (/ (* 11 12 13) 22 23 24) ;-> 13/92 ;; or (float (/ 13 92)) ; 0.14130434 ;; Which is to say, slightly more than one in eight. ;; And I am inordinately pleased with myself, not for having proved ;; this result, which was easy, but for having worked out what the ;; mysterious squiggles in the book might actually mean in practice. ;; And it occurs to me that Mathematics is in fact an experimental ;; science, which makes definite predictions about physical things, ;; such as the movements of electrons in my computer when I type the ;; following things: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Let us make coins with biases distributed uniformly between 0 and 1: (defn make-coin [] (let [theta (rand)] (fn [] (if (< (rand) theta) :h :t)))) ;; Here is one: (def a (make-coin)) ;; Let us test it (frequencies (for [i (range 20)] (a))) ;-> {:h 11, :t 9} ;; It is no good. To the scrapheap with it: (ns-unmap *ns* 'a) ;; Let us instead make a large number of coins: (def coins (for [i (range)] (make-coin))) ;; And throw away all the ones that do not satisfy our criterion (def fair-coins (filter (fn[coin] (= {:t 10 :h 10 } (frequencies (for [i (range 20)] (coin))))) coins)) ;; Let us toss each of our fair coins three times: (def results (for [coin fair-coins] (frequencies (for [i (range 3)] (coin))))) ;; And add up all the results (def collected-results (drop 1 (reductions (fn[m f] (assoc m f (inc (get m f 0)))) {} results))) ;; And calculate the empirical distribution of those results: (def empirical-distribution (map (fn[m] [(float (/ (m {:h 3} 0) (reduce + (vals m)))) (float (/ (m {:t 3} 0) (reduce + (vals m))))]) collected-results)) ;; It does seem to me that after a while, these numbers settle down to something near 14% (doseq [e empirical-distribution] (println e))
Thursday, February 14, 2013
A Naive Bayesian Classifier
Anyway this very evening Hermann Hauser of this parish is telling me that Machine Learning is the coming next big thing and those skills are highly in demand.
So if anybody wants to give me a fuckload of money for this old rope, they must feel free
So if anybody wants to give me a fuckload of money for this old rope, they must feel free
;; A Naive Bayesian Classifier ;; Ed Jackson ( http://boss-level.com ) and I are currently working ;; our way through Kevin Murphy's book: ;; Machine Learning: A Probabilistic Perspective. ;; Ed tells me that there is big money in this Machine Learning game. ;; We have just read chapter 3, which involves a hair-raising amount ;; of Greek, but when all is said and done boils down to this: ;; We got Martians, who tend to be thin, tall and green (defn make-martian [] #{:martian (if (< (rand) 0.2) :fat :thin) (if (< (rand) 0.7) :tall :short) (if (< (rand) 0.8) :green :blue)}) ; And we got Venusians, who tend to be fat and blue (defn make-venusian [] #{:venusian (if (< (rand) 0.8) :fat :thin) (if (< (rand) 0.5) :tall :short) (if (< (rand) 0.3) :green :blue)}) ; They visit Harry's (defn make-space-bar-patron [] (if (< (rand) 0.3) (make-martian) (make-venusian))) ; Harry takes notes (def client-database (for [i (range 10000)] (make-space-bar-patron))) ; Like this: (take 5 client-database) (#{:tall :martian :blue :thin} #{:blue :venusian :short :fat} #{:blue :venusian :short :thin} #{:tall :martian :green :fat} #{:martian :blue :short :thin}) ; And tabulates them so: (def martians (filter :martian client-database)) (def venusians (filter :venusian client-database)) (count martians) ; 3054 (count (filter :thin martians)) ; 2454 (count (filter :green martians)) ; 2438 (count (filter :tall martians)) ; 2177 (count venusians) ; 6946 (count (filter :fat venusians)) ; 5628 (count (filter :blue venusians)) ; 4919 (count (filter :tall venusians)) ; 3535 ; A guy walks into Harry's (def new-guy (make-space-bar-patron)) ; But he is not wearing his uniform (reduce disj new-guy [:martian :venusian]) ; #{:blue :short :fat} ; And so a sweepstake comes into being: (defn probability [characteristic given-class prior-for prior-against] (let [class (if given-class (filter given-class client-database) client-database) for (count (filter characteristic class)) against (- (count class) for) total-for (+ prior-for for) total-against (+ prior-against against)] (/ total-for (+ total-for total-against)))) (def m (* (probability :martian nil 1 1) ; most people come in here are venusians (probability :blue :martian 1 1) ; and martians tend to be green (probability :short :martian 1 1) ; also he's short. Martians are tall, often (probability :fat :martian 1 1))) ; and fat. All the fat guys is from Venusburg (def v (* (probability :venusian nil 1 1) ; most likely he's from venus (probability :blue :venusian 1 1) ; the blueness bears this out (probability :short :venusian 1 1) ; sometimes they are short, sometimes they are tall (probability :fat :venusian 1 1))) ; but they are mostly fat (float (/ m (+ m v))) ; 0.017495114 (float (/ v (+ m v))) ; 0.9825049 ;; 45:1 he's from Venus ! new-guy ; #{:blue :short :venusian :fat} ;; As indeed he is.
Subscribe to:
Posts (Atom)
