Search This Blog

Friday, December 3, 2021

A Festive Theorem

#!/usr/bin/env clojure

;; A Festive Theorem

;; About a week ago, a friend of mine who is not noted for his interest in pure mathematics, and
;; whose identity I will conceal behind the alias "Sipper" caught me in the Wetherspoons and showed
;; me a number theory proof (or at least most of one, we couldn't actually finish the argument off)

;; I've never been interested in number theory. It seems both useless and boring, but the central
;; idea of this proof was intriguing, so today I sat down, again in Wetherspoons, with quite a lot
;; of paper and coffee and an all-day brunch (all for £7.30, I love Spoons, if you want to advertise
;; on this blog just get in touch), and hammered the damned thing out until I was happy with it.

;; The proof's really visual and beautiful, but not constructive, but it suggested an algorithm to
;; me, so I'm going to try to get that algorithm working, and then show why it works by visualizing
;; what it's doing.

;; I have no idea whether anyone will find this interesting, but since it's the only thing in all of
;; number theory that's ever struck me as worth knowing, and Sips found it interesting too, I'm going
;; to write it up.

;; In this first bit, a statement of the Theorem, which itself took me a while to get my head round.

;; make the repl stop after the first 20 elements of a sequence
(set! *print-length* 20)

;; If you square an even number, then it will be divisible by 4
;; But if you square an odd number, then it will leave a residue of 1 mod 4

(defn square [x] (* x x))
(def mod4 #(rem % 4))

(map mod4 (map square (range)))  ;; (0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ...)

;; If that's not obvious, think about it for a few minutes and see whether you can make it so....

;; This means that if you take an even and an odd number, and square them, and add the squares, then
;; the residue mod 4 will always be 1

(mod4 (+ (square 2)  (square 5)))     ;; 1    ( 2*2 + 5*5 = 29 = 28 + 1 = 4 * 7 + 1)
(mod4 (+ (square 10) (square 17)))    ;; 1    (etc)

;; Let's check this is actually true for lots of numbers:

(def naturals (map inc (range))) ;; (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...)

(defn odd [n] (- (* 2 n) 1))
(def odds (map odd naturals))   ;; (1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 ...)

(defn even [n] (* 2 n))
(def evens (map even naturals)) ;; (2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 ...)

;; Here are all possible pairs of natural numbers
(def pairs (for [sum naturals i (take (dec sum) naturals)] (list i, (- sum i)))) ;; ((1 1) (1 2) (2 1) (1 3) (2 2) (3 1) (1 4) (2 3) (3 2) (4 1) (1 5) (2 4) (3 3) (4 2) (5 1) (1 6) (2 5) (3 4) (4 3) (5 2) ...)

;; From which we can construct all odd-even pairs
(def odd-even-pairs (for [[i,j] pairs] (list (odd i) (even j)))) ;; ((1 2) (1 4) (3 2) (1 6) (3 4) (5 2) (1 8) (3 6) (5 4) (7 2) (1 10) (3 8) (5 6) (7 4) (9 2) (1 12) (3 10) (5 8) (7 6) (9 4) ...)

;; From which we can construct all sums of squares of odd and even numbers
(def odd-even-squares (for [[i,j] odd-even-pairs] (+ (square i) (square j)))) ;;(5 17 13 37 25 29 65 45 41 53 101 73 61 65 85 145 109 89 85 97 ...)

;; paranoid check, if I did that right they should all be 1 mod 4
(map mod4 odd-even-squares) ;; (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...)


;; So, since both : it's obvious ; and also it looks like it might actually be true, I think we can take it to the bank that:

;; If o is an odd number, and e is an even number, then o*o+e*e is 1 mod 4

;; This has apparently been pretty obvious to everyone who's ever thought about it, and is just one
;; of those number-theory facts that always seem to fascinate people who aren't me.

;; But it occurred to Albert Girard in 1632 to wonder if the converse was true:

;; If you have a number that is 1 mod 4, can you always split it into odd and even squares?

;; Let's look for counterexamples:
(def early-ones (sort (take 10000 odd-even-squares))) ;; (5 13 17 25 29 37 41 45 53 61 65 65 73 85 85 89 97 101 109 113 ...)

(def candidates (map #(+ (* 4 %) 1) naturals)) ;; (5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 ...)


;; This is such a hack, I am shame....
(require 'clojure.set)
(clojure.set/difference (set (take 100 candidates)) (set early-ones)) ;; (9 21 33 49 57 69 77 81 93 105 121 129 133 141 161 165 177 189 201 209 ...)

;; So it looks like the converse is not true.

;; 9, for instance, is 4*2+1, so it's a candidate, but it looks like you can't split it into odd and even squares.
;; Let's try by hand, just going through all the possible odd numbers

;; 9 = 1*1 + 8 ; 8 is not the square of anything
;; 9 = 3*3 + 0 ; 0 is not the square of a natural number

;; This seems a bit dodgy to me, 9 = 3 * 3 + 0 * 0, so does it work if we count 0 as an even number? This will turn out not to matter too much.....

;; Look at 21
;; 21 = 1*1 + 20 ; 20 is not the square of anything
;; 21 = 3*3 + 12 ; 12 is not the square of anything
;; 21 = 5*5 - 4  ; oops, 5*5 is too big, although it does look like 21 = 5*5 - 2*2, but that wasn't the question. Sum of squares, not difference of squares.

;; OK 33 then
;; 33 = 1*1 + 32 ; 32 is not the square of anything
;; 33 = 3*3 + 24 ; 24 is not the square of anything
;; 33 = 5*5 + 8  ;  8 is not the square of anything
;; 33 = 7*7 - 16 ; oh bloody hell, 33 = 7*7 - 4*4

;; Anyway, that's not the question. Neither 21 nor 33 are the sum of two squares., even though 21 = 5*4+1 and 33 = 4*8+1

;; Looking at the list of the ones that did work
early-ones ;; (5 13 17 25 29 37 41 45 53 61 65 65 73 85 85 89 97 101 109 113 ...)

;; One might notice that there are an awful lot of prime numbers on that list. (although they're not all primes....)

;; In fact
(defn divisors [n] (filter #(= 0 (rem n %)) (map inc (range n))))

(divisors 12) ;; (1 2 3 4 6 12)

(defn prime? [n] (= (count (divisors n)) 2))
(filter prime? (range 200)) ;; (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 ...)


(def candidate-primes (filter prime? candidates)) ;; (5 13 17 29 37 41 53 61 73 89 97 101 109 113 137 149 157 173 181 193 ...)

(clojure.set/difference (set (take 100 candidate-primes)) (set early-ones)) ;; #{}

(clojure.set/difference (set (take 1000 candidate-primes)) (set early-ones)) ;; #{}

;; If there's a prime of form 4*n+1 that isn't the sum of one odd and one even square, then it's fairly big

;; It looks as though, at least for the first thousand candidates, if a number is both prime, and of form 4*n+1,
;; then it is expressible as an even square plus an odd square.

;; Albert Girard, in 1632, presumably after a fair amount of dicking about with bits of paper,
;; conjectured that this might be so for all candidate primes.

;; On December 25th 1640, Pierre de Fermat wrote to his friend Marin Mersenne that he had proved
;; that this was true, but he did not trouble to include the proof or indeed ever tell anyone how he
;; did it.

;; On the 12th of April 1749, Leonhard Euler wrote to his friend Christian Goldbach that he had
;; proved that this was true, and then actually went and published his (intricate and difficult) proof.

;; In recognition of Fermat's invaluable contribution, this result is universally known as

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;             Fermat's Christmas Theorem.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Let p be a prime number

;; Then p is expressible as the sum of odd and even squares
;; if and only if p is 4*n+1 for some natural number n








Thursday, December 2, 2021

A test program

#!/usr/bin/env clojure

;; It looks like clojure has become a full citizen of Debian

;; In emacs, a syntax highlighted program can be converted to html with htmlize-buffer, and then if
;; you top and tail that and copy it to blogger it comes up nicely. And this seems to still be
;; working after all this time

;; So here is my first published clojure program for a while...

;; If you're using Debian 11, you can run it by installing clojure with
;; $ sudo apt install clojure

;; saving this file as test.clj

;; and then in a terminal window making the file executable
;; $ chmod +x test.clj
;; and then running it
;; $ ./test.clj

(println "hello")

(defn factorial [n]
  (if (< n 1) 1
      (* n (factorial (- n 1)))))

(println '(factorial 10) (factorial 10))

;; If all is working then you should see something like
;; $ ./test.clj
;; hello
;; (factorial 10) 3628800

Emacs, even

The truly terrifying bit of getting any language to work on your machine is getting it to work with EMACS. 

Last time I tried to get clojure working, this was the stage that I gave up at. Everything was just broken, and so I wrote the program I was going to write in python instead.

But all it took this time was:

(Emacs 27.1, the version that comes with Debian 11/Bullseye)

M-x package install

clojure-mode

(churn)

and then I can load my test.clj program from earlier, and it's syntax highlighted. 

M-x package-list-packages

tells me that it's clojure-mode 5.13.0, from melpa-stable .

I can save the program, and run 

./test.clj 

from the command line in a terminal, and it just works, with a slightly irritating 2 second delay.

I might hope for more:

I remember once I could do things like putting the cursor in an expression and pressing Alt-Ctrl-x to evaluate that expression in a running REPL.

but this will do for now. 

I can always copy and paste things to a REPL, after all.


Clojure Shell Scripts on Debian

So I managed to install a very recent clojure through Debian's package manager, and it just worked.

And my next thought is: "If it's somehow become a proper part of Debian, does that mean I can do shell scripts?"

so

cat >test.clj

#!/usr/bin/env clojure
(print "hello")

chmod +x test.clj

./test.clj

(2 second pause...)

hello

Compare with the python equivalent:

cat >test.py

#!/usr/bin/env python
print("hello")


chmod +x test.py

./test.py
hello

The only difference is the lack of a two second pause. Python responds instantly.

I can live with a two second pause.

I am totally amazed and pleased by this. It feels much more like a proper programming language than it used to. It's too good to be true.

I wonder if we've got a command line argument parser yet.

Hello Again World (2nd December 2021) Setting Up Clojure on Debian 11

It's been a while since I've used Clojure, and I've upgraded my system a fair bit since then, so I'm going to try to start again from scratch.


My computer is running latest stable Debian

cat /etc/debian_version
11.1

(version 11.1, aka bullseye)

My first instinct is to install via the package manager:

apt search clojure

clojure/stable,now 1.10.2-1 all
  Lisp dialect for the JVM

clojure.org tells me that the latest version is 1.10.3, so the debian version will do.

sudo apt install clojure

clojure
Clojure 1.10.2
user=> 

That seems far too easy, does it work? 

First, can I do hello world at all?

user=> (print "hello")
hellonil

user=> (defn factorial [n] (if (< n 2) n (* n factorial (- n 1))))

#'user/factorial

user=> (factorial 10)
Execution error (ClassCastException) at user/factorial (REPL:1).
class user$factorial cannot be cast to class java.lang.Number (user$factorial is in unnamed module of loader clojure.lang.DynamicClassLoader @56ccd751; java.lang.Number is in module java.base of loader 'bootstrap')

Oh my God, what???

.....stares in bewildered disbelief and starts cursing Java and its pointless verbose complexity before seeing the problem and feeling silly...

user=> (defn factorial [n] (if (< n 2) n (* n (factorial (- n 1)))))
#'user/factorial
user => (factorial 10)
3628800


So that looks great, and it was very easy. Things have improved hugely since I last tried this!

I could hope for better error messages, something like 'can't multiply a number by a function' would have been more helpful.


 


 






Tuesday, February 5, 2019

The Unexpected Appearance of Schlemiel, the Painter

;; The Unexpected Appearance of Schlemiel, the Painter

;; I was doing some statistics one day, and I defined:

;; the average of a finite sequence
(defn average [sq] (/ (reduce + sq) (count sq)))

;; and the square of a number
(defn square [x] (* x x))

;; and a way of forgetting about all the fiddly little digits at the end
(defn twosf   [x]  (float (/ (Math/round (* x 100.0)) 100))) 

;; but for the variance I was a little torn between:
(defn variance-one [sq]
  (let [av (average sq)]
    (average (map #(square (- % av)) sq)))) 
 
;; and 
 
(defn variance-two [sq]
  (let [sqdiff #(square (- % (average sq)))]
    (average (map  sqdiff sq))))

;; and (I have a regrettable weakness for the terse...) 
(defn variance-one-liner [sq] (average (map #(square (- % (average sq))) sq)))

;; but I was surprised when I noticed this: 

(let [s (repeatedly 1000 #(rand))]
  (twosf (reduce + s)) ;; just to force the sequence to be generated before timing things
  [(time (twosf (reduce + s)))
   (time (twosf (average  s)))
   (time (twosf (variance-one s)))
   (time (twosf (variance-two s)))
   (time (twosf (variance-one-liner s)))])

;; "Elapsed time: 0.535715 msecs"
;; "Elapsed time: 0.834523 msecs"
;; "Elapsed time: 1.417108 msecs"
;; "Elapsed time: 251.650722 msecs"
;; "Elapsed time: 248.196331 msecs"
;; [496.83 0.5 0.09 0.09 0.09]


;; It seems that all these functions are correct, in the sense that they are producing
;; correct-looking answers, and yet one of them is orders of magnitude faster.

;; What is going on here, and why?

Thursday, December 13, 2018

Reinforcement Learning : Exploration vs Exploitation : Multi-Armed Bandits


;; Reinforcement Learning : Exploration vs Exploitation : Multi-Armed Bandits

;; I'm reading the excellent:

;; Reinforcement Learning: An Introduction
;; by Richard S. Sutton and Andrew G. Barto

;; The book's website, on which is available a complete pdf, is here:
;; http://www.incompleteideas.net/book/the-book.html

;; In Chapter 2, they introduce multi-armed bandits as a simplified model problem 

;; On the basis that you don't understand anything you can't explain to a computer, I thought I'd code it up:

;; Here is a 2 armed bandit
(defn bandit [action]
  (case action
    :arms? [:right :left]
    :right (if (< (rand) 0.5) 4 0)
    :left (if (< (rand) 0.2) 5 0)
    :oops!!))

;; We can ask it how many arms it's got, and what they're called
(bandit :arms?) ; [:right :left]

;; And we can pull those arms. Rewards are variable.
(bandit :right) ; 4 ; 4 ; 4 ; 0 ; 0 ; 0 ; 0
(bandit :left) ; 5 ; 0 ; 0 ; 0 ; 5 ; 0 ; 5 ; 0

;; Once we pull an arm, we'll have an action/reward pair
(bandit :right) ; 4
;; the pair would be:
[:right 4]

;; Here's a function that yanks an arm at random, and gives us such a pair
(defn random-yank [bandit]
  (let [a (rand-nth (bandit :arms?))]
    [a (bandit a)]))

(random-yank bandit) ; [:left 0]
(random-yank bandit) ; [:right 4] 

;; And a utility function to take the average of a sequence. We need to be able to provide a default value if the sequence is empty.
(defn average
  ([seq default] (if (empty? seq) default (/ (reduce + seq) (count seq))))
  ([seq] (average seq 0)))

;; with some examples
(average [1 2 3 4 5]) ; 3
(average (list) 10) ; 10
(average (list 1) 2) ; 1
(average [] 100) ; 100


;; If we just pull arms at random we get an average reward of about 1.5

(float (average (map second (repeatedly 1000 #(random-yank bandit))))) ; 1.49

;; Since we can see the code for this particular bandit, we know that
;; the expected value of pulling the right arm is 2 (a half-chance of
;; a reward of 4) and the expected reward for the left arm is 0.2*5 = 1

;; So if we were seeking to maximize reward, we'd probably be best to pull the right arm all the time.

(float (average (map bandit (repeat 10000 :right)))) ; 1.9912 
(float (average (map bandit (repeat 10000 :left )))) ; 0.985


;; The interesting question is, if we don't know how the bandit works, how should we design an algorithm that gets the most reward?
;; (Or at least do better than yanking arms at random!)

;; One thing our algorithm is going to have to do is keep some state to record what happens.
;; Let's start by recording the results of all pulls to date:

;; At first, we know nothing, so we can set up a table to represent that we know nothing
(defn initial-state [bandit]
  (into {} (for [k (bandit :arms?)] [k (list)])))

;; We haven't pulled either arm yet
(initial-state bandit) ; {:right (), :left ()}

;; When we get a new action reward/pair, we'll add the result to our state
(defn update-state [state [action reward]]
  (update-in state [action] #(conj % reward)))

;; here are some examples of using update-state
(update-state {:right (), :left ()} [:right 2]) ; {:right (2), :left ()}
(reduce update-state {:right (), :left ()} [[:right 2] [:left 3]  [:right 4] [:right 5]]) ; {:right (5 4 2), :left (3)}

;; here's how we can use it to record the result of ten random yanks
(reduce update-state
        (initial-state bandit)
        (repeatedly 10 #(random-yank bandit))) ; {:right (4 4 0 0 0), :left (0 0 0 0 5)}


;; Once we actually have some data, we can make estimates of the expected rewards

;; mapvals applies a function to every value in a map, returning a new map with the same keys
(defn mapvals [m f] (into {} (for [[k v] m] [k (f v)]))) 

;; examples
(mapvals {} inc) ; {}
(mapvals {:a 1} inc) ; {:a 2}
(mapvals {:a 1, :b 2} inc) ; {:a 2, :b 3}
(mapvals {:a 1, :b 2, :c 3} #(* % %)) ; {:a 1, :b 4, :c 9}


;; In the book, Q_t(a) is the current estimate (at time t)
;; We'll use as our estimate of the value of an action the average value seen so far, or zero if we have no information

(defn Q [state] (mapvals state #(average % 0)))

;; examples
(Q '{:right (5 4 2), :left (3)}) ; {:right 11/3, :left 3}
(Q '{:right (5 4 2), :left ()}) ; {:right 11/3, :left 0}
(Q (initial-state bandit)) ; {:right 0, :left 0} 
(Q (update-state (initial-state bandit) (random-yank bandit))) ; {:right 0, :left 2}

;; let's check that we get roughly what we expect in the long run
(Q (reduce update-state (initial-state bandit)
        (repeatedly 10000 #(random-yank bandit)))) ; {:right 9832/5015, :left 1027/997}


;; If we have estimates of the value of each arm, then a good way to
;; use them is to pull the arm with the highest estimate.

;; This is called 'exploitation', as opposed to 'exploration', which
;; is when you try things you think may be suboptimal in order to get
;; information

;; The 'greedy' action is the one with the highest expected value. Of
;; course there may be more than one greedy action especially at first.

;; To help with this, another utility function:

;; max-keys finds the keys with the highest value in a map, and returns a list with just these keys and values
(defn max-keys [m]
  (let [slist (reverse (sort-by second m)) 
        [_ max] (first slist)]
    (take-while #(= (second %) max) slist)))

;; examples
(max-keys {}) ; ()
(max-keys {1 0}) ; ([1 0])
(max-keys {1 0, 2 0}) ; ([2 0] [1 0])
(max-keys {1 0, 2 1}) ; ([2 1])
(max-keys {1 0, 2 1, 3 -1 , 4 -3, 5 2, 6 2}) ; ([6 2] [5 2])

;; if there is a tie for the greedy action, we can choose at random between the candidates
;; And so we can go from estimates to greedy action like this:
(defn greedy-action [estimates]
  (first (rand-nth (max-keys estimates))))

;; examples
(greedy-action '{:right 10, :left 3}) ; :right
(greedy-action '{:right 10, :left 3 :centre 20}) ; :centre
(greedy-action '{:right 10, :left 3 :centre 3}) ; :right
(greedy-action '{:right 3, :left 3 :centre 3}) ; :right

(greedy-action (Q '{:right (5 4 2), :left (3)})) ; :right
(greedy-action (Q '{:right (), :left (3)})) ; :left
(greedy-action (Q (initial-state bandit))) ; :left

;; after a lot of random pulls, the greedy action should reliably be the one with the highest expected payoff
(greedy-action (Q (reduce update-state (initial-state bandit)
        (repeatedly 10000 #(random-yank bandit))))) ; :right 

;; OK, so we have our stage set, a way of recording what's happened, and some helpful functions defined.


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Our first try at a learning algorithm will be 'by hand', as it were.

;; We'll always make the 'greedy' choice.

;; At first, we have no records to go on
(initial-state bandit) ; {:right (), :left ()}

;; expected values for both levers are therefore zero
(Q (initial-state bandit)) ; {:right 0, :left 0}

;; so the greedy action will get chosen at random
(greedy-action (Q (initial-state bandit))) ; :left

;; in this case, we've chosen :left, and the bandit's response is
(bandit :left) ; 0 

;; we record it
(update-state (initial-state bandit) [:left 0]) ; {:right (), :left (0)}

;; and we have a new state
'{:right (), :left (0)}

;; new estimates
(Q '{:right (), :left (0)}) ; {:right 0, :left 0}

;; and again, we choose at random
(greedy-action (Q '{:right (), :left (0)})) ; :left

;; the bandit is not feeling very generous
(bandit :left) ; 0

(update-state '{:right (), :left (0)} [:left 0]) ; {:right (), :left (0 0)}

;; new state:
'{:right (), :left (0 0)}

;; new estimates
(Q '{:right (), :left (0 0)}) ; {:right 0, :left 0}

;; this time we choose :right
(greedy-action (Q '{:right (), :left (0 0)})) ; :right

;; and the bandit pays out! 
(bandit :right) ; 4

(update-state '{:right (), :left (0 0)} [:right 4]) ; {:right (4), :left (0 0)}

;; the greedy action will be :right now, because we have evidence that right is better.
(greedy-action (Q '{:right (4), :left (0 0)})) ; :right

;; You get the idea......

;; Let's automate that....

;; Given a state and a bandit, we decide an action and the bandit
;; responds, producing an action/reward pair, and a new state

(defn greedy-algorithm [bandit state]
  (let [action (greedy-action (Q state))
        reward (bandit action)]
    [[action reward] (update-state state [action reward])]))


(greedy-algorithm bandit (initial-state bandit)) ; [[:left 0] {:right (), :left (0)}]

;; To get something we can iterate:

(defn step [[[a r] state]]
  (greedy-algorithm bandit state))

(iterate step [ [:dummy :dummy] (initial-state bandit)])

;; ([[:dummy :dummy] {:right (), :left ()}]
;;  [[:left 5] {:right (), :left (5)}]
;;  [[:left 0] {:right (), :left (0 5)}]
;;  [[:left 0] {:right (), :left (0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 5] {:right (), :left (5 0 0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 5 0 0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 5 0 0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 5 0 0 0 0 0 0 0 0 0 0 5)}]
;;  [[:left 0] {:right (), :left (0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 5)}]

;; In this case, the greedy algorithm happens to get a payout on its
;; first try, and decides that it will pull that arm for ever. It
;; never even tries the other arm.

;; Try again:

(iterate step [ [:dummy :dummy] (initial-state bandit)])
;;([[:dummy :dummy] {:right (), :left ()}]
;;  [[:right 0] {:right (0), :left ()}]
;;  [[:right 0] {:right (0 0), :left ()}]
;;  [[:left 0] {:right (0 0), :left (0)}]
;;  [[:right 4] {:right (4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 4 4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 0] {:right (0 4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 0] {:right (0 0 4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 0 0 4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 0] {:right (0 4 0 0 4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 4] {:right (4 0 4 0 0 4 4 4 4 4 4 0 0), :left (0)}]
;;  [[:right 0] {:right (0 4 0 4 0 0 4 4 4 4 4 4 0 0), :left (0)}]

;; In this case, it tried the right arm a couple of times, then had a
;; go with the left arm, then went back to the right arm, won a
;; payout, and then got hung up on pulling the right arm repeatedly.



;; We've got a couple of problems here!

;; First is that the algorithm has clearly got into a state where it
;; always pulls the left arm (in the first case), and the right
;; arm (in the second case).

;; It can't be doing the right thing in both cases.

;; Secondly the state is growing linearly, as the algorithm remembers
;; all previous results. That's giving us algorithmic complexity
;; problems and the calculation will get slower and slower, and
;; eventually run out of memory.













Followers