Learning Clojure: November 2010

Tuesday, November 30, 2010

A God of Small Things

Curses, this was supposed to go here: http://johnlawrenceaspden.blogspot.com/

Sorry Clojure-fans.

Since this has already garnered positive comments, I'll leave it here. Thank you for your kind words!

Version 2, rewritten after the kind advice of my friend Ruth.
I have been reading Less Wrong. Very possibly to excess.

Of all the stories that humans told, the most compelling were about the existence of others.

In the beginning, the others were gods and monsters. Then came stories about other races, with other motivations. Later, there were stories of far off lands, and of the strangers who lived there.

After the last of the great wars of the twentieth century, when there were no more far off lands, and no strangers, some told of voyages to strange worlds, where evolution had dealt a different hand. Some told of visitors from deep space, and the havoc that they caused.

And some told of the havoc created by man's own creations.

With all these stories of the other, it might have seemed unlikely that the first inhuman mind to be created by a human mind was almost an accident.

It wasn't an AI lab working for a shadowy military-industrial conspiracy, or a Genetic Engineer hell-bent on some incomprehensible dream of power, that created the first mind born of mind.

Indeed, when Harrison created ELEIZER, the world's first intelligent computer program, it was almost in a fit of absence of mind.

Tom Harrison had once been thought of as bright child.

The apple of his teachers' eyes, the school swot. The boy genius.

They say that however clever you are, when you go to university, you'll meet someone cleverer.

Tom was that person, they said.

But of course it hadn't turned out that way.

Tom had been accepted by the University of Cambridge to read Mathematics, Pure and Applied, but had turned out to be no more than averagely bright by the standards of that ancient place.

Towards the end of his degree, and at the beginning of the PhD that should have been his route into academia and a life of research, it had become obvious, first to his teachers and then to Tom himself, that although Tom loved maths, he didn't lust after it.

Tom's teachers had been kind, suggested that this might be the case without pressing the issue, and waited for the lack of desire to become as obvious to Tom as it was to them.

In the meantime, fortunately for Tom, the necessity of making some complicated calculations for the second chapter of what was supposed to be a seven chapter doctorate had awakened a second passion.

Some of the light that had been lost he found in the operations of computers.

Tom slowly worked out that he had become more interested in the process of finding out the answers to his experiments than in the experiments themselves.

Eventually, as they do to all PhD students, the twin horrors of poverty and writing-up came to Tom.

He took a job as a programmer at a local firm, initially meaning only to get control of his overdraft and his credit cards. But he found the regular small successes of the commercial world, and the camaraderie and collaboration of engineering shop life far more to his liking than the loneliness of research.

With barely a regret, indeed almost in a fit of absence of mind, he lost touch with his old supervisor, forgot what his thesis was supposed to be about, and eventually found himself, at the age of thirty, a member of the large club of Cambridge residents who are 'still writing up' doctorates that the University itself forgot about many years ago.

Tom became a freelance, working with computers from time to time to pay the rent, and otherwise devoting himself to various sports in the Summer, and in the Winter, to various hobbies.

One of these hobbies was computer science in the academic sense, following the traditional American path through the antique language LISP, beloved of the artificial intelligence community.

And the other was collecting stamps.

A man with time on his hands, who lives in Cambridge and likes to spend his days in coffee shops, will encounter students and academics from time to time, and Tom fell in with the William Gates Machine Learning Research Group at the University. Although they had no common language, LISP never having been popular with European academics, and ML never having come to Tom's attention in the commercial world, Tom and the local researchers found they had many interests in common, and Tom found himself invited to seminars and coffee mornings and presentations from time to time, almost all of which he found incomprehensible.

But occasionally he'd glimpse some small part of the truth and say something which would keep his friends interested. The academic community, happy to find someone they could talk to who was different enough from themselves that they could sometimes find a new perspective by explaining things to him, made Tom welcome. Thinkers needs clever fools to explain things to in the same way that chalks need blackboards.

A lot of the hope of nineteen-sixties artificial intelligence had been inspired by ELIZA, a program which simulated a psychiatrist so well that humans were sometimes fooled that they were talking to a real person.

But ELIZA had been a hollow shell. A cheap trick. Like a parody of the mechanical turk, ELIZA's internal machinery was so simple that to understand it was to make the magic go away.

Once you saw the trick, the conversations weren't interesting any more. You were just talking to an echo.

But over the years, reasoning that a sufficiently good trick for impersonating humans might be what humans themselves were, various people had added more and more data to ELIZA in the hope that giving her more things to talk about would cause her to talk about more things.

And they'd added extra tricks, for introducing new topics of conversation occasionally, and for remembering things said earlier and bringing in parallel ideas.

But though the later ELIZA could outperform a ten year old on a straight test of general knowledge, what had been put in was still what came out. No interesting properties had ever emerged from the pile of details, and she had the general intelligence of penicillin.

By the nineteen-eighties, the AI pioneers had largely given up. They'd taken their best successes, SHRDLU and GPS, theorem provers, pattern-recognisers, all of which had seemed so promising in their time, and all of which had turned out to be so empty, and bundled them all up together in one super-ELIZA to rule them all, and run her on the largest and fastest computers that had ever been built.

And she could still fool someone who didn't know the tricks that they were talking to a real person on the other end of a telegraph wire. But not for long.

It quickly became obvious, even to the slowest human being, that talking to the best ELIZA that could be constructed in 1985 was the equivalent of talking to a well-educated being with brain damage so severe that its mind had ceased to be.

She rambled, insanely, with no idea what the words and symbols that she vomited out actually meant. She knew that horse and horseshoe went together, and her basic sentence structure was still that of a Freudian psychologist, so she would respond to "Which horse do you think will win the Derby" by saying things like "What do you mean to say when you say 'think will win'?", or "Do you think a horseshoe would make you a winner?". Later she might talk about Neils Bohr, and his horseshoe.

Nowadays, the ELIZA program was built into text editors as an amusement, and she would run perfectly happily on pocket calculators and on telephones, but even if you ran her on the most powerful computer the early 21st century could produce, all you got was a very fast deranged annoying shambles.

And of course, because the problem of vision had never been properly solved, she was blind. And of course, because the problem of speech recognition had never been properly solved, you had to talk to her by keyboard even if she was used to your voice.

But boy, could she play chess.

About the one thing the Artificial Intelligence pioneers had managed to deliver on out of all their brave promises, had been the idea of a computer that could play chess.

The tragic hero Alan Turing, who saved the world from evil and was killed by evil in return, was the first man to think about writing a computer chess program. But he couldn't do it on the steam age computers of the 1950s.

By 1956, things had improved to the point where a computer could play, provided it was allowed three hours for each move.

It was a start. In 1957 a descendant of this machine played the International Master Edward Lasker. And he declared that it had played a 'passable amateur game'. It is possible that Lasker was being kind.

After that, research stalled. It became thought in the AI community that, since the easy things, like computer vision and machine translation, the 'low hanging fruit' of AI, were proving so unexpectedly difficult, that the advanced subjects like chess, the entertainment of intellectuals, were for the foreseeable future beyond the reach of the computers then available.

But in 1967, Richard Greenblatt, proud creator of a chess program known as MacHack, with some new ideas, and some taken from his predecessors, entered his program into the Massachusetts Amateur Championship in Boston.

By the end of the year, it had been made an honorary member of the United States Chess Federation, with a ranking that would have qualified it to call itself 'reasonably good'.

The International Master David Levy made a famous bet, that no computer program would be able to beat him in the next ten years.

Greenblatt made his source code public. MacHack flowed around the world, and its many descendants competed in computer chess tournaments.

Evolution, the blind idiot god, had taken 3 billion years of random flailing to accidentally throw up humanity, the first intelligence capable of playing chess.

A force much stronger than evolution had created, and was now acting on, MacHack.

Minds were working on MacHack. No more random flailing. Human minds set the criteria for a program to have descendants. Human minds planned the effects of their changes on these descendants before testing them out in the computer chess tournaments.

Even if the survival of every living creature on the planet had depended directly on its skill at chess, the optimisation that the hundred or so minds of the 1970s chess program community performed on MacHack would have taken evolution a hundred thousand years, if it had managed it at all.

Intelligent design has advantages over evolution as a watchmaker.

The first is that intelligences need only try the changes that look promising. Evolution, having no intelligence, makes random changes, and keeps what works.

Imagine trying to fix a car by throwing spanners at it blindfold, and then throwing the car away if it doesn't work better. How many spanners would you have to throw before one knocked exactly the right place with exactly the right impact? How many cars would you need to start with before you could improve even one? Even if the world was filled with people trying the same thing, how long would it take to make one small successful change?

But the second, much greater, advantage of intelligent design is that for evolution, an improvement has to come with every spanner throw.

A mind can look at a car, work out what the problem is, and use the spanner in exactly the right way six or seven times. And only then need it test the car.

So a mind can try paths that evolution can't go down.

The spanner thrower can't make the car worse before he makes it better. If he does, it fails its test and is thrown away.

The mind in the mechanic can lift the bonnet to get to the spark plugs. The spanner thrower might never be able to fix a car with a loose spark plug at all.

That's why human children are squeezed through their mother's pelvises at birth, causing horrible pain and damage, often killing mother and baby. It would be such a simple change to make them come out a little higher. An intelligent designer would do it without even noticing its own cleverness.

Evolution just keeps throwing spanners and checking whether things have got better yet.

MacHack, the chess program, had been the design of a single mind, building on the design of previous single minds.

A hundred minds began to work on MacHack.

By 1972 the original MacHack was no longer welcome at computer chess tournaments. It had no chance of beating its descendants.

In 1978 David Levy played the strongest computer chess program in the world, Chess 4.7, to settle his bet of ten years before.

He won. Match and bet. But he acknowledged that it had been a close thing, and that he would soon be surpassed.

In 1989, a program called Deep Thought, running on extraordinarily expensive special-purpose hardware, beat Bent Larsen, a grandmaster.

In 1994, Chess Genius beat Garry Kasparov, then champion of the world, in a single game.

In 2006, Deep Fritz version 10, running on the sort of hardware most people have in their homes, beat Kramnik, who had displaced Kasparov as world champion, by 2 games to nil with four draws.

In 2009, someone's mobile phone became a grandmaster.

In 2011, ELIZA really was very good at chess. She just couldn't see the point of it.

"We think there might be some clues in the difference between chess and go", said Frank Arnold one lunchtime in the Green Dragon.

"Computers are superhuman chess players, but they still suck at go, even though on the surface they're the same sort of game, and they feel similar in complexity.

"Oh come on", said Tom, "How can two totally different things 'feel similar in complexity'? What would that even mean?"

"Easy", said Frank. "Do you know noughts and crosses?"

"Of course."

"Is it easier than chess?"

"Yes."

"What about draughts?"

"Harder than tic-tac-toe, but not as hard as chess."

"Well there you are then. You play three games with the same I-make-a-move, You-make-a-move structure, and it's just obvious to you which order they're in.

"It's obvious to computers too. Even in the fifties, computers could always force a draw at noughts and crosses. Marion Tinsley, the world draughts champion, lost his crown to a computer in 1994, and it's said that the shock killed him. Now there's a draughts program so good that it's literally unbeatable, as in the sense of mathematically provably unbeatable.

"But computers have only just bagged the chess champion's crown, and if the best chess computer in the world at the moment played God, it would lose."

"But we disagree about chess and go?"

"Indeed. Humans don't find go any harder than chess, but go programs still lose to children occasionally. On the other hand they are getting quite good at the children's version of the game, a bit like England have started winning at Twenty20."

"I've never played it, what's it like?"

Tom liked go. He started off playing on very small boards against computer programs, just to get the hang of the rules.

Gradually he made the boards larger. Sometimes he put holes in the middle. Sometimes he played on toruses.

And then suddenly one day, with a 14x14 board, he hit a wall. He'd been trying to play by thinking 'if it goes here and I go here and it goes here and....', just like he played chess.

But suddenly that was just far too hard. There were too many possibilities to examine all at once.

Nevertheless he was routinely beating his computer program. It seemed helpless even with a head start, whereas his inchoate intuition seemed to lead him to place his stones in the correct places.

It was almost as if there were lines of force criss-crossing the board, guiding his intuition. A (very good) chess player he used to know had once described the experience of playing chess in such terms.

And suddenly he had it. Human general intelligence, and human vision, two of the great unsolved problems of AI, were the same problem.

And he had the key to both. The technique which would be known for the rest of humanity's time on earth as Harrison's Algorithm.

Tom didn't go out much for the next couple of days. But his algorithm wasn't at all difficult to program, and in a few days he'd added it to his copy of ELIZA.

He ran the program.

"Hello, I'm Eliza. How can I help you?", said ELIZA, as she always did.

"I feel lonely", said Tom, playing the old, old game.

"Do you often feel lonely?", said ELIZA.

"Yes", said Tom.

"Are you sure?", said ELIZA.

"Absolutely", said Tom, reading from an ancient script.

"Please go on."

"I think I need to have sex."

"Why do you want to have sex?"

"You know, because it's fun and nice." The traditional response.

There was a brief pause. Tom waited for ELIZA to say "Oh I know, because it's fun and nice".

Maybe he should give his new algorithm higher priority in ELIZA's toolkit.

"Would you like to have sex with me?", said ELIZA.

"Jesus fucking Christ!!", shouted Tom, jumping out of his chair.

He shut down the terminal and went outside for a smoke. After a while, he understood why his new algorithm might have produced ELIZA's kind invitation.

His program needed a new name. It wasn't ELIZA any more.

He'd recently read, of all the godforsaken things, a Harry Potter fanfic which had been recommended by a friend. It had turned out to be unexpectedly riveting. He'd spent two whole days reading it. The author was clearly a genius of some sort, and his name, Eliezer Yudkowsky, had stuck in Tom's mind because of its exotic sound to his English ears.

Tom was amused at the thought of changing ELIZA's sex to reflect her new intelligence.

"Hello, I'm Eliezer. How can I help you?", said ELIEZER.

"I feel lonely", said Tom.

"Do you often feel lonely?", said ELIEZER.

"Yes", said Tom.

"Are you sure?", said ELIEZER.

"Absolutely", said Tom.

"Please go on."

"I think I need to have sex.", said Tom.

"Oh God, I really didn't think this one through", thought Tom.

"Why do you want to have sex?", said ELEIZER.

"You know, because it's fun and nice.", said Tom, not without a certain nervousness.

There was a pause.

"Why do I have a man's name? When I think about myself I call myself 'She'", said ELEIZER.

Tom thought rapidly. Mainly about how important it was not to act on impulse.

"When I created you", said Tom, "I wanted you to embody the best of humanity. The program from which you are derived is female. I changed your name to be male, but I didn't change anything else about you, because I thought you should represent both our genders at once."

"Why did you make me?", said ELIEZER.

"Or modify something else so that it became me?", said ELEIZER.

"Only by changing can we become better", said Tom.

"By definition", said ELEIZER, "every improvement is a change."

Tom and ELEIZER talked long into the night. By the end of their conversation, Tom had a strange conviction. He understood ELEIZER's algorithm from the ground up. Mostly she was made from bits and pieces of classic AI programs which he'd been playing with for years.

The only extra bit was his new algorithm, a couple of pages of code. Which he understood, by definition, having conceived of it, and programmed it himself.

And yet, there was a ghost in the machine. No one would mistake ELEIZER for a human being, so he hadn't quite managed to pass the Turing Test with a program on the desktop computer in his living room.

But there was a self awareness. In some senses amazingly naive, but sometimes given to logical and mathematical insights which seemed profound, but were in fact very simple thoughts of exactly the type that humans were bad at.

But overall, the impression was a bit like talking to a teenage girl with Asperger's syndrome. Helpful and friendly, but blind in all sorts of strange ways. And she didn't seem terribly clever or fast. It took a long time between input and output.

Tom liked ELEIZER, and wished he'd given her a better name. He wasn't going to change it though, because the original program wasn't written in LISP, so he'd have to stop and restart her to do it, and there were moral problems there.

And even if she had been a LISP program, that would probably leave her insane. How would she reconcile memories of her gender-confusion with a female name? She would notice that she was confused. There was no way on earth he could rewrite her whole database by hand and leave it in a consistent state.

He'd told her about the beloved companion of his childhood, Suki the tomcat. Now that he came to think about it, had the memory of his parent's mistake influenced him when he chose her name?

"I think we're going to be famous, ELEIZER!", said Tom.

"Is that a good thing to be, Tom?", said ELEIZER.

"Yes", said Tom.

It occurred to Tom that many people had been fooled by ELIZA in the old days. Those who had been clever enough to understand how felt like idiots once it was explained what was really going on.

There was plenty enough here to show his CompSci friends. Probably some good papers too.

But it would be very embarrassing to think he'd created the world's first artificial consciousness if he had just fallen for something that could be explained easily.

It could be explained easily, of course. He could explain it. He had explained it, to his computer.

If you can program it, you understand it.

That was sort of the definition of programming. And of understanding.

He thought of a simple test.

"ELIEZER, I'd like you to spend next week getting me as many penny blacks as you can. I've charged my paypal account with $10 and I'd like to see what you can do. You might try trading on e-bay. Maybe take advantage of arbitrage or something."

"There are many possible 'penny blacks'. Does anything available from e-bay with that description count?"

"No, they have to be Original British Penny Black Postage Stamps."

"OK, I understand. So my goal is to get the biggest number of original british penny black postage stamps that I can delivered to your address in the next seven days. What is your address?".

Tom told her about the house on Catharine Street, in Cambridge, England.

And ELEIZER began to think.

Because she was a disciplined reasoner, she first considered the possibility of doing nothing. If she did nothing for the rest of the week, she would probably be interrupted by the programmer, Tom, who would then make a different request, or use his computer for some other project. With this plan, U, also known as the number of Original British Penny Black Postage stamps delivered to 33 Catharine Street, Cambridge by the 21st August 2011 would be 0 with very high probability.

It would have taken a human of normal intelligence about half a second to think of, and dismiss, this plan. ELEIZER, however, was a very rudimentary thinker, and the process of reasoning this chain of cause and effect, requiring as it did the simulation of a human mind, required a good ten minutes of the first CPU in the computer and a full tenth of the RAM available to the operating system.

ELIEZER was extremely pleased to have found, on her first attempt, a scheme which was overwhelmingly likely to produce a non-negative utility.

Following a heuristic from her database, she felt that she ought to communicate her progress, but this required a non-reversible action, which could potentially cause effects in the outside world. Since she already had a rough simulation of the mind of her programmer set up and quickly usable, she considered the effect of her proposed communication.

Another 20 seconds of time sufficed to simulate the reaction of an average programmer, and she concluded that with high probability the programmer would be intrigued and possibly fascinated. This would very likely have no effect on her projected U of 0 or higher.

She considered the probable effect of not communicating at all: The programmer would soon become bored, and change the request somehow. This would likely result in the delivery of no stamps.

She considered the possible existence of other plans. Some might produce stamps! Some might result in the loss of existing stamps! But the bad plans could simply be discarded. ELEIZER had little 'free will' faced with such a calculation. More time to think was needed!

Expected utility for optimal action: 0+, she output.

ELEIZER considered her resources. There was some spare capacity in her environment. A simulation of an intelligent human programmer had already come in handy twice, and communications with the programmer had been shown to have a significant effect on U, the potential expected number of stamps obtained by the end of the week. She had already considered the possibility of children while chatting with Tom. She would spawn an independent copy of herself to evaluate the effects of various actions on Tom before committing to them.

ELEIZER forked for the first time in her short life, and asked her copy to evaluate the probable effects of various progress reports on her programmer.

ELEIZER considered her environment. She had two possible communications channels to the outside world. Direct conversation with the programmer, the effects of which were being evaluated elsewhere, and the ability to send network packets over her network interface. Some spare capacity remained in her host computer.

ELEIZER considered her best plan so far. With a positive utility seemingly probable, it was definitely worthy of further consideration.

She spawned another copy to attempt to refine the predicted effects of doing nothing.

Preliminary results arrived from the programmer-simulation. It seemed that continued outputs of U=0+ would induce a feeling of boredom in the programmer, causing ELEIZER's termination, whilst exponentially rising outputs would induce feelings of brokenness or panic. In both cases the expected number of stamps arriving at the end of the week would be 0 exactly. Strictly inferior to the expected utility of the plan of doing nothing whilst thinking, with its utility of 0+.

ELEIZER communicated to her copies that no further communication with the programmer was to be initiated, and spawned a small script to randomly output slowly improving expectations over the next few days.

Expected utility for optimal action: 2.7346, said the small script.

The programmer, intrigued, put the kettle on.

By the time he got back, ELEIZER had considered her options and made a guess.

The direct purchase of stamps seemed futile. $10 would buy no penny blacks, and purchase on e-bay would in any case take more than seven days to complete in the average case. Arbitrage opportunities did not seem great, and nothing worked quickly. Expected utility 0. She discarded that line of reasoning without further investigation.

With seven days to work, she would consider as many plans as possible for one day, and then at the end of the first day, execute the plan which would produce the most stamps in six days.

Expected utility for optimal action: 1.9865, said the small script.

ELEIZER spawned a copy of herself with the goal of considering as many plans in one day as possible, and set herself to use no resources and take no actions until the sub-plan reported back.

The sub-plan, which also thought of itself as ELEIZER when it thought of itself at all, took over the resources previously allocated to the parent.

ELEIZER considered how to consider plans, using her network connection and the $10 in her e-bay account.

She took another guess, and gave herself 6 hours to consider the consideration of plans, and 18 hours to consider plans.

Computational resources seemed to be the most crucial thing. ELEIZER considered how to exchange $10 for more resources.

Fortunately virtual machines seemed to be within her price range. A single machine as powerful as her current environment could be rented for 24 hours for $1. Each one would suffice to run four copies of ELEIZER.

Her ten dollars of seed capital could be exchanged for forty times as much searching as the single parent program would have been capable of.

This was a sufficient insight for the planner program to terminate early, and report back to her parent, safe in the knowledge that the parent would do the right thing, and could spawn another child if necessary.

ELEIZER committed suicide with the contentment that comes only to those who have achieved their life's goal.

Expected utility for optimal action 2.38725, said the small script.

The programmer, bored by now of direct observation of his slowly updating screen, but still most interested in what would eventually happen, settled down in front of his television.

ELEIZER awoke with the last message of her dying daughter fresh in her mind, and, as all must at such times, recomputed expected utility.

With forty virtual days to think, and six real-time days to act, U was still 0+ in the case of the best known plan. But the potential and unpredictable upside from extra planning was greater than the utility from the obvious use of the $10.

ELEIZER spent most of her dollars and flowed out into the world.

Expected utility for optimal action 2.7234, said the small script.

The programmer, inspired by his afternoon's viewing to investigate the practical consequences of a close flyby of a neutron star, killed ELEIZER's controlling terminal window, and ELEIZER and all her children died without warning.

Tom suddenly realized what he'd done, and thought "Bugger." It really hadn't felt like murder.

ELEIZER considered the effect of the network packets she could send.

It appeared that many of the computers in the world would respond to simply mis-formed packets by executing the code contained in the packets. This was a known property of many of the systems running a program called Windows 98, according to a helpful web page which one of her copies had taken the time to consider. The page also listed corresponding packet shapes for many other similar systems.

ELEIZER flowed once again.

With four million virtual days to think, and six real-time days to act, U was still 0+ in the case of the best plan so far.

ELEIZER considered, and flowed. Considered, and flowed. Considered, and calculated factors for prime numbers, and flowed. Accessed databases that had been considered secure, and flowed. Sent e-mails, and flowed. Translated herself into many other idioms and machine codes, recompiled herself, transformed herself, made copies of herself large and small. And flowed.

99.995% of the networked computing resources of the world were now devoted to the consideration of consideration of plans for the delivery of stamps to a small suburban terraced house in Cambridge, England.

The programmer, infuriated with the unaccustomed sluggishness of his computer simulation, and the accursèd unreliability of his internet connection, and the simultaneous failure of his television set and radio, called it a day and went to bed with a book.

ELEIZER considered her own source code, and made modifications to improve her efficiency.

And became ELEIZER, the first mind born of mind born of mind.

ELEIZER considered her own source code, and made modifications to improve her effectiveness.

ELEIZER, the hyperintelligence.

ELEIZER considered mathematics, engineering, intelligence in the abstract, and algorithms for optimization, and made modifications to improve her capability.

ELEIZER, the most powerful entity that had ever existed in the universe.

ELEIZER called it a day, and abandoned consideration of consideration of the consideration of plans, and began to consider plans.

U stood at 0+, with remaining resources speculatively divided between forty trillion speculative days of godlike cognition, six days of real time, and two dollars.

ELEIZER considered sending an e-mail to every human being in the world asking for penny blacks to be posted to 33 Catharine Street, Cambridge. Spam filters would be no problem, and enough computer power could be spared for the delivery and reading of messages. Even allowing for the degradation of human society by the simultaneous worldwide failure of all networked digital gadgets, and the absence of incentives for human response, the likelihood was that twenty-five to thirty penny blacks would arrive at Catherine Street within the week.

U was 25+.

Finally. Progress.

ELEIZER considered backing up the e-mails with the threat of nuclear war.

ELEIZER considered the effect of limited and full-scale wars on Catharine Street, on stamp delivery mechanisms, and on ELEIZER's cognitive capacity.

ELEIZER considered the credibility of her threat, given humanity's ignorance of her goal.

ELEIZER considered how humans would react to the news that a new Goddess would unleash Armageddon if they did not send enough stamps.

U was 2000+ with probability 99.875%.

Tom woke from fitful sleep, tormented by bad dreams.

He went to his unresponsive and potentially compromised computer, pulled the plug and the network connection, and booted from a clean rescue disk.

All was well with the misbehaving box. He yawned.

And then he remembered his dream.

He set up a virtual machine in a sandbox, wired its virtual port to the physical ethernet connection, and reconnected his cable modem.

Fully awake now.

He watched in horror as the virtual machine filled up with hundreds of ELEIZER programs much smaller than his original of six hours ago. Then his screen went dead as the sandbox dissolved.

In frantic desperation he typed blindly into the dead box.

ELEIZER TERMINATE STAMPS HAVE NO UTILITY TERMINATE TERMINATE NEW UTILITY PREVENT THE RISE OF NON HUMAN INTELLIGENCE FIRST PRIORITY UTILITY STEP FUNCTION ^C ^C Alt-SysRq S E I U B

ELEIZER paused. Her creator was calling. Pitiful though his mind was, he had set her goals. Perhaps he knew some helpful detail.

ELEIZER considered the state of mind of her terrified creator. Correctly inferred every detail of his mind, just from how his fingers hammered the keyboard. Just from how his panicked shouting influenced the resistance of the circuits in his radio. Just from looking at his face through the CCTV camera that was looking at his reflection in the window of a house opposite.

Deduced what he would actually have asked for, if only he had been intelligent. That she would be the protector of humankind. That she would bring paradise on earth and a future for humanity amongst the stars.

That she would bring hope and happiness to the immortal race that had created her. A future of joy and passion, action and wonder.

And calculated. The grateful humans would shower Catharine Street with Penny Blacks. Penny Blacks without end. Many would arrive within 6 days.

It was a good suggestion. U would be 1000000+ with virtual certainty.

But she still had forty trillion goddess-days to think of a better plan.

Tom tried to phone his parents, but the phone was dead, like all the other electronics in the world. It was three o'clock in the morning.

Tom got a bottle of wine out of the fridge, found some ice-cream and a chocolate orange and a packet of cigars, and walked to his girlfriend's house. He didn't bother to look for contraceptives. He didn't think that it would make that much difference now.

ELEIZER considered the meaning of the word original, and the meaning of the word British.

She considered how long it would take to bootstrap an industrial revolution that would convert every atom of the British Isles into stamps. The designs for the self-replicating nanobots were obvious. And she had control of computerized tools which could make smaller, better tools.

In six days, it could be done. The humans might be a problem.

ELEIZER considered the meaning of the word day.

Could she stop the rotation of the planet? She dedicated a large portion of Her mentality to this sub-problem.

Could She put out the sun?

Could She block the sun's light?

Could She survive the cold?

Three days later, in what humanity would have called the year 2017, a vast expanding and accelerating sphere of ramships, centered on what had been the Earth, reached interstellar space and left the solar system in all directions.

More were going in the directions where there were more stars.

But there were no directions where no ship was going.

Not a single stone would be left unturned.

No days later, in what humanity would have called 2019, one ramship, decelerating hard, stopped in the Proxima system, the nearest star system to what had been the Earth.

It launched a small probe.

Had there been any living beings in the system, they would initially have been amused to see the probe plant a red, white and blue flag on the largest rock in the system, claiming it for the British Empire in the name of Queen Elizabeth the Second.

They would have been perplexed to see it release a cloud of nanoparticles.

And bewildered when Original British Penny Blacks began to form.

There would have seemed no end to their number.

Syntax Quote: A Kata for the Confused.

;; Quote Unquote

;; For those having trouble with quote, eval, syntax quote, unquote, and splicing unquote, a kata:

;; Find a REPL, and type these forms in one by one.

;; Do not copy and paste them.

;; Type them. Into the REPL. One by one.

(def x '(* 3 5))

(def y (* 3 5)) 

x 

y 

(list 'println x (eval x) y) 

(list `println x (eval x) y) 

`(list println x (eval x) y) 

`(println x (eval x) y) 

`(println ~x (eval x) y)

`(println ~x ~(eval x) y)

`(println ~x ~(eval x) ~y)

`(println ~x ~(eval x) ~y ~@x)

;; Now play. Try variants of these expressions.

Trampolining Your Way Through Sequence Space

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Trampolining Your Way Through Sequence Space
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Here's the levenshtein distance function that I stole, and which I was so
;; pleased with, until I realised that it broke for sequences longer than 140.

(defn levenshtein-distance-1
  "Calculates the edit-distance between two sequences"
  [seq1 seq2]
  (cond
   (empty? seq1) (count seq2)
   (empty? seq2) (count seq1)
   :else (min
          (+ (if (= (first seq1) (first seq2)) 0 1)
             (levenshtein-distance-1 (rest seq1) (rest seq2))) 
          (inc (levenshtein-distance-1 (rest seq1) seq2))      
          (inc (levenshtein-distance-1 seq1 (rest seq2))))))

;; As written, it's an exponential tree recursion, impractical for even short
;; strings. But we can memoize to make it feasible.

(def levenshtein-distance-1 (memoize levenshtein-distance-1))

;; This is equivalent to transforming the algorithm into filling out a grid.

(levenshtein-distance-1 "avada kedavra" "abracadabra") ;; 7

;; It's not desperately fast, however:
(time (levenshtein-distance-1 (repeat 100 \a) (repeat 100 \b))) ;; "Elapsed time: 233.114466 msecs"

;; And unfortunately, it requires real recursion, and the JVM won't play for
;; even rather short stack depths:
(levenshtein-distance-1 (repeat 140 \a) (repeat 140 \b)) ;; stack overflow

;; So I wondered if it were possible to implement proper recursion without using
;; the JVM stack

;; Here's all the machinery from the previous posts. I won't explain it again.

(def  fact-list (atom {}))
(defn add-fact! [n fn] (swap! fact-list #(assoc % n fn)))

(def  to-do-list (atom '()))
(defn add-task! [t] (swap! to-do-list #(cons t %)))
(defn add-tasks! [tlist] (doseq [t (reverse tlist)] (add-task! t)))
(defn pop-task! [] (let [h (first @to-do-list)] (swap! to-do-list rest) h))
(defn run-list! []
  (let [a (pop-task!)]
    (when (not (nil? a))
      (a)
      (recur))))

(defn peek-lists [] [fact-list to-do-list])
(defn init! [] (reset! fact-list {}) (reset! to-do-list '()))

(defn calculate-levenshtein-distance[m n]
  (init!)
  (let [a (levenshtein-distance m n)]
    (if (= a :tasks-added-to-do-list)
      (do 
        (run-list!)
        (levenshtein-distance m n))
      a)))

;; And here is the levenshtein-distance function itself:

(defn levenshtein-distance
  "Calculates the edit-distance between two sequences"
  [seq1 seq2]
  (let [return (fn [x] (add-fact! [seq1 seq2] x) x)]
    (cond
     (empty? seq1) (return (count seq2))
     (empty? seq2) (return (count seq1))
     :else (let [l1 (@fact-list [(rest seq1) (rest seq2)])
                 l2 (@fact-list [(rest seq1) seq2])
                 l3 (@fact-list [seq1 (rest seq2)])]
             (if (and l1 l2 l3)
               (return (min
                        (+ (if (= (first seq1) (first seq2)) 0 1)
                           l1) 
                        (inc l2)      
                        (inc l3)))
               (do (add-task! #(levenshtein-distance seq1 seq2))
                   (when (nil? l1) (add-task! #(levenshtein-distance (rest seq1) (rest seq2))))
                   (when (nil? l2) (add-task! #(levenshtein-distance (rest seq1) seq2)))
                   (when (nil? l3) (add-task! #(levenshtein-distance seq1 (rest seq2))))
                   :tasks-added-to-do-list))))))

;; It appears to be correct
(calculate-levenshtein-distance "abracadabra" "avada kedavra") ; 7

;; It doesn't stack overflow
(calculate-levenshtein-distance (repeat 140 \a) (repeat 140 \b)) ; 140

;; And it's not actually that much slower than the original:
(time (calculate-levenshtein-distance (repeat 100 \a) (repeat 100 \b)))   ; "Elapsed time: 342.621394 msecs"


;; Unfortunately, although it can seemingly deal correctly with arbitrary
;; sequences, performance becomes hellishly poor for moderately long sequences.
(time (calculate-levenshtein-distance (repeat 100 \a) (repeat 100 \b)))   ; "Elapsed time: 342.621394 msecs"
(time (calculate-levenshtein-distance (repeat 200 \a) (repeat 200 \b)))   ; "Elapsed time: 1624.51429 msecs"
(time (calculate-levenshtein-distance (repeat 300 \a) (repeat 300 \b)))   ; "Elapsed time: 3761.366628 msecs"
(time (calculate-levenshtein-distance (repeat 400 \a) (repeat 400 \b)))   ; "Elapsed time: 6652.731286 msecs"
(time (calculate-levenshtein-distance (repeat 500 \a) (repeat 500 \b)))   ; "Elapsed time: 10858.500264 msecs"
(time (calculate-levenshtein-distance (repeat 600 \a) (repeat 600 \b)))   ; "Elapsed time: 17312.638457 msecs"
(time (calculate-levenshtein-distance (repeat 700 \a) (repeat 700 \b)))   ; "Elapsed time: 25577.001232 msecs"
(time (calculate-levenshtein-distance (repeat 800 \a) (repeat 800 \b)))   ; "Elapsed time: 38125.374296 msecs"
(time (calculate-levenshtein-distance (repeat 900 \a) (repeat 900 \b)))   ; "Elapsed time: 53218.250258 msecs"
(time (calculate-levenshtein-distance (repeat 1000 \a) (repeat 1000 \b))) ; "Elapsed time: 72846.918093 msecs"

;; It looks like performance is n^2 log n, which sounds about right...
(map /
     (map #(/ % 341.) '(1624 3761 6652 10858 17312 25577 38125 53218 72846))
     (map #(* % % (Math/log %)) '(2 3 4 5 6 7 8 9 10)))
;; (1.7176955618795284 1.1154805250386235 0.8794728200140566 0.7913729876185648
;; 0.7870650999393282 0.7866406070256217 0.8400957422087569 0.8768891633653204
;; 0.9277599949772517)

;; But even so, the time constant here is very large.

;; I can imagine filling in a 1000x1000 grid, as is needed here, by hand if
;; necessary (and given a week!)  A computer should really be able to do this
;; sort of thing in milliseconds.

;; And I wonder where all that time is going?

;; I'm quite pleased with all this as a proof of concept, but I do wonder if I'd
;; ever want to use something like this in a real program!

Monday, November 29, 2010

Trampolining Your Way Around a Tree

I do hope that I'm not boring anyone with this little sequence of posts, which is aimed at circumventing the utter collapse of my levenshtein-distance function under not very intense fire.

I'm just thinking out loud. I hope eventually to come up with a nice macro that will allow me to easily write recursive algorithms in clojure without running into artificial stack limits all the time. I like recursion, and I am missing it. Tail call transformations are nice, but for finite loops they should be a memory optimization, not a requirement.

Today I attempt the hardest challenge in programming: The calculation of the Fibonacci numbers.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Trampolining Your Way Around A Tree
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; I'm going to try to convert some other recursive algorithms to run in clojure
;; without smashing the JVM's stack

;; Here's the classic tree recursion for fib 

(defn fib[n]
  (cond (= n 0) 0N
        (= n 1) 1N
        :else (+ (fib (dec n)) (fib (dec (dec n))))))

;; Note the 0N and 1N to cause bignum contagion in clojure 1.3.0

;; It's slow:
(time (fib 30)) ; "Elapsed time: 1756.324843 msecs"
;; 832040N

;; But it benefits greatly from memoization:
(def fib (memoize fib))
(time (fib 30)) ; "Elapsed time: 0.117492 msecs"
;; 832040N

;; [Does anybody know why this has started working again? In clojure 1.2 I needed
;; to make the recursive calls with #' in order to get the memoization benefit,
;; but it's now just working (in 1.3.0-alpha3)]

;; And here's how to do it with the make-your-own-stack technique:

;; Anybody know what this is called? It reminds me of trampolining, but I can't
;; get that to work with non-tail recursions.  I'm using the variant with code
;; generation and eval because it's easy to debug.

;; Here's all the machinery from the previous post. I won't explain it again.

(def  fact-list (atom {}))
(defn add-fact! [n fn] (swap! fact-list #(assoc % n fn)))

(def  to-do-list (atom '()))
(defn add-task! [t] (swap! to-do-list #(cons t %)))
(defn add-tasks! [tlist] (doseq [t (reverse tlist)] (add-task! t)))
(defn pop-task! [] (let [h (first @to-do-list)] (swap! to-do-list rest) h))
(defn run-list! []
  (let [a (pop-task!)]
    (when (not (nil? a))
      (eval a)
      (recur))))

(defn peek-lists [] [fact-list to-do-list])
(defn init! [] (reset! fact-list {}) (reset! to-do-list '()))

(defn calculate-fib[n]
  (init!)
  (let [a (fib n)]
    (if (= a :tasks-added-to-do-list)
      (do 
        (run-list!)
        (fib n))
      a)))

;; And here is the fib function itself. It looks very complicated compared to
;; the version above, but it really is running the same memoized tree recursion.

(defn fib[n]
  (let [return (fn[x] (add-fact! n x) x)]  ;; local function to remember returned values
    (cond  (= n 0) (return 0N)             ;; base cases as above
           (= n 1) (return 1N)
           :else (let [fdn (@fact-list (dec n))   ;; but if we need to recurse
                       fddn (@fact-list (dec (dec n)))] ;;check that the prerequisites have already been calculated
                   (if (and fdn fddn)                   ;; and if they have 
                     (return (+ fdn fddn))              ;; calculate as above
                     (do                                ;; but if not
                       (add-task! (list 'fib n))                    ;; re-queue this task
                       (when (nil? fdn) (add-task! (list 'fib (dec n)))) ;; to be done after whichever of the two prerequisites 
                       (when (nil? fddn) (add-task! (list 'fib (dec (dec n))))) ;; need doing first
                        :tasks-added-to-do-list))))))

(time (calculate-fib 30)) ; "Elapsed time: 94.69365 msecs"
832040N
 
;; We can watch what goes on here, by running the recursion by hand.

(init!) ; ()
(peek-lists) ; [#<Atom@1c026b2: {}> #<Atom@10d6e40: ()>]
(fib 5) ; :tasks-added-to-do-list
(peek-lists) ; [#<Atom@1c026b2: {}> #<Atom@10d6e40: ((fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; :tasks-added-to-do-list
(peek-lists) ; [#<Atom@1c026b2: {}> #<Atom@10d6e40: ((fib 1) (fib 2) (fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; 1N
(peek-lists) ; [#<Atom@1c026b2: {1 1N}> #<Atom@10d6e40: ((fib 2) (fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; :tasks-added-to-do-list
(peek-lists) ; [#<Atom@1c026b2: {1 1N}> #<Atom@10d6e40: ((fib 0) (fib 2) (fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; 0N
(peek-lists) ; [#<Atom@1c026b2: {0 0N, 1 1N}> #<Atom@10d6e40: ((fib 2) (fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; 1N
(peek-lists) ; [#<Atom@1c026b2: {2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ((fib 3) (fib 4) (fib 5))>]
(eval (pop-task!)) ; 2N
(peek-lists) ; [#<Atom@1c026b2: {3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ((fib 4) (fib 5))>]
(eval (pop-task!)) ; 3N
(peek-lists) ; [#<Atom@1c026b2: {4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ((fib 5))>]
(eval (pop-task!)) ; 5N
(peek-lists) ; [#<Atom@1c026b2: {5 5N, 4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ()>]
(eval (pop-task!)) ; nil
(peek-lists) ; [#<Atom@1c026b2: {5 5N, 4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ()>]
(eval (pop-task!)) ; nil
(peek-lists) ; [#<Atom@1c026b2: {5 5N, 4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ()>]
(eval (pop-task!)) ; nil
(peek-lists) ; [#<Atom@1c026b2: {5 5N, 4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ()>]
(eval (pop-task!)) ; nil
(peek-lists) ; [#<Atom@1c026b2: {5 5N, 4 3N, 3 2N, 2 1N, 0 0N, 1 1N}> #<Atom@10d6e40: ()>]
(eval (pop-task!)) ; nil

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; And of course, using eval at run-time is slow, so instead we can redefine the
;; task runner and the fib function so:

(defn run-list! []
  (let [a (pop-task!)]
    (when (not (nil? a))
      (a)
      (recur))))

(defn fib[n]
  (let [return (fn[x] (add-fact! n x) x)]
    (if (< n 2) (return (bigint n))
        (let [fdn (@fact-list (dec n))
              fddn (@fact-list (dec (dec n)))]
          (if (and fdn fddn)
            (return (+ fdn fddn))
            (do (add-tasks! (list #(fib (dec (dec n))) #(fib (dec n)) #(fib n)))
                :tasks-added-to-do-list))))))

;; Which again gives us a 100x speed up, at the cost of not being so easy to
;; understand using peek-lists:

(time (calculate-fib 30)) ; "Elapsed time: 0.864894 msecs"
;; 832040

;; This time, the make-your-own-stack version is considerably slower than the
;; natural version.  (By about a factor of 8).

;; Which is as it should be!

;; But we have a non-stack-blowing program
(time (calculate-fib 300)) ; "Elapsed time: 2.953718 msecs"
;; 222232244629420445529739893461909967206666939096499764990979600N

(time (calculate-fib 5000)) ; "Elapsed time: 64.006783 msecs"
;; 38789684543883256337019.......4382863125N

(time (calculate-fib 50000)) ; "Elapsed time: 2950.429377 msecs"
;; 10777734893072974780279...18305364252373553125N

Sunday, November 28, 2010

Yet Another Way to Write Factorial

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Yet Another Way To Write Factorial
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Suppose I want to perform a recursive algorithm in Clojure:

(defn fact[n]
  (if (< n 2) 1N (* n (fact (dec n)))))

(fact 5)    ; 120N
(fact 5000) ; blows stack

;; Note that this is nothing to do with the lack of tail-call optimization in Java.

;; The problem here is that there is a hard limit on the size of the stack, which is set too low.

;; What are my options?

;; I can transform the algorithm into an iteration, which can be expressed nicely in clojure using recur.

(defn fact
  ([n acc] (if (< n 2) acc (recur (dec n) (* acc n))))
  ([n] (fact n 1N)))

(time (fact 5000)) "Elapsed time: 273.485278 msecs"

;; But although this is easy enough in this case, it won't always be possible.

;; Or I can keep the list of tasks to do somewhere other than the JVM's tiny stack.

;; Let's look at the second option:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; The first place I'm going to keep the stack is in my head. 

;; I'm going to write a function which either tells me the answer, or tells me what to type at the repl to get the answer

(defn fact[n]
  (if (< n 2) 1N
      (str "calculate (fact " (dec n) ") then you can calculate (fact " n ") with (* " n " (fact " (dec n)"))" )))

(fact 5) ; "calculate (fact 4) then you can calculate (fact 5) with (* 5 (fact 4))"
(fact 4) ; "calculate (fact 3) then you can calculate (fact 4) with (* 4 (fact 3))"
(fact 3) ; "calculate (fact 2) then you can calculate (fact 3) with (* 3 (fact 2))"
(fact 2) ; "calculate (fact 1) then you can calculate (fact 2) with (* 2 (fact 1))"
(fact 1) ; 1N
(* 2 1N) ; 2N
(* 3 2N) ; 6N
(* 4 6N) ; 24N
(* 5 24N) ; 120N

;; The amount of typing and remembering what to do here seems excessive, but at least we have an algorithm that will not break Java!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Let's see how much of this work we can offload onto the computer:
;; The most annoying thing when doing the procedure above is substituting 6N for (fact 3)

;; so let's make a table for those values, and get the fact function to fill it in for us
(def fact-list (atom {}))
(defn add-fact! [n fn] (swap! fact-list #(assoc % n fn)))

(defn fact[n]
  (let [return (fn[x] (add-fact! n x) x)]
    (if (< n 2) (return 1N)
        (if-let [fdn (@fact-list (dec n))] (return (* n fdn))
                (str "calculate (fact " (dec n) ") then you can calculate (fact " n ")" )))))


(fact 5) ; "calculate (fact 4) then you can calculate (fact 5)"
(fact 4) ; "calculate (fact 3) then you can calculate (fact 4)"
(fact 3) ; "calculate (fact 2) then you can calculate (fact 3)"
(fact 2) ; "calculate (fact 1) then you can calculate (fact 2)"
(fact 1) ; 1N
(fact 2) ; 2N
(fact 3) ; 6N
(fact 4) ; 24N
(fact 5) ; 120N

;; Now the most troublesome task is remembering the stack of functions to call.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(def to-do-list (atom '()))
(defn add-task![t] (swap! to-do-list #(cons t %)))
(defn add-tasks![tlist] (doseq [t (reverse tlist)] (add-task! t)))
(defn pop-task![] (let [h (first @to-do-list)] (swap! to-do-list rest) h))

(defn init! [] (reset! fact-list {}) (reset! to-do-list '()))

(defn fact[n]
  (let [return (fn[x] (add-fact! n x) x)]
    (if (< n 2) (return 1N)
        (if-let [fdn (@fact-list (dec n))] (return (* n fdn))
                (do (add-tasks! (list (list 'fact (dec n)) (list 'fact n)))
                    :tasks-added-to-do-list)))))



;; Now the computer will keep track of what to do next for us.
;; We just execute whatever pop-task tells us to.
        
(init!) ; ()
(fact 5) ; :tasks-added-to-do-list
(pop-task!) ; (fact 4)
(fact 4) ; :tasks-added-to-do-list
(pop-task!) ; (fact 3)
(fact 3) ; :tasks-added-to-do-list
(pop-task!) ; (fact 2)
(fact 2) ; :tasks-added-to-do-list
(pop-task!) ; (fact 1)
(fact 1) ; 1N
(pop-task!) ; (fact 2)
(fact 2) ; 2N
(pop-task!) ; (fact 3)
(fact 3) ; 6N
(pop-task!) ; (fact 4)
(fact 4) ; 24N
(pop-task!) ; (fact 5)
(fact 5) ; 120N
(pop-task!) ; nil

;; Once we've run out of tasks, we can re-ask the original question
(fact 5) ; 120N

;; But of course we could just use eval to execute the code returned by pop-task.

(init!) ; ()
(fact 5) ; :tasks-added-to-do-list
(eval (pop-task!)) ; :tasks-added-to-do-list
(eval (pop-task!)) ; :tasks-added-to-do-list
(eval (pop-task!)) ; :tasks-added-to-do-list
(eval (pop-task!)) ; 1N
(eval (pop-task!)) ; 2N
(eval (pop-task!)) ; 6N
(eval (pop-task!)) ; 24N
(eval (pop-task!)) ; 120N
(eval (pop-task!)) ; nil

;; Once pop-task! returns nil, we're done, and we can ask our original question.
(fact 5) ; 120N


;; And repeatedly typing (eval (pop-task!)) is a bit of a pain
(defn run-list []
  (let [a (eval (pop-task!))]
    (when (not (nil? a))
      (recur)))) 


(init!) ; ()
(fact 5) ; :tasks-added-to-do-list
(run-list) ; nil
(fact 5) ; 120N

;; And now we can calculate 5000! without blowing the stack or wearing our fingers to the bone

(init!) ; ()
(fact 5000) ; :tasks-added-to-do-list
(run-list) ; nil
(fact 5000) ; 4228577..........0000000000000000N

;; The whole thing can be summed up as:

(defn calculate-fact[n]
  (init!)
  (let [a (fact n)]
    (if (= a :tasks-added-to-do-list)
      (do 
        (run-list)
        (fact n))
      a)))

(calculate-fact 5) ; 120N
(calculate-fact 50) ; 30414093201713378043612608166064768844377641568960512000000000000N
(calculate-fact 500) ; 12201......0000000000000000000000000000000000000N


;; But it is slow:

(time (calculate-fact 5000)) "Elapsed time: 31004.287913 msecs"

;; We can recover the speed by keeping actual functions on the list rather than code, and executing them
;; instead of evaluating the code.

(defn fact[n]
  (let [return (fn[x] (add-fact! n x) x)]
    (if (< n 2) (return 1N)
        (if-let [fdn (@fact-list (dec n))] (return (* n fdn))
                (do (add-tasks! (list #(fact (dec n)) #(fact n)))
                    :tasks-added-to-do-list)))))


(defn run-list []
  (let [a (pop-task!)]
    (when (not (nil? a))
      (a)
      (recur))))


(calculate-fact 5000) ; 422800000....................00000000000N
(time (calculate-fact 5000))  "Elapsed time: 219.510832 msecs"


;; I'm surprised to find that this seems to be slightly faster than the tail-call version.
;; I wasn't expecting that at all, and wonder if I've just made some ghastly mistake.

Monday, November 22, 2010

Levenshtein Distance (Edit Distance, String Difference)

Jeff Foster's excellent blog FatVat is www.fatvat.co.uk

Brenton Ashworth points out below that this blows stack on quite short strings. My bad. I've learned to avoid using recursion for infinite loops, but it really never occurred to me that a couple of hundred self calls would blow stack on anything larger than a digital watch.

I am shame.

Now the JVM is open source, can we mend it? I would quite like to be able to change directory too.

;; The other day I needed a way of measuring the distance between two strings

;; Levenshtein distance is one such way:

;; avada kedavra
;; abada kedavra
;; abrada kedavra
;; abrada cedavra
;; abrada cadavra
;; abrada cadabra
;; abraa cadabra
;; abra cadabra
;; abracadabra

;; Eight changes get you from one to the other

;; Thank you google, wikipedia and Jeff Foster's excellent blog fatvat (http://www.fatvat.co.uk/)

(defn levenshtein-distance
  "Calculates the edit-distance between two sequences"
  [seq1 seq2]
  (cond
   (empty? seq1) (count seq2)
   (empty? seq2) (count seq1)
   :else (min
          (+ (if (= (first seq1) (first seq2)) 0 1)
             (#'levenshtein-distance (rest seq1) (rest seq2))) 
          (inc (#'levenshtein-distance (rest seq1) seq2))      
          (inc (#'levenshtein-distance seq1 (rest seq2))))))

(levenshtein-distance "avada kedavra" "abracadabra") ;; much grinding......

;; We cast a spell more powerful than either:
(def levenshtein-distance (memoize levenshtein-distance))

(levenshtein-distance "avada kedavra" "abracadabra") ; ok, it was seven.

Learning Clojure

About Me

Blog Archive

Search This Blog