A Brief History of Markov Chains

One of the most common simple techniques for generating text is a Markov chain. The algorithm takes an input text or texts, divides it into tokens (usually letters or words), and generates new text based on the statistics of short sequences of those tokens. If you're interested in the technical details, Allison Parrish has a good tutorial on implementing Markov chains in Python.

This technique has been around for a very long time, but it's slightly unclear when it was first developed. The following is a slightly sketchy work-in-progress timeline of what I've been able to work out.

My goal here is to work out when was the first person to:

implement Markov chains on a computer
with probabilities derived from a source text
to generate new text
for creative purposes

Timeline

(where available, links are to English translations of sources)

1907-1913: Andrei Markov develops the mathematical theory of Markov chains, using as an example an analysis of the patterns of vowels and consonants in Pushkin's poem Eugene Onegin. (link)

1948: Claude Shannon publishes A Mathematical Theory of Communication, laying the foundations of information theory. As an example, he manually generates samples of text ("the series of approximations to English") using a Markov chain process ("Markoff process"), based on an unknown source text ("a book"). (link)

1957: Lejaren Hiller and Leonard Isaacson compose the Illiac Suite, a composition for string quartet, using the ILLIAC I computer. Their "Experiment Four" includes music generated using Markov chain techniques with manually specified transition probabilities. (link)

1959: Theo Lutz produces a set of poems using a ZUSE Z 22 mainframe to combine vocabulary from pre-determined lists (taken from Kafka's The Castle). This is arguably the first computer-generated poetry. Lutz suggests that this could be augmented by using transition probabilities to ensure relationships between certain subject/object combinations, but does not actually implement this. (link)

1960: By this point Max Bense (Theo Lutz's supervisor) is writing about using Shannon's ideas and Markov chain techniques to analyse texts. However, he is a theorist and reportedly never uses a computer. (link)

1961: Nanni Balestrini composes his poem Tape Mark I, using an IBM 7070 to assemble fragments of three other poems. This is similar to a Markov chain in that it involves generating a stream of text from recombinations of source material. However, the transitions are deterministic (albeit dependent on a random ordering of the source texts), and the possible transitions are specified by hand - deliberately avoiding recapitulating the source material. (link interview with Balestrini)

1963: François Le Lionnais, in Lipo: First Manifesto, makes an offhand remark about how the cento might be "reinvigorated...by a few considerations taken from Markov's chain theory." I believe this is in reference to Raymond Queneau's work analysing texts based on their transition probabilities, rather than to generating text with Markov chains - the Oulipo were generally opposed to randomness in text generation. (link)

1963: Lejaren Hiller and Robert Baker release Computer Cantata, a musical composition generated using Markov chains with probabilities derived from Charles Ives' Three Places In New England. Notably they also used Markov chain techniques on sequences of phonemes to generate "texts" which were sung as part of the performance. (link) (recording)

1963: Hiroshi Kawano uses Markov chain techniques with manually specified transition probabilities to generate visual art. (link)

1968: Kawano writes "This Shannon's method by Markov-process has been applied to the field of literature and music and many computer-aided works of art have been produced with this method". It's unclear to me if he is referring to his own work, or to some other work I'm unaware of. In the same piece he gives an example of text generated by a Markov process, lifted directly from Shannon. (link)

1972: Bill Gosper, in HAKMEM (item 176), describes "processing a character string by taking the last 3 letters typed out, searching for a random occurrence of that sequence in the text, taking the letter following that occurrence, typing it out, and iterating". This is a semi-automated form of the Markov chain algorithm (although not described as such), but no output is given. (link)

1976: William Bennett publishes a textbook Scientific and Engineering Problem-Solving with the Computer. In it, he describes a technique for generating text, identical to the Markov chain approach, but he does not use this terminology, nor does he reference Shannon or Markov. This is the earliest clear example of Markov chain text generation on a computer that I can find. (link)

1983: Brian Hayes, in his Computer Recreations column in Scientific American, rehashes Bennett's work and brings it to a wider audience. (link)

1984: Hugh Kenner and Joseph O'Rourke, in Byte Magazine, publish the source code for Travesty, an implementation of the Markov chain technique. They refer back to Hayes and Shannon, but not explicitly to Markov. (link)

1984: Rob Pike and Bruce Ellis (with inspiration from Don Mitchell) use a Markov chain program to create the character of Mark V. Shaney, posting to Usenet. (wikipedia)

1995: Hugh Kenner and Charles Hartman use Travesty to generate a volume of poetry, Sentences. This may be the first published poetry generated using Markov chains.

So it seems that the possibility of using Markov chain text generation for creative work was there from the early 1960s. By 1963, Hiller and Kawano were using Markov chains to generate music and images, and Bense and Queneau were using the same ideas to analyse texts. The process for generating text had been laid out by Shannon in an extremely famous paper in 1948.

And yet, it's 1976 before I can find evidence that anyone had actually generated any text using a Markov chain computer program (not counting Hiller and Baker's 1963 phonetic experiments). When it does happen, as best I can tell it's an independent invention by a physicist/engineer writing a computer textbook. Even then, it doesn't get picked up and used until almost a decade later.

Am I missing something here?

If you've enjoyed this piece, please consider contributing on Patreon so I can do more things like this.