Thursday, September 24, 2009

The Voynich Manuscript

The Voynich manuscript is a strange book, written in a strange script, with strange illustrations.



Yesterday I attended a talk on the Voynich Manuscript (VM), at MIT, by Kevin Knight of USC's Information Sciences Institute. Here's a brief summary of his talk:


The manuscript consists of 235 pages on vellum, with color drawings of plants, nymphs, stars, etc. It contains about 30,000 words written in an unknown script, and is owned by Yale University.

It has a character set that has not been observed in any other document. It is broken up into sections called "herbal", "astrological", "biological", "cosmological", "pharmacological", and a pure text section at the end. These names reflect the pictures in each section. For example, the "herbal" section contains pictures of unknown plants being grafted onto other plants. The "biological" section depicts small nudes in baths with interconnecting tubes of liquids. The "pharmacological" section shows something that has been interpreted as a medicine jar.

A cover letter from Joannes Marcus Marci of Cronland was found tucked in the manuscript. The letter claims that the book once belonged to Emperor Rudolf II and that Rudolf believed that Roger Bacon was the author.

There have been many attempts to decipher the book. One was made by William Newbold at the University of Pennsylvania. He claimed that each letter consisted of many other Greek letters, which were anagrams holding the real meaning of the manuscript, and "deciphered" it on this basis. His decipherment is now regarded as completely bogus.

Athanasius Kircher owned the book from 1665 to 1680.

The Voynich script consists of between 23 and 40 distinct characters. (It is hard to say for sure, since some characters appear to be compounds of others.) There are no signs of corrections, which suggests that the manuscript was copied from some other source. There is an unusual distribution of word lengths - most "words" are of lengths 3, 4, and 5 letters. Many words are doubled, and some are tripled.
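
Both of these statistics are easy to compute once you have the text as a list of tokens. Here is a minimal Python sketch of how one might do it. The function names and the sample tokens are made up for illustration; they do not come from the talk or from any real transliteration of the manuscript.

```python
from collections import Counter


def word_length_histogram(words):
    """Map each word length to the number of tokens with that length."""
    return Counter(len(w) for w in words)


def repeated_runs(words):
    """Return (word, run_length) for every run of 2+ identical adjacent tokens."""
    runs = []
    i = 0
    while i < len(words):
        j = i
        # Extend j while the next token repeats the token at position i.
        while j + 1 < len(words) and words[j + 1] == words[i]:
            j += 1
        if j > i:
            runs.append((words[i], j - i + 1))
        i = j + 1
    return runs


# Placeholder tokens (an assumption, not real manuscript text).
sample = "abcd abcd xyz abcd abcd abcd pqrst uv".split()

print(sorted(word_length_histogram(sample).items()))
# [(2, 1), (3, 1), (4, 5), (5, 1)]
print(repeated_runs(sample))
# [('abcd', 2), ('abcd', 3)]
```

A run length of 2 is a doubled word and a run length of 3 is a tripled word. Comparing the histogram against one from ordinary English text would show how unusual the concentration on lengths 3 to 5 is.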

The cryptographer William Friedman worked on the manuscript during World War II. There are many claimed decipherments. A 2004 Scientific American article by Gordon Rugg, however, suggests that the manuscript is just gibberish. Perhaps Voynich faked it himself.

Kevin Knight discussed some of his own attacks on the manuscript using clustering techniques. For example, if you try breaking up the English alphabet into two types, say a and b, and use expectation maximization to generate two clusters, you get AEIOUy as one cluster, and the consonants in another. Doing the same for the Voynich manuscript, however, doesn't generate anything particularly meaningful.
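
To make the letter experiment concrete, here is a rough Python sketch of one common way to set up a two-class clustering with expectation maximization: a two-state hidden Markov model trained with the Baum-Welch algorithm. I don't know exactly which model was used in the talk, so treat the model choice, the function name `em_two_class`, and its parameters (`n_iter`, `seed`) as my assumptions.

```python
import random


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def em_two_class(seq, n_iter=30, seed=0):
    """Fit a 2-state HMM to a non-empty `seq` with Baum-Welch (EM).

    Returns (clusters, emit): clusters maps each symbol to state 0 or 1,
    emit[state] is that state's emission distribution over sorted symbols.
    """
    rng = random.Random(seed)
    symbols = sorted(set(seq))
    index = {s: i for i, s in enumerate(symbols)}
    obs = [index[s] for s in seq]
    K, V, T = 2, len(symbols), len(obs)
    eps = 1e-9  # small smoothing term so no probability becomes exactly zero

    start = normalise([rng.random() + 0.5 for _ in range(K)])
    trans = [normalise([rng.random() + 0.5 for _ in range(K)]) for _ in range(K)]
    emit = [normalise([rng.random() + 0.5 for _ in range(V)]) for _ in range(K)]
    counts = [[eps] * V for _ in range(K)]

    for _ in range(n_iter):
        # E-step: forward pass, normalised at every step to avoid underflow.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                a = [start[j] * emit[j][obs[0]] for j in range(K)]
            else:
                a = [sum(alpha[t - 1][i] * trans[i][j] for i in range(K))
                     * emit[j][obs[t]] for j in range(K)]
            c = sum(a)
            alpha.append([x / c for x in a])
            scale.append(c)
        # Backward pass, using the same scaling factors.
        beta = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                           for j in range(K)) / scale[t + 1] for i in range(K)]
        # Posterior state probabilities and expected transition counts.
        gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
        xi_sum = [[eps] * K for _ in range(K)]
        for t in range(T - 1):
            for i in range(K):
                for j in range(K):
                    xi_sum[i][j] += (alpha[t][i] * trans[i][j]
                                     * emit[j][obs[t + 1]]
                                     * beta[t + 1][j] / scale[t + 1])
        # M-step: re-estimate parameters from the expected counts.
        start = normalise([g + eps for g in gamma[0]])
        trans = [normalise(row) for row in xi_sum]
        counts = [[eps] * V for _ in range(K)]
        for t in range(T):
            for i in range(K):
                counts[i][obs[t]] += gamma[t][i]
        emit = [normalise(row) for row in counts]

    # Assign each symbol to the state that accounts for most of its occurrences.
    clusters = {s: max(range(K), key=lambda i: counts[i][index[s]])
                for s in symbols}
    return clusters, emit


text = "the quick brown fox jumps over the lazy dog " * 5
letters = [ch for ch in text if ch.isalpha()]
clusters, emit = em_two_class(letters)
for state in (0, 1):
    print(state, "".join(s for s in sorted(clusters) if clusters[s] == state))
```

The toy input here is far too short to expect a clean vowel/consonant split; you would want a long stretch of real text, and probably several random restarts, since EM only finds a local optimum. This sketch also drops spaces, so letters at word boundaries are treated as adjacent. The function only needs a list of sortable symbols, so passing word tokens instead of letters gives a crude word-level variant.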

You could also try this kind of clustering with the words of the manuscript instead of the letters. When you do so, you get two clusters: the words in the "herbal", "astrological", and "pharmacological" sections predominantly fall into one cluster, and the words in the "biological" and "cosmological" sections predominantly fall into another. [To me, this suggests that the manuscript probably had at least two authors.]

Voynich "B" is the "biological" + "astrological" sections. You can then try to divide the words in this section into more classes. If you do this for English, you get a cluster with words like "my, a, an, the,..."; another with "and, but, next,..."; another with "had, asked, could, have, are, is, would,..."; another with "for, at, in, no, that, be, but,..."; etc. If you do this for Voynich, you also get clusters, but their meaning is less clear.


My guess is that the manuscript is some form of hoax, but I'd be delighted to be proved wrong.

8 comments:

Anonymous said...

How did he pick the number of clusters? Did he compare his results on English text to other languages to get some sense of the variance in the number of phoneme and/or word-type clusters that natural languages have? I doubt Japanese would give any nice phoneme clusters, for instance.

If he didn't do that, it's hard to see those results as anything more than fishing expeditions.

Joshua said...

I've always been very skeptical of the hoax hypothesis. It simply looks like too much work went into the document for it to be a hoax.

My favorite (and the silliest) explanation is that it is an encrypted text of the Necronomicon. When I was an undergrad at Yale, I used to use this as an excuse to brag that I went to a university that had a copy of the Necronomicon. On a related note, I've met people who've only heard of the Voynich Manuscript in the Lovecraft context and didn't realize that the Voynich Manuscript was a real document.

Ringo said...

"I've always been very skeptical of the hoax hypothesis. It simply looks like too much work went into the document for it to be a hoax."

You'd be surprised.

Has the manuscript been carbon dated? Getting a definitive age might rule out some possibilities.

Joshua said...

Ringo, well a related issue is that the letter clustering is not pseudorandom in the way one would expect for a hoax. Humans are very bad at making up data that looks random. For example, if you ask someone to flip a coin a hundred times or instead to make up a hundred coin flips, it isn't hard looking at the resulting data to tell which they did. But the Voynich Manuscript doesn't seem to have the standard problems of non-random human made text. So the symbol generation was likely a fairly sophisticated process. That means that if it is a hoax, someone was going out of their way to deal with a sort of issue that was likely not known at the time it was made and certainly was not widely known.

Anonymous said...

There is always the possibility that someone came up with a flawed cryptographic system. I'm thinking of a "one way" system, or a "many to one" system, which can generate output, but which cannot be reversed to recover unambiguous source. I can imagine a single-minded person spending much effort on this.

TomS

Takis Konstantopoulos said...

At first sight, this looks similar to the Byzantine "mikrogrammati grafi", i.e. minuscule font (which is hard to read: for example, both $\mu$ and $\nu$ are written as $\mu$); see, e.g., a page from the earliest surviving copy of Euclid's Elements. But a closer look shows that this script is fully unreadable.

Alex said...

"When you do so, you get two clusters: the words in the "herbal", "astrological", and "pharmacological" sections predominantly fall into one cluster, and the words in the "biological" and "cosmological" sections predominantly fall into another. [To me, this suggests that the manuscript probably had at least two authors.] "

Could one also argue for the single-author theory just as easily?

Anonymous said...

I just saw a brief programme about this manuscript and I find it is just so bizarre and interesting. I truly hope someone can crack it, if there is any meaning to it! And yes, it has been carbon dated now. 500 years old!