Sunday, February 26, 2012

Yet Another Creationist Misunderstands Information Theory

It's always funny to see a creationist try to use information theory, because they almost always get it wrong. Here we have Joseph Esfandiar Hannon Bozorgmehr, who posts under the name "Atheistoclast", demonstrating his ignorance:


"Matzke misunderstands what is meant by "new information".

He apparently thinks that new genes, produced by duplication, represent novel information. But if you copy one gene 1000 times over, the information content remains the same even though you have created many more genes."



Poor Bozorgmehr needs to sit in on my course CS 462 at the University of Waterloo, where we will shortly discuss this very issue. Then he can prove the following theorem:

Theorem: Let x be a fixed nonempty string, and let K denote Kolmogorov information. Then K(x^n) - K(x) is unbounded as n tends to infinity.

This would be regarded as a relatively simple exercise in my course.
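
As a rough illustration of the easy direction only, not course material: the obvious program that prints x repeated n times shows that K(x^n) exceeds K(x) by at most O(log n), since the only extra thing such a program must record is the number of copies n. The real content of the theorem is the matching lower bound, sketched in the comments below. A minimal Python sketch of the upper-bound intuition:

# Length of the obvious self-contained Python program that prints x repeated
# n times.  This gives only an UPPER bound on K(x^n): it grows roughly with
# the number of digits of n, i.e., about log n.
def naive_description_length(x: str, n: int) -> int:
    program = "print(" + repr(x) + " * " + str(n) + ")"
    return len(program)

x = "ACGT" * 5  # a hypothetical 20-character "gene"
for n in (1, 10, 1000, 10**6):
    print(n, naive_description_length(x, n))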

27 comments:

Luke Barnes said...

"Welcome to Jeopardy, Prof. Shallit. Why don't you tell us two things about yourself?"

"Well, I have a beard. Also, I have a beard."

There must be some sense of the word "information" in which another copy of the same sentence does not add information. It may not be true of Kolmogorov information, but it seems true of what most people (even most scientists) think of as information.

Is there a way to reconcile these two different ideas? Has Kolmogorov's definition failed to correctly formalise our intuitive concept of information? (GASP!)

The other question, of course, is which should apply to biological systems.

(Incidentally, is the reason (roughly) why K(x^n) - K(x) is unbounded because the program must specify n? We can make the program as long as we like by making n huge.)

Jeffrey Shallit said...

Luke, I had more respect for you before this reply.

If Kolmogorov complexity does not capture your notion of information, then it's up to you to provide a rigorous definition for discussion and debate. So far what creationists do is rely on our informal notions, which are vague and inconsistent.

Even in the example you give, it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once. For example, suppose there's a confederate listening on the television, and we've pre-arranged an action depending on whether I say it once or twice. "One if by land, two if by sea".

As for your final question, no, the program need not necessarily "specify n", whatever that means. But you are right that at least most of the additional information can ultimately be attributed to the number of copies.

Luke Barnes said...

"One if by land, two if by sea". Good example.

"As for your final question, no, the program need not necessarily "specify n", whatever that means." I think rather concretely about this, so being a fortran programmer I had in mind:

integer :: n, i
n = 100000
do i = 1, n
   write(*,*) shortest_x()   ! hypothetical call to the (shortest) function that returns x
end do

The second line can be made arbitrarily long by making n arbitrarily large (ignoring such practicalities as the limits of the integer kind). So I can place an arbitrarily large amount of information into n itself.

Obviously, that's not how a mathematician would argue - I really should get around to reading Li and Vitanyi's textbook. Their article (with Kirchherr) you linked to a while back ("miraculous universal distribution") was excellent, though as a scientist I am a tad worried that the complexity of a string is non-computable. It doesn't seem like there will be a simple "numerical recipe" for hypothesis testing with Kolmogorov, and yet Kolmogorov's thesis says that his definition is provably better than yours.

Valhar2000 said...

Has Kolmogorov's definition failed to correctly formalise our intuitive concept of information?

Yes, well, that's the thing, isn't it? Kolmogorov's theory is clearly defined and usable, whereas our "intuitive understanding" is vague, contradictory and inconsistent between different persons.

Thus, if we use Kolmogorov, we can check each other's work and see if mistakes have been made, whereas if we use "intuitive understanding" all we are left with is seeing who can shout the loudest.

Face it, buddy, intuition fails, at least when it comes to this.

Jeffrey Shallit said...

"As for your final question, no, the program need not necessarily "specify n", whatever that means." I think rather concretely about this, so being a fortran programmer I had in mind:


Well, see, that's the point. You demonstrate a fundamental misunderstanding here: producing a single program that prints x^n gives only an upper bound on the Kolmogorov complexity of x^n; it says nothing about how short the shortest such program can be. You are confusing upper bounds with lower bounds - a common mistake among beginning students.

The correct argument goes the other way. Given a program to produce x^n, we can deduce from that a program to produce n. Since we know that infinitely many n are incompressible, this gives the result.
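
Spelled out, with constants depending on the fixed nonempty string x, the sketch runs roughly as follows: from a shortest program for x^n, a fixed extra routine that knows x can recover n = |x^n| / |x|, so K(n) <= K(x^n) + c_x. A counting argument shows that K(n) >= log n for infinitely many n, since there are fewer than 2^m programs of length less than m. For those n, K(x^n) - K(x) >= log n - c_x - K(x), which grows without bound.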

Tim Kenyon said...

Even in the example you give, it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once.

Yeah, yeah.

John Stockwell said...

You could think of it in terms of a physical problem. If you had a string of alternating black and white beads, it would take a certain amount of energy to make the initial string.

Suppose that you then used that string as a template for making one copy. Even if copies were cheaper to make than the original, they would cost something in energy, and it would take n times as much energy to make n copies as it would to make one.

Of course, that is a handwavey example and not a real math proof.

Curt Cameron said...

I've subscribed to one of those brain-dead Christian podcasts, and for the last couple of episodes they've been going in detail over the "book" Me, The Professor, Fuzzy, and The Meaning of Life, which starts with basic principles and builds up to a "proof" of the Christian God.

The syllogism basically goes like this:

1. The universe had a beginning.
2. Every event has to have a cause.
3. Entropy is always increasing.
4. There are two exceptions to entropy increasing: life and intelligence.
5. Entropy is the same as disorder. Complex things are ordered, therefore complexity is always decreasing.
6. Except when there's intelligence involved.
7. Since we see complex things in the world today, and complexity is always decreasing, then back at the beginning of the universe, it had to be much more complex.
8. Since the only way to get complexity is with intelligence, there must have been a great intelligence who put all that complexity into the universe at the beginning.
9. Therefore the cause of the universe had to be very intelligent, which is another way of saying God.

(He goes on later to show how it must specifically be the Christian version of God)

I see three main problems with this. The first one I'm good with, but Jeffrey, can you comment on the second and third?

Premise #2, that every event has to have a cause, I know to be false from my knowledge of quantum physics (I was an EE major in college back 30 years ago, so I had a fair amount of QM and some information theory).

If you relate complexity and entropy, they're not inverses of each other - complexity is an equivalent concept to entropy, isn't it? Can't you basically get Boltzmann's constant by taking the log2 of the probability of something or other?

And finally, does the 2nd Law of Thermodynamics apply to the information kind of entropy? If we say that complexity is kinda like entropy, does it follow that complexity must be increasing?


Thanks for your thoughts.

Miranda said...

"Even in the example you give, it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once."

Has the potential, yes. In biology, too, it has the potential. (IOW, Atheistoclast may be wrong.) But Luke said "There must be some sense of the word "information" in which another copy of the same sentence does not add information." You didn't answer yes or no, whether in the realm of speaking words, or in the realm of biology. You avoided answering by demanding a "rigorous definition for discussion and debate."

Jeffrey Shallit said...

You didn't answer yes or no, whether in the realm of speaking words, or in the realm of biology.

Miranda: I'm not a linguist or biologist. I'm a mathematician and computer scientist. In the standard Kolmogorov definition of information as used by mathematicians and computer scientists, and explained in dozens of papers and books, doubling a string is guaranteed to increase information infinitely often.

The technical understanding of "information" should not be confused with various vague folk understandings of the word - in the same way that the folk understanding of the word "field" has little to do with how it is used in algebra or vector analysis.

It is quite unwise to overload a term like "information", which is well-understood in mathematics and computer science, to mean something entirely different from what professionals in those subjects expect.

Miranda said...

I could be misinterpreting you, but the following two sentences appear to contradict each other, or are at least talking about different things:
1) "...it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once."
2) "... doubling a string is guaranteed to increase information infinitely often.

James Cranch said...

Most times I read nonsensical arguments in philosophy or theology, the underlying fallacy is of the same sort.

Usually there are two meanings of a word floating around: a jargon meaning (either pre-existing or cooked up specially for the purpose) and an everyday meaning. Then the error consists in managing to confuse the two: introducing an example in one sense, and using it as if it were an example in the other sense.

Jeffrey Shallit said...

Yes, Miranda, you definitely seem confused.

In the Kolmogorov theory, xx need not have more information than x for every string x. But it will for infinitely many x.

Jeffrey Shallit said...

James:

I agree. Dembski is particularly effective at this fallacy.

Miranda said...

"In the Kolmogorov theory, xx need not have more information than x for every string x. But it will for infinitely many x. "

But who cares if it will for infinitely many x, with respect to information in biology? Last I checked, there aren't infinitely many biological systems out there.

Jeffrey Shallit said...

Miranda: I do not think I can remedy your confusion. Perhaps a course in mathematics at your local community college might help.

Miranda said...

Fine, I'll take that class, and you take this one: http://www.duq.edu/communication/graduate/phd/curriculum.cfm

Jeffrey Shallit said...

Thanks, Miranda! I will give your suggestion all the consideration it deserves.

John said...

Miranda, you need to take those two statements in context:

"...it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once."

This is specifically in response to Luke's example of a spoken phrase.

"...doubling a string is guaranteed to increase information infinitely often."

This is part of an explanation of the definition of information as used by mathematicians and computer scientists.

Miranda said...

That was very clear, John, thanks. It's too bad that neither statement had to do with biological systems, which is what this post was originally talking about.

Luke Barnes said...

In The Blind Watchmaker, Dawkins says that a tree shedding seeds is literally raining information - it wouldn't be more true if the tree were raining floppy disks.

Do biologists use Kolmogorov information when talking about the information in biological systems? Is it part of the biologist's mathematical toolkit? Does anyone know a good review paper or book on this?

Jeffrey Shallit said...

Yes, Kolmogorov information (or variants of it) is used all the time in biology. One application I know of is in constructing phylogenies. You can read Ming Li's work on the subject, probably easily findable with a google search.
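
For a sense of how this is done in practice, here is a minimal sketch, not Li's actual pipeline: compression-based phylogeny work typically replaces the uncomputable Kolmogorov quantities with the output size of a real compressor, as in the normalized compression distance of Cilibrasi and Vitanyi. A small Python illustration using zlib:

import zlib

def csize(s: bytes) -> int:
    # compressed size, used as a computable stand-in for Kolmogorov complexity
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    # normalized compression distance: small for very similar sequences,
    # near 1 for unrelated ones; a matrix of these distances can be fed to a
    # standard tree-building method to construct a phylogeny
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

print(ncd(b"ACGT" * 100, b"ACGT" * 100))    # small: identical sequences
print(ncd(b"ACGT" * 100, b"TTAGGCA" * 60))  # larger: little shared structure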

PNG said...

The fallacy referred to above is called equivocation - using a term in different ways at different points, as if the different definitions were equivalent when they aren't.

Anonymous said...

Interesting. I just happened to have one of Ming Li's papers on the table beside my desk this morning. He uses Kolmogorov information as an 'information distance' when clustering homologous sequences for, say, a protein family. He defines the information distance as 'the length of the shortest binary program that is needed to transform' two sequences into each other. I wouldn't say that Kolmogorov information is used in biology 'all the time', but it is an interesting distance measure for sequence clustering.

Whether Kolmogorov distance is 'meaningful' to the cell is another question (i.e., whether it makes any structural or functional difference). For example, some sequences can have a large Kolmogorov distance from each other, yet be indistinguishable to the cell as far as structure and function go. Another sequence may have a single 'knock out' mutation that renders the sequence non-functional for the cell, yet have a very short Kolmogorov distance.

What I'm saying is that Kolmogorov information may be 'meaningful' to us, but not necessarily to biology. Sometimes yes, sometimes no ... an indication that more work needs to be done to come up with an information measure that is more relevant to biology.

Jeffrey Shallit said...

Anonymous:

I think it unlikely that there will be a single information measure "relevant to biology". Instead, mathematicians and biologists will develop measures relevant & appropriate to the problem at hand. But when ID creationists "prove" bogus theorems about their measures and fail to admit the theorems are wrong, they do a disservice to science and mathematics.

charles allan said...

What about the amount of NEW information required to make the first cell in a pool of muddy water. The evos dont even have a start to their crazy theory

Jeffrey Shallit said...

Charles Allan:

What about the amount of NEW information required to make the first cell in a pool of muddy water. The evos dont even have a start to their crazy theory

Congratulations - you are yet another creationist who doesn't understand information theory.

Please prove you are not a blithering idiot.

1. What is the difference between "new information" and ordinary "information"?

2. Why are random events insufficient to generate "new information"?

3. How much "new information" is in the following string of bits? 0110100110010110. Show your work.