Saturday, January 03, 2009

Test Your Knowledge of Information Theory

Creationists think information theory poses a serious challenge to modern evolutionary biology -- but that only goes to show that creationists are as ignorant of information theory as they are of biology.

Whenever a creationist brings up this argument, insist that they answer the following five questions. All five questions are based on the Kolmogorov interpretation of information theory. I like this version of information theory because (a) it does not depend on any hypothesized probability distribution (a frequent refuge of scoundrels), (b) the answers about how information can change when a string is changed are unambiguous and agreed upon by all mathematicians, allowing less wiggle room to weasel out of the inevitable conclusions, and (c) it applies to discrete strings of symbols and hence corresponds well with DNA.

All five questions are completely elementary, and I ask these questions in an introduction to the theory of Kolmogorov information for undergraduates at Waterloo. My undergraduates can nearly always answer these questions correctly, but creationists usually cannot.

Q1: Can information be created by gene duplication or polyploidy? More specifically, if x is a string of symbols, is it possible for xx to contain more information than x?

Q2: Can information be created by point mutations? More specifically, if xay is a string of symbols, is it possible that xby contains significantly more information? Here a, b are distinct symbols, and x, y are strings.

Q3: Can information be created by deletion? More specifically, if xyz is a string of symbols, is it possible that xz contains significantly more information?

Q4: Can information be created by random rearrangement? More specifically, if x is a string of symbols, is it possible that some permutation of x contains significantly more information?

Q5: Can information be created by recombination? More specifically, let x and y be strings of the same length, and let s(x, y) be any single string obtained by "shuffling" x and y together. Here I do not mean what is sometimes called a "perfect shuffle", but rather a possibly imperfect shuffle where x and y both appear left-to-right in s(x, y), but not necessarily contiguously. For example, a perfect shuffle of 0000 and 1111 gives 01010101, and one possible imperfect shuffle of 0000 and 1111 is 01101100. Can an imperfect shuffle of two strings have more information than the sum of the information in each string?
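The definition of a shuffle in Q5 can be made concrete with a small checker. What follows is only a sketch in Python; the function name is_shuffle is made up for illustration and comes from no library. It tests whether s can be split into two subsequences that spell out x and y from left to right. It says nothing about information content, which is the actual challenge.

```python
from functools import lru_cache

def is_shuffle(x: str, y: str, s: str) -> bool:
    """Return True if s interleaves x and y, keeping each in left-to-right order."""
    if len(s) != len(x) + len(y):
        return False

    @lru_cache(maxsize=None)
    def ok(i: int, j: int) -> bool:
        # i symbols of x and j symbols of y have already been matched against s[:i+j].
        if i == len(x) and j == len(y):
            return True
        k = i + j
        if i < len(x) and x[i] == s[k] and ok(i + 1, j):
            return True
        return j < len(y) and y[j] == s[k] and ok(i, j + 1)

    return ok(0, 0)

print(is_shuffle("0000", "1111", "01010101"))  # the perfect shuffle from the text
print(is_shuffle("0000", "1111", "01101100"))  # the imperfect shuffle from the text
print(is_shuffle("0000", "1111", "01101110"))  # three 0s and five 1s: not a shuffle
```

The first two calls accept the examples given in Q5 and the third rejects a string with the wrong symbol counts.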

The answer to each question is "yes". In fact, for questions Q2-Q5, I can even prove that the given transformation can arbitrarily increase the amount of information in the string, in the sense that there exist strings for which the given transformation increases the complexity by an arbitrarily large multiplicative factor. I won't give the proofs here, because that's part of the challenge: ask your creationist to provide a proof for each of Q1-Q5.

Now I asserted that creationists usually cannot answer these questions correctly, and here is some proof.

Q1. In his book No Free Lunch, William Dembski claimed (p. 129) that "there is no more information in two copies of Shakespeare's Hamlet than in a single copy. This is of course patently obvious, and any formal account of information had better agree." Too bad for him that Kolmogorov complexity is a formal account of information theory, and it does not agree.

Q2. Lee Spetner and the odious Ken Ham are fond of claiming that mutations cannot increase information. And this creationist web page flatly claims that "No mutation has yet been found that increased the genetic information." All of them are wrong in the Kolmogorov model of information.

Q4. R. L. Wysong, in his book The Creation-Evolution Controversy, claimed (p. 109) that "random rearrangements in DNA would result in loss of DNA information". Wrong in the Kolmogorov model.

So, the next time you hear these bogus claims, point them to my challenge, and let the weaselling begin!

118 comments:

Rich said...

I referred to your 2003 paper with Wesley Elsberry here: http://www.calvin.edu/archive/asa/200901/0046.html

Do I have permission to quote this blog post in full and with attribution on the American Scientific Affiliation e-mail list? Thanks.

Jeffrey Shallit said...

Sure, go right ahead, Rich.

Rich said...

In the December 2008 issue of Perspectives on Science and Christian Faith, Dembski in a review of David Bartholomew's God, Chance, and Purpose said the following. He still doesn't get Kolmogorov Complexity, having learned nothing from your 2003 paper. He then claims others don't understand him!


Bartholomew is an ardent Darwinist: “The combination of chance variation and natural selection has been a powerful creative force, fashioning the world as we know it” (p. 170). Consequently, he critiques intelligent design (ID) and my work in particular. His critique disappointed me because back in 1998 Bartholomew reviewed my book The Design Inference for the Templeton Foundation (it was an in-house review commissioned by Charles Harper at a time when ID still had some respectability with Templeton). Back in 1998, Bartholomew liked the book, though he indicated that portions went beyond his understanding. That lack of understanding has, unfortunately, persisted.

Bartholomew argues that my method of design detection as outlined in The Design Inference is fatally flawed because it presupposes design to identify the rejection regions I use to eliminate chance and infer design. Thus my method of design detection is supposed to constitute circular reasoning. But Bartholomew never engages my key notion of specification, which extends and enriches the traditional statistical understanding of a rejection region (indeed, the word “specification” appears only in the footnote on page 113, and the concept itself remains unanalyzed throughout the book). Specifications, as I define them, do not presuppose design but are characterized independently in terms of an extension of Kolmogorov complexity. Bartholomew fails to acknowledge this crucial point, much less to engage it. Similar misunderstandings and misrepresentations pervade his other criticisms of ID.

MrG said...

"Probability calculations are the last refuge of a scoundrel." I like it ... detailed refutations are nice, but a good soundbite is a great bonus.

John S. Wilkins said...

The probabilities in a Shannon account are specified by a prior knowledge of the frequencies of the symbols in the "population" of symbols (e.g., the frequencies of the letter "E" in English, etc.). There's nothing much objectionable about post hoc frequencies, is there?

Jeffrey Shallit said...

John:

Of course there's nothing "objectionable" about Shannon information. It's just that it depends on probability distributions, and arguing with creationists about those just allows more wiggle room and obfuscation possibilities.

MrG said...

"More wriggle room" ... I have a suspicion that if cornered on KC information the response will be to backtrack and say: "What we meant all along was Complex Specified Information." I'm not sure CSI has any standing in information theory -- and if not, well so much the better to muddy the waters.

Jeffrey Shallit said...

Yes, Mr. G., I am sure that is one of the ploys that creationists resort to. The problem is that Dembski's "complex specified information" has no rigorous definition, has never been calculated except in toy examples, and the proof that it is "conserved" is wrong, as I have shown in my long paper with Elsberry.

MrG said...

I was of the general impression that CSI had no real standing. I do recall (and it might have been from one of your or Elsberry's papers) that Leslie Orgel actually "invented" the term -- but strictly as a term of convenience, with no pretense of a rigorous definition.

ollie said...

How about a question from someone who doesn't know information theory:

Two copies of Hamlet convey

1. the information in Hamlet (one copy) plus

2. The fact that there is a second copy.

Isn't that in itself, information?

John S. Wilkins said...

Well my point about the Shannon frequentist notion of probabilities is that it relies on knowing the frequencies a priori. Since the IDiots do not know those frequencies, any Shannon-style account makes no sense.

Zeno said...

Since Kolmogorov information defeats the creationists in every instance, I'm certain that creationists (once they finally, slowly, realize this), will decide that Kolmogorov information is the wrong model for biological information. See? Problem solved! Back to Shannon information (which I'll bet most of them don't understand either!) to take shelter under obfuscatory musings about probability distributions, with much forehead slapping and cries of "Oh, see how unlikely?!"

Jeffrey Shallit said...

Yes, Ollie, you've got one aspect of it. Roughly speaking, n copies of a string x can be encoded by the encoding of x, together with the encoding of n.

Unfortunately, this reasoning just shows that the Kolmogorov complexity is upper-bounded by the complexities of x and n, but it doesn't provide a lower bound, which is what the exercise requires.
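The upper bound can be illustrated, loosely, with an off-the-shelf compressor. The sketch below uses Python's zlib as a stand-in: a compressed length is only a computable upper-bound proxy for Kolmogorov complexity, and the 1000-byte pseudo-random block, the seed, and the helper name compressed_size are arbitrary choices made for this illustration.

```python
import random
import zlib

rng = random.Random(2009)  # arbitrary seed, for repeatability
x = bytes(rng.getrandbits(8) for _ in range(1000))  # a block zlib cannot shrink

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib encoding: a rough upper-bound proxy, not K(x)."""
    return len(zlib.compress(data, 9))

# Compress 1, 2, 4 and 8 back-to-back copies of the same block.
sizes = {n: compressed_size(x * n) for n in (1, 2, 4, 8)}
for n, size in sizes.items():
    print(f"{n} copies: {len(x) * n} bytes raw, {size} bytes compressed")
```

The compressed size grows far more slowly than the raw size, in line with the "encoding of x plus encoding of n" argument. Like that argument, it only bounds the complexity from above.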

Joe Felsenstein said...

Of course you are right in your statements about the Five Questions (I then am tempted to say "... which is why this post is different from all other posts").

However a footnote should be added. In all of these cases the new string could indeed have different information content than the old string. But ...

Q6: Can Jeff calculate the information content of the string?

As far as I know the answer is "no" as no one can, for any particular string. Or do I thereby misunderstand Kolmogorov Information?

Not that this gets the creationists off the hook.

Jeffrey Shallit said...

Joe:

You may not be able to calculate the Kolmogorov complexity exactly, but you can estimate it in some cases, and that's what makes these problems solvable.

Remember that all I am asking is to show the existence of strings with the given properties.

Anonymous said...

Q3: Can information be created by deletion? More specifically, if xyz is a string of symbols, is it possible that xz contains significantly more information?

Can you give an example either in text strings or some other manner demonstrating this? Thanks. Very Good Article!

Dan Styer said...

Another question:

Does the second law of thermodynamics (law of entropy increase) apply to Shannon entropy or (negative of) Kolmogorov information?

I'm not an information theorist, but my impression is the answer is "no".

Tim said...

Sometimes I wonder why you spend so much time on such asshat wankers. I guess it's necessary, but anybody smart enough to understand the math you present is already smart enough to know better than to believe creationist rubbish.

Enjoy.

Ian said...

Uh oh, turns out I'm 1/5th creationist :) Q3 is stumping me, but I'll keep thinking about it.

Great post, this would have come in handy in a creationist back-and-forth I had a while ago.

Jeffrey Shallit said...

Anonymous:

If you contact me by e-mail, I can send you the example for Q3.

Geoff said...

Actually, Jeff, could you post that Q3 example here? I can see this providing information if there's a common context in which xyz is present - then, the absence of y is significant. But is there a stand alone context for this?

Anonymous said...

I too would like an example for Q3. I am stumped. All the others made perfect sense...

Maybe someone could post it here.

Anonymous said...

Oh, for Q3, I got one:

String is

ATATATATATAT

delete one of the T's in the middle,

string is now:

ATATAATATAT

Jeffrey Shallit said...

I'm sorry, I don't want to post any examples here, because that would defeat the purpose of the exercise. If you contact me by e-mail, however, I can send you the solution.

Rich said...

Does the second law of thermodynamics (law of entropy increase) apply to Shannon entropy or (negative of) Kolmogorov information?

No. I cannot find the quote but Shannon chose the term because his entropy looked similar to Boltzmann's formula and named his quantity entropy. Speaking of Shannon, when I debated Shannon-type information on the ASA list I caused some heads to explode saying a truly random string has maximum information for a given string length. (IDists hate randomness with a passion.)

That's not the only thing that's counter-intuitive. In quantum information theory information can be negative. See Nature 436, 673-676 (4 August 2005).

Because terms such as entropy and information are used, lay people "think" they understand but they really don't. It doesn't help when you have people like Dembski that further muddy the water.

MrG said...

Would I be off base to think that any change that "breaks symmetry" in a string necessarily adds information?

MrG said...

"I caused some heads to explode saying a truly random string has maximum information for a given string length."

When I was fumbling around with information theory I had an instinct to link it with data compression, which is a subject I know a bit about.

Given an uncompressed full-color BMP image file, say 300 x 300 pixels in size, any image will be the same size as any other. If the images are converted to a lossless compression format like PNG, then the size of the resulting file can be regarded as an indication of how much information there is in the image.

An image made up of a checkerboard of colored squares has little information in it and it compresses WAY down. It's full of "air". A busy picture of, say, a flower bed doesn't compress very well.

And if you have an image consisting of nothing but noisy diverse dots of color, it compresses even more poorly than that. It has more information.

The interesting thing is to take the checkerboard of squares and spray-paint it with a mist of colored pixels. Did it lose information? No, it gained it; its compression wasn't as good.

So is this off-base? I'm sure the analogy isn't perfect.
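The experiment is easy to approximate without image files. The sketch below is a simplification, not the images described above: it uses one byte per pixel instead of full color, and zlib (the same DEFLATE family that PNG uses) instead of a PNG encoder. The square size, the 5% spray density and the seed are assumed values.

```python
import random
import zlib

SIDE, SQUARE = 300, 30   # image side and checker size in pixels (assumed values)
rng = random.Random(1)   # arbitrary seed

# One byte per pixel: 0 or 255 in a checkerboard pattern.
board = bytearray(
    255 * (((row // SQUARE) + (col // SQUARE)) % 2)
    for row in range(SIDE) for col in range(SIDE)
)

# "Spray-paint" 5% of the pixels with random values.
sprayed = bytearray(board)
for pos in rng.sample(range(len(sprayed)), len(sprayed) // 20):
    sprayed[pos] = rng.getrandbits(8)

# Pure noise of the same size, for comparison.
noise = bytes(rng.getrandbits(8) for _ in range(SIDE * SIDE))

sizes = {
    "checkerboard": len(zlib.compress(bytes(board), 9)),
    "sprayed": len(zlib.compress(bytes(sprayed), 9)),
    "noise": len(zlib.compress(noise, 9)),
}
for name, size in sizes.items():
    print(f"{name:>12}: {size} bytes")
```

The three sizes come out in the order the comment predicts: the clean checkerboard is smallest, the sprayed one is larger, and pure noise is largest.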

SteveF said...

Jeffrey,

You may find this recent paper to be of interest:

Frank, S.A. (2008) Natural selection maximizes Fisher information. Journal of Evolutionary Biology, advance online publication.

It can be read here:

http://stevefrank.org/reprints-pdf/09JEBftns.pdf

Rich said...

MrG your intuition is correct. A helpful discussion on randomness and Shannon coding and Kolmogorov Complexity can be found below.

If you pay attention to what's below you will know why the anonymous example in the comments works. I am sure Jeff doesn't want to give things away but that presupposes a certain amount of *cough* reading comprehension.

A quote from http://www.bearcave.com/misl/misl_tech/wavelets/compression/shannon.html follows. The discussion of compression in Shallit and Elsberry 2003 is also helpful (http://www.talkreason.org/articles/eandsdembski.pdf). Shallit is the same as the blog owner.

"If we have a symbol set {A,B,C,D,E} where the symbol occurrence frequencies are:

A = 0.5
B = 0.2
C = 0.1
D = 0.1
E = 0.1
The average minimum number of bits needed to represent a symbol is

H(X) = -[0.5*log2(0.5) + 0.2*log2(0.2) + 3*(0.1*log2(0.1))]
H(X) = -[-0.5 + (-0.46439) + (-0.99658)]
H(X) = -[-1.96]
H(X) = 1.96

Rounding up, we get 2 bits per symbol. To represent a ten character string AAAAABBCDE would require 20 bits if the string were encoded optimally. Such an optimal encoding would allocate fewer bits for the frequently occurring symbols (e.g., A and B) and longer bit sequences for the more infrequent symbols (C,D,E).

This example is borrowed from A Guide to Data Compression Methods by Salomon. Note that the frequency of the symbols also happens to match the frequency in the string. This will not usually be the case and it seems to me that there are two ways to apply the Shannon entropy equation:

1. The symbol set has a known frequency, which does not necessarily correspond to the frequency in the message string. For example, characters in a natural language, like English, have a particular average frequency. The number of bits per character can be calculated from this frequency set using the Shannon entropy equation. A constant number of bits per character is used for any string in the natural language.

2. Symbol frequency can be calculated for a particular message. The Shannon entropy equation can be used to calculate the number of bits per symbol for that particular message.

Shannon entropy provides a lower bound for the compression that can be achieved by the data representation (coding) compression step. Shannon entropy makes no statement about the compression efficiency that can be achieved by predictive compression. Algorithmic complexity (Kolmogorov complexity) theory deals with this area. Given an infinite data set (something that only mathematicians possess), the data set can be examined for randomness. If the data set is not random, then there is some program that will generate or approximate it and the data set can, in theory, be compressed.

Note that without an infinite data set, this determination is not always possible. A finite set of digits generated for a pi expansion satisfies tests for randomness. However, these digits must be pseudo-random, since they are generated from a deterministic process. Algorithmic complexity theory views a pi expansion of any number of digits as compressible to the function that generated the sequence (a relatively small number of bits)."
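The point about pi can be tried out directly. The sketch below is an illustration added here, not part of the quoted page: it generates digits with Gibbons' unbounded spigot algorithm and then hands them to zlib. The choice of 1000 digits is arbitrary, and zlib is only one general-purpose compressor among many.

```python
import itertools
import zlib

def pi_digits():
    """Yield the decimal digits of pi one at a time (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = (
                q * k, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )

digits = "".join(str(d) for d in itertools.islice(pi_digits(), 1000))
packed = zlib.compress(digits.encode("ascii"), 9)
print(digits[:20])
print(len(digits), "digits ->", len(packed), "bytes after zlib")
```

zlib exploits little beyond the fact that only ten symbols occur, so its output stays in the hundreds of bytes and keeps growing with the number of digits. The generator above is about a dozen lines and describes every digit, which is the sense in which the expansion has low algorithmic complexity.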

Paul Crowley said...

I'm confused by your remark "for questions Q2-Q5, I can even prove that the given transformation can arbitrarily increase the amount of information in the string, in the sense that there exist strings for which the given transformation increases the complexity by an arbitrarily large multiplicative factor."

Surely the length of the shortest program that describes the new string is upper bounded by the length of the program that describes the old, plus an addendum to describe this change? It seems to me that the difference in Kolmogorov complexity is bounded by O(log n) for Q1 and Q3, and O(log (n|\Sigma|)) for Q2. For Q4 and Q5 the bounds are considerably wider of course.

MrG said...

I understand data compression pretty well -- I wrote a survey at:

http://www.vectorsite.net/ttdcmp_1.html

-- and the chapter on lossless compression got some good press. In terms of implementations you're referring to Huffman and Shannon-Fano coding (same results, inverse methods). But I'm an engineer by background and find discussions of generalized theory hard to follow.

Incidentally, I cooked up the images I described as a fun experiment. No particular use for them on my website, anyone who might want to use them on their blog or etc need but ask:

mrg_n005@q.com

Jeffrey Shallit said...

Paul:

For Q3, the "addendum that describes the change" can be arbitrarily large as a function of the Kolmogorov complexity of the original string. Contact me by e-mail to get an example.

Jeffrey Shallit said...

Mr. G:

It's not the breaking of symmetry, precisely, that adds the information, but the breaking of pattern. Symmetry has a fairly precise meaning to mathematicians, and that's not what's at issue here.

Silver Fox said...

"So, the next time you hear these bogus claims, point them to my challenge, and let the weaselling begin!"

If one were to consider using Bayes Theorem for conditional probability, would you consider this "weaselling"?

Jeffrey Shallit said...

Silver Fox:

Well, now, it all depends on how the priors are calculated, doesn't it?

Takis Konstantopoulos said...

I recently read a paper by Dembski (not published yet) which is a reply to Haggstrom's paper; the latter criticises the so-called no free lunch "theorem", pointing out that (a) it is trivial and (b) stupidly used by Dembski and others to prove the existence of a (male) creator.

In his rebuttal, Dembski does such a poor job that I find it hard (embarrassing even) to write something about it. He is confused (or so it seems from his paper) about the meaning of the word INFORMATION. Indeed, while you are referring to information as the well-known mathematical function, Dembski confuses it with a common usage of the word and, I believe, makes use of this confusion in order to produce (on purpose?) obscure arguments.

This is a practice followed by many creationists/intelligent-designers. The difference between an arbitrary person belonging to this category and Dembski is that the latter is supposed to be a mathematician. A failed one, to be sure, a mathematician nevertheless with Billingsley as his PhD advisor. (Incidentally, do you know Billingsley's survey paper on information theory and additive number functions? Dembski, apparently, doesn't...)

Back to Dembski's (mis)use of information:
1. He takes a probability, say that of finding an ace in 5 randomly drawn cards, computes its log base 2 and calls this number information. This is what he defines as information in his paper.
2. He considers an idiotic example, such as the first occurrence of the phrase METHINKS_HE_IS_A_SINNER, in two ways:
(i) by letting a monkey type at random until the phrase pops up,
(ii) by letting 23 monkeys type in parallel and if the i-th monkey finds the correct i-th letter then we kill him and keep the letter.
He goes on to explain (!) that the latter is easier than the former (wooaoo!) and (here is the punch line) says that the latter is due to some philosophico-theological term that he calls "active information". To make it clear (if it can be made so), he insists that search procedure (i) lacks active information (but possesses "endogenous information"!), whereas (ii) does have it.
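For a sense of the numbers involved, the quantities in points 1 and 2 can be computed in a few lines. This is a rough sketch under stated assumptions, not a reconstruction of Dembski's paper: "an ace in 5 cards" is read as "at least one ace", the information measure is taken with the usual minus sign as -log2(p), and the typewriter is assumed to have 27 keys (26 capital letters plus the underscore).

```python
import math

# Probability of at least one ace among 5 cards drawn from a 52-card deck.
p_ace = 1 - math.comb(48, 5) / math.comb(52, 5)
bits_ace = -math.log2(p_ace)

PHRASE = "METHINKS_HE_IS_A_SINNER"
ALPHABET = 27  # assumption: 26 capital letters plus the underscore

# (i) One monkey: a block of len(PHRASE) random keystrokes matches with this probability.
p_blind = ALPHABET ** -len(PHRASE)
bits_blind = -math.log2(p_blind)

# (ii) One monkey per position: each needs on average ALPHABET keystrokes for its letter.
mean_tries_per_letter = ALPHABET

print(f"ace in 5 cards:    p = {p_ace:.6f}, -log2 p = {bits_ace:.3f} bits")
print(f"blind search:      p = {p_blind:.3e}, -log2 p = {bits_blind:.2f} bits")
print(f"per-letter search: about {mean_tries_per_letter} tries per position")
```

The gap between the two search procedures is ordinary probability arithmetic; no extra concept is needed to explain it.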

And then there is my favourite sweet-talking John Lennox (of Oxford, of course) whose goal is to prove that (a male creator of his particular north-irish version of christianity called) god exists, via the use of probability theory, information theory, complexity theory, and theoretical physics, topics which he is NOT familiar with (he has done work on group theory, e.g. solvability of finite groups, but this alone does not make him an expert in other fields; in fact, as we see from his use of mathematical terms such as probability and information, he either uses them at the same level as your undergraduates or does not know how to make good use of them). John Lennox has summarized his thesis in his book God's Undertaker: Has Science Buried God? which (surprisingly?) has many good reviews but which I find silly. He goes around giving talks (exactly the same talk, using the same naive arguments, as in his book), where he abuses probability, statistics, information theory and complexity theory. I'll let your undergraduate students decide what is flawed about Lennox's use of probability in this 1 1/2 minute video clip.

In your posting, you say:
Creationists think information theory poses a serious challenge to modern evolutionary biology -- but that only goes to show that creationists are as ignorant of information theory as they are of biology.
Why, are you surprised about it? Creationism/intelligent-design is not an academic field. It is closer to a religion. Therefore, creationists/intelligent-designers do not need to know mathematics, physics or biology. As a matter of fact, you need no formal training to become one of them. Faith in, say, the number 666 (as appearing in the book of revelation) is more important than understanding science (or that 666 is divisible by 37 :-) :-) )

MrG said...

"Symmetry has a fairly precise meaning to mathematicians, and that's not what's at issue here."

Yeah, that's why I feel on thin ice in considering a theoretical math discussion. I may have an idea of what's going on but the pros have nomenclature that is highly specific and not necessarily all that close to informal / popular usage:

"Darwinian evolution is random!"

"Ah, I think you might want to use the term 'nondeterministic' instead."

You seem to be sympathetic to the fact that folks outside of your domain aren't using quite the same language. Some math-oriented folks can be intolerant on this score. Forbearance is appreciated.

Eamon Knight said...

Jeff: Great post! I have felt for several years that something along these lines is the correct reply to creationist blather about information (though you obviously do it with more rigor than I can muster ;-). The creationists are attempting to impute a quasi-mathematical authority to their argument; we should deny them that ground, with prejudice. Here is where Dawkins (in that infamous ambush video from a few years back) slips up: he accepts without complaint something like the colloquial definition of information, and tries to run with it. The result is unimpressive (to me at least, YMMV). Of course, the relationship between the information content of DNA considered as a symbol-string, and phenotype (particularly morphology) is far from clear (at least to this non-expert). For eg: Does it take more or less genetic information to grow forelimbs into functional wings, than into manipulatory appendages?

MrG said...

"Does it take more or less genetic information to grow forelimbs into functional wings, than into manipulatory appendages?"

This was going around on PT for a while and I was tinkering with it a bit. What I think is being asked is: "Is there a magic ratio of functionality to the number of bits required to implement it?"

This sounds really dodgy, since it would demand some actual measurement of "functionality" and that could only be done in very restricted ways.

Imagine a restricted case of a particular software task. Is there some way to determine the size of the program needed to implement it? That might be possible for very well-defined tasks, but in general it would depend on the computer language used and the specific implementation details.

I would find it hard to think anyone could figure out a ratio between sequences of DNA and the functionality implemented. One thing is that the genome is hierarchical, with "developmental genes" that say "build an eye" or "build a leg" and leave the details up to lower-level routines. Are the developmental genes "more functional" because they operate at a more general structural level? I don't think you can get from here to there, at least not in any non-trivial fashion.

Takis Konstantopoulos said...

The point that creationisto-intelligent-designer-preachers (call them preachers for brevity) need to understand is that they MAY NOT (because they CANNOT) use mathematics to explain their ad hoc beliefs. I wouldn't care what they believed if they didn't abuse science and mathematics. Some of them do so because they target a general audience, hoping to blind them with "advanced" stuff. A layman will (and does) point to an academic preacher saying "look, it's not just what my holy book says, it's also what Prof. XY and what Dr. ZW says, ergo god exists".

However, the preacher Prof. XY and Dr. ZW should not be allowed to keep on blethering. They can do so, by using their bibles, qurans, mahabharatas, whatever. But not by abusing, say, information theory.

I bet they won't pass the test, Jeff. Some time ago, I realized that some intelligent-design organization was referring to John Lennox as an expert in complex systems (whatever this means). How come?

Nilou said...

This is how I think about physical and informational entropy.

The ink written words of a Shakespearean sonnet will absolutely succumb to the destruction promised by the second law of thermodynamics – the ink molecules will break down, the paper will degrade; given enough time, those words would cease to exist. But if those words could function to get themselves replicated, they may manifest again. Different ink molecules may once again form the very specific pattern in three-dimensional space that we call Hamlet. If during the replication of Hamlet an error were to occur that changed the sonnet in a way that has adverse consequences for its function (if a crucial line or word were omitted, thus leaving the reader dissatisfied), then that copy may not last long in the publisher’s office or the bookstore. The sonnets that stay in a particular form copied better. Changes that disturbed the minds of humans lost replication potential. Thus the written manifestation is low in informational entropy, and the patterns in matter that define the sonnet are low in informational entropy – but the physical manifestation will always, surely, and faithfully increase in physical entropy.
There may have been a word that was omitted in the original sonnet, perhaps an error by the hand of Shakespeare himself. Copies including the word, although absent in the original, may have been more harmonious to the mind of human, and thus, copied better. In this case, information is gained and the non-physical abstraction we call informational entropy decreases.

MrrKAT said...

Great post!

My version of Q1 vs Dembski has been ~ this:
Wholesaler sends 437 Hamlets to a bookstore and there sales(wo)man asks boss: -What is the secret password to contact wholesaler's main computer? Boss:-Calc the number of copies we got..

Anonymous said...

Q1. In his book No Free Lunch, William Dembski claimed (p. 129) that "there is no more information in two copies of Shakespeare's Hamlet than in a single copy. This is of course patently obvious, and any formal account of information had better agree." Too bad for him that Kolmogorov complexity is a formal account of information theory, and it does not agree.

Yes but Behe is correct that there is very little more info in the 2 copies. A very small insignificant amount.

Jeffrey Shallit said...

Anonymous:

The amount may be small, but when gene duplication is followed by divergence and drift, the amount of information can essentially double, with high probability.
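A toy simulation gives a feel for this. The sketch below is only an illustration with made-up parameters: a random 2000-base "gene", independent substitutions at a fixed rate as the divergence model, and zlib output length as a crude, computable stand-in for information. None of it is meant as a model of real genomes.

```python
import random
import zlib

rng = random.Random(42)  # arbitrary seed
BASES = "ACGT"

def compressed_size(seq: str) -> int:
    """zlib output length in bytes: a crude, computable stand-in for information."""
    return len(zlib.compress(seq.encode("ascii"), 9))

def mutate(seq: str, rate: float) -> str:
    """Replace each base, with probability `rate`, by a different random base."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )

gene = "".join(rng.choice(BASES) for _ in range(2000))
single = compressed_size(gene)
for rate in (0.0, 0.05, 0.3):
    pair = gene + mutate(gene, rate)
    print(f"divergence {rate:.2f}: {compressed_size(pair)} bytes (single copy: {single})")

duplicate = compressed_size(gene + gene)
diverged = compressed_size(gene + mutate(gene, 0.3))
```

With no divergence the pair costs only slightly more than one copy. As the second copy drifts, the compressed size of the pair climbs toward twice that of a single copy.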

MrG said...

I think I'm starting to follow this stuff a bit better.

I had some feel for KC information -- take a data stream (string), compress it by a specific algorithm, then the "residue" plus the size of the program required to implement the algorithm gives the KC information.

I could not follow Shannon information for a time, it seemed something about a sum of probabilities of symbols, but it appears that compression schemes like Huffman and Shannon-Fano are direct derivatives of Shannon's definition of information:

Take a data stream, get stats on the symbols in it, assign the shortest codes to the most common symbols and the longest codes to the least common symbols. The resulting compressed data stream represents the Shannon information.

The two definitions seem very functionally similar, both set a lower bound on the compression of a data stream. If the data stream were infinite that would be an absolute lower bound. It never is of course, but in practice a compression scheme gets the appropriate stats on the symbols of a message, either in a batch task before compression or "on the fly" with updates, and that does the job.
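The recipe above (count the symbols, give the shortest codes to the most common ones) can be sketched in a few lines, using the ten-character string AAAAABBCDE quoted earlier in the thread. The function names are made up for illustration. The Huffman routine only totals the code lengths and ignores the cost of sending the code table.

```python
import heapq
import math
from collections import Counter

def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy of the symbol frequencies measured in this message."""
    counts = Counter(message)
    total = len(message)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def huffman_total_bits(message: str) -> int:
    """Total length in bits of `message` under a Huffman code built from its own counts."""
    heap = list(Counter(message).values())
    if len(heap) == 1:
        return len(message)  # a one-symbol alphabet still needs 1 bit per symbol
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, merged)
    return total

message = "AAAAABBCDE"
h = entropy_bits_per_symbol(message)
print(f"entropy: {h:.4f} bits/symbol, bound for the message: {h * len(message):.2f} bits")
print(f"Huffman: {huffman_total_bits(message)} bits")
```

The entropy comes to about 1.96 bits per symbol, a bound of roughly 19.6 bits for the whole string, and the Huffman code needs 20 bits because each codeword has to be a whole number of bits.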

The information in both cases is a "quantity" measure -- the "busier" or "noisier" it is, the more information is there, the less effective the compression. What the information actually is for is not really addressed.

I am still a bit confused on "entropy" versus information -- what I think is that information is what's left after compression, entropy is what's thrown away. I am fairly sure that "entropy" in infotheory has little or nothing to do with entropy in thermodynamics, though some try to make the link. Cheers -- MrG

Takis Konstantopoulos said...

MrG:

Although it may appear that (the mathematical concept of) entropy is an arbitrary function, it really isn't. It is not defined by a formula. Rather, we accept a number of principles that must be obeyed by such a function (e.g. that the joint entropy of independent random variables must be the sum of their entropies) and, from them, we derive a formula.

This is, in fact, a general mathematical principle. Another example is the so-called normal distribution. In basic probability texts, one learns that the normal density is defined as
(2*pi)^(-1/2) * exp(-x^2/2)
and then one proceeds to do lots of (mostly jejune) exercises. This is, in my opinion, wrong, for it does not show the power of the function and why it is unique in its class. Mathematical objects that express "concepts" such as Information and Normal Density must not be defined by formulae; rather they must be shown to be uniquely defined by obeying some well-agreed upon principles.

Information and entropy are related concepts. In fact, it is mathematically more rigorous (and general) to define information first and entropy next. Here is an attempt at a pedestrian explanation of the concept: If two random objects, say X and Y, are independent then we agree that their information, say I(X,Y), equals zero. If they are not independent, then I(X,Y) measures the deviation of their joint law from the law these objects would have had, IF they were independent.
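For discrete X and Y, that "deviation of the joint law from the independent law" is usually written as the mutual information, the sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ). A minimal sketch, assuming the joint law is handed over as a dictionary (the function name and the two toy distributions are made up for illustration):

```python
from math import log2

def mutual_information(joint):
    """joint maps (x, y) pairs to probabilities; returns I(X,Y) in bits."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        # Marginal laws of X and Y.
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Compare the joint law with the product of the marginals.
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two independent fair coins: the joint law is the product law.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# Y is an exact copy of a fair coin X.
copied = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(independent))  # zero for independent coins
print(mutual_information(copied))       # one bit when Y copies X
```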

If you want a very nice account on these things, at a very basic level (including some discussion on the relation of information theory to statistical mechanics and other areas of applied mathematics such as combinatorics), then take a look at the book: A Diary on Information Theory by Alfréd Rényi. It is, admittedly, hard to find.

Please do not think that the entropy of Information Theory and of Thermodynamics are different. They are the same! Thermodynamics is a discipline which derives as a macroscopic picture of another physical theory called Statistical Mechanics (or Statistical Physics). The latter is based, entirely, on probabilistic models of microscopic systems. The phrase "macroscopic picture" means taking limits, in a mathematical sense, the same way one learns how to take limits of sequences of numbers. The entropy of a probabilistic model is the same as the entropy in Information Theory. This leads directly to the concept of entropy in Thermodynamics which is the very quantity used by engineers in designing combustion engines, and not only there. It is also the same quantity used by astronomers in calculating quantities related to distant stars.

John S. Wilkins said...

I'm clearly out of my league here, but Takis, didn't Brillouin show that while all informational entropy had a physical entropy, the reverse is not necessarily true?

MrG said...

"Although it may appear that (the mathematical concept of) entropy is an arbitrary function ..."

To the folks with professional qualifications: does any of that posting make sense? I'm not in a position to say, but I CAN say for a fact I don't follow a word of it.

Cheers -- MrG

Anonymous said...

I really like that instance where a deletion increases information content by making the string less compressible- very elegant.

Takis Konstantopoulos said...

John: I am not a physicist, so the models I am aware of are mostly of mathematical nature as applied to physics. All I'm saying is that physical entropy and the mathematical one are not unrelated objects. Whether the mathematical model captures all physical reality is not for me to judge. I know, for example, that reversible Markov chains (which appear to be models of certain classical physical processes) are very special ones in mathematics; and that the entropy of a Markov chain as a function of time does not necessarily increase as time increases, although a physical model may require this. I may ask my colleague Oliver Penrose to comment on this. I will admit no authority on the subject you mention.

Mr G: I apologize if I wrote something not very clear to you; however, this is the problem with replying without knowing who the other person is. I have no idea what your background is and, perhaps, I used a slightly technical language. However, it was not my intention to confuse you or appear to be pompous.

All I am saying is this: If you look at a definition of entropy, you will often find a statement of the form: Suppose X is a random variable taking values 1, 2, 3, ... with probabilities p1, p2, p3, ... Then the entropy of X is defined as

H(X) = -p1*log(p1) - p2*log(p2) - ...

This is correct. But it reveals nothing to an uninitiated reader. You might ask: why should I not define entropy as above but, say, raise the logarithms to the fourth power instead?

The answer is that if you change the formula in any way (other than changing units, i.e. other than multiplying by a positive constant) then you are going to lose one or more of the important properties of H(X). And these properties are some natural ones, some properties we all wish this function to possess. To put it in another way, if we make a list of these properties (there are only 3 or 4 of them) and ask which quantity satisfies them then we arrive at the formula above.
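One of those properties, that the entropy of two independent random variables is the sum of their entropies, is easy to check numerically. The sketch below compares the standard formula with the hypothetical alternative of raising the logarithms to the fourth power; the function names and the two small distributions are assumptions chosen for illustration.

```python
from itertools import product
from math import isclose, log2

def entropy(probs):
    """Shannon entropy in bits: -sum p*log2(p)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def altered(probs):
    """Hypothetical alternative: logarithms raised to the fourth power."""
    return sum(p * log2(p) ** 4 for p in probs if p > 0)

def joint_independent(px, py):
    """Joint law of two independent variables: all products p(x)*p(y)."""
    return [a * b for a, b in product(px, py)]

px = [0.5, 0.5]
py = [0.25, 0.75]
pxy = joint_independent(px, py)

# The standard formula is additive over independent variables ...
print(entropy(pxy), entropy(px) + entropy(py))
# ... while the altered one is not.
print(altered(pxy), altered(px) + altered(py))
```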

So the formula is not arbitrary: it is very-very-very-well motivated. It is not easy to convince someone that this is the case indeed, but I hope that what I am saying does not appear entirely nonsensical to you.

When Dembski (who should have known better) or other creationists use entropy (or probability), they are not aware of some of its deeper mathematical consequences. They view entropy, at best, as something that is given by a formula. To a mathematician, formulas do not exist. They are merely representations of abstract properties. Mathematics does not start and does not end in formulas. Somehow, we do use formulas as a tool, but do not do so arbitrarily.

Ty said...

I loved reading this entry and the following comments.

Erdos56 said...

Interesting, all.

I'll just mention that if we move from these largely syntactic operations on strings, there are other good reasons to look for relationships between information theory, compression and biological evolution (and all run counter to ID claims). The Kolmogorov model also points towards behavioral error minimization (see Solomonoff, etc.), which is broadly a consequence of survivability.

Here's a topical paper by my father on the death of Shannon, though not directly invoking algorithmic complexity issues per se:

Atmar, W. 2001. A profoundly repeated pattern. (Comments on the death of Claude Shannon and the intimate relationship of information to life) Bulletin of the Ecological Society of America, 82(3):208-211.

Tom English said...

Jeff,

My response to one "Upright BiPed" (Dembski?), who wrote at UD:

Of course, it can be said that with infinite time, or limitless universes, anything can come about by chance and necessity - which is exactly the argument materialists are forced into. So be it.

If you want the universe to be all that is, then so be it. There are events within the universe, but the universe is not itself an event. It is. Period. You can give it an objective (frequentist) probability only by postulating something that stands without to generate the universe by a replicable process. But you just ruled that out. Without an objective probability of the cosmos, there is no cosmic CSI.

Any CSI you compute for an entity within the cosmos depends ultimately on your framing of an event and an observer (Dembski's semiotic agent). I have not seen an accounting for that framing. It seems that any CSI-computing entity effectively must associate high CSI with itself. There is some ugly self-reference and paradox in this, and I'm serving fair warning.

"Intelligence changes probabilities" rolls off the tongue nicely, but is not as neat and clean as it seems. It doesn't work at the cosmic level. And if it doesn't work there, just how does it gain traction at lower levels? In particular, where does a bound on gain of CSI by natural processes operating within the known universe come from when a "probability distribution on universes" is a non-starter?

S.P. said...

Can you clarify for me what's meant by "information" in this context?

Shannon "information" / entropy.
or
"Information" in the common usage- ie data that describes or explains something.
or Something else?

S.P. said...

Please ignore my earlier question. I just read more of the blog and think I see now.

Torbjörn Larsson, OM said...

Yes, a very good and hopefully useful post!

"Here is where Dawkins (in that infamous ambush video from a few years back) slips up: he accepts without complaint something like the colloquial definition of information, and tries to run with it. The result is unimpressive (to me at least, YMMV). Of course, the relationship between the information content of DNA considered as a symbol-string, and phenotype (particularly morphology) is far from clear (at least to this non-expert). For eg: Does it take more or less genetic information to grow forelimbs into functional wings, than into manipulatory appendages?"

FWIW, in Dawkins' article answering the ambush video's "Information Challenge" he seems to have been prompted to respond to the idea of biological information since he has worked with it in his papers and books.

I'm not sure what the problem is with his reasoning, as Dawkins (who uses Shannon information) arrives by fallible analogy (and admits as much) at the same description as other evolutionists that model the biological information flow during evolution. (For example Tom Schneider with his ev model.)

These biologists seem to think it is practically impossible to meaningfully describe the genetic information in the individual DNA string, as functionality development and usage is contingent on the environment. (I.e. a fish egg may develop to male in a certain temperature range or else female, and so on.) Instead they see and measure information gained by the population's genome emergent by the evolutionary process in a specific environment, more specifically selection which takes an a priori distribution of alleles to a narrower a posteriori distribution as Shannon studied.

Dawkins' analogy: "Mutation is not an increase in true information content, rather the reverse, for mutation, in the Shannon analogy, contributes to increasing the prior uncertainty. But now we come to natural selection, which reduces the "prior uncertainty" and therefore, in Shannon's sense, contributes information to the gene pool. In every generation, natural selection removes the less successful genes from the gene pool, so the remaining gene pool is a narrower subset."

"If natural selection feeds information into gene pools, what is the information about? It is about how to survive. Strictly it is about how to survive and reproduce, in the conditions that prevailed when previous generations were alive. To the extent that present day conditions are different from ancestral conditions, the ancestral genetic advice will be wrong. In extreme cases, the species may then go extinct. [Emphasis removed.]"

(Tom Schneider's figure in the link, comparing the information gained in the genome with and without selection, illustrates Dawkins' description.)
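The "prior uncertainty narrowed by selection" picture can be sketched with a toy calculation: start from equal allele frequencies, reweight them by fitness each generation, and track the Shannon entropy of the distribution. This is not Schneider's ev model; the four alleles and the fitness values are hypothetical numbers picked only to show the direction of the effect.

```python
from math import log2

def entropy_bits(freqs):
    """Shannon entropy of an allele-frequency distribution, in bits."""
    return -sum(p * log2(p) for p in freqs if p > 0)

def select(freqs, fitness):
    """One round of selection: reweight by fitness, then renormalise."""
    weighted = [p * w for p, w in zip(freqs, fitness)]
    total = sum(weighted)
    return [x / total for x in weighted]

freqs = [0.25, 0.25, 0.25, 0.25]   # four alleles, equally common (2 bits)
fitness = [1.0, 0.5, 0.5, 0.5]     # hypothetical relative fitnesses

history = [entropy_bits(freqs)]
for _ in range(5):
    freqs = select(freqs, fitness)
    history.append(entropy_bits(freqs))

gained = history[0] - history[-1]  # reduction in uncertainty, in bits
print(history, gained)
```

The drop in entropy from the first generation to the last is the quantity read here as information gained by the gene pool.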

Torbjörn Larsson, OM said...

"Please do not think that the entropy of Information Theory and of Thermodynamics are different. They are the same!"

I think it is obvious they are not, as thermodynamical entropy has physical dimensions and mathematical information not. More specifically, in statistical physics entropy is defined as the number of available energy states, not the number of useful bits in a symbolic description (of the states or the system?). The former is subject to changes by the dynamics in the physics, the latter not AFAIU.

(And in classical TD it is defined differently but comes out as a measure of the unavailability of heat energy. The two physical entropies are related, but only approximately the same (for large systems) due to the actual statistical fluctuations allowed by the former theory.)

There are connections though, as later comments note. Especially, I think, in computer science on actual computers where the two meet.

Mike Elzinga said...

There are important distinctions among the concepts of (thermodynamic) entropy, information, and order. It hasn’t helped that even physics textbooks have been sloppy in maintaining these distinctions.

In reality, entropy refers to the multiplicity of available energy microstates in a thermodynamic system; it has very little to do with the spatial order of matter. An energy microstate is simply any mechanism that can carry energy (e.g., each way a molecule can vibrate or rotate). Thermodynamics is about the bookkeeping of energy; not spatial order. The distinction is crucial.

For example, atoms can condense into spatially ordered crystals. But they can do this only if excess energy is carried away into the surrounding environment via additional energy states, such as photons, phonons, or other atoms. Thus the atoms forming a crystal condense into fewer energy states (a local entropy decrease) as a result of even more energy states becoming available globally (global increase in entropy). The crystalline spatial order arises from electromagnetic interactions and the rules of quantum mechanics, along with any emergent properties of matter as it condenses into more complex systems; never by “overcoming entropy”.

When atoms and molecules are forming crystals or living systems, matter and energy are always being exchanged with a larger environment. The field of condensed matter physics reveals that order, complexity, and emergent phenomena are ubiquitous in the universe, from the formation of protons and neutrons out of quarks and gluons, to atoms and molecules, to solids and liquids and organic compounds, all the way to the formation of living systems. The variety and complexity are enormous and depend on the energy ranges and materials available.

Yet no laws of thermodynamics are being violated anywhere. There are no ID/Creationist “entropy barriers” prohibiting order and complexity from arising, and no ID/Creationist has ever demonstrated such a barrier (it could be worth a Nobel Prize if one did). The “paradox” arises only because of misconceptions promulgated by the ID/Creationists.

There is a nice little exercise one can do to start clarifying what is meant by entropy, order, and information. Think of a very simple system that includes an asteroid or planet making a highly elliptical orbit around a star. Now ask the question, “How can the orbit become circular; what has to happen?” What mechanisms can you think of that will make this happen?

Then ask, “What is meant by order in this situation?” Which orbit is more “orderly?” Then continue to ask, “Which situation contains more information?”

What comes of this is that the processes by which an orbit changes from highly elliptical to circular are physical and involve the bookkeeping of energy. That part connects with thermodynamics and entropy. The questions about order and information have little to do with the processes by which the orbit becomes circular, but they raise most of the problems one grapples with in attempting to define what is meant by order and information.

paul01 said...

Is there not a difference between information and signal? As I understand it a signal can be made stronger through redundancy which helps the signal transmit through noise. But redundancy actually reduces the amount of information in the entire system since it shortens the description of that system. At the same time the information in the signal itself is not much different. When one creates an MP3 out of a .wav file one compresses the file to its essential, but usually there is no discernible difference in the sound.
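The first point, that redundancy helps a signal survive noise, can be illustrated with the simplest possible scheme: send every bit three times and take a majority vote at the receiver. This is only a sketch; the 10% flip probability, the seed, and the function names are assumptions made for the example.

```python
import random

def send(bits, flip_prob, rng):
    """Noisy channel: flip each bit independently with probability flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in bits]

def encode_repeat(bits, n=3):
    """Add redundancy by repeating every bit n times."""
    return [b for b in bits for _ in range(n)]

def decode_majority(received, n=3):
    """Recover each bit by majority vote over its n copies."""
    return [int(sum(received[i:i + n]) * 2 > n)
            for i in range(0, len(received), n)]

rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(10000)]

# Errors when the data is sent as-is.
raw_errors = sum(a != b for a, b in zip(data, send(data, 0.1, rng)))

# Errors when the data is sent with threefold redundancy.
decoded = decode_majority(send(encode_repeat(data), 0.1, rng))
coded_errors = sum(a != b for a, b in zip(data, decoded))

print(raw_errors, coded_errors)
```

The transmitted string is three times longer but describes exactly the same data, which is the sense in which redundancy adds length without adding information.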

IDers often seem to play back and forth between these meanings of information as information and "information" as signal. If one is to talk about the information contained in DNA in this latter sense, one should also talk about the noise and also the redundancy in such DNA as well.

By the way, if I am being inaccurate, I would appreciate being corrected. (that's why I read this blog in the first place!)

Kirk Durston said...

Jeffrey, would you grade your students' papers on the basis of how much Kolmogorov information they contain? Not likely. Kolmogorov information, like Shannon information, does not distinguish between meaningless gibberish and strings of symbols that are functional or meaningful. In the same way, biologists that concern themselves with information theory applied to biopolymers, have recognized that neither Kolmogorov information nor Shannon information is of any real use in biology, simply because it matters a great deal in biology whether the information is functional or not, whereas neither Kolmogorov complexity nor Shannon information makes that distinction. This was first pointed out in a short article in Nature (see Szostack, Jack, 'Molecular Messages', 423 (2003) p.689). More recently, a method to measure functional information, with application to biology, was published in PNAS. You will notice that their approach uses a variation of Shannon's approach, but with the added variable of functionality. A slightly more recent paper proposes a very similar, albeit more sophisticated, method and demonstrates its application to proteins. Post hoc probabilities are computed from actual protein family data, downloadable from PFAM.

Your post on Kolmogorov information is interesting, but not that relevant to biology unless you can publish a method to compute functional Kolmogorov complexity. I think that adding a functional variable is easy, but the awkward part is compressing the 2-D data array for a protein family. The individual symbols are not independent of each other. I have found higher order relationships (up to 7th-order) between the columns in the array, so any attempt to compress the array must still preserve the higher order relationships between the columns. (Paper in final stages, co-authored, incidentally, by two professors at your university, Jeffrey.) Having said that, I have written some software that does do some rudimentary compression, primarily by removing columns that have a very low probability of contributing to the structure and function of the protein. Nevertheless, the problem always remains .... could it be compressed still more and how? ..... which is why using some form of Functional Kolmogorov Complexity is a little too awkward and underdeveloped to be useable in biology just yet.

In summary, your questions are a little misapplied when it comes to biology. However, I will confess that most creationists are not current either.

Jeffrey Shallit said...

Kirk:

Kolmogorov complexity has a long track record of applicability, even in biology (see, for example, section 8.6 of Li and Vitanyi's book). It is well-defined, and there are no arguments about the definition (other than those who want to quibble about K versus C).

On the other hand, the measure you propose has been mentioned in a handful of papers, and has no firm mathematical definition.

You babble about "functional Kolmogorov complexity", but there is no such thing. If you have a mathematical definition to propose for it, then do so.

Kirk Durston said...

Jeffrey, I would recommend that you read the Nature article I mentioned above. It is no secret that both Kolmogorov and Shannon complexity have been used in biology, but there has been a fundamental problem with their usage, as pointed out in the aforementioned Nature article. Most biologists I know have become disenchanted with the relevance of information to biology for the reason pointed out in the Nature article. The 'handful of papers', as you put it, represent the current state of the art in addressing that problem. That 'handful' is about to become much larger given what I'm aware is going on out there. As far as a 'firm mathematical definition' is concerned, that is happening. Hazen's mathematical definition was a step in the right direction albeit a bit simplistic (see the PNAS reference). My definition was a bit more rigorous (see the TBMM reference in my previous post). It was originally more rigorous still, but one of the reviewers complained of too many equations and I had to cut the math by about 50%. Nevertheless, an even more rigorous mathematical definition is forthcoming.

You wrote that I did 'babble' about 'functional Kolmogorov complexity' and noted that there was no such thing. I do wonder how carefully you read things sometimes. If you re-read my first post you will note that I said, 'Your post on Kolmogorov information is interesting, but not that relevant to biology unless you can publish a method to compute functional Kolmogorov complexity.' Jeffrey, the 'you' I referred to is none other than you, Jeffrey Shallit. You are a big fan of Kolmogorov complexity, so you are the one to publish something on it if you think it is so good. If you cannot address the problem first raised in the Nature article, however, then you may want to refrain from suggesting that Kolmogorov complexity is an adequate measure of the functional information encoded in DNA, RNA and proteins. In addition, you might want to reconsider asking the questions you ask in your opening blog. Those were once good questions, but they are now obsolete in view of what current science has to say about functional information in general, and in biopolymers specifically. Granted, most creationists aren't current either, but surely you do not want to look like them?

Mike Elzinga said...

Paul01 wrote:

“Is there not a difference between information and signal? As I understand it a signal can be made stronger through redundancy which helps the signal transmit through noise. But redundancy actually reduces the amount of information in the entire system since it shortens the description of that system.”

The terms are not the same, although “signal” is used very loosely.

The technique to which you are alluding is common in signal and image processing. It takes advantage of the fact that noise is generally random and uncorrelated compared with the “information” contained in the signal. By summing a repeated signal (called “boxcar integration”), the uncorrelated noise tends to cancel itself out, but the more highly correlated parts continue to add coherently.
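A minimal sketch of that averaging effect, assuming a sine wave as the repeated signal and independent Gaussian noise of unit standard deviation on every repetition (the signal shape, noise level, seed, and function names are all illustrative choices):

```python
import math
import random

def noisy_copy(signal, sigma, rng):
    """One repetition of the signal with fresh, uncorrelated Gaussian noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average_traces(traces):
    """Point-by-point average of repeated recordings."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

rng = random.Random(0)
signal = [math.sin(2 * math.pi * k / 50) for k in range(200)]

single = noisy_copy(signal, 1.0, rng)
averaged = average_traces([noisy_copy(signal, 1.0, rng) for _ in range(100)])

print(rms_error(single, signal), rms_error(averaged, signal))
```

With 100 repetitions the residual noise falls by roughly a factor of ten, as expected when uncorrelated noise averages down like one over the square root of the number of repetitions while the repeated signal adds coherently.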

There are other techniques as well. “Dithering” takes repeated snapshots of the image or signal from slightly different random spatial or temporal positions. The best explanation of why this works is in the Fourier transform (FT) domain of the image or signal, and it relies on what is colloquially referred to as the “shift theorem”. Shifts in position or time correspond to shifts in phase when transformed to the FT domain.

If the noise tends to be of higher frequency, the phases of the noise shift by much greater amounts and tend to cancel whereas the lower frequencies don’t shift as much and continue to add. This technique sometimes goes by the label “poor man’s low-pass filtering.”
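The shift theorem itself can be checked directly on a small discrete example: delaying a sequence by m samples multiplies its k-th Fourier coefficient by exp(-2*pi*i*k*m/N), so magnitudes are unchanged and the phase change grows in proportion to the frequency index k. The sketch below uses a naive discrete Fourier transform and a circular shift; the sample values and function names are assumptions made for the example.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_shift(x, m):
    """Delay x by m samples, wrapping around at the ends."""
    m %= len(x)
    return x[-m:] + x[:-m]

x = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 0.0]
m = 3
n = len(x)

original = dft(x)
shifted = dft(circular_shift(x, m))

# Shift theorem: each coefficient picks up a phase factor that depends on k.
predicted = [original[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n)]

for k in range(n):
    print(k, abs(original[k]), abs(shifted[k]), abs(shifted[k] - predicted[k]))
```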

In any case, the processes one uses to separate the information wanted in the signal from the unwanted noise depend on knowledge of the characteristics of each.

Thus it is not clear just what one means when one says that repetition or redundancy reduces information. I suspect it is just a bogus confusion. In the case of evolution, phenotypes get many chances against their environment. What survives is that which is most highly correlated relative to the current environment. What gets thrown away is “noise”. But who is to say which is which?

Anonymous said...

some examples of how genes evolve: arXiv/0901.0583

Jeffrey Shallit said...

Kirk:

You really need to work on your attitude; stop assuming that everyone else is stupid.

I've read Szostak's paper (not "Szostack" as you comically put it) and I've read your paper with Chiu, Abel, and Trevors, and I've read Hazen et al. None of them propose a rigorous mathematical definition of information. The definition of Hazen et al., for example, is completely domain-dependent; choose a different set of "functions"* that you're looking for and you get completely different results.

* "function" here is not a function in the mathematical sense, but rather the ability to carry out some task.

I never said that Kolmogorov complexity was a measure of "functional information", since there doesn't even yet exist a rigorous mathematical definition of what "functional information" means. That's your bizarre extrapolation. My post was about Kolmogorov complexity, a well-understood and studied measure of information. I wonder if you could solve the simple exercises I proposed.

As for "functional Kolmogorov complexity", I refuse to do your own work for you. If you wish to propose a definition for such a thing, by all means, go ahead. Otherwise, stop babbling about it as if it means something. Just because you give something a name doesn't mean it exists.

Jeffrey Shallit said...

Kirk:

I should add that ISI Web of Science shows that your 2005 paper with Chiu has received exactly 0 citations. I guess that means not too many people are excited about your "functional entropy". Don't feel bad; Hazen et al.'s PNAS article has only been cited 6 times. Not exactly burning up the biological or mathematical charts, here.

Kirk Durston said...

Jeffrey, if you'll check that link, you'll see our paper was published in December 2007, not 2005. It's unusual to see a lot of citations in the first year of publication. Hazen's paper was published 11 months ahead of ours. I do know from that TBMM website that there have been well over 1,000 downloads of our paper, so we shall see where it goes. There is some very interesting work being done right now. Expect to see a lot more papers dealing with functional information over the next few years.

Unfortunately, my laptop decided to pack it in this afternoon. I am now up to my ears in dealing with this, so I'm afraid I'll have to withdraw from this discussion for a while.

Ian said...

Thanks to Mark Chu-Carroll's blog, I've read about the transcendental number Ω. Would that be related to the idea of "functional information theory"? Based on my (admittedly) quick skimming of the "Functional information and the emergence of biocomplexity" paper, it would seem so...

Jeffrey Shallit said...

Kirk:

There you go again, assuming everyone else is stupider than you are.

When I said you had 0 citations, I was not referring to your 2007 paper (with Chiu, Abel, and Trevors). I was referring, as I said, to your 2005 paper with Chiu - the one published in Dynamics of Continuous, Discrete, and Impulsive Systems. It has 0 citations.

If I had meant your 2007 paper, I would have said so.

MrG said...

I was interested in the comments on KC information being unable to tell the difference between "functional" information and gibberish.

But even from my naive point of view I ask: how can it? A data stream consisting of sheer gibberish may or may not be meaningful -- who knows except the people who are sending and receiving it if it's not enciphered text?

I think it was Shallit who was nailing Nancy Pearcey for claiming that one data stream full of gibberish was equivalent to another. A crypto hobbyist knows better, since each stream of gibberish makes a great encryption key. They are not equivalent; one key cannot decipher a message enciphered in another. Trying to read "function" or "meaning" into information is clearly tricky since one person's gibberish is another person's gold.

That's what I was getting at when I mentioned that the Darwin-bashers are likely to give up on KC information soon and just insist they mean "complex specified information" instead. Or make that "functional information" if you like. Or whatever anyone wants to call it, because it doesn't really exist.

Shallit's response to my comment about CSI suggested that he felt the switch would not be a big deal: OK, so you guys have at least stopped pretending you're not making things up completely. Cheers -- MrG

MrG said...

To add an insight: what has more information, a message or the same message enciphered? Shouldn't they have the same amount of information? Shouldn't the gibberish ciphertext have less?

No. The enciphered version of the message will, as a very good bet, have more information. Any good encryption key has to be random -- it can be constructed by throwing dice -- and the enciphered text is going to be far more "noisy" than the message itself. Cheers -- MrG
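A quick sketch of the effect, using a one-time pad (XOR with a random key as long as the message) as the cipher and `zlib` output size as a rough stand-in for information content. Both choices are assumptions for illustration: a compressor only gives an upper bound on Kolmogorov complexity, and the sample message is deliberately repetitive.

```python
import os
import zlib

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of key."""
    return bytes(d ^ k for d, k in zip(data, key))

message = b"the quick brown fox jumps over the lazy dog. " * 200
key = os.urandom(len(message))        # random key, as long as the message
ciphertext = xor_bytes(message, key)

print("message:   ", len(zlib.compress(message, 9)))
print("ciphertext:", len(zlib.compress(ciphertext, 9)))

# Applying the same key again recovers the original message.
recovered = xor_bytes(ciphertext, key)
```

The plaintext compresses to a small fraction of its length, while the ciphertext, taken on its own without the key, is as incompressible as the random key that produced it.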

William Wallace said...

"All five questions are based on the Kolmogorov interpretation of information theory."

Stack the deck, much?

Information theory developed as a tool to describe, quantify, and characterize communication channels.

Andrey Kolmogorov, according to the Wikipedia article on him, advanced various scientific fields, and among those listed in the introduction at Wikipedia, information theory is noticeably absent.

Compare to the Wikipedia article on Claude Shannon.

Jeffrey Shallit said...

William Wallace:

I'd suggest not relying on Wikipedia for your understanding of mathematicians and their contributions. You might read Li and Vitanyi's book, An Introduction to Kolmogorov Complexity and Its Applications, to appreciate Kolmogorov's contributions.

Anonymous said...

I have the impression that Kolmogorov's definition of information is more or less a refinement of Shannon's, and at a broad conceptual level they're not so different. Truth?

I've been puzzling here and haven't got a lot of validation, but then again if I said something definitely WRONG I know I'd hear about it!

Cheers -- MrG

William Wallace said...

Your deceit is not unexpected.

For any lurkers, pick up any book on information theory, and after reading it, you will usually be very hard pressed to describe Kolmogorov's contribution to information theory, but no similar problem will exist vis-à-vis Shannon.

For the evolanders out there, keep stacking the deck.

Jeffrey Shallit said...

William Wallace:

Your welcome here ends when you falsely accuse me of dishonesty. Watch your step, or you are banned.

I already suggested you read Li and Vitanyi's book. Have you done so yet?

William Wallace said...

Here is a test for evolanders:

Q1: In the field of Information theory, whose contributions were more important and more widely applicable:

A. Claude Shannon
B. Andrey Kolmogorov

If you cannot answer this question correctly, you have no business teaching undergraduates. I won't give the answer here, because that's part of the challenge: ask your evolander to provide an answer to Q1 above.

Jeffrey Shallit said...

Still haven't read the book, have you, Wallace? Still flogging your dead horse?

MrG said...

"Braveheart" here consistently sounds the same note over and over again. His postings of the past are a reliable model of his postings in the future. Cheers -- MrG

William Wallace said...

Of course I have not read the book, you just recommended it to me. I looked over some reviews, and it seems like it would be worth a read.

But my estimation aside, you're still avoiding the issue of why you would expect creationists or intelligent design proponents who have a real, or feigned, knowledge of information theory to know anything about the "Kolmogorov interpretation" of information theory.

I happen to have knowledge of information theory, was a member of the IEEE Information Theory Society and IEEE Communications Society, and received and regularly read Transactions on Information Theory and Transactions on Communications, respectively, and have read a number of books on information theory, communications, cryptography, finite automata, computer algorithms, optimization, coding, formal languages, etc., have written programs to successfully crack both known weak encryption algorithms and presumed strong encryption algorithm implementations, invented and analyzed a Boolean synthesis algorithm that turned out to be merely as efficient as the already existing algorithm, and so on. Yet, I don't recall that Kolmogorov's name came up much; indeed, I don't recall ever hearing or reading the man's name before PvM mentioned his name recently.

Disclosure: I am no longer a member of either IEEE Society as work and family won in the competition for my time, and my work became more practical than theoretical. If Kolmogorov has found recent fame in either Society's journal I would not know.

Do you think that DNA sequences are objects in the mathematical or computer science sense? That DNA is an algorithm storage device? That DNA is best thought of as a communications channel, a code, or a computer program? That the standard information theory under the "Shannon interpretation" is somehow inferior?

If you want to be taken seriously by those not already in your choir, you should explain your beef with Shannon before moving goal posts to Kolmogorov, and then explain the esoteric interpretation of information theory that you insist your opponents be bound to when answering your deck-stacked questions. You should also disclose your assumptions without demanding that everybody has read the same book you have.

The conclusion follows from the premise.

William Wallace said...

Q1: Yes
Q2: Yes
Q3: Yes
Q4: Yes
Q5: Yes

And, upon reflection, Kolmogorov is not necessary to answer these questions in the affirmative.

Turns out the Kolmogorov interpretation is a red herring.

Jeffrey Shallit said...

William Wallace:

I would expect any person who claims to know about "information theory" would know of the Kolmogorov interpretation. I am sorry your education is so deficient. Luckily, there is a remedy: educate yourself.

If you read my post carefully, you would already know why I chose the Kolmogorov interpretation: it does not depend on any particular probability distribution. I think you need to improve your reading skills, while you're at it.

As for your "answers" to my questions, congratulations! You've proved you can simply copy the answers I provided. But you haven't shown your work. If an undergraduate provided such sloppy work, they would flunk. So do you.

William Wallace said...

Again, Kolmogorov is not necessary to answer these questions. If you're teaching at a university, and you were honorable, you'd resign until you educated yourself.

William Wallace said...

Although it is foolish to wager with a person who is deceitful and has a strange Kolmogorov fetish, how about this wager: I'll prove every answer without resorting to Kolmogorov, and if I do, you give up blogging. If I don't, I'll give up blogging.

Didn't think so.

Rich said...

William Wallace said:

"But my estimation aside, you're still avoiding the issue of why you would expect creationists or intelligent design proponents who have a real, or feigned, knowledge of information theory to know anything about the "Kolmogorov interpretation" of information theory."

The Executive Director of the ASA is Randy Isaac. Before he became our Executive Director he was the head of IBM's TJ Watson lab. As such he had direct access to the key "movers and shakers" of information theory. One of the conversations he had with Charles Bennett went as follows:

"This also leads to an important observation on the 'information' in the genome. Charles Bennett once gently corrected me, saying that technically, the more accurate term is 'complexity' not 'information.' The genetic code conveyed from one cell to its replicated cell is not 'information' as Shannon described. This 'information' is not independent of its physical embodiment. The physical embodiment IS the information. It is never converted from one medium to another. This is really complexity, not information. The supposed notions of conservation of information don't apply to the genetic code. It is not a message conveyed from one agent to another. Information about the genome and its sequence of course is classical information."[emphasis mine]

So, as for which is more applicable, the father of quantum information theory votes for Kolmogorov over Shannon. So who is Charles Bennett? Here's the intellectual pedigree. Again, Randy Isaac:

"Claude Shannon was the key pioneer of information theory. Rolf Landauer may have done the most to turn it into a bona fide hard science. Charles Bennett has been a leader in moving Shannon's ideas in the classical realm to the exotic world of quantum theory."

Erdos56 said...

Not knowing Kolmogorov seems to me a consequence of the history and the Cold War as much as anything. For EEs Kolmogorov doesn't appear in Wozencraft and Jacobs, for instance, while Shannon, Fano and Hamming feature prominently. It was the rise of digital that shifted focus to combinatoric formulations combined with the popularizations of Chaitin in the 80s that bridged computational theory with coding theory. Rediscovering Kolmogorov was a side-effect, though this doesn't absolve Mr. Wallace to my mind of his obligation to understand the topic before making assumptions about the depth of the contribution.

Jeffrey Shallit said...

William Wallace:

Your attempt to evade answering the questions is noted. So far no creationist has been able to provide the proofs. Why am I not surprised?

I warned you once that groundlessly impugning my integrity is not acceptable on my blog, but you have persisted. You are now banned. Further comments will be discarded. Congratulations.

Takis Konstantopoulos said...

William Wallace:

For someone who claims to have knowledge of Information Theory (which itself can be seen as an application of Probability) it is very surprising you don't know the name Kolmogorov. It is like saying you work in Mechanics but have not heard of Newton. That bad.

I applaud Jeffrey Shallit for having the patience to reply to you.

In particular, you seem to be a person who compartmentalizes knowledge in domains and subdomains, when you say...


Do you think that DNA sequences are objects in the mathematical or computer science sense? That DNA is an algorithm storage device? That DNA is best thought of as a communications channel, a code, or a computer program? That the standard information theory under the "Shannon interpretation" is somehow inferior?


Nobody said that Shannon's theory (a theory, not an interpretation) is inferior. It's just that model-free theories are much stronger.

Shannon did lay the foundation of Information Theory. But the Russians did do a lot of work too. Khinchine, Uspensky, Kolmogorov, Holevo,... Do take a look.

Finally, regarding your comment


But my estimation aside, you're still avoiding the issue of why you would expect creationists or intelligent design proponents who have a real, or feigned, knowledge of information theory to know anything about the "Kolmogorov interpretation" of information theory.


Yes, one WOULD expect IDiots/creationists and all those crypto-christians who try to preach by hiding behind "science" to know the "science" they use, and to know it well. Take for instance Dembski, who does use "Information Theory" left and right, but in a totally childish way! He is an ID-crypto-christian-creationist. He holds a PhD in probability theory. His advisor was Patrick Billingsley. Do you know Billingsley's work in Information Theory? (It's not well-known in engineering circles, but he is a very good probabilist.) Despite that, Dembski did NOT learn Information Theory. He obviously did not understand his advisor's book. Here, read a review of that (early 60's) book and do look carefully: both Shannon and Kolmogorov are mentioned therein. Then get a copy of the book and read it.

Mark Boyd said...

The mutations described in Q1, Q2 and Q3 can change any non-empty string into any other string.
So, under any remotely sane definition of information, one of the answers to Q1, Q2 and Q3 has to be "yes".
This alone is enough to show that anyone who claims that mutations can't increase information is clueless.
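The reachability claim above can be sketched in code. This is only an illustrative construction, assuming "duplication" means replacing x by xx (as in Q1), "deletion" means removing a contiguous block (as in Q3), and "point mutation" means changing one symbol (as in Q2). The function name `transform` and the sample strings are hypothetical, chosen for the example.

```python
def transform(source: str, target: str) -> list[str]:
    """Return the intermediate strings that turn source into target using
    only whole-string duplication, one deletion, and point mutations."""
    if not source:
        raise ValueError("source must be non-empty")
    steps = [source]
    current = source

    # Q1-style duplication (x -> xx) until the string is long enough.
    while len(current) < len(target):
        current = current + current
        steps.append(current)

    # Q3-style deletion: remove the surplus suffix in one step.
    if len(current) > len(target):
        current = current[:len(target)]
        steps.append(current)

    # Q2-style point mutations: fix each mismatched position in turn.
    for i, symbol in enumerate(target):
        if current[i] != symbol:
            current = current[:i] + symbol + current[i + 1:]
            steps.append(current)
    return steps


for step in transform("ACGT", "TTGACAGGA"):
    print(step)
```

Since every target is reachable from every non-empty source, at least one of the three operations must be able to raise any measure of information that is not constant across strings.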

Unsympathetic reader said...

Zeno: "Since Kolmogorov information defeats the creationists in every instance, I'm certain that creationists (once they finally, slowly, realize this), will decide that Kolmogorov information is the wrong model for biological information. See? Problem solved!"

Yes, that is why 'Spetnerian metrics' were developed. Paradoxically, they comprise a set of things that cannot be contained by any set.

Blue Devil Knight said...

Interesting discussion.

I am much more familiar with Shannon than Kolmogorov (perhaps because I am a neuroscientist, where we use Shannon all the time and not so much Kolmogorov).

I have a couple of concerns here. First, the creationists that use Shannon could say you're simply changing the subject. They make their little information(Shannon) argument, you reply with answers concerning information(Kolmogorov), so have you directly rebutted their arguments? (That said, their understanding of Shannon information isn't particularly sound either).

Second, while it is cute to not give the answers and pose this as a test, there are a lot of readers perfectly willing to admit we don't understand enough about Kolmogorov complexity measures to answer the questions (e.g., I would assume a shorter string needs a shorter program to create it, so has less complexity). It would be helpful for those of us that don't have the time to study this in depth to get some feel for the arguments/proofs involved. That, at least, would arm those of us less informed with some good stuff to talk about. How am I to ask a creationist these questions if I have no clue about them myself?

I am a biologist and don't know the answers, and I would predict that a tiny fraction of a percentage of practicing biologists would know the answers to these "elementary" questions from theory of computation/math. On the other hand, we aren't running around making arguments using these notions.

William Brookfield said...

Hi Jeff,

Just to let you know, I have posted a reply to your test at ISCID and on my ICON-RIDS blog.

Cheers,
William Brookfield

Jeffrey Shallit said...

Congratulations, William Brookfield! Your response is a model of content-free crackpottery. I particularly enjoyed the pseudomathematical babble of "irreducible correlational couplings".

While you're at it, be sure to contact creationist William Wallace, who agrees with me that the answers to all questions are "yes" (yet still manages to call me "deceitful".) You two should duke it out -- it would be better than watching professional wrestling!

Takis Konstantopoulos said...

Could someone enlighten me as to who William Brookfield is? This ISCID organisation looks like a complete fraud. As far as I can tell it consists of people who try to (ab)use science to prove that god or something of that sort exists. I observed that John Lennox (I have commented on his naive arguments a few times) is a member of this ISCID. But Brookfield's biography does not appear here, so I don't know how much information theory he knows.

But, judging from his statement,
[o]nly minds can give birth to information. This is because only the “minds eye” (with its insight, hindsight and foresight) possesses an in inner mental space, large enough to initially nurture and hold information for subsequent projection outward,
he appears confused. And how could he not be, when he is set to explain religion using information theory? It's for laughs.

William Brookfield said...

Hi Takis,

I am not an ISCID fellow. I am just an ISCID *member* (since 2002). I am an amateur scientist. I have studied the sciences all of my life. I have also studied music and have always made my living in music. I am 52 years old. I have never belonged to any religion. I do however have a metaphysical *philosophy* of life.

Cheers,
William Brookfield

Takis Konstantopoulos said...

William: Thank you. Although I do not mind people having their own philosophies, metaphysical or not, I find it utterly offensive to see (a) established (?) scientists (Lennox) abuse science in order to "prove" that some divine entity exists and (b) scientists with dubious qualifications (Dembski) try to entice their audiences/readers to their particular version of religion.

Besides, I have not seen any intelligent argument coming from this hoax called creationism/intelligent-design. I read (some of) the papers of, say, Dembski, and started laughing out loud. Then I saw that this guy is taken seriously by some. This is dangerous. It gives the wrong impression about probability, mathematics, and science.

Although I do know many scientists with religious beliefs, the mix of religion and science, or, rather, the use of the latter to explain the former, is inappropriate.

Rich said...

I am a member of the American Scientific Affiliation which is an explicitly Christian organization. Professionally I am an electrical engineer and I use information theory in my work. Many of us in the ASA who are familiar with information theory are shocked by two things.

1. How poorly the Intelligent Design Movement understands information theory and how widespread is that misinformation amongst Christian lay people. As I stated previously our current Executive Director ran IBM's research laboratory prior to retiring. Both he and I have been trying to correct ID's bad concept not only of information theory but also of what constitutes randomness. I will note in passing that AIT gives a good working definition of randomness.

2. ID's bogus claim that they are about science and not religion. That one is so transparent it's laughable. I believe in lower-case intelligent design because of my religious beliefs but not the other way around. The Intelligent Design Movement's arguments simply don't "get there". When I have been on Uncommon Descent and I clearly stated and labelled my religious perspective I got banned in 24 hours. It's not like the atheists haven't figured this one out and pretending otherwise is just plain silly. What this shows is ID is more beholden to their broken arguments than they are to their religious faith and in the end bring disrepute to Christianity.

Anonymous said...

I prefer the term "Solomonoff Complexity" since Kolmogorov was a Christian.

Doppelganger said...

Maybe I am missing something, but in one of Durston's comments on a different issue on this blog, he wrote something about how information 'had been encoded into' a gene. That seemed to me to be the core of the creationist folly re: information - they start out assuming that the 'information' in a gene was pre-planned and implemented, as it were, since to encode something means to 'convert a message into a code.' We know what the 'code' is, but we don't get a 'message' until after the fact.

YECs have it backwards - they think that the 'message' came first and was magically converted into a code. They trot out verbose expositions about 'information' and gussy it up with all sorts of formulae, but it seems to me that none of that even matters.

It is just a facade.

And I'm sure you know this, Jeff, but ALL creationists are arrogant elitists that have convinced themselves that they are smarter than everyone else. That is why they feel so comfortable pontificating on matters that are well outside of their actual area of expertise.

Ali Razavi said...

Hi Prof. Shallit,

I have some information theory questions. Is it accurate to aver that two structures contain the same information if there exists a bijection between the two? Sorry if the question is rather trivial, I have been away from math for a while. Another Q, what are the mathematical procedures for establishing that two strings convey the *same* information (e.g. one is a paraphrase of the other) in the context of a formal language, say CFG. I understand that KC or Entropy only indicate the *amount* of information contained by two pieces of data.
Last, is there any work on isolating structural information from attributed information (and also perhaps typed information) in such mathematical structures as typed attributed graphs/Trees (DNA sequences also)?

PS. My motivation for asking these is purely research oriented.

Jeffrey Shallit said...

Hi Ali,

No, it's not correct to say that two structures have the same information if there is a bijection. For example, take any word with high Kolmogorov complexity, say w, and the string 1. Then there is a bijection between 1 and w; yet 1 has little information and w has lots.

If you want to compare the information in x to y, then you need the relative Kolmogorov complexity K(x|y) or K(y|x). This is discussed in detail in Li and Vitanyi's book.
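As a rough illustration of the reply above, compressed size can stand in for Kolmogorov complexity. This is only a sketch under loud assumptions: K is uncomputable, zlib gives at best a crude upper-bound proxy, and the estimate `c(y + x) - c(y)` for K(x|y) is a heuristic borrowed from compression-based similarity measures, not the definition in Li and Vitanyi's book. The helper names `c` and `c_given` are hypothetical.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed size in bytes: a crude upper-bound proxy for K(data)."""
    return len(zlib.compress(data, 9))


def c_given(x: bytes, y: bytes) -> int:
    """Heuristic proxy for K(x|y): extra bytes needed for x once y is encoded."""
    return c(y + x) - c(y)


rng = random.Random(0)
# w plays the role of a word with high Kolmogorov complexity (assumption:
# seeded pseudo-random bytes look incompressible to zlib).
w = bytes(rng.getrandbits(8) for _ in range(2000))
one = b"1"

print("c(1)   =", c(one))
print("c(w)   =", c(w))
print("c(w|1) =", c_given(w, one))  # knowing "1" barely helps describe w
print("c(w|w) =", c_given(w, w))    # knowing w makes w cheap to describe
```

The bijection between the two single strings exists trivially, yet the proxies for their information content differ enormously, and knowing one string says almost nothing about the other.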

Dr, No said...

Interesting discussion. I have dabbled some in information theory and read Shannon and Kolmogorov. I have had a long-term interest in random grammars, L-systems and various sorts of what can be generally defined as transition systems. I must say that I have major frustrations with all known definitions of what Information Theory calls information. Although I am sure that they are measuring something that is a real attribute of a system, I cannot pragmatically see how they can call it information. When the information content of a random string of characters is calculated by accepted techniques as being maximal, this to me is nothing but a reductio ad absurdum indicating a systematic problem with basic assumptions.

I think if you make an analogy with mass you can get somewhere (although we are still not there). We measure mass by its effect on other masses; we should measure a system's information by its effect on another system's information content. In other words, the import of information is what is important. I think this is probably the reason why people studying bio-polymers are finding the current thought in the field of Information inadequate. Although the definitions of some kind of functional information are still mathematically unclear, I think this is the direction towards finding increased clarity in this problem.

Both Shannon and Kolmogorov solved the same problem, and I agree that the Kolmogorov formulation, being model-free, is superior, but I don't think they solved the problem everyone thought they had solved, and this is becoming clear. The beginning of wisdom is the understanding of what you do not know, and we do not know what the relation between complexity and information is. Maybe in a few more years we will. We really don't have a clear definition of what information is. I would propose that information is the ability of a system to create an effect in another system through the implicate structure of the first system.

However, there are so many ill-defined terms and concepts that need to be investigated in that statement that I fear it will be many years before we have a clear-cut understanding. Fortunately the cutting edge in this subject is being advanced by purely pragmatic efforts to understand biological information. It will be interesting to see where this field moves over the next 10-15 years.

Jeffrey Shallit said...

Dr. No:

I think it's perfectly fine if people want to create new versions of information theory. But if they do so, they need to create a version in which the quantities are clearly and rigorously defined, so that anyone can calculate it and come up with the same answer. Creationists haven't done that.

Anonymous said...

My latest article on the subject of "Creationist Information" can be found HERE. This article includes an equation for my version of "information."

More info on me can be found here.

http://www.iscid.org/boards/ubb-get_topic-f-6-t-000693.html Constructive comments are welcome.

IvanM said...

Gems of insanity like Mr. Brookfield's "latest article" are the main reason I subscribe to the comment feed here, I think. :-)

It shows astonishing humility that he's given the name "Brookfield information" to his perfectly unintelligible definition.

Jeffrey Shallit said...

Brookfield:

How about calculating the value of your information measure for the following strings? Be sure to show your work!

#1: 11111111111111111111111111111111111111111111111111

#2:
10101010101010101010101010101010101010101010101010

#3:
01101010001010001010001000001010000010001010001000

#4:
11001011111010110010001011101100100101010110100010

#5:
01010011101100110000111110101110010111001111011001

Takis Konstantopoulos said...

W. Brookfield:

I have a constructive comment for you. Please read something like this and try the exercises here, in particular the probability exercises.

Anonymous said...

Hi Jeff,

I happened across someone referring to your quiz and couldn't resist. I answered yes to each question but had to think about deletions increasing Kolmogorov complexity for a moment before I thought of a repeating sequence like xyzxyzxyz and realized if I deleted one character it would take more information to fully describe the string.

The problem I have with Kolmogorov complexity is it doesn't seem to bear much resemblance to genetic complexity. For instance, duplicating one of Shakespeare's plays will increase the Kolmogorov complexity a smidgeon but how many times would any number of duplications change the ending?

More to the point let's say we melt an ice sculpture and let the water soak into the ground. The Kolmogorov complexity of the water is much greater after it melts.

Just so, if we disintegrate a genome into random molecules the Kolmogorov complexity increases.

I'm sure you're informed enough to know the difference between information that is organized for a purpose and that which is not.

So what's the point of this quiz?

--DaveScot
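The deletion example in the comment above (removing a character from xyzxyzxyz...) can be tried out with a general-purpose compressor. This is a minimal sketch, assuming zlib's output size is an acceptable stand-in for Kolmogorov complexity, which it only bounds from above. The helper names and the choice of 1000 repetitions and 50 deletions are hypothetical.

```python
import random
import zlib


def c(text: str) -> int:
    """Compressed size in bytes, used as a crude proxy for K(text)."""
    return len(zlib.compress(text.encode("ascii"), 9))


def delete_positions(text: str, positions) -> str:
    """Return text with the characters at the given indices removed."""
    drop = set(positions)
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


periodic = "xyz" * 1000

# One deletion in the middle breaks the period once.
one_gone = delete_positions(periodic, [1500])

# Fifty deletions at seeded pseudo-random positions break it many times.
rng = random.Random(1)
many_gone = delete_positions(periodic, rng.sample(range(len(periodic)), 50))

for label, s in [("original", periodic), ("1 deleted", one_gone), ("50 deleted", many_gone)]:
    print(f"{label:>10}: length {len(s)}, compressed {c(s)} bytes")
```

A single deletion should change the compressed size by only a few bytes at most, so the effect is easier to see with many deletions: the string gets shorter while its description gets longer, because each break in the pattern has to be recorded.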

Jeffrey Shallit said...

The problem I have with Kolmogorov complexity is it doesn't seem to bear much resemblance to genetic complexity.

What is the formal definition of genetic complexity?

For instance, duplicating one of Shakespeare's plays will increase the Kolmogorov complexity a smidgeon but how many times would any number of duplications change the ending?

Why do you think the ending of a play is a good indicator of the complexity of strings?

I can easily give an example of a dialogue where duplicating one word changes the meaning.

I'm sure you're informed enough to know the difference between information that is organized for a purpose and that which is not.

OK, I'll bite. How can we tell the difference between information "that is organized for a purpose" and other kinds?

For example, in the following strings, which are "organized for a purpose" and which are not?

#1: 001001001100011011111010010111010010111000100000100000100111

#2:
010100111011001100001111101011100101110011110110010000001101

#3:
101010101010101010101010101010101010101010101010101010101010

#4:
101111101111101110101110111110101111101110101110101110101001

Anonymous said...

Jeff,

I don't know how to determine by formula what information is organized for a purpose and which is not.

You ask for a formal definition for genetic complexity. I don't have a short answer anymore than I have a short answer for a formal definition of literary complexity. Yet we have no difficulty recognizing the difference between one of Shakespeare's plays and a bowl of alphabet soup.

It would seem that one objective indicator of purposeful complexity would be a data structure that follows rules where those rules aren't imposed by laws of physics. In the case of Shakespeare we find rules of spelling, grammar, and syntax being followed. The laws of physics as we understand them don't impose those rules.

Similarly with genomic complexity we find rules of codon structure & translation, intron & exon editing rules, start & stop frame rules, and many others known and unknown. Yet we really can't definitively tell a coding gene apart from a non-coding sequence until we actually observe the coding gene being put to use to produce a protein product which serves some purpose.

It does seem obvious though that there is no correlation between Kolmogorov complexity and genetic or literary complexity. If we were to randomize the text in your blog the Kolmogorov complexity would increase but the purpose of the text would be lost. If we randomize the base sequence of an organism's genome the Kolmogorov complexity would increase but the organism would surely die.

Unless and until there is a correlation established between Kolmogorov complexity and genetic complexity I fail to see how understanding the former sheds any light on the latter.

-DaveScot

Jeffrey Shallit said...

Dave:

Information theory, in both its Shannon and Kolmogorov forms, is a well-recognized theory of information with a large body of work behind it. And both forms have proven useful in biology. Does this mean they will be useful in every situation? Of course not.

Creationists like Dembski and Meyer claim they have a new definition of information but -- like you -- they cannot give a coherent, rigorous definition of it or explain how to compute it. Until they do, the ball is still in their court. It is simply not legit to throw around terms like "specified complexity" and "complex specified information", pretending they are meaningful, when in fact they have no coherent meaning.

As for alphabet soup versus Shakespeare, we recognize a difference because we recognize one as an artifact - the characteristic product of human activity. But this does not extend to the information in the genome, which most biologists do not recognize as an artifact.

In any event, a definition is not tested by contrasting examples in the centers of their definitions; it is tested by examples near the boundaries. So I repeat: of the four binary strings I gave above, which have "purposeful complexity" and which do not?

As for Kolmogorov shedding any light on "genetic complexity", I think it does. For example, we know that the genome is very difficult to compress, which is good evidence that random processes were very strongly involved in its creation.
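The link between compressibility and randomness that runs through this exchange can be sketched with an off-the-shelf compressor. The sketch assumes zlib's output size as a computable upper bound on Kolmogorov complexity (the true value is uncomputable), and it uses a short repeated English phrase as a stand-in for structured text. It says nothing about real genomes, and the helper name `c` is hypothetical.

```python
import random
import zlib


def c(text: str) -> int:
    """Compressed size in bytes, a crude upper bound on K(text)."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Highly structured text: one phrase repeated 50 times (2000 characters).
passage = "to be or not to be that is the question " * 50

# "Randomize" it: same characters, same counts, seeded random order.
chars = list(passage)
random.Random(0).shuffle(chars)
scrambled = "".join(chars)

print("structured:", len(passage), "chars ->", c(passage), "bytes")
print("scrambled: ", len(scrambled), "chars ->", c(scrambled), "bytes")
```

Scrambling keeps every symbol count fixed yet makes the string much harder to compress, which is the sense in which randomizing a text raises its Kolmogorov complexity and in which poor compressibility points toward a random-looking source.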

Geoff said...

Hey Jeffrey, sorry to breathe life into such an old thread, but I've only just found your blog and I find this subject particularly interesting.

So as best as I can tell, the whole ID argument for CSI basically boils down to them trying to draw some kind of definitional difference between the total amount of information possible in a system (which could be made up of white noise for all we know) and functional information that has a specific communicative purpose. But it seems to come down to this:

DaveScot: “You ask for a formal definition for genetic complexity. I don't have a short answer anymore than I have a short answer for a formal definition of literary complexity. Yet we have no difficulty recognizing the difference between one of Shakespeare's plays and a bowl of alphabet soup.”

But (and correct me if I'm wrong here, as I'm only the simplest of laymen regarding information theory) someone who only knows Japanese probably wouldn't know the difference between the Shakespeare play and the alphabet soup. The meaningfulness of the information's content (or its specified functionality, if you wish to label it so) is entirely subjective.

Ignoring the Japanese-only speakers entirely, if we even take two English stories of similar length (and probably similar Kolmogorov and Shannon complexity/information), how do you calculate which has more or better content in its information? Subjectivity and personal taste are one thing, but how can you possibly mathematically quantify Shakespeare as being better than Oscar Wilde?

The same thing applies to biology. Creationists love to frame evolution in the light that we apparently became more 'awesome' than lower life forms, but lower by what criteria? How do you mathematically quantify lungs as being better than gills? It almost seems like an Information Theory version of the anthropic argument for god.

As far as I can see, there is no possible way this content can be mathematically quantified... and yet that's essentially what the ID/Creationists are trying to do, or claiming they have already done.

So I call bullshit on ID and CSI, but I'm not an information theorist. Am I wildly off target here?

Anyway, thanks for the great blog posts. By the by, are there any good beginners/layman books on information theory that you'd suggest checking out? I'm pretty good with maths, science and computers, but I've never really looked into Information Theory with any kind of depth. I think perhaps I should.

Jeffrey Shallit said...

Geoff:

I don't think you're wildly off target. Shannon's theory doesn't deal with "meaning", and nobody has a good way to measure "meaning".

As for a good introduction to Shannon, there is Renyi's A Diary of Information Theory. Too bad he died before completing it.

Takis Konstantopoulos said...

Amazing coincidence. I was just reading, on my email, Geoff's comment and immediately thought to suggest Alfred Renyi's book. And then I saw Jeffrey's suggestion!

Well, this only shows that Renyi's book is a masterpiece, a very good introduction to Information Theory (and some combinatorics).

Geoff said...

Thanks for the replies guys, but I have to ask. When you say:

Shannon's theory doesn't deal with "meaning", and nobody has a good way to measure "meaning".

Do you mean 'nobody has a good way to measure meaning' or 'nobody has any way to measure meaning'?

I guess what I'm getting at is to ask if there is anything in computer science, mathematics, information theory, or any other field of science that you know of, which would even remotely resemble a method to measure “meaning”, or “design”, or “purpose”, or “specified complex information” or whatever people like Bill Dembski wish to call it, as a quantifiable unit?

If the answer is no, as a renowned PhD computer scientist who seems to have quite an expertise in information theory, do you think it is something that is even theoretically possible to be quantified and measured? My guess is probably not given what a subjective property “meaning” is, but I'd prefer to ask an expert than simply assume.

Thanks for the book recommendation. I'll see if I can score one off Amazon. Though I guess, if it is a diary of information theory, it would kind of make sense that he died before it was finished, given that the field is still around today, right? Hehehe

Thanks again for taking the time to read the comments of a layman, guys.

A.J said...

Mr. Shallit, why are you so incredibly rude and then expect people to treat you with the utmost respect?
You are a downright jerk. I have no interest in you publishing this comment. Just letting you know that you are the one who needs to work on his attitude and have some decency.

Jeffrey Shallit said...

AJ:

Thank you for your deep, penetrating, and relevant insights. And so well expressed!

Takis Konstantopoulos said...

A.J. One has to be rude with idiots like creationists/intelligent designers. They are a threat to civilization and reason, just like all the religious fundamentalists (see, e.g., Rick Perry). But, rudeness aside, you can see that none of the challenges posed by Shallit, or any thinking person, can be answered by creationists. Indeed, the nature of their subject (preach and believe in moronic things) leaves them no space in their brain to argue rationally.