The example is given of flipping a presumably fair coin 500 times and observing it come up heads each time. The ID advocates say this is clear evidence of "design", and those arguing against them (including the usually clear-headed Neil Rickert) say no, the sequence HH...H is, probabilistically speaking, just as likely as any other.
This is an old paradox; it goes back as far as Samuel Johnson and Pierre-Simon Laplace. But neither the ID advocates nor their detractors seem to understand that this old paradox has a solution which dates back more than 15 years now.
The solution is due to my UW colleague Ming Li and his co-authors. The basic idea is that Kolmogorov complexity offers a solution to the paradox: it provides a universal probability distribution on strings that lets you express your degree of surprise on encountering a string of symbols said to represent the flips of a fair coin. If the string is compressible (as 500 consecutive H's would be), then one can reject the chance hypothesis with high confidence; if the string is, as far as we can tell, incompressible, we cannot. This works because the proportion of compressible strings to incompressible ones goes to 0 quickly as the length of the string increases.
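To make this concrete, here is a minimal sketch (using zlib as a crude, computable stand-in, not the actual Li-Vitanyi construction): if a 500-flip record can be described in far fewer than 500 bits, the fair-coin hypothesis can be rejected at a significance level of roughly 2 to the minus (bits saved), while a typical random string saves no bits at all.

```python
# A rough sketch: zlib's output length is only an upper bound on Kolmogorov
# complexity, so the "deficiency" computed here is a conservative estimate.
import random
import zlib

def randomness_deficiency(flips: str) -> int:
    """n minus an upper bound (from zlib) on the complexity of the string, in bits."""
    n_bits = len(flips)                                   # one bit of information per fair flip
    compressed_bits = 8 * len(zlib.compress(flips.encode()))
    return n_bits - compressed_bits

all_heads = "H" * 500
typical = "".join(random.choice("HT") for _ in range(500))

print(randomness_deficiency(all_heads))  # large and positive: reject "fair coin" at level roughly 2^-deficiency
print(randomness_deficiency(typical))    # negative: no grounds to reject chance
```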
So Rickert and his defenders are simply wrong. But the ID advocates are also wrong, because they jump from "reject the fair coin hypothesis" to "design". This is completely unsubstantiated. For example, maybe the so-called "fair coin" is actually weighted so that heads come up 999 times out of 1000. Then "chance" still figures, but getting 500 consecutive heads would not be so surprising; in fact it would happen about 61% of the time. Or maybe the flipping mechanism is not completely fair -- perhaps the coin is made of two kinds of metal, one magnetic, and it passes by a magnet before you examine it.
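That 61% figure is just 0.999 raised to the 500th power; a one-line check:

```python
# Probability of 500 heads in a row when each flip lands heads with probability 0.999.
p_heads = 0.999
print(p_heads ** 500)  # roughly 0.606, i.e. about 61% of the time
```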
In other words, if you flip what is said to be a fair coin 500 times and it comes up heads every time, then you have extremely good evidence that your prior belief about the probability distribution of flips is simply wrong. But ID advocates don't understand this and don't apply it to biology. When they view some biological structure, calculate the probability based on a uniform distribution, claim it is "specified", and then conclude "design", they never bother to consider that using the uniform distribution for probabilities is unfounded, because the causal history of the events has not been taken into account. Any kind of algorithmic bias (such as happens when random mutation is followed by selection) can create results that differ greatly from the uniform distribution.
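As a toy illustration of that last point (a throwaway sketch, not a model of real biology): even the crudest mutate-and-select loop concentrates its output on strings that a uniform-distribution calculation would call wildly improbable.

```python
# A toy sketch of selection-driven bias: mutation plus selection produces
# outcomes that are vanishingly rare under the uniform distribution.
import random

LENGTH, POP, GENS, MUT_RATE = 100, 50, 300, 0.01

def mutate(bits):
    return [b ^ 1 if random.random() < MUT_RATE else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=sum, reverse=True)          # "fitness" = number of 1-bits
    survivors = population[: POP // 2]              # keep the better half
    offspring = [mutate(random.choice(survivors)) for _ in range(POP - len(survivors))]
    population = survivors + offspring

print(sum(max(population, key=sum)), "ones out of", LENGTH)
# Typically far above the ~50 ones a uniformly random string would have.
```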
Elsberry and I discussed this in great detail in our paper years ago, but it seems neither side has read or understood it.
25 comments:
I’ve been following that same discussion also with some bemusement.
I think those arguing against the ID/creationists are trying to say that, for a fair coin, all permutations are equally probable, but not all combinations. However, I don’t see the words “permutation” and “combination” being contrasted at all in the “argument.”
The degree of “surprise” can come only with some knowledge of the probability distribution of the combinations.
The discussion got started because of an implication by Salvador T. Cordova over on The Skeptical Zone that one can assert that a one-off event, such as the origin of life, was highly improbable. It is equivalent to asserting one can determine the probability of a deviation from an expectation value without knowing what the probability distribution looks like.
This argument was a side track that avoided addressing the real problem: the assertion that sampling outcomes from an “ideal gas” of inert objects is a stand-in for the way atoms and molecules behave.
The probability distribution of coin flips has nothing to do with the probability of the formation of a complex molecule; but Sal, like all ID/creationists, simply proceeds to argue as though tornadoes in a junkyard not producing 747s demonstrates that complex assemblies of molecules are too improbable to happen.
I had been trying to get Sal to scale up the charge-to-mass ratios of protons and electrons to kilogram-sized masses separated by distances on the order of a meter and calculate the energies of interaction in joules and in megatons of TNT. After that, fold in the rules of quantum mechanics and then justify the ID/creationist penchant for using inert objects such as coin flips as stand-ins for the behaviors of atoms and molecules.
It didn’t work. The “argument” went off the rails into conflating permutations with combinations.
I have responded to this post at Uncommon Descent.
Long comment, sorry. Good post, and right on the money that not-chance does not equal design. I read Kirchherr et al. the last time you referred to it. Fascinating paper, all looking very promising, and then, on page 14:
"But there’s a catch – none of these schemes can actually be carried out. The complexity of a string is non-computable."
This seems to raise my hopes only to dash them ...
a) To do science, you need to do X.
b) With mathematical certainty, you can't do X.
c) Therefore, you can't do science.
I have a few more questions. (Not criticisms. Just my ignorance.):
* Are there any reliable, practical methods for estimating the complexity of a string? Kirchherr et al. suggest that "a computable approximation is length of some program p to compute s”. Is there any practical guide to finding "some" appropriate program?
* It's not entirely clear to me as a physicist how to convert an equation into a string for the purpose of calculating complexity. Mathematical elegance isn't necessarily length of equation (though there are clear cases: http://www.preposterousuniverse.com/blog/2011/02/26/dark-matter-just-fine-thanks/). How should one penetrate the veneer of convenient mathematical formalisms? Do I just write an integral sign, or expand using a Riemann sum, or a Lebesgue integral or head all the way back to the Principia Mathematica?
* The problem that many physicists want solved in this area is selecting the prior on a certain parameter. That is, for some observable x, we want the probability distribution p(x), before we do the experiment, that we will observe the value x. Is that the complexity of a decimal (say) string representing x? Given experimental errors, we will only constrain a few digits of x, and for a physical quantity with units the exact string representing x is arbitrary. There is no reason to prefer x_1 = 2.00000 metres over x_2 = 2.18972 metres, since the metre is merely a convention. (i.e. if I define a new unit, the smetre, then I can make x_1 = 1.82671 smetres, x_2 = 2.00000 smetres).
Should we only put a prior on dimensionless parameters? Which parameter? x, x^2, 1/x, sqrt(x), log(x), exp(x) ... (Is the universal distribution normalisable over an infinite or semi-infinite range?)
Most physicists would think that considerations of shortest computer programs printing strings are pretty useless here. Searching for ways to compress the expression for the fine-structure constant could easily turn into the numerology you've been warning us against.
* The universal distribution doesn't necessarily take into account all the information we have. Suppose there are two sets of 10 hands in poker, both with identical very low complexity but the Hand A will win every time, while Hand B gives no systematic advantage over a random hand. So, I assume we conclude from complexity considerations that the chance/random/uniform probability hypothesis is ruled out, but that on Hand B the "cheating" hypothesis isn't thereby made any more likely since there is no reason for a cheater to prefer Hand B over not cheating. Correct? We treat the universal distribution as giving the zero-information prior, and incorporate additional information in the normal, Bayesian way. Yes?
* I've never seen a physics / astronomy / cosmology paper mention let alone use the universal distribution or Kolmogorov complexity to evaluate priors on hypotheses. Are we all doing science wrong? Is what scientists usually do with data a decent approximation to the universal distribution method?
Is there any practical guide to finding "some" appropriate program?
Well, that's what compression programs like "compress" do. There is, of course, no universal compression, but at least it gives some idea about the complexity of texts. An interesting thing is that organismal DNA is largely incompressible, which is good evidence of a strong random component to the sequence of base pairs.
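For a rough feel of what such an estimate looks like in practice, here is a small sketch using Python's zlib (any general-purpose compressor gives only an upper bound on the true complexity):

```python
# Compressed size as a rough, upper-bound estimate of a string's complexity:
# structure shrinks, randomness doesn't.
import os
import zlib

repetitive = b"HT" * 300          # 600 bytes of obvious structure
random_bytes = os.urandom(600)    # 600 bytes with (almost surely) no exploitable structure

print(len(zlib.compress(repetitive, 9)))    # a handful of bytes
print(len(zlib.compress(random_bytes, 9)))  # slightly more than 600: no real compression
```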
The universal distribution doesn't necessarily take into account all the information we have.
True, but there is the notion of conditional Kolmogorov complexity.
That's all I can say at the moment (travelling) but maybe I'll have more to say in the future.
One thing that I don't think I've ever seen from the advocates of "intelligent design" is a calculation of the probability, from their point of view, under the hypothesis of intelligent designer(s) capable of doing more things than can be done by natural causes.
ISTM that the "intelligent design" hypothesis should decrease the probability of any specific outcome.
TomS
But the ID advocates are also wrong, because they jump from "reject the fair coin hypothesis" to "design".
We do? Reference please.
For example, maybe the so-called "fair coin" is actually weighted so that heads come up 999 out of 1000 times.
How did it become weighted?
Or maybe the flipping mechanism is not completely fair -- perhaps the coin is made of two kinds of metal, one magnetic, and it passes past a magnet before you examine it. In other words, if you flip what is said to be a fair coin 500 times and it comes up heads every time, then you have extremely good evidence that your prior belief about the probability distribution of flips is simply wrong.
And how did that happen?
Again all you are doing is adding more design into it. A designer weighted the coin. A designer made the coin out of two different metals. A designer magnetized it.
As for biology, well there still isn't any evidence that natural selection (which includes random mutations) is a designer mimic. Meaning it doesn't do anything. And that is what IDists do -- we look to the evidence -- and your position doesn't have any.
Cross posted at TSZ, also at UD:
I don't think I've ever seen a thread generate so much heat with so little actual fundamental disagreement!
Almost everyone (including Sal, Eigenstate, Neil, Shallit, Jerad, and Barry) is correct. It’s just that massive and inadvertent equivocation is going on regarding the word “probability”.
The compressibility thing is irrelevant. Where we all agree is that "special" sequences are vastly outnumbered by "non-special" sequences, however we define "special", whether it’s the sequence I just generated yesterday in Excel, or highly compressible sequences, or sequences with extreme ratios of H:T, or whatever. It doesn't matter in what way a sequence is "special" as long as it was either deemed special before you started, or is in a clear class of "special" numbers that anyone would agree was cool. The definition of “special” (the Specification) is not the problem.
The problem is that “probability” under a frequentist interpretation means something different than under a Bayesian interpretation, and we are sliding from the frequentist interpretation (“how likely is this event?”), which we start with, to a Bayesian interpretation (“what caused this event?”), which is what we want, without noticing that we are doing so.
Under the frequentist interpretation of probability, a probability distribution is simply a normalised frequency distribution - if you toss enough sequences, you can plot the frequency of each sequence, and get a nice histogram which you then normalise by dividing by the total number of observations to generate a "probability distribution". You can also compute it theoretically, but it still just gives you a normalised frequency distribution, albeit a theoretical one. In other words, a frequentist probability distribution, when applied to future events, simply tells you how frequently you can expect to observe that event. It therefore tells you how confident you can be (how probable it is) that the event will happen on your next try.
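To illustrate the normalised-frequency idea, here is a small sketch (with 10 flips rather than 500, so every sequence actually shows up in the tally):

```python
# Build a frequentist "probability distribution" the long way: toss, tally, normalise.
import random
from collections import Counter

N_TRIALS, N_FLIPS = 100_000, 10
counts = Counter(
    "".join(random.choice("HT") for _ in range(N_FLIPS)) for _ in range(N_TRIALS)
)
empirical = {seq: c / N_TRIALS for seq, c in counts.items()}   # normalised frequencies

print(empirical.get("H" * N_FLIPS, 0.0))   # close to the theoretical value below
print(0.5 ** N_FLIPS)                      # 1/1024, about 0.000977
```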
The problem arises when we try to turn frequentist probabilities about future events into a measure of confidence about the cause of a past event. We are asking a frequency probability distribution to do a job it isn't built for. We are trying to turn a normalised frequency, which tells us how much confidence we can have in a future event, given some hypothesis, into a measure of confidence in some hypothesis concerning a past event. These are NOT THE SAME THING.
So how do we convert our confidence about whether a future event will occur into a measure of confidence that a past event had a particular cause? To do so, we have to look beyond the reported event itself (the tossing of 500 heads), and include more data.
Sal has told us that the coin was fair. How great is his confidence that the coin is fair? Has Sal used the coin himself many times, and always previously got non-special sequences? If not, perhaps we should not place too much confidence in Sal’s confidence! And even if he tells us he has, do we trust his honesty? Probably, but not absolutely. In fact, is there any way we can be sure that Sal tossed a fair coin, fairly? No, there is no way. We can test the coin subsequently; we can subject Sal to a polygraph test; but we have no way of knowing, for sure, a priori, whether Sal tossed a fair coin fairly or not.
So, let’s say I set the prior probability that Sal is not honest, at something really very low (after all, in my experience, he seems to be a decent guy): let’s say, p=.0001. And I put the probability of getting a “special” sequence at something fairly generous – let’s say there are 1000 sequences of 500 coin tosses that I would seriously blink at, making the probability of getting one of them 1000/2^500. I’ll call the observed sequence of heads S, and the hypothesis that Sal was dishonest, D. From Bayes theorem we have:
P(D|S) = [P(S|D)*P(D)] / [P(S|D)*P(D) + P(S|~D)*P(~D)]
where P(D|S) is what we actually want to know, which is the probability of Sal being Dishonest, given the observed Sequence.
We can set P(S|D) (i.e. the probability of a Special sequence given the hypothesis that Sal was Dishonest) to 1 (there’s a tiny possibility he meant to be Dishonest, but forgot, and tossed honestly by mistake, but we can discount that for simplicity). We have already set the probability of D (Sal being Dishonest) at .0001. So we have:
P(D|S)=[1*.0001]/[1*.0001+1000/2^500*(1-.0001)]
Which is, as near as dammit, 1. In other words, despite the very low prior probability of Sal being dishonest, now that we have observed him claiming that he tossed 500 heads with a fair coin, the probability that he was being Dishonest is a virtual certainty, even though throwing 500 Heads honestly is perfectly possible, entirely consistent with the Laws of Physics and, indeed, the Laws of Statistics. Because the parameter P(S|~D) (the probability of the Special sequence given not-Dishonesty) is so tiny, any realistic evaluation of P(~D) (the probability that Sal was not Dishonest), however great, is still going to leave the term P(S|~D)*P(~D) in the denominator negligible, and the denominator always only very slightly larger than the numerator. Only if our confidence in Sal’s integrity exceeds 500 bits will we be forced to conclude that the sequence could just as easily, or more easily, have been Just One Of Those Crazy Things that occasionally happen when a person tosses 500 fair coins honestly.
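The arithmetic is easy to check; a short sketch with the numbers used above:

```python
# Bayes' theorem with the numbers from this comment: prior P(D) = 0.0001,
# P(S|D) = 1, and P(S|~D) = 1000/2^500.
from fractions import Fraction

p_d = Fraction(1, 10_000)              # prior probability Sal is Dishonest
p_s_given_d = Fraction(1)              # a Dishonest Sal reports a Special sequence for sure
p_s_given_not_d = Fraction(1000, 2**500)

posterior = (p_s_given_d * p_d) / (p_s_given_d * p_d + p_s_given_not_d * (1 - p_d))
print(float(posterior))                # 1.0 to double precision: a virtual certainty
```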
In other words, the reason we know with near certainty that if we see 500 Heads tossed, the Tosser must have been Dishonest, is simply that Dishonest people are more common (frequent!) than tossing 500 Heads. It’s so obvious, a child can see it, as indeed we all could. It’s just that we don’t notice the intuitive Bayesian reasoning we do to get there – which involves not only computing the prior probability of 500 Heads under the null of Fair Coin, Fairly Tossed, but also the prior probability of Honest Sal. Both of which we can do using Frequentist statistics, because they tell us about the future (hence “prior”). But to get the Posterior (the probability that a past event had one cause rather than another) we need to plug them into Bayes.
The possibly unwelcome implication of this, for any inference about past events, is that when we try to estimate our confidence that a particular past event had a particular cause (whether it is a bacterial flagellum or a sequence of coin-tosses), we cannot simply estimate it from observed frequency distribution of the data. We also need to factor in our degree of confidence in various causal hypotheses.
And that degree of confidence will depend on all kinds of things, including our personal experience, for example, of an unseen Designer altering our lives in apparently meaningful and physical ways (increasing our priors for the existence of Unseen Designers), our confidence in expertise, our confidence in witness reports, our experience of running phylogenetic analyses, or of writing evolutionary algorithms. In other words, it’s subjective. That doesn’t mean it isn’t valid, but it does mean that we should be wary (on all sides!) of making overconfident claims based on voodoo statistics in which frequentist predictions are transmogrified into Bayesian inferences without visible priors.
How did it become weighted?
I knew there would be an ID advocate so stupid that this objection would be made. Congratulations, Joe!
How it became weighted makes no difference to my argument, of course. Maybe somebody weighted it, or maybe it lay on the ground for a thousand years, eroding away until one side was heavier than the other.
But, as usual, ID advocates miss the point.
Meaning it doesn't do anything.
Liar. But then, all ID advocates can do is lie.
We know from observation that mutation and selection can accomplish amazing things. And there is simply no mathematical barrier to producing complex "specified" strings through mutation and natural selection. This latter is explained in detail in my paper with Elsberry.
In other words, the reason we know with near certainty that if we see 500 Heads tossed, the Tosser must have been Dishonest, is simply that Dishonest people are more common (frequent!) than tossing 500 Heads.
Perhaps - but doesn't it depend entirely on the situation? Perhaps the tosser is an innocent victim and the coin itself has been switched. That is my point. Observing 500 heads in a row just tells us that, with near certainty, our belief about the probability distribution was wrong. Exactly how it became wrong is another thing entirely.
Yes, indeed, Jeffrey, and when I first wrote that overlong comment, I allowed for that possibility, and then cut it to keep life simple (as I also cut the possibility that Sal had meant to use a two-headed coin, had accidentally used a fair coin, but got 500 Heads anyway).
And we could include that probability in the second node of the Bayesian decision tree.
Here is the central problem in Sal's post at The Skeptical Zone.
"I could go on and on, but the point being is we can provisionally say the binomial distribution I used for coins also applies to the homochirality in living creatures, and hence we can make the design inference and assert a biopolymer has at least -log2(1/2^N) = N bits of CSI v1.0 based on N stereoisomer residues. One might try to calculate CSI v2.0 for this case, but me being lazy will stick to the CSI v1.0 calculation. Easier is sometimes better."
What possible justification does Sal have for using a binomial distribution for the emergence of homochirality in the molecules of life?
What is the probability distribution of a one-off event - as far as we humans know - that resulted in the emergence of living organisms and evolution?
Nevertheless, this is typical of ID/creationist "calculations" about what is "improbable."
Yes, Mike, the ID literature abounds with these kinds of claims. Dembski, in his famous calculation in No Free Lunch -- you know, the one that was off by 65 orders of magnitude! -- also bases his numbers on a scenario that no biologist in the world thinks is accurate.
Poor Joe, like most ID advocates, just hasn't really read the ID literature with care.
" Perhaps the tosser is an innocent victim and the coin itself has been switched. That is my point. Observing 500 heads in a row just tells us that, with near certainty, our belief about the probability distribution was wrong. Exactly how it became wrong is another thing entirely."
It also tells us with near certainty that someone (as opposed to erosion) constructed a weighted coin. And that's what the ID advocates are claiming. Nothing else, your insulting accusations notwithstanding.
Hey, Jeff, you're famous:
http://www.uncommondescent.com/intelligent-design/jeffrey-shallit-demonstrates-again-that-he-is-clueless-about-even-very-basic-design-concepts/
I have a new post at TSZ explaining my resolution of the coin-flip paradox:
A resolution of the 'all-heads paradox'
It also tells us with near certainty that someone (as opposed to erosion) constructed a weighted coin. And that's what the ID advocates are claiming. Nothing else, your insulting accusations notwithstanding.
No, it doesn't tell us that. We need to know a lot more about the situation, more than just 500 coin flips coming up heads. My point is precisely that just the mathematical content of 500 coin flips alone tells you nothing; you need more information about the physical and social situation involved.
As for "insulting accusations", please direct yourself to Barry Arrington, who has been far more insulting than I have.
Eoin,
You're going to bitch about Jeff's "insulting accusations" when UD puts up a whole POST just to call Shallit "clueless"?
Can you IDjits just stick to being assholes who know NO MATH and abstain from whinging like a pussy?
It doesn't surprise me to hear that both sides are confused in this case. Some years back, I got into a lengthy flamewar on pandasthumb with someone who (I think) had mistaken me for a creationist, despite my repeated attempts to make my point clear, namely what you might reasonably conclude if you found (e.g.) the first thousand digits of pi in (e.g.) a radio message from space (and hadn't applied any cherry picking).
The phrase "miraculous universal distribution" rings a bell, so I must have been aware of it, though it looks like it was published after I left academia and before I was introduced to p-values (at a biotech job).
Later I posted this summary ("I mention this because it appears to be counterintuitive to a lot of people around here.") http://www.pandasthumb.org/archives/2006/02/dembski_and_the.html#comment-80883
Jeff,
you mentioned Dembski's infamous error of 65 orders of magnitude.
I *did* read your and Elsberry's article on Dembski's "No Free Lunch" but I don't recall the error.
Do you remember what the error was, and on what page Dembski made it, so I can check the math?
I'm assuming the coin is visible to you and you can tell it's not eroded.
The real issue is that we don't really detect design; we model processes of origin of objects. When we say items are "designed", we really mean they are "manufactured", and we have successfully modeled or identified some substantial part of the manufacturing process used to make them.
In fact, all of Dembski's examples in The Design Inference are examples of manufactured items. At most, his examples are really models of a particular manufacturing process.
For example, Caputo claimed he flipped a coin to select which party would lead on ballots, but attempts to replicate his results with a fair coin suggest that his manufacturing process was something other than flipping a fair coin.
Even SETI is based on a class of manufacturing process, if we consider radio signals to be potentially manufactured items. The SETI investigators study human manufactured signals, and try to find an identifying pattern based on that collection of known manufacturing processes.
Even if we go to the "watch on the beach" example from Paley, we identify the watch as manufactured, not because of some probability against chance assembly, but because the parts of the watch are items whose manufacturing processes are known. Send a watch into a time machine back to the neolithic, and it would be identified as a curious kind of rock.
Mt. Rushmore is identified by us as a carved object because we know about carving statues. To a person from a culture with no tradition of graven representations, the natural conclusion would be that these were giants turned to stone.
Dio:
It's mentioned here.
Jeff, thanks for the link. I do read that stuff, and take notes.
Diogenes, I guess blowing your top felt better than addressing my point?