Sunday, June 22, 2008

Oh, the Inanity! Slack in The Scientist

I've never read anything by Gordy Slack before, but based on this opinion piece in The Scientist, I'm not likely to in the future. Slack tries to defend the ID crowd, but all he comes up with is a confused mess.

Slack claims that ID advocates "make a few worthy points". But his examples demonstrate nothing of the sort.

1. Slack says, "While there is important work going on in the area of biogenesis, for instance, I think it's fair to say that science is still in the dark about this fundamental question." (Judging from the context, it seems that Slack really means abiogenesis.) He continues, "I think it is disingenuous to argue that the origin of life is irrelevant to evolution. It is no less relevant than the Big Bang is to physics or cosmology." This is just idiotic. Evolution is, by definition, what happens after there is a replicator to replicate. What came before is certainly relevant to biology, but it is not, strictly speaking, part of evolution itself. Even if some magical sky fairy created the first replicator, it wouldn't change all we know about the mechanisms of evolution today. Slack compares the Big Bang to physics, but then he doesn't compare the origin of life to biology, but rather to evolution. Isn't it clear that the analogy is faulty?

I disagree with Slack that we've made little progress in understanding abiogenesis. (What is this paper, chopped liver?) But even if mainstream science has made little progress, what progress has ID made? Nothing. No scientific papers, no testable models, no predictions. Nada. Zilch.

2. Slack says, "Second, IDers also argue that the cell is far more complex than Darwin could have imagined 149 years ago when he published On the Origin of Species." And so what? ID advocates weren't the ones to discover the cell's complexity, and they weren't the first to observe it was more complex than originally thought. (Darwin, by the way, knew well that the cell was not an undifferentiated blob of protoplasm; the nucleus was discovered in 1833.) And Darwin got lots of things wrong, so why is it even relevant to modern evolutionary biology what Darwin thought 149 years ago? The ID advocates would only have a worthwhile point if mainstream biologists were denying the complexity of cellular processes. But they aren't. Mainstream biologists discovered the complexity. So what's the point?

3. Slack says, "Millions of people believe they directly experience the reality of a Creator every day, and to them it seems like nonsense to insist that He does not exist. Unless they are lying, God's existence is to them an observable fact. Denying it would be like insisting that my love for my children was an illusion created by neurotransmitters." I don't understand why something should be considered true simply because millions of people believe it. After all, there are probably millions of people who believe in witches, or that Elvis is still alive, or that 9/11 was a vast government conspiracy. But without evidence to support these claims, there's no reason why I need to take them seriously. Slack's comparison to "love for my children" being an "illusion" is remarkably inapposite. As a materialist, my guess is that love is, indeed, a product of neurotransmitters. But that doesn't mean that the experience of love is an "illusion". The neurotransmitters create the experience, but that doesn't mean the experience doesn't exist. Belief in a deity, however, is different. You can have the experience of a supernatural presence, but that doesn't mean the experience corresponds to anything outside your head. I don't see why Slack doesn't understand the difference.

4. Finally, Slack says that those who accept evolution can be dogmatic followers, too. "I met dozens of people there who were dead sure that evolutionary theory was correct though they didn't know a thing about adaptive radiation, genetic drift, or even plain old natural selection." Any field has dogmatic followers. But this has nothing to do with whether ID is correct, is science, or has anything useful to say.

The really big point, the one that Slack misses completely, is the transparent dishonesty of nearly everything about intelligent design. ID advocates have to lie, because the evidence for evolution is so strong that they have no choice. That's something that even John Derbyshire understands, but Slack doesn't display any awareness of it.

All in all, this is one of the lamest defenses of ID I've ever seen.

66 comments:

Anonymous said...

If a belief in God results in less crime, would it make sense for atheists to promote religion simply to live in a society with less crime?

I'm well aware that many smart people are atheists and don't commit crimes. But that may not be true for dumber atheists.

Anonymous said...

If a belief in God results in less crime, would it make sense for atheists to promote religion simply to live in a society with less crime?

All the studies done so far on the subject suggest either no correlation between piety and lawfulness or one going in the opposite direction.

Dark Matter said...

Anonymous: Your comment doesn't actually relate to the article. At all. There is no reason why atheists should promote what we believe to be a dangerous delusion. You assume that the absence of an afterlife or divine punishment automatically makes the weak-minded believe that they can do whatever they want; in reality, people will always develop their own explanations and rationalizations for why acceptable behavior is necessary to a well-functioning society. Actually, when they don't, crime is more common; you'll find many religious believers in state prisons and low-income communities with high rates of crime.

However, on to my original point, which was: Excellent refutation of Slack's article. Any journalist acting as an ID apologist ought to be hung out to dry for intentionally misinforming the public. He's obviously either dishonest or delusional.

Thordr said...

Anonymous, per the Federal Bureau of Prisons, 94.8% of inmates are Jewish, Christian, Muslim, or derivatives, while 0.2% are atheist (atheists are about 15% +/- of the U.S. population). I think that should settle your statement.

Anonymous said...

All evidence (admittedly, it is sparse, and the remedy is: More Science!) points to atheists being vastly underrepresented in the American criminal justice system.
We don't need the theists' morals, as ours appear to be better, or at least more reasonably approached.

-Autumn

Anonymous said...

There is no evidence that theism decreases criminal behaviour. There is evidence that atheism decreases criminal behaviour (it is scanty, and should be followed up with what, dear readers? Yes! More Science!).
It is indisputable that atheists are vastly underrepresented in the American judicial system, and it is clear to most people that a secular morality is not only possible, but lacks many of the convenient loopholes allowed by theism.

Morality: you're doing it wrong.

-Autumn

Anonymous said...

"If a belief in God results in less crime,"

If a belief in God results in more wars, would it make sense for religious believers to promote atheism simply to live in a society with less death and destruction?

Ask the Mafia, The National Socialists and the KKK if a belief in God reduces crime.

I'm well aware that many smart people are religious and don't commit crimes. But that may not be true for dumber religious people - or those who purposely use their faith as a shield against guilt.

Anonymous said...

"I met dozens of people there who were dead sure that evolutionary theory was correct though they didn't know a thing about adaptive radiation, genetic drift, or even plain old natural selection."

Curiously enough, I've encountered dozens (perhaps more? Who knows) of people who fervently believe that the internal combustion engines in their automobiles are not operated by invisible fairies, despite their knowing literally nothing about the structure of fuel injectors or the fluid dynamics of exhaust manifolds. And to think such narrow minded buffoons call themselves "skeptics"...

Neel said...

Says a lot about Intelligent Design, when somebody tries to come up with a few good points about the movement and makes a complete ass of himself.

Erdos56 said...

Anonymous is mistaken according to many resources. For instance:

Religion and Crime 1

Another discussion...

As well as the underrepresentation of actual atheists in prisons.
But perhaps anonymous means active participation in organized religion as compared with casual non-observance? Perhaps, but if the underrepresentation stands, doesn't that suggest that maybe atheists should promote atheism?

In any case, I am skeptical of formulating social policy based on personal interests. Better still, why not have a pluralistic society that reserves religious belief to the personal sphere and doesn't inflict it on law and science (taking us back to the original discussion)?

Barba Rija said...

That's a great question, anon. But you do have stats available. Just look at prisons and see plainly that you'll find fewer atheists than on the outside.

I wonder why.

Anatoly Vorobey said...

"All in all, this is one of the lamest defenses of ID I've ever seen."

How is this a "defense of ID" in the first place? Does Slack say he believes ID to be correct, or that it should be taught in schools? No, he specifically disclaims all that (and more than once, too!). He calls IDers his "foes".

Slack's point of view is not especially complex: he says he's against ID and firmly pro-evolution, but he does think IDers, though definitely wrong on all the big questions, have a few worthy points that should be listened to. And yet this modest complexity seems to be more than you can handle. Calling this view "a defense of ID" or "trying to defend the ID crowd" is a kind of smear job that only someone fully in the throes of "self-righteous rage", as Slack describes it, should be capable of. In that respect, your response partially validates Slack's observations.

As to Slack's arguments and your vituperative dismissals:

1. Slack's analogy with the Big Bang is wrong, but not "idiotic". On the other hand, while evolution can be studied independently of abiogenesis, it is indeed disingenuous to pretend that there's no explanatory connection between them. A better analogy is early Universe cosmology rather than the Big Bang in particular. The story of what happened to the universe in the first few thousand years of its existence, although totally irrelevant to how things are now, provided a major strengthening of theories used to explain the current state of stars and galaxies.

The fact that we know next to nothing about how life might have begun does not disprove evolution in any way, but neither is it irrelevant. Certainly if there existed a scientific alternative to evolution which had a much better-explained mechanism for getting things started, that would've been a major point in its favor.

It's useful to understand the extent of our ignorance and to keep it in mind.

2. "And so what? What's the point?" You fail to understand Slack's point here, which isn't that IDers should get credit for the cell's complexity, but that now we know that there is much more to explain on the cellular level than our predecessors could ever imagine. Many supporters of evolution are completely ignorant of how much there is to explain on the biochemical level, and how little of it has been explained.

3. Slack's speaking nonsense here, but your reply to him is just as garbled and nonsensical. My "love for my children", just as "God's existence", exists solely "in my head" and doesn't correspond to any external reality outside my head (only consequences of that love do). Both are created by neurotransmitters, from the materialistic point of view.

4. "Any field has dogmatic followers. But this has nothing to do with whether ID is correct, is science, or has anything useful to say." - well, you're attacking a strawman, as Slack isn't saying it has anything to do with that. He's saying that it's useful to remember that evolution has scores of dogmatic, stupid, deeply ignorant, rabidly combatant followers - and he's right, it does, as a glance at Pharyngula's comments section on any given day, for example, would suffice to demonstrate. I think it's useful to keep that in mind.

Anonymous said...

What is even worse about Slack's analogy to the Big Bang theory is that he fundamentally misunderstands the Big Bang theory. The actual cosmologists that believe in the theory have their opinions on the beginning of the world, but they all recognize that these are not part of the scientific theory for which they have evidence. The standard cosmological model that is often referred to as the Big Bang theory is compatible with a number of scenarios for the very early universe, including the scenario where the universe as we know it arises from some other state with an infinite history.

A clear discussion of this point can be found on page six of P. J. E. Peebles's text on cosmology, "Principles of Physical Cosmology" (Princeton University Press, 1993).

So Slack is correct that a theory of abiogenesis is "no less relevant" to evolution than a theory of the beginning of the universe is relevant to cosmology, but he is entirely incorrect in thinking that this relevance is very significant.

Anonymous said...

All evidence (admittedly, it is sparse, and the remedy is: More Science!) points to atheists being vastly underrepresented in the American criminal justice system.

If there's a correlation between atheism and intelligence, then this would explain this data.

But if everyone is an atheist -- including dumb people -- then maybe there would be more crime.

Kirk Durston said...

To get back on topic …. Jeffrey Shallit wrote, "The really big point, the one that Slack misses completely, is the transparent dishonesty of nearly everything about intelligent design. ID advocates have to lie, because the evidence for evolution is so strong that they have no choice."

I'm not sure that the above sweeping generalization is a completely honest representation of either ID or of evolution. Of course, one would be a fool to defend all ID advocates just as one would be a fool to defend all atheists, or all Darwinists, or all people who write with their right hand. I see plenty of evidence for lack of rigor on both sides of the fence. Here are a couple of thoughts ….

A definition of ID is important before we can discuss it. I define ID as follows:

Intelligent Design: an effect which requires a mind to produce.

Given the above definition, my MacBook Pro would be an example of ID, as would the underwater acoustic signature emitted by a Russian submarine, or a signal received from interstellar space that contains the instructions for a working fusion reactor, or a ground penetrating radar scan that shows the foundations of an ancient city. Another example of ID, under the proposed definition, would be the five 'watermarks' that the Venter Institute encoded into its artificial M. genitalium genome. The 64 million dollar question is whether the functional information encoded into the rest of the genes of Venter's M. genitalium is also an example of ID.

To answer that question, we need a method for ID detection. ID detection is an important component of certain branches of science such as forensic science, archeology, and SETI. With Venter Institute's introduction of genetic 'watermarks', I suppose ID detection has now become an issue in genetics, if it wasn't already. ID detection must answer the central question, 'Was this effect produced by natural causes, or does it require a mind to produce?'

So it seems to me that before we can say ID was or was not required to produce a MacBook Pro, or to produce the functional information encoded in the genomes of life, or for the functional information in a radio signal from space that contains the instructions for a working fusion reactor, we must first have a method that can distinguish between those effects that require a mind to produce and those that can be produced through natural processes. With this in mind, let me propose a question.

ID question: You are handed two files. One file contains a radio signal from deep space that has the genetic sequencing for RecA (a protein-coding gene found in M. genitalium). The other file contains a genetic sequence for M. genitalium within which Venter Institute claims they inserted five 'watermarks'. All you have to work with are the two files and the knowledge that one file came from deep space (confirmed by numerous radio astronomers) and the other file came from the Venter Institute. What method would you use to determine whether or not the signal from deep space, and the five watermarks, required a mind to produce? (The method cannot be arbitrary or ad hoc, but must work for both.)

The point of my question is this: until we have a general method to distinguish between effects that require a mind to produce and effects produced by natural processes, we are not in a very good position to figure out if the functional information required to encode RecA required a mind to produce (i.e., is an example of ID) or was produced by natural processes. The job of science in this issue is to come up with a general method for ID detection that is not arbitrary or ad hoc, and works for forensics, for SETI, for archeology, and for genetics.

Let's assume for the sake of argument, that all ID theorists and all Darwinists are liars. What logically follows from that? Not much. We still have to do the work.

Anonymous said...

The first anonymous says, "I'm well aware that many smart people are atheists and don't commit crimes. But that may not be true for dumber atheists."

1. There aren't any dumb ones.

2. If one relies on a supernatural phantom as the source of their morality and ethical behavior, one might as well tattoo the word "DUMB" on their forehead.

PurpleFrog said...

"If a belief in God results in less crime, would it make sense for atheists to promote religion simply to live in a society with less crime?"

Even if it were true, the answer would not necessarily be yes. Crime was very low in communist countries under dictatorship, but I don't want to live in that kind of society.

Jeffrey Shallit said...

Anatoly:

1. Slack's piece is a defense of ID advocates because he says that they have valid points that should be listened to. But all the valid points he cites are well-known to all educated people, so why should ID advocates get special pats on the back?

2. Sure, there's a lot to explain at the cellular level. But we have explanations for a lot of it, too. ID advocates always minimize what we do know, so why should they deserve special pats on the back for pointing out cellular complexity, a fact known to any biologist?

3. I'm glad you agree that Slack was garbled and nonsensical here.
That's my point.

4. Saying that "evolution has dogmatic followers" is about as content-filled as saying "gravity has dogmatic followers". Again, why should ID advocates get credit for observing something so trite?

As for "this modest complexity seems to be more than you can handle", maybe you should encourage Slack to improve the clarity of his writing, rather than insulting my intelligence.

Jeffrey Shallit said...

Kirk:

How about you define "mind" and "intelligence" before we go any further?

Kirk Durston said...

Jeffrey Shallit wrote, "How about you define "mind" and "intelligence" before we go any further?"

Jeffrey, that's a fair enough question. Defining what a mind is is kind of like defining what gravity is (which has also been mentioned in some of the comments above) ... we might argue about what it is, but we all know it is there. There is an additional problem with making up a definition of 'mind' …. the mind is what makes up the definition of itself. I'm always a little bit suspicious as to the objectivity of such a process. On the other hand, I doubt any of us would be happy with a completely mindless definition of 'mind'. For the sake of this discussion, I'd be happy with a common sense notion of 'mind', as well as 'gravity'. However, if one wants a definition that some mind has done a little more work on, then I'd be happy to go along with this.

That being said, in this discussion of ID, the one thing about a mind that is important is the concept of intelligence. Whatever we think a mind is, I would hope that we'd all agree that a necessary characteristic of a mind is that it possesses the attribute of intelligence (at least to varying degrees). As far as defining intelligence, this is fine with me. However, I'd like to highlight some ideas in that definition. Especially relevant to this discussion is their suggestion that intelligence includes the ability to reason, to plan, and to solve problems.

So if we can agree that a mind must at least have the attribute of intelligence, we could define ID as 'an effect that requires intelligence to produce'. With the definition of intelligence discussed above, however, we could go further and define ID as 'an effect that requires the ability to reason, plan and solve problems'. However, even if defining a mind is problematic, I think we can all agree that necessary attributes of a mind include reasoning, planning and solving problems. With that in mind (heh, heh), I simply define ID as follows:

Intelligent Design: an effect that requires a mind to produce.

Examples of ID:
a) a MacBook Pro
b) a radio signal from interstellar space that contains the instructions for a working fusion reactor
c) the Venter Institute's 'watermarks' in their artificial genome
d) the results from a ground penetrating radar scan that reveals the foundations of an ancient Egyptian city (can't recall the name …. it was on Discovery channel a couple weeks ago).

The question is, is the functional information encoded in the gene that codes for RecA an example of ID? (I choose RecA because it is an average length protein, it is a universal protein found in all life forms, and I've done some work on it.) To answer that question, we need a scientific method to identify examples of ID that does not yield false positives, yet does not rule out obvious examples of ID (such as Venter's 'watermarks', or laptop computers) and is general (i.e., can be applied to forensics, SETI, archeology, and biology).

Jeffrey Shallit said...

So, Kirk, do bacteria have "minds" or exhibit "intelligence"?

How about thermostats? Plants? Termites? Computer programs?

I don't see that your definitions help decide these kinds of basic questions.

Kirk Durston said...

Jeffrey, referring back to my definition of ID, the question is not whether effects have minds, but whether the effects require a mind to produce. So defining ID as I have, merely clarifies what we are talking about when we talk about ID. The next step, as I have suggested earlier, is to come up with a general method to distinguish between effects that are instances of ID and effects produced by mindless natural processes. When it comes to ID and biological life, I see many scientists jumping to a conclusion without having done the work (i.e., they don't have a method to distinguish between ID and effects produced by mindless processes). It is bad science to assert conclusions without having a method to back those conclusions up. I think we have a very interesting scientific problem sitting before us, the problem as to whether the functional information encoded within the genomes of life required a mind to produce. To those who assert that the functional information encoded in the genomes of life is not an example of ID, my question is, 'what method did you use to distinguish between ID and mindless natural processes?' Of course, the question applies to those who make the affirmative conclusion as well.

Anonymous said...

Why can't you say that intelligent design may be possible, but evolution may follow that? Why is that so impossible?

Jeffrey Shallit said...

Nice evasion, Kirk. You can't even apply your definitions to basic questions like I asked, so your questions are, in essence, content-free.

It doesn't do any good to ask whether "intelligence" was required to produce life as we see it today, when you can't even tell me whether the process of evolution itself constitutes intelligence.

Jeffrey Shallit said...

Anonymous:

I certainly agree that life on earth could possibly have been designed by other beings. Show me a crashed spaceship with plans for the design of bacteria, and I'll accept that with pleasure.

But for that claim to go beyond mere wild hypothesis you have to have some evidence. It is this crucial step that makes it science, and the step that ID advocates relentlessly refuse to provide.

Kirk Durston said...

There's no evasion on my part, Jeffrey, and the definitions I have provided can be applied across the board. Of course, evolution fails the test of intelligence by the definition of intelligence provided earlier. It does have problem solving capabilities, but it does not possess the other two necessary components, reason and planning. Furthermore, the problem solving capabilities are quite limited. For example, evolutionary solutions involving little if any increase in functional information, such as modifying RecA by natural selection, are easily done by evolution. Discovering the functional information required to code for RecA, however, is not a simple problem and one which, on the basis of ongoing research, appears to be too difficult a problem, by numerous orders of magnitude, for evolution to discover … and that's just one protein. We can improve on the crude problem solving capabilities of evolution by bringing intelligence (human) into the picture and writing our own highly sophisticated genetic or evolutionary algorithms with intelligently designed fitness functions. Such algorithms are examples of ID. However, short of programming the answer into the algorithm along with a general pathway (i.e., spoon-feeding the algorithm), even human designed evolutionary algorithms appear to fall considerably short of the type of problem solving capabilities necessary to encode the stable, folding proteins of life. This doesn't entail that the functional information required for the encoding of proteins was produced by a mind, however. We still need a positive method to identify ID, whether we are in forensics, SETI, archeology, national defense, or biology.

Darwinism has proven to be a science-stopper when it comes to applying any such method to biology. I once had an internationally very well-known scientist, who has published a number of popular level books on evolution, tell me personally that he will oppose the development of any scientific method to detect ID. It is not difficult to design a method to detect ID; forensic scientists and archeologists have been doing it, albeit crudely, for decades. Yet I find Darwinists notoriously reluctant to apply any method to detect ID to genetic sequences. They don't even have a method. Yet they assert, on scientific grounds, that ID was not required. A 'scientific' conclusion, with no supporting method, is not even bad science; it is not science at all.

Anonymous said...

Good post, but I think you're being a little hard on the author.

What came before is certainly relevant to biology, but it is not, strictly speaking, part of evolution itself. ... Slack compares the Big Bang to physics, but then he doesn't compare the origin of life to biology, but rather to evolution.

I think an understanding of how life originated in the first place would have repercussions for our understanding of evolution as a process. Consider how understanding the processes of the Big Bang would add to our understanding of other physical processes, even if they came after.

Unknown said...

As to your point 2, "why Darwin's thoughts and words matter to IDers":

Their approach is messianic, as is consistent with their own view of Jesus. So the thought seems to be: refute the messiah and destroy the whole 'religion of evilution'.

Jeffrey Shallit said...

Kirk, when I ask you to define a term, all you do is substitute other vague terms. What is your definition of "planning"? How would I know whether a system exhibits it?

Your claims about the inability of mutation and natural selection to accomplish anything are typical for the ID crowd: mere assertions with no evidence to support them. In fact, we know from experiments with artificial life that evolutionary algorithms routinely develop surprising and novel strategies that are not preprogrammed. Look into the work of Karl Sims, for example.

You are extremely misleading when you claim that "forensic scientists and archeologists" have a method to detect ID. They have no such general method. What they have is a large body of experience of what humans are likely to do, and the kinds of artifacts they produce. Forensic scientists and archaeologists don't detect some abstract thing called "design"; they detect human-made artifacts.

Jeffrey Shallit said...

Kirk said, "what method did you use to distinguish between ID and mindless natural processes?"

There's no such general method, and I don't think there can be. You could make the same argument about any physical phenomenon: we can't distinguish between the effect of gravity and the alternative hypothesis that tiny demons are pushing and pulling. What makes gravity scientific, and the tiny demons not, is that a theory of gravity can make predictions and is potentially falsifiable, whereas the tiny demon hypothesis does not and is not.

If you claim that life is the result of some designer, it's up to you to produce some evidence of that fact. So far you haven't.

Kirk Durston said...

Jeffrey Shallit wrote: "Kirk, when I ask you to define a term, all you do is substitute other vague terms. What is your definition of "planning"? How would I know whether a system exhibits it?"

Jeffrey, now I'm beginning to think that it is you who is being evasive. I am reminded of Bill Clinton's question on what the meaning of the word 'is' is, during his grand jury testimony in the Monica Lewinsky affair. I would expect that most of us know what the word 'planning' means, but if anyone wants some clarification as to what it means, you can start here.

Regarding my claims about the inability of mutation and natural selection to produce any significant amounts of functional information: whether they are typical of the 'ID crowd' or not is irrelevant. Nothing follows logically from whether they are typical claims or not. The real issue is whether biological mutation and natural selection can actually produce functional information or not. If you think they can, then you have a guaranteed paper in Nature. If you actually think that they can (and I'm not sure whether you are playing the devil's advocate or not here), then publish a paper on it. Until that paper comes out, however, it is important that one does not confuse intelligently designed evolutionary algorithms with what goes on in nature. I am very familiar with evolutionary algorithms and have written and used a wide variety, including state of the art algorithms out of recent publications. The ability of an evolutionary algorithm to produce any kind of solution depends upon its design and fitness function. Without ID, an evolutionary algorithm cannot produce any significant amount of functional information at all. For a definition of functional information, I'm using this.

Do you know how much functional information is encoded for the average protein of 300 to 400 residues? I don't think you were necessarily suggesting that we have written, or could write, any kind of evolutionary algorithm at all that will find folding functional protein sequences, but if you are, then such an algorithm has not been written and there are very good reasons to believe it never will, due to the nature of protein sequence space. It's not what we do not know that is the problem, it is what we do know about how physics determines the folds and, hence, sequences, and the nature of protein sequence space that is the problem. The mutational search engine of biological processes is far too slow and inadequate, by more than a hundred orders of magnitude, to find even one average protein sequence, forget about finding thousands of them.

Regarding a general method to detect ID: In forensic science, archeology, and SETI, there is always the same question, 'Is this effect merely the result of natural processes, or not?' If the answer is 'not', then the next step is to more carefully investigate the effect. One summer, as an engineering student in the mid-1970's during the Cold War, I was awarded a summer job with National Defense Research. My job was to write some software that would analyze underwater noise and decide if there was a Soviet submarine in our coastal waters. It was not that difficult to do and the software worked beautifully. The central component of the software was to isolate anomalies within the underwater recordings and work from there. I had no idea what the underwater acoustic signature of a Soviet sub was like and I was not given that information, so I could not possibly design the 'answer' into my program. The program was to detect any sound not likely to be a result of nature and it did so, very nicely, Soviet subs included. Whether the underwater sound was human or alien, its acoustic signature would have been isolated by that software. In archeology, we often have no idea of what a given artifact is or did, or who made it (contrary to your theory of how archeologists detect design), yet we know it is a result of ID (i.e., produced by someone with a mind). We know that because of the degree to which it is anomalous within the local physical system. In general, in forensic science, archeology, and SETI, the starting point is to look for anomalies. That is the first step in ID detection. We must then have some method to quantify the anomaly to isolate those anomalies that are likely produced by some intelligent agent from the relatively low-level anomalies that occur in nature.

A method to detect ID:
Jeffrey wrote, "If you claim that life is the result of some designer, it's up to you to produce some evidence of that fact. So far you haven't."
Jeffrey, the driving force behind ID is 21st century science, with its discovery of increasing numbers of molecular machines and even molecular computers within the cell. The creative story telling that Richard Dawkins once relied on so heavily just doesn't cut it anymore, nor does 19th century Darwinian theory. If you want empirical evidence, start with the molecular machines. Of course some Darwinists still confuse creative story telling with doing science, and come up with creative stories of how those machines might have evolved …. stories that expect us to believe that the right functional proteins, with the right 3-D structures, just magically appear at the right moment of evolutionary history and against jaw-dropping, mind-staggering probabilities that any sane person would laugh at. That's not science. Darwinian theory, in the 21st century, is rapidly degenerating into quackery. Jeffrey, you've got evidence of ID in spades and it is increasing with each month. If that isn't enough, check out next month's science journals for more molecular machines.

But you are a mathematician and you appreciate, perhaps, a method to detect ID that works on the basis of formulae and cold, hard numbers. Let's try the following method to decide if the information required to encode RecA, and the Venter Institute's 'watermarks' is an example of ID:

Hypothesis: An attribute that distinguishes intelligent minds from mindless natural effects is the ability to produce significant amounts of functional information.

Small amounts of functional information can be accidentally generated, so we need to define what constitutes a 'significant' amount of information. Let's let the physical system define what is 'significant' by letting the physical system determine the total number of trials. The total number of mutational events that could have occurred under the most generous scenarios in the history of the earth is approximately 10 exp 42 (and I am being extremely generous here). We could infer from this that if some mutational effect had a probability equal to, or higher than, the inverse of the number of trials permitted by the physical system, then it might have a reasonable expectation of occurring. Insert that probability into the formula for functional information published by Hazen et al. to get 140 bits of functional information that could accidentally be generated through mutation in the history of the earth. So if we observe an effect that requires significantly more functional information than 140 functional bits, then we can reasonably conclude that it is an example of ID. Note that this includes the functional information required by any fitness function. (Now that I've tossed that bit of bait out, I'll leave it to you, Jeffrey, to take it, and then we'll discuss it. In short, fitness functions are not freebies.) Also, the 140 bit threshold is exceedingly generous. Actual ongoing labwork on E. coli suggests the computational capacity of mutational events is probably less than 30 bits.
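
The 140-bit figure can be checked with a few lines of Python. This is only a sketch of the arithmetic as the comment describes it: it takes the 10 exp 42 trial count as given, assumes the Hazen et al. measure has the form I = -log2(p), and uses hypothetical names (`bits_from_probability`, `threshold_bits`) that do not come from any published code.

```python
import math

def bits_from_probability(p):
    """Functional information in bits for an outcome of probability p,
    assuming the form I = -log2(p)."""
    return -math.log2(p)

# Trial count taken from the comment (an assumption, not verified here).
total_trials = 10 ** 42

# Probability equal to the inverse of the number of trials.
threshold_bits = bits_from_probability(1 / total_trials)

# Margin by which the quoted 259-bit watermark figure exceeds the threshold.
watermark_margin = 259 - round(threshold_bits)

print(f"threshold: {threshold_bits:.1f} bits")
print(f"watermark margin: {watermark_margin} bits")
```

The sketch reproduces only the conversion from a trial count to bits. Whether that trial count, or the inference drawn from the threshold, is sound is a separate question that the code does not address.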

Venter's 'watermarks':
Using the equation for functional information supplied by Hazen et al, the five watermarks require a total of 259 bits of functional information. Since that is more than 100 bits higher than the 140 bit threshold, we can conclude that my method works.

RecA:
Using the data from 1,553 sequences for RecA from the Pfam database, and Hazen's equation for functional information, RecA requires 832 bits of functional information. It makes Venter's watermarks look a bit Mickey Mouse.

Conclusion: Both Venter's watermarks and RecA come up ID positive. You want evidence? You have it in RecA, and that is just one protein. We need more than 250 different proteins to get the first living cell.

Darwinist problem: If one does not like the results of the above method, come up with another method that flags the Venter watermarks as ID-positive and RecA as ID-negative.

I would not be so bold as to call this 'checkmate', but I would certainly call this 'check'. To get out of this one, the Darwinist has to come up with a legitimate method for ID detection that gives different results.

I love this! …. and you should too, Jeffrey. In spite of the fact that you and I are not in perfect agreement …. yet, you do have a sharp mind …. I'm just giving you a 'poke' to start using it on this problem.

Cheers,
Kirk

Jeffrey Shallit said...

Kirk:

I asked you what "planning" was, and I asked you how I could tell whether a system exhibits it or not. You responded by telling me to go look up "planning" on Wikipedia, but you didn't tell me how to determine whether a system exhibits it or not.

The Wikipedia definition says, among other things, that "planning" is "the psychological process of thinking about the activities required to create a desired future on some scale." But this shows why your reasoning is circular. We are trying to decide if evolution itself can be said to be "intelligent". In order to do this, we need a definition of "intelligent" that is based on the abilities of the system itself, not on how the system is implemented.

Now your definition of "intelligent" involves "planning". Your definition of "planning" involves "thinking". In other words, evolution couldn't possibly be intelligent because it is not the result of "thinking". But "thinking" in common usage is about minds, so you end up denying evolution is intelligent because it is not implemented in a mind.

My point is that vague terms like "thinking", "intelligent", "planning", etc. have vaguely-understood meanings in terms of human behavior, but that you haven't defined them in a scientifically-rigorous manner in such a way that I could apply them to systems like evolution. Sneering and comparing me to Bill Clinton because I insist on a definition I could apply across the board is not very helpful.

Jeffrey Shallit said...

Kirk:

You write,

"My job was to write some software that would analyze underwater noise and decide if there was a Soviet submarine in our coastal waters. It was not that difficult to do and the software worked beautifully. The central component of the software was to isolate anomalies within the underwater recordings and work from there. I had no idea what the underwater acoustic signature of a Soviet sub was like and I was not given that information, so I could not possibly design the 'answer' into my program. The program was to detect any sound not likely to be a result of nature and it did so, very nicely, Soviet subs included. Whether the underwater sound was human or alien, its acoustic signature would have been isolated by that software."

Your confusion is evident here, and it is very illuminating because it is the same mistake made over and over again by the ID crowd.

We do not have a complete catalogue of those sounds "likely to be a result of nature", and so your software could not possibly have worked in all instances. For example, as a thought experiment, suppose we design a sub that is extremely quiet, but plays a recording of fish sounds as it moves through the water. Your software, presumably, would have detected the sounds and classified them as "likely to be a result of nature". On the other hand, dolphins are extremely skilled at mimicry, and it is not ridiculous to think that they could mimic the sound of sonar "pings". In this case, your software would have detected the sounds and classified them as "likely to be the result of human activity". In both cases your decision would have been wrong.

Science is filled with cases where an observed event or artifact has been thought to be human-caused, but which later was determined to be not the result of human causation: the Giant's Causeway, for example. ID advocates claim that their methods would have worked to resolve such cases, but they never apply them to anything but the most trivial events.

Finally, because man is a part of nature itself, I think the traditional dichotomy between "human-caused" and "nature-caused" is not useful. Human causation is simply a subset of natural causation.

Jeffrey Shallit said...

Kirk:

You claim, "The mutational search engine of biological processes is far too slow and inadequate, by more than a hundred orders of magnitude, to find even one average protein sequence, forget about finding thousands of them." What is the evidence for this claim? Point me to a paper in the peer-reviewed literature that makes this claim and backs it up.

Jeffrey Shallit said...

Kirk:

You say, "the driving force behind ID is 21st century science, with its discovery of increasing numbers of molecular machines and even molecular computers within the cell. The creative story telling that Richard Dawkins once relied on so heavily just doesn't cut it anymore, nor does 19th century Darwinian theory."

This is good propaganda, but it's completely useless as an argument. Yes, it's scientists (not ID advocates) who have discovered structures that can be considered analogous to machines in the cell. But who among those scientists thinks this is evidence for magical sky fairies? Behe certainly, but not many others. If the evidence were so clear, you would think thousands of biochemists would be publishing papers entitled "Magical sky fairies proven". But they don't. Maybe if you made a little effort to understand evolutionary explanations for such structures (such as here), you wouldn't be so cavalier about machines being evidence for ID.

And why should machines be evidence of ID? You have no argument beyond, 'I don't think machines can occur naturally'. Sorry, your incredulity is not an argument.

And why the obsession with "19th century Darwinian theory"? Yes, Darwin got some things wrong, but science is self-correcting. Today's evolutionary biology is far beyond Darwin. However, I looked in my evolutionary biology textbook, and there's not one word about magical sky fairies and how to detect them.

Jeffrey Shallit said...

Kirk:

I suggested you go look at the work of artificial life researchers, such as Karl Sims. You entirely ignored my suggestion and said nothing about it. I conclude that work like Sims' is a devastating refutation to the claims of ID advocates that information cannot be produced by mutation and natural selection.

As for "functional information", the mistake you make is that you have no idea of what the probabilities are in the events that led to modern cellular proteins. And, you have no idea whether a given structure has biologically relevant activity other than the one you are looking for. You have made claims in the past about these probabilities, and during our debate I cited a paper that showed your probability claims were exaggerated. See here for more details.

Anonymous said...

We do not have a complete catalogue of those sounds "likely to be a result of nature", and so your software could not possibly have worked in all instances.

Case in point:

"The sound interpreted as submarines was, in fact, herrings farting"

Kirk Durston said...

(Note: Jeffrey, you raised a lot of objections and I've attempted to respond to all of them, but before you start posting responses to this post, please read the 'Final Comment' at the very end of this post.)

This is certainly better than a game of Chess. Your responses tell me that you are so confident that you are right, that you think you can do this with one hand tied behind your back. However, I know enough about you to know you've got a sharp mind; my job is to make you think more critically about this issue with the objective of seeing you become an ally.

Planning:

If you will refer back to where I pointed to the Wiki definition of 'intelligent', you will see that I highlighted three attributes that were required: to reason, to problem solve and to plan. Planning is not something I just added to the definition recently. I didn't even add it. It was in the Wiki definition. At the very least, the ability to plan requires that an entity be able to come up with a concept or goal and construct a series of steps to accomplish that goal in advance of actually implementing those steps. This should be self evident to anyone who has made it into university, so I still cannot believe you are not trying to be evasive here. Surely you are not implying that nature or biological evolution has the foresight and reasoning capability to plan in advance of even the first mutation. Are you conceding that life does require ID, but that evolution or nature is intelligent (can reason, plan and problem solve)? That is pantheism and I cannot believe you are a pantheist, but if you are, let us know ….. that will change the whole direction of the conversation.

Submarine detection as an example of ID detection:

Jeffrey wrote, ' Your confusion is evident here, and it is very illuminating because it is the same mistake made over and over again by the ID crowd.'

First of all, Jeffrey, with all due respect, I have to point out that you have a tendency to make sweeping generalizations about people whom you pejoratively label 'the ID crowd' that badly misrepresent the individual. Keeping in mind that what started this discussion is your sweeping generalization that ID proponents are liars (with the implication that you honestly represent ID proponents), you might want to think about whether you are being honest in your misrepresentations.

Secondly, with your defeatist attitude toward ID detection, don't bother applying for a job with any intelligence agency or SETI, or even the investigations department of the Ontario Lotteries Commission for that matter. Your response is a classic example of how Darwinism can be a science stopper. In the software that I wrote to detect Soviet submarines, you will notice that I mentioned the importance that anomalies play in any ID detection, whether it is in forensic science, SETI, archeology, intelligence gathering, or even in fraud detection with the Ontario Lotteries Commission. Of course if a Soviet sub plays a recording of a fish, it would not show up as an anomaly. Intelligent agents can mimic nature and such effects may not show up as an anomaly. ID detection systems, depending upon what is at stake, can be tuned to: a) yield no false positives, in which case it is likely to also yield a lot of false negatives or b) yield no false negatives, in which case it is likely to yield a lot of false positives. No ID detection system is likely to both yield no false positives and no false negatives. You are not the first person to think of this Jeffrey and to imply that the 'ID crowd' (or National Defense Research, or SETI, etc.) has never thought of this, or is not well aware of this is simply not an honest representation of the individual, at least not of me. Incidentally, the first step in my software was to transform the acoustic data via a Fast Fourier Transform (FFT) so the data could be analyzed in terms of frequency. Even if a dolphin could imitate the sound of a Soviet sub, when you applied a FFT to that sound, I doubt very much it would match in the frequency domain. Was the software perfect? In spite of its 100% success rate during the testing phase, I'd fall over in a dead faint if it was perfect. In fact, I'd wager large sums of money (if I had large sums of money to wager) that it was not. 
But that does not stop us from doing the science and constantly improving our methods of ID detection, especially where the stakes can be high, such as in national defense and related intelligence areas. By the way, I hope no one blames me for all the mistakes that CSIS or some other agency makes. My contribution to sub detection was made back in the '70's and I'm sure they have advanced far beyond the old Fortran WatFour software in use at the time.
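
The frequency-domain step described above can be illustrated with a small Python sketch. It is not a reconstruction of the original Fortran software, whose details are not given. It assumes a simple rule (flag any frequency bin whose magnitude is well above an ambient reference), uses a naive discrete Fourier transform in place of a true FFT to stay dependency-free, and all names (`magnitude_spectrum`, `anomalous_bins`, `factor`, `floor`) and signals are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(n^2) discrete Fourier transform.
    Returns normalized magnitudes for bins 0..n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

def anomalous_bins(background, recording, factor=5.0, floor=1e-3):
    """Indices of frequency bins where the recording is at least
    `factor` times louder than the ambient reference.
    `floor` avoids flagging numerical noise in silent bins."""
    bg = magnitude_spectrum(background)
    rec = magnitude_spectrum(recording)
    return [k for k, (b, r) in enumerate(zip(bg, rec))
            if r > factor * max(b, floor)]

n = 64
# Synthetic ambient noise: a single tone in bin 3.
background = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
# Same ambient tone plus an extra, weaker tone in bin 10.
recording = [b + 0.5 * math.sin(2 * math.pi * 10 * t / n)
             for t, b in enumerate(background)]

flagged = anomalous_bins(background, recording)
print(flagged)
```

The `factor` parameter plays the role of the tuning knob mentioned above: raising it trades false positives for false negatives. A source that reproduces the ambient spectrum exactly would not be flagged at any setting.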

A circular argument:

Jeffrey wrote, ' Finally, because man is a part of nature itself, I think the traditional dichotomy between "human-caused" and "nature-caused" is not useful. Human causation is simply a subset of natural causation."

Well, Jeffrey, you can probably see what you did when you wrote that. A circular argument assumes the conclusion in its opening premise. Take RecA, for example. Given your assumption that nature has caused humans, then it follows that since humans require RecA, then nature caused RecA. But one of the issues on the table is whether nature can cause RecA. You can put that proposition on the table, but one falsifiable prediction it makes is that nature can cause RecA. The research suggests not only that nature cannot produce RecA, but that the level of functional information encoded in RecA requires ID, as I pointed out in my previous post.

Biological evolution as a pathetically underpowered search engine:

Jeffrey responded, ' What is the evidence for this claim? Point me to a paper in the peer-reviewed literature that makes this claim and backs it up.'

Jeffrey, the evidence for this is so obvious and overwhelming, that I have to believe that you are trying to evade the issue. There are two ways we can handle your question. One is to appeal to authority, which is the non-thinker's way of doing things, and the other is to actually do a little bit of work on the problem and see for ourselves what the answer is (the intellectually honest skeptic's approach). As for the scarcity of folding functional proteins in sequence space, there are various papers that mention this. One such paper is by Taylor et al. (2001), 'Searching sequence space for protein catalysts', PNAS, 98, 10596-10601. For example, they write ….

Direct selection of catalysts from pools of fully randomized polypeptides is a conceivable alternative to de novo design, requiring no foreknowledge of structure or mechanism. An analogous approach has yielded RNA catalysts for a variety of chemical reactions (5). However, a 100-residue protein has 20exp100
(1.3 x 10exp130) possible sequences. Even a library with the mass of the Earth itself—5.98 x 10exp 27 g—would comprise at most 3.3 x 10exp47 different sequences, or a miniscule fraction of such diversity. Unless protein catalysts are unexpectedly abundant and evenly distributed in sequence space, such a strategy will clearly be
impractical.
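
The two figures in the quoted passage can be reproduced with a short Python sketch. The passage gives the results but not the inputs for the library size, so the average residue mass of 110 g/mol used here is an assumption, as are all the variable names.

```python
AVOGADRO = 6.022e23          # molecules per mole
EARTH_MASS_G = 5.98e27       # grams, as quoted in the passage
AVG_RESIDUE_MASS = 110.0     # g/mol per amino acid residue (assumed)

residues = 100
alphabet = 20

# Number of distinct 100-residue sequences (exact integer).
possible_sequences = alphabet ** residues

# Mass of one 100-residue protein molecule, in grams.
protein_mass_g = residues * AVG_RESIDUE_MASS / AVOGADRO

# Upper bound on distinct sequences in an Earth-mass library
# (one molecule per sequence).
library_size = EARTH_MASS_G / protein_mass_g

fraction_covered = library_size / possible_sequences

print(f"possible sequences: {possible_sequences:.1e}")
print(f"library size:       {library_size:.1e}")
print(f"fraction covered:   {fraction_covered:.1e}")
```

Under that assumed residue mass, the sketch matches the quoted 1.3 x 10exp130 and 3.3 x 10exp47. It says nothing about how functional sequences are distributed in that space, which is the point the passage itself leaves conditional.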


In the paper they work on the problem of how novel protein folds could be created, given the extreme rarity of stable folding sequences. Their concluding sentence states, ' By iteratively combining combinatorial mutagenesis and selection with intelligent design, it may also prove possible to create novel protein scaffolds, unknown in nature, and to endow them with tailored catalytic activities.'

Did you notice their requirement for 'intelligent design' as something that would be necessary to discover novel folds? I'm sure they meant human ID, of course, but what then is nature going to use? They do not make any serious attempt to actually work through how nature is going to do something that requires 'intelligent design'.

Another paper you might want to acquaint yourself with is one by Trevors and Abel (2004), 'Chance and necessity do not explain the origin of life', Cell Biology International. More recently still, you may also want to check Koonin's paper (2007), 'The cosmological model of eternal inflation and the transition from chance to biological evolution in the history of life', Biology Direct. In this paper, biologist Koonin essentially abandons any hope of evolution being able to perform the required evolutionary search for all the proteins required for the first life form and, instead, suggests that there are an infinite number of worlds, which would make life, despite its mind-staggering improbability, virtually certain to occur in some world.

However, my job is to turn you into a thinking skeptic, Jeffrey, rather than let you appeal to authorities. I do not like to appeal to any authority unless I can understand the reasoning that the authority uses and have found it to be good. So the second approach is to think about whether an evolutionary search engine is adequate to find even something relatively simple, like RecA, for example. I'll bet you have not even taken the time to figure out how many searches (mutational events) are available to an evolutionary search engine. It's pretty easy to figure it out using peer reviewed, published data on fast mutational rates, the total estimated genetic library on earth, the total time available, etc. It turns out to be less than about 10 exp 42 mutational events total. That's all we have to work with. Of course, you do not have to take my word for this, but I find that most Darwinists don't think about this little boundary condition on the search problem. Now how prevalent are sequences that code for RecA? It turns out that we can figure that out too by computationally analyzing the thousands of sequences available to us on Pfam. I've written software that does that. The bottom line is that in 240-residue sequence space, the percentage that is occupied by any sequence at all that will code for RecA is about 10exp-248 %. Now think about this, Jeffrey. An evolutionary search engine has 10 exp42 searches to find something that occupies only about 10exp-248 % of 240-residue sequence space. I don't think I need to tell you what the implications are of that for the effectiveness of an evolutionary search engine. But it gets worse. If one could use some sort of natural selection to find these proteins, then that would be helpful, but we can't. A number of published papers have pointed out that stable folding regions of sequence space are surrounded by non-folding regions, and my own exploration of sequence space is bearing that out. In fact, these non-folding regions seem to occupy most of sequence space with only a minuscule portion coding for stable 3-D folds. So any evolutionary search will consist entirely of a random walk. This was first pointed out, to my knowledge, by Blanco (1998), 'Exploring the Conformational Properties of the Sequence Space Between Two Proteins with Different Folds: An Experimental Study', Journal of Molecular Biology, 285, 741-753. Other papers have pointed out that areas of folding sequence space are surrounded by non-folding regions. If the protein does not fold, it is not only not useful, but harmful to the cell, tending to aggregate. Therefore, such proteins are broken down and recycled. Natural selection cannot act on sequences that are not useful to life, and it appears that almost all amino acid sequences are non-folding.
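
The percentage quoted above appears to be the 832-bit RecA figure from the earlier comment, restated as a fraction of sequence space. The Python sketch below checks that conversion and the expected number of hits in 10 exp 42 draws. It assumes the model the comment itself uses (independent, uniformly random draws with no selection), which is the premise under dispute in this thread. Variable names are illustrative.

```python
import math

recA_bits = 832          # functional information figure quoted for RecA
trials_log10 = 42        # log10 of the claimed number of mutational events

# A target of I bits occupies a fraction 2**-I of the space.
log10_fraction = -recA_bits * math.log10(2)

# The same fraction expressed as a percentage (multiply by 100).
log10_percent = log10_fraction + 2

# Expected number of hits under blind uniform sampling only.
log10_expected_hits = trials_log10 + log10_fraction

print(f"fraction of space:  10^{log10_fraction:.1f}")
print(f"as a percentage:    10^{log10_percent:.1f} %")
print(f"expected hits:      10^{log10_expected_hits:.1f}")
```

The output shows that the two quoted numbers agree with each other. The result depends entirely on the uniform-sampling assumption and on the bit estimate being a fair measure of the target size.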

But it gets worse. If there were millions and millions of different protein families lying all over sequence space, then it would be easy to find them in a blind search where natural selection is useless. However, there aren't millions and millions. Physics, not biology, determines where the 3-D folds lie in sequence space and what portion of sequence space codes for stable 3-D folds. It is estimated that there are probably fewer than 10,000 different folding domains. When we realize these 10exp4 domains (or fewer) are scattered over a 21-dimensional sequence space (20 amino acids plus length) that extends to sequences of over 900 residues, where the 949-residue protein ACR Tran, for example, occupies only 10exp-495 % of its sequence space, it does not take a rocket scientist to see that an evolutionary search engine, working at crawling physico-chemical speeds, will never be sufficient to find even RecA.

Molecular machines:

Jeffrey wrote, " And why should machines be evidence of ID? You have no argument beyond, 'I don't think machines can occur naturally'. Sorry, your incredulity is not an argument."

Once again, you are misrepresenting my argument. If you will go back to my previous post, you will begin to see my argument. It is most certainly not, 'wow, we can't see how nature did that, therefore Godiddit!' My argument is this:
1. Measure the functional information required to produce the effect
2. If the functional information exceeds 140 functional bits of information, it requires ID (based on the hypothesis that what distinguishes intelligence from mindless natural processes is the ability to produce significant levels of functional information).
3. The proteins required to build the molecular machines require more than 140 bits of functional information, therefore, they require ID.

The Darwinist asserts 'evolutiondiddit!' but has no method whatsoever to produce the functional information required to encode proteins. Intelligence can do stuff like that. In other words, it is simply false to assert, as Darwinists do, that we have no explanation as to where the functional information can come from. It is an empirical fact that intelligence can easily produce virtually unlimited amounts of functional information.

Fairies in the sky:
Jeffrey, it's hard for me to believe you are not trying to evade or dodge the issue when you throw up stuff like that. The problem to focus on is this, 'Does the functional information encoded in the genomes of life require ID or not?' Focus. Focus. Don't worry about fairies.

Karl Sims simulations:

I did respond to your suggestion of evolutionary algorithms in my previous post, Jeffrey. You have failed to show how his simulations produce functional information. A good skeptic ought not to simply point to an authority and assume that the authority has actually done what one hopes he has done. If you think Sims' simulations have generated functional information, then the onus is on you to show it. However, I will grant that there is a lot of confusion over 'information' and 'complexity' and Sims is not immune from it. He has done a lot of stuff and I'm not sure which of his simulations has impressed you, but let's take this one. I'll say two things about it, one to do with complexity and the other to do with functional information.

Complexity: I see Darwinists use this term a lot, but when I read what they say about it, it's clear they have only a vague idea of what they are talking about. For a good paper on this, I would suggest this one. In that paper, the authors define three types of complexity: Random complexity (RC), Ordered complexity (OC), and Functional complexity (FC). If you imagine a sheet of paper on your desk, the width of the paper can represent the x-axis (OC), the length of the paper can represent the z-axis (RC) and the y-axis rises up out of the paper. What the laws of physics produce is usually a combination of OC and RC. If you were to plot most examples of complexity, they would lie on the plane of your paper with both an x-coordinate (OC) and a z-coordinate (RC), but a y-coordinate (FC) close to zero. All the patterns that Sims produces in his simulations fall into those two coordinates. There is no FC component, which is essential for functional information. I started playing around with stuff like that in the late '70's. You construct some 'rules' that determine some outcome. The rules contain a number of variables. If you let the variables 'evolve' through mutation and crossover, you can change the outcome. If you plot this over time (or over some other extended coordinate system), you get lots of neat stuff, including non-symmetric patterns that have the same shapes as different kinds of plants and leaves. This is especially useful nowadays for 3-D gaming, such as Crysis, which I am presently working through. If the designer of the algorithm is especially intelligent, you can even evolve the rules. You have to be careful about this, however, or you will just end up with a semi-random mess. Here's the thing: every pattern generated by such evolutionary algorithms is solely a mix of OC and RC …. order and randomness. All patterns have a very low level of functional information, usually close to zero bits, for the reason that follows.

Functional complexity (which the authors now concede is identical to functional information in a follow up paper, and the basic formula for which is essentially the same as the Hazen formula I mentioned in the previous post) has at the core of the formula a ratio M(Ex)/N, where M(Ex) = the number of functional options and N = the total number of options, functional and non-functional. The problem with Sims' simulations, as with many other evolutionary algorithms, is that pretty much every option is viable and evolvable. You get lots of pretty designs, a potentially infinite number of interesting designs … and that is precisely the problem. In other words, the M(Ex)/N ratio in the Sims simulations is close to 1. From Hazen's formula, we see that this results in a functional information level close to 0 bits.
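
How the ratio M(Ex)/N maps to bits can be seen in a short Python sketch. It assumes the Hazen-style form I(Ex) = -log2(M(Ex)/N). The example counts are invented purely to show the shape of the function and are not measurements of any simulation.

```python
import math

def functional_information(functional_options, total_options):
    """I(Ex) = -log2(M(Ex)/N), assuming the Hazen-style form
    described in the comment above."""
    return -math.log2(functional_options / total_options)

# Hypothetical counts, chosen only to illustrate the behaviour.
nearly_all_functional = functional_information(999, 1000)   # ratio ~ 1
half_functional = functional_information(500, 1000)         # ratio = 0.5
one_in_a_million = functional_information(1, 10 ** 6)       # ratio = 1e-6

print(f"{nearly_all_functional:.4f} bits")
print(f"{half_functional:.4f} bits")
print(f"{one_in_a_million:.4f} bits")
```

The measure is only as meaningful as the choice of what counts as "functional". Counting every viable outcome as functional drives the value toward zero by construction, so the conclusion depends on how function is defined for a given simulation.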

Back to the Trevors and Abel paper on complexity. They point out that functional complexity (FC) has only been observed in human languages, including computer code, and in the genomes of life. They also argue in that peer-reviewed paper that 'rational agency' is required to produce functional complexity. Think on that for a while. Better still, read the paper.

Probabilities:

Jeffrey wrote, " As for "functional information", the mistake you make is that you have no idea of what the probabilities are in the events that led to modern cellular proteins."

There you go again, Jeffrey, throwing up your hands and asserting that ignorance is our only option … another example of why Darwinism is a science-stopper and why biology is about 50 years behind where it should be. We certainly do have an idea of the probabilities, Jeffrey. There has been a great deal of research done on the minimal genome. I've been collecting papers on this for several years. We are getting a very good idea, for example, of how many and what protein coding genes are essential for life. For some recent thinking on this, download this paper from PNAS (2006). By the way, their estimate of 387 protein-coding genes has been upgraded from earlier work, the absolute smallest genome of which was 150 genes. To keep this in perspective, the smallest known actual genome has a little over 480 genes. Once you know how many genes are required for the first lifeform (or at least have a good estimate) you are now in a position to take real data from Pfam and compute the probabilities of each required protein. The formula I use to compute the functional information required for each protein is a little more detailed than Hazen's, but it is essentially the same approach. I use ∆H(Xø(ti), Xf(tj)) = log(W) - H(Xf(tj)), where H(Xf(t)) = - ∑ P(Xf(t)) log P(Xf(t)), but in the simplest case, it reduces to Hazen's formula which ultimately comes from Claude Shannon's work. So for an average 300 residue protein which has 20 options for each residue, I need to compute 6,000 probabilities. I do this from the data on Pfam (e.g., download a few thousand sequences from which I can compute each of the 6,000 probabilities, then sum them according to the above equation). The sample size can be as small as 500 but for many proteins, should be over 1,000 sequences to ensure that the sequence space for that protein has been adequately sampled.
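
The per-site calculation described above can be sketched in Python. This is not the commenter's software. It assumes W = 20 at every site, treats sites as independent, works in bits (log base 2), and uses a made-up four-sequence alignment in place of Pfam data. The names `site_entropy` and `functional_bits` are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (bits) of the residues observed at one aligned site."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def functional_bits(alignment, alphabet_size=20):
    """Sum over sites of log2(W) - H(site), with W = alphabet_size.
    Assumes equal-length, pre-aligned sequences and independent sites."""
    max_entropy = math.log2(alphabet_size)
    return sum(max_entropy - site_entropy(col) for col in zip(*alignment))

# Toy alignment (invented): two fully conserved sites, two variable sites.
toy_alignment = ["ACDA", "ACEA", "ACDG", "ACEG"]

print(f"{functional_bits(toy_alignment):.3f} bits")
```

For a 300-residue family, the same loop would estimate 300 x 20 = 6,000 residue probabilities. With small samples, unobserved residues get probability zero, which inflates the estimate. That is one reason the comment's point about sample size matters to the result.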

Jeffrey the job of science (ID science included) is to do the science, not throw up our hands in despair.

Jeffrey wrote: "You have made claims in the past about these probabilities, and during our debate I cited a paper that showed your probability claims were exaggerated."

My response to that is that you massively misrepresented what that paper said in the debate. My suggestion to you, as well as to anyone who reads this, is to go to that part of the debate where Jeffrey presents his paper and I interrupt him with a question (I've not watched the debate, so I do hope my question is audible). Jeffrey replies in the affirmative, which flatly contradicts what the paper actually says. Having heard what Jeffrey says, then read the first page of the paper ('Protein Structure: Evolutionary bridges to new folds', Current Biology, vol. 17). Contrary to what Jeffrey confidently asserted in response to my question in the debate, this is not an example of a novel domain, but, to quote the paper, "two alternative conformations for the CRD domain that have been observed." The two conformations can be achieved with very little change to the sequence. This is not new. It is suspected that many sequences have more than one conformation. In other words, each area of sequence space that is useful to biology may actually code for a fold-set of one, two, or even three completely different folds. There is one case I came across a few years ago in a virus, where a single base-pair frameshift in the reading frame codes for an entirely different protein, and a second frameshift codes for a third, completely different protein. Think of the ID that would need to go into that. The computational requirements for achieving that are way beyond what we can do yet, but we're working on it. The problem, Jeffrey, is not moving around within the region of sequence space that codes for a fold-set. Evolution/variation can easily do that with virtually no change in functional information. The problem is finding novel fold-sets somewhere else in sequence space. Yeates' example in your paper is of no help whatsoever in doing that.
You either got the paper handed to you by an authority and did not read it yourself, or you attempted to read it but didn't understand it (which is perfectly fine, since you are a mathematician and not a biologist). What is not fine is making innocent, but bogus assertions in a debate, banking on the possibility that I won't have time to skim the paper and expose the false assertion. Fortunately, a quick skim rapidly exposed the false assertion and I mentioned this later in the debate or the question period, if I recall correctly (although it could have been to you when we visited afterward). Making confident, innocent but bogus statements in public is the problem with relying on authority for your beliefs. That is why I am encouraging you to become a rational skeptic who thinks through and checks things out for himself. (Note: I'm not implying that you are not rational. By 'rational skeptic' I mean a person who actually questions authority and checks the authority's statements out with a view to understanding why the authority said what he/she did.)

Regarding Arthur Hunt's response to Axe's paper

I just skimmed through this, because I do my own work and do not rely on anyone else when it comes to understanding protein sequence space. However, Hunt's 3-D graphics were very, very misleading. They make it look like finding a folding, functional protein is a hill-climbing problem with a large base. While it is true that for most proteins, the functional efficiency of the sequence space that defines that protein tends to drop off toward the edges, it drops off very rapidly, such that the island is more like an area bounded by steep cliffs. It is also a reality of sequence space that the distance between these islands is vast. I've done some preliminary computations using real data from Pfam, and a very rough but conservative result is that if all the sequences that define a particular structure or fold-set were gathered into an area 1 square meter in area, the next island would be more than a thousand light years away. There are increasing numbers of scientists beginning to notice the paucity of folding sequences in sequence space. That is why we need functional information to locate these functional biological sequences. Biology cannot make them up. Biology has to 'find' the sequences that physics pre-determines will do the job. It is the 'finding' that is the problem, unless of course, one is permitted to search using intelligence. Intelligence can do stuff like that.

Jeffrey, not every person who thinks there is evidence for ID is a moron or a liar. I think you've unquestioningly swallowed stuff by various authorities. You need to become a skeptic and question what any authority says and think through it so you can figure out what is true and what is not. I've watched you; you have a sharp mind, otherwise I would not bother to engage you in these blogs. My goal is to see you become an ally. (I imagine that if you were sipping coffee when you read that, you spewed it all over your keyboard prior to breaking out in a loud guffaw.) However, I think that if I can get you to actually think about some of this stuff, and provided you are intellectually honest, you will become an ally. Like you said, science is self correcting and there is some massive self-correcting beginning to happen when it comes to the role that ID played in the origin and diversification of life.

Final comment: Today is a holiday and I had the time to respond to each one of your many objections. I'm sure that neither you nor I have the time to continue to read through massive posts like this. May I make a suggestion? Just to keep these posts down in size, I wonder if you could just pick out what you see to be the weakest link in my argument, or the most important point, and just respond to that. Otherwise, I will just have to selectively respond to one or two things you said, which may not be the things you most wanted to see me respond to.

Jeffrey Shallit said...

Kirk:

You say: "my job is to make you think more critically about this issue with the objective of seeing you become an ally".

Please, spare me the patronizing rhetoric. Every time I have investigated one of your claims, from what Hilbert said to Axe's paper to the probability of generating proteins, I discover that you have significantly misrepresented the result. So if anyone needs to think critically, it is you.

Jeffrey Shallit said...

Kirk:

You say, "At the very least, the ability to plan requires that an entity be able to come up with a concept or goal and construct a series of steps to accomplish that goal in advance of actually implementing those steps."

OK, now give me an operational definition of "concept" or "goal" that I could apply to a system to see if it has those. I keep asking you to give me such a definition, and you refuse. I think you are the one being evasive here, and insulting me by saying something is "self evident to anyone who has made it into university".

To make it concrete, suppose I give you a computer program written (let's say) in C. What test could I carry out to see whether such a program has a "concept" or "goal"? Please stop being evasive, just answer the question.

Jeffrey Shallit said...

Kirk:

When you said earlier that, "The program was to detect any sound not likely to be a result of nature and it did so, very nicely, Soviet subs included", that was a fib. It could not do what you claimed, and I'm glad you now admit it.

However, you continue misrepresenting the facts when you say, "But that does not stop us from doing the science and constantly improving our methods of ID detection, especially where the stakes can be high, such as in national defense and related intelligence areas." Nobody in national defense is doing "ID detection" in the abstract. Rather, what they are doing is trying to separate human-caused events from non-human-caused events. Really, I have to question your honesty when you refuse to admit this.

Jeffrey Shallit said...

Kirk:

You write, "Jeffrey, the evidence for this is so obvious and overwhelming, that I have to believe that you are trying to evade the issue."

Oh, please. Cut the patronizing. If it were definitively proved that mutation and natural selection (and other evolutionary mechanisms such as genetic drift) were incapable of producing the proteins of life, this would be major news in every scientific journal. No such proof exists except in the minds of creationists.

Your citation of the Taylor et al. paper is completely irrelevant, since no one thinks modern proteins arose purely from a random pool. On the contrary, proteins show evolutionary similarity, demonstrating that they arose from an evolutionary process combining mutation with selection. Why don't you do a literature search for "protein evolution"?

As for the Trevors-Abel paper, surely you jest. I am very familiar with it, because we all had a good laugh about it some time ago. That paper is rather, uhh, eccentric (to put it politely), and consists of nothing more than a series of wild claims with no evidence supporting them. They do not even cite a single paper about artificial life, demonstrating that they are blissfully unaware of the work done in that area. Work in artificial life, such as Koza's 1994 paper in Artificial Life III, offers a direct counterexample to the claims of Trevors-Abel.

So please, spare me the "However, my job is to turn you into a thinking skeptic, Jeffrey, rather than let you appeal to authorities" nonsense. Anyone who can cite the Trevors-Abel paper with a straight face evidently needs a huge dose of skepticism of his own.

Jeffrey Shallit said...

Kirk:

You write:

"Just to keep these posts down in size, I wonder if you could just pick out what you see to be the weakest link in my argument, or the most important point, and just respond to that."

This is my blog, and I'll answer the way I see fit. I prefer to discuss a single issue at a time, but you were the one who insisted on raising a dozen different ones in each response.

Jeffrey Shallit said...

Kirk:

You write, "An evolutionary search engine has 10 exp42 searches to find something that occupies only about 10exp-248 % of 240-residue sequence space. I don't think I need to tell you what the implications are of that for the effectiveness of an evolutionary search engine."

I think you must be kidding. Evolutionary search programs routinely find solutions that occupy a much smaller proportion of the total space than that, with nowhere near 10^42 random choices. I myself have used evolutionary algorithms to find strings with thousands of bits having certain properties, and in just a few hours of computer time.
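
For readers who have not run such a search, here is a minimal sketch of the idea. Everything in it is an illustrative assumption rather than the programs referred to above: the name `evolve_all_ones`, the 1,000-bit string, the single-bit mutation, and above all the toy fitness function (the count of 1 bits), which rewards every improving step. The target is one string out of 2^1000, roughly 10^-301 of the space.

```python
import random

def evolve_all_ones(n_bits, max_evals, seed=0):
    """Mutation plus selection on a bit string; returns (found, evaluations)."""
    rng = random.Random(seed)
    genome = [rng.randint(0, 1) for _ in range(n_bits)]
    fitness = sum(genome)              # toy fitness: number of 1 bits (an assumption)
    evals = 1
    while fitness < n_bits and evals < max_evals:
        i = rng.randrange(n_bits)      # mutation: flip one randomly chosen bit
        new_fitness = fitness + (1 if genome[i] == 0 else -1)
        evals += 1
        if new_fitness >= fitness:     # selection: keep the mutant only if it is no worse
            genome[i] ^= 1
            fitness = new_fitness
    return fitness == n_bits, evals

found, evals = evolve_all_ones(n_bits=1000, max_evals=200_000)
print(found, evals)
```

On this smooth landscape the search typically needs several thousand evaluations, not anything like 10^42. The sketch says nothing about the actual shape of protein sequence space; it only shows that the size of a target relative to the whole space does not by itself fix the number of trials a selective search needs.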

Jeffrey Shallit said...

Kirk:

You write, "A number of published papers have pointed out that stable folding regions of sequence space are surrounded by non-folding regions, and my own exploration of sequence space is bearing that out."

How about revisiting the discussion here? There were several papers cited there that take issue with your claims, and it appears to me that actual biologists are not terribly convinced by your arguments. Funny how other people are also convinced that you dishonestly misrepresent the published literature. I wonder why that is.

Doppelganger said...

Kirk:

"Examples of ID:
a) a MacBook Pro
b) a radio signal from interstellar space that contains the instructions for a working fusion reactor
c) the Venter Institute's 'watermarks' in their artificial genome
d) the results from a ground penetrating radar scan that reveals the foundations of an ancient Egyptian city (can't recall the name …. it was on Discovery channel a couple weeks ago)."

Great!

And this lends credence to the notion that some disembodied yet anthropomorphic superbeing 'designed' and created the bacterial flagellum, etc., how?


I am quite unimpressed by arguments via strained analogy, and I suspect few but the already 'converted' are, either.

When you come up with something legitimate and relevant, let us all know.

Doppelganger said...

Jee writes:

"Now your definition of "intelligent" involves "planning". Your definition of "planning" involves "thinking". In other words, evolution couldn't possibly be intelligent because it is not the result of "thinking". But "thinking" in common usage is about minds, so you end up denying evolution is intelligent because it is not implemented in a mind."


Exactly so. Creationist arguments are quite often arguments via definition. Gitt and pals simply 'define' all mutations to be a loss of information, and since evolution relies on mutation, it is thus a loss of information. Voila!

The same silly tautologies are the 'best' creationists of all stripes seem capable of, and yet they are just so proud and cocksure of their claims. Weird.

Wesley said...

"but that now we know that there is much more to explain on the cellular level than our predecessors could ever imagine."

Charles Darwin's proposed hypothesis of heredity, pangenesis, *required* a hugely complex organization of sub-cellular systems. IDC advocates skate right by that all the time. Darwin was wrong about the mechanism, but he was right about the complexity, though IDC advocates still emit the "blob of jelly" canard quite often.

Wesley R. Elsberry

Anonymous said...

Hmmm.... my ears were burning, now I see why.

Kirk said, about my analysis of Axe (2004):

"I just skimmed through this, because I do my own work and do not rely on anyone else when it comes to understanding protein sequence space. However, Hunt's 3-D graphics were very, very misleading. They make it look like finding a folding, functional protein is a hill-climbing problem with a large base. While it is true that for most proteins, the functional efficiency of the sequence space that defines that protein tends to drop off toward the edges, it drops off very rapidly, such that the island is more like an area bounded by steep cliffs. It is also a reality of sequence space that the distance between these islands is vast. I've done some preliminary computations using real data from Pfam, and a very rough but conservative result is that if all the sequences that define a particular structure or fold-set were gathered into an area 1 square meter in area, the next island would be more than a thousand light years away. There are increasing numbers of scientists beginning to notice the paucity of folding sequences in sequence space. That is why we need functional information to locate these functional biological sequences. Biology cannot make them up. Biology has to 'find' the sequences that physics pre-determines will do the job. It is the 'finding' that is the problem, unless of course, one is permitted to search using intelligence. Intelligence can do stuff like that."

1. My illustrations were spot-on, IMO. A good way to see this is to think about studies such as those of Meinke et al (Biochemistry 47, 6859-6869, 2008). Notice in particular the results obtained with Fip1. Add these to studies such as those of Lange et al. (Science 320, 1471, 2008) and one can begin to uncover a remarkable universe of protein dynamics, and how accessible this universe is to the engine of mutation, variation, and selection. (I’ll have more to say about Meinke et al. on my own blog in the next few weeks.)

2. Kirk, you have to pay attention to the details of Axe (2004) to see why your arguments (and his) make no sense. Axe's own method leads one to the conclusion that a fully-functional, robust beta-lactamase is more "abundant" in sequence space than the severely-crippled variant he dissected. My discussion is pretty much the only way to resolve this paradox. That it pretty much demolishes your arguments is tangential but revealing.

3. It's experimentally-observed fact that there are many ways to "leap" from functional island to functional island.

4. The occurrence of functional islands in sequence space is far less rare than you let on, Kirk. The scope of the problem for you is illustrated (briefly, and among other places - I apologize for the shameless self-promotion) here, , and . Basically, Kirk, if things were as you say, the examples discussed in these essays simply would not exist.

5. Said occurrence may be far more common than even these studies imply. Studies such as those of Chiarabelli et al. (Chemistry and Biodiversity 3, 840-859, 2006) show that folded sequences are rather common (20%!) in collections of random polypeptides. This is a large part of the problem of function, and it’s not nearly the impossibly inaccessible one you imply.

Wesley said...

Kirk,

What proportion of the 10 exp 42 mutations you concede were insertions? Deletions? Duplications? I didn't see anything in your calculations that would account for the rather different effects on information content each of those well-known mutational processes has, and I'm certain I'm not cataloging the full extent of known mutational processes. In fact, it looks like the only mutational process that you do consider is single nucleotide point mutations.

Perhaps fixing that part of the expectation should be a priority for you.

Wesley R. Elsberry

Jeffrey Shallit said...

Here are Art's last 2 comments reformatted:

4. The occurrence of functional islands in sequence space is far less rare than you let on, Kirk. The scope of the problem for you is illustrated (briefly, and among other places - I apologize for the shameless self-promotion) here, here, and here. (Basically, if things were as you say, Kirk, the examples discussed in these essays simply would not exist.)

5. Said occurrence may be far more common than you let on, Kirk. Studies such as those of Chiarabelli et al. (Chemistry and Biodiversity 3, 840-859, 2006) show that folded sequences are rather common (20%!) in collections of random polypeptides. This is a large part of the problem of function, and it's not nearly the impossibly inaccessible one you imply.

Jeffrey Shallit said...

By the way, Art's blog is here.

RBH said...

Kirk wrote

Jeffrey, now I'm beginning to think that it is you who is being evasive. I am reminded of Bill Clinton's question on what the meaning of the word 'is' is, during his grand jury testimony in the Monica Lewinsky affair. I would expect that most of us know what the word 'planning' means, but if anyone wants some clarification as to what it means, you can start here.

The request is for some operational definition that allows us to reliably and validly distinguish effects that are the result of planning from those that are not. This is the point Durston repeatedly evades. Appealing to ordinary conceptions of terms is not useful if one wants to do actual research.

Kirk wrote

Until that paper comes out, however, it is important that one does not confuse intelligently designed evolutionary algorithms with what goes on in nature.

Durston here neglects the important distinction between the intelligent design -- code writing -- of a particular instantiation of an evolutionary algorithm on the one hand, and the system of operators that is being instantiated -- the abstract algorithm itself -- on the other. The genetic algorithms my company uses are intelligently designed, in the sense that we write the code. The abstract algorithm itself -- consisting mainly of the evolutionary operators random mutation, recombination, and differential reproduction as a function of relative fitness -- is quite general and is characteristic of biological systems. Hell, it's stolen from biological systems! Sure, we impose a fitness function -- an equation for calculating the relative fitness of agents -- on the system, but that's because we have a specific applied problem to be solved, and we write the fitness function in the light of the problem we want solved. In nature there is no comparable specific problem to be solved, aside from pure survival and reproduction, that is set for a population. Rather, anything that improves relative fitness can be grabbed by the population. So the "intelligent design" of the fitness functions in human-made genetic algorithms is not a part of the algorithm used by nature. In nature only relative fitness in a given selective environment is relevant, and there is a wide range of "problems" that might be solved to increase relative fitness, as distinguished from human-written GAs employed to solve specific problems.
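
The separation described above can be sketched in a few lines. This is a hypothetical illustration, not anyone's production code: `run_ga`, `count_ones`, `count_alternations`, and all parameter values are made up for the example. The three operators live in `run_ga` and never change; the fitness function is passed in from outside, and swapping it changes what gets optimized without touching the algorithm.

```python
import random

def run_ga(fitness, genome_len, pop_size=40, generations=60, mut_rate=0.01, seed=1):
    """Generic operators only: mutation, recombination, fitness-weighted reproduction."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # differential reproduction: parents are drawn in proportion to relative fitness
        weights = [fitness(g) + 1e-9 for g in pop]
        next_pop = []
        for _ in range(pop_size):
            mum, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, genome_len)          # recombination: one-point crossover
            child = mum[:cut] + dad[cut:]
            # random mutation: each bit flips independently with probability mut_rate
            child = [b ^ 1 if rng.random() < mut_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

def count_ones(genome):
    """One imposed 'problem': as many 1 bits as possible."""
    return sum(genome)

def count_alternations(genome):
    """A different imposed 'problem': as many 0/1 switches between neighbours as possible."""
    return sum(a != b for a, b in zip(genome, genome[1:]))

best_a = run_ga(count_ones, 32)
best_b = run_ga(count_alternations, 32)
print(count_ones(best_a), count_alternations(best_b))
```

The same `run_ga` serves both problems, which is the sense in which the fitness function is imposed on the algorithm and is not part of it.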

Kirk asked

Do you know how much functional information is encoded for the average protein of 300 to 400 residues?

Durston, do you know how much functional information, under the definition you claim to be using in your link, is encoded for the average protein of 300 to 400 residues? No, you do not. The precise way in which Kirk phrased that question demonstrates that he is not using the definition he claims to be using, since that definition requires specification of more than merely the length of the protein:

We conclude that rigorous analysis of the functional information of a system with respect to a specified function x requires knowledge of two attributes: (i) all possible configurations of the system (e.g., all possible sequences of a given length in the case of letters or RNA nucleotides) and (ii) the degree of function x for every configuration.
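
To make the two requirements concrete, here is a minimal sketch of the Hazen et al. measure, I(Ex) = -log2(M(Ex)/N), on a system small enough to enumerate. The function names, the four-letter alphabet, the six-letter length, and the toy "degree of function" (the number of A's) are all assumptions chosen for illustration.

```python
import itertools
import math

def functional_information(alphabet, length, degree_of_function, threshold):
    """I(Ex) = -log2(M(Ex) / N), by brute-force enumeration of every configuration."""
    total = 0        # N: all possible configurations (attribute i)
    functional = 0   # M(Ex): configurations with degree of function >= Ex (attribute ii)
    for config in itertools.product(alphabet, repeat=length):
        total += 1
        if degree_of_function(config) >= threshold:
            functional += 1
    if functional == 0:
        raise ValueError("no configuration reaches the threshold")
    return -math.log2(functional / total)

def count_a(config):
    """Hypothetical degree of function: the number of 'A' letters."""
    return config.count("A")

print(functional_information("ACGU", 6, count_a, 6))  # only AAAAAA qualifies: 12.0 bits
print(functional_information("ACGU", 6, count_a, 5))  # 19 of 4096 qualify: about 7.75 bits
```

Note that the function cannot be called without supplying `degree_of_function`. Length and alphabet alone fix N but leave M(Ex) undetermined.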

Durston is engaging in yet another ID terminological bait and switch. Absent the second requirement one cannot estimate the functional information of a protein. Kirk claims, without any evidence offered, that it is too great to be generated by the evolutionary algorithm. But that claim is based on nothing more substantial than his personal incredulity with some definitional flim flam thrown in to make it look good.

For example, Durston provides an alleged example of calculating the functional information where he writes

Insert that probability into the formula for functional information published by Hazen et al to get 140 bits of functional information that could accidentally be generated through mutation in the history of the earth. So if we observe an effect that requires significantly more functional information than 140 functional bits, then we can reasonably conclude that it is an example of ID.

But that calculation makes no reference to "the degree of function x for every configuration" which is required to calculate the functional information in the Hazen, et al. definition. So Durston's numbers are plainly meaningless. None of Durston's subsequent comments on probabilities and information repair that defect.

In summary, Durston is not actually using the definition of "functional complexity" he claimed to be using, but is using some other idiosyncratic definition, with the definition altered at whim to suit the argument he wants to make at the moment.

Unknown said...

Introductory Comments:

I must say that I am surprised by the amount of vitriol and sneering insults that can be observed in some of the responses to my last post. It makes me wonder if some do not have more fundamental issues that have nothing to do with science. I am constantly amazed when I give a brief explanation of something but do not expound upon every last detail, and some here immediately assume I've made an unforgivable oversight and am a liar and moron. Good grief! For example, rbh, in the middle of his simplistic version of what he thinks that I think about functional information, writes, " Durston is engaging in yet another ID terminological bait and switch." I also observe that some here cannot take good natured poking. I realize that it is often the case that people can come across as socially inept on the internet, but in real life are fine people. I'm going to give you all the benefit of the doubt and assume that is the case here. I won't do any more good natured poking if the guilty parties here can calm themselves and focus on the technical aspects of what we are discussing. Hopefully, this can be a collegial discussion.

As I mentioned in my last post, I cannot address everything that everyone raises. There are many tempting things to respond to here, but I will choose what I believe to be the most important thing, functional information.

Functional Information:

rbh points out that there are two criteria for calculating functional information: "(i) all possible configurations of the system (e.g., all possible sequences of a given length in the case of letters or RNA nucleotides) and (ii) the degree of function x for every configuration."

rbh then goes on to assume that I've ignored the second criterion.

By way of response, I certainly have not ignored it. Let us consider two examples: (i) RecA and (ii) the amount of functional information that could reasonably be generated by 10exp 42 mutational events.

For RecA, which typically has 240 residues, the first criterion is easily known. The total number of possible configurations for 240 sites, where there are 20 options per site, is simply 20exp240. To satisfy the second criterion, we must get an idea of how many configurations will satisfy the functional requirements of biological life. Not all configurations will perform their function with the same efficiency, so the degree of efficiency is important. Ultimately, it is the cell that decides whether the degree of efficiency of any particular allele of RecA is good enough or not. Mutation events are constantly sampling sequence space for functional RecA sequences whose degree of function or efficiency is sufficient for biological life. Those alleles that fall below the threshold of minimum acceptable efficiency are eliminated by natural selection. Those that are above that threshold can be preserved.

We have a record, in the genomes of life, of those RecA sequences that are above that minimum threshold of efficiency. At the time I computed the functional information required to code for RecA, there were 1,553 aligned sequences available on Pfam. The assumption that I made is that, although they may all have varying degrees of efficiency, all of them are sufficiently functional for biological life.

The next step is to compute the probabilities. Hazen et al use a simplified version of Shannon's equation for uncertainty or entropy. If we simply plug 1,553 into the equation for M(Ex), we will get an inordinately high value for functional information. It is better to use a more sophisticated equation to do the computation that recognizes that not all sequences are equally probable …. which Shannon's equation does. The equation for functional information (ζ ) that I used for RecA was

ζ = ΔH (Xg(ti), Xf(tj))

where

ΔH(Xø(ti), Xf(tj)) = log (W) - H(Xf(ti))

where Xø represents the null state Log W, where W represents the total number of possible states and where Xø is a special case of the more general ground state Xg, and where Xf represents the joint variable of both data and function, and where

H(Xf(t)) = -ΣP(Xf(t)) logP(Xf(t))

Keep in mind that I've already stated that the cell decides what degree of function is sufficient and natural selection eliminates those sequences that fail the test. What is left is recorded in Pfam. In other words, I most certainly do concern myself with the issue of function and degree of function. Furthermore, we are not completely in the dark about what portion of sequence space will satisfy my Xf or Hazen's M(Ex), which are one and the same thing. The next thing I did was to take the 1,553 sequences and compute the 4,800 probabilities (20 different probabilities for each site), corresponding to the frequency of occurrence of each amino acid at each site. I then plugged everything into the above equations to get a functional information value of 832 Fits (Functional Bits).
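
A minimal sketch of the per-site calculation as described above: for each aligned site, take log2(20) minus the Shannon entropy of the observed residue frequencies, then sum over sites. The function name and the toy alignment are assumptions made for illustration. The sketch treats sites as independent and does no gap handling or sequence weighting, so it should not be read as the exact procedure behind the 832 Fits figure.

```python
import math
from collections import Counter

def fits_from_alignment(aligned_seqs, n_states=20):
    """Sum over sites of log2(n_states) minus the entropy of the observed residues."""
    n_sites = len(aligned_seqs[0])
    if any(len(s) != n_sites for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for site in range(n_sites):
        counts = Counter(seq[site] for seq in aligned_seqs)
        n = sum(counts.values())
        # Shannon entropy of this site, from the frequency of each residue seen there
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(n_states) - entropy   # log(W) - H(Xf) for one site
    return total

# Toy alignment: site 0 is fully conserved, site 1 shows four equally frequent residues.
toy = ["AC", "AD", "AE", "AF"]
print(round(fits_from_alignment(toy), 3))  # log2(20) + (log2(20) - 2) = 6.644
```

A fully conserved site contributes the maximum of log2(20), about 4.32 bits, so 240 sites can contribute at most about 1,037 bits. A site where all 20 residues appear equally often contributes nothing.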

Hazen's formula is only for the special case when all sequences are equally probable. My approach is not to make that assumption and compute accordingly. Furthermore, the record of life does indicate that not all sequences are equally probable. I infer from this that those sequences that are less efficient, not too far above the minimum threshold of function, will appear less often and then, only for those organisms that can tolerate a lower degree of efficiency. Be that as it may, if we do make Hazen's assumption that all sequences are equally probable, and we take 832 Fits and solve for M(Ex), we find that, in theory, about 10 exp 250 different possible sequences could satisfy the function of RecA (not 1,553 as some might assume). There are some assumptions that have been made, however, that make that number on the high end, not the least of which is the assumption of equal probability. The existence of varying degrees of function or efficiency is almost certainly going to make that number smaller.

Another worry some may have is whether those 1,553 sequences are an adequate sampling of sequence space. If we have only one sequence, we will get an erroneously high value for functional information if, in fact, there are 10 exp 250 different functional sequences. The more sequences we have, the lower the value for functional information drops until we have a sufficient number to see the curve flatten out. With Hazen's approach, that works simply with the number of sequences, you will not see this, but with the more detailed approach I outlined, which looks at the frequency of occurrence of each amino acid at each site, we do. I find that using the method I outlined above, we need at least 500 sequences before we start to see the curve level off. 1,000 or 1,500 sequences are even better and, by that time, the curve is pretty flat at a particular value, in this case, 832 Fits. In other words, after obtaining 1,000 or so sequences, the frequency of occurrence of each amino acid at each site is becoming closer to being a constant. The addition of more sequences has little, if any effect on the frequency of occurrence of each amino acid at each site.

With regard to the second case, we wanted to get an idea of what we could expect 10 exp 42 trials to produce, by way of functional information. rbh takes me to task for not taking into account function, or degree of function, in my computation. Well, I certainly did. Looking at Hazen's equation, the highest value for functional information is obtained when M(Ex) = 1. This should be obvious. That is what we want, the highest reasonable estimate for how much functional information could accidentally be generated by 10 exp 42 mutational events. So we are stuck with 1; it does not get better than that. Plugging the values into Hazen's equation, we get the value I suggested. If rbh insists upon having more than one functional configuration, then the upper limit drops, which I don't think he's going to want, because it raises the probability of ID in the long run.
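
The arithmetic behind that value is short enough to spell out. This sketch takes the choices above at face value (N = 10^42 trials, M(Ex) = 1) and works in log10 so that large counts do not overflow. The function name is an assumption made for the example.

```python
import math

def bits_from_counts(log10_n, log10_m=0.0):
    """I = -log2(M / N), computed from log10(N) and log10(M)."""
    return (log10_n - log10_m) * math.log2(10)

upper_bound = bits_from_counts(42)     # M(Ex) = 1, N = 10^42
print(round(upper_bound, 1))           # 139.5, i.e. the "140 bits" quoted earlier

# With more than one functional configuration, e.g. M(Ex) = 10^10, the figure drops.
print(round(bits_from_counts(42, 10), 1))   # 106.3
```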

I want to discuss evolutionary algorithms, fitness functions and the distribution of functional sequences in sequence space, but I only have time for one thing at a time, and I'll stop here.

Any questions?

Wesley said...

Kirk,

"10 exp 42" is not the configurational space that you can plug into Hazen's formula as N.

Each mutational event (and I notice that you haven't dealt with the fact that not all mutations are point mutations) alters a genome. The M. genitalium genome is 580,073 bases long, for example. If we go with an assumption of uniform distribution of point mutations occurring in an M. genitalium-sized genome, we might find some interesting information. 1e42 being the estimate of total mutational events, we should apportion those that affect one species of bacteria. Perhaps one ten billionth the total will serve as a guess, as total species diversity estimates tend to top out at about 5 billion. One finds that 1e32 such mutations would be expected to alter *each* base within that sized genome 1.7e26 times. There are 720 nucleotides in the coding sequence for RecA for a total of some 1.2e29 mutational events altering a RecA-sized set of bases. 1.2e29 isn't the total number of variants produced, either. Recombination can, and does, mix and match parts of sequences. Let's say we start with some nonsense sequence, perhaps ALU repeats, of 720 bases in length. One point mutation occurs, is replicated, and later another point mutation occurs. At this point, there are three variants in the population. Recombination, though, can bring the two point mutations that are so far separate together into a sequence that brings the variant total up to four. And that's just considering the start of this process of adding, one by one, 1.2e29 point mutations to that sequence space.
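
The apportionment above, and the three-versus-four variant count, can be checked with a short sketch. The function name and the four-letter stand-in sequences are illustrative assumptions; the input figures are the ones given in the paragraph.

```python
def apportion_mutations(total_mutations=1e42, species_share=1e-10,
                        genome_bases=580_073, gene_bases=720):
    """Return (mutations per species, hits per base, hits on a gene-sized stretch)."""
    per_species = total_mutations * species_share   # one ten-billionth of the total
    hits_per_base = per_species / genome_bases      # uniform spread over the genome
    hits_on_gene = hits_per_base * gene_bases
    return per_species, hits_per_base, hits_on_gene

per_species, hits_per_base, hits_on_gene = apportion_mutations()
print(f"{per_species:.1e} {hits_per_base:.1e} {hits_on_gene:.1e}")  # 1.0e+32 1.7e+26 1.2e+29

# Two point mutations give three variants; recombination adds a fourth with no new mutation.
ancestor, mutant_1, mutant_2 = "AAAA", "CAAA", "AAAG"
variants = {ancestor, mutant_1, mutant_2}
recombinant = mutant_1[:2] + mutant_2[2:]           # crossover between the two mutants
variants.add(recombinant)
print(len(variants), recombinant)                   # 4 CAAG
```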

Kirk, your approach to estimating sequence diversity is way, way off base. The number of mutational events is *not* the number of variants, and is not a *limit* on the number of variants.

Try again.

Wesley R. Elsberry

Wesley said...

Kirk,

The PFAM database number of entries for a protein is not an estimate of M(E_x) to be plugged into Hazen's formula or your derivative. The "curve levels off" phenomenon doesn't speak to the adequacy of sampling functional states, but rather to the sequences being derived via common ancestry. You assume that living organisms with proteins entered in PFAM survey the possible functional states, and that is an unwarranted assumption. It will overestimate the "functional information" values you obtain.

Wesley R. Elsberry

Wesley said...

Kirk,

According to PFAM, RecA in M. genitalium has 322 amino acid residues, not 240 as you stated earlier.

Looking back, I find that this is not the first time that your reported use of PFAM hasn't been, well, entirely accurate.

There's another inherent problem with using protein database entries as an estimate of the extent of functional divergence of amino acid sequence: proteomics does not yet do much of anything with the well-known information that protein polymorphism is ubiquitous. Usually, only one protein sequence per protein family is cataloged per species, which itself makes for a ridiculous restriction on functional variation that is found intraspecifically.

PFAM and other protein databases don't get one anywhere near a figure for number of functional variants of a protein. That's not what they are there for.

Wesley R. Elsberry

Wesley said...

Kirk,

Looking back, I see that you didn't claim that M. genitalium had a 240-residue RecA protein, but you did assert that 240 residues was typical for RecA.

According to PFAM, though, 2,384 out of 2,423 RecA sequences have 342 residues. I think that pretty much would stand as "typical" for most uses of "typical". Where exactly are you getting your data?

Wesley R. Elsberry

Unknown said...

Introduction: The comments below are entirely a response to Wesley and center around the measurement of functional information for proteins. It seems that he is unfamiliar with Pfam, did not carefully read what I wrote in my previous entry, and does not understand the equations posted in my previous entry.

Wesley wrote: " The PFAM database number of entries for a protein is not an estimate of M(E_x) to be plugged into Hazen's formula or your derivative. ….. You assume that living organisms with proteins entered in PFAM survey the possible functional states, and that is an unwarranted assumption. It will overestimate the "functional information" values you obtain."

Wesley, I must say that it is discouraging to read such sloppy scholarship. Before you critique someone's work, it is important that you read it. You are, of course, right in that you do not simply plug the total number of Pfam sequences into the value for M(Ex), but I clearly stated that in my previous entry! If you will read what I wrote, you will note that I said, "If we simply plug 1,553 unto the equation for M(Ex), we will get an inordinately high value for functional information. It is better to use a more sophisticated equation to do the computation that recognizes that not all sequences are equally probable …. which Shannon's equation does."

Furthermore, based on your sloppy reading of my entry, you have gone on to badly misrepresent me on that other antievolution blog. Please make the appropriate corrections.

The length of RecA

Wesley wrote: " Looking back, I see that you didn't claim that M. genitalium had a 240-residue RecA protein, but you did assert that 240 residues was typical for RecA. According to PFAM, though, 2,384 out of 2,423 RecA sequences have 342 residues. I think that pretty much would stand as "typical" for most uses of "typical". Where exactly are you getting your data?"

I want to respond to two areas where you appear to be confused: (i) the typical length of RecA and (ii) how Pfam works regarding the number of sequences posted.

The number of RecA sequences on Pfam: I note both here and in that other antievolution thread you mentioned, that you imply I'm out to lunch because the number of RecA sequences I analyzed (1,553) does not match the number on Pfam as of today (2,423). What you appear not to know is that the Pfam data is constantly being updated and revised. You also, once again, failed to carefully read what I posted in my previous post. In my previous post I wrote, " At the time I computed the functional information required to code for RecA, there were 1,553 aligned sequences available on Pfam." I computed the functional information for RecA in the summer of 2006. At that time, only 1,553 sequences were listed. Before you critique anyone's work, good scholarship demands that you not only read what they wrote, but you also understand it and understand the tools they used, in this case Pfam.

The typical length of RecA:

If you will carefully look at the Pfam data, you will notice that there are two different numbers given for the typical length of RecA. In the short write-up, you will see that it states, " RecA is a protein of about 350 amino-acid residues." However, if you will look at that actual data summary for the 2,423 sequences, you will see that it states that the average length is only 234.3 residues, much closer to the typical length I gave of 240 residues. So which is right?

This is where it is helpful to do your own work and not to simply quote an abstract. It turns out that if you look at the set of aligned sequences for RecA, given in Pfam, you will find that, as of today, there are 548 columns in the aligned set. This is because there are numerous insertions found in RecA, which are recorded in the RecA data. The first thing I do, in pre-processing the data, is attempt to compress the RecA sequence. One assumption I make is that the basic function of RecA is independent of insertions. To clarify, if the insertion is removed, the protein will still be functional. Now if you do not like this, you can keep the insertions in, but it will automatically give you a significantly higher value for the functional information (and I have actually done the computational analysis on this to verify this). A Darwinist does not want a higher functional information value. Furthermore, any honest researcher does not want to inflate the value. For these reasons, part of the pre-processing software I have written, removes insertions from the aligned sequence data if they occur less than a specified percentage of the overall data (that is an input value, but typically is set at 5% - 10%). When this was done with RecA, the resulting length of the sequences was 240 residues, very close to the latest value for the average length of 234.3 residues given by Pfam as of today. Hopefully, that answers your question, " Where exactly are you getting your data?"
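The pre-processing step described above might look roughly like the following. This is a hypothetical sketch, not the software referred to in the comment: the function name, the gap characters, and the rule of dropping columns occupied by a residue in fewer than a threshold fraction of sequences are all assumptions about what "removing low-frequency insertions" means.

```python
def drop_rare_insertion_columns(alignment, min_occupancy=0.05, gap_chars="-."):
    """Remove columns that hold a residue in fewer than min_occupancy of the sequences."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    width = len(alignment[0])
    keep = []
    for col in range(width):
        # Count sequences that have a residue (not a gap) in this column.
        occupied = sum(1 for seq in alignment if seq[col] not in gap_chars)
        if occupied / n_seqs >= min_occupancy:
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]

# Toy alignment: the third column is an insertion present in only 1 of 4 sequences.
toy = ["AC-DE", "AC-DE", "ACWDE", "AC-DF"]
print(drop_rare_insertion_columns(toy, min_occupancy=0.5))
# ['ACDE', 'ACDE', 'ACDE', 'ACDF']
```

Lowering the threshold keeps more insertion columns, which lengthens the compressed sequences.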

The number of functional sequences of RecA:

Wesley wrote: " PFAM and other protein databases don't get one anywhere near a figure for number of functional variants of a protein. That's not what they are there for."

Of course not, and I certainly did not imply that. In fact, if you will actually read what I wrote in my previous entry, you will notice that I predicted, on the basis of the method outlined, that " if we do make Hazen's assumption that all sequences are equally probable, and we take 832 Fits and solve for M(Ex), we find that, in theory, about 10 exp 250 different possible sequences could satisfy the function of RecA (not 1,553 as some might assume)." Again, Wesley, I must protest your very sloppy scholarship when it comes to critiquing what I wrote.

What is a generous estimate for the number of locations in sequence space that evolutionary processes could have tried over 4 billion years?

Wesley wrote: " Each mutational event (and I notice that you haven't dealt with the fact that not all mutations are point mutations) alters a genome." and also, " Kirk, your approach to estimating sequence diversity is way, way off base. The number of mutational events is *not* the number of variants, and is not a *limit* on the number of variants."

First of all, Wesley, I do not know why you assume that when I speak of mutations, I am only speaking of point mutations. That is an assumption on your part that is false. Insertions and deletions are also considered to be mutations (at least they were when I studied genetics at the U of Waterloo). It makes no difference when sampling sequence space whether the mutation is a point mutation or an insertion/deletion (indel). Both a point mutation and an indel result in a single specific coordinate in sequence space. A point mutation keeps you at the same level (residue-space) and an indel moves you up or down levels within sequence space. Think of sequence space as an inverted pyramid that gets exponentially larger as you move up through the levels.

Secondly, when you mention recombination as an important factor in sampling sequence space, you seem to be thinking of eukaryotes, since "the term genetic recombination as applied to bacteria and bacteriophages, leads to the replacement of one or more genes present in one strain with those from a genetically distinct strain. While this is somewhat different from our use of genetic recombination in eukaryotes, …." (my U of Waterloo genetics textbook, Klug & Cummings, 'Concepts of Genetics', (2000), p.179). I do wonder how carefully you have thought about the concept of evolutionary processes as a search engine in sequence space. First of all, eukaryotes make an insignificant contribution in comparison to prokaryotes, which are estimated in number to be around 10 exp 30. Since prokaryote recombination exchanges entire genes, rather than recombining gene sections, the primary way that prokaryotes sample protein sequence space is by mutations that occur within prokaryote genes. (Note: it may be the case that my U of Waterloo text is out of date, and it would surprise me if recombination with bacterial gene fragments never took place, so if one can put a published figure on the actual recombination rate of a bacterial gene, then by all means, add it to the computation). I hope you see the difference between the recombination of a genome and the recombination of a gene, when it comes to prokaryotes.

For the record, this is how I came up with the estimated 10 exp 42 search events for evolutionary processes:

First, based on population figures, eukaryotic searches are insignificant in comparison to prokaryote searches, so we will focus on the prokaryote search engine.

Total estimated number: 10 exp 30 (Whitman et al)
Replication rate: every 30 minutes
Mutation rate: 10 exp -6 per 1,000 bp per replication
average genome size: 100,000 protein-coding genes (generous for prokaryotes …. humans have roughly 26,000)
# trials/lineage over 4 billion years: 10 exp 15 trials
Total # trials for 10 exp lineages: 10 exp 42 trials (unique locations in sequence space)

This is only a rough estimate. If anyone else wants to work through a different estimate, with reasonable numbers, please go ahead and do it. Furthermore, if there is a published number, I'd much rather go with that, provided it can withstand review. I find that Darwinists seem very reluctant to publish this information, but I may have missed it.
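For anyone who wants to work through a different estimate, the product of the listed inputs can be set up as below. This is a back-of-the-envelope sketch: the gene length of 1,000 bp is an assumption that the list does not state, every mutation is counted as one trial, and the variable names are illustrative.

```python
import math

lineages = 1e30                  # prokaryote count quoted from Whitman et al
minutes_per_replication = 30     # replication rate from the list
mutations_per_kb = 1e-6          # per 1,000 bp per replication
genes = 100_000                  # protein-coding genes per genome (from the list)
bp_per_gene = 1_000              # ASSUMPTION: not stated in the comment
years = 4e9

replications = years * 365.25 * 24 * 60 / minutes_per_replication
genome_kb = genes * bp_per_gene / 1_000
trials_per_lineage = replications * genome_kb * mutations_per_kb
total_trials = trials_per_lineage * lineages

print(f"trials per lineage ~ 10^{math.log10(trials_per_lineage):.1f}")
print(f"total trials       ~ 10^{math.log10(total_trials):.1f}")
```

Both figures move in direct proportion to the assumed gene length and replication rate, so changing either input by a factor of ten shifts the totals by one order of magnitude.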

Computing the functional information required to encode a protein:
Wesley, there are several statements you made in your critique that indicate to me that you not only did not carefully read what I wrote in my earlier post, but you also do not understand the implications and application of the equations I mentioned in my earlier post. As a starting point, you need to understand the difference between the following three methods to estimate the functional information of proteins:

1. Take the entire known number of functional sequences for a protein and plug that number into the value for M(Ex). Problem: this will give you an erroneously high value for the functional information, as I mentioned in my previous post and you seemed to have missed.

2. For each site in the protein, observe the number of different amino acids (Af) that can occur at that site. Then calculate Af/20 for each site to get a Pf value for each site. Then take the sum of Pf log(Pf) for all sites to estimate the functional uncertainty, then compute the difference between the null state and this one. Problem: you get an erroneously low estimate of functional information. For example, if you do this for Ankyrin, you will find that for all 33 sites (if memory serves me correctly), any one of the 20 amino acids will do at any one of the 33 sites. Yet most random 33-residue sequences are non-functional. This is because substitutions that may be tolerated singly are often not tolerated in conjunction with other mutations. The reason for this is that there are higher order dependencies within protein sequences. I currently have a paper nearing completion (being co-authored by a professor and a post-doc, both from Jeffrey's university, the University of Waterloo) that looks at ubiquitin. As it stands right now, that paper will have an extremely mathematically rigorous explanation of functional complexity/functional information. I have found 13 higher order dependencies in ubiquitin, one of which is a seventh-order relationship, using software developed at the U of Waterloo that is being applied to proteins for the first time. I have written some software that uses this second type of approach simply to get a lower limit for the amount of functional information for a given protein. Incidentally, there reaches a point, using this method, where the addition of any further sequences does not change Pf for each site. In other words, all the amino acids that are tolerated at a given site are embedded within the Pfam data provided the sample size is sufficient.

3. The method I briefly outlined in my previous post. This is the best method we have available today. Instead of looking at how many different amino acids are tolerated at each site, the frequency of occurrence of each of the normal 20 amino acids at each site is computed. For method (2), we have one Pf value per site, for this third method, we have 20 per site. Before you attempt to critique my method, please attempt to understand it and, if you have any questions, please ask me before waxing eloquent on some blog somewhere.
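The difference between methods (2) and (3) can be illustrated on a toy alignment. The sketch below is one possible reading of the descriptions above, not the software mentioned in the comment: method (2) is taken to contribute log2(20/Af) bits per site, method (3) is taken to contribute log2(20) minus the Shannon entropy of the observed residue frequencies at that site, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NULL_BITS = math.log2(len(AMINO_ACIDS))  # per-site uncertainty if all 20 are equally likely

def columns(alignment):
    """Yield each alignment column as a list of residues, skipping gaps."""
    for col in zip(*alignment):
        yield [aa for aa in col if aa in AMINO_ACIDS]

def info_by_tolerated_count(alignment):
    """Method (2) as read here: each site contributes log2(20 / Af) bits."""
    total = 0.0
    for residues in columns(alignment):
        af = len(set(residues))  # number of distinct amino acids seen at the site
        if af:
            total += NULL_BITS - math.log2(af)
    return total

def info_by_frequency(alignment):
    """Method (3) as read here: each site contributes log2(20) minus its entropy."""
    total = 0.0
    for residues in columns(alignment):
        if not residues:
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
        total += NULL_BITS - entropy
    return total

toy = ["ACD", "ACE", "ACD", "AGD"]
print(info_by_tolerated_count(toy), info_by_frequency(toy))
```

Because the entropy at a site can never exceed log2(Af), the frequency-based figure is never smaller than the count-based one under this reading.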

With both methods (2) and (3) you do not need to sample all of sequence space (there isn't enough time before the universe lapses into heat death to sample all of sequence space, even if quantum computers do achieve full bloom). All you need to do is to sample enough of sequence space to either find out how many different amino acids are tolerated at each site (erroneous method 2), or what the frequency of appearance of each amino acid is at each site. This can be checked, I have checked it for various protein families, and there are sufficient sequences for many protein families, to get a pretty accurate set of figures for these values (i.e., the curve approaches a horizontal asymptote and further increasing the sample size no longer adds any further significant information). I have never encountered a Darwinist who has had the curiosity to actually do this. One may be out there somewhere, but I'm unaware of any. In general, biology is about 50 years behind the rest of science when it comes to information theory and how it relates to protein structure and function. Hazen's paper, simple approach as it is, should have been published 50 years ago, within 10 years of Shannon's original paper. Leon Brillouin actually published a similar paper to Hazen's albeit more rigorous, in 1951. The Darwinists saw the implications and abandoned ship. I've personally, on several occasions, run into Darwinists who deliberately recommend against applying information theory to biopolymers.

Wesley, I ask that before you critique anything I write, that you read it carefully and that you cease from misrepresenting me on the internet. Note that Jeffrey's initial post referred to ID proponents as liars (with the implication that Darwinists honestly represent ID proponents). I don't see that you have made an honest attempt to honestly represent me in your comments, neither here nor elsewhere. Prudence might dictate that if you think you've discovered a mistake somewhere you might want to question me on that before going off half-cocked. I have gone into the functional complexity of proteins much, much deeper than briefly outlined here. I am finding that Darwinists simply have not done the work. I'm going to give you the benefit of the doubt that your misrepresentation of me was a result of sloppy scholarship and not deliberate or intentional. I would prefer a more collegial, back-and-forth question/response, such as what one encounters in a defense. That way, we can have a constructive and informative discussion and actually make some progress in science in this area.

Wesley said...

Kirk,

I haven't had time to check the insertions issue on typical RecA length, but that at least sounds plausible. Let's set that one aside for the moment. That takes care of the "substantial discrepancy" comment I had.

But I wasn't implying that the 1,553 proteins in your dataset to the current PFAM 2,400+ proteins difference was an issue. That's part of why I said "current", after all. I know quite well that protein databases are expanding, so please leave off on the patronizing. Admonitions for careful reading are best heeded all around, wouldn't you think?

More later as I get time.

Wesley R. Elsberry

Anonymous said...

Kirk, your approach fails utterly for beta-lactamases. We know why it fails, and the reasons lead one to conclude that your approach does little more than estimate an unrealistic upper bound for the improbability (or lower bound for the "information") associated with biochemical function.

Unknown said...

Wesley wrote: "… I know quite well that protein databases are expanding, so please leave off on the patronizing. Admonitions for careful reading are best heeded all around, wouldn't you think?"

Agreed, Wesley. And I will also attempt to avoid patronizing, although I'd rather describe my previous post as 'frank and direct'. Perhaps I was a little too frank and direct, however, which is not conducive to a scholarly and friendly discussion. I am so often taken to task by people who have not carefully read or worked through the problem that it is a little hard to restrain myself sometimes and write in a way that comes across as gracious. I do enjoy a detailed and informed discussion that includes incisive questions designed to highlight what the other person perceives to be a problem. I love that kind of stuff. So don't hesitate to raise a question that focuses on what you think is the 'Achilles heel' and I will see what I can do about it. I don't know if Art's beta-lactamase concern qualifies in both of your minds or not. If so, I'm willing to look at it.

art said … "Kirk, your approach fails utterly for beta-lactamases."

Art, up until I read your post this morning, I've never looked at beta-lactamase and never run it through my software. I'm not sure how you came to the conclusion that my approach 'fails utterly' without actually running the family through my software. Could you be more specific? …. are you talking about the equations to measure functional information, or are you talking about my approach in compressing the sequences by removing low-frequency insertions? Removing insertions for some protein families can be problematic …. I've run into that on one other family. That is why the type of insertions removed is an input variable as I indicated earlier …. one does want to remove insertions that do not significantly contribute to functionality, but one does not want to remove important insertions either. I do know that there are older versions of my software 'out there' ….. did you run beta-lactamase through my software and find a problem? I notice that as of today, there are 3,207 sequences available on Pfam, which might be a good sampling.

If there is sufficient interest and expertise to critique my results, I could run the family through my software and see what comes up. It would take about a day of my time, however, so I'd rather only do it if you and others really think this is a really big issue.

Anonymous said...

Hi Kirk,

Before I get to the beta-lactamases, I would remind you that you seem to have missed my earlier response to your comments about my review of Axe (2004). Of particular importance are the experimental results that refute your claims about the "shape" of the functional protein universe.

As for the beta-lactamases, I suggested these because, if one focuses on function (which is the proper concept if one is concerned with evolution), then these proteins confound your approach to estimating the information needed to fulfill a given biochemical function. This is because two completely different, unrelated at the sequence and structural levels, protein families can do the same reaction. If you group these families together (as you must if the emphasis is on the evolution of function), then your method will arrive at the conclusion that darned near any sequence will be able to catalyze the beta-lactamase reaction. (I can say this without even running your programs, because the alignments that one would get would almost certainly insert any amino acid at any position, with but a modest preference for specific amino acids at particular places. These preferences will be statistically dampened by the consequences that come with aligning totally unrelated sequences.)

This example raises (once again - I've been over this with you before) the issue that you insist on avoiding, but inevitably destroys your argument. Until you figure out a way to account for the hypothetical possibility that totally unrelated sequences can perform the same function (a possibility that is demonstrated fact!), then the approach of relying solely on extant sequences to estimate absolute information content is fatally flawed. The best you can do is get at an upper limit, a value that is pretty useless. It's the lower limit you want to be getting at, and you are completely ignoring the notion that such a limit is different from the one you estimate.

Unknown said...

Art, I was not avoiding your comments about the 'shape' of functional sequence space for proteins. As I mentioned earlier, I have chosen to start with the central issue, that of defining how one measures functional information for protein families. We've got a good start on that, and your most recent post above offers opportunity to further advance this concept of functional information.

Measuring functional information on the basis of function:
I have not looked at or worked with the beta-lactamases, so will respond on the basis of your brief description of what you see as a problem. As you stated, "two completely different, unrelated at the sequence and structural levels, protein families can do the same reaction. If you group these families together (as you must if the emphasis is on the evolution of function), then your method will arrive at the conclusion that darned near any sequence will be able to catalyze the beta-lactamase reaction. (I can say this without even running your programs, because the alignments that one would get would almost certainly insert any amino acid at any position …"

Art, your suggestion that I would have to conflate the sequence alignments for the two families into one alignment in order to measure functional information, is not correct. To explain the proper way to measure functional information for two different protein families that share the same function, I will use Hazen's simplified approach, rather than the more sophisticated equations I use. This spares us from a plethora of superscripts, subscripts and mathematical symbols and still gives us the same general answer.

Recall the equation for functional information published by Hazen et al is

I(Ex)=-log2[M(Ex)/N]

where I(Ex) = functional information for specified degree of function x,
M(Ex) = the number of different configurations that meets or exceeds the specified degree of function x and
N=the total number of possible configurations.

For our purpose, Ex = the minimum degree of function required by the cell.

In the case where the same function is satisfied by two different protein families, which we will call (a) and (b), we get

I(Ex)a+b = -log2[(M(Ex)a+M(Ex)b)/(Na+Nb)]

If we know the average length of the two seed sequences, then Na and Nb are trivial to calculate. The unknowns in the above equation are I(Ex)a+b, M(Ex)a, and M(Ex)b. With three unknowns, we need two more equations in addition to the one we already have above. The two additional equations are:

I(Ex)a = -log2[M(Ex)a/Na]

and

I(Ex)b = -log2[M(Ex)b/Nb]


Using my software, the sequence alignment for family (a) is input, and the software computes I(Ex)a. That value is inserted into the second equation, which then allows one to solve for M(Ex)a. The same process is done for family (b).

The two values obtained by the above method are then inserted into the first equation to solve for I(Ex)a+b. This gives us the value for the functional information required to satisfy the specific function under investigation.
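The three-equation procedure above can be sketched numerically. This is an illustration only: it assumes N = 20^length for each family, works in log2 throughout so the very large counts are never formed, and uses hypothetical function names. The per-family values I(Ex)a and I(Ex)b are taken as given inputs rather than computed from alignments.

```python
import math

def log2_functional_count(info_bits, log2_n):
    """Solve I = -log2(M/N) for log2(M)."""
    return log2_n - info_bits

def log2_sum(log2_x, log2_y):
    """Return log2(x + y) given log2(x) and log2(y)."""
    hi, lo = max(log2_x, log2_y), min(log2_x, log2_y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def combined_information(info_a, len_a, info_b, len_b, alphabet=20):
    """I(Ex)a+b = -log2[(Ma + Mb) / (Na + Nb)] as written in the comment."""
    log2_na = len_a * math.log2(alphabet)   # ASSUMPTION: N = alphabet ** length
    log2_nb = len_b * math.log2(alphabet)
    log2_ma = log2_functional_count(info_a, log2_na)
    log2_mb = log2_functional_count(info_b, log2_nb)
    return log2_sum(log2_na, log2_nb) - log2_sum(log2_ma, log2_mb)

# Sanity check: two identical families give back the single-family value.
print(round(combined_information(832, 240, 832, 240), 6))  # 832.0
```

When the two families differ greatly in length, the larger Na or Nb and the larger M(Ex)a or M(Ex)b dominate their respective sums.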

Art wrote:
" This example raises (once again - I've been over this with you before) the issue that you insist on avoiding, but inevitably destroys your argument. Until you figure out a way to account for the hypothetical possibility that totally unrelated sequences can perform the same function (a possibility that is demonstrated fact!), then the approach of relying solely on extant sequences to estimate absolute information content is fatally flawed."

Well, Art, as I have just shown above, this is not the fatal problem you imagined it to be and I'm certainly not insisting on avoiding this non-existent problem. I can compute the functional information required for a particular structure, or for a particular function, or even for a set of functions or a fold set. Some protein families can give two completely different folds or two completely different functions. I can handle that as well. In fact, I can handle as many folds or functions as you wish. It is not particularly difficult, which is why I said earlier that biology is about 50 years behind the other sciences when it comes to analyzing the functional information encoded into biopolymers, and what that information actually does. The paper of Hazen et al published in 2007 is less advanced than Leon Brillouin's paper of 1951. (Brillouin was one of the primary contributors to the literature after Shannon's paper of 1948.)

Upper and Lower limits:
Art wrote, " The best you can do is get at an upper limit, a value that is pretty useless. It's the lower limit you want to be getting at, and you are completely ignoring the notion that such a limit is different from the one you estimate."

There's certainly no ignoring anything on my part. I discussed this in one of my earlier posts, but let me go over this again. First, the upper limit is discussed by Hazen et al, and I won't go over it again here. Obviously, when one looks at Hazen's equation for an upper limit, one can see that what I am doing is definitely not computing an upper limit. No one is particularly interested in the value I(Emax).

What we are interested in is I(Ex) where Ex denotes the minimum degree of function required by the cell. I make two assumptions:

1. Life is constantly sampling both functional and non-functional sequence space through mutations (point and indels).
2. Natural selection eliminates those sequences that have a degree of functionality less than Ex and favors those sequences that have a degree of function significantly higher than Ex (at least up to a point …. 100% efficiency can also lead to disaster in some cases).

The corollary for (2) is that those sequences that survive (i.e., what we have in Pfam) have a degree of functionality equal to or greater than Ex. If you work through the equations that I use in my software you will see that the value of I(Ex) that I compute is definitely NOT the upper limit. In fact, as I mentioned earlier, if I plot I(Ex) vs sample size, it begins with Hazen's I(Ex)max and rapidly drops for the first 500 or so sequences, beginning to become more horizontal between a sample size of 500 to 1,000 sequences and gradually leveling off after that, approaching what appears to be the true value for I(Ex). What that tells us is that even though it is impossible for life to sample all of sequence space, its sampling is sufficiently random to begin to trace the 'edges' of sequence space for a particular Ex required by life.
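The shape of such a curve can be reproduced on synthetic data. The sketch below is a toy under strong assumptions: sequences are drawn independently, site by site, from fixed made-up residue preferences, and the estimate is the frequency-based sum of log2(20) minus per-site entropy. It shows only how this kind of estimate behaves as the sample grows; it does not show whether real Pfam sequences, which share ancestry, sample sequence space in this way.

```python
import math
import random
from collections import Counter

def info_by_frequency(alignment):
    """Sum over sites of log2(20) minus the observed Shannon entropy, in bits."""
    total = 0.0
    for col in zip(*alignment):
        n = len(col)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(col).values())
        total += math.log2(20) - entropy
    return total

# Made-up per-site preferences; repeated letters give that residue more weight.
SITE_CHOICES = ["AAAAAAAG", "LLLLIIVV", "DDDDDDEE", "KKKRRRQH", "ACDEFGHI"]
rng = random.Random(0)
alignment = ["".join(rng.choice(site) for site in SITE_CHOICES) for _ in range(2000)]

# Estimate from the first n sequences, for growing n.
curve = [(n, info_by_frequency(alignment[:n])) for n in (1, 10, 100, 500, 1000, 2000)]
for n, bits in curve:
    print(n, round(bits, 2))
```

With a single sequence every site has zero entropy, so the estimate starts at the maximum of log2(20) per site and falls as more variation is observed.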

If one, for curiosity's sake, is interested in an erroneously low value for I(Ex), one can take the entire sequence array and determine the total number of different amino acids permitted for each site and compute I(Ex)error on that basis. That method assumes that all of the permitted amino acids at that site are equally functional, regardless of what the rest of the sequence is. Of course, we know that there are higher order relationships within the primary structure that translate into secondary and tertiary structural relationships as well as external functional relationships. So I(Ex)error is not a true lower limit; it is much lower than the true lower limit. Nevertheless, even I(Ex)error is impressive for many proteins.

I think I've answered your concerns, Art. Are there any further problems that you'd like me to address?

Unknown said...

An additional note:
In the example of calculating the functional information where two different protein families can perform the same function, if the Na and Nb are within an order or two of magnitude, then one ought to do the calculation as I outlined in my post above. However, there is an approach that can simplify things if I(Ex)a is less than I(Ex)b (or vice versa). I am not interested in the most complex sequence that will perform the function, but the least complex sequence that will do the job. This is related to the issue of compression that I raised earlier in dealing with the removal of insertions. So I would computationally calculate both I(Ex)a and I(Ex)b. If I(Ex)a was less than I(Ex)b, then I would ignore family (b) and go with family (a). Of course, this is not as thorough as the method I posted above, but as far as I am concerned, this second approach is better under certain circumstances. My rationale is that we are interested in the minimal amount of functional information required for life, not the maximum amount. Note that one cannot assume that the shorter protein family will require less functional information to encode. I have found that, sometimes, a shorter protein actually requires a larger amount of functional information to encode than some larger proteins.

Anonymous said...

Hi Kirk,
Thanks for your explanation. Unfortunately, there is still something missing.

As I understand things, you would estimate the information needed for a particular function as follows: per my example, you would add the number of functional sequences for each of the two general (totally different) classes of beta-lactamase, with this sum being used to assess information content. The problem I see is that you are assuming that these two classes are the only ones, in all of sequence space, that could possibly perform the beta-lactamase reaction. You have no empirical basis for making such an assumption (for beta-lactamases or pretty much any protein or enzyme), and in fact there is a sizeable body of literature that shows that this assumption cannot be justified. (We’ve been over this before on other boards, so I won’t turn this comment into a lengthy exposition.) Indeed, it is likely that the total of functional sequence families dwarfs the numbers of functional sequences in a particular family, and by many tens of orders of magnitude. As I see things, this would render your estimates rather irrelevant.

As I stated in my review of Axe (2004), the literature gives us lower boundaries of information that fall into the “zero CSI” realm. Your approach does not change this, and as far as I can tell really doesn’t allow us to even explore this more pertinent side of the inequality.

As you think about this, perhaps you might consider running cytochrome P450’s through your method. Like the beta-lactamases, this family of enzymes is also instructive.