I particularly enjoyed this post because it touches on the subject of my Winter 2015 course here at the University of Waterloo. Arrington displays two strings of symbols and says "the second string is not a group of random letters because it is highly complex and also conforms to a specification". By implication he thinks the first string is a group of random letters, or at the very least, more random than the second.
Here are the two strings in question, cut-and-pasted from Arrington's post:
#1:
OipaFJPSDIOVJN;XDLVMK:DOIFHw;ZD
VZX;Vxsd;ijdgiojadoidfaf;asdfj;asdj[ije888
Sdf;dj;Zsjvo;ai;divn;vkn;dfasdo;gfijSd;fiojsa
dfviojasdgviojao’gijSd’gvijsdsd;ja;dfksdasd
XKLZVsda2398R3495687OipaFJPSDIOVJN
;XDLVMK:DOIFHw;ZDVZX;Vxsd;ijdgiojadoi
Sdf;dj;Zsjvo;ai;divn;vkn;dfasdo;gfijSd;fiojsadfvi
ojasdgviojao’gijSd’gvijssdv.kasd994834234908u
XKLZVsda2398R34956873ACKLVJD;asdkjad
Sd;fjwepuJWEPFIhfasd;asdjf;asdfj;adfjasd;ifj
;asdjaiojaijeriJADOAJSD;FLVJASD;FJASDF;
DOAD;ADFJAdkdkas;489468503-202395ui34
#2:
To be, or not to be, that is the question—
Whether ’tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Fortune,
Or to take Arms against a Sea of troubles,
And by opposing, end them? To die, to sleep—
No more; and by a sleep, to say we end
The Heart-ache, and the thousand Natural shocks
That Flesh is heir to? ‘Tis a consummation
Devoutly to be wished. To die, to sleep,
To sleep, perchance to Dream; Aye, there’s the rub,
For in that sleep of death, what dreams may come,
When we have shuffled off this mortal coil,
Needless to say, Arrington -- a CPA and lawyer who apparently has no advanced training in the mathematics involved -- doesn't specify what he means by "group of random letters". I think a reasonable interpretation would be that he is imagining that each message is generated by a stochastic process where each letter is generated independently, with uniform probability, from some finite universe of symbols.
Even with just a cursory inspection of the two strings, we see that neither one of them is likely to be "random" in this sense. We immediately see this about the second string because the set of reasonable English texts is quite small among the set of all possible strings. But we also see the same thing about the first because (for example) the trigram "asd" occurs much more often than one could reasonably expect for a random string. The letters a, s, and d sit side by side on the home row of a standard QWERTY keyboard, so a reasonable interpretation is that somebody, probably Arrington, dragged his hands repeatedly over the keyboard in a fashion he thought was "random" -- but which evidently is not. (It is much harder to generate random strings than most untrained people think.)
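To make the trigram observation concrete, here is a minimal Python sketch. It counts overlapping trigrams in one line of string #1 and compares the count for "asd" with what the independent, uniform model would predict. The alphabet size of 64 is purely my assumption for illustration; Arrington never specifies his universe of symbols, and the function names are hypothetical.

```python
from collections import Counter


def trigram_counts(text: str) -> Counter:
    """Count every overlapping run of three consecutive characters."""
    return Counter(a + b + c for a, b, c in zip(text, text[1:], text[2:]))


def expected_trigram_count(length: int, alphabet_size: int) -> float:
    """Expected occurrences of ONE fixed trigram when each character is
    drawn independently and uniformly from alphabet_size symbols."""
    return max(length - 2, 0) / alphabet_size ** 3


# One line of string #1, copied from the post.
excerpt = "VZX;Vxsd;ijdgiojadoidfaf;asdfj;asdj[ije888"

observed = trigram_counts(excerpt)["asd"]
# alphabet_size=64 is an assumed value, not something given in the post.
expected = expected_trigram_count(len(excerpt), alphabet_size=64)

print(f"observed 'asd': {observed}, expected under uniform model: {expected:.5f}")
```

Even in this single short line the trigram shows up more than once, while the uniform model predicts far less than one occurrence. A different assumed alphabet size changes the expected value but not the conclusion.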
If we want to test this in a quantitative sense, we can use a lossless compression scheme such as gzip, which is based on Lempel-Ziv compression. A truly random file will not be significantly compressible, with very high probability. So a good test of randomness is simply to attempt to compress the file and see if the result is roughly the same size as the original. The larger the compressed file, the more random the original string was.
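The compression test is easy to try yourself. The following is a rough sketch using Python's standard gzip module rather than the command-line program, so the byte counts will not match the command-line tool exactly (the file header differs slightly). The two sample inputs are my own illustrations, not Arrington's strings.

```python
import gzip
import os


def gzip_size(data: bytes) -> int:
    """Size in bytes of the gzip-compressed form of data."""
    return len(gzip.compress(data, compresslevel=9))


# Illustrative inputs (assumptions, not the strings from the post):
# a highly repetitive 500-byte string and 500 bytes from the OS random source.
repetitive = b"asd;" * 125
random_bytes = os.urandom(500)

for name, data in [("repetitive", repetitive), ("random", random_bytes)]:
    print(f"{name}: original {len(data)} bytes, gzip {gzip_size(data)} bytes")
```

The repetitive input shrinks to a small fraction of its length, while the random input does not shrink at all; it actually grows slightly because of gzip's fixed overhead.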
Here are the results. According to the "wc" program, string #1 has length 502. (This count includes the line-ending characters separating the lines.) String #2 has length 545.
Using gzip on Darwin OS on my Mac, I get the following results: string #1 compresses to a file of size 308 and string #2 compresses to a file of size 367. String #2's compressed version is bigger, so by this measure string #2 is more random than string #1: exactly the opposite of what Arrington implied!
I suppose one could argue that the right measure of "randomness" is not the size of the compressed file, but rather the difference in size between the compressed file and the original. The smaller this difference is, the more random the original string was. So let's do that test, too. I find that for string #1, this difference is 502-308 = 194, and for string #2, this difference is 545-367 = 178. Again, for string #2 this difference is smaller and hence again string #2 is more random than string #1.
Finally, one could argue that we're comparing apples and oranges because the strings aren't the same size. Maybe we should compute the percentage of compression achieved. For string #1 this percentage is 194/502, or 38.6%. For string #2 this percentage is 178/545, or 32.7%. String #2 was compressed less in terms of percentage and hence once again is more random than string #1.
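For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes all three measures from the sizes reported above (502 and 308 for string #1, 545 and 367 for string #2). The function name is hypothetical; only the four numbers come from my measurements.

```python
def compression_measures(original: int, compressed: int) -> dict:
    """Return the three measures discussed: compressed size,
    bytes saved, and percentage saved (to one decimal place)."""
    saved = original - compressed
    return {
        "compressed": compressed,
        "saved": saved,
        "percent_saved": round(100 * saved / original, 1),
    }


# Sizes as reported in the post (wc for the originals, gzip for the compressed files).
string1 = compression_measures(original=502, compressed=308)
string2 = compression_measures(original=545, compressed=367)

print("string #1:", string1)
print("string #2:", string2)

# String #2 looks "more random" on every measure:
# larger compressed file, fewer bytes saved, smaller percentage saved.
print(string2["compressed"] > string1["compressed"])
print(string2["saved"] < string1["saved"])
print(string2["percent_saved"] < string1["percent_saved"])
```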
Barry's implied claim has failed spectacularly by every measure I tried.
Ultimately, the answer is that it is completely reasonable to believe that neither of Barry's two strings is "random" in the sense of likely to have been generated randomly and uniformly from a given universe of symbols. A truly random string would be very hard to compress. (Warning: if you try to do this with gzip make sure you use the entire alphabet of symbols available to you; gzip is quite clever if your universe is smaller.)
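The warning about alphabet size can be illustrated with a short Python sketch. It generates two uniformly random strings, one over all 256 byte values and one over just four symbols, and compares how well gzip does on each. The four-letter alphabet, the length of 10,000, and the seed are arbitrary choices of mine for illustration.

```python
import gzip
import random


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (about 1.0 means incompressible)."""
    return len(gzip.compress(data)) / len(data)


rng = random.Random(2015)  # fixed seed so the run is repeatable
n = 10_000

# Uniformly random over the full universe of 256 byte values.
full_alphabet = bytes(rng.randrange(256) for _ in range(n))
# Uniformly random, but over only four symbols (an assumed small universe).
small_alphabet = bytes(rng.choice(b"ACGT") for _ in range(n))

print(f"256 symbols: ratio {compression_ratio(full_alphabet):.2f}")
print(f"  4 symbols: ratio {compression_ratio(small_alphabet):.2f}")
```

The four-symbol string is just as "random" within its own universe, but each symbol carries only two bits of information while occupying eight, and gzip exploits that. So a compression ratio has to be judged against the alphabet actually in use.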
By the way, I should point out that Barry's "conforms to a specification" is the usual ID creationist nonsense. He doesn't even understand Dembski's criterion (not surprising, since Dembski stated it so obscurely). String #2 can be said to "conform" to many, many different specifications: English text, English text written by Shakespeare, messages of length less than 545, and so forth. But the same can be said for string #1. We addressed this in detail in our long paper published in Synthese, but it seems most ID creationists haven't read it. For one thing, it's not good enough to assert just "specification"; even by Dembski's own claims, one must determine that the specification is "independent" and one must compute the size of the space of strings that conform to the specification. For Dembski, it's not the probability of the string being generated that is of concern; it's the relative measures of the universe of strings and the strings matching the specification that matter! Most ID creationists don't understand this basic point.
Elsewhere, Arrington says he thinks string #1 is more complex than string #2 (more precisely he says the "thesis ... that the first string is less complex than the second string ... is indefensible").
Maybe Barry said the exact opposite of what he meant; his writing is so incoherent that it wouldn't surprise me. But his statement, as given, is wrong again. For mathematicians and computer scientists, the complexity of a string can be measured as the size of the optimal compressed version of that string. This quantity, the Kolmogorov complexity, is not computable, so in practice one can use a lossless compression scheme as we did above. The larger the compressed result, the more complex the original string. And the results are clear: string #1 is, as measured by gzip, somewhat less complex than string #2.
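Since any particular compressor gives only an upper bound on the size of the optimal compressed version, one simple refinement is to try several compressors and keep the smallest result. Here is a sketch using three compressors from Python's standard library. This is my own illustration, not the procedure I used for the numbers above, and it ignores the fixed cost of describing the decompressor itself.

```python
import bz2
import lzma
import os
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Smallest compressed size over a few standard compressors.
    This is only an upper-bound proxy for the optimal compressed size."""
    return min(
        len(zlib.compress(data, 9)),
        len(bz2.compress(data)),
        len(lzma.compress(data)),
    )


# Illustrative inputs of equal length (assumptions, not the strings from the post).
structured = b"to be or not to be " * 50   # 950 bytes of repeated text
unstructured = os.urandom(950)              # 950 bytes from the OS random source

print("structured:  ", complexity_upper_bound(structured))
print("unstructured:", complexity_upper_bound(unstructured))
```

The repeated text gets a far smaller bound than the random bytes of the same length, which is the sense in which random strings are the complex ones.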
ID creationists, as I've noted previously, usually turn the notion of Kolmogorov complexity on its head, pretending that random strings are not complex at all. We made fun of this in our proposal for "specified anti-information" in the long version of our paper refuting Dembski. Oddly enough, some ID creationists have now adopted this proposal as a serious one, although of course they don't cite us.
Finally, one unrelated point: Barry talks about his disillusion when his parents lied to him about the existence of a supernatural figure --- namely, Santa Claus. But he doesn't have enough introspection to understand that the analogy he tries to draw (with "materialist metaphysics") is completely backwards. Surely the right analogy is Santa Claus to Jesus Christ. Both are mythical figures, both are celebrated by parents who indoctrinate their children in them, both supposedly have supernatural powers, both are depicted as wise and good, and both are comforting to small children. The list could go on and on. How un-self-aware does one have to be to miss this?




