Monday, August 13, 2007

Combinatorics on Words, A Puzzle, and a Silly Research Proposal

One of the areas I work in is combinatorics on words. In this field, we are interested in words (strings of symbols) and their properties.

For example, a square is a nonempty word of the form xx, where x is a word. Some examples of squares in English include tartar and murmur. If a word contains no subword that is a square, we say it is squarefree. Notice that the word squarefree is not squarefree, as it contains the square ee.

A classical question, first discussed a hundred years ago by the Norwegian mathematician Axel Thue, is whether there exist infinite squarefree words over a finite alphabet. Over a two-letter alphabet, this is not possible, as any binary word of length 4 or more contains a square. (Proof: let the first letter be a. Then if the next letter is a, we have a square. So assume the next letter is b. If the third letter is b, we have a square again. So assume the third letter is a. Now, whatever letter is chosen as the fourth letter, we get a square.)

But is it possible over a three-letter alphabet? The answer is yes, as Thue showed.

Erdős considered a variation on this problem. Instead of trying to avoid squares, let's try to avoid abelian squares. An abelian square is a nonempty string of the form x x', where x' is a permutation of x. Some nice examples of abelian squares in English include reappear (the first half of the word reappears in the second half) and mesosome. Can you find the longest word in English that is an abelian square? Hint: it represents part of the body. If you're interested in this sort of wordplay, see my page on repetitions in natural languages, where the solution to the puzzle is given.

For many years it was not known whether one could construct an infinite word over a 4-letter alphabet that contains no abelian squares. But in 1992, Veikko Keränen showed that one could. His construction involves constructing a map that sends each letter in {a,b,c,d} to a string of length 85. Starting with a, one then simply iterates this map. For more information, visit Keränen's page.

Now we've covered combinatorics on words, and the puzzle. It's time for the third item in the title. Go visit this silly research proposal by a professor of civil engineering and engineering mechanics at Columbia University. The proposal is so poorly written that I can't really tell what he has in mind, but he seems to suggest representing strategies for development in third-world countries by strings of symbols that specify a temporal sequence of events. I really hope this is a joke, but if it is not a joke, I hope it doesn't get funded.

ollie said...

Hmmm, it seems as if the "square free" problem would have a group theoretic solution.

Jeffrey Shallit said...

There are indeed some connections with group theory. Note that the set of all words over a finite alphabet form a free monoid (or, if you don't count the empty word, a semigroup), not a group. Of course, you can always add inverses, if you like, to get a free group.

I don't know any purely group-theoretic approach to Thue's result on three letters. However, people have applied Thue's result and related ones to problems in group theory, such as Burnside's problem.

As for avoiding abelian squares, that does have some group-theoretic aspects. There is a famous paper by Dekking that uses a group-theoretic approach to construct words avoiding higher abelian powers.

David Swart said...

my wordlist also has an engish abelian square "horseshoer".

Jeffrey Shallit said...

David:

"Horseshoer" is a nice one! Thanks.