My student, Narad Rampersad, had a successful defense (or "defence", as they like to say in Canada) of his Ph.D. thesis yesterday. That's a good excuse to discuss some aspects of combinatorics on words.
In combinatorics on words, we are interested in words and their properties. By a word, I mean a finite or infinite string of symbols. The Norwegian mathematician Axel Thue initiated this field a hundred years ago in his study of infinite words avoiding squares and overlaps. A square is a word of the form xx, where x is a nonempty word. For example, the English word murmur is a square, with x = mur. Thue asked: is it possible to create an infinite word over a finite alphabet that avoids squares, i.e., contains no square as a contiguous block?
Over a two-letter alphabet, this can't be done. Say the first letter is 0. Then the next letter must be 1, for otherwise we'd have the square 00. The third letter must be 0, for otherwise 011 would contain the square 11. But then whatever you choose for the fourth letter gives a square: 0100 contains 00, and 0101 is itself a square with x = 01.
Over a three-letter alphabet, however, it turns out you can construct an infinite word without squares. The construction is quite clever, and it was rediscovered by several people after Thue -- he published his work in a rather obscure Norwegian journal.
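For readers who like to experiment, here is a small Python sketch. The brute-force check confirms the length-4 argument above, and the morphism a -> abc, b -> ac, c -> b is one standard example of a squarefree construction over three letters (I am not claiming it is exactly the construction Thue published); iterating it from a gives longer and longer squarefree prefixes of its fixed point.

    from itertools import product

    def has_square(w):
        """True if w contains a nonempty square xx as a contiguous block."""
        n = len(w)
        for i in range(n):
            for L in range(1, (n - i) // 2 + 1):
                if w[i:i+L] == w[i+L:i+2*L]:
                    return True
        return False

    # Every binary word of length 4 contains a square, as argued above.
    print(all(has_square(''.join(w)) for w in product('01', repeat=4)))   # True

    # A well-known squarefree morphism over {a, b, c} (an assumption: a
    # standard textbook example, not necessarily Thue's original construction).
    rules = {'a': 'abc', 'b': 'ac', 'c': 'b'}
    word = 'a'
    for _ in range(8):
        word = ''.join(rules[ch] for ch in word)

    print(has_square(word))   # False: this prefix of the fixed point is squarefree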
Thue also considered another kind of repetition avoidance: avoiding overlaps. An overlap is a pattern that is just slightly more than a square: it consists of two repetitions of a word x, followed by the first letter of a third repetition. For example, the English word alfalfa is an overlap, because it consists of two copies of alf, followed by the first letter of alf.
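The definition is easy to test in code; here is a tiny Python sketch (the function name is mine, just for illustration):

    def is_overlap(w):
        """True if w = x + x + x[0] for some nonempty word x."""
        if len(w) < 3 or len(w) % 2 == 0:
            return False
        k = len(w) // 2
        return w[:k] == w[k:2*k] and w[2*k] == w[0]

    print(is_overlap("alfalfa"))   # True: "alf" + "alf" + "a"
    print(is_overlap("murmur"))    # False: a square, with no extra letter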
Narad's thesis was entitled "Overlap-free words and generalizations", and he made several beautiful contributions to combinatorics on words. To describe one, I need the idea of a morphism, which is just a transformation of words defined by replacing each letter with a given word, and then joining all the results together. For example, consider the morphism μ defined as follows: replace each 0 with 01 and each 1 with 10. If I apply μ to a word like 011, I get 011010.
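In Python, such a morphism is essentially one line (apply_morphism is just an illustrative name):

    def apply_morphism(rules, w):
        """Replace each letter of w by its image and join the results."""
        return ''.join(rules[c] for c in w)

    mu = {'0': '01', '1': '10'}
    print(apply_morphism(mu, '011'))   # 011010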
I also need the idea of a fixed point; a word is said to be a fixed point of a morphism if, when you apply the morphism, you get the same word back again. Thue discussed a special word

t = 01101001100101101001011001101001...

which is a fixed point of the morphism μ I just described. He proved the following amazing facts: first, the infinite word t contains no overlaps (we say it is "overlap-free"). Second, if an infinite binary word is overlap-free and is the fixed point of a morphism, then it equals either t or its complement, which is obtained by changing every 0 to 1 and vice versa.
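A short Python sketch illustrates the first fact on a finite prefix (of course, a computation on a prefix is only a sanity check, not a proof about the infinite word): iterating μ from 0 gives longer and longer prefixes of t, and a brute-force search finds no overlap in them.

    def mu(w):
        """The morphism 0 -> 01, 1 -> 10."""
        return ''.join('01' if c == '0' else '10' for c in w)

    def has_overlap(w):
        """True if w contains a factor x + x + x[0] for some nonempty x."""
        n = len(w)
        for i in range(n):
            for p in range(1, (n - i - 1) // 2 + 1):
                if w[i:i+p] == w[i+p:i+2*p] and w[i+2*p] == w[i]:
                    return True
        return False

    t = '0'
    for _ in range(10):
        t = mu(t)              # a prefix of the fixed point, of length 2**10

    print(t[:32])              # 01101001100101101001011001101001
    print(has_overlap(t))      # False: this prefix is overlap-free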
Narad generalized this last result of Thue. To explain his generalization, we need the notion of a fractional power of a word. Consider a word like ingoing. We can view this word as ingo followed by the first three letters of the same word; in other words, it is a 7/4-power. More generally, a word is a (k + p/q)-power if it consists of k repetitions of some block of length q, followed by the first p letters of that block. So the French word entente (which has been absorbed into English) is a 7/3-power: two copies of the block ent, followed by its first letter e.
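Equivalently, the exponent of a word is its length divided by its smallest period, which makes these fractional powers easy to compute; here is a small Python sketch.

    from fractions import Fraction

    def exponent(w):
        """Length of w divided by its smallest period."""
        n = len(w)
        for p in range(1, n + 1):
            if all(w[i] == w[i + p] for i in range(n - p)):
                return Fraction(n, p)

    print(exponent("ingoing"))   # 7/4: the block "ingo" has length 4
    print(exponent("entente"))   # 7/3: the block "ent" has length 3
    print(exponent("murmur"))    # 2:   a square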
What Narad proved is the following: the word t defined by Thue (and now called the Thue-Morse word) and its complement are the only infinite binary words that are 7/3-power-free and are fixed points of a morphism. The proof is rather subtle, and it appeared in the International Journal of Foundations of Computer Science in 2005.
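One can at least sanity-check the 7/3-power-free property on a finite prefix of t (again, a computation on a prefix is not a proof about the infinite word). The sketch below uses the observation that a word has a factor of exponent at least 7/3 exactly when, for some period p, some factor of length at least 7p/3 has period p.

    def mu(w):
        return ''.join('01' if c == '0' else '10' for c in w)

    def has_power(w, num, den):
        """True if some factor of w has exponent >= num/den."""
        n = len(w)
        for p in range(1, n + 1):
            L = (num * p + den - 1) // den   # ceiling of num*p/den
            for i in range(n - L + 1):
                if all(w[i + j] == w[i + j + p] for j in range(L - p)):
                    return True
        return False

    t = '0'
    for _ in range(8):
        t = mu(t)                   # a length-256 prefix of the Thue-Morse word

    print(has_power(t, 7, 3))       # False: no factor of exponent >= 7/3 here
    print(has_power(t, 2, 1))       # True: but t does contain squares, such as 00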
This is only one of many results contained in his thesis. I can't discuss them all, but I can't resist mentioning one other result, because it has a nice picture attached to it. Suppose we generalize our notion of infinite words to two dimensions: in other words, with every lattice point with non-negative integer coordinates, we associate some symbol. Then one can ask about avoiding squares in the plane: not just in every row, column, and diagonal, but in every word defined by a line of rational slope. It turns out that one can indeed avoid squares in this case, as was shown by the Italian computer scientist Carpi in a beautiful 1988 paper; his construction used an alphabet of 16 symbols. Now suppose that, instead of avoiding all squares, one only wants to avoid sufficiently large squares. Can we use a smaller alphabet in that case? Narad showed that if we want to avoid squares xx with the length of x greater than 2, then an alphabet of size 4 suffices. The construction can be illustrated by assigning each lattice point a square of a particular color, a different color for each symbol. Here's a portion of the result:
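If you want to experiment with such colorings yourself, here is a rough brute-force checker sketch in Python. It assumes you supply a finite grid of symbols as a list of lists (Narad's actual 4-symbol construction is not reproduced here); it reads off the word along every line of rational slope that fits in the grid and looks for a square xx with the length of x at least 3.

    from math import gcd
    from itertools import product

    def line_words(grid, dx, dy):
        """Yield the words read off grid points along lines in direction (dx, dy)."""
        n = len(grid)
        starts = set()
        for x, y in product(range(n), repeat=2):
            # walk backwards to the first grid point on the line through (x, y)
            while 0 <= x - dx < n and 0 <= y - dy < n:
                x, y = x - dx, y - dy
            starts.add((x, y))
        for x, y in starts:
            word = []
            while 0 <= x < n and 0 <= y < n:
                word.append(grid[x][y])
                x, y = x + dx, y + dy
            yield word

    def has_big_square(word, min_block=3):
        """True if word contains a square xx with |x| >= min_block."""
        n = len(word)
        for i in range(n):
            for L in range(min_block, (n - i) // 2 + 1):
                if word[i:i+L] == word[i+L:i+2*L]:
                    return True
        return False

    def avoids_big_squares(grid):
        """Check every rational-slope direction that could fit a square xx, |x| >= 3."""
        n = len(grid)
        bound = (n - 1) // 5        # such a square needs 6 collinear grid points
        for dx in range(0, bound + 1):
            for dy in range(-bound, bound + 1):
                if (dx, dy) == (0, 0) or gcd(dx, abs(dy)) != 1:
                    continue
                if any(has_big_square(w) for w in line_words(grid, dx, dy)):
                    return False
        return True

    # Usage, with a coloring you supply as a list of lists of symbols:
    #   print(avoids_big_squares(my_grid))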
In January, Narad will be taking up an NSERC postdoc at the University of Winnipeg, to work with James Currie. Congratulations to Narad for a job well done!