My student, Narad Rampersad
, had a successful defense (or "defence", as they like to say in Canada) of his Ph. D. thesis yesterday. That's a good excuse to discuss some aspects of combinatorics on words.
In combinatorics on words, we are interested in words and their properties. By a word
, I mean a finite or infinite string of symbols. The Norwegian mathematician Axel Thue initiated this field a hundred years ago in his study of infinite words avoiding squares and overlaps. A square
is a word of the form xx
, where x
is a nonempty word. For example, the English word murmur
is a square, with x
. Thue asked, is it possible to create an infinite word over a finite alphabet, such that it avoids squares (i.e., contains no squares)?
Over a two-letter alphabet, this can't be done: the first letter is 0
, say. So, the next letter must be 1
, otherwise we'd have the square 00
. The third letter must be 0
, for otherwise 011
would have the square 11
. But then whatever you choose for the fourth letter gives a square.
Over a three-letter alphabet, however, it turns out you can construct an infinite word without squares. The construction is quite clever, and it was rediscovered by several people after Thue -- he published his work in a rather obscure Norwegian journal.
Thue also considered another kind of repetition avoidance: avoiding overlaps. An overlap
is a pattern that is just slightly more than a square: it consists of two repetitions of a wordx
, followed by the first letter of a third repetition. For example, the English word alfalfa
is an overlap, because it consists of two copies of alf
, followed by the first letter of alf
Narad's thesis was entitled "Overlap-free words and generalizations", and he made several beautiful contributions to combinatorics on words. To describe one, I need the idea of a morphism
, which is just a transformation of words defined by replacing each letter with a given word, and then joining all the results together. For example, consider the morphism μ defined as follows: replace each 0
and each 1
. If I apply μ to a word like 011
, I get 011010
. I also need the idea of fixed point
; a word is said to be a fixed point of a morphism if, when you apply the morphism, you get the same word back again. Thue discussed a special wordt
which is a fixed point of the morphism μ I just described. He proved the following amazing facts: first, the infinite word t
contains no overlaps (we say it is "overlap-free"). Second, if an infinite binary word is overlap-free and is the fixed point of a morphism, then it either equals t
or its complement, which is obtained by changing every 0
Narad generalized this last result of Thue. To explain his generalization, we need the notion of fractional power of a word. Consider a word like ingoing
. We can consider this word to consist of ingo
followed by the first three letters of the same word; in other words, it is a 7/4-power. More generally, a word is a k
power if it consists of k
repetitions of some block of length q
, followed by the first p
letters of that block. So the French word entente
(which has been absorbed into English), is a 7/3 power.
What Narad proved is the following: the word t
defined by Thue (and now called the Thue-Morse word
) is the only one, other than its complement, that is 7/3-power-free and is a fixed point of a morphism. The proof is rather subtle and it appeared in International Journal of Foundations of Computer Science
This is only one of many results contained in his thesis. I can't discuss them all, but I can't resist mentioning one other result, because it has a nice picture attached to it. Suppose we generalize our notion of infinite words to two dimensions: in other words, with every lattice point with non-negative integer coordinates, we associate some symbol. Then one can ask about avoiding squares in the plane: not just every row, column, and diagonal, but for every word defined by any line with rational slope
. It turns out that one can indeed avoid squares in this case, as was shown by the Italian computer scientist Carpi in a beautiful 1988 paper; his construction used an alphabet of 16 symbols. Now instead of avoiding squares, suppose one only wants to avoid sufficiently large
squares. Can we use a smaller alphabet in that case? Narad showed that if we want to avoid squares xx
with the length of x
greater than 2, then an alphabet of size 4 suffices. The construction can be illustrated by assigning each lattice point a square of a particular color, a different color for each symbol. Here's a portion of the result:
In January, Narad will be taking up an NSERC postdoc at the University of Winnipeg, to work with James Currie. Congratulations to Narad for a job well-done!