My student,

Narad Rampersad, had a successful defense (or "defence", as they like to say in Canada) of his Ph. D. thesis yesterday. That's a good excuse to discuss some aspects of combinatorics on words.

In combinatorics on words, we are interested in words and their properties. By a

*word*, I mean a finite or infinite string of symbols. The Norwegian mathematician Axel Thue initiated this field a hundred years ago in his study of infinite words avoiding squares and overlaps. A

*square* is a word of the form

*xx*, where

*x* is a nonempty word. For example, the English word

`murmur` is a square, with

*x* =

`mur`. Thue asked, is it possible to create an infinite word over a finite alphabet, such that it avoids squares (i.e., contains no squares)?

Over a two-letter alphabet, this can't be done: the first letter is

`0`, say. So, the next letter must be

`1`, otherwise we'd have the square

`00`. The third letter must be

`0`, for otherwise

`011` would have the square

`11`. But then whatever you choose for the fourth letter gives a square.

Over a three-letter alphabet, however, it turns out you can construct an infinite word without squares. The construction is quite clever, and it was rediscovered by several people after Thue -- he published his work in a rather obscure Norwegian journal.

Thue also considered another kind of repetition avoidance: avoiding overlaps. An

*overlap* is a pattern that is just slightly more than a square: it consists of two repetitions of a word

*x*, followed by the first letter of a third repetition. For example, the English word

`alfalfa` is an overlap, because it consists of two copies of

`alf`, followed by the first letter of

`alf`.

Narad's thesis was entitled "Overlap-free words and generalizations", and he made several beautiful contributions to combinatorics on words. To describe one, I need the idea of a

*morphism*, which is just a transformation of words defined by replacing each letter with a given word, and then joining all the results together. For example, consider the morphism μ defined as follows: replace each

`0` with

`01` and each

`1` with

`10`. If I apply μ to a word like

`011`, I get

`011010`. I also need the idea of

*fixed point*; a word is said to be a fixed point of a morphism if, when you apply the morphism, you get the same word back again. Thue discussed a special word

**t** =

`01101001100101101001011001101001`...

which is a fixed point of the morphism μ I just described. He proved the following amazing facts: first, the infinite word

**t** contains no overlaps (we say it is "overlap-free"). Second, if an infinite binary word is overlap-free and is the fixed point of a morphism, then it either equals

**t** or its complement, which is obtained by changing every

`0` to

`1` and vice-versa.

Narad generalized this last result of Thue. To explain his generalization, we need the notion of fractional power of a word. Consider a word like

`ingoing`. We can consider this word to consist of

`ingo` followed by the first three letters of the same word; in other words, it is a 7/4-power. More generally, a word is a

*k*+

*p*/

*q* power if it consists of

*k* repetitions of some block of length

*q*, followed by the first

*p* letters of that block. So the French word

`entente` (which has been absorbed into English), is a 7/3 power.

What Narad proved is the following: the word

**t** defined by Thue (and now called the

*Thue-Morse word*) is the only one, other than its complement, that is 7/3-power-free and is a fixed point of a morphism. The proof is rather subtle and it appeared in

*International Journal of Foundations of Computer Science* in 2005.

This is only one of many results contained in his thesis. I can't discuss them all, but I can't resist mentioning one other result, because it has a nice picture attached to it. Suppose we generalize our notion of infinite words to two dimensions: in other words, with every lattice point with non-negative integer coordinates, we associate some symbol. Then one can ask about avoiding squares in the plane: not just every row, column, and diagonal, but

*for every word defined by any line with rational slope*. It turns out that one can indeed avoid squares in this case, as was shown by the Italian computer scientist Carpi in a beautiful 1988 paper; his construction used an alphabet of 16 symbols. Now instead of avoiding squares, suppose one only wants to avoid

*sufficiently large* squares. Can we use a smaller alphabet in that case? Narad showed that if we want to avoid squares

*xx* with the length of

*x* greater than 2, then an alphabet of size 4 suffices. The construction can be illustrated by assigning each lattice point a square of a particular color, a different color for each symbol. Here's a portion of the result:

In January, Narad will be taking up an NSERC postdoc at the University of Winnipeg, to work with James Currie. Congratulations to Narad for a job well-done!