In combinatorics on words, we are interested in words and their properties. By a

*word*, I mean a finite or infinite string of symbols. The Norwegian mathematician Axel Thue initiated this field a hundred years ago in his study of infinite words avoiding squares and overlaps. A

*square*is a word of the form

*xx*, where

*x*is a nonempty word. For example, the English word

`murmur`is a square, with

*x*=

`mur`. Thue asked, is it possible to create an infinite word over a finite alphabet, such that it avoids squares (i.e., contains no squares)?

Over a two-letter alphabet, this can't be done: the first letter is

`0`, say. So, the next letter must be

`1`, otherwise we'd have the square

`00`. The third letter must be

`0`, for otherwise

`011`would have the square

`11`. But then whatever you choose for the fourth letter gives a square.

Over a three-letter alphabet, however, it turns out you can construct an infinite word without squares. The construction is quite clever, and it was rediscovered by several people after Thue -- he published his work in a rather obscure Norwegian journal.

Thue also considered another kind of repetition avoidance: avoiding overlaps. An

*overlap*is a pattern that is just slightly more than a square: it consists of two repetitions of a word

*x*, followed by the first letter of a third repetition. For example, the English word

`alfalfa`is an overlap, because it consists of two copies of

`alf`, followed by the first letter of

`alf`.

Narad's thesis was entitled "Overlap-free words and generalizations", and he made several beautiful contributions to combinatorics on words. To describe one, I need the idea of a

*morphism*, which is just a transformation of words defined by replacing each letter with a given word, and then joining all the results together. For example, consider the morphism μ defined as follows: replace each

`0`with

`01`and each

`1`with

`10`. If I apply μ to a word like

`011`, I get

`011010`. I also need the idea of

*fixed point*; a word is said to be a fixed point of a morphism if, when you apply the morphism, you get the same word back again. Thue discussed a special word

**t**=

`01101001100101101001011001101001`...

which is a fixed point of the morphism μ I just described. He proved the following amazing facts: first, the infinite word

**t**contains no overlaps (we say it is "overlap-free"). Second, if an infinite binary word is overlap-free and is the fixed point of a morphism, then it either equals

**t**or its complement, which is obtained by changing every

`0`to

`1`and vice-versa.

Narad generalized this last result of Thue. To explain his generalization, we need the notion of fractional power of a word. Consider a word like

`ingoing`. We can consider this word to consist of

`ingo`followed by the first three letters of the same word; in other words, it is a 7/4-power. More generally, a word is a

*k*+

*p*/

*q*power if it consists of

*k*repetitions of some block of length

*q*, followed by the first

*p*letters of that block. So the French word

`entente`(which has been absorbed into English), is a 7/3 power.

What Narad proved is the following: the word

**t**defined by Thue (and now called the

*Thue-Morse word*) is the only one, other than its complement, that is 7/3-power-free and is a fixed point of a morphism. The proof is rather subtle and it appeared in

*International Journal of Foundations of Computer Science*in 2005.

This is only one of many results contained in his thesis. I can't discuss them all, but I can't resist mentioning one other result, because it has a nice picture attached to it. Suppose we generalize our notion of infinite words to two dimensions: in other words, with every lattice point with non-negative integer coordinates, we associate some symbol. Then one can ask about avoiding squares in the plane: not just every row, column, and diagonal, but

*for every word defined by any line with rational slope*. It turns out that one can indeed avoid squares in this case, as was shown by the Italian computer scientist Carpi in a beautiful 1988 paper; his construction used an alphabet of 16 symbols. Now instead of avoiding squares, suppose one only wants to avoid

*sufficiently large*squares. Can we use a smaller alphabet in that case? Narad showed that if we want to avoid squares

*xx*with the length of

*x*greater than 2, then an alphabet of size 4 suffices. The construction can be illustrated by assigning each lattice point a square of a particular color, a different color for each symbol. Here's a portion of the result:

In January, Narad will be taking up an NSERC postdoc at the University of Winnipeg, to work with James Currie. Congratulations to Narad for a job well-done!

## 5 comments:

How come your student has focused mostly on journal publications? Why not publish many conference papers as well? I thought conference publications were important in computer science.

Umm, if you look at his CV, you'll see he's published in lots of conferences: WACAM '04, DLT '04, WORDS '03, DLT '07. He also shared the best student paper award at DLT '04.

Maybe add them under "Conference publications" then. Currently, there's only one paper in that section.

Nice. Morphisms are 1D CAs? I see an interesting link to algorithmic randomness: the amount of compressibility of a symbol sequence is clearly related to internal symmetries defined through a grammar of transformations. Non-squaring transformation systems start to look semi-random.

Perhaps a bathroom tiling project in the future?

Post a Comment