Saturday, November 10, 2007

Narad Rampersad's Work on Combinatorics on Words

My student, Narad Rampersad, had a successful defense (or "defence", as they like to say in Canada) of his Ph. D. thesis yesterday. That's a good excuse to discuss some aspects of combinatorics on words.

In combinatorics on words, we are interested in words and their properties. By a word, I mean a finite or infinite string of symbols. The Norwegian mathematician Axel Thue initiated this field a hundred years ago in his study of infinite words avoiding squares and overlaps. A square is a word of the form xx, where x is a nonempty word. For example, the English word murmur is a square, with x = mur. Thue asked, is it possible to create an infinite word over a finite alphabet, such that it avoids squares (i.e., contains no squares)?

Over a two-letter alphabet, this can't be done: the first letter is 0, say. So, the next letter must be 1, otherwise we'd have the square 00. The third letter must be 0, for otherwise 011 would have the square 11. But then whatever you choose for the fourth letter gives a square.

Over a three-letter alphabet, however, it turns out you can construct an infinite word without squares. The construction is quite clever, and it was rediscovered by several people after Thue -- he published his work in a rather obscure Norwegian journal.

Thue also considered another kind of repetition avoidance: avoiding overlaps. An overlap is a pattern that is just slightly more than a square: it consists of two repetitions of a wordx, followed by the first letter of a third repetition. For example, the English word alfalfa is an overlap, because it consists of two copies of alf, followed by the first letter of alf.

Narad's thesis was entitled "Overlap-free words and generalizations", and he made several beautiful contributions to combinatorics on words. To describe one, I need the idea of a morphism, which is just a transformation of words defined by replacing each letter with a given word, and then joining all the results together. For example, consider the morphism μ defined as follows: replace each 0 with 01 and each 1 with 10. If I apply μ to a word like 011, I get 011010. I also need the idea of fixed point; a word is said to be a fixed point of a morphism if, when you apply the morphism, you get the same word back again. Thue discussed a special word

t = 01101001100101101001011001101001...

which is a fixed point of the morphism μ I just described. He proved the following amazing facts: first, the infinite word t contains no overlaps (we say it is "overlap-free"). Second, if an infinite binary word is overlap-free and is the fixed point of a morphism, then it either equals t or its complement, which is obtained by changing every 0 to 1 and vice-versa.

Narad generalized this last result of Thue. To explain his generalization, we need the notion of fractional power of a word. Consider a word like ingoing. We can consider this word to consist of ingo followed by the first three letters of the same word; in other words, it is a 7/4-power. More generally, a word is a k+p/q power if it consists of k repetitions of some block of length q, followed by the first p letters of that block. So the French word entente (which has been absorbed into English), is a 7/3 power.

What Narad proved is the following: the word t defined by Thue (and now called the Thue-Morse word) is the only one, other than its complement, that is 7/3-power-free and is a fixed point of a morphism. The proof is rather subtle and it appeared in International Journal of Foundations of Computer Science in 2005.

This is only one of many results contained in his thesis. I can't discuss them all, but I can't resist mentioning one other result, because it has a nice picture attached to it. Suppose we generalize our notion of infinite words to two dimensions: in other words, with every lattice point with non-negative integer coordinates, we associate some symbol. Then one can ask about avoiding squares in the plane: not just every row, column, and diagonal, but for every word defined by any line with rational slope. It turns out that one can indeed avoid squares in this case, as was shown by the Italian computer scientist Carpi in a beautiful 1988 paper; his construction used an alphabet of 16 symbols. Now instead of avoiding squares, suppose one only wants to avoid sufficiently large squares. Can we use a smaller alphabet in that case? Narad showed that if we want to avoid squares xx with the length of x greater than 2, then an alphabet of size 4 suffices. The construction can be illustrated by assigning each lattice point a square of a particular color, a different color for each symbol. Here's a portion of the result:

In January, Narad will be taking up an NSERC postdoc at the University of Winnipeg, to work with James Currie. Congratulations to Narad for a job well-done!


Anonymous said...

How come your student has focused mostly on journal publications? Why not publish many conference papers as well? I thought conference publications were important in computer science.

Jeffrey Shallit said...

Umm, if you look at his CV, you'll see he's published in lots of conferences: WACAM '04, DLT '04, WORDS '03, DLT '07. He also shared the best student paper award at DLT '04.

Anonymous said...

Maybe add them under "Conference publications" then. Currently, there's only one paper in that section.

Erdos56 said...

Nice. Morphisms are 1D CAs? I see an interesting link to algorithmic randomness: the amount of compressibility of a symbol sequence is clearly related to internal symmetries defined through a grammar of transformations. Non-squaring transformation systems start to look semi-random.

David Swart said...

Perhaps a bathroom tiling project in the future?