Thursday, August 23, 2007

The Noncommutative Frobenius Problem is Solved!

I rarely get the opportunity to talk about research problems on this blog, but this is an exception.

Consider the famous "Chicken McNuggets problem": if Chicken McNuggets are sold at McDonald's only in boxes of 6, 9, or 20 McNuggets, what's the largest number of McNuggets you can't buy at McDonald's? The answer happens to be my favorite number, 43. (Why it is my favorite is a story that will have to wait for another day.) Notice that you can buy any number of McNuggets greater than 43. For example,

44 = 4*6 + 1*20,

45 = 5*9,

46 = 1*6 + 2*20,

47 = 3*6 + 1*9 + 1*20,

48 = 8*6,

49 = 1*9 + 2*20,

and any number greater than 49 can be obtained by adding an appropriate multiple of 6 to these.

In general, you're given a set S of integers, and you want to know the largest number that cannot be expressed as a non-negative integer linear combination of the elements of S. This is called the Frobenius number because Frobenius is supposed to have mentioned it often during his lectures.

Of course, such an integer may not exist. For example, if S = {4, 6}, then you can only get multiples of 2 by forming an integer linear combination of 4 and 6. So you need to impose an important condition: the greatest common divisor of the elements of S must be 1. If this is the case, then you can express every sufficiently large integer as a non-negative integer linear combination of the elements of S.

Now, solving the classical Frobenius problem is easy in some cases. For example, here's a well-known formula for the case when S has only two elements, x and y, and here it is: xy - x - y. So, for example, the Frobenius number of 5 and 7 is 23. Every number larger than 23 can be expressed as a non-negative integer linear combination of 5 and 7. 24, for example, would be 2*5 + 2*7, and so forth -- but 23 itself cannot.

Greenberg wrote a paper in 1988, and Davison wrote another paper in 1994, in which they gave an efficient algorithm for three numbers.

Unfortunately, the general problem was proved NP-hard (under Turing reductions) by Ramirez Alfonsin in 1996. Roughly speaking, this means the problem is at least as hard as many classical problems for which we still have no efficient solution, such as the traveling salesman problem.

About 6 years ago, I suggested generalizing this problem from numbers to strings of symbols (sometimes called "words"). This kind of generalization is a typical activity in mathematics and theoretical computer science. You take a well-studied problem over one kind of domain, and see how the problem translates in another. The classical Frobenius problem dealt with positive integers, so we'll replace them with strings. Now S will be a set of strings.

Now we have to find the analogue of a non-negative integer linear combination. This is no problem: strings like to join up with other strings, in a process called concatenation, so our non-negative integer linear combinations will be replaced by all possible ways to concatenate the strings of S, using any of them any number of times (including 0 times). This process is sometimes called Kleene closure after the American logician Stephen Kleene, and we write it as S*. For example, if S = {0, 01}, then S* = { ε, 0, 00, 01, 000, 001, 010, ... }. That funny ε at the beginning is the usual mathematical representation for the empty string: the string with no letters at all in it. Notice that order matters in concatenation (houseboat isn't the same as boathouse), so we've gone from a commutative setting to a noncommutative one.

So S* is an infinite set of strings. (By the way, in this particular example, where S = {0, 01}, the set S* happens to have a nice description: it's the set of all strings that don't begin with a 1, and don't have two consecutive 1's appearing anywhere.)

OK, now we've generalized the "non-negative integer linear combination" part. How about the condition that the greatest common divisor (gcd) of the elements of S must be 1? Well, the point of that condition was just to ensure that every sufficiently large integer had a representation, so rather than worrying about how to define the greatest common divisor of two strings, let's replace the gcd condition with: every sufficiently long string must be in S*. There's another way to say this: the complement of a set T is the set of all strings occurring in the universal set that don't occur in T. Here the universal set is just the set of all strings. So our condition is that the complement of S* should be finite. In the arcane lingo of mathematicians, we can abbreviate this as: S* is co-finite.

Now, are there any examples where S* is co-finite? Yes. One example is S = {1, 00, 01,10, 000, 01010}. It shouldn't be too hard to convince yourself that every string except 0 and 010 can be written as concatenation of strings in this set. So in this case S* is co-finite.

Finally, we need an analogue of the largest integer not representable. That's not hard --- it's just the length of the longest string not in S*. In the previous example, where S = {1, 00, 01,10, 000, 01010}, the answer would be 3, since 010 is the longest string omitted from S*. Let's call this the Frobenius number of the set of strings S.

Now it's not too hard to show that if the longest string in S is of length n, and S* is co-finite, then the Frobenius number is bounded by a function which is exponential in n. For example, over an alphabet of two letters, you can show that the Frobenius number is < (2/3)*(4n - 1). (In case the exponent doesn't show up well in your browser, that's 4 to the n power, minus 1, times two-thirds.)

I figured out this generalization a while ago, and the upper bound a year or so ago. But I could not find any example where the Frobenius number was anywhere near as big as that upper bound. Indeed, for a long time, the worst-case example I knew was when you took S to be the set of all strings of length n, together with all the strings of length n+1. In this case, the Frobenius number is easily shown to be n2 - n - 1.

I gave the problem to my Ph. D. student Zhi Xu, who only started a few months ago. We tried many different possibilities, but eventually we figured out that you should try sets S that contain almost all strings of some lengths, omitting only a few. And recently he made a big breakthrough: he was able to construct examples where the Frobenius number is exponentially large in n, the length of the longest string. Here's one of his examples: take S to be over a binary alphabet, that is, over {0,1}, and let it contain all the strings of length 3, all the strings of length 5, except that you remove the strings {00001, 01010, 10011}. Then S* is co-finite, and the length of the longest string not in S* is 25. For example, 0000101001100000001010011 can't be gotten by concatenating the strings of S.

So our understanding of Frobenius problem for strings is now greatly improved, in the sense that there is an exponential upper bound and a class of examples that provides an exponential lower bound. The bounds are not exactly tight, but they are much tighter than ever before. Zhi Xu has many other results on this problem, and you can see our paper on the subject here. We wrote it together with Jui-Yi Kao, another student of mine who also made some very nice contributions. I expect this paper will form the basis of Zhi Xu's Ph. D. thesis. Of course, there is much more to be done in this area.

I hope you enjoyed this tour of my recent research, and congratulations to Zhi Xu for a job well done!


ollie said...

Congratulations! I love problems like these: relatively easy to state, but difficult to do.

David Swart said...

Please send my congratulations along to your students.

Progress like this was always the highlight of my time spent at school.

Also, strongly consider relating your interest in 43, as you have now piqued my and possibly many others' interests.