## Bounded in an eggshell

It should come as no surprise to anyone that I’m trying to learn some programming languages so I can get a job in software. Just now I googled “google interview questions” and found a blog post with ten. I’ve already seen half of them, though I only remembered the solutions to some, and one of them gave me an “oh, not that one again” moment because I can never remember the idea behind its best solution.  The solution given there isn’t very good, and even the improved version is badly motivated.  Thus, I will explain it here.  The problem is this:

You are standing before a 100-story building with two (identical) eggs and wondering how far one can fall without breaking.  How many tries will it take you to find out?

I will spoil it for you: the answer is 14.   It does not come from the following logic: if you tried to test every floor up from the bottom until failure, it would take as many as 100 tries.  This is clearly wrong, since you only used one egg.  You could test every other floor (up to 50 tries) and, when the first egg breaks, test the floor below with the other one.  That’s 51.  Continuing in the pattern of testing by increments, you find that if you test floors n at a time, then you need 100/n tries for the first egg and then an additional n – 1 tries for the second, for a total of (100/n) + n – 1 tries.  Using calculus (that is, differentiating and setting to zero) you will figure out that this is minimal if $n = \sqrt{100} = 10$.

This is a little wrong: if n doesn’t divide 100, you have to round down 100/n and then the last few floors won’t be as many as n – 1; however, since 100 is a perfect square, the ideal answer is actually an integer, and the various near-miss integer values of n, although they lead to slightly decreased trials because of rounding, don’t actually go below the ideal.  This method gives 19 tries, by the way.

This is tempting, but just because you have pushed one idea to the limit doesn’t mean that you’ve found the best answer.  Let me tell you how you can tell that there is a better answer: suppose you want to find the best number N of trials.  Imagine that you have N trials to use up, and it may take fewer, but never more.  Therefore, both of your eggs will break somewhere between 1 and N trials, because it is only when an egg breaks that you learn something definitive.  Intuitively, you realize that the two numbers (when the first egg breaks and when the second does) alone must determine the floor of the building up to which an egg can be safely dropped, and so all you need to do is choose N large enough that the number of such pairs is at least 101 (the egg could be unable to survive any fall, so “the ground floor” is a possible answer).   The answer to this partition problem is $\binom{N + 1}{2}$; it turns out that $\binom{15}{2} = 105$ and $\binom{14}{2} = 91$, so we should have N = 15 – 1 = 14.

I asserted that it is intuitively clear that you can read off the answer “how many floors” just from knowing the two numbers of trials it took to break each egg.  Let me elaborate: first, when I say “when each egg breaks” I really mean “when the first breaking of an egg occurs” and “when the second breaking occurs”; it really doesn’t matter if you use the same egg continually or switch off, because all that matters is the breaking of some egg.  Second, although while doing the tests you are also keeping in mind the additional information of which floors you dropped from, that information is redundant.   Here’s why.

Suppose you and I made up a weird scheme such that, after breaking each egg after only one try, it were possible that the answer was either “first floor” or “tenth floor”, and then you went off and did the tests and reported back to me that this is what happened.  I would ask: how do you know which answer it is?  You might answer: well, the last floor I tested was the first.  I would say: Oh, so because of the way the scheme works, that means you started from the second floor.  (For example.)  I guess if you’d told me that then you wouldn’t have had to remember what the last floor you tested was, because the testing scheme enforces a particular progression of floors.  If it were possible to make an arbitrary choice at some point, and if these choices didn’t all give the same answer, then even you, performing the tests yourself, couldn’t know that the floor you end up on is really the best, unless you checked all the possibilities, thus making the choice pointless.  And if they do all give the same answer, then you may as well modify the scheme so that you always choose, say, the lowest possible floor among the alternatives to test next, and the choice goes away.

What I’m saying is that this testing scheme is really a way of encoding the floor number in a pair of integers: where each egg breaks.  The required number of tests is whatever increases the capacity of this code above the number of floors (100, in this case).  Once you have a formula for the capacity, you can back-solve for the required number.  This actually proves that 14 is the minimal number of trials: there are no better ideas that you can push to lower limits.

There’s still the issue of how to construct a testing scheme (that is, a code) that does this.  To do this, you have to move past thinking of the pair of breaking times as being just an abstract set of data and remember that these numbers are obtained by successive experimentation.  For example, suppose that both eggs break in one try; I have to assign a floor to this, and that has to be the lowest possible floor that an egg can survive up to.  That means that, for every lower floor, either I tested it, or I didn’t test it but I tested some higher floor and the egg survived.  I’ve only done two tests, and the egg didn’t survive either one, so I can’t have skipped any floors; therefore, the egg can’t survive any fall: the answer is “the ground floor”.  If the first break was after one trial and the second after two, well, I know that the first time I test the second egg it must be on the first floor (since until I do the test I can’t know whether it will survive, and if it fails, I can’t have skipped any floors), and that blows my chance to skip any with the second (failed) trial, so that has to occur on the second floor.  So the pair (1, 2) means “first floor” (the last place the egg could fall from successfully).

If you go on like this, you’ll realize that if the first egg fails its first trial, the second egg has to successively test the floors 1-13 from the bottom up, producing test data (1, n) with n up to 13. Which floor did you break the first egg on? It has to be 14, because breaking it there must tell you that the correct floor is somewhere 13 or below.  What if the first egg survives: I have the pair (2,1) of trial data?  The answer should be floor 14 or above; that is, the first egg surviving its first trial means that I can ignore the first 13 floors and the ground.  So basically, I’m testing an egg on a building 86 stories high with the previous 14th floor as the “ground”, I am allowed only 13 trials (since I used up one to get past the bottom 14), and I have the data (1,1), which I already showed must mean the “ground” floor: that is, floor 14 of the actual building.  The remaining pairs (2, n) with n up to 12 (so the total number of trials is still 14) must, in the same way as before, test floors 14 through 26. Evidently, we have now broken the first egg on floor 27.

So the form of the solution actually determines a unique algorithm: test the first egg on floors 14, 14 + 13, 14 + 13 + 12, …, and when it fails, test the second egg up from the last floor where you succeeded with the first.  And since everyone knows that $14 + 13 + 12 + ... + 1 = \binom{15}{2}$, this reproduces the numbers I had before, only now with a method as well as madness.

I think it’s interesting to compare the two methods, since they are very similar and presented side-by-side, you might find it unintuitive that the one which involves decreasing the number of tests actually works out better.  I think of both of them as being a scheme for constructing “decimal representations” of 100 (of course, the decimal version of a number is a code for it: for example, literally the number 5 is the set with elements |||||, but we have a symbol 5 for it.  The number 10 is ||||||||||, but we don’t have a separate symbol: we use positioning to give the existing symbols 1 0 more significance).  The first method actually is constructing the decimal representation: first you find out how many 10’s go into your answer, and then you find out how many 1’s go after.  The reason 10 is significant is simply that you are given two “digits” to work with (the trial numbers of the two eggs), and with two digits in base n, you can represent any number up to (but not including) $n^2$.  Here, you want $n^2 = 100$, so n = 10.

(By the way, once you have expressed the floor number in base 10, the number of trials is the sum of its digits.)

Here’s why I think this is “morally” incorrect: as you could see from the technique described there, this testing scheme has a bit of a continuous character to it.  For example, it turns out that bases 9 and 11 also work and give the answer 19, because the discrete error as compared to the continuous ideal allows them to compete with what should be the unique best answer.  In other words, this method is “discretely wrong”, even if “continuously right”.  This seems to me to be related to the fact that one can write a decimal representation of any real number (using, ah, decimals); that is, the method is not optimal for the domain of the problem, which is the integers.

The second method is also constructing a kind of decimal representation: a discrete one.  Normally, to write a number in base n, you express it as a sum

$\displaystyle a_0 n^0 + a_1 n^1 + a_2 n^2 + \dots$

but we are trying to do something more like

$\displaystyle \underline{a}_0 \binom{n}{0} + \underline{a}_1 \binom{n}{1} + \underline{a}_2 \binom{n}{2} + \dots$

The notation is a little ad-hoc, but I mean: $\underline{3}(14) = 14 + 13 + 12$, as compared of course to $3(14) = 14 + 14 + 14$; this is like the “falling factorial” $14^{\underline{3}} = 14 \cdot 13 \cdot 12$, which is really a “falling exponent”, compared to $14^3 = 14 \cdot 14 \cdot 14$.  If you look back at the algorithm I described, you’ll realize that the conversion from a pair $(a_1 + 1, a_0 + 1)$ of trial data to a floor number is exactly $a_0 + \underline{a}_1(14)$.  This is very much apparently (to me) a “discrete base 14” representation of that floor number.

That’s a lot of talking about a silly problem, but it turns out that even bounded within an eggshell we can count ourselves the masters of infinite space, and unlike with nuts, there aren’t even any bad dreams.