In this post, I use the two-envelope paradox as a tool to explore high-level properties of human probabilistic inference.

I was getting drinks with some friends of mine in the AI department the other day when the infamous two-envelope paradox came up. Most people were stumped by it, and I got to thinking about how the inherent difficulty of the problem for humans presents a good opportunity to analyze properties of human probabilistic inference, much as optical illusions allow us to investigate properties of human vision [1]. But before we explore what the problem tells us about human probabilistic inference, let's see the problem itself:

You're in middle school and your neighbor asks you to come by and help him mow his lawn. When you show up, he's not there, and you realize you'd forgotten to negotiate a price up front. Nonetheless, you decide to get started before he comes back rather than sit idly, and he arrives just about the time you finish. When you ask for payment, he decides to get clever. He sets out two indistinguishable white envelopes on a table, and tells you that you can pick one and take the cash in it. Having already done the work, you are in no position to negotiate, and walk towards the table to pick up one of the envelopes. But as you start walking to the table, your neighbor says, "I will tell you that one envelope has twice the amount of cash as the other. After you open the first one, you can switch if you want". You open the envelope on the left and find $10. But now you start eyeing the other envelope and imagine all the things you could do if it contained $20. But of course, you would be quite disappointed to leave with only $5. Assuming that you want to maximize your expected reward, the question is, should you switch?

*Argument a*: The argument for switching says that you had a 50% chance of picking up the
envelope with the greater sum of money, and a 50% chance of picking up the
one with the lesser sum. Thus, there is a 50% chance you will get $20 by switching,
and a 50% chance you'll get $5. That means the expected value if you switch is
$12.50. If you stay with your current envelope, you'll leave with a guaranteed
$10, so switching is a no-brainer, right?
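The arithmetic behind argument (a) can be written out as a small sketch. The function name `naive_switch_value` is made up for illustration, and the 50/50 split is the assumption argument (a) makes, not an established fact about the problem.

```python
def naive_switch_value(observed):
    """Expected value of the other envelope under argument (a)'s assumption."""
    # Assumption taken from argument (a): the other envelope is equally
    # likely to hold double or half of the observed amount.
    return 0.5 * (2 * observed) + 0.5 * (observed / 2)


print(naive_switch_value(10))  # 12.5
```

Note that under this assumption the result is always 1.25 times the observed amount, whatever that amount happens to be.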

*Argument b*: But something goes wrong if we assume
that argument (a) is true. Imagine that instead, you had picked up
the envelope on the right first, and opened it to find

So which is it? Does switching increase the expected value of your payday over sticking to your guns?

You'll appreciate what follows more if you try to answer this question yourself. I didn't figure out what was going on with this paradox for more than a month after I first heard it in undergrad, so don't worry if you're less than 100% confident in your answer.

The above is actually a trick question. While "not switching" is in some sense a better answer, because it at least avoids the flawed argument for switching, the real answer is that the problem itself is not well posed - there is simply not enough information to make a choice.

To see why, let's add some information and see how the problem would change. Suppose you had prior knowledge that one envelope contained $5, and the other $10. After opening the left envelope (with $10 in it), you would know the other envelope contains $5 and choose not to switch. With this simple prior, the problem becomes easy. But there are many other prior distributions one could imagine as well. Prior knowledge could have told you that the lower envelope has either $10 or $20 with equal probability, and the higher envelope $20 or $40. In this case, conditioned on observing that the first envelope opened had $20, you would switch (why?). If it had $40, you wouldn't switch.
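The two priors described above can be checked with a short calculation. This is a minimal sketch, assuming the prior is given as a dictionary over the lesser amount, the greater amount is always exactly double, and you pick either envelope with probability 1/2. The name `analyze_switch` is hypothetical.

```python
def analyze_switch(prior_lesser, observed):
    """Return (P(you hold the lesser envelope | observed), E[other envelope]).

    prior_lesser maps each possible lesser amount to its prior probability.
    """
    # Weight of "I picked the lesser envelope and it contains `observed`".
    w_lesser = 0.5 * prior_lesser.get(observed, 0.0)
    # Weight of "I picked the greater envelope", so the lesser is observed / 2.
    w_greater = 0.5 * prior_lesser.get(observed / 2, 0.0)
    total = w_lesser + w_greater
    if total == 0:
        raise ValueError("observed amount is impossible under this prior")
    p_lesser = w_lesser / total
    ev_other = p_lesser * (2 * observed) + (1 - p_lesser) * (observed / 2)
    return p_lesser, ev_other


# First prior: the envelopes are known to hold $5 and $10.
print(analyze_switch({5: 1.0}, 10))             # (0.0, 5.0): stay
# Second prior: the lesser amount is $10 or $20 with equal probability.
print(analyze_switch({10: 0.5, 20: 0.5}, 20))   # (0.5, 25.0): switch
print(analyze_switch({10: 0.5, 20: 0.5}, 40))   # (0.0, 20.0): stay
```

With the second prior, seeing $20 leaves both pairs equally plausible, so the other envelope is worth $25 on average. Seeing $40 can only mean you already hold the greater envelope.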

Given either of these specific priors, we can see that the
decision to switch is dependent on the observed value. When we opened the left
envelope and found

Thus, given any prior distribution *posterior* probability,

So the biggest problem with argument (a) appears to be that it uses the prior probabilities of having chosen
the lesser or the greater envelope, rather than the posterior probabilities. But how is it that
the brain, which is supposed to be so smart, can make such a simple mistake? The answer is that
the brain *does not* make this mistake, but instead gets tripped up on something more subtle -
it uses a common inference technique for dealing with unspecified priors, which happens to go
horribly wrong for this adversarially designed problem.

To keep things simple, we'll assume our probability distributions are discrete, but a slight modification on the following
analysis would also work for continuous distributions. From our uniform prior assumption, and the fact that both distributions
can take the same cardinality of distinct values, we find that all observable values are equally
likely in either the greater or lower distributions, i.e. $P(x \mid \text{lesser}) = P(x \mid \text{greater})$ for every observable value $x$.

So the brain fails because it applies the implicit uniform prior jointly over two related distributions at the same time. Under this model, it does the usual Bayesian inference, but the nonsense result occurs because the earlier technique is inappropriate in this probabilistic illusion - this scenario is specifically constructed to play games with the techniques for dealing with missing information that normally work quite well.
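One way to see where the joint uniform assumption breaks is to make it proper by truncating it. The sketch below assumes, purely for illustration and not from the discussion above, that the lesser amount is uniform over the powers of two 1, 2, 4, ..., 2^(n-1). The name `truncated_uniform_analysis` is hypothetical.

```python
from fractions import Fraction


def truncated_uniform_analysis(n_values):
    """Exact switch analysis for a lesser amount uniform over 1, 2, ..., 2**(n_values - 1).

    Returns (per_value, total_gain): per_value[x] is
    (P(holding the lesser envelope | observed x), expected other amount),
    and total_gain is the expected gain from always switching.
    """
    lesser_support = {2 ** k for k in range(n_values)}
    p_pair = Fraction(1, n_values)  # uniform prior over the lesser amount
    half = Fraction(1, 2)           # chance of picking either envelope
    per_value = {}
    total_gain = Fraction(0)
    for x in sorted(lesser_support | {2 * v for v in lesser_support}):
        w_lesser = half * p_pair if x in lesser_support else Fraction(0)
        w_greater = (half * p_pair
                     if x % 2 == 0 and x // 2 in lesser_support
                     else Fraction(0))
        p_x = w_lesser + w_greater
        p_lesser = w_lesser / p_x
        ev_other = p_lesser * (2 * x) + (1 - p_lesser) * Fraction(x, 2)
        per_value[x] = (p_lesser, ev_other)
        total_gain += p_x * (ev_other - x)
    return per_value, total_gain


per_value, total_gain = truncated_uniform_analysis(4)
for x, (p_lesser, ev_other) in per_value.items():
    print(x, p_lesser, ev_other)
print("expected gain from always switching:", total_gain)
```

For every interior value the posterior is exactly 1/2 and the other envelope is worth 1.25 times the observed amount, which matches argument (a). At the largest possible value you certainly hold the greater envelope, and the loss from switching there cancels all the other gains, so the expected gain from always switching is zero. The 50/50 posterior can only hold for every value if the uniform prior has no boundary, which no proper distribution can satisfy.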

Second, we can see that in spite of our base tendency to incorporate intuition from context into problems, we can actually take natural language instructions to formulate more specific notions of what a prior should be. When posed with the above problem, almost no one makes the argument to switch or stay based on the nuances of being a middle schooler mowing a lawn. We actually parse the instructions as telling us that there is a particular type of prior, and try to use that to solve the problem. Parsing priors that others specify through natural language is clearly a valuable skill, and we don't have anything quite like this yet in artificial intelligence (although, see similar work that has explored related issues). Other well-known inference illusions in psychology have even exploited the same idea to create problems that trick the brain into falsely forming uniform priors over hypotheses that actually have logical constraints implied by their descriptions.

Finally, the fact that the brain can even be tricked here at all implies that it is not using traditional Bayesian statistics in the ordinary computer science sense of instantiating a prior, computing a posterior (even if approximately), and then producing an answer. The existence of more abstract or symbolic shortcuts to proper Bayesian inference means that many problems might be more efficiently approached by more direct discriminative methods. Just because Bayesian inference is correct doesn't mean it's computationally efficient enough to be useful. The most appropriate representation of data and computation for probabilistic inference in AI systems thus remains a largely open question. But the two-envelope paradox illuminates at least some of the challenges that we will face in constructing machines with the ability to solve higher-level inference tasks.

[2] Computing the distribution over the greater envelope given a distribution over the lesser envelope has a little nuance. Given a discrete distribution over the lesser envelope, we can simply say for every value

[3] Although, surprisingly, there do exist proper distributions that motivate always switching. One can prove, though, that all such distributions have an infinite expected value.
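As an illustration of footnote [3], here is a sketch of one well-known construction of such a distribution. It is an assumption brought in for illustration, not something specified in this post: the lesser envelope holds 2^k with probability (1/3)(2/3)^k for k = 0, 1, 2, .... The function names are hypothetical.

```python
from fractions import Fraction


def p_lesser_amount(k):
    """Prior probability that the lesser envelope holds 2**k (k = 0, 1, 2, ...)."""
    return Fraction(1, 3) * Fraction(2, 3) ** k


def other_envelope_ratio(k):
    """E[other envelope | observed 2**k] divided by the observed amount."""
    x = 2 ** k
    # The 1/2 chance of picking either envelope cancels in the posterior.
    w_lesser = p_lesser_amount(k)
    w_greater = p_lesser_amount(k - 1) if k >= 1 else Fraction(0)
    p_lesser = w_lesser / (w_lesser + w_greater)
    ev_other = p_lesser * (2 * x) + (1 - p_lesser) * Fraction(x, 2)
    return ev_other / x


def partial_mean(n_terms):
    """Partial sum of the expected lesser amount over k = 0 .. n_terms - 1."""
    return sum(p_lesser_amount(k) * 2 ** k for k in range(n_terms))


print([other_envelope_ratio(k) for k in range(4)])
print([float(partial_mean(n)) for n in (10, 20, 40)])
```

Under this prior the other envelope is worth twice the observed amount when you see 1, and 11/10 of it for every larger value, so switching always looks favorable. The partial sums of the mean equal (4/3)^n - 1 and grow without bound, which is the infinite expected value the footnote refers to.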