Keith Burgun believes that the optimal win-rate for a game is 50%. This causes no end of debate every time he mentions it, which annoys me both because I mostly agree with him and because it’s the same conversation every time. Here I want to give my opinion, and by opinion I mean a very strict proof of a very specific thing that will hopefully inform, possibly entertain, and likely won’t stop everyone arguing. First, let me give my more formal hypothesis:
For any game that ends with a binary win or loss with probabilities p and 1-p respectively, the expected information gained by the player about the quality of their play is at a maximum when p = 0.5.
Here is an outline of how we will prove this. First we must give a formal definition of ‘information’. We will then use Shannon entropy to derive a formula for the average information gain of a random variable, and finally find the maximum of that function using differentiation.
Let’s start with the particular example we are interested in. Imagine we have a player about to start a game. Say we also have records of a large number of their past games and we know that they have won 80% of them. Before we know what they are going to do during the course of the game, is it reasonable to believe that we could assign an 80% probability to the event that they win? Given that win-rate, if I told you they won the game, how surprised would you be? How about if they lost the game, would you be more or less surprised? If they had a 100% chance of winning, how surprised would you be when they won? This idea of ‘amount of surprise’ is synonymous with the idea of information content. For an event with probability p, the information content I(p) has the following properties,
- I(p) is monotonically decreasing: as p increases I(p) decreases, and as p decreases I(p) increases.
- I(0) is undefined.
- I(p) is always greater than or equal to 0.
- I(1) = 0, events that are guaranteed to occur communicate no information.
- I(p1 p2) = I(p1) + I(p2), information due to independent events is additive.
It turns out that the following function has all these properties,
I(p) = log(1/p) = -log(p)
We will use this function in the next section as our measure of the information content of an event with probability p. The base of the logarithm must be some fixed number greater than 1, but beyond that it doesn’t actually matter, as it is always possible to convert between bases. For the rest of this article, assume unless stated otherwise that the base of our logarithms is 2, to keep the working simple and consistent.
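As a quick sanity check on these properties, here is a minimal sketch in Python (the helper name `info_content` is my own choice, not from the original article):

```python
import math

def info_content(p: float) -> float:
    """Information content (in bits) of an event with probability p."""
    if p <= 0 or p > 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Certain events communicate no information.
print(info_content(1.0))   # 0.0
# Rarer events are more surprising.
print(info_content(0.5))   # 1.0 bit
print(info_content(0.25))  # 2.0 bits
# Information due to independent events is additive:
print(math.isclose(info_content(0.5 * 0.25),
                   info_content(0.5) + info_content(0.25)))  # True
```

Note that I(0) stays undefined here: the function raises rather than returning a value at p = 0.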
Now that we can calculate the information content of an event, how can we calculate something similar for a random variable? If we want to find the average information content of a random variable, we will need to find the information content of each event and then weight it by the probability that event occurs. This average information content is known as Shannon entropy and is calculated as follows,

H(X) = Σ p(x) I(p(x)) = -Σ p(x) log(p(x))
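In code, this weighted average can be sketched as follows (the function name `entropy` is my own; probabilities of zero are skipped, following the usual convention that 0·log(0) = 0):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy in bits: each event's information content,
    weighted by the probability that the event occurs."""
    assert math.isclose(sum(probs), 1.0), "probabilities must sum to 1"
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A fair coin flip carries 1 bit on average.
print(entropy([0.5, 0.5]))         # 1.0
# Four equally likely outcomes carry 2 bits.
print(entropy([0.25] * 4))         # 2.0
# Uneven outcomes carry less than the uniform case.
print(entropy([0.5, 0.25, 0.25]))  # 1.5
```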
When a random variable X has only two possible outcomes with probabilities p and q, we can redefine q as q = 1-p. This allows us to rewrite the formula for the entropy of a random variable with only two outcomes using just the probability of one event as follows,

f(p) = -p log(p) - (1-p) log(1-p)
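To make this concrete, here is a hedged sketch comparing the 80% win-rate player from earlier against a 50% win-rate (the function name `binary_entropy` is my own):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary outcome with probabilities p and 1 - p."""
    if p == 0.0 or p == 1.0:
        return 0.0  # a guaranteed outcome communicates nothing
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.8))  # ~0.722 bits per game for the 80% win-rate player
print(binary_entropy(0.5))  # 1.0 bit per game

# A simple grid search already hints at where the peak sits.
best = max((i / 1000 for i in range(1, 1000)), key=binary_entropy)
print(best)  # 0.5
```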
Proof of Maximum
Now that we have a function of one variable for the information content of a random variable with only two possible outcomes, we want to know what probability gives us the highest average information content. It is possible to find the maximum of a function using differentiation. Differentiating a function f(x) gives us another function f'(x) which describes the gradient of the original function. As an example, for f(x)=2x, f'(x)=2, because the gradient of f(x) is a constant upwards slope. For any function f(p) with derivative f'(p) over a closed interval [a,b], the maximum value of f(p) must be at either,

- a point where f'(p) = 0,
- a point where f'(p) is undefined, or
- one of the endpoints, a or b.
If the gradient of a function at a point x is non-zero, the function is still either increasing or decreasing there. This means that if the gradient is positive we could increase x slightly and get a greater value, and if it is negative we could decrease x slightly and get a greater value. Because of this, we know the maximum value must be at a point where the gradient is 0 or undefined, or at an edge of our closed interval, as the boundary would stop us from increasing or decreasing x slightly.
Now the proof that the maximum entropy is at 0.5:
- Differentiate f(p) to find f'(p).
- Let f'(p)=0 and solve for p.
- Show that f(0.5) is the maximum over the interval [0, 1].
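The working for these steps can be written out as follows (a sketch of the standard derivation, using the identity d/dp[log2(p)] = 1/(p ln 2)):

```latex
f(p) = -p\log_2 p - (1-p)\log_2(1-p)

f'(p) = -\log_2 p - \frac{1}{\ln 2} + \log_2(1-p) + \frac{1}{\ln 2}
      = \log_2\!\left(\frac{1-p}{p}\right)

f'(p) = 0 \iff \frac{1-p}{p} = 1 \iff p = \tfrac{1}{2}
```

At the candidate points: f(1/2) = 1 bit, while at the endpoints f(0) = f(1) = 0 (and the endpoints are also the only places f'(p) is undefined). So p = 0.5 gives the maximum.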
From this we see that the maximum entropy is achieved when the probability of both outcomes is equal to 0.5. So, to bring this back to the point: the win-rate that communicates the most information on average to the player is a 50% win-rate. Extensions I may attempt to prove in the future are that you can increase information transmission by adding more possible outcomes than just win/loss, and that the closer the outcomes are to even probabilities, the greater the information transmission. This would suggest that a game with a finely tuned, exactly 50% win-rate would be less efficient at transmitting information than a game with 5 possible ending outcomes, each only roughly 20% likely.
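As a quick numerical check of that last claim, a uniform distribution over five outcomes carries log2(5) ≈ 2.32 bits, more than double the 1 bit a perfect 50% win-rate provides (a sketch in Python, with `entropy` again being my own helper name):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: the most a win/loss game can do
print(entropy([0.2] * 5))   # ~2.32 bits: five equally likely outcomes
```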