Ridere, ludere, hoc est vivere.

Monday, October 15, 2018

Notes on games with mixed strategies


In an earlier post on the theory of simultaneous-move games, I concluded with an example of a game between two tennis players that has no Nash equilibrium in pure strategies. Sam Hillier: Consulting Philosopher more recently elaborated on the topic with an excellent post on mixed strategies. I had approached the question as an equilibrium for a single tennis shot and concluded that none existed. A tennis match of course includes many shots, so players have an opportunity to mix their shots and defenses between the two options with chosen weights.
To recap, the two-player zero-sum simultaneous-move game represents a tennis volley in which one player - the defender - decides whether to defend a shot down the line (D) or cross court (C) at the same time the opponent - the shooter - decides which type of shot to attempt. The payoff is depicted as the probability of the shot scoring a point in a 2x2 matrix cross-referencing each player's decision with the other's. The shooter seeks to maximize that result; the defender wants to minimize it.
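As a quick check on that recap, here is a minimal Python sketch of the 2x2 game. The payoff values are the ones that appear in the expected-value formulas in this post; the names PAYOFF, pure_maximin, pure_minimax, and has_pure_equilibrium are my own labels for illustration. It compares the best score the shooter can guarantee with a single pure shot against the lowest score the defender can enforce with a single pure defense. If the two differ, no pure-strategy equilibrium exists.

```python
# Payoffs are the probability that the shooter scores.
# Keys are (shooter's choice, defender's choice); C = cross court, D = down the line.
PAYOFF = {
    ("C", "C"): 0.2,
    ("C", "D"): 0.9,
    ("D", "C"): 0.8,
    ("D", "D"): 0.5,
}
CHOICES = ("C", "D")


def pure_maximin(payoff):
    """Best score the shooter can guarantee by committing to one pure shot."""
    return max(min(payoff[(s, d)] for d in CHOICES) for s in CHOICES)


def pure_minimax(payoff):
    """Lowest score the defender can enforce by committing to one pure defense."""
    return min(max(payoff[(s, d)] for s in CHOICES) for d in CHOICES)


def has_pure_equilibrium(payoff):
    """In a zero-sum game, a pure equilibrium exists only if the two values coincide."""
    return pure_maximin(payoff) == pure_minimax(payoff)


print(pure_maximin(PAYOFF), pure_minimax(PAYOFF), has_pure_equilibrium(PAYOFF))
# prints: 0.5 0.8 False
```

The gap between 0.5 and 0.8 is the room that mixed strategies fill.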

Sam stipulated that in a game of mixed strategies between C and D, “the server will shoot C 30% of the time … while the defender will defend against C 40% of the time,” but didn't explain where those mixed-strategy results came from. A little arithmetic produces them, and it follows from approaching the problem as an attempt to “minimize the maximum loss.”

Suppose p represents the frequency with which the shooter will select C. The shooter wants to maximize the worst-case score, whatever the defender chooses. If the defender chooses C, then the shooter’s expected value of the result will be

EC = 0.2p + 0.8(1-p)

If the defender chooses D, the shooter’s expected result is

ED = 0.9p + 0.5(1-p)

To keep the defender from exploiting whichever defense yields the lower score, the shooter looks for the frequency p that gives the same result against either defense, by setting the expected values equal to each other:

EC = ED

0.2p + 0.8(1-p) = 0.9p + 0.5(1-p)

p = (0.8 – 0.5) / (0.9 – 0.5 – 0.2 + 0.8)

= 0.3/1.0

= 30%
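For anyone who wants the step between the equation and its solution spelled out, here is the same algebra in general form. The symbols a, b, c, and d are my own shorthand, not part of the original notation: a = 0.2 (shoot C, defend C), b = 0.9 (shoot C, defend D), c = 0.8 (shoot D, defend C), and d = 0.5 (shoot D, defend D).

```latex
\begin{align*}
a\,p + c\,(1-p) &= b\,p + d\,(1-p) \\
c - d &= (b - d - a + c)\,p \\
p &= \frac{c - d}{b - d - a + c}
   = \frac{0.8 - 0.5}{0.9 - 0.5 - 0.2 + 0.8}
   = 0.3
\end{align*}
```

With the numbers substituted directly, the middle step reads 0.8 - 0.6p = 0.5 + 0.4p, so 0.3 = 1.0p.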

This result means that if the shooter chooses C 30% of the time, their expected score will be

EC = 0.2(0.3) + 0.8(1-0.3) = 0.62
or
ED = 0.9(0.3) + 0.5(1-0.3) = 0.62

regardless of the defender’s choice. If the shooter chooses C less frequently (i.e. p < 30%), the defender can always choose D and cause a lower shooter score. Similarly, if the shooter chooses C more frequently, the defender can always choose C and again cause a lower shooter score. So, the shooter’s best mixed strategy is to choose C 30% of the time.
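The argument above can be checked numerically. The following is a rough sketch rather than a general solver: shooter_scores, worst_case, and p_star are hypothetical names, and the formulas are the EC and ED expressions from this post. It prints the shooter's expected score against each defense for several values of p, along with the worst case the defender can force.

```python
def shooter_scores(p):
    """Shooter's expected score against each pure defense when shooting C with frequency p."""
    e_c = 0.2 * p + 0.8 * (1 - p)  # defender covers C
    e_d = 0.9 * p + 0.5 * (1 - p)  # defender covers D
    return e_c, e_d


def worst_case(p):
    """Score the shooter is left with if the defender picks the better defense against p."""
    return min(shooter_scores(p))


# Equalizing frequency from the derivation in this post.
p_star = (0.8 - 0.5) / (0.9 - 0.5 - 0.2 + 0.8)

for p in (0.1, 0.2, p_star, 0.4, 0.5):
    e_c, e_d = shooter_scores(p)
    print(f"p={p:.2f}  EC={e_c:.3f}  ED={e_d:.3f}  worst case={worst_case(p):.3f}")

# Coarse grid search over p = 0.00, 0.01, ..., 1.00 for the best worst case.
best_percent = max(range(101), key=lambda k: worst_case(k / 100))
print("best p on the grid:", best_percent / 100)
```

Below 30% the ED column is the smaller one, above 30% the EC column is, and the worst case peaks where the two meet.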

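A similar analysis, in which the defender tries to minimize the maximum possible shooter score, gives the frequency q with which the defender should choose C. Setting the shooter’s expected score from shooting C, 0.2q + 0.9(1-q), equal to the expected score from shooting D, 0.8q + 0.5(1-q), yields

q = (0.9 – 0.5) / (0.8 – 0.5 – 0.2 + 0.9)

= 0.4/1.0

= 40%

so the defender should defend against C 40% of the time.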
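The defender's side can be checked the same way. This is a small sketch under the same payoff values, with q taken as the frequency with which the defender covers C; defender_exposure and q_star are illustrative names of my own.

```python
def defender_exposure(q):
    """Shooter's expected score for each pure shot when the defender covers C with frequency q."""
    shoot_c = 0.2 * q + 0.9 * (1 - q)  # shooter goes cross court
    shoot_d = 0.8 * q + 0.5 * (1 - q)  # shooter goes down the line
    return shoot_c, shoot_d


# Equalizing frequency for the defender.
q_star = (0.9 - 0.5) / (0.8 - 0.5 - 0.2 + 0.9)

shoot_c, shoot_d = defender_exposure(q_star)
print(f"q={q_star:.2f}  shoot C={shoot_c:.3f}  shoot D={shoot_d:.3f}")

# Any other q leaves one shot more profitable for the shooter.
for q in (0.2, 0.6):
    print(f"q={q:.2f}  best shot scores {max(defender_exposure(q)):.3f}")
```

At q = 40% both shots score about 0.62, the same figure the shooter can guarantee at p = 30%, so neither player gains by deviating.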

Sam also discusses the idea of Evolutionary Game Theory, which interests me from the standpoint that superior strategies can emerge in a population of game players without necessarily invoking explicit analysis. In fact, most games are sufficiently complex that exhaustive analysis is not possible, and the evolutionary strategies that emerge may demonstrate consistent success without a complete understanding of why they succeed. I expect Dr. Wictz and I will discuss that topic at some point in the future as well.
