Ridere, ludere, hoc est vivere. (To laugh, to play: that is to live.)

Monday, September 24, 2018

Notes on simultaneous-move games, and an exploration of the Stag Hunt

Some time ago, Dr. Wictz and I, as a design team, started discussing the book Games of Strategy by Dixit, Skeath, and Reiley, and I have written about a couple of its topics in earlier posts.
In this post, I'd like to address simultaneous-move games with a specific focus on pure discrete strategies. (We recorded our discussion on this topic in April of last year.)  I recall such games represented in my earliest readings on game theory in the form of a decision-payoff matrix.  In a two-player game in which each player makes a single decision from among a finite number of choices, without knowledge of the other player's decision, the decision-payoff matrix labels the rows with one player's options and the columns with the other player's options.  The corresponding cell for a given combination of decisions yields the payoff to both players.

These are the purest simultaneous-move discrete-strategy games - each player faces a single decision based on known payoff results of all possible decisions but without knowing the decision of the other player beforehand. A payoff matrix cross-indexing each player's decision options to yield their payoffs can represent the game in an abstract, analyzable way. For example, suppose the two players are hunters, and each has the option to cooperate to pursue a stag, or to go off independently to hunt a hare. This Stag Hunt game is discussed nicely by Sam Hillier on his Consulting Philosopher blog, which inspired me to explore it a little more closely in the context of our game theory book study. In the matrix below, I represent the decision options of each player as 'S' for "Stag" or 'H' for "Hare." Let's assume that if both players hunt for the stag, each is rewarded with enough meat for three days, but if one hunts for the stag alone, he receives nothing. A player who hunts for a hare is rewarded with enough meat for himself alone for one day.
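To make the matrix concrete, here is a minimal Python sketch that stores the Stag Hunt as a dictionary keyed by each pair of decisions. The names STAG_HUNT and payoffs are my own illustrative choices, and the values simply restate the assumptions above: three days of meat each for a shared stag, nothing for a lone stag hunter, and one day for a hare.

```python
# 'S' = hunt the stag, 'H' = hunt a hare.
# Each entry maps (A's choice, B's choice) -> (A's payoff, B's payoff),
# in days of meat, following the assumptions described above.
STAG_HUNT = {
    ("S", "S"): (3, 3),  # both pursue the stag: three days each
    ("S", "H"): (0, 1),  # A hunts the stag alone and gets nothing
    ("H", "S"): (1, 0),  # B hunts the stag alone and gets nothing
    ("H", "H"): (1, 1),  # both settle for a hare: one day each
}


def payoffs(game, a_choice, b_choice):
    """Look up the (Player A, Player B) payoffs for one combination of decisions."""
    return game[(a_choice, b_choice)]


# Print the matrix with A's options as rows and B's options as columns.
print("A \\ B", *(f"{b:>8}" for b in "SH"))
for a in "SH":
    print(f"{a:<5}", *(f"{str(payoffs(STAG_HUNT, a, b)):>8}" for b in "SH"))
```

Keying on the pair of decisions mirrors the row/column lookup of the printed matrix, and the same structure works unchanged for games with more than two options per player.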

Stag Hunt payoff matrix
In each cell of the payoff matrix, the upper right value is Player B's payoff, and the lower left is Player A's. (I've color-coded them for clarity.) Both players are better off if they both hunt the stag, and neither gains by switching to a hare alone. But if one fears that the other will hunt for a hare, he is better off abandoning the stag and hunting for a hare himself. Thus the 'S' strategy depends on trust: (S, S) and (H, H) are both stable outcomes, but in the absence of any communication, a cautious player may well settle for the hare rather than risk going hungry.
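One way to see why trust matters: if a hunter believes the other will go for the stag with probability p, hunting the stag is worth 3p days of meat in expectation, while the hare is worth 1 day for certain. The sketch below finds the break-even belief; the function names, and the choice to break an exact tie in favor of the safe hare, are my own assumptions.

```python
from fractions import Fraction

STAG_SHARE = 3  # days of meat each hunter gets if both hunt the stag
HARE = 1        # days of meat from a hare, regardless of the other hunter


def expected_stag(p):
    """Expected payoff of hunting the stag if the other hunter joins with probability p."""
    return p * STAG_SHARE + (1 - p) * 0


def best_choice(p):
    """'S' if the stag is worth strictly more in expectation than the hare, else 'H'."""
    return "S" if expected_stag(p) > HARE else "H"


# The break-even belief solves 3p = 1.
threshold = Fraction(HARE, STAG_SHARE)
print(threshold)  # 1/3
for p in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
    print(p, expected_stag(p), best_choice(p))
```

Exact fractions avoid floating-point noise right at the threshold, where the two choices are worth exactly the same.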

Prisoners' Dilemma payoff matrix
People are generally more familiar with the Prisoners' Dilemma, which is similar, but not the same. Two prisoners kept in isolation each face charges of conspiracy to murder (carrying a sentence of 25 years) and kidnapping (3 years). The police have sufficient evidence to convict both of kidnapping but need testimony from one to convict the other of murder. The police offer each prisoner a deal: testify against the other, and the murder charge will be dropped and the kidnapping sentence reduced to one year. But if both prisoners testify, each will receive ten years. Here sentences are represented by negative numbers, so each player seeks the highest (least negative) payoff, that is, the shortest sentence.

In the interest of comparison, let's normalize the Prisoners' Dilemma to payoffs comparable to the Stag Hunt. I'll map values as follows:
  • -25 => 0
  • -10 => 1
  • -3 => 2
  • -1 => 3
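Here is the same normalization as a short Python sketch. The 'C' (stay silent) and 'D' (testify) labels are my own, and the sentence table is my reconstruction from the description above: silence by both means kidnapping convictions only, a lone testifier gets one year while the other gets the 25-year murder sentence, and mutual testimony costs ten years each.

```python
# Sentences in years, negated so that a larger number is better.
# 'C' = stay silent (cooperate with the other prisoner), 'D' = testify (defect).
PD_SENTENCES = {
    ("C", "C"): (-3, -3),    # both convicted of kidnapping only
    ("C", "D"): (-25, -1),   # A stays silent and is convicted of murder; B testifies
    ("D", "C"): (-1, -25),   # A testifies; B stays silent and is convicted of murder
    ("D", "D"): (-10, -10),  # both testify
}

# The mapping from the list above: worst sentence -> 0, ..., best sentence -> 3.
NORMALIZE = {-25: 0, -10: 1, -3: 2, -1: 3}

PD_NORMALIZED = {
    cell: tuple(NORMALIZE[years] for years in sentences)
    for cell, sentences in PD_SENTENCES.items()
}

for cell, values in PD_NORMALIZED.items():
    print(cell, values)
```

Because the mapping preserves the order of the sentences, it changes the numbers without changing any player's preferences, so the strategic structure of the game is unaffected.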
Now let's look at the normalized Prisoners' Dilemma in comparison to the Stag Hunt:
Normalized comparison of Stag Hunt and Prisoners' Dilemma
The important difference here is that each player in the Stag Hunt is motivated to cooperate in hunting for the stag if they believe their counterpart will do the same, whereas in the Prisoners' Dilemma, each player is better off defecting and testifying against their counterpart even if they believe their counterpart would honor an agreement not to testify. Ironically, both players would be better off if they both cooperated, i.e., if neither testified against the other, but the payoff matrix makes that outcome unstable: it is not an equilibrium.
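The contrast can be checked mechanically by computing Player A's best response to each of Player B's choices in both normalized games (the games are symmetric, so Player B's best responses mirror A's). This is a minimal sketch; the function name is my own, and ties are not handled because neither game has any.

```python
STAG_HUNT = {("S", "S"): (3, 3), ("S", "H"): (0, 1),
             ("H", "S"): (1, 0), ("H", "H"): (1, 1)}
# Normalized Prisoners' Dilemma: 'C' = stay silent, 'D' = testify.
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def best_response_a(game, b_choice):
    """Player A's payoff-maximizing choice against a fixed choice by Player B."""
    a_options = sorted({a for a, _ in game})
    return max(a_options, key=lambda a: game[(a, b_choice)][0])


for name, game in (("Stag Hunt", STAG_HUNT), ("Prisoners' Dilemma", PD)):
    b_options = sorted({b for _, b in game})
    print(name, {b: best_response_a(game, b) for b in b_options})
```

In the Stag Hunt, A's best response copies B's choice; in the Prisoners' Dilemma it is 'D' whatever B does.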

In a later post I hope to explore the significance of communication, or signaling, among players in a game. Clearly players are motivated to communicate and foster trust in the Stag Hunt, whereas the Prisoners' Dilemma fosters suspicion and is ripe for betrayal.

Stag Hunt in a winner-take-all format
There was some discussion of the Stag Hunt on the State of Games podcast as well, but they described the game in a winner-take-all format: the payoff for catching the stag went only to the player who delivered the killing blow, in the form of some reward or recognition from the reigning noble. Now the payoff matrix must be tempered by each player's estimate of their probability of winning the contest. If the prize is worth the same three days of meat and Player A puts their probability of killing the stag below one in three (about 33%), then the expected value of joining the stag hunt falls below 1, the value of a hare, and their decision matrix looks more like a Prisoners' Dilemma. Player A is better off hunting for a hare, and Player B, who can't catch the stag alone, is left with the same result.

In general, when a given strategy is better for a player regardless of what action the opponent takes, that strategy is said to be dominant. Any strategy that is worse than some other strategy no matter what the opponent does is a dominated strategy. In the winner-take-all Stag Hunt, where one player's expected reward for the stag hunt is less than the hare, the hare becomes that player's dominant strategy. The opponent's best response to that dominant strategy (to give up the stag and hunt for a hare as well) completes the Nash equilibrium of the game: the outcome that neither player can improve for themselves unilaterally.
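The dominance argument can be checked with a small sketch. It assumes, as above, a prize of three days of meat to whichever hunter lands the killing blow, with Player A winning a joint hunt with probability q; the function names are hypothetical. A strategy is reported as dominant only if it is strictly better against every choice the opponent could make.

```python
from fractions import Fraction

STAG_PRIZE = 3  # assumed: days of meat for whoever lands the killing blow
HARE = 1        # days of meat from a hare


def winner_take_all(q):
    """Stag Hunt in which A wins a joint stag hunt with probability q (B with 1 - q).

    Maps (A's choice, B's choice) -> (A's expected payoff, B's expected payoff).
    """
    return {
        ("S", "S"): (q * STAG_PRIZE, (1 - q) * STAG_PRIZE),
        ("S", "H"): (0, HARE),  # a lone stag hunter still gets nothing
        ("H", "S"): (HARE, 0),
        ("H", "H"): (HARE, HARE),
    }


def dominant_for_a(game):
    """Return Player A's strictly dominant strategy, or None if A has none."""
    a_opts = sorted({a for a, _ in game})
    b_opts = sorted({b for _, b in game})
    for s in a_opts:
        # s dominates if it beats every other option t against every choice b.
        if all(game[(s, b)][0] > game[(t, b)][0]
               for t in a_opts if t != s for b in b_opts):
            return s
    return None


def best_response_b(game, a_choice):
    """Player B's payoff-maximizing choice against a fixed choice by Player A."""
    b_opts = sorted({b for _, b in game})
    return max(b_opts, key=lambda b: game[(a_choice, b)][1])


game = winner_take_all(Fraction(1, 4))
print(dominant_for_a(game))        # H
print(best_response_b(game, "H"))  # H, so (H, H) is the equilibrium
print(dominant_for_a(winner_take_all(Fraction(1, 2))))  # None
```

With q = 1/2 the stag is worth 3/2 in expectation against a cooperating partner, so neither option dominates and the game reverts to an ordinary Stag Hunt for Player A.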

More complex simultaneous-move games (those with more than two choices for each player) can be simplified by eliminating dominated strategies from consideration. Best-response analysis consists of each player identifying their best response to each of the opponent's strategies. Any strategy combination in which each player's choice is a best response to the other's is a Nash equilibrium, even if neither strategy is dominant.
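Best-response analysis translates directly into code. The sketch below (the function name is my own) marks each player's best responses and keeps the cells where they coincide; it works for any number of choices per player and treats a tie as a best response. Applied to the two normalized games from earlier, it finds both pure equilibria of the Stag Hunt, (H, H) and (S, S), and the single mutual-testimony equilibrium of the Prisoners' Dilemma.

```python
def pure_nash_equilibria(game):
    """Best-response analysis for a two-player game with finitely many pure strategies.

    game maps (A's choice, B's choice) -> (A's payoff, B's payoff). A cell is a
    Nash equilibrium when each choice is a best response to the other.
    """
    a_opts = sorted({a for a, _ in game})
    b_opts = sorted({b for _, b in game})
    # A's best responses to each of B's choices (a set, so ties are kept).
    best_a = {b: {a for a in a_opts
                  if game[(a, b)][0] == max(game[(x, b)][0] for x in a_opts)}
              for b in b_opts}
    # B's best responses to each of A's choices.
    best_b = {a: {b for b in b_opts
                  if game[(a, b)][1] == max(game[(a, y)][1] for y in b_opts)}
              for a in a_opts}
    return [(a, b) for a in a_opts for b in b_opts
            if a in best_a[b] and b in best_b[a]]


STAG_HUNT = {("S", "S"): (3, 3), ("S", "H"): (0, 1),
             ("H", "S"): (1, 0), ("H", "H"): (1, 1)}
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

print(pure_nash_equilibria(STAG_HUNT))  # [('H', 'H'), ('S', 'S')]
print(pure_nash_equilibria(PD))         # [('D', 'D')]
```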

Some time ago, I took a game-theory approach to a three-player game, which involved multiple decision matrices among the players. The same general principles apply there: best-response analysis identifies the strategy combinations that form a Nash equilibrium, as I found in my analysis of that three-player case.

Coffee Shop 'C' Preferred By Both
Some games can have multiple equilibria. "Will Harry Meet Sally" supposes that two people agree to meet at a coffee shop but don't specify which of their two favorite coffee shops. If they are unable to communicate, each will have to decide where to show up and hope that the other is there. This situation is called a pure coordination game. If there is a mutually agreeable reason to prefer one coffee shop over the other, and both players know that the other knows it is preferred, that preference is a focal point that enables coordination even in the absence of communication between the players.

I happened to notice that this payoff matrix looks like that of the Prisoners' Dilemma (or even the Stag Hunt) except that neither player is rewarded for defecting unilaterally. Both are purely motivated to coordinate their decisions.
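A quick check shows why a focal point is needed here: the game has two equilibria. The payoff numbers in this sketch are hypothetical stand-ins, chosen only so that both players prefer meeting at 'C' to meeting at 'D', and either meeting to missing each other; the helper name is also my own.

```python
# Hypothetical payoffs: meeting at 'C' is worth 2 to each player, meeting at
# 'D' is worth 1, and ending up at different shops is worth 0.
MEET = {("C", "C"): (2, 2), ("C", "D"): (0, 0),
        ("D", "C"): (0, 0), ("D", "D"): (1, 1)}


def is_equilibrium(game, a, b):
    """True if neither player can gain by switching alone away from (a, b)."""
    a_opts = {x for x, _ in game}
    b_opts = {y for _, y in game}
    return (all(game[(a, b)][0] >= game[(x, b)][0] for x in a_opts)
            and all(game[(a, b)][1] >= game[(a, y)][1] for y in b_opts))


equilibria = sorted(cell for cell in MEET if is_equilibrium(MEET, *cell))
print(equilibria)  # [('C', 'C'), ('D', 'D')]

# The game is symmetric, so both players rank the equilibria the same way;
# the one they both prefer is the natural focal point.
focal = max(equilibria, key=lambda cell: MEET[cell][0])
print(focal)  # ('C', 'C')
```

Nothing in the equilibrium check itself favors (C, C); the selection comes from the shared preference, which is exactly the role the focal point plays.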

Sally likes 'C' better; Harry prefers 'D'
A variation on "Will Harry Meet Sally" is called the "Battle of the Sexes," in which each prefers a different coffee shop but both would rather be together at either one than alone. I wrote about a sequential version of "Battle of the Sexes"; this simultaneous version requires the two parties to find a focal point so that they can independently arrive at the same coffee shop.

"Chicken" is a different kind of coordination game in which players drive cars straight toward each other, and the payoff goes to the player who stays the course if the other swerves to avoid the collision. The penalty if both stay the course, however, is severe. Coordination games like these typically motivate communication, or signaling, to influence an opponent's decision. I expect to discuss signaling and screening in later posts.
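Both of these games also have two pure-strategy equilibria, which is why signaling matters so much in them. The sketch below uses hypothetical payoffs of my own, not the ones in the matrices above: in the Battle of the Sexes, Sally (A) gets 2 and Harry (B) gets 1 for meeting at 'C', the reverse at 'D', and each gets 0 for missing the other; in Chicken, swerving against a driver who holds course costs 1, holding course against a swerver wins 1, and a collision costs 10 each.

```python
def is_equilibrium(game, a, b):
    """True if neither player can gain by switching alone away from (a, b)."""
    a_opts = {x for x, _ in game}
    b_opts = {y for _, y in game}
    return (all(game[(a, b)][0] >= game[(x, b)][0] for x in a_opts)
            and all(game[(a, b)][1] >= game[(a, y)][1] for y in b_opts))


# Hypothetical Battle of the Sexes: Sally (A) prefers 'C', Harry (B) prefers 'D'.
BATTLE = {("C", "C"): (2, 1), ("C", "D"): (0, 0),
          ("D", "C"): (0, 0), ("D", "D"): (1, 2)}

# Hypothetical Chicken payoffs.
CHICKEN = {("swerve", "swerve"): (0, 0), ("swerve", "straight"): (-1, 1),
           ("straight", "swerve"): (1, -1), ("straight", "straight"): (-10, -10)}

for name, game in (("Battle of the Sexes", BATTLE), ("Chicken", CHICKEN)):
    print(name, sorted(c for c in game if is_equilibrium(game, *c)))
```

In the Battle of the Sexes, the two equilibria are the two meetings, and each player prefers a different one. In Chicken, they are the two outcomes where exactly one driver swerves, so each driver wants to convince the other that they will not be the one to swerve.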

No equilibrium
Finally, a game can have no pure-strategy equilibrium at all, such as a tennis volley in which one player has to decide whether to defend a shot down the line or cross court at the same time the opponent is deciding which type of shot to attempt. If the payoff is depicted as the probability of the shot scoring a point (in which case the defender prefers as low a result as possible), there is no combination of strategies that one player or the other cannot improve upon. Best-response analysis shows that no strategy combination is a best response for both players at once.
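The cycling can be seen in a short sketch. The success percentages below are hypothetical values of my own, chosen so that the hitter does better aiming away from whichever side the defender covers; the function names are also my own. Checking every cell finds no pure-strategy equilibrium, and alternating best responses just chase each other around the matrix.

```python
# Hypothetical chances (in percent) that the shot wins the point.
# Keys are (hitter's shot, side the defender covers);
# DL = down the line, CC = cross court.
SCORE = {("DL", "DL"): 50, ("DL", "CC"): 80,
         ("CC", "DL"): 90, ("CC", "CC"): 20}
SHOTS = ("DL", "CC")


def best_shot(cover):
    """The hitter's best response: the shot most likely to score against this cover."""
    return max(SHOTS, key=lambda shot: SCORE[(shot, cover)])


def best_cover(shot):
    """The defender's best response: the cover that minimizes the shot's success."""
    return min(SHOTS, key=lambda cover: SCORE[(shot, cover)])


# A cell is a pure-strategy equilibrium only if each choice answers the other.
print([(s, c) for s in SHOTS for c in SHOTS
       if best_shot(c) == s and best_cover(s) == c])  # []

# Alternating best responses never settle down; they cycle around the matrix.
shot, cover = "DL", "DL"
for _ in range(4):
    shot = best_shot(cover)
    cover = best_cover(shot)
    print(shot, cover)
```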

Even in the simplest simultaneous-move games, the configuration of the payoff matrix can drive very different player behavior. It will be interesting to uncover game design ramifications resulting from studying this theory, as Sam Hillier has done.
