Stochastic game explained

In game theory, a stochastic game (or Markov game), introduced by Lloyd Shapley in the early 1950s,^[1] is a repeated game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

Stochastic games generalize Markov decision processes to multiple interacting decision makers, as well as strategic-form games to dynamic situations in which the environment changes in response to the players’ choices.^[2]

Two-player games

Stochastic two-player games on directed graphs are widely used for modeling and analysis of discrete systems operating in an unknown (adversarial) environment. Possible configurations of a system and its environment are represented as vertices, and the transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite.

In many cases, there exists an equilibrium value of this probability, but optimal strategies for both players may not exist.

We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.

Theory

The ingredients of a stochastic game are: a finite set of players $I$; a state space $S$ (either a finite set or a measurable space $(S, \mathcal{S})$); for each player $i \in I$, an action set $A^i$ (either a finite set or a measurable space $(A^i, \mathcal{A}^i)$); a transition probability $P$ from $S \times A$, where $A = \times_{i \in I} A^i$ is the set of action profiles, to $S$, where $P(S' \mid s, a)$ is the probability that the next state is in the set $S' \subseteq S$ given the current state $s$ and the current action profile $a$; and a payoff function $g$ from $S \times A$ to $\mathbb{R}^I$, where the $i$-th coordinate of $g$, $g^i$, is the payoff to player $i$ as a function of the state $s$ and the action profile $a$.

The game starts at some initial state $s_1$. At stage $t$, players first observe $s_t$, then simultaneously choose actions $a^i_t \in A^i$, then observe the action profile $a_t = (a^i_t)_{i \in I}$, and then nature selects $s_{t+1}$ according to the probability $P(\cdot \mid s_t, a_t)$. A play of the stochastic game, $s_1, a_1, \ldots, s_t, a_t, \ldots$, defines a stream of payoffs $g_1, g_2, \ldots$, where $g_t = g(s_t, a_t)$.
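
The stage-by-stage procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two states, the action labels, the transition probabilities, and the payoffs are all hypothetical, and the players use stationary strategies (depending only on the current state), whereas a general strategy may depend on the whole history of play.

```python
import random

# Hypothetical two-state, two-player game; every number here is illustrative.
STATES = ["s0", "s1"]
ACTIONS = {1: ["a", "b"], 2: ["c", "d"]}   # action set A^i of each player i

def transition(state, profile):
    """Return P(. | s, a) as a dict mapping next states to probabilities."""
    # Assumption: the profile ("a", "c") makes staying in place more likely.
    stay = 0.9 if profile == ("a", "c") else 0.5
    other = "s1" if state == "s0" else "s0"
    return {state: stay, other: 1.0 - stay}

def payoff(state, profile):
    """Return g(s, a) as the tuple (g^1, g^2) of stage payoffs."""
    base = 1.0 if state == "s0" else 0.0
    bonus = 1.0 if profile[0] == "a" else 0.0
    return (base + bonus, base)

def simulate(initial_state, strategies, stages, rng):
    """Generate a play (s_1, a_1), (s_2, a_2), ... and its payoff stream."""
    state, play, stream = initial_state, [], []
    for _ in range(stages):
        # Players observe s_t and choose their actions simultaneously.
        profile = tuple(strategies[i](state, rng) for i in sorted(ACTIONS))
        play.append((state, profile))
        stream.append(payoff(state, profile))
        # Nature draws s_{t+1} according to P(. | s_t, a_t).
        dist = transition(state, profile)
        state = rng.choices(list(dist), weights=list(dist.values()))[0]
    return play, stream

def uniform(i):
    """Stationary strategy of player i: pick an action uniformly at random."""
    return lambda state, rng: rng.choice(ACTIONS[i])

rng = random.Random(0)
play, stream = simulate("s0", {1: uniform(1), 2: uniform(2)}, stages=5, rng=rng)
```

Here `play` holds the pairs $(s_t, a_t)$ and `stream` holds the payoff vectors $g_t = g(s_t, a_t)$; seeding the generator only makes the run reproducible.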

The discounted game $\Gamma_\lambda$ with discount factor $\lambda$ ($0 < \lambda \leq 1$) is the game where the payoff to player $i$ is $\lambda \sum_{t=1}^{\infty} (1-\lambda)^{t-1} g^i_t$. The $n$-stage game is the game where the payoff to player $i$ is $\bar{g}^i_n := \frac{1}{n} \sum_{t=1}^{n} g^i_t$.
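
As a small illustration, the two payoff criteria can be computed for a finite prefix of a payoff stream. The sketch assumes the usual normalisation in which the discounted sum is multiplied by $\lambda$, so that it is a weighted average of the stage payoffs; the function names and the sample stream are hypothetical.

```python
def discounted_payoff(stream, lam):
    """lam * sum over t >= 1 of (1 - lam)^(t-1) * g_t, truncated to the given prefix."""
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return lam * sum((1.0 - lam) ** k * g for k, g in enumerate(stream))

def average_payoff(stream, n):
    """(1/n) * sum of the first n stage payoffs g_1, ..., g_n."""
    return sum(stream[:n]) / n

stream = [1.0, 0.0, 1.0, 1.0]   # hypothetical stage payoffs g_1..g_4 of one player
d = discounted_payoff(stream, 0.5)
a = average_payoff(stream, 4)
```

With $\lambda = 1$ only the first stage counts, and as $\lambda$ decreases the weights spread over more stages, which is why small $\lambda$ and large $n$ play similar roles.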

The value $v_n(s_1)$, respectively $v_\lambda(s_1)$, of a two-person zero-sum stochastic game $\Gamma_n$, respectively $\Gamma_\lambda$, with finitely many states and actions exists, and Truman Bewley and Elon Kohlberg (1976) proved that $v_n(s_1)$ converges to a limit as $n$ goes to infinity and that $v_\lambda(s_1)$ converges to the same limit as $\lambda$ goes to $0$.
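
One standard way to compute these values, not spelled out in the text above, is Shapley's recursion: the value at a state is the value of a one-shot matrix game that mixes today's payoff with the continuation value. The sketch below applies it to a hypothetical "Big Match"-style game with a single non-absorbing state, in which player 1 either keeps playing (stage payoff 1 or 0 depending on player 2's column) or stops, after which the payoff is 0 or 1 forever. The game, the helper names, and the restriction to 2x2 matrices (which allows a closed-form matrix-game value) are all assumptions made for illustration.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximises)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin >= minimax - 1e-12:          # pure saddle point
        return maximin
    # No saddle point: both players mix, giving the classical closed form.
    return (a * d - b * c) / (a + d - b - c)

def aux_matrix(v, w):
    """One-shot game at the non-absorbing state: weight w on today's payoff,
    weight 1 - w on the continuation value v."""
    return [[w * 1.0 + (1 - w) * v, w * 0.0 + (1 - w) * v],  # row 0: keep playing
            [0.0, 1.0]]                                      # row 1: absorb at 0 or 1

def discounted_value(lam, iterations=10_000):
    """Iterate v <- val(aux_matrix(v, lam)); the map contracts with modulus 1 - lam."""
    v = 0.0
    for _ in range(iterations):
        v = value_2x2(aux_matrix(v, lam))
    return v

def n_stage_value(n):
    """Backward induction: v_k = val(aux_matrix(v_{k-1}, 1/k)), starting from v_0 = 0."""
    v = 0.0
    for k in range(1, n + 1):
        v = value_2x2(aux_matrix(v, 1.0 / k))
    return v

v_lam = discounted_value(0.1)
v_n = n_stage_value(50)
```

In this particular example both `v_lam` and `v_n` equal one half (up to rounding) for every $\lambda$ and $n$, so the common limit is reached trivially. For larger action sets the matrix-game value would typically be obtained by linear programming instead of the 2x2 formula.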

The "undiscounted" game $\Gamma_\infty$ is the game where the payoff to player $i$ is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum $\Gamma_\infty$ and in defining equilibrium payoffs of a non-zero-sum $\Gamma_\infty$. The uniform value $v_\infty$ of a two-person zero-sum stochastic game $\Gamma_\infty$ exists if for every $\varepsilon > 0$ there is a positive integer $N$ and a strategy pair $\sigma_\varepsilon$ of player 1 and $\tau_\varepsilon$ of player 2 such that for every $\sigma$ and $\tau$ and every $n \geq N$ the expectation of $\bar{g}^i_n$ (with $i = 1$, the maximizing player) with respect to the probability on plays defined by $\sigma_\varepsilon$ and $\tau$ is at least $v_\infty - \varepsilon$, and the expectation of $\bar{g}^i_n$ with respect to the probability on plays defined by $\sigma$ and $\tau_\varepsilon$ is at most $v_\infty + \varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value.^[3]
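
Written out in symbols, the definition of the uniform value reads as follows. The notation $\mathbb{E}^{\sigma,\tau}_{s_1}$ for the expectation over plays starting at $s_1$ under the strategy pair $(\sigma, \tau)$ is an assumption introduced here for compactness, and $\bar{g}^{1}_{n}$ denotes player 1's average payoff over the first $n$ stages.

```latex
\[
\forall \varepsilon > 0 \;\; \exists N,\ \sigma_\varepsilon,\ \tau_\varepsilon
\;\; \forall n \ge N,\ \forall \sigma,\ \forall \tau :
\qquad
\mathbb{E}^{\sigma_\varepsilon,\tau}_{s_1}\!\left[\bar{g}^{1}_{n}\right] \;\ge\; v_\infty - \varepsilon
\quad\text{and}\quad
\mathbb{E}^{\sigma,\tau_\varepsilon}_{s_1}\!\left[\bar{g}^{1}_{n}\right] \;\le\; v_\infty + \varepsilon .
\]
```

The point of the word "uniform" is that a single pair $(\sigma_\varepsilon, \tau_\varepsilon)$ and a single threshold $N$ work for all sufficiently long horizons $n$ at once.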

If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum.

The non-zero-sum stochastic game $\Gamma_\infty$ has a uniform equilibrium payoff $v_\infty$ if for every $\varepsilon > 0$ there is a positive integer $N$ and a strategy profile $\sigma$ such that for every unilateral deviation by a player $i$, i.e., a strategy profile $\tau$ with $\sigma^j = \tau^j$ for all $j \neq i$, and every $n \geq N$ the expectation of $\bar{g}^i_n$ with respect to the probability on plays defined by $\sigma$ is at least $v^i_\infty - \varepsilon$, and the expectation of $\bar{g}^i_n$ with respect to the probability on plays defined by $\tau$ is at most $v^i_\infty + \varepsilon$. Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff.^[4]

The non-zero-sum stochastic game $\Gamma_\infty$ has a limiting-average equilibrium payoff $v_\infty$ if for every $\varepsilon > 0$ there is a strategy profile $\sigma$ such that for every unilateral deviation $\tau$ by a player $i$, the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by $\sigma$ is at least $v^i_\infty - \varepsilon$, and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by $\tau$ is at most $v^i_\infty + \varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value, and Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff. In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (or the limit superior) of the averages of the stage payoffs.
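
For comparison, the two equilibrium notions can be set side by side in symbols. The notation $\mathbb{E}^{\sigma}$ for the expectation over plays under a strategy profile $\sigma$ is an assumption introduced here; $\tau$ ranges over unilateral deviations from $\sigma$ by player $i$.

```latex
% Uniform equilibrium payoff: for all n >= N
\[
\mathbb{E}^{\sigma}\!\left[\bar{g}^{i}_{n}\right] \;\ge\; v^{i}_{\infty} - \varepsilon ,
\qquad
\mathbb{E}^{\tau}\!\left[\bar{g}^{i}_{n}\right] \;\le\; v^{i}_{\infty} + \varepsilon .
\]
% Limiting-average equilibrium payoff
\[
\mathbb{E}^{\sigma}\!\left[\liminf_{n \to \infty} \bar{g}^{i}_{n}\right] \;\ge\; v^{i}_{\infty} - \varepsilon ,
\qquad
\mathbb{E}^{\tau}\!\left[\limsup_{n \to \infty} \bar{g}^{i}_{n}\right] \;\le\; v^{i}_{\infty} + \varepsilon .
\]
```

The first pair of inequalities constrains expected averages over every sufficiently long finite horizon, while the second constrains the expected long-run behaviour of the averages along a play.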

Whether every stochastic game with finitely many players, states, and actions has a uniform equilibrium payoff, a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff is a challenging open question.

A Markov perfect equilibrium is a refinement of the concept of subgame perfect Nash equilibrium to stochastic games.

Stochastic games have been combined with Bayesian games to model uncertainty over player strategies.^[5] The resulting stochastic Bayesian game model is solved via a recursive combination of the Bayesian Nash equilibrium equation and the Bellman optimality equation.

Applications

Stochastic games have applications in economics, evolutionary biology and computer networks.^[6] ^[7] They generalize repeated games, which correspond to the special case where there is only one state.

External links

Lecture on Stochastic Two-Player Games by Antonin Kucera

Notes and References

[1] Shapley, L. S. (1953). "Stochastic games". Vol. 39, no. 10, pp. 1095–1100. doi:10.1073/pnas.39.10.1095. Bibcode:1953PNAS...39.1095S. PMID 16589380. PMC 1063912.
[2] Solan, Eilon; Vieille, Nicolas (2015). "Stochastic Games". PNAS, vol. 112, no. 45, pp. 13743–13746. doi:10.1073/pnas.1513508112. PMID 26556883. PMC 4653174.
[3] Mertens, J. F.; Neyman, A. (1981). "Stochastic Games". International Journal of Game Theory, vol. 10, no. 2, pp. 53–66. doi:10.1007/BF01769259. S2CID 189830419.
[4] Vieille, N. (2002). "Stochastic games: Recent results". Handbook of Game Theory. Amsterdam: Elsevier Science. pp. 1833–1850. ISBN 0-444-88098-4.
[5] Albrecht, Stefano; Crandall, Jacob; Ramamoorthy, Subramanian (2016). "Belief and Truth in Hypothesised Behaviours". Vol. 235, pp. 63–94. doi:10.1016/j.artint.2016.02.004. arXiv:1507.07688. S2CID 2599762.
[6] "Constrained Stochastic Games in Wireless Networks". http://www-net.cs.umass.edu/~sadoc/mdp/main.pdf
[7] Djehiche, Boualem; Tcheukam, Alain; Tembine, Hamidou (2017-09-27). "Mean-Field-Type Games in Engineering". AIMS Electronics and Electrical Engineering, vol. 1, pp. 18–73. doi:10.3934/ElectrEng.2017.1.18. arXiv:1605.03281. S2CID 16055840.
