Description
Background: Cooperative Games (more info)
1.1 Generalities
A game in the sense of game theory is an abstract mathematical model of a scenario in which some sort of agents interact. It
is abstract in the sense that irrelevant detail is omitted: the game aims to capture only those features of the scenario that are
relevant to the decisions that must be made by players within the game.
Agents can be anything depending on the application: typically real humans (e.g. investors, political parties, etc.), but also
signals/towers in a communication network, genes in a biological setup or even variables in a predictive/learning problem.
The form of games we are interested in is the most basic and widely-studied model of cooperative games.
More specifically, our games are populated by a (non-empty) set P = {1, . . . , p} of agents: the players of the game. A
coalition C is simply any subset of the player-set P. The grand coalition is the set P of all players.
All this said, let’s define what we mean by a characteristic function game. In the following, we will denote by 2^P the power-set, that is, the set of all subsets, of P.
Definition 1. A characteristic function game G is given by a pair (P, ν), where P = {1, . . . , p} is a finite, non-empty set of
agents, and ν : 2^P → R is a characteristic function, which maps each coalition C ⊆ P to a real number ν(C).
The number ν(C) is usually referred to as the value of the coalition C.
There are many possible interpretations for the characteristic function, but note right away that characteristic function games
assign values to a coalition as a whole, and not to its individual members. That is, the model behind a characteristic function
game does not tell you how the coalition value ν(C) should be divided among the members of C.
In fact, the question of how to divide the coalition value is a fundamental research topic in cooperative game theory¹.
Notice also that, from a computational point of view, the naïve representation of a characteristic function game that consists in explicitly listing every coalition C together with the associated value ν(C), being of order 2^p in size, is not practical unless the number of players is very small. On the other hand, most real-life interactions admit an encoding of size polynomial in p; such an encoding provides an implicit description of the characteristic function.
As an example, consider modeling the decision-making process in voting bodies.
Example 1. (weighted voting games)
• A country has 101 seats in its parliament, and each representative belongs to one of four parties: Liberal (L, 40 seats), Moderate (M, 22 seats), Conservative (C, 30 seats), or Green (G, 9 seats).
• The parliament needs to decide how to allocate “1 billion euros” of discretionary spending.
¹An implicit assumption here: the coalition value ν(C) can be divided among the members of C in any way that the members of C choose. Formally, games with this property are said to be transferable utility games (TU games).
• The decision is made by a simple majority vote, and we assume that all representatives vote along the party lines.
• Parties can form coalitions; a coalition has value “1 billion euros” IF it can win the vote no matter what the other
parties do, and value 0 otherwise.
Hence, after some thinking, we see that the associated 4-player game has P = {L, M, C, G} and characteristic function

ν(S) = 0 if #(S \ {G}) ≤ 1, and ν(S) = 10^9 otherwise,

where #S denotes the cardinality of the coalition S. In words: a coalition wins exactly when it contains at least two of {L, M, C}, since any two of those parties already hold at least 52 of the 101 seats, whereas G’s 9 seats can never turn a losing coalition into a winning one.
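To double-check this closed form, here is a quick sketch (Python for concreteness; the function names are mine) that compares it against the raw majority rule on all 16 coalitions — note in particular that a coalition containing G, such as {L, M, G} with 71 seats, can still win:

```python
# Sketch: checking the closed form of Example 1's characteristic function
# against the raw seat counts. Names (wins, nu_closed_form) are mine.
from itertools import combinations

seats = {"L": 40, "M": 22, "C": 30, "G": 9}   # 101 seats in total
quota = 51                                     # simple majority

def wins(S):
    """A coalition wins iff its parties jointly hold a majority of seats."""
    return sum(seats[p] for p in S) >= quota

def nu_closed_form(S):
    """0 if the coalition holds at most one of {L, M, C}; 10^9 otherwise."""
    return 0 if len(set(S) - {"G"}) <= 1 else 10**9

# The two definitions agree on all 2^4 = 16 coalitions.
parties = list(seats)
for k in range(len(parties) + 1):
    for S in combinations(parties, k):
        assert (10**9 if wins(S) else 0) == nu_closed_form(S), S
```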
1.2 Solution Concepts: the Shapley Value
A key problem in game theory is to try to understand what the outcomes of a game will be. In our cooperative framework,
an outcome of a game G consists of two parts:
1. a coalition structure, that is, a partition CS = {C1, . . . , Cm} of the player-set P = {1, . . . , p} into coalitions;
2. a payoff vector x = (x1, . . . , xp) ∈ R^p for a coalition structure CS = {C1, . . . , Cm}, which distributes the value ν(Cj) of each coalition among its members. Any legit payoff vector x must satisfy the following natural conditions:
• xj ≥ 0 for all j ∈ P;
• Σ_{r ∈ Cj} xr ≤ ν(Cj) for any j ∈ {1, . . . , m}.
Now, any partition of agents into coalitions and any payoff vector that respects this partition corresponds to a plausible
outcome of a characteristic function game. However, not all outcomes are equally desirable.
For instance, if all agents contribute equally to the value of a coalition, a payoff vector that allocates the entire payoff to just
one of the agents is less appealing than the one that shares the profits equally among all agents.
Similarly, an outcome that pushes all agents to work together is (typically) preferable to an outcome that some of the agents want to deviate from.
More broadly, one can evaluate outcomes according to two criteria: fairness (i.e., how well each agent’s payoff reflects his
contribution), and stability (i.e., what are the incentives for the agents to stay in the coalition structure). These criteria give
rise to two families of solution concepts having as most notable members the Shapley Value and the Core respectively.
Given its broad success in defining variable importance for supervised learning problems (see here, here, here, here, and also here, just to mention a few), in the following we will focus on the former, the Shapley value, forged in the ’50s by the one and only Lloyd S. Shapley.
The Shapley value is a solution concept that is usually formulated with respect to the grand coalition: it defines a way of
distributing the value ν(P) that could be obtained by the grand coalition, and it is based on the intuition that the payment
that each agent receives should be proportional to his contribution.
Idea v1.0: a naïve implementation of this idea would be to pay each agent according to how much he increases the value of
the coalition of all other players when he joins it, i.e., set the payoff of player j to ν(P) − ν(P\{j}).
Problem: Under this payoff scheme the total payoff assigned to the agents may differ from the value of the grand coalition.
Idea v2.0: we can fix an ordering of the agents and pay each agent according to how much he contributes to the coalition
formed by his predecessors in this ordering: Agent 1 receives ν({1}), Agent 2 receives ν({1, 2}) − ν({1}), and so on.
It is easy to see that this payoff scheme distributes the value of the grand coalition among the agents.
Problem: agents that play symmetric roles in the game may receive different payoffs depending on their position in the order.
Shapley’s insight: the dependence on the agents ordering can be eliminated by averaging over all possible orderings
(or permutations) of the players.
Now, to formally define the Shapley value, we need some additional notation.
1. Fix a characteristic function game G = (P, ν) and let ΠP be the set of all permutations of P. Given a specific permutation
π ∈ ΠP , we denote by Sπ(j) the set of all predecessors of player j in π, i.e., we set Sπ(j) = {r ∈ P such that π(r) < π(j)}.
For example, if P = {1, 2, 3}, then

ΠP = {(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)};

moreover, if π = (3, 1, 2), then Sπ(3) = ∅, Sπ(1) = {3} and Sπ(2) = {1, 3}.
2. The marginal contribution of an agent j with respect to a permutation π in a game G = (P, ν) is denoted by ∆^G_π(j), and measures how much player j increases the value of the coalition consisting of his predecessors in π. More specifically:

∆^G_π(j) = ν(Sπ(j) ∪ {j}) − ν(Sπ(j)).   (1)
We can now define the Shapley value ψ^G(j) of player j: it is simply his average marginal contribution, where the average is taken over all permutations of the player-set (assumed to be all equally likely):

ψ^G(j) := E_π[∆^G_π(j)] = (1/p!) Σ_{π ∈ ΠP} ∆^G_π(j) (♥)= Σ_{C : j ∉ C} [(#C)! (p − 1 − #C)! / p!] · (ν(C ∪ {j}) − ν(C)).   (2)
The second version (♥) can be derived by noting that the marginal contributions ∆^G_π(j) are of the form ν(C ∪ {j}) − ν(C), where C is a coalition not containing j. Now, for how many orderings does one have Sπ(j) = C? There are (#C)! possible orderings of C and (p − 1 − #C)! orderings of P \ (C ∪ {j}), hence (#C)! (p − 1 − #C)! orderings in total, which explains the weights in (2). There is also another nice probabilistic interpretation (see page 307 here).
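To see the two versions of the formula in action, here is a small sketch (Python for concreteness) that computes the Shapley value both ways on a toy random 4-player game and checks that they coincide:

```python
# Sketch: Shapley value via the permutation average and via the subset
# formula (2), checked for agreement on a toy random game.
from itertools import permutations, combinations
from math import factorial
import random

random.seed(0)
P = [1, 2, 3, 4]
nu = {frozenset(): 0.0}                      # nu of the empty coalition is 0
for k in range(1, len(P) + 1):
    for C in combinations(P, k):
        nu[frozenset(C)] = random.random()   # arbitrary coalition values

def shapley_perm(j):
    """Average marginal contribution of j over all p! orderings."""
    total = 0.0
    for pi in permutations(P):
        S = frozenset(pi[:pi.index(j)])      # predecessors of j in pi
        total += nu[S | {j}] - nu[S]
    return total / factorial(len(P))

def shapley_subset(j):
    """Formula (2): weighted sum over coalitions C not containing j."""
    p, total = len(P), 0.0
    others = [r for r in P if r != j]
    for k in range(p):
        for C in combinations(others, k):
            w = factorial(k) * factorial(p - 1 - k) / factorial(p)
            total += w * (nu[frozenset(C) | {j}] - nu[frozenset(C)])
    return total

for j in P:
    assert abs(shapley_perm(j) - shapley_subset(j)) < 1e-12
# Efficiency: the payoffs sum to nu(P) (since nu of the empty set is 0).
assert abs(sum(shapley_perm(j) for j in P) - nu[frozenset(P)]) < 1e-12
```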
It can be shown that the Shapley value is in fact the only payoff division scheme that has all these four properties:
• Efficiency: Σ_j ψ^G(j) = ν(P).
• Dummy player: if player j is a dummy (i.e., ν(C ∪ {j}) = ν(C) for any coalition C), then ψ^G(j) = 0.
• Symmetry: if j1 and j2 are symmetric players (i.e., ν(C ∪ {j1}) = ν(C ∪ {j2}) for any coalition C), then ψ^G(j1) = ψ^G(j2).
• Additivity: given any two characteristic function games G1 = (P, ν1) and G2 = (P, ν2), and their sum G+ = (P, ν1 + ν2), ψ^{G+}(j) = ψ^{G1}(j) + ψ^{G2}(j) for all j ∈ P.
Example 2. Suppose that Ciccio-Pharma (C) is a small biotech company that discovered a new vaccine but, to manufacture and market it, needs to team up with a larger partner. There are two candidates: Aristo-Medical (A) and BruttiMaBuoni-Inc (B).
If A or B teams up with C, the big firm will split “1 billion” with C. Here is a possible characteristic function
ν(A) = ν(B) = ν(C) = ν(AB) = 0 and ν(AC) = ν(BC) = ν(ABC) = 1.
Since there’re only 3 players, to get Shapley’s payoffs we can make a table indicating the value brought to a coalition by each player on the way to the formation of the grand coalition:
Permutation Player A Player B Player C
ABC 0 0 1
ACB 0 0 1
BAC 0 0 1
BCA 0 0 1
CAB 1 0 0
CBA 0 1 0
total value 1 1 4
Since in the derivation of the Shapley value it is assumed that the 3! = 6 permutations/arrival sequences are all equally likely, the average value of each biotech company is simply:

ψ(A) = 1/6,  ψ(B) = 1/6,  ψ(C) = 4/6 = 2/3.

So Ciccio-Pharma, the drug discoverer, will get two-thirds of the billion, and the big companies split the remaining third.
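The table above can be reproduced mechanically; here is a minimal sketch (Python, with exact fractions) that sums marginal contributions over the 3! arrival orders:

```python
# Sketch reproducing Example 2: marginal contributions per permutation for
# the biotech game, and the resulting Shapley payoffs as exact fractions.
from itertools import permutations
from fractions import Fraction

winning = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
def nu(S):
    """The characteristic function of Example 2 (value in billions)."""
    return 1 if set(S) in winning else 0

totals = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
for pi in permutations("ABC"):
    for i, player in enumerate(pi):
        totals[player] += nu(pi[: i + 1]) - nu(pi[:i])   # marginal contribution

shapley = {j: t / 6 for j, t in totals.items()}
print(shapley)  # A and B get 1/6 each, C gets 2/3
```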
Example 3. (Shapley in Simple Games)
A game G = (P, ν) is said to be simple if it is monotone (i.e., ν(C1) ≤ ν(C2) whenever C1 ⊆ C2) and its characteristic function only takes values 0 and 1. The game in Example 1 is clearly simple as soon as we rescale the payoffs so that ν(P) = 1.
In simple games, the Shapley value (a.k.a. the Shapley–Shubik power index in this context) has a particularly attractive interpretation: it measures the power of a player, i.e., the probability that she can influence the outcome of the game.
Indeed, the Shapley value of a player j in a simple game G = (P, ν) with #P = p can be rewritten as follows:

ψ^G(j) = {proportion of permutations where j is pivotal} = #{π ∈ ΠP such that ν(Sπ(j)) = 0 and ν(Sπ(j) ∪ {j}) = 1} / p!   (3)
In other words, if agents join the coalition in a random order, ψ^G(j) is exactly the probability that player j turns a losing coalition into a winning one.
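As a sanity check, here is a sketch (Python) computing the Shapley–Shubik indices of the Example 1 parliament by counting pivotal positions. Since L, M and C play symmetric roles in that game (only the number of them present matters), each gets index 1/3, while G, a dummy, gets 0:

```python
# Sketch: Shapley-Shubik power index of the Example 1 parliament, computed
# by counting, for each party, the orderings in which it is pivotal.
from itertools import permutations
from math import factorial
from fractions import Fraction

seats = {"L": 40, "M": 22, "C": 30, "G": 9}
quota = 51
def wins(S):
    return sum(seats[p] for p in S) >= quota

power = {p: Fraction(0) for p in seats}
for pi in permutations(seats):
    for i, party in enumerate(pi):
        if not wins(pi[:i]) and wins(pi[: i + 1]):  # losing -> winning
            power[party] += 1
            break                                   # one pivot per ordering

power = {p: c / factorial(4) for p, c in power.items()}
print(power)  # G is a dummy: its index is 0
```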
Example 4. (Shapley for Variable Importance)
Suppose we are dealing with a generic (supervised) learning problem where we need to predict a response Y based on a bunch
of covariates X = (X1, . . . , Xp). The goal here is to measure the importance of Xj in this task.
We can cast this problem into a suitable characteristic function game having the p covariates as players P = {1, . . . , p}, and
some measure of fit as characteristic function ν. To be more specific, for any coalition (of covariates) C, let XC = (Xj : j ∈ C)
and Ŷ(C) = E(Y | XC), the ideal optimal predictor (under squared risk) based on the covariates in XC. Now, if we take

ν(C) = −E[(Y − Ŷ(C))²],

then

ψ(j) = (1/p!) Σ_{π ∈ ΠP} E[(Ŷ(Sπ(j)) − Ŷ(Sπ(j) ∪ {j}))²],   a population quantity to be estimated!   (4)
This is just an averaged version (over all possible submodels) of LOCO, another, recently introduced var-importance measure.
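To make (4) concrete, here is a toy sketch (Python): a linear model with independent covariates, where OLS fits on a large simulated sample stand in for the ideal predictors E(Y | XC). The model, coefficients and sample size are all my illustrative choices, not part of the text:

```python
# Sketch of the variable-importance game (4) on a toy linear model
# Y = 2*X1 + X2 + noise, with independent standard-normal covariates.
# OLS on a large simulated sample stands in for E(Y | X_C).
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(42)
n, p = 200_000, 3
X = rng.standard_normal((n, p))
Y = 2 * X[:, 0] + X[:, 1] + rng.standard_normal(n)   # X3 is pure noise

def nu(C):
    """nu(C) = -E(Y - Yhat(C))^2, estimated via an OLS fit on columns C."""
    if not C:
        return -np.mean((Y - Y.mean()) ** 2)
    XC = X[:, list(C)]
    beta, *_ = np.linalg.lstsq(XC, Y, rcond=None)
    return -np.mean((Y - XC @ beta) ** 2)

# Cache nu over all 2^p coalitions, then apply the subset formula (2).
cache = {frozenset(C): nu(C)
         for k in range(p + 1) for C in combinations(range(p), k)}

def shapley(j):
    others = [r for r in range(p) if r != j]
    return sum(
        factorial(len(C)) * factorial(p - 1 - len(C)) / factorial(p)
        * (cache[frozenset(C) | {j}] - cache[frozenset(C)])
        for k in range(p) for C in combinations(others, k)
    )

psi = [shapley(j) for j in range(p)]
# With independent covariates, psi should be close to (4, 1, 0).
```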
It is instructive to try to rephrase in this context the previous 4 properties that characterize the Shapley value.
Remark: this is all nice and good but, of course, Shapley for variable importance is not perfect. For example, it defines variable importance with respect to all submodels, but most of those submodels are not of interest and, in addition, it is strongly influenced by the correlation between covariates.
2. The Exercise: Let’s get together and feel all right. . .
Your job
1. Introductory
To check your understanding of Shapley, let’s start easy by playing around with a tiny game having only 3 players. So, imagine (!) three students curiously named Antwohnette (A), BadellPadel (B) and Chumbawamba (C). At any given time, exactly one of them needs to be working (not necessarily the same one all day long) on this HW to complete it. Here are their working hours:
A Antwohnette 14:00-17:00
B BadellPadel 11:00-16:00
C Chumbawamba 9:00-13:00
A coalition C is an agreement by one or more students as to the times they will really be working². Its value will be given by:

ν(C) = {# of hours potentially saved by a well-organized coalition}
Clearly ν(A)=ν(B)=ν(C)= 0 and ν(ABC)= 4. Complete the definition of ν and find Shapley (see Example 2).
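Without spoiling the rest of the exercise, the stated value ν(ABC) = 4 can be checked mechanically: a well-organized coalition needs only one member working at any time, so it saves (sum of individual hours) − (length of the union of the time slots). A sketch (Python; the interval bookkeeping is mine):

```python
# Sketch: checking nu(ABC) = 4 for the three-student game. A coordinated
# coalition covers only the union of its members' slots, so it saves the
# sum of individual hours minus the length of that union.
slots = {"A": (14, 17), "B": (11, 16), "C": (9, 13)}

def union_length(intervals):
    """Total length covered by a set of [start, end) hour intervals."""
    covered, current_end = 0, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            covered += end - start
            current_end = end
        elif end > current_end:
            covered += end - current_end
            current_end = end
    return covered

def hours_saved(members):
    ivs = [slots[m] for m in members]
    return sum(e - s for s, e in ivs) - union_length(ivs)

assert hours_saved("ABC") == (3 + 5 + 4) - 8 == 4
assert all(hours_saved(m) == 0 for m in "ABC")   # singletons save nothing
```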
2. Probabilistic
Now let’s get serious. Imagine you are in a real, larger team with 12 people having the following working schedule:
Now, I might ask you to find the characteristic function as before. . . but here you’re dealing with 2¹² = 4096 coalitions, so I built it for you³ (Bonus: write an R function, the shorter the better, to get it for a general game of this type).
I might also ask you to calculate the Shapley payoffs for each team member. . . but now there’re 12! = 479, 001, 600
orderings to consider, not exactly trivial as in Example 2. . .
Nevertheless, look again at Equation 2: the Shapley is “just” an expectation (w.r.t. a very specific distribution), so:
• Review our notes on MonteCarlo Approximation and Error Control.
• Select 3 players of the team.
• Fix a suitable⁴ (ε, α) pair of tolerance and confidence, and design a Hoeffding-based simulation to approximate the Shapley payoffs of those 3 players (Hint: if you look closely, all you need is essentially sample()).
• Produce separate Hoeffding-based confidence intervals for these payoffs and briefly comment on all your results.
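The recipe in the bullets above can be sketched as follows (Python for compactness; the exercise itself asks for R and sample()). The game here is a toy symmetric one, not the 12-player schedule game: marginal contributions lie in [0, B], Hoeffding fixes the number n of random orderings, and the same bound gives the CI half-width:

```python
# Sketch of a Hoeffding-based Monte Carlo approximation of one Shapley
# payoff, on a toy game (the 12-player schedule game is not reproduced
# here). Marginal contributions lie in [0, B], so Hoeffding gives both a
# sufficient sample size and a confidence interval.
import math, random

random.seed(1)
P = list(range(6))
def nu(S):
    # toy game (my choice): value = sqrt of coalition size, so every
    # marginal contribution sqrt(k+1) - sqrt(k) lies in [0, 1]
    return math.sqrt(len(S))

B = 1.0
eps, alpha = 0.01, 0.05
n = math.ceil(B**2 * math.log(2 / alpha) / (2 * eps**2))  # Hoeffding size

def mc_shapley(j, n):
    total = 0.0
    for _ in range(n):
        pi = random.sample(P, len(P))        # a uniformly random ordering
        S = pi[: pi.index(j)]                # predecessors of j
        total += nu(S + [j]) - nu(S)
    est = total / n
    half = B * math.sqrt(math.log(2 / alpha) / (2 * n))  # CI half-width
    return est, (est - half, est + half)

est, ci = mc_shapley(0, n)
# By symmetry the exact value is nu(P)/6 = sqrt(6)/6, about 0.408.
```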
3. Statistical
Imagine you are a financial investor currently dealing with a portfolio of p stocks whose returns (see notes) are modeled by a set of random variables {X1, . . . , Xp}. In this setup, it may be important to allocate to each asset of the portfolio its contribution to the total utility, defined as Uω(Σ_{j=1}^p Xj), where the utility function Uω(·) for us will be a linear combination of the portfolio average return and volatility (Markowitz style!, see here), that is

Uω(X) = E(X) − ω · Var(X) for some weight ω > 0.
In a recent paper, the authors tackle the problem from a cooperative game theory perspective by defining a suitable variance game over P = {1, . . . , p}, whose Shapley value turns out to be very simple, intuitive (and familiar).
More specifically, let C be any coalition of the p assets and RC = Σ_{j∈C} Xj their return. Now, for any C ⊆ P, define

ν(C) = Uω(RC) = E(RC) − ω · Var(RC) = νE(C) − ω · νV(C).
Hence, the game G = (P, ν) is a linear combination of two other games, GE = (P, νE) and GV = (P, νV) ⟹ by the additivity of Shapley, ψ^G(j) = ψ^{GE}(j) − ω · ψ^{GV}(j). By the linearity of expectation, the first term is trivially equal to ψ^{GE}(j) = E(Xj), whereas the second one, after elementary manipulations, turns out to be equal to ψ^{GV}(j) = Cov(Xj, RP).
²This is an example of cooperative game theory applied to a resource allocation problem.
³The characteristic function is stored as a list char_fun with p = 12 slots so that, for example, char_fun[[2]] contains the (12 choose 2) = 66 values associated to coalitions formed by 2 players.
⁴First and foremost in terms of computational time!
In conclusion, once we pick a suitable weight ω > 0, the Shapley allocation to each stock j ∈ {1, . . . , p} is given by

ψ^G(j) = E(Xj) − ω · Cov(Xj, RP) = E(Xj) − ω · Σ_{r=1}^p Cov(Xj, Xr),   a population quantity to be estimated!   (5)
In other words, to learn the Shapley values in this financial game, we need to study the marginal correlation graph of some standard measure of stock relative performance (see Appendix (B) in the notes). To this end, we may collect the daily closing prices for p stocks⁵, selected within those consistently in the S&P500 index.
The stocks are categorized into 10 Global Industry Classification Standard (gics) sectors, including Consumer
Discretionary, Energy, Financials, Materials, Health Care, Utilities, Industrials, Information Technology,
Consumer Staples, and Telecommunications Services. It is expected that stocks from the same gics sector should tend to be clustered together, since they (allegedly) tend to interact more with each other⁶.
So, although cherry-picking stocks to optimize our portfolio is not our goal here, ideally we want to collect a few stocks from different gics sectors to boost the overall value/utility of the portfolio⁷. Here are the details:
• Select a sensibly sized portfolio of p stocks and take data from the “Covid-Age”: January 1, 2020 till today
(see notes). More specifically, if ct,j is the closing price of stock j on day t, let xt,j = log(ct,j/ct−1,j ).
• Based on the data matrix X = (x_{t,j})_{t,j}, having time on the rows and stocks on the columns, we want to estimate the Pearson-based correlation graph over stocks (i.e., each node in the graph is a stock), by simply treating the instances {x_{t,j}}_t as independent replicates even though they are not (they form a time series).
In particular we place an (undirected) edge between stock j1 and stock j2 only if an asymptotic Normal-based
confidence interval for their Pearson-correlation does not include zero (you choose the level α of the many ci’s).
• Artistically (?) visualize the resulting graph (see here, and here), and draw some conclusions based on the decisions made along the way (i.e., the number and sector of the stocks, the level α, the interval type, etc.).
• Finally, always based on the data matrix X, provide and possibly visualize bootstrapped confidence intervals for the Shapley values of the p stocks in your portfolio. As before, you choose the weight(s) ω, the level α, and the interval type. Comment on the results based on the choices you made.
Remark: as done before, also here make the simplifying assumption that, for each stock, we are dealing with
independent replicates.
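The edge rule in the bullets above can be sketched with the Fisher z-transform interval for a Pearson correlation (one common asymptotic Normal-based choice). The synthetic “returns” below are mine, purely for illustration:

```python
# Sketch of the edge rule: put an edge between two stocks iff an asymptotic
# (Fisher z-based) confidence interval for their Pearson correlation
# excludes 0. Synthetic data here; alpha is the per-interval level.
import numpy as np
from math import atanh, tanh, sqrt
from statistics import NormalDist

rng = np.random.default_rng(0)
T = 500
common = rng.standard_normal(T)
x1 = common + 0.5 * rng.standard_normal(T)    # correlated pair (rho = 0.8)
x2 = common + 0.5 * rng.standard_normal(T)
x3 = rng.standard_normal(T)                   # independent of both

def corr_ci(x, y, alpha=0.05):
    """Fisher z-based CI for the Pearson correlation of (x, y)."""
    T = len(x)
    r = np.corrcoef(x, y)[0, 1]
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo = atanh(r) - z / sqrt(T - 3)
    hi = atanh(r) + z / sqrt(T - 3)
    return tanh(lo), tanh(hi)

def edge(x, y, alpha=0.05):
    lo, hi = corr_ci(x, y, alpha)
    return not (lo <= 0 <= hi)      # edge iff the CI excludes zero

assert edge(x1, x2)                 # the truly correlated pair gets an edge
# For a truly uncorrelated pair a spurious edge appears about alpha of the
# time; keep this multiplicity issue in mind when choosing alpha.
print("edge(x1, x3):", edge(x1, x3))
```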
⁵The number of stocks p will affect the computational time and the statistical performance, hence, as usual, choose it wisely!
⁶I wrote “allegedly” since this is a hypothesis we should verify empirically instead of taking it for granted.
⁷Diversification in investing is key, somebody once said.
Hint: keep this in mind when talking about ensemble techniques like random forests to tackle predictive problems.