Random Coefficient Logit Models (BLP)
Zijing Hu
November 9, 2022
Contents
1 Introduction
2 Estimation
  2.1 Nested Fixed Point Algorithm (NFP)
  2.2 Mathematical Program with Equilibrium Constraints (MPEC)
  2.3 Approximate BLP (ABLP)
  2.4 Tricks for Numerical Computing
3 Identification
4 Elasticities and Marginal Costs
5 Welfare Analysis
1 Introduction

*This section is based mainly on Nevo (2000) and Vincent (2015).

Let $u_{ijt}$ denote the indirect utility that individual $i$ receives from the consumption of product $j$ in market $t$. This is assumed to be a linear function of a $K \times 1$ vector of product characteristics $x_{jt}$, price $p_{jt}$, an unobserved (to the econometrician) component $\xi_{jt}$, and an idiosyncratic error $\varepsilon_{ijt}$. Hence,
$$u_{ijt} = \alpha_i (y_i - p_{jt}) + x_{jt}'\beta_i + \xi_{jt} + \varepsilon_{ijt}$$
where $y_i$ is individual income, $\beta_i$ is a $K \times 1$ vector of coefficients, and $\alpha_i$ is the marginal utility of income. The term $\xi_{jt}$ can be regarded as a deviation from observed product quality that is common to all individuals. Consumer $i$ can also choose the outside product $j = 0$, with normalized utility $u_{i0t} = \alpha_i y_i + \varepsilon_{i0t}$.
Both $\beta_i$ and $\alpha_i$ are assumed to be linear functions of a $d \times 1$ vector of demographic factors, $D_i$, and a $(K+1) \times 1$ vector of unobservable components, $\nu_i$. In particular,
$$\begin{pmatrix} \beta_i \\ \alpha_i \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \pi D_i + L \nu_i$$
where $\pi$ and $L$ are $(K+1) \times d$ and $(K+1) \times (K+1)$ matrices, respectively. The distribution of the demographics, $F_D(D)$, is assumed to be known or estimable (e.g., researchers can use American Community Survey data to estimate the empirical distribution, or use loyalty-program data directly if available). For $\nu_i$, it is assumed that $\nu_i \overset{iid}{\sim} N(0, \Sigma)$, where $\Sigma = LL'$ is the covariance matrix of the coefficients $(\beta_i, \alpha_i)$ conditional on $D_i$, and $L$ acts like a Cholesky factor of $\Sigma$ (see section 9.2.5 of Train (2009) for more details). One can add interaction terms to $D_i$ to capture more information from demographic variables. Requiring non-negative variances on the random coefficients is not strictly necessary. The model can then be written as
$$u_{ijt} = \alpha_i y_i + \delta_{jt}(x_{jt}, p_{jt}, \xi_{jt}; \theta_1) + \mu_{ijt}(x_{jt}, p_{jt}, \nu_i, D_i; \theta_2) + \varepsilon_{ijt}$$
$$\delta_{jt}(x_{jt}, p_{jt}, \xi_{jt}; \theta_1) = x_{jt}'\beta - \alpha p_{jt} + \xi_{jt}$$
$$\mu_{ijt}(x_{jt}, p_{jt}, \nu_i, D_i; \theta_2) = [x_{jt}', p_{jt}]\,(\pi D_i + L \nu_i)$$
It is worth noting that we do not require random coefficients on all characteristics. The model can be rewritten as
$$u_{ijt} = \alpha_i y_i + \delta_{jt}\big(x^{(1)}_{jt}, p_{jt}, \xi_{jt}; \theta_1\big) + \mu_{ijt}\big(x^{(2)}_{jt}, p_{jt}, \nu_i, D_i; \theta_2\big) + \varepsilon_{ijt}$$
There are two reasons for this (see the appendix of Nevo (2000) for more details). First, we might not want to allow for random coefficients on some characteristics. Second, if we include brand dummy variables, they will enter only $x^{(1)}_{jt}$, the linear part of the model, while the product characteristics will be included only in $x^{(2)}_{jt}$, the nonlinear part of the model.
Define the set $A_{ijt} = \{\varepsilon_{it} : u_{ijt} \geq u_{imt}, \ \forall m \neq j\}$, where $\varepsilon_{it} = (\varepsilon_{i0t}, \ldots, \varepsilon_{iJt})$. Then the probability that individual $i$ selects product $j$ in market $t$, given $D_i$ and $\nu_i$, is
$$\Pr{}_{ijt} = \int_{A_{ijt}} dF(\varepsilon_{it} \mid D_i, \nu_i)$$
Integrating out the unobservables $D_i$ and $\nu_i$ yields
$$\Pr{}_{jt} = \int_{D_i} \int_{\nu_i} \Pr{}_{ijt} \, dF(\nu_i \mid D_i) \, d\hat{F}_D(D_i) = \int_{D_i} \int_{\nu_i} \Pr{}_{ijt} \, dF_\nu(\nu_i) \, d\hat{F}_D(D_i)$$
The probability $\Pr{}_{jt}$ is the same for all $i$ and can be estimated by the product market share $s_{jt} = q_{jt}/I_t$, where $q_{jt}$ denotes sales and $I_t$ the number of consumers. The error in this approximation is $O_p(I_t^{-1/2})$ and will be negligible for large $I_t$, which is often the case.
To evaluate the integrals, it is assumed that the errors $\varepsilon_{ijt}$ are i.i.d. with a type-I extreme-value distribution. Then,
$$\Pr{}_{ijt} = \frac{\exp(x_{jt}'\beta_i - \alpha_i p_{jt} + \xi_{jt})}{1 + \sum_{m=1}^{J} \exp(x_{mt}'\beta_i - \alpha_i p_{mt} + \xi_{mt})}$$
Because income $y_i$ appears in the (indirect) utility of all alternatives, including the outside option, $\alpha_i y_i$ cancels in the expression for $\Pr{}_{ijt}$.
The integrals cannot be evaluated analytically, but they can be approximated by Monte Carlo integration with $R$ random draws of $(D_i, \nu_i)$ from the distributions $\hat{F}_D(D)$ and $N(0, I_{K+1})$. Letting $\delta_{jt} = x_{jt}'\beta - \alpha p_{jt} + \xi_{jt}$ denote the mean utility,
$$s_{jt} = \frac{1}{R} \sum_{i=1}^{R} \Pr{}_{ijt} = \frac{1}{R} \sum_{i=1}^{R} \frac{\exp\{\delta_{jt} + (x_{jt}', p_{jt})(\pi D_i + L\nu_i)\}}{1 + \sum_{m=1}^{J} \exp\{\delta_{mt} + (x_{mt}', p_{mt})(\pi D_i + L\nu_i)\}}$$
The simulation error can be reduced by increasing the number of draws, $R$. The approximation converges at rate $O(R^{-1/2})$, so $R$ must be increased by a factor of 100 for each additional digit of accuracy.
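As a minimal NumPy sketch (not the authors' code; array names and shapes are my own assumptions), the simulated-share formula for a single market can be written as:

```python
import numpy as np

def simulated_shares(delta, X, pi, L, D, nu):
    """Monte Carlo market shares for one market.

    delta : (J,) mean utilities
    X     : (J, K+1) characteristics including price, ordered as [x_jt, p_jt]
    pi    : (K+1, d) demographic loadings
    L     : (K+1, K+1) Cholesky-like factor of Sigma
    D     : (R, d) demographic draws from F_D
    nu    : (R, K+1) standard-normal draws
    """
    coefs = D @ pi.T + nu @ L.T          # (R, K+1): pi D_i + L nu_i per draw
    mu = coefs @ X.T                     # (R, J): (x_jt, p_jt)'(pi D_i + L nu_i)
    v = delta[None, :] + mu              # (R, J) deterministic utilities
    ev = np.exp(v)
    # logit choice probabilities; the outside good's utility is normalized to 0
    probs = ev / (1.0 + ev.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)            # average over the R draws
```

With $\pi = L = 0$ this collapses to the homogeneous logit, so each of $J$ symmetric products gets share $1/(1+J)$.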
2 Estimation

2.1 Nested Fixed Point Algorithm (NFP)

Initialization

1. Draw random values of $D_i$ and $\nu_i$ for $i = 1, 2, \ldots, R$ in each market $t$ from the distributions $\hat{F}_D(D)$ and $N(0, I)$, where the number of simulated customers $R$ per market is set by the researcher.
2. Compute initial $\delta_{jt}$ from the homogeneous logit: $\delta^0_t = \log s_t - \log s_{0t}$.
3. Set initial values for the nonlinear parameters $\pi$ and $L$. In Vincent (2015), these are set to 0.5.
Inner Loop (Fix $\theta_2$)

1. For a given $\delta^h_t$, iteratively compute $\delta^{h+1}_t$ until an $n$ such that $\|\delta^n_t - \delta^{n-1}_t\| < \phi$, where $\phi$ is a small tolerance set by the researcher (Dubé, Fox, and Su (2012) recommend setting the contraction-mapping tolerance close to machine precision, $\phi = 1 \times 10^{-14}$). Denote $\delta_t(\theta_2) = \delta^n_t$.
Contraction mapping of Berry, Levinsohn, and Pakes (1995). Very slow.
$$\delta^{h+1}_t = \delta^h_t + \log(s_t) - \log(s(\delta^h_t; \theta_2))$$
One can alternatively compute the contraction mapping in levels:
$$\exp(\delta^{h+1}_t) = \exp(\delta^h_t) \cdot \frac{s_t}{s(\delta^h_t; \theta_2)}$$
SQUAREM of Varadhan and Roland (2008). (There are two other choices of step length in the paper, but the one used here has the most desirable features; PyBLP uses a different step length.) Reynaerts, Varadhan, and Nash (2012) show that SQUAREM offers significant gains in speed and stability over the BLP contraction. Letting $\tilde{\delta}^{h+1}_t = \delta^h_t + \log(s_t) - \log(s(\delta^h_t; \theta_2))$ denote the plain contraction update,
$$r^h_t = \log(s_t) - \log(s(\delta^h_t; \theta_2))$$
$$v^h_t = \log(s(\delta^h_t; \theta_2)) - \log(s(\tilde{\delta}^{h+1}_t; \theta_2))$$
$$a^h_t = -\frac{\|r^h_t\|_2}{\|v^h_t\|_2}$$
$$\delta^{h+1}_t = \delta^h_t - 2 a^h_t r^h_t + (a^h_t)^2 v^h_t$$
2. Construct the GMM objective function.

(a) Given $\delta(\theta_2)$, estimate $\theta_1$ and compute $\xi$ using 2SLS:
$$\theta^{2SLS}_1(\theta_2) = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1} X'Z(Z'Z)^{-1}Z'\delta(\theta_2)$$
$$\xi^{2SLS}(\theta_2) = \delta(\theta_2) - X\theta^{2SLS}_1(\theta_2)$$

(b) Note that 2SLS is not asymptotically efficient if there is heteroskedasticity, i.e., if the covariance of the moments is not $\sigma^2_\xi E(Z_i Z_i')$ (see section 8.3.3 of Wooldridge (2010) for more details). So we construct the GMM weight matrix $W = (\tilde{Z}'\tilde{Z})^{-1}$, where $\tilde{Z}_i = Z_i \xi^{2SLS}_i$. Then
$$\theta^{GMM}_1(\theta_2) = \big(X'ZWZ'X\big)^{-1} X'ZWZ'\delta(\theta_2)$$
$$\xi^{GMM}(\theta_2) = \delta(\theta_2) - X\theta^{GMM}_1(\theta_2)$$
(c) Return to the outer loop.
Outer loop

Search for the $\theta_2 = (\pi, L)$ that minimizes the GMM objective function (one can use built-in optimization tools for this):
$$\hat{\theta}_2 = \operatorname*{argmin}_{\theta_2} \; \xi^{GMM}(\theta_2)' Z W Z' \xi^{GMM}(\theta_2)$$
2.2 Mathematical Program with Equilibrium Constraints (MPEC)

*Very few examples are available online that use free software to implement this algorithm for structural models. Jean-Pierre Dubé only provides MATLAB code that uses KNITRO on his website.

Dubé, Fox, and Su (2012) propose an alternative approach to compute the GMM estimator in the BLP model (Su and Judd (2012) showed that the MPEC and NFP algorithms compute the same statistical estimator):
$$\min_{\theta, \xi} \; g(\xi)' W g(\xi) \quad \text{s.t.} \quad s(\xi; \theta) = S,$$
where $g(\xi)$ is the moment-condition term.

One potential threat is the larger-dimensional optimization problem. This concern can be addressed by exploiting the sparsity structure of the Jacobian of the market-share equations: the demand shocks for market $t$ do not enter the constraints for other markets $t' \neq t$. We can exploit sparsity even further by treating the moments as additional parameters and reformulating the problem as
$$\min_{\theta, \xi, \eta} \; \eta' W \eta \quad \text{s.t.} \quad g(\xi) = \eta, \quad s(\xi; \theta) = S.$$
The additional constraint $g(\xi) - \eta = 0$ does not increase computational difficulty. The advantage of this alternative formulation is that, by introducing additional variables and linear constraints, the Hessian of the Lagrangian is sparse. In general, supplying exact first-order and second-order derivatives to the optimizer decreases computational time substantially. The capability of MPEC is enhanced further when the sparsity patterns of the first-order (Jacobian) and second-order (Hessian) derivatives are provided to the optimizer. Exploiting the sparsity of the optimization problem both increases the speed of MPEC and enables MPEC to handle larger-dimensional problems.
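This is not the KNITRO/MATLAB implementation referenced above, but a toy sketch of the MPEC formulation using SciPy's SLSQP on made-up data (one market, two products, homogeneous logit demand; all arrays are my own illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data for a single market with J = 2 products.
X = np.array([[1.0, 0.5], [1.0, 1.5]])     # characteristics (incl. a constant)
p = np.array([1.0, 2.0])                   # prices
Z = np.array([[1.0, 0.2], [1.0, 0.8]])     # instruments
S = np.array([0.3, 0.2])                   # observed shares
W = np.eye(2)                              # GMM weight matrix

def shares(xi, theta):
    """Homogeneous logit shares s(xi; theta)."""
    beta, alpha = theta[:2], theta[2]
    delta = X @ beta - alpha * p + xi
    ev = np.exp(delta)
    return ev / (1.0 + ev.sum())

def objective(z):
    """g(xi)' W g(xi) with g(xi) = Z' xi; z = (beta0, beta1, alpha, xi1, xi2)."""
    g = Z.T @ z[3:]
    return g @ W @ g

# equilibrium constraint: s(xi; theta) = S
cons = {"type": "eq", "fun": lambda z: shares(z[3:], z[:3]) - S}
res = minimize(objective, np.zeros(5), constraints=[cons], method="SLSQP")
```

In a real application, the demand shocks of market $t$ appear only in market $t$'s constraints, so the constraint Jacobian is block-sparse, which is what dedicated solvers such as KNITRO or Ipopt exploit.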
2.3 Approximate BLP (ABLP)

The ABLP estimator was proposed by Lee and Seo (2015). The procedure is as follows. First, posit a specific parameter guess $\theta^0$. Second, find the corresponding solution $\xi^0$ using the BLP contraction mapping. The procedure can start with an arbitrary initial guess $\theta^0$, as in the BLP estimation procedure.

After the initial center of approximation $\xi^0$ is identified, the ABLP estimation procedure iterates the following two stages until $\xi$ and $\theta$ converge.

Stage 1: obtain a new estimate $\theta^h$:
$$\theta^h = \operatorname*{argmin}_\theta \; \Psi(\theta; \xi^{h-1})' Z \hat{W} Z' \Psi(\theta; \xi^{h-1}),$$
where
$$\Psi(\theta; \xi) = \xi + \big[\nabla_\xi \ln s(\xi; \theta)\big]^{-1} \big[\ln S - \ln s(\xi; \theta)\big]$$

Stage 2: update $\xi$ using the $\theta^h$ from Stage 1:
$$\xi^h = \Psi(\theta^h; \xi^{h-1}).$$

Stage 1 minimizes an approximate GMM objective function. Rather than solving the market-share equations, the ABLP inversion $\Psi$ is adopted and the unobserved product characteristics are only approximated at given parameter values. Stage 2 is one step of Newton's method for solving $\ln S = \ln s(\xi; \theta)$ for $\xi$. If we iterated Stage 2 alone, $\xi^h$ would converge to the solution, provided $\xi^0$ is close enough to it. Stage 2 does not solve the market-share equations before reaching the limit, but the limit of $\xi^h$, if it exists, does.
2.4 Tricks for Numerical Computing

Several tricks discussed in Conlon and Gortmaker (2020) help with numerical computing:

• Use the log-sum-exp (LSE) function to prevent overflow. With the outside good's utility normalized to zero and $m = \max\{0, \max_k x_k\}$,
$$\log\Big(1 + \sum_k \exp x_k\Big) = m + \log\Big(\exp(-m) + \sum_k \exp(x_k - m)\Big)$$

• "Hot start": use as the starting value for the iterative procedure the $\delta^{h-1}_t$ that solved the system of equations for the previous guess of $\theta_2$.
3 Identification

*The remaining sections are based mainly on Dr. Fernando Luco's lecture notes.

In an ideal case, we would like to see variation in prices, characteristics, and availability of products, and see where consumers switch (i.e., the shares of which products respond). In practice, we use IVs that try to mimic this. IVs play a dual role: (1) they generate the moment conditions that identify the non-linear parameters, and (2) they deal with the correlation between prices and $\xi$. Sources of instruments are listed as follows:

• Supply information (Berry, Levinsohn, and Pakes (1995)). With profits
$$\pi_{jt} = p_{jt} q_{jt} - C_j(q_{jt}),$$
solving the F.O.C. gives
$$p_{jt} = C'_j(q_{jt}) \cdot \frac{\eta}{\eta - 1}.$$
So cost shifters and product characteristics (which affect the elasticity $\eta$) are valid instruments. Note that rival characteristics $x_{-jt}$ do not enter the demand equation for product $j$, which leads researchers to use functions of rival product characteristics as instruments.

• Many markets. Nevo (2001) suggests including product fixed effects as characteristics, but then no time-invariant characteristics remain. Hausman (1996) argues that one source of instruments is prices of the same brand in other cities; national or regional demand shocks may render these instruments invalid. A similar concern applies to advertising.

• Micro-moments (Petrin (2002)). Use individual-level data to relate the demographic information of smaller market segments to the characteristics of the products they purchase.
• Differentiation IVs. Gandhi and Houde (2019) argue that differences in characteristics are what matter. With $d_{jk} = x_{kt} - x_{jt}$,
$$Z^{D,\text{Local}}_{jt} = \Big(1, \; x_{jt}, \; w_{jt}, \; \sum_{k \in F_f \setminus \{j\}} 1(|d_{jk}| < \sigma_d), \; \sum_{k \notin F_f} 1(|d_{jk}| < \sigma_d)\Big)$$
$$Z^{D,\text{Quad}}_{jt} = \Big(1, \; x_{jt}, \; w_{jt}, \; \sum_{k \in F_f \setminus \{j\}} d^2_{jk}, \; \sum_{k \notin F_f} d^2_{jk}\Big)$$
where the first sum runs over the other products of product $j$'s own firm $F_f$ and the second over rival products.
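A minimal sketch of these instruments for one market and a single characteristic (own-firm vs. rival split as above; function and argument names are my own):

```python
import numpy as np

def differentiation_ivs(x, firm_ids, sigma_d):
    """Local and quadratic differentiation IVs for one market, one characteristic.

    x        : (J,) characteristic values
    firm_ids : (J,) firm identifier of each product
    sigma_d  : bandwidth for the local ("close competitor") counts
    """
    d = x[None, :] - x[:, None]                  # d[j, k] = x_k - x_j
    same = firm_ids[None, :] == firm_ids[:, None]
    np.fill_diagonal(same, False)                # exclude product j itself
    rival = firm_ids[None, :] != firm_ids[:, None]
    close = np.abs(d) < sigma_d
    local_own = (close & same).sum(axis=1)       # own-firm close competitors
    local_rival = (close & rival).sum(axis=1)    # rival close competitors
    quad_own = np.where(same, d**2, 0.0).sum(axis=1)
    quad_rival = np.where(rival, d**2, 0.0).sum(axis=1)
    return local_own, local_rival, quad_own, quad_rival
```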
4 Elasticities and Marginal Costs

Suppose we draw random values of $D_i$ and $\nu_i$ for $i = 1, 2, \ldots, R$. For each pair of draws we can compute the probability that each product is chosen:
$$s_{ijt} = \frac{\exp(\delta_{jt} + \mu_{ijt})}{1 + \sum_k \exp(\delta_{kt} + \mu_{ikt})}$$
where
$$\delta_{jt} = x_{jt}'\hat{\beta} - \hat{\alpha} p_{jt} + \xi_{jt}, \qquad \mu_{ijt} = [x_{jt}', p_{jt}]\,(\hat{\pi} D_i + \hat{L} \nu_i)$$
Then, the elasticities are given by
$$\eta_{jkt} = \frac{\partial s_{jt}}{\partial p_{kt}} \frac{p_{kt}}{s_{jt}} =
\begin{cases}
-\dfrac{p_{jt}}{s_{jt}} \dfrac{1}{R} \displaystyle\sum_{i=1}^{R} \alpha_i s_{ijt} (1 - s_{ijt}) & \text{if } j = k \\[2ex]
\dfrac{p_{kt}}{s_{jt}} \dfrac{1}{R} \displaystyle\sum_{i=1}^{R} \alpha_i s_{ijt} s_{ikt} & \text{otherwise}
\end{cases}$$
Now define $F_f$ as the set of products produced by firm $f$. Assume the market follows Bertrand–Nash pricing; then each firm seeks to maximize its profits
$$\Pi_f = \sum_{j \in F_f} (p_j - mc_j)\, s_j M - C_f,$$
where $M$ is market size, $s_j$ is the market share, and $C_f$ is a fixed cost. The F.O.C. with respect to $p_j$ is
$$s_j + \sum_{k \in F_f} (p_k - mc_k) \frac{\partial s_k}{\partial p_j} = 0$$
Define $\Omega_{jr} = \frac{\partial s_r}{\partial p_j}$ if $j, r \in F_f$ and $\Omega_{jr} = 0$ otherwise. Then the F.O.C. can be written in matrix form as
$$S + \Omega (P - MC) = 0 \;\Rightarrow\; MC = P + \Omega^{-1} S.$$
This means that if we have a model of competition ($\Omega$) and estimates of $\partial s / \partial p$, we can recover an estimate of marginal costs.
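A minimal sketch of the two steps (price derivatives from the simulated choice probabilities, then the Bertrand–Nash markup inversion), with my own naming conventions:

```python
import numpy as np

def price_derivatives(s_i, alpha_i):
    """Jacobian ds_j/dp_k for one market from simulated choice probabilities.

    s_i     : (R, J) individual choice probabilities s_ijt
    alpha_i : (R,) individual price coefficients
    """
    R, J = s_i.shape
    a_s = alpha_i[:, None] * s_i
    # cross-derivatives: ds_j/dp_k = (1/R) sum_i alpha_i s_ij s_ik  (k != j)
    jac = (a_s.T @ s_i) / R
    # own derivatives: ds_j/dp_j = -(1/R) sum_i alpha_i s_ij (1 - s_ij)
    np.fill_diagonal(jac, -(a_s * (1.0 - s_i)).mean(axis=0))
    return jac                                       # (J, J)

def marginal_costs(prices, shares, jac, firm_ids):
    """Recover MC = P + Omega^{-1} S under Bertrand-Nash pricing.

    The mixed-logit Jacobian above is symmetric, so it can be masked
    directly to form Omega (no transpose needed).
    """
    same = firm_ids[None, :] == firm_ids[:, None]
    omega = np.where(same, jac, 0.0)
    return prices + np.linalg.solve(omega, shares)
```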
5 Welfare Analysis

The main reason to work with structural models is to conduct counterfactuals, in which policy or environmental changes affect the economy. We are interested in how these changes affect welfare. Let $u_{ijt} = \alpha_i(y_i - p_{jt}) + x_{jt}'\beta_i + \xi_{jt} + \varepsilon_{ijt} = v_{ijt} + \varepsilon_{ijt}$. The measure of welfare is the expected maximum utility out of a choice set: $W_{it} = E[\max_{j \in A} u_{ijt}]$.
$$E\Big[\max_{j \in A} u_{ijt}\Big] = E\Big[\sum_{j \in A} 1[d_{it} = j]\, u_{ijt}\Big] = \sum_{j \in A} \Pr[d_{it} = j]\,(v_{ijt} + \varepsilon_{ijt})$$
Then,
$$\frac{\partial E[\max_{j \in A} u_{ijt}]}{\partial v_{ikt}} = \Pr(d_{it} = k) = \frac{\exp(v_{ikt})}{\sum_{j \in A} \exp(v_{ijt})}$$
So
$$W_i = \int \frac{\partial E[\max_{j \in A} u_{ijt}]}{\partial v_{ikt}} \, dv = c + \log \sum_j \exp(v_{ijt})$$
Dividing by $\alpha_i$ translates utility into dollars, since $1/\alpha_i = dy_i / du_{ijt}$.
We can consider three cases:

• The choice set remains constant. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A_t) = \log \sum_{j \in A_t} \exp(v'_{ijt})$$
The change in consumer welfare is then $\Delta W = W' - W$.
• The choice set after the intervention is a subset of the original choice set. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A'_t) = \log \sum_{j \in A'_t} \exp(v'_{ijt})$$
where $A'_t \subset A_t$. The change in consumer welfare is then $\Delta W = W' - W$. A major issue is that $\varepsilon_{ijt}$ has support on the entire real line. Hence, removing options always decreases welfare (mechanically), and adding even the worst product you can think of will mechanically weakly increase welfare. Song (2015), Marshall (2015), and Duch-Brown et al. (2020) all propose frameworks to deal with this.
• The choice set before the intervention is a subset of the final choice set. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A'_t) = \log \sum_{j \in A'_t} \exp(v'_{ijt})$$
where $A_t \subset A'_t$. The change in consumer welfare is then $\Delta W = W' - W$. We can deal with this if we either have pre- and post-intervention data (Petrin (2002)) or a model of how the $\xi$ are created. The key is to allow $\xi$ to change to fit the data.
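For the fixed-choice-set case, the welfare change per consumer is a log-sum-exp difference scaled into dollars by $1/\alpha_i$; a minimal sketch with made-up numbers:

```python
import numpy as np
from scipy.special import logsumexp

def consumer_surplus(v, alpha):
    """Per-consumer expected surplus in dollars: (1/alpha_i) log sum_j exp(v_ij).

    v     : (R, J) deterministic utilities over the choice set (append a zero
            column to include the outside good)
    alpha : (R,) marginal utilities of income
    """
    return logsumexp(v, axis=1) / alpha

# Welfare change from a uniform $0.10 price increase, choice set held fixed.
# All numbers below are made up for illustration.
alpha = np.array([1.0, 2.0])
v_pre = np.array([[1.0, 0.5], [0.8, 0.2]])
v_post = v_pre - alpha[:, None] * 0.1      # v' = v - alpha_i * dp on every product
dW = consumer_surplus(v_post, alpha) - consumer_surplus(v_pre, alpha)
```

Since the price increase shifts every product's utility by $-0.1\alpha_i$, each consumer's surplus falls by exactly $0.10, a useful sanity check on the implementation.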
References
Berry, Steven, James Levinsohn, and Ariel Pakes (1995). “Automobile prices in market equilibrium”.
In: Econometrica: Journal of the Econometric Society, pp. 841–890.
Conlon, Christopher and Jeff Gortmaker (2020). “Best practices for differentiated products demand
estimation with pyblp”. In: The RAND Journal of Economics 51(4), pp. 1108–1161.
Dubé, Jean-Pierre, Jeremy T Fox, and Che-Lin Su (2012). “Improving the numerical performance of static and dynamic aggregate discrete choice random coefficients demand estimation”. In: Econometrica 80(5), pp. 2231–2267.
Duch-Brown, Néstor et al. (2020). “Evaluating the Impact of Online Market Integration — Evidence from the EU Portable PC Market”.
Gandhi, Amit and Jean-François Houde (2019). “Measuring substitution patterns in differentiated-products industries”. In: NBER Working Paper (w26375).
Hausman, Jerry A (1996). “Valuation of new goods under perfect and imperfect competition”. In:
The Economics of New Goods. University of Chicago Press, pp. 207–248.
Lee, Jinhyuk and Kyoungwon Seo (2015). “A computationally fast estimator for random coefficients
logit demand models using aggregate data”. In: The RAND Journal of Economics 46(1), pp. 86–
102.
Marshall, Guillermo (2015). “Hassle costs and price discrimination: An empirical welfare analysis”.
In: American Economic Journal: Applied Economics 7(3), pp. 123–46.
Nevo, Aviv (2000). “A practitioner’s guide to estimation of random-coefficients logit models of de-
mand”. In: Journal of Economics & Management Strategy 9(4), pp. 513–548.
Nevo, Aviv (2001). “Measuring market power in the ready-to-eat cereal industry”. In: Econometrica
69(2), pp. 307–342.
Petrin, Amil (2002). “Quantifying the benefits of new products: The case of the minivan”. In: Journal
of Political Economy 110(4), pp. 705–729.
Reynaerts, Jo, Ravi Varadhan, and John C Nash (2012). “Enhancing the convergence properties of the BLP (1995) contraction mapping”.
Song, Minjae (2015). “A hybrid discrete choice model of differentiated product demand with an
application to personal computers”. In: International Economic Review 56(1), pp. 265–301.
Su, Che-Lin and Kenneth L Judd (2012). “Constrained optimization approaches to estimation of
structural models”. In: Econometrica 80(5), pp. 2213–2230.
Train, Kenneth E (2009). Discrete choice methods with simulation. Cambridge university press.
Varadhan, Ravi and Christophe Roland (2008). “Simple and globally convergent methods for acceler-
ating the convergence of any EM algorithm”. In: Scandinavian Journal of Statistics 35(2), pp. 335–
353.
Vincent, David W (2015). “The Berry–Levinsohn–Pakes estimator of the random-coefficients logit
demand model”. In: The Stata Journal 15(3), pp. 854–880.
Wooldridge, Jeffrey M (2010). Econometric analysis of cross section and panel data. MIT press.