Random Coefficient Logit Models (BLP)
Zijing Hu
November 9, 2022
Contents
1 Introduction
2 Estimation
  2.1 Nested Fixed Point Algorithm (NFP)
  2.2 Mathematical Program with Equilibrium Constraints (MPEC)
  2.3 Approximate BLP (ABLP)
  2.4 Tricks for Numerical Computing
3 Identification
4 Elasticities and Marginal Costs
5 Welfare Analysis
1 Introduction

*This section is based mainly on Nevo (2000) and Vincent (2015).

Let $u_{ijt}$ denote the indirect utility that individual $i$ receives from the consumption of product $j$ in market $t$. This is assumed to be a linear function of a $K \times 1$ vector of product characteristics $x_{jt}$, price $p_{jt}$, an unobserved (to the econometrician) component $\xi_{jt}$, and an idiosyncratic error $\varepsilon_{ijt}$. Hence,
$$u_{ijt} = \alpha_i (y_i - p_{jt}) + x_{jt}'\beta_i + \xi_{jt} + \varepsilon_{ijt}$$
where $y_i$ is individual income, $\beta_i$ is a $K \times 1$ vector of coefficients, and $\alpha_i$ is the marginal utility of income. The term $\xi_{jt}$ can be regarded as a deviation from observed product quality that is common to all individuals. Consumer $i$ can also choose the outside product $j = 0$, with normalized utility $u_{i0t} = \alpha_i y_i + \varepsilon_{i0t}$.
Both $\beta_i$ and $\alpha_i$ are assumed to be linear functions of a $d \times 1$ vector of demographic factors, $D_i$, and a $(K+1) \times 1$ vector of unobservable components, $\nu_i$. In particular,
$$\begin{pmatrix} \beta_i \\ \alpha_i \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix} + \pi D_i + L \nu_i$$
where $\pi$ and $L$ are $(K+1) \times d$ and $(K+1) \times (K+1)$ matrices, respectively. The distribution of the demographics, $F_D(D)$, is assumed to be known or estimable (e.g., researchers can use American Community Survey data to estimate the empirical distribution, or use loyalty-program data directly if available). For $\nu_i$, it is assumed that $\nu_i \overset{iid}{\sim} N(0, \Sigma)$, where $\Sigma = LL'$ is the covariance matrix of the coefficients $(\beta_i, \alpha_i)$ conditional on $D_i$, and $L$ acts like a Cholesky factor of $\Sigma$ (see section 9.2.5 of Train (2009) for more details). One can add interaction terms to $D_i$ to capture more information from demographic variables. Requiring non-negative variances on the random coefficients is not strictly necessary. The model can then be written as
$$u_{ijt} = \alpha_i y_i + \delta_{jt}(x_{jt}, p_{jt}, \xi_{jt}; \theta_1) + \mu_{ijt}(x_{jt}, p_{jt}, \nu_i, D_i; \theta_2) + \varepsilon_{ijt}$$
$$\delta_{jt}(x_{jt}, p_{jt}, \xi_{jt}; \theta_1) = x_{jt}'\beta - \alpha p_{jt} + \xi_{jt}$$
$$\mu_{ijt}(x_{jt}, p_{jt}, \nu_i, D_i; \theta_2) = [x_{jt}', p_{jt}]\,(\pi D_i + L \nu_i)$$
It is worth noting that we do not require random coefficients on all characteristics. The model can be rewritten as
$$u_{ijt} = \alpha_i y_i + \delta_{jt}\big(x^{(1)}_{jt}, p_{jt}, \xi_{jt}; \theta_1\big) + \mu_{ijt}\big(x^{(2)}_{jt}, p_{jt}, \nu_i, D_i; \theta_2\big) + \varepsilon_{ijt}$$
There are two reasons for this (see the appendix of Nevo (2000) for more details). First, we might not want to allow for random coefficients on some characteristics. Second, if we include brand dummy variables, they will enter only $x^{(1)}_{jt}$, the linear part of the model, while the product characteristics will be included only in $x^{(2)}_{jt}$, the nonlinear part of the model.
Define the set $A_{ijt} = \{\varepsilon_{it} : u_{ijt} \geq u_{imt}, \ \forall m \neq j\}$, where $\varepsilon_{it} = (\varepsilon_{i0t}, \ldots, \varepsilon_{iJt})$. Then the probability that individual $i$ selects product $j$ in market $t$, given $D_i$ and $\nu_i$, is
$$\Pr{}_{ijt} = \int_{A_{ijt}} dF(\varepsilon_{it} \mid D_i, \nu_i)$$
Integrating out the unobservables $D_i$ and $\nu_i$ yields
$$\Pr{}_{jt} = \int_{D_i} \int_{\nu_i} \Pr{}_{ijt} \, dF(\nu_i \mid D_i) \, d\hat{F}_D(D_i) = \int_{D_i} \int_{\nu_i} \Pr{}_{ijt} \, dF_\nu(\nu_i) \, d\hat{F}_D(D_i)$$
The probability $\Pr{}_{jt}$ is the same for all $i$ and can be estimated by the product market share $s_{jt} = q_{jt}/I_t$, where $q_{jt}$ denotes sales and $I_t$ the number of consumers. The error in this approximation is $O_p(I_t^{-1/2})$ and will be negligible for large $I_t$, which is often the case.
To evaluate the integrals, it is assumed that the errors $\varepsilon_{ijt}$ are i.i.d. with a type-I extreme-value distribution. Then,
$$\Pr{}_{ijt} = \frac{\exp(x_{jt}'\beta_i - \alpha_i p_{jt} + \xi_{jt})}{1 + \sum_{m=1}^{J} \exp(x_{mt}'\beta_i - \alpha_i p_{mt} + \xi_{mt})}$$
Because income $y_i$ appears in the (indirect) utility of all alternatives, including the outside option, $\alpha_i y_i$ cancels in the expression for $\Pr{}_{ijt}$.
The integrals cannot be evaluated analytically, but they can be approximated by Monte Carlo integration with $R$ random draws of $(D_i, \nu_i)$ from the distributions $\hat{F}_D(D)$ and $N(0, I_{K+1})$. Letting $\delta_{jt} = x_{jt}'\beta - \alpha p_{jt} + \xi_{jt}$ denote the mean utility,
$$s_{jt} = \frac{1}{R} \sum_{i=1}^{R} \Pr{}_{ijt} = \frac{1}{R} \sum_{i=1}^{R} \frac{\exp\{\delta_{jt} + (x_{jt}', p_{jt})(\pi D_i + L\nu_i)\}}{1 + \sum_{m=1}^{J} \exp\{\delta_{mt} + (x_{mt}', p_{mt})(\pi D_i + L\nu_i)\}}$$
The simulation error can be reduced by increasing the number of draws, $R$. The approximation converges at rate $O(R^{-1/2})$, so $R$ must be increased by a factor of 100 for each additional digit of accuracy.
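As a minimal NumPy sketch (not the authors' code; array names and shapes are my own assumptions), the simulated-share formula for a single market can be written as:

```python
import numpy as np

def simulated_shares(delta, X, pi, L, D, nu):
    """Monte Carlo market shares for one market.

    delta : (J,) mean utilities
    X     : (J, K+1) characteristics including price, ordered as [x_jt, p_jt]
    pi    : (K+1, d) demographic loadings
    L     : (K+1, K+1) Cholesky-like factor of Sigma
    D     : (R, d) demographic draws from F_D
    nu    : (R, K+1) standard-normal draws
    """
    coefs = D @ pi.T + nu @ L.T          # (R, K+1): pi D_i + L nu_i per draw
    mu = coefs @ X.T                     # (R, J): (x_jt, p_jt)'(pi D_i + L nu_i)
    v = delta[None, :] + mu              # (R, J) deterministic utilities
    ev = np.exp(v)
    # logit choice probabilities; the outside good's utility is normalized to 0
    probs = ev / (1.0 + ev.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)            # average over the R draws
```

With $\pi = L = 0$ this collapses to the homogeneous logit, so each of $J$ symmetric products gets share $1/(1+J)$.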
2 Estimation

2.1 Nested Fixed Point Algorithm (NFP)

Initialization

1. Draw random values of $D_i$ and $\nu_i$ for $i = 1, 2, \ldots, R$ in each market $t$ from the distributions $\hat{F}_D(D)$ and $N(0, I)$, where the number of simulated customers $R$ per market is set by the researcher.
2. Compute initial $\delta_{jt}$ from the homogeneous logit: $\delta^0_t = \log s_t - \log s_{0t}$.
3. Set initial values for the nonlinear parameters $\pi$ and $L$. In Vincent (2015), these are set to 0.5.
Inner Loop (Fix $\theta_2$)

1. For a given $\delta^h_t$, iteratively compute $\delta^{h+1}_t$ until an $n$ such that $\|\delta^n_t - \delta^{n-1}_t\| < \phi$, where $\phi$ is a small tolerance set by the researcher (Dubé, Fox, and Su (2012) recommend setting the contraction-mapping tolerance close to machine precision, $\phi = 1 \times 10^{-14}$). Denote $\delta_t(\theta_2) = \delta^n_t$.
Contraction mapping of Berry, Levinsohn, and Pakes (1995). Very slow.
$$\delta^{h+1}_t = \delta^h_t + \log(s_t) - \log(s(\delta^h_t; \theta_2))$$
One can alternatively compute the contraction mapping in levels:
$$\exp(\delta^{h+1}_t) = \exp(\delta^h_t) \cdot \frac{s_t}{s(\delta^h_t; \theta_2)}$$
SQUAREM of Varadhan and Roland (2008). (There are two other choices of step length in the paper, but the one used here has the most desirable features; PyBLP uses a different step length.) Reynaerts, Varadhan, and Nash (2012) show that SQUAREM offers significant gains in speed and stability over the BLP contraction. Letting $\tilde{\delta}^{h+1}_t = \delta^h_t + \log(s_t) - \log(s(\delta^h_t; \theta_2))$ denote the plain contraction update,
$$r^h_t = \log(s_t) - \log(s(\delta^h_t; \theta_2))$$
$$v^h_t = \log(s(\delta^h_t; \theta_2)) - \log(s(\tilde{\delta}^{h+1}_t; \theta_2))$$
$$a^h_t = -\frac{\|r^h_t\|_2}{\|v^h_t\|_2}$$
$$\delta^{h+1}_t = \delta^h_t - 2 a^h_t r^h_t + (a^h_t)^2 v^h_t$$
2. Construct the GMM objective function.

(a) Given $\delta(\theta_2)$, estimate $\theta_1$ and compute $\xi$ using 2SLS:
$$\theta^{2SLS}_1(\theta_2) = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1} X'Z(Z'Z)^{-1}Z'\delta(\theta_2)$$
$$\xi^{2SLS}(\theta_2) = \delta(\theta_2) - X\theta^{2SLS}_1(\theta_2)$$

(b) Note that 2SLS is not asymptotically efficient if there is heteroskedasticity, i.e., if the covariance of the moments is not $\sigma^2_\xi E(Z_i Z_i')$ (see section 8.3.3 of Wooldridge (2010) for more details). So we construct the GMM weight matrix $W = (\tilde{Z}'\tilde{Z})^{-1}$, where $\tilde{Z}_i = Z_i \xi^{2SLS}_i$. Then
$$\theta^{GMM}_1(\theta_2) = \big(X'ZWZ'X\big)^{-1} X'ZWZ'\delta(\theta_2)$$
$$\xi^{GMM}(\theta_2) = \delta(\theta_2) - X\theta^{GMM}_1(\theta_2)$$
(c) Return to the outer loop.
Outer loop

Search for the $\theta_2 = (\pi, L)$ that minimizes the GMM objective function (one can use built-in optimization tools for this):
$$\hat{\theta}_2 = \operatorname*{argmin}_{\theta_2} \; \xi^{GMM}(\theta_2)' Z W Z' \xi^{GMM}(\theta_2)$$
2.2 Mathematical Program with Equilibrium Constraints (MPEC)

*Very few examples are available online that use free software to implement this algorithm for structural models. Jean-Pierre Dubé only provides MATLAB code that uses KNITRO on his website.

Dubé, Fox, and Su (2012) propose an alternative approach to compute the GMM estimator in the BLP model (Su and Judd (2012) showed that the MPEC and NFP algorithms compute the same statistical estimator):
$$\min_{\theta, \xi} \; g(\xi)' W g(\xi) \quad \text{s.t.} \quad s(\xi; \theta) = S,$$
where $g(\xi)$ is the moment-condition term.

One potential threat is the larger-dimensional optimization problem. This concern can be addressed by exploiting the sparsity structure of the Jacobian of the market-share equations: the demand shocks for market $t$ do not enter the constraints for other markets $t' \neq t$. We can exploit sparsity even further by treating the moments as additional parameters and reformulating the problem as
$$\min_{\theta, \xi, \eta} \; \eta' W \eta \quad \text{s.t.} \quad g(\xi) = \eta, \quad s(\xi; \theta) = S.$$
The additional constraint $g(\xi) - \eta = 0$ does not increase computational difficulty. The advantage of this alternative formulation is that, by introducing additional variables and linear constraints, the Hessian of the Lagrangian is sparse. In general, supplying exact first-order and second-order derivatives to the optimizer decreases computational time substantially. The capability of MPEC is enhanced further when the sparsity patterns of the first-order (Jacobian) and second-order (Hessian) derivatives are provided to the optimizer. Exploiting the sparsity of the optimization problem both increases the speed of MPEC and enables MPEC to handle larger-dimensional problems.
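This is not the KNITRO/MATLAB implementation referenced above, but a toy sketch of the MPEC formulation using SciPy's SLSQP on made-up data (one market, two products, homogeneous logit demand; all arrays are my own illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data for a single market with J = 2 products.
X = np.array([[1.0, 0.5], [1.0, 1.5]])     # characteristics (incl. a constant)
p = np.array([1.0, 2.0])                   # prices
Z = np.array([[1.0, 0.2], [1.0, 0.8]])     # instruments
S = np.array([0.3, 0.2])                   # observed shares
W = np.eye(2)                              # GMM weight matrix

def shares(xi, theta):
    """Homogeneous logit shares s(xi; theta)."""
    beta, alpha = theta[:2], theta[2]
    delta = X @ beta - alpha * p + xi
    ev = np.exp(delta)
    return ev / (1.0 + ev.sum())

def objective(z):
    """g(xi)' W g(xi) with g(xi) = Z' xi; z = (beta0, beta1, alpha, xi1, xi2)."""
    g = Z.T @ z[3:]
    return g @ W @ g

# equilibrium constraint: s(xi; theta) = S
cons = {"type": "eq", "fun": lambda z: shares(z[3:], z[:3]) - S}
res = minimize(objective, np.zeros(5), constraints=[cons], method="SLSQP")
```

In a real application, the demand shocks of market $t$ appear only in market $t$'s constraints, so the constraint Jacobian is block-sparse, which is what dedicated solvers such as KNITRO or Ipopt exploit.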
2.3 Approximate BLP (ABLP)

The ABLP estimator was proposed by Lee and Seo (2015). The procedure is as follows. First, posit a specific parameter guess $\theta^0$. Second, find the corresponding solution $\xi^0$ using the BLP contraction mapping. The procedure can start with an arbitrary initial guess $\theta^0$, as in the BLP estimation procedure.

After the initial center of approximation $\xi^0$ is identified, the ABLP estimation procedure iterates the following two stages until $\xi$ and $\theta$ converge.

Stage 1: obtain a new estimate $\theta^h$:
$$\theta^h = \operatorname*{argmin}_\theta \; \Psi(\theta; \xi^{h-1})' Z \hat{W} Z' \Psi(\theta; \xi^{h-1}),$$
where
$$\Psi(\theta; \xi) = \xi + \big[\nabla_\xi \ln s(\xi; \theta)\big]^{-1} \big[\ln S - \ln s(\xi; \theta)\big]$$

Stage 2: update $\xi$ using the $\theta^h$ from Stage 1:
$$\xi^h = \Psi(\theta^h; \xi^{h-1}).$$

Stage 1 minimizes an approximate GMM objective function. Rather than solving the market-share equations, the ABLP inversion $\Psi$ is adopted and the unobserved product characteristics are only approximated at given parameter values. Stage 2 is one step of Newton's method for solving $\ln S = \ln s(\xi; \theta)$ for $\xi$. If we iterated Stage 2 alone, $\xi^h$ would converge to the solution, provided $\xi^0$ is close enough to it. Stage 2 does not solve the market-share equations before reaching the limit, but the limit of $\xi^h$, if it exists, does.
2.4 Tricks for Numerical Computing

Several tricks discussed in Conlon and Gortmaker (2020) help with numerical computing:

• Use the log-sum-exp (LSE) function to prevent overflow. With the outside good's utility normalized to zero and $m = \max\{0, \max_k x_k\}$,
$$\log\Big(1 + \sum_k \exp x_k\Big) = m + \log\Big(\exp(-m) + \sum_k \exp(x_k - m)\Big)$$

• "Hot start": use as the starting value for the iterative procedure the $\delta^{h-1}_t$ that solved the system of equations for the previous guess of $\theta_2$.
3 Identification

*The remaining sections are based mainly on Dr. Fernando Luco's lecture notes.

In an ideal case, we would like to see variation in prices, characteristics, and availability of products, and see where consumers switch (i.e., the shares of which products respond). In practice, we use IVs that try to mimic this. IVs play a dual role: (1) they generate the moment conditions that identify the non-linear parameters, and (2) they deal with the correlation between prices and $\xi$. Sources of instruments are listed as follows:

• Supply information (Berry, Levinsohn, and Pakes (1995)). With profits
$$\pi_{jt} = p_{jt} q_{jt} - C_j(q_{jt}),$$
solving the F.O.C. gives
$$p_{jt} = C'_j(q_{jt}) \cdot \frac{\eta}{\eta - 1}.$$
So cost shifters and product characteristics (which affect the elasticity $\eta$) are valid instruments. Note that rival characteristics $x_{-jt}$ do not enter the demand equation for product $j$, which leads researchers to use functions of rival product characteristics as instruments.

• Many markets. Nevo (2001) suggests including product fixed effects as characteristics, but then no time-invariant characteristics remain. Hausman (1996) argues that one source of instruments is prices of the same brand in other cities; national or regional demand shocks may render these instruments invalid. A similar concern applies to advertising.

• Micro-moments (Petrin (2002)). Use individual-level data to relate the demographic information of smaller market segments to the characteristics of the products they purchase.
• Differentiation IVs. Gandhi and Houde (2019) argue that differences in characteristics are what matter. With $d_{jk} = x_{kt} - x_{jt}$,
$$Z^{D,\text{Local}}_{jt} = \Big(1, \; x_{jt}, \; w_{jt}, \; \sum_{k \in F_f \setminus \{j\}} 1(|d_{jk}| < \sigma_d), \; \sum_{k \notin F_f} 1(|d_{jk}| < \sigma_d)\Big)$$
$$Z^{D,\text{Quad}}_{jt} = \Big(1, \; x_{jt}, \; w_{jt}, \; \sum_{k \in F_f \setminus \{j\}} d^2_{jk}, \; \sum_{k \notin F_f} d^2_{jk}\Big)$$
where the first sum runs over the other products of product $j$'s own firm $F_f$ and the second over rival products.
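A minimal sketch of these instruments for one market and a single characteristic (own-firm vs. rival split as above; function and argument names are my own):

```python
import numpy as np

def differentiation_ivs(x, firm_ids, sigma_d):
    """Local and quadratic differentiation IVs for one market, one characteristic.

    x        : (J,) characteristic values
    firm_ids : (J,) firm identifier of each product
    sigma_d  : bandwidth for the local ("close competitor") counts
    """
    d = x[None, :] - x[:, None]                  # d[j, k] = x_k - x_j
    same = firm_ids[None, :] == firm_ids[:, None]
    np.fill_diagonal(same, False)                # exclude product j itself
    rival = firm_ids[None, :] != firm_ids[:, None]
    close = np.abs(d) < sigma_d
    local_own = (close & same).sum(axis=1)       # own-firm close competitors
    local_rival = (close & rival).sum(axis=1)    # rival close competitors
    quad_own = np.where(same, d**2, 0.0).sum(axis=1)
    quad_rival = np.where(rival, d**2, 0.0).sum(axis=1)
    return local_own, local_rival, quad_own, quad_rival
```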
4 Elasticities and Marginal Costs

Suppose we draw random values of $D_i$ and $\nu_i$ for $i = 1, 2, \ldots, R$. For each pair of draws we can compute the probability that each product is chosen:
$$s_{ijt} = \frac{\exp(\delta_{jt} + \mu_{ijt})}{1 + \sum_k \exp(\delta_{kt} + \mu_{ikt})}$$
where
$$\delta_{jt} = x_{jt}'\hat{\beta} - \hat{\alpha} p_{jt} + \xi_{jt}, \qquad \mu_{ijt} = [x_{jt}', p_{jt}]\,(\hat{\pi} D_i + \hat{L} \nu_i)$$
Then, the elasticities are given by
$$\eta_{jkt} = \frac{\partial s_{jt}}{\partial p_{kt}} \frac{p_{kt}}{s_{jt}} =
\begin{cases}
-\dfrac{p_{jt}}{s_{jt}} \dfrac{1}{R} \displaystyle\sum_{i=1}^{R} \alpha_i s_{ijt} (1 - s_{ijt}) & \text{if } j = k \\[2ex]
\dfrac{p_{kt}}{s_{jt}} \dfrac{1}{R} \displaystyle\sum_{i=1}^{R} \alpha_i s_{ijt} s_{ikt} & \text{otherwise}
\end{cases}$$
Now define $F_f$ as the set of products produced by firm $f$. Assume the market follows Bertrand–Nash pricing; then each firm seeks to maximize its profits
$$\Pi_f = \sum_{j \in F_f} (p_j - mc_j)\, s_j M - C_f,$$
where $M$ is market size, $s_j$ is the market share, and $C_f$ is a fixed cost. The F.O.C. with respect to $p_j$ is
$$s_j + \sum_{k \in F_f} (p_k - mc_k) \frac{\partial s_k}{\partial p_j} = 0$$
Define $\Omega_{jr} = \frac{\partial s_r}{\partial p_j}$ if $j, r \in F_f$ and $\Omega_{jr} = 0$ otherwise. Then the F.O.C. can be written in matrix form as
$$S + \Omega (P - MC) = 0 \;\Rightarrow\; MC = P + \Omega^{-1} S.$$
This means that if we have a model of competition ($\Omega$) and estimates of $\partial s / \partial p$, we can recover an estimate of marginal costs.
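A minimal sketch of the two steps (price derivatives from the simulated choice probabilities, then the Bertrand–Nash markup inversion), with my own naming conventions:

```python
import numpy as np

def price_derivatives(s_i, alpha_i):
    """Jacobian ds_j/dp_k for one market from simulated choice probabilities.

    s_i     : (R, J) individual choice probabilities s_ijt
    alpha_i : (R,) individual price coefficients
    """
    R, J = s_i.shape
    a_s = alpha_i[:, None] * s_i
    # cross-derivatives: ds_j/dp_k = (1/R) sum_i alpha_i s_ij s_ik  (k != j)
    jac = (a_s.T @ s_i) / R
    # own derivatives: ds_j/dp_j = -(1/R) sum_i alpha_i s_ij (1 - s_ij)
    np.fill_diagonal(jac, -(a_s * (1.0 - s_i)).mean(axis=0))
    return jac                                       # (J, J)

def marginal_costs(prices, shares, jac, firm_ids):
    """Recover MC = P + Omega^{-1} S under Bertrand-Nash pricing.

    The mixed-logit Jacobian above is symmetric, so it can be masked
    directly to form Omega (no transpose needed).
    """
    same = firm_ids[None, :] == firm_ids[:, None]
    omega = np.where(same, jac, 0.0)
    return prices + np.linalg.solve(omega, shares)
```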
5 Welfare Analysis

The main reason to work with structural models is to conduct counterfactuals, in which policy or environmental changes affect the economy. We are interested in how these changes affect welfare. Let $u_{ijt} = \alpha_i(y_i - p_{jt}) + x_{jt}'\beta_i + \xi_{jt} + \varepsilon_{ijt} = v_{ijt} + \varepsilon_{ijt}$. The measure of welfare is the expected maximum utility out of a choice set: $W_{it} = E[\max_{j \in A} u_{ijt}]$.
$$E\Big[\max_{j \in A} u_{ijt}\Big] = E\Big[\sum_{j \in A} 1[d_{it} = j]\, u_{ijt}\Big] = \sum_{j \in A} \Pr[d_{it} = j]\,(v_{ijt} + \varepsilon_{ijt})$$
Then,
$$\frac{\partial E[\max_{j \in A} u_{ijt}]}{\partial v_{ikt}} = \Pr(d_{it} = k) = \frac{\exp(v_{ikt})}{\sum_{j \in A} \exp(v_{ijt})}$$
So
$$W_i = \int \frac{\partial E[\max_{j \in A} u_{ijt}]}{\partial v_{ikt}} \, dv = c + \log \sum_j \exp(v_{ijt})$$
Dividing by $\alpha_i$ translates utility into dollars, since $1/\alpha_i = dy_i / du_{ijt}$.
We can consider three cases:

• The choice set remains constant. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A_t) = \log \sum_{j \in A_t} \exp(v'_{ijt})$$
The change in consumer welfare is then $\Delta W = W' - W$.
• The choice set after the intervention is a subset of the original choice set. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A'_t) = \log \sum_{j \in A'_t} \exp(v'_{ijt})$$
where $A'_t \subset A_t$. The change in consumer welfare is then $\Delta W = W' - W$. A major issue is that $\varepsilon_{ijt}$ has support on the entire real line. Hence, removing options always decreases welfare (mechanically), and adding even the worst product you can think of will mechanically weakly increase welfare. Song (2015), Marshall (2015), and Duch-Brown et al. (2020) all propose frameworks to deal with this.
• The choice set before the intervention is a subset of the final choice set. Let pre-intervention welfare be
$$W_i(p_t, x_t, \xi_t \mid A_t) = \log \sum_{j \in A_t} \exp(v_{ijt})$$
and post-intervention welfare be
$$W'_i(p'_t, x'_t, \xi'_t \mid A'_t) = \log \sum_{j \in A'_t} \exp(v'_{ijt})$$
where $A_t \subset A'_t$. The change in consumer welfare is then $\Delta W = W' - W$. We can deal with this if we either have pre- and post-intervention data (Petrin (2002)) or a model of how the $\xi$ are created. The key is to allow $\xi$ to change to fit the data.
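For the fixed-choice-set case, the welfare change per consumer is a log-sum-exp difference scaled into dollars by $1/\alpha_i$; a minimal sketch with made-up numbers:

```python
import numpy as np
from scipy.special import logsumexp

def consumer_surplus(v, alpha):
    """Per-consumer expected surplus in dollars: (1/alpha_i) log sum_j exp(v_ij).

    v     : (R, J) deterministic utilities over the choice set (append a zero
            column to include the outside good)
    alpha : (R,) marginal utilities of income
    """
    return logsumexp(v, axis=1) / alpha

# Welfare change from a uniform $0.10 price increase, choice set held fixed.
# All numbers below are made up for illustration.
alpha = np.array([1.0, 2.0])
v_pre = np.array([[1.0, 0.5], [0.8, 0.2]])
v_post = v_pre - alpha[:, None] * 0.1      # v' = v - alpha_i * dp on every product
dW = consumer_surplus(v_post, alpha) - consumer_surplus(v_pre, alpha)
```

Since the price increase shifts every product's utility by $-0.1\alpha_i$, each consumer's surplus falls by exactly $0.10, a useful sanity check on the implementation.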
References
Berry, Steven, James Levinsohn, and Ariel Pakes (1995). “Automobile prices in market equilibrium”.
In: Econometrica: Journal of the Econometric Society, pp. 841–890.
Conlon, Christopher and Jeff Gortmaker (2020). “Best practices for differentiated products demand
estimation with pyblp”. In: The RAND Journal of Economics 51(4), pp. 1108–1161.
Dubé, Jean-Pierre, Jeremy T Fox, and Che-Lin Su (2012). “Improving the numerical performance of static and dynamic aggregate discrete choice random coefficients demand estimation”. In: Econometrica 80(5), pp. 2231–2267.
Duch-Brown, Néstor et al. (2020). “Evaluating the Impact of Online Market Integration — Evidence from the EU Portable PC Market”.
Gandhi, Amit and Jean-François Houde (2019). “Measuring substitution patterns in differentiated-products industries”. In: NBER Working Paper (w26375).
Hausman, Jerry A (1996). “Valuation of new goods under perfect and imperfect competition”. In:
The Economics of New Goods. University of Chicago Press, pp. 207–248.
Lee, Jinhyuk and Kyoungwon Seo (2015). “A computationally fast estimator for random coefficients
logit demand models using aggregate data”. In: The RAND Journal of Economics 46(1), pp. 86–
102.
Marshall, Guillermo (2015). “Hassle costs and price discrimination: An empirical welfare analysis”.
In: American Economic Journal: Applied Economics 7(3), pp. 123–46.
Nevo, Aviv (2000). “A practitioner’s guide to estimation of random-coefficients logit models of de-
mand”. In: Journal of Economics & Management Strategy 9(4), pp. 513–548.
Nevo, Aviv (2001). “Measuring market power in the ready-to-eat cereal industry”. In: Econometrica
69(2), pp. 307–342.
Petrin, Amil (2002). “Quantifying the benefits of new products: The case of the minivan”. In: Journal
of Political Economy 110(4), pp. 705–729.
Reynaerts, Jo, Ravi Varadhan, and John C Nash (2012). “Enhancing the convergence properties of the BLP (1995) contraction mapping”.
Song, Minjae (2015). “A hybrid discrete choice model of differentiated product demand with an
application to personal computers”. In: International Economic Review 56(1), pp. 265–301.
Su, Che-Lin and Kenneth L Judd (2012). “Constrained optimization approaches to estimation of
structural models”. In: Econometrica 80(5), pp. 2213–2230.
Train, Kenneth E (2009). Discrete choice methods with simulation. Cambridge university press.
Varadhan, Ravi and Christophe Roland (2008). “Simple and globally convergent methods for acceler-
ating the convergence of any EM algorithm”. In: Scandinavian Journal of Statistics 35(2), pp. 335–
353.
Vincent, David W (2015). “The Berry–Levinsohn–Pakes estimator of the random-coefficients logit
demand model”. In: The Stata Journal 15(3), pp. 854–880.
Wooldridge, Jeffrey M (2010). Econometric analysis of cross section and panel data. MIT press.