Identification
Zijing Hu
November 29, 2023
Contents

1 Basic Concepts of Identification
2 Identification in Causal Model
2.1 Basic Assumptions: Unconfoundedness and Overlap
2.2 Assessing the Overlap Assumption
2.3 Assessing the Unconfoundedness Assumption
2.3.1 Assessing unconfoundedness without further assumptions
2.3.2 Assessing unconfoundedness using external approaches
3 Identification in Classical Linear Model
4 Identification in Choice Model
*This note is based on ECMT 677: Applied Microeconometrics by Dr. Jackson Bunting, TAMU
1 Basic Concepts of Identification
The workflow of empirical analysis

Population: a collection of indices $i \in I$ (could be either finite or not). Each unit is associated with some information $\{y_i : i \in I\}$, and $y_i$ likely varies by $i$. We formalize this as a map $y : I \to \mathbb{R}$ and call its CDF $F_y$ the population distribution. Another example: $(Y_i, X_i, U_i)_{i \in I}$ and $F_{YXU}$.
Population parameters of interest: $\Theta(F_{YXU})$ (e.g., $E[Y]$, $E[Y_1 - Y_0]$). If we knew $F_{YXU}$, we could compute $\Theta(F_{YXU})$.
Problem 1: $Y$ and $X$ are observed but $U$ is not.

Population data: the distribution of observed random variables. We conceptualize it as
$$F_{YX} = \text{MakeData}(F_{YXU}),$$
where MakeData (MD) is an objective data generating process that maps the population to the observable population data. Usually we do not impose any assumptions on MD.
Problem 2: we only observe finite sample data $(Y_i, X_i)_{i=1}^n$.

The objective becomes learning about $\Theta(F_{YXU})$ from $(Y_i, X_i)_{i=1}^n$.

Identification: learn about $\Theta(F_{YXU})$ from $F_{YX} = \text{MD}(F_{YXU})$.

Estimation: take into account the difference between the finite sample and the population data.
Definition 1.1. Observational Equivalence (o.e.).
We say $F_{YXU}$ and $\tilde{F}_{YXU}$ are o.e. if $\text{MD}(F_{YXU}) = \text{MD}(\tilde{F}_{YXU})$.

We impose assumptions through a set $\mathcal{F}$ in which the "true" population distribution lies: $F_{YXU} \in \mathcal{F}$.
Falsifiability: if you could see the population data, could you tell whether your assumption(s) hold? (E.g., it is easy to test whether $E[Y \mid X = x] = 0$, but it might be hard to test unconfoundedness.)

Structural models usually impose assumptions on $\mathcal{F}$ (e.g., the random component of utility is i.i.d. and follows a Gumbel distribution).
Definition 1.2. Identification Set.
$$\Theta_I = \left\{ \theta \in \Theta : \theta = \Theta(F_{YXU}) \text{ for some } F_{YXU} \in \mathcal{F} \text{ and } F^{\mathrm{obs}}_{YX} = \text{MD}(F_{YXU}) \right\} = \bigcup_{\{F_{YXU} \in \mathcal{F} \,:\, \text{MD}(F_{YXU}) = F_{YX}\}} \{\Theta(F_{YXU})\}$$
Types of identification

If $\Theta_I = \emptyset$, we conclude that $F_{YXU} \notin \mathcal{F}$, suggesting the assumptions on $\mathcal{F}$ might be problematic.

Point identification: $|\Theta_I| = 1$. This only indicates that every $\tilde{F}_{YXU} \in \mathcal{F}$ that is o.e. to $F_{YXU}$ delivers the same parameter value.

Partial identification: $|\Theta_I| > 1$ and $\Theta_I \subsetneq \Theta$.

Complete non-identification: $|\Theta_I| > 1$ and $\Theta_I = \Theta$.

Sharp identification: $\Theta_I$ is constructed by checking all $F \in \mathcal{F}$.
The characterizations of point identification

$|\Theta_I| = 1$ if and only if for all $F_{YXU}, \tilde{F}_{YXU} \in \mathcal{F}$ with $\text{MD}(F_{YXU}) = \text{MD}(\tilde{F}_{YXU}) = F_{YX}$, we have $\Theta(F_{YXU}) = \Theta(\tilde{F}_{YXU})$.

Proof. Suppose that $|\Theta_I| > 1$. Then there exist $F, \tilde{F} \in \mathcal{F}$ s.t. $\text{MD}(F) = \text{MD}(\tilde{F}) = F_{YX}$ and $\Theta(F) \neq \Theta(\tilde{F})$, a contradiction. Conversely, $|\Theta_I| = 1$ leads to no contradiction.

$|\Theta_I| = 1$ if $\Theta(F_{YXU}) = h(\text{MD}(F_{YXU}))$ for a known $h$.

Proof. Suppose that $|\Theta_I| > 1$. Then there exist $F, \tilde{F} \in \mathcal{F}$ s.t. $\text{MD}(F) = \text{MD}(\tilde{F}) = F_{YX}$. We have $h(\text{MD}(F)) = h(\text{MD}(\tilde{F}))$, so $\Theta(F) = \Theta(\tilde{F})$, a contradiction.
Missing data and partial identification

Example 1. Suppose that $Y$ is observed only when $Z = 1$, so $\text{MD}(F_{YZ}) = \{P(Y = 1 \mid Z = 1), P(Z = 1)\}$, and we want to identify $\Theta(F_{YZ}) = P(Y = 1)$. With no assumptions, the unobserved $P(Y = 1 \mid Z = 0)$ can be anywhere in $[0, 1]$, so we have
$$\Theta_I = [P(Y = 1 \mid Z = 1)P(Z = 1),\; P(Y = 1 \mid Z = 1)P(Z = 1) + P(Z = 0)]$$
However, we can impose other assumptions.

Missing at random (strong): $P(Y = 1 \mid Z = 0) = P(Y = 1 \mid Z = 1)$. Then we have
$$\Theta_I = \{P(Y = 1 \mid Z = 1)\}$$

Monotonicity. E.g., $1 \geq P(Y = 1 \mid Z = 0) \geq P(Y = 1 \mid Z = 1)$. Then we have
$$\Theta_I = [P(Y = 1 \mid Z = 1),\; P(Y = 1 \mid Z = 1)P(Z = 1) + P(Z = 0)]$$
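The three identified sets above can be computed directly; a minimal sketch in Python, with hypothetical values for the two observed quantities:

```python
# Bounds on P(Y=1) when Y is observed only for Z=1; numbers are illustrative.
p_y1_given_z1 = 0.6   # P(Y=1 | Z=1), observed
p_z1 = 0.7            # P(Z=1), observed

# No-assumption (worst-case) bounds: the missing P(Y=1 | Z=0) can be
# anything in [0, 1].
lb = p_y1_given_z1 * p_z1                 # missing outcomes all 0
ub = p_y1_given_z1 * p_z1 + (1 - p_z1)    # missing outcomes all 1

# Missing at random collapses the identified set to a point.
point = p_y1_given_z1

# Monotonicity P(Y=1|Z=0) >= P(Y=1|Z=1) raises only the lower bound.
lb_mono = p_y1_given_z1
```

Tighter assumptions shrink the set: the monotonicity lower bound weakly exceeds the worst-case one.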
Example 2 (Blundell et al. 2007 ECMA). Suppose that $y$ represents wages and $z = 1$ if working. Inequality is measured by the interquartile range (IQR): $Q_{y|x}(0.75) - Q_{y|x}(0.25)$. However, we only observe $(y_i, x_i, z_i = 1)$ and $(x_i, z_i = 0)$. We can start from the extreme cases where everyone missing is assumed to have the highest ($+\infty$) or lowest ($0$) wage. More assumptions can be imposed later (e.g., monotonicity: $F_{y|x,z=1}$ stochastically dominates $F_{y|x,z=0}$, "people tend to work if more productive") and lead to tighter bounds.
Falsifiability

Suppose that we include an instrument $w$ for $x$ such that $F_{y|wx} = F_{y|x}$. Then we can compare the identification sets for different values of $w$ and check whether they overlap. If they do not, then the instrument assumption might be wrong.
2 Identification in Causal Model

Unit level causal model

Potential outcomes: $Y_i(x) = g(x, u_i)$, $x \in \{x_1, x_0\}$

Counterfactual outcomes: $Y_i(x)$ for $x \neq x_i$ ($Y_i(x_i)$ is the realised outcome)

Unit level causal effect: $Y_i(x_1) - Y_i(x_0)$
Population level causal model

Distribution of effects: $F_{Y(x_1) - Y(x_0)}(t) = P(Y(x_1) - Y(x_0) \leq t)$

$\text{ATE}(x_0 \to x_1) \equiv E[Y(x_1) - Y(x_0)]$

$P(Y(x_1) > Y(x_0))$: share of the population who benefit

Average marginal effect: $E[\partial_x g(x, U)]$

Average structural function: $\text{ASF}(x) = E[Y(x)]$

Quantile treatment effect: $Q_{Y(x_1)}(t) - Q_{Y(x_0)}(t)$. This is not the same as $Q_{Y(x_1) - Y(x_0)}(t)$ or $E[Y(x_1) - Y(x_0) \mid Y(x_0) = Q_{Y(x_0)}(t)]$.
The fundamental problem is that unit level causal effects are fully non-identified. But we might be able to identify population level causal effects. Suppose that $F_{YXU}$ satisfies $Y = g(X, U)$. We have
$$P(Y \leq y, X \leq x, U \leq u) = P(g(X, U) \leq y, X \leq x, U \leq u)$$
Identification analysis focuses on $g(\cdot)$ and $F_{XU}$ instead of $F_{YXU}$. Usually we refer to $X$ as observable heterogeneity and $U$ as unobservable heterogeneity. For example, suppose that
$$x \in \{0, 1\}, \quad U_i = (Y_i(0),\; Y_i(1) - Y_i(0))$$
Then we have
$$Y_i(x) = U_{i,1} + x U_{i,2}$$
2.1 Basic Assumptions: Unconfoundedness and Overlap

Theorem 2.1 (Randomly assigned treatment). Suppose that $\{Y(x) : x \in \mathcal{X}\} \perp X$ (Unconfoundedness, U). Then $F_{Y(x)}$ is point identified for all $x \in S(X)$, where $S(W) = \{w \in \mathbb{R}^{\dim(w)} : dF_W(w) > 0\}$.

Proof. $F_{Y(x)}(t) = P(Y(x) \leq t) = P(Y(x) \leq t \mid X = x) = P(Y \leq t \mid X = x)$. Therefore, $F_{Y(x)}$ is point identified for all $x \in S(X)$.
Implication: parameters that depend on $\{F_{Y(x)} : x \in S(X)\}$ are also point identified:
$$\text{ASF}(x) = \int_{\mathbb{R}} y\, dF_{Y(x)}(y)$$
$$\text{ATE}(x \to x') = \int_{\mathbb{R}} y\,\left(dF_{Y(x')}(y) - dF_{Y(x)}(y)\right)$$
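A quick simulation sketch of Theorem 2.1's implication: under random assignment, the conditional means $E[Y \mid X = x]$ recover $\text{ASF}(x)$, and their difference recovers the ATE. All distributional choices below are illustrative.

```python
# Simulate potential outcomes with a known constant effect of 2, assign
# treatment independently of (Y(0), Y(1)), and compare conditional means.
import random

random.seed(0)
n = 100_000
y0 = [random.gauss(1.0, 1.0) for _ in range(n)]   # Y(0)
y1 = [yi + 2.0 for yi in y0]                      # Y(1) = Y(0) + 2, so ATE = 2
x = [random.random() < 0.5 for _ in range(n)]     # X independent of potential outcomes
y = [y1[i] if x[i] else y0[i] for i in range(n)]  # observed outcome

asf1_hat = sum(y[i] for i in range(n) if x[i]) / sum(x)
asf0_hat = sum(y[i] for i in range(n) if not x[i]) / (n - sum(x))
ate_hat = asf1_hat - asf0_hat                     # close to the true ATE of 2
```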
Theorem 2.2 (Stratified random experiments). Suppose that $\{Y(x) : x \in \mathcal{X}\} \perp X \mid W$. Then $F_{Y(x)|W}$ is point identified for all $(x, w) \in S(X, W)$.

Proof. $F_{Y(x)|W}(t, w) = P(Y(x) \leq t \mid W = w) = P(Y(x) \leq t \mid X = x, W = w) = P(Y \leq t \mid X = x, W = w)$. Therefore, $F_{Y(x)|W}$ is point identified for all $(x, w) \in S(X, W)$.

Implication: parameters that depend on $\{F_{Y(x)|W} : (x, w) \in S(X, W)\}$ are also point identified:
$$\text{CASF}(x, w) = \int_{\mathbb{R}} y\, dF_{Y(x)|W}(y, w)$$
$$\text{CATE}(x \to x', w) = \int_{\mathbb{R}} y\,\left(dF_{Y(x')|W}(y, w) - dF_{Y(x)|W}(y, w)\right)$$
Given that
$$\text{ATE}(x \to x') = \int \text{CATE}(x \to x', w)\, dF_W(w),$$
to point identify the ATE under the unconfoundedness assumption, we need (Overlap condition, O)
$$P(X = 1 \mid W = w) \in (0, 1)\ \forall w \in S(W) \iff S(X \mid W = w) = \{0, 1\}\ \forall w \in S(W)$$
or the rectangular support assumption (too strong if $X$ is continuous): $S(X) = \{0, 1\}$ and $S(X, W) = S(X) \times S(W)$.
Theorem 2.3. Suppose U and O. Then (1) $F_{Y(x)|W}$ is point identified for all $x \in \mathcal{X}$, $w \in S(W)$, and (2) $F_{Y(x)}$ is point identified.

Proof. $F_{Y(x)|W}(t, w) = P(Y(x) \leq t \mid W = w) = P(Y(x) \leq t \mid X = x, W = w) = P(Y \leq t \mid X = x, W = w)$, which is observed for $(x, w) \in S(X, W)$. Then $F_{Y(x)}(t) = \int F_{Y(x)|W}(t, w)\, dF_W(w)$.
Proposition 2.4. Suppose U and O. Then ATE and CATE(w) are point identified for all $w \in S(W)$.

Proof.
$$\text{CATE}(w) = E[Y(1) - Y(0) \mid W = w] = E[Y(1) \mid X = 1, W = w] - E[Y(0) \mid X = 0, W = w] = E[Y \mid X = 1, W = w] - E[Y \mid X = 0, W = w]$$
$$\text{ATE} = \int \text{CATE}(w)\, dF_W(w)$$
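Proposition 2.4 suggests a plug-in procedure: within-stratum mean differences averaged over the distribution of $W$. A minimal sketch with a small hypothetical dataset:

```python
# Stratified estimator of CATE(w) and ATE; the (y, x, w) triples are illustrative.
data = [
    (3.0, 1, 0), (1.0, 0, 0), (2.8, 1, 0), (1.2, 0, 0),
    (5.0, 1, 1), (2.0, 0, 1), (5.2, 1, 1), (1.8, 0, 1),
]

def cate(w):
    # Mean difference between treated and control within stratum w
    treated = [y for (y, x, wi) in data if wi == w and x == 1]
    control = [y for (y, x, wi) in data if wi == w and x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Average CATE(w) over the empirical distribution of W
weights = {w: sum(1 for (_, _, wi) in data if wi == w) / len(data)
           for w in {wi for (_, _, wi) in data}}
ate = sum(cate(w) * p for w, p in weights.items())
```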
Why do we prefer U to conditional mean independence (CMI, a weaker assumption)?

How would one justify that CMI holds but U does not?

We can learn more if we assume U.

CMI of what? $\log Y(x)$ vs. $Y(x)$ (mean independence is not invariant to transformations of the outcome).
Theorem 2.5 (Propensity Score). Suppose U. Then $\{Y(x) : x \in \mathcal{X}\} \perp X \mid p(W)$, where $p(w) = P(X = 1 \mid W = w)$.

Proof.
$$P(X = 1 \mid Y(0), Y(1), p(W) = p) = E[X \mid Y(0), Y(1), p(W) = p] = E\left[E[X \mid Y(0), Y(1), W] \mid Y(0), Y(1), p(W) = p\right] = E\left[E[X \mid W] \mid Y(0), Y(1), p(W) = p\right] = E[p(W) \mid Y(0), Y(1), p(W) = p] = p = E[X \mid p(W) = p]$$
where the third equality uses U.
Implication: under U and O, we can point identify ASF and ATE conditioning only on the scalar $p(W)$:
$$\text{ASF}(x) = E[Y(x)] = E\left[E[Y(x) \mid p(W)]\right]$$
Given that
$$E[Y(x) \mid p(W) = p] = E[Y \mid X = x, p(W) = p],$$
we need $S(X, p(W)) = S(X) \times S(p(W))$, i.e., $p(w) \in (0, 1)$ for all $w \in S(W)$.
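One common way to exploit the propensity score (inverse propensity weighting, one of several options) rests on $E[YX / p(W)] = E[Y(1)]$ under U and O. A simulation sketch with a known, illustrative $p(w)$:

```python
# IPW sketch: recover ASF(1) = E[Y(1)] when p(w) is known. All numbers illustrative.
import random

random.seed(1)
n = 200_000
w = [random.random() < 0.5 for _ in range(n)]            # binary covariate
p = [0.8 if wi else 0.2 for wi in w]                     # p(w) = P(X=1 | W=w)
x = [random.random() < pi for pi in p]                   # confounded treatment
y1 = [2.0 + (1.0 if wi else 0.0) + random.gauss(0, 1) for wi in w]  # Y(1)
y = [y1[i] if x[i] else 0.0 for i in range(n)]           # Y(0) plays no role here

# E[Y X / p(W)] estimates E[Y(1)]; true ASF(1) = 2 + 0.5 = 2.5
asf1_ipw = sum(y[i] * x[i] / p[i] for i in range(n)) / n
```

A naive comparison of raw group means would be biased here, since treatment probability depends on $W$, which also shifts $Y(1)$.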
2.2 Assessing the Overlap Assumption

Falsification of O: compute $P(X = 1 \mid W = w)$ from $F_{YXW}$:

If $S(W)$ is finite, check whether there is at least one treated and one untreated unit in each $w \in S(W)$.

If $S(W)$ is infinite, $f_{X|W}(x; w)$ is still point identified, but it would be hard to distinguish zero probability from very small probability.
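The finite-$S(W)$ check is mechanical; a sketch that flags strata containing only treated or only control units (the $(x, w)$ pairs are hypothetical):

```python
# Flag strata where overlap fails: some value of x in {0, 1} is never observed.
data = [(1, "a"), (0, "a"), (1, "b"), (1, "b"), (0, "c")]  # (x, w) pairs

def overlap_violations(pairs):
    strata = {}
    for x, w in pairs:
        strata.setdefault(w, set()).add(x)
    # A stratum satisfies overlap only if both treatment values appear
    return sorted(w for w, xs in strata.items() if xs != {0, 1})

bad = overlap_violations(data)  # "b" has no control unit, "c" has no treated unit
```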
What to do if O fails?

Option 1: stick with point identification. Define $W_{0,1} = \{w \in S(W) : P(X = 1 \mid W = w) \in (0, 1)\}$ and point identify only the ATE for the overlapping part: $\text{ATE}_{W_{0,1}} = E[Y(1) - Y(0) \mid W \in W_{0,1}]$.

Option 2: use partial identification. $\text{ATE} = \text{ATE}_{W_{0,1}} P(W_{0,1}) + \text{ATE}_{W \setminus W_{0,1}}(1 - P(W_{0,1}))$. We need to construct bounds on $\text{ATE}_{W \setminus W_{0,1}}$ based on external knowledge. Not common.
Option 3 (Imbens and Wooldridge 2009): assume a relationship between $\text{ATE}_{W_{0,1}}$ and $\text{ATE}_{W \setminus W_{0,1}}$.

Example: assume that $E[Y(x) \mid W = w] = q(w)'\gamma(x)$, where $q$ is known and $\gamma(x) \in \mathbb{R}^{\dim(q)}$. Then we can identify $\gamma(x)$ by regressing $Y$ on $q(W)$ among units with $X = x$ (if the first expectation is nonsingular):
$$\gamma(x) = E[q(W)q(W)' \mid X = x]^{-1} E[q(W)Y \mid X = x]$$
Then we also have
$$\text{CASF}(x, w) = q(w)'\gamma(x)$$
$$\text{CATE}(w) = q(w)'[\gamma(1) - \gamma(0)]$$
$$\text{ATE} = E[q(W)']\,[\gamma(1) - \gamma(0)]$$

Theorem 2.6. Suppose U and $q(w) = (1, w)'$, i.e., $E[Y(x) \mid w] = (1, w)\gamma(x)$. Then ATE is point identified if $\mathrm{Var}(W \mid X = x) > 0$ for $x \in \{0, 1\}$.
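A simulation sketch of Theorem 2.6: overlap fails by construction ($X$ is a deterministic function of $W$), yet the linear-in-$w$ model extrapolates across the support and recovers the ATE. All coefficients are illustrative.

```python
# Fit E[Y(x) | W = w] = gamma0(x) + gamma1(x) * w by OLS within each arm,
# then combine as ATE = E[(1, W)'] [gamma(1) - gamma(0)]. True ATE = 3.
import random

random.seed(2)
n = 50_000
w = [random.uniform(0, 1) for _ in range(n)]
x = [wi > 0.5 for wi in w]                       # overlap fails: X determined by W
y = [(1.0 + 2.0 * wi) + (3.0 if xi else 0.0) + random.gauss(0, 0.1)
     for wi, xi in zip(w, x)]                    # E[Y(x) | w] = 1 + 2w + 3x

def ols(ws, ys):
    # Simple-regression OLS returning (intercept, slope)
    mw, my = sum(ws) / len(ws), sum(ys) / len(ys)
    b = (sum((a - mw) * (c - my) for a, c in zip(ws, ys))
         / sum((a - mw) ** 2 for a in ws))
    return my - b * mw, b

g0 = ols([wi for wi, xi in zip(w, x) if not xi], [yi for yi, xi in zip(y, x) if not xi])
g1 = ols([wi for wi, xi in zip(w, x) if xi], [yi for yi, xi in zip(y, x) if xi])
ate = (g1[0] - g0[0]) + (g1[1] - g0[1]) * (sum(w) / n)
```

The point identification comes entirely from the functional form: each arm's line is fitted on half the support of $W$ and extrapolated to the other half.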
2.3 Assessing the Unconfoundedness Assumption

2.3.1 Assessing unconfoundedness without further assumptions
Option 1: the idea of Cornfield et al. (1959)

Theorem 2.7. Let $Y$, $X$, and $U$ be binary random variables. Let $Y(x, u)$ denote the potential outcomes, $x, u \in \{0, 1\}$. Define
$$r_u = P(Y = 1 \mid U = u), \quad p_x = P(U = 1 \mid X = x), \quad r = \frac{P(Y = 1 \mid X = 1)}{P(Y = 1 \mid X = 0)}$$
We have $\frac{p_1}{p_0} > r$ if three assumptions are satisfied:

Latent unconfoundedness: $\{Y(x, u) : x, u = 0, 1\} \perp X \mid U$

No causal effect of $X$: for all $i \in I$ and $u \in \{0, 1\}$, $Y_i(1, u) = Y_i(0, u)$

$U$ positively (or negatively) related with both $Y$ and $X$: $r_1 > r_0$ and $p_1 > p_0$
Proof.
$$r = \frac{P(Y = 1 \mid X = 1)}{P(Y = 1 \mid X = 0)} = \frac{r_1 p_1 + r_0 (1 - p_1)}{r_1 p_0 + r_0 (1 - p_0)}$$
Rearranging,
$$\frac{p_1}{p_0} = r + \frac{r_0}{r_1 p_0}\left((1 - p_0)r - (1 - p_1)\right)$$
Note that $r > 1$, since the numerator of $r$ minus its denominator equals $(r_1 - r_0)(p_1 - p_0) > 0$. Given that $r > 1$ and $p_1 > p_0$, we have $(1 - p_0)r - (1 - p_1) > (1 - p_0) - (1 - p_1) = p_1 - p_0 > 0$, so
$$\frac{r_0}{r_1 p_0}\left((1 - p_0)r - (1 - p_1)\right) > 0$$
and hence $p_1 / p_0 > r$.
Application: (1) if $\frac{p_1}{p_0} > r$ is not plausible, then the no-causal-effect assumption might be false; and (2) if it is plausible, we might want to doubt the U assumption ($\{Y(x) : x \in \mathcal{X}\} \perp X \mid W$).

Similar methods: Oster (2016 JBES)
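The bound in Theorem 2.7 can be checked numerically; the values of $r_u$ and $p_x$ below are illustrative:

```python
# Cornfield-style check: with r1 > r0 and p1 > p0 and no causal effect,
# the confounding ratio p1/p0 must exceed the observed risk ratio r.
r1, r0 = 0.30, 0.10     # P(Y=1 | U=1), P(Y=1 | U=0): U positively related to Y
p1, p0 = 0.60, 0.20     # P(U=1 | X=1), P(U=1 | X=0): U positively related to X

# Risk ratio implied by latent unconfoundedness and no causal effect
r = (r1 * p1 + r0 * (1 - p1)) / (r1 * p0 + r0 * (1 - p0))
bound_holds = (p1 / p0) > r
```

If the observed risk ratio were larger than any plausible $p_1/p_0$, the no-causal-effect explanation would be untenable.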
Option 2: instead of focusing on observable confounders, we can focus on the relationship between $Y(x)$ and $X$ (Robins, Rotnitzky, and Scharfstein 2000).

Theorem 2.8. Suppose $X$ and $Y(x)$ are discretely distributed and the joint distribution of $(X, Y)$ is observed. Suppose
$$p_x(y) = P(X = x \mid Y(x) = y)$$
is known and nonzero for all $x \in S(X)$ and $y \in S(Y)$. Then $P[Y(x) = y]$ is point identified for all $y \in S(Y)$ and all $x \in S(X)$.

Proof.
$$P[Y(x) = y] = \frac{P(Y(x) = y, X = x)}{P(X = x \mid Y(x) = y)} = P(Y = y \mid X = x)\,\frac{P(X = x)}{p_x(y)}$$

To operationalize, we can parameterize $p_x(y) = p_x(y; \gamma)$ and compute $F_{Y(x)}$ (or the ATE) for different values of $\gamma$.
Option 3: we can also nonparametrically relax unconfoundedness (Masten and Poirier 2018).

Definition 2.9. Let $x \in \{0, 1\}$ and let $c$ be a scalar in $[0, 1]$. Say $X$ is c-dependent with $Y(x)$ if
$$\sup_{y \in S[Y(x)]} \left|P(X = 1 \mid Y(x) = y) - P(X = 1)\right| \leq c$$
If $c = 0$, we have random assignment. If $c > 0$, we can still partially identify the parameters of interest.
2.3.2 Assessing unconfoundedness using external approaches

Option 1: Placebo tests

1. Placebo outcome

Definition 2.10. Placebo exclusion: $Y^p(0) = Y^p(1)$

Theorem 2.11. Suppose placebo exclusion. Then $\{Y^p(0), Y^p(1)\} \perp X$ if and only if $Y^p(x) \perp X$.

(Causal diagram: $U \to X$, $U \to Y$, $U \to Y^p$.) The chain of logic requires the placebo outcome to be affected by the confounder we are worried about. Then we have
$$Y^p(x) \perp X \iff Y^p(x, U) \perp X(U)$$
and, provided $X(u) \neq X(\tilde{u})$ for $u \neq \tilde{u}$,
$$Y(x, U) \perp X(U), \text{ i.e., } Y(x) \perp X$$
Workflow

Find a $Y^p$ and justify placebo exclusion.

Check whether $Y^p(x) \perp X$ and conclude $\{Y^p(0), Y^p(1)\} \perp X$ or not.

Justify why the possible confounders of $Y$ are also causes of $Y^p$.

Conclude $Y(x) \perp X$ if and only if $Y^p(x) \perp X$.
Example: parallel trends

Suppose that we observe $\{X_t, Y_t\}_{t = -1, 0, 1}$ with $X_{-1} = X_0 = 0$, and let $X = X_1$. We want to assume unconfoundedness of the change, i.e., $(Y_1 - Y_0)(x) \perp X$. We can use the pre-trend, i.e., $Y_0 - Y_{-1}$, as the placebo. If we have placebo exclusion, i.e., $(Y_0 - Y_{-1})(0) = (Y_0 - Y_{-1})(1)$, then placebo unconfoundedness, i.e., $(Y_0 - Y_{-1})(x) \perp X$, can be used to validate our assumption.
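The pre-trend comparison is just a difference of group mean changes; a sketch with a hypothetical balanced panel:

```python
# Pre-trend placebo check and difference-in-differences on illustrative data.
panel = [  # (y_{-1}, y_0, y_1, x) per unit
    (1.0, 1.5, 2.0, 0), (1.2, 1.7, 2.2, 0),
    (1.1, 1.6, 3.1, 1), (0.9, 1.4, 2.9, 1),
]

def mean(v):
    return sum(v) / len(v)

# Placebo: treated vs. control difference in the pre-period change Y_0 - Y_{-1}
pre_trend_gap = (mean([y0 - ym1 for ym1, y0, _, x in panel if x == 1])
                 - mean([y0 - ym1 for ym1, y0, _, x in panel if x == 0]))

# Post-period DiD: treated vs. control difference in Y_1 - Y_0
did = (mean([y1 - y0 for _, y0, y1, x in panel if x == 1])
       - mean([y1 - y0 for _, y0, y1, x in panel if x == 0]))
```

A pre-trend gap near zero supports (but does not prove) the unconfoundedness-of-changes assumption behind the DiD estimate.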
2. Placebo treatment

Theorem 2.12. Suppose that $Y(x, x^p) = Y(x, \tilde{x}^p)$ for all $x^p, \tilde{x}^p$ and $Y(x) \perp X^p \mid X$. Then $Y \perp X^p \mid X$.

Proof.
$$P(Y \leq y \mid X^p = x^p, X = x) = P(Y(x, x^p) \leq y \mid X^p = x^p, X = x) = P(Y(x) \leq y \mid X^p = x^p, X = x) = P(Y(x) \leq y \mid X = x) = P(Y \leq y \mid X = x)$$
(Causal diagram: $U \to X$, $U \to Y$, $U \to X^p$.) We require that the confounder we are worried about affects both $X$ and $X^p$. Then we have
$$Y(x) \perp X^p \mid X \iff Y(x, U) \perp X^p(U) \mid X(U)$$
and, unless $Y(x, u) = Y(x, \tilde{u})$ for all $u, \tilde{u}$,
$$Y(x, U) \perp X(U), \text{ i.e., } Y(x) \perp X$$

Workflow

Checking $Y \perp X^p \mid X$ tells us about $Y(x) \perp X^p \mid X$, which in turn supports a statement about $Y(x) \perp X$.
Option 2: Using other assumptions

Example 1: monotonicity restriction $y_{\min} \leq Y(0) \leq Y(1)$

Theorem 2.13. Suppose monotone treatment response (MTR). Then $E[Y(x)] \in [\text{LB}(x), \text{UB}(x)]$, where
$$\text{LB}(0) = y_{\min} P(X = 1) + E[Y \mid X = 0]P(X = 0)$$
$$\text{UB}(0) = E[Y \mid X = 1]P(X = 1) + E[Y \mid X = 0]P(X = 0) = E[Y]$$
The idea is to derive informative bounds on the ASF or ATE under MTR and check whether the identification set under U lies within those bounds.
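The bounds in Theorem 2.13 are simple moment combinations; a sketch with illustrative observed moments:

```python
# MTR bounds on E[Y(0)]: for treated units, Y(0) lies between y_min and
# the observed Y(1) = Y. Observed moments below are hypothetical.
y_min = 0.0
e_y_given_x0, e_y_given_x1 = 1.0, 3.0   # E[Y | X=0], E[Y | X=1]
p_x1 = 0.4                              # P(X=1)

lb0 = y_min * p_x1 + e_y_given_x0 * (1 - p_x1)
ub0 = e_y_given_x1 * p_x1 + e_y_given_x0 * (1 - p_x1)   # equals E[Y]
```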
Example 2: Modeling treatment assignment

Given the production function $Y_i(x) = g(x, U_i)$ and expected profit $E_i[Y_i(x) - xv \mid V_i = v] = m_i(x, v)$:

Theorem 2.14. We have $X \perp U$ if the following conditions hold:

Rational expectations: $E_i = E$

Profit maximization: $X_i = 1(m(1, v_i) \geq m(0, v_i))$

The unobservable part of the production function is independent of the cost: $V \perp U$

Proof. $X_i = h(V_i)$ is a function of $V_i$ alone, so $X_i \perp U_i$. Note that $\Pi(x) = E[m(x, V)]$ is not identified, since $X$ is a function of $V$ and hence not independent of it.
Option 3: Alternative identification strategy

Assuming U and an IV, we can identify the ATE in two ways and check whether the two identification results agree.
3 Identification in Classical Linear Model

Theorem 3.1. $(\beta, F_{XU})$ is point identified given the following assumptions of the classical linear model:

A1 (Linearity): $Y = X'\beta + U$

A2 (Finite moments): $E[XY]$, $E[XX']$, $E[XU]$, and $E[U]$ are finite

A3 (Sufficient variation): $E[XX']$ is nonsingular

A4 (Exogeneity): $E[UX] = E[U]E[X]$

A5 (Normalization): $X$ includes a constant 1 and $E[U] = 0$

Assessing linearity by strengthening A4

A4$'$: $E[U \mid X] = E[U] = 0$, together with A2 and A5, implies that $E[Y \mid X = x] = x'\beta$ (falsifiable)

A4$''$: $U \perp X$ implies that $\mathrm{Var}[Y \mid X = x] = \mathrm{Var}[U]$ (falsifiable)

Can we falsify A1$'$: $Y_i = m(X_i) + U_i$?

Theorem 3.2. (1) If A1$'$, A2, and A4$'$, then $m(x) + E[U]$ is point identified; (2) if A1$'$, A2, and A4$''$, then $m(x) + E[U]$ is point identified.
Assessing exogeneity

Suppose that $Y_i = \beta_0 + X_i\beta_1 + W_i\beta_2 + U_i$. Then
$$\frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}[X]} = \beta_1 + \beta_2\,\frac{\mathrm{Cov}(W, X)}{\mathrm{Var}[X]}$$
If we can restrict the sign of the bias term, then $\frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}[X]}$ bounds the causal effect $\beta_1$ from one side.

Alternatively, relax A4 and suppose that $|\mathrm{Cov}(X, U)| \leq \varepsilon$. Then
$$\beta_1 \in \left[\frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}[X]} - \frac{\varepsilon}{\mathrm{Var}[X]},\; \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}[X]} + \frac{\varepsilon}{\mathrm{Var}[X]}\right]$$
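The relaxed-exogeneity interval is immediate to compute; a sketch with illustrative moments:

```python
# Partial identification of beta_1 when exogeneity is relaxed to
# |Cov(X, U)| <= eps. All moments below are hypothetical.
cov_yx = 2.0
var_x = 4.0
eps = 0.4

ols_slope = cov_yx / var_x
interval = (ols_slope - eps / var_x, ols_slope + eps / var_x)
```

As $\varepsilon \to 0$ the interval collapses to the OLS slope, recovering the point-identified case under A4.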
Heterogeneous treatment effects

Can we assess A1$''$: $Y_i = X_i'\beta_i + U_i$? We cannot identify $\beta_i$ using only A1–A5.

1. Keep $\hat{\beta}$ and interpret $\mathrm{plim}\,\hat{\beta}$ under A1$''$.

(Very strong assumption) Suppose A1$''$, A4$'$, A5, and $E[\beta_i \mid X = x_i] = E[\beta_i]$. Then $\mathrm{plim}\,\hat{\beta} = E(\beta_i)$.

Suppose $X \in \{0, 1\}$. Then regressing $Y$ on $X - E[X \mid W]$ yields
$$\frac{\mathrm{Cov}(Y, X - E[X \mid W])}{\mathrm{Var}(X - E[X \mid W])} = \frac{E[\text{CATE}(W)\,\mathrm{Var}(X \mid W)]}{E[\mathrm{Var}(X \mid W)]}$$
This $\hat{\beta}$ is (1) the plim of regressing $Y$ on $(1, X, g(W))$, and (2) a convex combination of the CATEs.

2. Define $\Theta$ and then compute $\Theta_I$ under A1$''$–A5.
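The variance-weighted representation in part 1 can be verified by simulation; the two-stratum design with known propensities is illustrative.

```python
# Regressing Y on X - E[X|W] recovers E[CATE(W) Var(X|W)] / E[Var(X|W)].
import random

random.seed(3)
n = 200_000
w = [random.random() < 0.5 for _ in range(n)]       # two strata
p = [0.9 if wi else 0.5 for wi in w]                # P(X=1 | W=w), known here
x = [1.0 if random.random() < pi else 0.0 for pi in p]
cate = [4.0 if wi else 1.0 for wi in w]             # CATE(w)
y = [cate[i] * x[i] + random.gauss(0, 0.1) for i in range(n)]  # Y(0) = noise

xt = [x[i] - p[i] for i in range(n)]                # residualized treatment
slope = sum(yt * xi for yt, xi in zip(y, xt)) / sum(xi * xi for xi in xt)

# Var(X|W=1) = 0.9*0.1 = 0.09, Var(X|W=0) = 0.5*0.5 = 0.25
expected = (4.0 * 0.09 + 1.0 * 0.25) / (0.09 + 0.25)
```

Note the weights favor strata where treatment varies more, so the plim is generally not the ATE.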
4 Identification in Choice Model

Example: suppose $Y_i(x)$ is labor force participation under policy $x$.

"Threshold crossing": $Y_i(x) = 1(m(x, U_i) \geq 0)$

Employment rate under $x$: $\Pr(m(x, W, U_i) \geq 0 \mid W = w)$

Common methods

Additively separable: $m(x, U) = g(x) + U$

Linear coefficient: $g(x) = x'\beta$, so $Y(x) = 1(x'\beta + U \geq 0)$

Theorem 4.1. Suppose threshold crossing and $X \perp U$. Then $\text{ASF}(x) = \Pr(m(x, U) \geq 0)$ is point identified for all $x \in \mathrm{Supp}(X)$.

Proof. $\text{ASF}(x) = \Pr(m(x, U) \geq 0) = \Pr(m(x, U) \geq 0 \mid X = x) = E[Y \mid X = x]$, which is observed.

How can we extrapolate to $x$ outside the support in a "realistic" way? Identify the structural function $m(\cdot)$.

Theorem 4.2. (1) Suppose $Y(x) = 1(g(x) + U \geq 0)$ and $U \mid X \sim N(\mu, \sigma^2)$. Then $(g, \mu, \sigma)$ is not point identified. (2) Suppose $Y(x) = 1(g(x) + U \geq 0)$ and $U \mid X \sim N(0, 1)$. Then $g$ is point identified.
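A sketch of Theorem 4.2(2) using the standard normal quantile function; the choice probabilities below are hypothetical:

```python
# With U | X ~ N(0,1), P(Y=1 | X=x) = Phi(g(x)), so g(x) = Phi^{-1}(p(x)).
from statistics import NormalDist

phi = NormalDist()                  # standard normal
p_obs = {0: 0.30, 1: 0.75}          # P(Y=1 | X=x), illustrative values

g = {x: phi.inv_cdf(p) for x, p in p_obs.items()}

# Part (1)'s non-identification: (g, sigma) and (c*g, c*sigma) imply
# identical choice probabilities, which is why the normalization is needed.
```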
Random utility model

Suppose that $Y_i(x) = 1(u_{1i}(x) \geq u_{0i}(x))$ and $u_j(x) = v_j(x) + \varepsilon_j = x'\beta_j + \varepsilon_j$.

If $\varepsilon_1 - \varepsilon_0 \perp X$, then $p(x) = P(Y = 1 \mid X = x) = F_{\varepsilon_0 - \varepsilon_1}(x'\beta_1 - x'\beta_0)$

If $F_{\varepsilon_0 - \varepsilon_1}$ is known and strictly increasing, then $x'\beta_1 - x'\beta_0$ is point identified as $F^{-1}_{\varepsilon_0 - \varepsilon_1}(p(x))$

If $E[XX']$ is nonsingular, then $\beta_1 - \beta_0$ is point identified.
Common choices

$(\varepsilon_0, \varepsilon_1)$ are joint normal with known mean and variance

$(\varepsilon_0, \varepsilon_1)$ are i.i.d. EVT1 (type-I extreme value). This implies that $\varepsilon_1 - \varepsilon_0$ is logistically distributed.
IIA: if the $\varepsilon_j$ are i.i.d. EVT1 and $C_1 \subseteq C_2 \subseteq \dots \subseteq C_J = \{1, \dots, J\}$, then for all $j, k \in C_1$ we have
$$\frac{P(y(C_1) = j)}{P(y(C_1) = k)} = \frac{P(y(C_2) = j)}{P(y(C_2) = k)} = \dots = \frac{P(y(C_J) = j)}{P(y(C_J) = k)}$$
But this does not apply to nested logit or dynamic discrete choice (DDC).
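IIA can be verified numerically for the logit case; the systematic utilities below are illustrative:

```python
# With i.i.d. EVT1 shocks, choice probabilities are multinomial logit and
# the ratio P(j)/P(k) is invariant to the choice set.
import math

v = {1: 0.5, 2: 1.0, 3: -0.3}   # systematic utilities v_j (hypothetical)

def logit_prob(j, choice_set):
    den = sum(math.exp(v[k]) for k in choice_set)
    return math.exp(v[j]) / den

ratio_small = logit_prob(1, {1, 2}) / logit_prob(2, {1, 2})
ratio_full = logit_prob(1, {1, 2, 3}) / logit_prob(2, {1, 2, 3})
# Both ratios equal exp(v_1 - v_2), regardless of whether option 3 is available.
```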
Dynamic discrete choice model

The agent considers not only the current choice problem but also a series of future actions to maximize long-term utility. Specifically, the agent chooses $(d_1, d_2, \dots)$ to maximize
$$u(x_1, d_1) + \varepsilon_1(d_1) + E\left[\sum_{t=2}^{\infty} \beta^{t-1}\left[u(x_t, d_t) + \varepsilon_t(d_t)\right]\right]$$
We can derive the value function
$$V_t(x_t, \varepsilon_t) \equiv \max_{d_t, d_{t+1}, \dots}\; u(x_t, d_t) + \varepsilon_t(d_t) + E\left[\sum_{t' > t} \beta^{t' - t}\left[u(x_{t'}, d_{t'}) + \varepsilon_{t'}(d_{t'})\right]\right]$$
It is more convenient to focus on the integrated value function (the value of being in state $x$ at time $t$, prior to the realization of $\varepsilon_t$):
$$\bar{V}_t(x_t) \equiv \int V_t(x_t, \varepsilon_t)\, g(\varepsilon_t)\, d\varepsilon_t$$
The conditional value function (conditional on choosing $d_t$) is given by
$$\dot{V}_t(x_t, d_t) \equiv u(x_t, d_t) + \beta \int \bar{V}_{t+1}(x_{t+1})\, f(x_{t+1} \mid x_t, d_t)\, dx_{t+1}$$
Then the choice probability is
$$P(D_t = d \mid X_t = x) = \int 1\left(\arg\max_{\tilde{d}} \left\{\dot{V}_t(x_t, \tilde{d}) + \varepsilon_t(\tilde{d})\right\} = d\right) g(\varepsilon_t)\, d\varepsilon_t$$
The identification problem is to learn
$$\left(u_t(x, d),\; F_\varepsilon,\; f_t(x' \mid x, d),\; \beta\right)$$
from $p_t(d_t \mid x)$. $F_\varepsilon$ and $\beta$ are usually assumed to be known. $f_t(x' \mid x, d)$ is observed from the data.
Theorem 4.3. If $\varepsilon_t(d)$ is i.i.d. and follows a known distribution $G$, then $\dot{V}_t(x_t, d_t) - \dot{V}_t(x_t, 0)$ is point identified.

For finite-horizon cases, use backward induction. For infinite-horizon cases, use value iteration.
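For the stationary infinite-horizon case with EVT1 shocks, value iteration on the integrated value function can be sketched as follows. The primitives $u$, $f$, and $\beta$ below are illustrative, and the log-sum-exp form of $\bar{V}$ (with the Euler-Mascheroni constant) is the standard closed form under EVT1 shocks.

```python
# Value iteration for a stationary DDC with EVT1 shocks:
#   Vbar(x) = gamma + log( sum_d exp( u(x,d) + beta * E[Vbar(x') | x, d] ) )
import math

GAMMA = 0.5772156649015329            # Euler-Mascheroni constant
beta = 0.9
states, actions = [0, 1], [0, 1]
u = {(x, d): float(x) - 0.5 * d for x in states for d in actions}
f = {(x, d): ([0.8, 0.2] if d == 0 else [0.3, 0.7])  # f(x' | x, d) over states
     for x in states for d in actions}

vbar = {x: 0.0 for x in states}
for _ in range(500):                  # iterate the contraction to a fixed point
    vbar = {x: GAMMA + math.log(sum(
                math.exp(u[x, d] + beta * sum(f[x, d][xp] * vbar[xp] for xp in states))
                for d in actions))
            for x in states}

def ccp(x, d):
    # Conditional choice probabilities implied by the fixed point (logit form)
    vdot = {a: u[x, a] + beta * sum(f[x, a][xp] * vbar[xp] for xp in states)
            for a in actions}
    den = sum(math.exp(vdot[a]) for a in actions)
    return math.exp(vdot[d]) / den
```

The CCPs produced this way are what the identification problem inverts: from observed $p_t(d \mid x)$ back to differences in conditional value functions, as in Theorem 4.3.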