Given that
\[
\mathrm{ATE}(x \to x') = \int \mathrm{CATE}(x \to x', w)\, dF_W(w),
\]
to point identify the ATE under the unconfoundedness assumption (U), we need the overlap condition (O):
\[
P(X = 1 \mid W = w) \in (0, 1) \;\; \forall w \in S(W) \iff S(X \mid W = w) = \{0, 1\} \;\; \forall w \in S(W),
\]
or the rectangular support assumption (too strong if $X$ is continuous): $S(X) = \{0, 1\}$ and $S(X, W) = S(X) \times S(W)$.
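As an illustration of the overlap condition (not part of the notes; the discrete design, propensity values, and sample size are assumptions made for the example), a minimal Python sketch that estimates $p(w)$ cell by cell and checks that it lies strictly inside $(0,1)$ on $S(W)$:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed data-generating process: discrete covariate W in {0, 1, 2},
# treatment X assigned with probability depending on W.
W = rng.integers(0, 3, size=n)
true_p = np.array([0.2, 0.5, 0.8])          # P(X = 1 | W = w)
X = rng.binomial(1, true_p[W])

# Empirical propensity score for each w in S(W).
for w in np.unique(W):
    p_hat = X[W == w].mean()
    print(f"w = {w}: p_hat = {p_hat:.3f}, overlap holds: {0 < p_hat < 1}")
\end{verbatim}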
Theorem 2.3. Suppose U and O. Then (1) $F_{Y(x)\mid W}$ is point identified for all $x \in \mathcal{X}$ and $w \in S(W)$, and (2) $F_{Y(x)}$ is point identified.
Proof. For any $(x, w) \in S(X, W)$,
\[
F_{Y(x)\mid W}(t \mid w) = P(Y(x) \le t \mid W = w) = P(Y(x) \le t \mid X = x, W = w) = P(Y \le t \mid X = x, W = w),
\]
where the second equality uses U; the right-hand side is observed. Then
\[
F_{Y(x)}(t) = \int F_{Y(x)\mid W}(t \mid w)\, dF_W(w).
\]
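A plug-in version of this identification argument can be sketched as follows, under an assumed simulated design with discrete $W$ in which U and O hold by construction: estimate the conditional CDF of $Y$ given $X = x$, $W = w$ by its empirical counterpart and average over the empirical distribution of $W$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed design: discrete W, X depends on W only (so U holds), overlap holds.
W = rng.integers(0, 3, size=n)
p = np.array([0.3, 0.5, 0.7])[W]
X = rng.binomial(1, p)
Y1 = W + rng.normal(size=n)                 # potential outcome Y(1)
Y0 = rng.normal(size=n)                     # potential outcome Y(0)
Y = np.where(X == 1, Y1, Y0)                # observed outcome

def F_Yx_hat(t, x):
    """Estimate F_{Y(x)}(t) = sum_w F_{Y|X=x,W=w}(t) * P(W = w)."""
    total = 0.0
    for w in np.unique(W):
        cell = (W == w) & (X == x)
        F_cond = (Y[cell] <= t).mean()      # empirical conditional CDF
        total += F_cond * (W == w).mean()   # weight by empirical P(W = w)
    return total

t = 1.0
print("F_{Y(1)}(1.0) estimate:", F_Yx_hat(t, 1))
print("oracle (from simulated Y(1)):", (Y1 <= t).mean())
\end{verbatim}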
Proposition 2.4. Suppose U and overlap. Then ATE and CATE(w) are point identified for all
w ∈ S(W ).
Proof.
\begin{align*}
\mathrm{CATE}(w) &= E[Y(1) - Y(0) \mid W = w] \\
&= E[Y(1) \mid X = 1, W = w] - E[Y(0) \mid X = 0, W = w] && \text{(by U)} \\
&= E[Y \mid X = 1, W = w] - E[Y \mid X = 0, W = w],
\end{align*}
where both conditional means are observed under O. Then
\[
\mathrm{ATE} = \int \mathrm{CATE}(w)\, dF_W(w).
\]
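The corresponding regression-adjustment estimator can be sketched the same way (again an assumed simulated design, not the notes' own example): within-cell differences of treated and control means give $\mathrm{CATE}(w)$, and averaging over the empirical distribution of $W$ gives the ATE.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Assumed design with U and overlap: W discrete, X randomized given W.
W = rng.integers(0, 3, size=n)
X = rng.binomial(1, np.array([0.3, 0.5, 0.7])[W])
Y0 = W + rng.normal(size=n)
Y1 = W + 2.0 + rng.normal(size=n)           # true CATE(w) = 2 for all w
Y = np.where(X == 1, Y1, Y0)

ate_hat = 0.0
for w in np.unique(W):
    cell = W == w
    cate_w = Y[cell & (X == 1)].mean() - Y[cell & (X == 0)].mean()
    ate_hat += cate_w * cell.mean()         # weight by empirical P(W = w)
    print(f"CATE({w}) estimate: {cate_w:.3f}")

print(f"ATE estimate: {ate_hat:.3f} (true ATE = 2.0)")
\end{verbatim}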
Why do we prefer U to conditional mean independence (CMI), which is a weaker assumption?
• How would one justify that CMI holds while U fails?
• We can learn more (e.g., distributional effects such as $F_{Y(x)}$, not just means) if we assume U.
• CMI of what? $\log Y(x)$ vs. $Y(x)$: mean independence is not invariant to transformations of the outcome, whereas full independence is.
Theorem 2.5 (Propensity Score Matching). Suppose U. Then $\{Y(x) : x \in \mathcal{X}\} \perp\!\!\!\perp X \mid p(W)$, where $p(w) = P(X = 1 \mid W = w)$.
Proof. Since $X$ is binary, it suffices to show that $P(X = 1 \mid Y(0), Y(1), p(W))$ does not depend on $(Y(0), Y(1))$:
\begin{align*}
P(X = 1 \mid Y(0), Y(1), p(W) = p) &= E[X \mid Y(0), Y(1), p(W) = p] \\
&= E\big[\, E[X \mid Y(0), Y(1), W] \mid Y(0), Y(1), p(W) = p \,\big] \\
&= E\big[\, E[X \mid W] \mid Y(0), Y(1), p(W) = p \,\big] && \text{(by U)} \\
&= E[\, p(W) \mid Y(0), Y(1), p(W) = p \,] = p = P(X = 1 \mid p(W) = p).
\end{align*}
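A small numerical check of the balancing property implied by Theorem 2.5, under an assumed design where $W$ takes four values but $p(W)$ takes only two, so that conditioning on $p(W)$ is genuinely coarser than conditioning on $W$: within each $p(W)$-cell the treatment rate should not vary with $Y(1)$, even though $Y(1)$ depends on $W$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Assumed design: W takes four values but the propensity score takes only
# two, so p(W)-cells pool distinct values of W.
W = rng.integers(0, 4, size=n)
p = np.array([0.3, 0.7, 0.3, 0.7])[W]       # p(W) = P(X = 1 | W)
X = rng.binomial(1, p)
Y1 = W + rng.normal(size=n)                 # Y(1) depends on W, not just p(W)

# Balancing check: within a p(W)-cell, the treatment rate should be the
# same whether Y(1) is above or below its within-cell median.
for pv in np.unique(p):
    cell = p == pv
    med = np.median(Y1[cell])
    hi, lo = cell & (Y1 > med), cell & (Y1 <= med)
    print(f"p(W) = {pv:.1f}: "
          f"P(X=1 | Y(1) high) = {X[hi].mean():.3f}, "
          f"P(X=1 | Y(1) low) = {X[lo].mean():.3f}")
\end{verbatim}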
Implication: under U and O, we can point identify the ASF and the ATE using the propensity score alone:
\[
\mathrm{ASF}(x) = E[Y(x)] = E\big[ E[Y(x) \mid p(W)] \big].
\]
Given that
\[
E[Y(x) \mid p(W) = p] = E[Y \mid X = x, p(W) = p],
\]
we need $S(X, p(W)) = S(X) \times S(p(W)) \iff p(w) \in (0, 1) \;\; \forall w \in S(W)$.
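A sketch of the implied propensity-score estimator, continuing the same kind of assumed simulated design: estimate $E[Y \mid X = x, p(W) = p]$ within propensity-score cells and average over the empirical distribution of $p(W)$ to obtain $\mathrm{ASF}(x)$ and the ATE.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Assumed design as before: four W values, two propensity-score values.
W = rng.integers(0, 4, size=n)
p = np.array([0.3, 0.7, 0.3, 0.7])[W]
X = rng.binomial(1, p)
Y0 = W + rng.normal(size=n)
Y1 = W + 1.5 + rng.normal(size=n)           # true ATE = 1.5
Y = np.where(X == 1, Y1, Y0)

def asf_hat(x):
    """ASF(x) = sum_p E[Y | X = x, p(W) = p] * P(p(W) = p)."""
    return sum(Y[(p == pv) & (X == x)].mean() * (p == pv).mean()
               for pv in np.unique(p))

print(f"ASF(1) = {asf_hat(1):.3f}, ASF(0) = {asf_hat(0):.3f}")
print(f"ATE estimate = {asf_hat(1) - asf_hat(0):.3f} (true = 1.5)")
\end{verbatim}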