• Randomization assumption
Relevant paper: Blake, Nosko, and Tadelis (2015)
E[Y | T = 1] − E[Y | T = 0]
  = E[Y(1) | T = 1] − E[Y(0) | T = 0]    (SUTVA & consistency)
  = E[Y(1) − Y(0) | T = 1] + E[Y(0) | T = 1] − E[Y(0) | T = 0]
  = ATT + selection bias
  = ATT    (randomization)
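Below is a minimal simulation sketch (not from the notes; the data-generating process, the effect size of 2, and all variable names are assumed for illustration) contrasting self-selected and randomized assignment: the difference in means equals ATT plus selection bias in the first case and ATT alone in the second.

```python
# Hedged simulation sketch: selection bias vs. randomization (assumed DGP).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)               # unobserved factor driving Y(0) and selection
y0 = 1.0 + u + rng.normal(size=n)    # potential outcome under control
y1 = y0 + 2.0                        # constant treatment effect: ATT = 2

# Self-selection: units with high u are more likely to take treatment
t_selected = (u + rng.normal(size=n) > 0).astype(int)
# Randomization: assignment independent of the potential outcomes
t_random = rng.integers(0, 2, size=n)

def diff_in_means(t):
    y = t * y1 + (1 - t) * y0        # consistency: observed outcome
    return y[t == 1].mean() - y[t == 0].mean()

print(diff_in_means(t_selected))     # ≈ ATT + selection bias (well above 2)
print(diff_in_means(t_random))       # ≈ ATT = 2
```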
Regression for Causal Effect
Assuming that Y(t) = µ(t) + ε_t (SUTVA) and Y = T·Y(1) + (1 − T)·Y(0) (consistency), we have

Y = Y(0) + T(Y(1) − Y(0))
  = µ_0 + T(µ_1 − µ_0) + ε_0 + T(ε_1 − ε_0)
  = α + βT + ε
To ensure identification and unbiased estimation, we additionally need the rank condition (overlap assumption) and the independence assumption (randomization assumption).
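A hedged OLS sketch of this regression (simulated data; α = 1 and β = 2 are assumed values): with randomized T, regressing Y on a constant and T recovers β, and having both treatment arms is what keeps the design matrix full rank.

```python
# OLS sketch for Y = alpha + beta*T + eps with randomized T (assumed values).
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
t = rng.integers(0, 2, size=n)           # randomized treatment
y = 1.0 + 2.0 * t + rng.normal(size=n)   # alpha = 1, beta = 2

X = np.column_stack([np.ones(n), t])     # design matrix [1, T]; rank 2 needs both arms
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                              # ≈ [1.0, 2.0] = [alpha, beta]
```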
Covariates
• Bad controls: treatment might affect covariates, so include only pre-treatment covariates. Assuming that Y = T·Y(1) + (1 − T)·Y(0) and X = T·X(1) + (1 − T)·X(0), we have

E[Y | X = 1, T = 1] − E[Y | X = 1, T = 0]
  = E[Y(1) | X(1) = 1, T = 1] − E[Y(0) | X(0) = 1, T = 0]
  = E[Y(1) | X(1) = 1] − E[Y(0) | X(1) = 1] + E[Y(0) | X(1) = 1] − E[Y(0) | X(0) = 1]
  = E[Y(1) | X(1) = 1] − E[Y(0) | X(1) = 1] + selection bias

where the second equality uses randomization of T (see the simulation sketch below).
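The sketch below (an assumed data-generating process, not from the notes) illustrates the bad-control problem: even with T randomized, conditioning on a post-treatment X reintroduces selection bias.

```python
# Bad-control sketch: conditioning on a post-treatment covariate biases
# the comparison even under randomization (assumed DGP).
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
u = rng.normal(size=n)                    # latent factor behind both Y(0) and X
t = rng.integers(0, 2, size=n)            # randomized treatment
y0 = u + rng.normal(size=n)
y1 = y0 + 2.0                             # true effect is 2
y = t * y1 + (1 - t) * y0

# Post-treatment covariate: treatment pushes X toward 1, so treated units
# with X = 1 have systematically lower u than control units with X = 1.
x = (u + 1.5 * t + rng.normal(size=n) > 0).astype(int)

naive = y[t == 1].mean() - y[t == 0].mean()
within_x1 = y[(t == 1) & (x == 1)].mean() - y[(t == 0) & (x == 1)].mean()
print(naive)       # ≈ 2 (T is randomized)
print(within_x1)   # below 2: selection bias from comparing X(1) = 1 with X(0) = 1
```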
• Heterogeneity: assuming that Y(t) = µ(t, X) + ε_t, then

Y = µ(0, X) + T(µ(1, X) − µ(0, X)) + ε
  = α(X) + β(X)·T + ε
  = α(X) + CATE·T + ε
We need to specify a structural form for the heterogeneity. For example:

µ(t, X) = α + βX + t(τ + γ(X − X̄))
Then, we have
Y = α + βX + Tτ + Tγ(X − X̄)
  = b_0 + b_1·T + b_2·X + b_3·T(X − X̄)

so that b_1 estimates τ and b_3 estimates γ (see the regression sketch below).
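A short sketch of this interacted specification (simulated data; the parameter values are assumed): the coefficient on T estimates τ and the coefficient on T(X − X̄) estimates γ.

```python
# Interacted regression sketch: Y = a + b*X + T*(tau + gamma*(X - Xbar)) + eps.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(loc=1.0, size=n)
t = rng.integers(0, 2, size=n)
alpha, beta, tau, gamma = 0.5, 1.0, 2.0, 0.7          # assumed true values
y = alpha + beta * x + t * (tau + gamma * (x - x.mean())) + rng.normal(size=n)

D = np.column_stack([np.ones(n), t, x, t * (x - x.mean())])
b, *_ = np.linalg.lstsq(D, y, rcond=None)
print(b)   # ≈ [0.5, 2.0, 1.0, 0.7] = [alpha, tau (b_1), beta (b_2), gamma (b_3)]
```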
• Causal ML
  • Use ML (especially deep learning) to estimate β(X)
  • This relaxes the functional-form assumption on β
  • Y = α(X) + β(X)·T + ε ≈ α_i + ITE_i·T_i
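One common way to let ML estimate β(X), sketched below, is a T-learner (my choice for illustration; the notes only say to use ML): fit separate outcome models on treated and control units and take ITE_i = µ̂_1(x_i) − µ̂_0(x_i). The data and model choices here are assumed.

```python
# T-learner sketch for heterogeneous effects beta(X) (illustrative choice).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 20_000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
beta_x = 1.0 + 2.0 * X[:, 0]                  # true heterogeneous effect
y = X[:, 1] + t * beta_x + rng.normal(size=n)

mu0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])   # control model
mu1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])   # treated model
ite_hat = mu1.predict(X) - mu0.predict(X)     # estimated ITE_i per unit

print(np.corrcoef(ite_hat, beta_x)[0, 1])     # should be close to 1
```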
1.1 Application: profit maximisation
Profit contribution depending on targeting status:

π_i(T) = π(T, x_i) =
  mY(0, x_i)          if T = 0
  mY(1, x_i) − c      if T = 1
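Under this profit function, targeting unit i is worthwhile exactly when π_i(1) > π_i(0), i.e. when m·(Y(1, x_i) − Y(0, x_i)) > c: the margin times the uplift must exceed the targeting cost. A hedged sketch (m, c, and the uplift values are illustrative; ite_hat would come from a model like the T-learner above):

```python
# Targeting rule implied by pi_i(T): target iff m * ITE_i > c (assumed values).
import numpy as np

m, c = 5.0, 1.0                                  # assumed margin and targeting cost
ite_hat = np.array([0.05, 0.30, 0.15, -0.10])    # example estimated uplifts per unit

target = m * ite_hat > c                         # incremental profit exceeds cost
print(target)                                    # [False  True False False]
```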