2 Staggered DiD
2.1 Assumption, identification, and imputation
Settings
• T time periods with no units treated in t = 1.
• The first unit is treated at t = q < T. Treatment is not reversible.
• For $g \in \{q, \dots, T\}$, $Y_t(g)$ is the outcome if the unit is first subjected to the intervention at time $g$.
• $Y_t(\infty)$ is the outcome if the unit is never treated in $\{q, \dots, T\}$.
• Treatment effects of primary focus: $TE_{gt} = Y_t(g) - Y_t(\infty)$, $g = q, \dots, T$; $t = g, \dots, T$.
• Exhaustive and mutually exclusive dummy variables: $D_g = 1$ if the unit is first subjected to the intervention at $g \in \{q, \dots, T\}$, and $D_\infty = 1 - \sum_{g=q}^{T} D_g$.
• The goal is to estimate $\tau_{gt} \equiv E[Y_t(g) - Y_t(\infty) \mid D_g = 1]$, $g = q, \dots, T$; $t = g, \dots, T$.
• The observed outcome is $Y_{it} = D_{i\infty} Y_{it}(\infty) + \sum_{g=q}^{T} D_{ig} Y_{it}(g)$.
• Define post-treatment time dummies by cohort: $p^g_t = f^g_t + \cdots + f^T_t$, where $f^s_t$ is the dummy for period $s$ ($f^s_t = 1$ if $t = s$), so that $p^g_t = 1$ if $t \ge g$.
• Define the time-varying treatment indicator: $W_{it} = \sum_{g=q}^{T} D_{ig}\, p^g_t$.
• Goodman-Bacon (2021) shows that the TWFE estimator is generally difficult to interpret with staggered interventions and heterogeneous or time-varying effects.
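The dummy-variable definitions above can be checked with a small script. This is only a sketch: the values T = 5 and q = 3, the unit labels, and the helper names `f`, `p`, `cohort_dummies`, and `W` are illustrative assumptions, not part of the notes.

```python
import math

T, q = 5, 3          # assumed values for illustration
INF = math.inf       # stands in for "never treated"

def f(s, t):
    """Period dummy f^s_t: 1 if t == s, else 0."""
    return int(t == s)

def p(g, t):
    """Post-treatment dummy p^g_t = f^g_t + ... + f^T_t, i.e. 1 if t >= g."""
    return sum(f(s, t) for s in range(g, T + 1))

def cohort_dummies(first_treated):
    """Return {g: D_g} for g = q..T, plus D_inf = 1 - sum_g D_g."""
    D = {g: int(first_treated == g) for g in range(q, T + 1)}
    D[INF] = 1 - sum(D.values())
    return D

def W(first_treated, t):
    """Treatment indicator W_it = sum_g D_ig * p^g_t."""
    D = cohort_dummies(first_treated)
    return sum(D[g] * p(g, t) for g in range(q, T + 1))

# Hypothetical units: first-treatment period of each (INF = never treated).
units = {"a": 3, "b": 5, "c": INF}
W_paths = {i: [W(g_i, t) for t in range(1, T + 1)] for i, g_i in units.items()}
print(W_paths)
```

Because treatment is not reversible, each path of W_it switches from 0 to 1 at most once, at the unit's own g, and stays at 0 throughout for never-treated units.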
Assumption 2.1. No anticipation, staggered (NAS). All pre-intervention treatment effects are zero:
$$E[Y_t(g) - Y_t(\infty) \mid D_q, \dots, D_T] = 0, \quad g = q, \dots, T, \; t = 1, \dots, g - 1.$$
Assumption 2.2. Parallel trends, staggered (PTS). For $t = 2, \dots, T$,
$$E[Y_t(\infty) - Y_1(\infty) \mid D_q, \dots, D_T] = E[Y_t(\infty) - Y_1(\infty)].$$
With PTS, we can write
$$E[Y_1(\infty) \mid D_q, \dots, D_T] = \alpha + \sum_{g=q}^{T} \beta_g D_g,$$
$$E[Y_t(\infty) \mid D_q, \dots, D_T] = \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t.$$
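The step from the first display to the second can be spelled out as follows. This is a sketch that assumes $\gamma_t$ denotes the common trend $E[Y_t(\infty) - Y_1(\infty)]$ with $\gamma_1 = 0$, which the notes imply but do not state. The first display needs no assumption: the cohort dummies are exhaustive and mutually exclusive, so the conditional mean is saturated, with $\alpha = E[Y_1(\infty) \mid D_\infty = 1]$ and $\beta_g = E[Y_1(\infty) \mid D_g = 1] - \alpha$.

$$
\begin{aligned}
E[Y_t(\infty) \mid D_q, \dots, D_T]
  &= E[Y_1(\infty) \mid D_q, \dots, D_T] + E[Y_t(\infty) - Y_1(\infty) \mid D_q, \dots, D_T] \\
  &= \alpha + \sum_{g=q}^{T} \beta_g D_g + E[Y_t(\infty) - Y_1(\infty)] \qquad \text{(by PTS)} \\
  &= \alpha + \sum_{g=q}^{T} \beta_g D_g + \gamma_t .
\end{aligned}
$$

Writing $\gamma_t$ with the period dummies gives the sum over $s = 2, \dots, T$ in the second display.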
Then, the ATTs can be written as
$$\tau_{gt} = E[Y_t(g) \mid D_g = 1] - E[Y_t(\infty) \mid D_g = 1] = E[Y_t \mid D_g = 1] - (\alpha + \beta_g + \gamma_t),$$
where $E[Y_t \mid D_g = 1]$ is always identified:
$$\bar{Y}_{gt} = \frac{\sum_{i=1}^{N} D_{ig} Y_{it}}{\sum_{i=1}^{N} D_{ig}} \overset{p}{\to} E[Y_t \mid D_g = 1].$$
Additionally, $\alpha$, $\beta_g$, $g = q, \dots, T$, and $\gamma_t$, $t = 2, \dots, T$, are identified and estimated from the control observations once we add NAS:
$$E[Y_t \mid D_q, \dots, D_T, W_t = 0] = \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t.$$
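The identification argument suggests a simple imputation procedure: fit the regression on the $W_{it} = 0$ observations, impute $Y_{it}(\infty)$ for treated observations, and average the differences within each cohort-period cell. The sketch below illustrates this on simulated data. The data-generating process, the sample sizes, and the true effect $\tau_{gt} = 1 + 0.5(t - g)$ are assumptions made for the illustration, and the code uses plain least squares from NumPy rather than any dedicated DiD package.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5                   # periods t = 1..T
cohorts = (3, 4, 5)     # first-treatment periods g = q..T (here q = 3)
n_per_cohort = 300      # hypothetical cohort size

# One entry per unit: first-treatment period, inf for never-treated units.
first_treated = np.repeat([3.0, 4.0, 5.0, np.inf], n_per_cohort)
N = first_treated.size

# Long panel: one row per (unit, period).
g = np.repeat(first_treated, T)
t = np.tile(np.arange(1, T + 1), N)
W = t >= g              # treatment indicator W_it (always False when g = inf)

# Simulated untreated outcome: additive cohort and period terms, so PTS holds.
y0 = 1.0 + 0.5 * (g == 3) - 0.3 * (g == 4) + 0.2 * (g == 5) + 0.2 * (t - 1)
y0 = y0 + rng.normal(0.0, 0.1, size=y0.size)
# Assumed true effect tau_gt = 1 + 0.5 (t - g); zero before treatment (NAS).
y = y0 + np.where(W, 1.0 + 0.5 * (t - g), 0.0)

# Design matrix: intercept, cohort dummies D_g, period dummies f^s_t (s >= 2).
X = np.column_stack(
    [np.ones(t.size)]
    + [(g == c).astype(float) for c in cohorts]
    + [(t == s).astype(float) for s in range(2, T + 1)]
)

# Step 1: estimate alpha, beta_g, gamma_s using control observations only.
coef, *_ = np.linalg.lstsq(X[~W], y[~W], rcond=None)

# Step 2: impute Y_it(inf) for every row, then average Y - imputed by cell.
y0_hat = X @ coef
tau_hat = {}
for c in cohorts:
    for s in range(c, T + 1):
        cell = (g == c) & (t == s)
        tau_hat[(c, s)] = float(np.mean(y[cell] - y0_hat[cell]))

for (c, s), est in tau_hat.items():
    print(f"tau_{c}{s}: estimate {est:.3f}, truth {1.0 + 0.5 * (s - c):.3f}")
```

Each cell average equals $\bar{Y}_{gt}$ minus the estimated $\alpha + \beta_g + \gamma_t$, which mirrors the expression for $\tau_{gt}$ above. The never-treated group is what identifies $\gamma_4$ and $\gamma_5$ in this design, since every other cohort is treated by period 5.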