Difference-in-Differences
Zijing Hu
May 29, 2024
Contents
1 DiD with common intervention date
  1.1 Assumptions, representations, and identification
  1.2 Pooled OLS and TWFE
  1.3 Event study regression
2 Staggered DiD
  2.1 Assumption, identification, and imputation
  2.2 Pooled OLS and Extended TWFE
  2.3 Event study regressions
  2.4 Violations of assumptions
3 Nonlinear DiD
4 Weighting and matching
  4.1 IPW, IPWRA, and PSM
  4.2 Synthetic DiD
*This note is based on the EABCN training school Difference-in-Differences and Event Study Estimators with Panel Data taught by Professor Jeffrey Wooldridge.
1 DiD with common intervention date
1.1 Assumptions, representations, and identification
Settings
Time periods: $t = 1, \ldots, T$. Time dummies: $f^1_t, \ldots, f^T_t$; exhaustive and mutually exclusive.
The intervention is at time $q$, $1 < q \le T$.
Treatment is not reversible (for now).
$D_i = 1$ if unit $i$ is eventually treated.
$p_t = 1[t \ge q] = f^q_t + \cdots + f^T_t$ is the post-treatment time dummy.
For a unit $i$, the time-varying treatment indicator is $W_{it} = D_i \cdot p_t$, $t = 1, \ldots, T$.
Potential outcomes: $\{Y_t(0), Y_t(1)\}$, $t = 1, \ldots, T$.
Treatment effects: $TE_t = Y_t(1) - Y_t(0)$.
We want to estimate the ATTs: $\tau_t \equiv E[TE_t \mid D = 1]$, $t = q, q+1, \ldots, T$.
Assumption 1.1. No anticipation (NA). With $D$ the treatment indicator,
\[ E[Y_t(1) - Y_t(0) \mid D = 1] = 0, \quad t \in \{1, \ldots, q-1\}. \]
We can skip a period or periods leading up to the intervention if units change behavior before the
intervention in ways that affect the outcome.
Assumption 1.2. Parallel trends (PT). The average trends in the untreated state are the same for the treated and control groups; the assignment $D$ can depend on the level but not on the trend. For $t \in \{2, \ldots, T\}$,
\[ E[Y_t(0) - Y_1(0) \mid D] = E[Y_t(0) - Y_1(0)], \]
or, equivalently,
\[ E[Y_t(0) - Y_s(0) \mid D = 1] = E[Y_t(0) - Y_s(0) \mid D = 0], \quad s \in \{1, \ldots, q-1\}. \]
Theorem 1.3. Under NA and PT, the ATTs are identified in each of the treated periods:
\[ \tau_t = E[TE_t \mid D = 1], \quad t = q, q+1, \ldots, T. \]
Proof. Let $t \ge q$. We have
\begin{align*}
TE_t &= Y_t(1) - Y_t(0) \\
&= \left[ Y_t(1) - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s(1) \right] - \left[ Y_t(0) - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s(0) \right] + \frac{1}{q-1}\sum_{s=1}^{q-1} \left[ Y_s(1) - Y_s(0) \right] \\
&= \dot{Y}_t(1) - \dot{Y}_t(0) + \frac{1}{q-1}\sum_{s=1}^{q-1} TE_s.
\end{align*}
Under NA,
\[ E\left[ \frac{1}{q-1}\sum_{s=1}^{q-1} TE_s \,\Big|\, D = 1 \right] = \frac{1}{q-1}\sum_{s=1}^{q-1} E[TE_s \mid D = 1] = 0. \]
Define the observed variables
\[ \dot{Y}_t = Y_t - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s = \frac{1}{q-1}\sum_{s=1}^{q-1} (Y_t - Y_s). \]
When $D = 1$, $\dot{Y}_t = \dot{Y}_t(1)$. Then $E[\dot{Y}_t(1) \mid D = 1]$ is identified:
\[ \frac{1}{N_1}\sum_{i=1}^{N} D_i \dot{Y}_{it} = \frac{N}{N_1}\left[ \frac{1}{N}\sum_{i=1}^{N} D_i \dot{Y}_{it} \right] \xrightarrow{p} E[\dot{Y}_t(1) \mid D = 1]. \]
$E[\dot{Y}_t(0) \mid D = 1]$ is identified by PT:
\begin{align*}
E[\dot{Y}_t(0) \mid D = 1] &= \frac{1}{q-1}\sum_{s=1}^{q-1} E[Y_t(0) - Y_s(0) \mid D = 1] \\
&= \frac{1}{q-1}\sum_{s=1}^{q-1} E[Y_t(0) - Y_s(0) \mid D = 0] \\
&= E[\dot{Y}_t(0) \mid D = 0].
\end{align*}
We also have
\[ \frac{1}{N_0}\sum_{i=1}^{N} (1 - D_i) \dot{Y}_{it} \xrightarrow{p} E[\dot{Y}_t(0) \mid D = 0]. \]
Hence,
\[ \hat{\tau}_t = \frac{1}{N_1}\sum_{i=1}^{N} D_i \dot{Y}_{it} - \frac{1}{N_0}\sum_{i=1}^{N} (1 - D_i) \dot{Y}_{it} = \left( \bar{Y}_{1t} - \bar{Y}_{0t} \right) - \left( \bar{Y}_{1,pre(q)} - \bar{Y}_{0,pre(q)} \right). \]
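As a concrete illustration, here is a minimal numpy/pandas sketch of this rolling estimator. It assumes a balanced panel held in a pandas DataFrame with hypothetical columns id, t, y, and d (the eventually-treated indicator); these names are not from the notes.

```python
import numpy as np
import pandas as pd

def did_att(df, q, t):
    """Rolling DiD estimate of tau_t: compare changes relative to the pre-q
    average for treated (d == 1) and control (d == 0) units.
    Assumes a balanced panel with columns: id, t, y, d."""
    pre = df[df["t"] < q].groupby("id")["y"].mean()    # unit-level pre-q average
    post = df[df["t"] == t].set_index("id")["y"]        # outcome in period t >= q
    d = df.groupby("id")["d"].first()                   # treatment group indicator
    ydot = post - pre                                   # Y_it minus pre-period average
    return ydot[d == 1].mean() - ydot[d == 0].mean()

# Example usage (toy data): tau_hat = did_att(df, q=3, t=3)
```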
What about adding covariates?
Assumption 1.4. Conditional parallel trends (CPT). For $t = 2, \ldots, T$ and time-constant observables $X$,
\[ E[Y_t(0) - Y_1(0) \mid D, X] = E[Y_t(0) - Y_1(0) \mid X]. \]
Assumption CPT implies that $D$ is unconfounded for $\dot{Y}_t(0)$ conditional on $X$.
1.2 Pooled OLS and TWFE
Assumption 1.5. Overlap (OV). For all $x \in \text{Support}(X)$, $p(x) = \Pr(D = 1 \mid X = x) < 1$.
Usually, OV is required. But we can assume linearity instead.
Pooled OLS regression:
\[ Y_{it} = \eta + \lambda D_i + \dot{X}_i \kappa + D_i \dot{X}_i \varphi + \underbrace{\sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right)}_{\text{control}} + W_{it} \underbrace{\sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right)}_{\text{treatment}} + \varepsilon_{it}. \]
Here, $\dot{X} = X - E[X \mid D = 1]$ and $W_t \cdot f^s_t = D \cdot f^s_t$ for $s \ge q$.
The TWFE regression is equivalent, with unit fixed effects $C_i$ replacing $\eta + \lambda D_i$:
\[ Y_{it} = C_i + \underbrace{\sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right)}_{\text{control}} + W_{it} \underbrace{\sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right)}_{\text{treatment}} + U_{it}. \]
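The pooled OLS regression above can be run directly with standard software. Below is a sketch using statsmodels, assuming the same hypothetical panel layout as before plus a single time-constant covariate x; the column names and helper function are illustrative only.

```python
import pandas as pd
import statsmodels.api as sm

def pols_did(df, q):
    """Pooled OLS with time dummies, demeaned-covariate interactions, and
    treatment-by-period effects tau_s (s = q, ..., T); cluster-robust by unit.
    Assumes columns: id, t, y, d (eventually treated), x (time-constant)."""
    df = df.copy()
    df["xdot"] = df["x"] - df.loc[df["d"] == 1, "x"].mean()   # X - E[X | D = 1]
    df["w"] = df["d"] * (df["t"] >= q)                         # W_it = D_i * p_t
    X = pd.DataFrame({"const": 1.0, "d": df["d"],
                      "xdot": df["xdot"], "d_xdot": df["d"] * df["xdot"]})
    for s in sorted(df["t"].unique())[1:]:                     # s = 2, ..., T
        fs = (df["t"] == s).astype(float)
        X[f"f{s}"] = fs
        X[f"f{s}_xdot"] = fs * df["xdot"]
        if s >= q:                                             # treatment block
            X[f"tau{s}"] = df["w"] * fs
            X[f"rho{s}"] = df["w"] * fs * df["xdot"]
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    return res  # coefficients tau_q, ..., tau_T are the period-specific ATTs

# res = pols_did(df, q=4); print(res.summary())
```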
1.3 Event study regression
The most common approach to testing parallel pre-trends in DiD is to run an event study (ES) regression (the leads-and-lags estimator). The ES regression without controls is
\[ Y_{it} = \eta + \lambda D_i + \sum_{s=2}^{T} \theta_s f^s_t + D_i \sum_{s=q}^{T} \tau_s f^s_t + D_i \sum_{s=1}^{q-2} \gamma_s f^s_t + \varepsilon_{it}. \]
The coefficient on the period just before treatment ($s = q-1$) is normalized to zero. Use a cluster-robust Wald statistic to jointly test the pre-trend coefficients $\gamma_s$, $s = 1, \ldots, q-2$. Under conditional PT, the ES graph should be based on the coefficients $\hat{\gamma}_1, \ldots, \hat{\gamma}_{q-2}, \hat{\tau}_q, \ldots, \hat{\tau}_T$ from
\[ Y_{it} = \eta + \lambda D_i + \dot{X}_i \kappa + D_i \dot{X}_i \varphi + \sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right) + D_i \sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right) + D_i \sum_{s=1}^{q-2} \gamma_s f^s_t + \varepsilon_{it}. \]
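Here is a sketch of the ES regression without controls together with the cluster-robust joint Wald test of the pre-trend coefficients; it reuses the hypothetical id, t, y, d columns from the earlier sketches.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def event_study(df, q):
    """Leads-and-lags regression: gamma_s for s = 1, ..., q-2 (pre-trends,
    omitting s = q-1) and tau_s for s = q, ..., T, with cluster-robust SEs."""
    periods = sorted(df["t"].unique())
    X = pd.DataFrame({"const": 1.0, "d": df["d"]})
    for s in periods[1:]:                       # time dummies f^2_t, ..., f^T_t
        X[f"f{s}"] = (df["t"] == s).astype(float)
    for s in periods:
        if s <= q - 2:                          # leads (pre-trend coefficients)
            X[f"gamma{s}"] = df["d"] * (df["t"] == s)
        elif s >= q:                            # lags (treatment effects)
            X[f"tau{s}"] = df["d"] * (df["t"] == s)
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    # Joint Wald test that all pre-trend coefficients are zero
    gamma_cols = [c for c in X.columns if c.startswith("gamma")]
    R = np.zeros((len(gamma_cols), X.shape[1]))
    for r, c in enumerate(gamma_cols):
        R[r, X.columns.get_loc(c)] = 1.0
    pretrend = res.wald_test(R) if gamma_cols else None
    return res, pretrend

# res, pretrend_test = event_study(df, q=4); print(pretrend_test)
```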
2 Staggered DiD
2.1 Assumption, identification, and imputation
Settings
$T$ time periods with no units treated in $t = 1$.
The first units are treated at $t = q < T$. No reversibility.
For $g \in \{q, \ldots, T\}$, $Y_t(g)$ is the outcome if the unit is first subjected to the intervention at time $g$. $Y_t(\infty)$ is the outcome if the unit is never treated in $\{q, \ldots, T\}$.
Treatment effects of primary focus: $TE_{gt} = Y_t(g) - Y_t(\infty)$, $g = q, \ldots, T$; $t = g, \ldots, T$.
Exhaustive and mutually exclusive dummy variables: $D_g = 1$ if the unit is first subjected to the intervention at $g \in \{q, \ldots, T\}$; $D_\infty = 1 - \sum_{g=q}^{T} D_g$.
The goal is to estimate $\tau_{gt} \equiv E[Y_t(g) - Y_t(\infty) \mid D_g = 1]$, $g = q, \ldots, T$; $t = g, \ldots, T$.
The observed outcome is $Y_{it} = D_{i\infty} Y_{it}(\infty) + \sum_{g=q}^{T} D_{ig} Y_{it}(g)$.
Define post-treatment time dummies by cohort: $p^g_t = f^g_t + \cdots + f^T_t$; $p^g_t = 1$ if $t \ge g$.
Define the time-varying treatment indicator: $W_{it} = \sum_{g=q}^{T} D_{ig} p^g_t$.
Goodman-Bacon (2021) shows that TWFE is generally difficult to interpret with staggered interventions and heterogeneous/time-varying effects.
Assumption 2.1. No anticipation, staggered (NAS). All pre-intervention treatment effects are zero:
\[ E[Y_t(g) - Y_t(\infty) \mid D_q, \ldots, D_T] = 0, \quad g = q, \ldots, T, \; t = 1, \ldots, g-1. \]
Assumption 2.2. Parallel trends, staggered (PTS). For $t = 2, \ldots, T$,
\[ E[Y_t(\infty) - Y_1(\infty) \mid D_q, \ldots, D_T] = E[Y_t(\infty) - Y_1(\infty)]. \]
With PTS, we can write
\begin{align*}
E[Y_1(\infty) \mid D_q, \ldots, D_T] &= \alpha + \sum_{g=q}^{T} \beta_g D_g, \\
E[Y_t(\infty) \mid D_q, \ldots, D_T] &= \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t.
\end{align*}
Then the ATTs can be written as
\[ \tau_{gt} = E[Y_t(g) \mid D_g = 1] - E[Y_t(\infty) \mid D_g = 1] = E[Y_t \mid D_g = 1] - (\alpha + \beta_g + \gamma_t), \]
where $E[Y_t \mid D_g = 1]$ is always identified:
\[ \bar{Y}_{gt} = \frac{\sum_{i=1}^{N} D_{ig} Y_{it}}{\sum_{i=1}^{N} D_{ig}} \xrightarrow{p} E[Y_t \mid D_g = 1]. \]
Additionally, $\alpha$, $\beta_g$ ($g = q, \ldots, T$), and $\gamma_t$ ($t = 2, \ldots, T$) are identified and can be estimated from the control observations once we add NAS:
\[ E[Y_t \mid D_q, \ldots, D_T, W_t = 0] = \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t. \]
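A minimal sketch of this identification argument as an imputation procedure: fit the regression above on the untreated ($W_{it} = 0$) observations, then subtract the fitted untreated mean from the cohort-time average of $Y$. It assumes hypothetical columns id, t, y, and g (the first treated period, with np.inf for never-treated units).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def staggered_imputation(df):
    """Estimate alpha, beta_g, gamma_s on the W_it = 0 observations, then form
    tau_gt = Ybar_gt - (alpha + beta_g + gamma_t) for t >= g.
    Assumes columns: id, t, y, g (first treated period; np.inf if never treated)."""
    df = df.copy()
    df["w"] = (df["t"] >= df["g"]).astype(int)                 # treated observation
    X = pd.DataFrame({"const": 1.0}, index=df.index)
    cohorts = sorted(c for c in df["g"].unique() if np.isfinite(c))
    for c in cohorts:
        X[f"beta{int(c)}"] = (df["g"] == c).astype(float)
    for s in sorted(df["t"].unique())[1:]:
        X[f"gamma{s}"] = (df["t"] == s).astype(float)
    ctrl = df["w"] == 0
    res = sm.OLS(df.loc[ctrl, "y"], X.loc[ctrl]).fit(
        cov_type="cluster", cov_kwds={"groups": df.loc[ctrl, "id"]})
    att = {}
    for c in cohorts:
        for t in range(int(c), int(df["t"].max()) + 1):
            ybar = df.loc[(df["g"] == c) & (df["t"] == t), "y"].mean()
            mu0 = (res.params["const"] + res.params[f"beta{int(c)}"]
                   + (res.params[f"gamma{t}"] if t > 1 else 0.0))
            att[(int(c), t)] = ybar - mu0
    return att  # dictionary of tau_hat_{g,t}
```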
2.2 Pooled OLS and Extended TWFE
Without covariates
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \varepsilon_{it}. \]
By replacing $(1, D_{iq}, \ldots, D_{iT})$ with unit dummies $(C_{i1}, \ldots, C_{iN})$, we get TWFE (the extended TWFE estimator).
Adding covariates
Assumption 2.3. Conditional no anticipation, staggered (CNAS). For each treatment cohort $g \in \{q, \ldots, T\}$,
\[ E[Y_t(g) - Y_t(\infty) \mid D_q, \ldots, D_T, X] = 0, \quad t = 1, \ldots, g-1. \]
Assumption 2.4. Conditional parallel trends, staggered (CPTS). For $t = 2, \ldots, T$ and time-constant controls $X$,
\[ E[Y_t(\infty) - Y_1(\infty) \mid D_q, \ldots, D_T, X] = E[Y_t(\infty) - Y_1(\infty) \mid X]. \]
Assumption 2.5. Linearity (LIN). For treatment cohort indicators $D_g$ and control variables $X$,
\begin{align*}
E[Y_1(\infty) \mid D_q, \ldots, D_T, X] &= \alpha + \sum_{g=q}^{T} \beta_g D_g + X\kappa + \sum_{g=q}^{T} D_g X \xi_g, \\
E[Y_t(\infty) \mid D_q, \ldots, D_T, X] - E[Y_1(\infty) \mid D_q, \ldots, D_T, X] &= \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X \pi_s.
\end{align*}
Hence, we have
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \varepsilon_{it},
\end{align*}
where $\dot{X}_{ig} = X_i - \bar{X}_g$.
We can average the $\tau_{gt}$ to get common dynamic effects at exposure time $e$ (time since treatment), weighting cohorts by their sizes $N_g$:
\[ \tau_e = \frac{\sum_{g=q}^{T-e} N_g \, \tau_{g,g+e}}{\sum_{g=q}^{T-e} N_g}. \]
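A small sketch of this aggregation, assuming a dictionary att of estimates keyed by (g, t) (as returned by the imputation sketch above) and a dictionary n_g of cohort sizes; both names are hypothetical.

```python
def aggregate_by_exposure(att, n_g):
    """Weight tau_hat_{g, g+e} by cohort sizes N_g to get tau_hat_e."""
    out = {}
    exposures = sorted({t - g for (g, t) in att})
    for e in exposures:
        num = sum(n_g[g] * att[(g, t)] for (g, t) in att if t - g == e)
        den = sum(n_g[g] for (g, t) in att if t - g == e)
        out[e] = num / den
    return out

# tau_by_exposure = aggregate_by_exposure(att, n_g={4: 120, 5: 95, 6: 80})
```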
2.3 Event study regressions
Without covariates
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=1}^{g-2} \gamma_{gs} D_{ig} f^s_t + \varepsilon_{it}. \]
PTS implies that the $\gamma_{gs}$ should be statistically indistinguishable from zero.
With covariates
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \sum_{g=q}^{T} \sum_{s=1}^{g-2} \gamma_{gs} D_{ig} f^s_t + \varepsilon_{it}.
\end{align*}
Callaway and Sant'Anna (2021) additionally require adding the pre-treatment interactions with the covariates. For each cohort, we could create an ES plot. We can also weight the $\hat{\gamma}_{gs}$ and $\hat{\tau}_{gs}$ by the cohort shares to create a single ES plot (using time since treatment/exposure).
2.4 Violations of assumptions
Heterogeneous trends: PTS is violated. We need to slightly modify the POLS estimator by adding cohort-specific linear trends:
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \delta_g D_{ig} \, t + \varepsilon_{it}. \]
General case:
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \sum_{g=q}^{T} \delta_g D_{ig} \, t + \sum_{g=q}^{T} D_{ig} \, t \, X_i \psi_g + \varepsilon_{it}.
\end{align*}
Testing no anticipation: drop the last time period, use TWFE, and include the one-period lead $W_{i,t+1}$. Conduct a cluster-robust $t$ test on $\eta$:
\[ Y_{it} = C_i + \eta W_{i,t+1} + \sum_{s=2}^{T-1} \theta_s f^s_t + \sum_{s=2}^{T-1} f^s_t X_i \pi_s + \sum_{g=q}^{T-1} \sum_{s=g}^{T-1} \tau_{gs} W_{it} D_{ig} f^s_t + \sum_{g=q}^{T-1} \sum_{s=g}^{T-1} W_{it} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \varepsilon_{it}. \]
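A rough sketch of this anticipation test without covariates, under the same hypothetical id, t, y, g layout: drop the last period, add the one-period lead of treatment, and inspect its cluster-robust t statistic (unit dummies stand in for $C_i$).

```python
import pandas as pd
import statsmodels.api as sm

def test_no_anticipation(df):
    """Regress Y_it on unit dummies, time dummies, cohort-by-period treatment
    terms, and the lead W_{i,t+1}; a significant lead suggests anticipation.
    Assumes columns: id, t, y, g (first treated period; np.inf if never treated)."""
    df = df[df["t"] < df["t"].max()].copy()                    # drop the last period
    df["w_lead"] = (df["t"] + 1 >= df["g"]).astype(int)        # W_{i,t+1}
    X = pd.get_dummies(df["id"], prefix="unit", dtype=float)   # unit dummies C_i
    X["w_lead"] = df["w_lead"]
    for s in sorted(df["t"].unique())[1:]:
        X[f"f{s}"] = (df["t"] == s).astype(float)
        for c in sorted(g for g in df["g"].unique() if g <= s):
            # D_ig * f^s_t for s >= g (equals W_it * D_ig * f^s_t)
            X[f"tau_{int(c)}_{s}"] = ((df["g"] == c) & (df["t"] == s)).astype(float)
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    return res.params["w_lead"], res.tvalues["w_lead"]
```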
If NAS is violated:
Skip one period (or more) just prior to intervention.
Use external instruments for the intervention.
Exit/reversibility: we can alternatively index cohorts by entry and exit time; the potential outcomes are now $Y_t(g, h)$, where $g$ is the first treated period and exit occurs in $h$. $D_{gh}$, $g < h \le T$, are the new cohort indicators. Define $Y_t(\infty)$ as the never-treated state. The ATTs are
\[ \tau_{ght} = E[Y_t(g, h) - Y_t(\infty) \mid D_{gh} = 1], \quad t = g, \ldots, T. \]
Here, ATTs are defined even when $t \ge h$, that is, after the intervention has been removed. This helps examine persistence after the program is eliminated. To estimate them, simply replace $D_g$ with $D_{g,h}$, $g < h$.
Time-varying controls: we need to explicitly assume the covariates do not react to treatment assignment, $X_t(g) = X_t(\infty) = X_t$, $g \in \{q, \ldots, T\}$, and that $\{X_{it} : t = 1, \ldots, T\}$ do not react to the shocks $\varepsilon_{it}$. To obtain the ATTs $\hat{\tau}_{gt}$ as regression coefficients:
1. Define treatment indicators $W_{itgr} \equiv D_{ig} \cdot f^r_t$.
2. Demean the covariates by cohort/calendar time: $\dot{X}_{itgr} \equiv X_{it} - N_g^{-1} \sum_{h=1}^{N} D_{hg} X_{hr}$, i.e., subtract the cohort-$g$ sample average of the covariates in calendar period $r$.
3. Use a slightly modified POLS regression of $Y_{it}$ on
\begin{align*}
&1, \; W_{itqq}, \ldots, W_{itqT}, \; W_{it,q+1,q+1}, \ldots, W_{it,q+1,T}, \ldots, W_{itTT}, \\
&W_{itqq} \cdot \dot{X}_{itqq}, \ldots, W_{itqT} \cdot \dot{X}_{itqT}, \; W_{it,q+1,q+1} \cdot \dot{X}_{it,q+1,q+1}, \ldots, W_{it,q+1,T} \cdot \dot{X}_{it,q+1,T}, \ldots, W_{itTT} \cdot \dot{X}_{itTT}, \\
&D_{iq}, \ldots, D_{iT}, \; X_{it}, \; D_{iq} \cdot X_{it}, \ldots, D_{iT} \cdot X_{it}, \\
&f^2_t, \ldots, f^T_t, \; f^2_t \cdot X_{it}, \ldots, f^T_t \cdot X_{it}.
\end{align*}
3 Nonlinear DiD
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data.
The Econometrics Journal, 26(3), C31-C66.
Assumption 3.1. Conditional index PT, staggered (CIPTS). For $t = 1, \ldots, T$,
\[ E[Y_t(\infty) \mid D_q, \ldots, D_T, X] = G\!\left( \alpha + \sum_{g=q}^{T} \beta_g D_g + X\kappa + \sum_{g=q}^{T} D_g X \eta_g + \gamma_t + X\pi_t \right), \]
with the normalizations $\gamma_1 \equiv 0$ and $\pi_1 \equiv 0$. Here, $G(\cdot)$ is a known, strictly increasing, continuously differentiable function.
With CIPTS, the ATTs can be written as
\[ \tau_{gt} = E(Y_t \mid D_g = 1) - E\!\left[ G\!\left( \alpha + \beta_g + \gamma_t + X(\kappa + \eta_g + \pi_t) \right) \mid D_g = 1 \right], \]
which shows that the $\tau_{gt}$ are identified if the parameters in the linear index are identified.
For estimation, we can use quasi-MLE (QMLE) in the linear exponential family (LEF) with mean function $G(\cdot)$.
Imputation estimation
1. For the chosen function $G(\cdot)$, use the $W_{it} = 0$ observations to estimate the parameters
\[ \alpha, \beta_q, \ldots, \beta_T, \kappa, \eta_q, \ldots, \eta_T, \gamma_2, \ldots, \gamma_T, \pi_2, \ldots, \pi_T \]
by pooled QMLE in the LEF. The explanatory variables are
\[ 1, D_{iq}, \ldots, D_{iT}, X_i, D_{iq} \cdot X_i, \ldots, D_{iT} \cdot X_i, f^2_t, \ldots, f^T_t, f^2_t \cdot X_i, \ldots, f^T_t \cdot X_i. \]
2. For cohort $g \in \{q, \ldots, T\}$, impute $Y_{it}(\infty)$ for the $W_{it} = 1$ observations:
\[ \hat{Y}_{igt}(\infty) \equiv G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right), \quad t = g, \ldots, T. \]
3. For $t = g, \ldots, T$, obtain the imputation estimator of $\tau_{gt}$:
\[ \hat{\tau}_{gt} = N_g^{-1} \sum_{i=1}^{N} D_{ig} \left[ Y_{it} - \hat{Y}_{igt}(\infty) \right] = \bar{Y}_{gt} - N_g^{-1} \sum_{i=1}^{N} D_{ig} \, G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right). \]
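A sketch of these imputation steps with an exponential mean, i.e., Poisson QMLE, which is one LEF member and suits nonnegative outcomes. It uses the hypothetical id, t, y, g columns from before plus a single time-constant covariate x; the exact choice of G is an assumption of the example, not of the notes.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def nonlinear_did_imputation(df):
    """Poisson QMLE on untreated observations (exponential G), then impute
    Y_it(inf) for treated observations and average within cohort-time cells.
    Assumes columns: id, t, y, g (first treated period; np.inf never treated), x."""
    df = df.copy()
    df["w"] = (df["t"] >= df["g"]).astype(int)
    cohorts = sorted(c for c in df["g"].unique() if np.isfinite(c))
    periods = sorted(df["t"].unique())

    def design(d):
        X = pd.DataFrame({"const": 1.0, "x": d["x"]}, index=d.index)
        for c in cohorts:
            dg = (d["g"] == c).astype(float)
            X[f"d{int(c)}"], X[f"d{int(c)}_x"] = dg, dg * d["x"]
        for s in periods[1:]:
            fs = (d["t"] == s).astype(float)
            X[f"f{s}"], X[f"f{s}_x"] = fs, fs * d["x"]
        return X

    ctrl = df["w"] == 0
    fit = sm.GLM(df.loc[ctrl, "y"], design(df.loc[ctrl]),
                 family=sm.families.Poisson()).fit()
    treated = df[df["w"] == 1]
    y0_hat = np.asarray(fit.predict(design(treated)))   # imputed untreated means G(.)
    att = (treated["y"] - y0_hat).groupby([treated["g"], treated["t"]]).mean()
    return att                                           # tau_hat_{g,t} by cohort-time
```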
Pooled estimation
1. Using all of the data, apply pooled QMLE in the LEF to estimate
\begin{align*}
(\alpha, \; &\beta_q, \ldots, \beta_T, \; \kappa, \; \eta_q, \ldots, \eta_T, \; \gamma_2, \ldots, \gamma_T, \; \pi_2, \ldots, \pi_T, \\
&\delta_{qq}, \delta_{q,q+1}, \ldots, \delta_{qT}, \; \delta_{q+1,q+1}, \ldots, \delta_{q+1,T}, \ldots, \delta_{TT}, \\
&\xi_{qq}, \xi_{q,q+1}, \ldots, \xi_{qT}, \; \xi_{q+1,q+1}, \ldots, \xi_{q+1,T}, \ldots, \xi_{TT})
\end{align*}
from
\begin{align*}
E(Y_{it} \mid D_{iq}, \ldots, D_{iT}, X_i, W_i) = G\Bigg[ \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} (D_{ig} \cdot X_i)\, \eta_g + \sum_{s=2}^{T} \gamma_s f^s_t + \sum_{s=2}^{T} (f^s_t \cdot X_i)\, \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \delta_{gs} W_{itgs} + \sum_{g=q}^{T} \sum_{s=g}^{T} W_{itgs} \cdot \dot{X}_{ig}\, \xi_{gs} \Bigg],
\end{align*}
where now $\dot{X}_{ig} = X_i - \bar{X}_g$ are centred around cohort sample averages.
2. For $\hat{\tau}_{gt}$, obtain the average partial effect with respect to the binary variable $W_t$, evaluated at $D_g = 1$, $f^t_t = 1$, and all other cohort and time dummies set to zero. Average across the subsample with $D_{ig} = 1$ to get
\begin{align*}
\hat{\tau}_{gt} = N_g^{-1} \sum_{i=1}^{N} D_{ig} \Big[ &G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t + \hat{\delta}_{gt} + \dot{X}_{ig} \hat{\xi}_{gt} \right) \\
&- G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right) \Big].
\end{align*}
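A small numpy sketch of this average partial effect for an exponential G. The inputs (the fitted linear index without the treatment terms, the cohort-demeaned covariates, and the fitted delta_gt and xi_gt) are illustrative names, not notation from the notes.

```python
import numpy as np

def ape_tau_gt(index0_g, delta_gt, xdot_g, xi_gt, G=np.exp):
    """Average partial effect for cohort g in period t with an exponential G:
    mean of G(index + delta_gt + Xdot @ xi_gt) - G(index) over the D_ig = 1 units.
    index0_g: fitted linear index without the treatment terms (length N_g);
    xdot_g:   cohort-demeaned covariates for the same units (N_g x k array);
    delta_gt, xi_gt: fitted treatment-effect parameters for the (g, t) cell."""
    index1_g = index0_g + delta_gt + xdot_g @ np.atleast_1d(xi_gt)
    return np.mean(G(index1_g) - G(index0_g))
```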
4 Weighting and matching
4.1 IPW, IPWRA, and PSM
Lee, S. J., & Wooldridge, J. M. (2023). A Simple Transformation Approach to Difference-in-Differences
Estimation for Panel Data. Available at SSRN 4516518.
Rolling methods, common timing: assuming NA, CPT, and OV
1. For each unit $i$ and a given $t \ge q$, construct
\[ \dot{Y}_{it} = Y_{it} - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_{is}. \]
If the focus is on a constant effect, we can replace $\dot{Y}_{it}$ by $\bar{Y}_i$:
\[ \bar{Y}_i = \bar{Y}_{i,post} - \bar{Y}_{i,pre} = \frac{1}{T-q+1}\sum_{s=q}^{T} Y_{is} - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_{is}. \]
2. For any treatment period $t \ge q$, apply standard TE methods to
\[ \left\{ (\dot{Y}_{it}, D_i, X_i) : i = 1, \ldots, N \right\}. \]
The methods can incorporate IPW, IPWRA (doubly robust), PSM, or machine learning; see the IPW sketch after this list.
3. Event study (long differencing) methods: construct $\tilde{Y}_{it} \equiv Y_{it} - Y_{i,q-1}$ and apply IPW, IPWRA (doubly robust), PSM, or machine learning to
\[ \left\{ (\tilde{Y}_{it}, D_i, X_i) : i = 1, \ldots, N \right\}. \]
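A sketch of step 2 using IPW, with the hypothetical id, t, y, d, x columns from before: estimate a logit propensity score on the cross section and apply the standard ATT weighting estimator to the rolled outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def ipw_att_rolling(df, q, t):
    """IPW estimator of tau_t applied to the rolled outcome Ydot_it.
    Assumes a balanced panel with columns: id, t, y, d, x (time-constant)."""
    pre = df[df["t"] < q].groupby("id")["y"].mean()
    cross = df[df["t"] == t].set_index("id").copy()
    cross["ydot"] = cross["y"] - pre                        # Y_it - pre-q average
    Z = sm.add_constant(cross[["x"]])
    ps = sm.Logit(cross["d"], Z).fit(disp=0).predict(Z)     # propensity scores
    d = cross["d"].to_numpy()
    ydot = cross["ydot"].to_numpy()
    p = np.asarray(ps)
    # ATT weighting: treated get weight 1, controls get p/(1-p), normalized
    w0 = (1 - d) * p / (1 - p)
    return np.sum(d * ydot) / np.sum(d) - np.sum(w0 * ydot) / np.sum(w0)
```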
Rolling methods, staggered interventions: assuming CNAS, CPTS, and OV
1. For a given $g \in \{q, \ldots, T\}$ and time period $t \in \{g, g+1, \ldots, T\}$, compute
\[ \dot{Y}_{itg} \equiv Y_{it} - \frac{1}{g-1}\sum_{s=1}^{g-1} Y_{is} = Y_{it} - \bar{Y}_{i,pre(g)}. \]
2. Choose as the control group the units with $D_{i,t+1} + D_{i,t+2} + \cdots + D_{iT} + D_{i\infty} = 1$ (or, if desired, a subset, such as only the never-treated group).
3. Using the subset of data with
\[ D_{ig} + D_{i,t+1} + D_{i,t+2} + \cdots + D_{iT} + D_{i\infty} = 1, \]
apply standard TE methods (such as linear RA, IPW, IPWRA, or matching) to the cross section
\[ \left\{ (\dot{Y}_{itg}, D_{ig}, X_i) : i = 1, \ldots, N \right\}, \]
with $D_{ig}$ acting as the treatment indicator.
4.2 Synthetic DiD
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic
difference-in-differences. American Economic Review, 111(12), 4088-4118.
Let the first $N_0$ units be controls and the last $N_1$ units be treated, $N = N_0 + N_1$.
Treatment starts in period $T_0 + 1$.
The TWFE estimator (basic DiD) of the ATT solves
\[ \min_{\tau, \mu, \alpha_i, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( Y_{it} - \mu - \tau W_{it} - \alpha_i - \beta_t \right)^2. \]
Arkhangelsky et al. (2021): SC (synthetic control) can be characterized as
\[ \min_{\tau, \mu, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}^{sc}_i \left( Y_{it} - \tau W_{it} - \mu - \beta_t \right)^2, \]
where they omit the unit fixed effects $\alpha_i$ and use unit-specific weights $\hat{\omega}^{sc}_i$.
Arkhangelsky et al. (2021) propose an extension/synthesis and call it synthetic DiD:
\[ \min_{\tau, \mu, \alpha_i, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}^{sdid}_i \hat{\lambda}^{sdid}_t \left( Y_{it} - \mu - \tau W_{it} - \alpha_i - \beta_t \right)^2.
\]
Procedure (a code sketch of the resulting weighted regression appears at the end of this subsection):
1. Choose the weights to ensure proper "balance":
$\hat{\omega}^{sdid}_i$ chosen to make
\[ N_0^{-1} \sum_{i=1}^{N_0} \hat{\omega}^{sdid}_i Y_{it} \approx N_1^{-1} \sum_{i=N_0+1}^{N} Y_{it}, \quad t = 1, \ldots, T_0 \; \text{(control periods)}. \]
$\hat{\lambda}^{sdid}_t$ chosen to make
\[ T_0^{-1} \sum_{t=1}^{T_0} \hat{\lambda}^{sdid}_t Y_{it} \approx T_1^{-1} \sum_{t=T_0+1}^{T} Y_{it}, \quad i = 1, \ldots, N_0 \; \text{(control units)}. \]
Need to use a regularization scheme so weights are not too variable.
2. Inference
The asymptotics assume $T_0$ and $T_1$ both grow large.
Need to assume the time series are weakly dependent.
Assumes $N_0$ is large.
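A minimal numpy sketch of the final SDID step, taking the weights as given (i.e., computed by the regularized balancing step above, which is not shown): it solves the weighted two-way regression for $\tau$. The $N \times T$ arrays Y and W and the weight vectors are hypothetical inputs.

```python
import numpy as np

def sdid_tau(Y, W, omega, lam):
    """Weighted TWFE regression in the SDID objective, for given unit weights
    omega (length N) and time weights lam (length T): minimize over
    (mu, alpha_i, beta_t, tau) the weighted sum of squared residuals.
    Y, W: N x T arrays; returns tau_hat."""
    N, T = Y.shape
    rows = []
    for i in range(N):
        for t in range(T):
            # columns: intercept, alpha_2..alpha_N, beta_2..beta_T, tau
            x = np.zeros(1 + (N - 1) + (T - 1) + 1)
            x[0] = 1.0
            if i > 0:
                x[i] = 1.0                 # unit effect alpha_i (unit 1 omitted)
            if t > 0:
                x[N + t - 1] = 1.0         # time effect beta_t (period 1 omitted)
            x[-1] = W[i, t]                # treatment indicator
            rows.append(x)
    X = np.array(rows)
    y = Y.reshape(-1)
    w = np.sqrt(np.outer(omega, lam).reshape(-1))   # weighted least squares
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef[-1]                        # tau_hat
```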