Difference-in-Differences
Zijing Hu
May 29, 2024
Contents
1 DiD with common intervention date
  1.1 Assumptions, representations, and identification
  1.2 Pooled OLS and TWFE
  1.3 Event study regression
2 Staggered DiD
  2.1 Assumption, identification, and imputation
  2.2 Pooled OLS and Extended TWFE
  2.3 Event study regressions
  2.4 Violations of assumptions
3 Nonlinear DiD
4 Weighting and matching
  4.1 IPW, IPWRA, and PSM
  4.2 Synthetic DiD
*This note is based on the EABCN training school Difference-in-Differences and Event Study Estimators with Panel Data taught by Professor Jeffrey Wooldridge.
1 DiD with common intervention date
1.1 Assumptions, representations, and identification
Settings
Time periods: $t = 1, \ldots, T$. Time dummies: $f^1_t, \ldots, f^T_t$; exhaustive and mutually exclusive.
The intervention is at time $q$, $1 < q \le T$.
Treatment is not reversible (for now).
$D_i = 1$ if unit $i$ is eventually treated.
$p_t = 1[t \ge q] = f^q_t + \cdots + f^T_t$ is the post-treatment time dummy.
For a unit $i$, the time-varying treatment indicator is $W_{it} = D_i \cdot p_t$, $t = 1, \ldots, T$.
Potential outcomes: $\{Y_t(0), Y_t(1)\}$, $t = 1, \ldots, T$.
Treatment effects: $TE_t = Y_t(1) - Y_t(0)$.
We want to estimate the ATTs: $\tau_t \equiv E[TE_t \mid D = 1]$, $t = q, q+1, \ldots, T$.
Assumption 1.1. No anticipation (NA). With $D$ the treatment indicator,
\[ E[Y_t(1) - Y_t(0) \mid D = 1] = 0, \quad t \in \{1, \ldots, q-1\}. \]
We can skip a period or periods leading up to the intervention if units change behavior before the
intervention in ways that affect the outcome.
Assumption 1.2. Parallel trends (PT). The average trends in the untreated state are the same for the treated and control groups; the assignment $D$ can depend on the level but not on the trend. For $t \in \{2, \ldots, T\}$,
\[ E[Y_t(0) - Y_1(0) \mid D] = E[Y_t(0) - Y_1(0)], \]
or, equivalently,
\[ E[Y_t(0) - Y_s(0) \mid D = 1] = E[Y_t(0) - Y_s(0) \mid D = 0], \quad s \in \{1, \ldots, q-1\}. \]
Theorem 1.3. Under NA and PT, the ATTs are identified in each of the treated periods:
\[ \tau_t = E[TE_t \mid D = 1], \quad t = q, q+1, \ldots, T. \]
Proof. Let $t \ge q$. We have
\begin{align*}
TE_t &= Y_t(1) - Y_t(0) \\
&= \left[ Y_t(1) - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s(1) \right] - \left[ Y_t(0) - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s(0) \right] + \frac{1}{q-1}\sum_{s=1}^{q-1} \left[ Y_s(1) - Y_s(0) \right] \\
&= \dot{Y}_t(1) - \dot{Y}_t(0) + \frac{1}{q-1}\sum_{s=1}^{q-1} TE_s.
\end{align*}
Under NA,
\[ E\left[ \frac{1}{q-1}\sum_{s=1}^{q-1} TE_s \,\Big|\, D = 1 \right] = \frac{1}{q-1}\sum_{s=1}^{q-1} E[TE_s \mid D = 1] = 0. \]
Define the observed variables
\[ \dot{Y}_t = Y_t - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_s = \frac{1}{q-1}\sum_{s=1}^{q-1} (Y_t - Y_s). \]
When $D = 1$, $\dot{Y}_t = \dot{Y}_t(1)$. Then $E[\dot{Y}_t(1) \mid D = 1]$ is identified:
\[ \frac{1}{N_1}\sum_{i=1}^{N} D_i \dot{Y}_{it} = \frac{N}{N_1}\left[ \frac{1}{N}\sum_{i=1}^{N} D_i \dot{Y}_{it} \right] \xrightarrow{p} E[\dot{Y}_t(1) \mid D = 1]. \]
$E[\dot{Y}_t(0) \mid D = 1]$ is identified by PT:
\begin{align*}
E[\dot{Y}_t(0) \mid D = 1] &= \frac{1}{q-1}\sum_{s=1}^{q-1} E[Y_t(0) - Y_s(0) \mid D = 1] \\
&= \frac{1}{q-1}\sum_{s=1}^{q-1} E[Y_t(0) - Y_s(0) \mid D = 0] \\
&= E[\dot{Y}_t(0) \mid D = 0].
\end{align*}
We also have
\[ \frac{1}{N_0}\sum_{i=1}^{N} (1 - D_i) \dot{Y}_{it} \xrightarrow{p} E[\dot{Y}_t(0) \mid D = 0]. \]
Hence,
\[ \hat{\tau}_t = \frac{1}{N_1}\sum_{i=1}^{N} D_i \dot{Y}_{it} - \frac{1}{N_0}\sum_{i=1}^{N} (1 - D_i) \dot{Y}_{it} = \left( \bar{Y}_{1t} - \bar{Y}_{0t} \right) - \left( \bar{Y}_{1,pre(q)} - \bar{Y}_{0,pre(q)} \right). \]
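As a concrete illustration, here is a minimal numpy/pandas sketch of this rolling estimator. It assumes a balanced panel held in a pandas DataFrame with hypothetical columns id, t, y, and d (the eventually-treated indicator); these names are not from the notes.

```python
import numpy as np
import pandas as pd

def did_att(df, q, t):
    """Rolling DiD estimate of tau_t: compare changes relative to the pre-q
    average for treated (d == 1) and control (d == 0) units.
    Assumes a balanced panel with columns: id, t, y, d."""
    pre = df[df["t"] < q].groupby("id")["y"].mean()    # unit-level pre-q average
    post = df[df["t"] == t].set_index("id")["y"]        # outcome in period t >= q
    d = df.groupby("id")["d"].first()                   # treatment group indicator
    ydot = post - pre                                   # Y_it minus pre-period average
    return ydot[d == 1].mean() - ydot[d == 0].mean()

# Example usage (toy data): tau_hat = did_att(df, q=3, t=3)
```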
What about adding covariates?
Assumption 1.4. Conditional parallel trends (CPT). For $t = 2, \ldots, T$ and time-constant observables $X$,
\[ E[Y_t(0) - Y_1(0) \mid D, X] = E[Y_t(0) - Y_1(0) \mid X]. \]
Assumption CPT implies that $D$ is unconfounded for $\dot{Y}_t(0)$ conditional on $X$.
1.2 Pooled OLS and TWFE
Assumption 1.5. Overlap (OV). For all $x \in \text{Support}(X)$, $p(x) = \Pr(D = 1 \mid X = x) < 1$.
Usually, OV is required. But we can assume linearity instead.
Pooled OLS regression:
\[ Y_{it} = \eta + \lambda D_i + \dot{X}_i \kappa + D_i \dot{X}_i \varphi + \underbrace{\sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right)}_{\text{control}} + W_{it} \underbrace{\sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right)}_{\text{treatment}} + \varepsilon_{it}. \]
Here, $\dot{X} = X - E[X \mid D = 1]$ and $W_t \cdot f^s_t = D \cdot f^s_t$ for $s \ge q$.
The TWFE regression is equivalent, with unit fixed effects $C_i$ replacing $\eta + \lambda D_i$:
\[ Y_{it} = C_i + \underbrace{\sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right)}_{\text{control}} + W_{it} \underbrace{\sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right)}_{\text{treatment}} + U_{it}. \]
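The pooled OLS regression above can be run directly with standard software. Below is a sketch using statsmodels, assuming the same hypothetical panel layout as before plus a single time-constant covariate x; the column names and helper function are illustrative only.

```python
import pandas as pd
import statsmodels.api as sm

def pols_did(df, q):
    """Pooled OLS with time dummies, demeaned-covariate interactions, and
    treatment-by-period effects tau_s (s = q, ..., T); cluster-robust by unit.
    Assumes columns: id, t, y, d (eventually treated), x (time-constant)."""
    df = df.copy()
    df["xdot"] = df["x"] - df.loc[df["d"] == 1, "x"].mean()   # X - E[X | D = 1]
    df["w"] = df["d"] * (df["t"] >= q)                         # W_it = D_i * p_t
    X = pd.DataFrame({"const": 1.0, "d": df["d"],
                      "xdot": df["xdot"], "d_xdot": df["d"] * df["xdot"]})
    for s in sorted(df["t"].unique())[1:]:                     # s = 2, ..., T
        fs = (df["t"] == s).astype(float)
        X[f"f{s}"] = fs
        X[f"f{s}_xdot"] = fs * df["xdot"]
        if s >= q:                                             # treatment block
            X[f"tau{s}"] = df["w"] * fs
            X[f"rho{s}"] = df["w"] * fs * df["xdot"]
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    return res  # coefficients tau_q, ..., tau_T are the period-specific ATTs

# res = pols_did(df, q=4); print(res.summary())
```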
1.3 Event study regression
The most common approach to testing parallel pre-trends in DiD is to run an event study (ES) regression (the leads-and-lags estimator). The ES regression without controls is
\[ Y_{it} = \eta + \lambda D_i + \sum_{s=2}^{T} \theta_s f^s_t + D_i \sum_{s=q}^{T} \tau_s f^s_t + D_i \sum_{s=1}^{q-2} \gamma_s f^s_t + \varepsilon_{it}. \]
The coefficient on the period just before treatment ($s = q-1$) is normalized to zero. Use a cluster-robust Wald statistic to jointly test the pre-trend coefficients $\gamma_s$, $s = 1, \ldots, q-2$. Under conditional PT, the ES graph should be based on the coefficients $\hat{\gamma}_1, \ldots, \hat{\gamma}_{q-2}, \hat{\tau}_q, \ldots, \hat{\tau}_T$ from
\[ Y_{it} = \eta + \lambda D_i + \dot{X}_i \kappa + D_i \dot{X}_i \varphi + \sum_{s=2}^{T} \left( \theta_s f^s_t + f^s_t \dot{X}_i \pi_s \right) + D_i \sum_{s=q}^{T} \left( \tau_s f^s_t + f^s_t \dot{X}_i \rho_s \right) + D_i \sum_{s=1}^{q-2} \gamma_s f^s_t + \varepsilon_{it}. \]
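Here is a sketch of the ES regression without controls together with the cluster-robust joint Wald test of the pre-trend coefficients; it reuses the hypothetical id, t, y, d columns from the earlier sketches.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def event_study(df, q):
    """Leads-and-lags regression: gamma_s for s = 1, ..., q-2 (pre-trends,
    omitting s = q-1) and tau_s for s = q, ..., T, with cluster-robust SEs."""
    periods = sorted(df["t"].unique())
    X = pd.DataFrame({"const": 1.0, "d": df["d"]})
    for s in periods[1:]:                       # time dummies f^2_t, ..., f^T_t
        X[f"f{s}"] = (df["t"] == s).astype(float)
    for s in periods:
        if s <= q - 2:                          # leads (pre-trend coefficients)
            X[f"gamma{s}"] = df["d"] * (df["t"] == s)
        elif s >= q:                            # lags (treatment effects)
            X[f"tau{s}"] = df["d"] * (df["t"] == s)
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    # Joint Wald test that all pre-trend coefficients are zero
    gamma_cols = [c for c in X.columns if c.startswith("gamma")]
    R = np.zeros((len(gamma_cols), X.shape[1]))
    for r, c in enumerate(gamma_cols):
        R[r, X.columns.get_loc(c)] = 1.0
    pretrend = res.wald_test(R) if gamma_cols else None
    return res, pretrend

# res, pretrend_test = event_study(df, q=4); print(pretrend_test)
```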
2 Staggered DiD
2.1 Assumption, identification, and imputation
Settings
$T$ time periods with no units treated in $t = 1$.
The first units are treated at $t = q < T$. No reversibility.
For $g \in \{q, \ldots, T\}$, $Y_t(g)$ is the outcome if the unit is first subjected to the intervention at time $g$. $Y_t(\infty)$ is the outcome if the unit is never treated in $\{q, \ldots, T\}$.
Treatment effects of primary focus: $TE_{gt} = Y_t(g) - Y_t(\infty)$, $g = q, \ldots, T$; $t = g, \ldots, T$.
Exhaustive and mutually exclusive dummy variables: $D_g = 1$ if the unit is first subjected to the intervention at $g \in \{q, \ldots, T\}$; $D_\infty = 1 - \sum_{g=q}^{T} D_g$.
The goal is to estimate $\tau_{gt} \equiv E[Y_t(g) - Y_t(\infty) \mid D_g = 1]$, $g = q, \ldots, T$; $t = g, \ldots, T$.
The observed outcome is $Y_{it} = D_{i\infty} Y_{it}(\infty) + \sum_{g=q}^{T} D_{ig} Y_{it}(g)$.
Define post-treatment time dummies by cohort: $p^g_t = f^g_t + \cdots + f^T_t$; $p^g_t = 1$ if $t \ge g$.
Define the time-varying treatment indicator: $W_{it} = \sum_{g=q}^{T} D_{ig} p^g_t$.
Goodman-Bacon (2021) shows that TWFE is generally difficult to interpret with staggered interventions and heterogeneous/time-varying effects.
Assumption 2.1. No anticipation, staggered (NAS). All pre-intervention treatment effects are zero:
\[ E[Y_t(g) - Y_t(\infty) \mid D_q, \ldots, D_T] = 0, \quad g = q, \ldots, T, \; t = 1, \ldots, g-1. \]
Assumption 2.2. Parallel trends, staggered (PTS). For $t = 2, \ldots, T$,
\[ E[Y_t(\infty) - Y_1(\infty) \mid D_q, \ldots, D_T] = E[Y_t(\infty) - Y_1(\infty)]. \]
With PTS, we can write
\begin{align*}
E[Y_1(\infty) \mid D_q, \ldots, D_T] &= \alpha + \sum_{g=q}^{T} \beta_g D_g, \\
E[Y_t(\infty) \mid D_q, \ldots, D_T] &= \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t.
\end{align*}
Then the ATTs can be written as
\[ \tau_{gt} = E[Y_t(g) \mid D_g = 1] - E[Y_t(\infty) \mid D_g = 1] = E[Y_t \mid D_g = 1] - (\alpha + \beta_g + \gamma_t), \]
where $E[Y_t \mid D_g = 1]$ is always identified:
\[ \bar{Y}_{gt} = \frac{\sum_{i=1}^{N} D_{ig} Y_{it}}{\sum_{i=1}^{N} D_{ig}} \xrightarrow{p} E[Y_t \mid D_g = 1]. \]
Additionally, $\alpha$, $\beta_g$ ($g = q, \ldots, T$), and $\gamma_t$ ($t = 2, \ldots, T$) are identified and can be estimated from the control observations once we add NAS:
\[ E[Y_t \mid D_q, \ldots, D_T, W_t = 0] = \alpha + \sum_{g=q}^{T} \beta_g D_g + \sum_{s=2}^{T} \gamma_s f^s_t. \]
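A minimal sketch of this identification argument as an imputation procedure: fit the regression above on the untreated ($W_{it} = 0$) observations, then subtract the fitted untreated mean from the cohort-time average of $Y$. It assumes hypothetical columns id, t, y, and g (the first treated period, with np.inf for never-treated units).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def staggered_imputation(df):
    """Estimate alpha, beta_g, gamma_s on the W_it = 0 observations, then form
    tau_gt = Ybar_gt - (alpha + beta_g + gamma_t) for t >= g.
    Assumes columns: id, t, y, g (first treated period; np.inf if never treated)."""
    df = df.copy()
    df["w"] = (df["t"] >= df["g"]).astype(int)                 # treated observation
    X = pd.DataFrame({"const": 1.0}, index=df.index)
    cohorts = sorted(c for c in df["g"].unique() if np.isfinite(c))
    for c in cohorts:
        X[f"beta{int(c)}"] = (df["g"] == c).astype(float)
    for s in sorted(df["t"].unique())[1:]:
        X[f"gamma{s}"] = (df["t"] == s).astype(float)
    ctrl = df["w"] == 0
    res = sm.OLS(df.loc[ctrl, "y"], X.loc[ctrl]).fit(
        cov_type="cluster", cov_kwds={"groups": df.loc[ctrl, "id"]})
    att = {}
    for c in cohorts:
        for t in range(int(c), int(df["t"].max()) + 1):
            ybar = df.loc[(df["g"] == c) & (df["t"] == t), "y"].mean()
            mu0 = (res.params["const"] + res.params[f"beta{int(c)}"]
                   + (res.params[f"gamma{t}"] if t > 1 else 0.0))
            att[(int(c), t)] = ybar - mu0
    return att  # dictionary of tau_hat_{g,t}
```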
2.2 Pooled OLS and Extended TWFE
Without covariates
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \varepsilon_{it}. \]
By replacing $(1, D_{iq}, \ldots, D_{iT})$ with unit dummies $(C_{i1}, \ldots, C_{iN})$, we get TWFE (the extended TWFE estimator).
Adding covariates
Assumption 2.3. Conditional no anticipation, staggered (CNAS). For each treatment cohort $g \in \{q, \ldots, T\}$,
\[ E[Y_t(g) - Y_t(\infty) \mid D_q, \ldots, D_T, X] = 0, \quad t = 1, \ldots, g-1. \]
Assumption 2.4. Conditional parallel trends, staggered (CPTS). For $t = 2, \ldots, T$ and time-constant controls $X$,
\[ E[Y_t(\infty) - Y_1(\infty) \mid D_q, \ldots, D_T, X] = E[Y_t(\infty) - Y_1(\infty) \mid X]. \]
Assumption 2.5. Linearity (LIN). For treatment cohort indicators $D_g$ and control variables $X$,
\begin{align*}
E[Y_1(\infty) \mid D_q, \ldots, D_T, X] &= \alpha + \sum_{g=q}^{T} \beta_g D_g + X\kappa + \sum_{g=q}^{T} D_g X \xi_g, \\
E[Y_t(\infty) \mid D_q, \ldots, D_T, X] - E[Y_1(\infty) \mid D_q, \ldots, D_T, X] &= \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X \pi_s.
\end{align*}
Hence, we have
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \varepsilon_{it},
\end{align*}
where $\dot{X}_{ig} = X_i - \bar{X}_g$.
We can average the $\tau_{gt}$ to get common dynamic effects at exposure time $e$ (time since treatment), weighting cohorts by their sizes $N_g$:
\[ \tau_e = \frac{\sum_{g=q}^{T-e} N_g \, \tau_{g,g+e}}{\sum_{g=q}^{T-e} N_g}. \]
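A small sketch of this aggregation, assuming a dictionary att of estimates keyed by (g, t) (as returned by the imputation sketch above) and a dictionary n_g of cohort sizes; both names are hypothetical.

```python
def aggregate_by_exposure(att, n_g):
    """Weight tau_hat_{g, g+e} by cohort sizes N_g to get tau_hat_e."""
    out = {}
    exposures = sorted({t - g for (g, t) in att})
    for e in exposures:
        num = sum(n_g[g] * att[(g, t)] for (g, t) in att if t - g == e)
        den = sum(n_g[g] for (g, t) in att if t - g == e)
        out[e] = num / den
    return out

# tau_by_exposure = aggregate_by_exposure(att, n_g={4: 120, 5: 95, 6: 80})
```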
2.3 Event study regressions
Without covariates
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=1}^{g-2} \gamma_{gs} D_{ig} f^s_t + \varepsilon_{it}. \]
PTS implies that the $\gamma_{gs}$ should be statistically indistinguishable from zero.
With covariates
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \sum_{g=q}^{T} \sum_{s=1}^{g-2} \gamma_{gs} D_{ig} f^s_t + \varepsilon_{it}.
\end{align*}
Callaway and Sant'Anna (2021) additionally require adding the pre-treatment interactions with the covariates. For each cohort, we could create an ES plot. We can also weight the $\hat{\gamma}_{gs}$ and $\hat{\tau}_{gs}$ by the cohort shares to create a single ES plot (using time since treatment/exposure).
2.4 Violations of assumptions
Heterogeneous trends: PTS is violated. We need to slightly modify the POLS estimator by adding cohort-specific linear trends:
\[ Y_{it} = \alpha + \sum_{g=q}^{T} \beta_g D_{ig} + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \delta_g D_{ig} \, t + \varepsilon_{it}. \]
General case:
\begin{align*}
Y_{it} = \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} D_{ig} \dot{X}_{ig} \xi_g + \sum_{s=2}^{T} \theta_s f^s_t + \sum_{s=2}^{T} f^s_t X_i \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \tau_{gs} D_{ig} f^s_t + \sum_{g=q}^{T} \sum_{s=g}^{T} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \sum_{g=q}^{T} \delta_g D_{ig} \, t + \sum_{g=q}^{T} D_{ig} \, t \, X_i \psi_g + \varepsilon_{it}.
\end{align*}
Testing no anticipation: drop the last time period, use TWFE, and include the one-period lead $W_{i,t+1}$. Conduct a cluster-robust $t$ test on $\eta$:
\[ Y_{it} = C_i + \eta W_{i,t+1} + \sum_{s=2}^{T-1} \theta_s f^s_t + \sum_{s=2}^{T-1} f^s_t X_i \pi_s + \sum_{g=q}^{T-1} \sum_{s=g}^{T-1} \tau_{gs} W_{it} D_{ig} f^s_t + \sum_{g=q}^{T-1} \sum_{s=g}^{T-1} W_{it} D_{ig} f^s_t \dot{X}_{ig} \rho_{gs} + \varepsilon_{it}. \]
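A rough sketch of this anticipation test without covariates, under the same hypothetical id, t, y, g layout: drop the last period, add the one-period lead of treatment, and inspect its cluster-robust t statistic (unit dummies stand in for $C_i$).

```python
import pandas as pd
import statsmodels.api as sm

def test_no_anticipation(df):
    """Regress Y_it on unit dummies, time dummies, cohort-by-period treatment
    terms, and the lead W_{i,t+1}; a significant lead suggests anticipation.
    Assumes columns: id, t, y, g (first treated period; np.inf if never treated)."""
    df = df[df["t"] < df["t"].max()].copy()                    # drop the last period
    df["w_lead"] = (df["t"] + 1 >= df["g"]).astype(int)        # W_{i,t+1}
    X = pd.get_dummies(df["id"], prefix="unit", dtype=float)   # unit dummies C_i
    X["w_lead"] = df["w_lead"]
    for s in sorted(df["t"].unique())[1:]:
        X[f"f{s}"] = (df["t"] == s).astype(float)
        for c in sorted(g for g in df["g"].unique() if g <= s):
            # D_ig * f^s_t for s >= g (equals W_it * D_ig * f^s_t)
            X[f"tau_{int(c)}_{s}"] = ((df["g"] == c) & (df["t"] == s)).astype(float)
    res = sm.OLS(df["y"], X).fit(cov_type="cluster",
                                 cov_kwds={"groups": df["id"]})
    return res.params["w_lead"], res.tvalues["w_lead"]
```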
If NAS is violated:
Skip one period (or more) just prior to intervention.
Use external instruments for the intervention.
Exit/reversibility: we can alternatively index cohorts by entry and exit time; the potential outcomes are now $Y_t(g, h)$, where $g$ is the first treated period and exit occurs in $h$. $D_{gh}$, $g < h \le T$, are the new cohort indicators. Define $Y_t(\infty)$ as the never-treated state. The ATTs are
\[ \tau_{ght} = E[Y_t(g, h) - Y_t(\infty) \mid D_{gh} = 1], \quad t = g, \ldots, T. \]
Here, ATTs are defined even when $t \ge h$, that is, after the intervention has been removed. This helps examine persistence after the program is eliminated. To estimate them, simply replace $D_g$ with $D_{g,h}$, $g < h$.
Time-varying controls: we need to explicitly assume the covariates do not react to treatment assignment, $X_t(g) = X_t(\infty) = X_t$, $g \in \{q, \ldots, T\}$, and that $\{X_{it} : t = 1, \ldots, T\}$ do not react to the shocks $\varepsilon_{it}$. To obtain the ATTs $\hat{\tau}_{gt}$ as regression coefficients:
1. Define treatment indicators $W_{itgr} \equiv D_{ig} \cdot f^r_t$.
2. Demean the covariates by cohort/calendar time: $\dot{X}_{itgr} \equiv X_{it} - N_g^{-1} \sum_{h=1}^{N} D_{hg} X_{hr}$, i.e., subtract the cohort-$g$ sample average of the covariates in calendar period $r$.
3. Use a slightly modified POLS regression of $Y_{it}$ on
\begin{align*}
&1, \; W_{itqq}, \ldots, W_{itqT}, \; W_{it,q+1,q+1}, \ldots, W_{it,q+1,T}, \ldots, W_{itTT}, \\
&W_{itqq} \cdot \dot{X}_{itqq}, \ldots, W_{itqT} \cdot \dot{X}_{itqT}, \; W_{it,q+1,q+1} \cdot \dot{X}_{it,q+1,q+1}, \ldots, W_{it,q+1,T} \cdot \dot{X}_{it,q+1,T}, \ldots, W_{itTT} \cdot \dot{X}_{itTT}, \\
&D_{iq}, \ldots, D_{iT}, \; X_{it}, \; D_{iq} \cdot X_{it}, \ldots, D_{iT} \cdot X_{it}, \\
&f^2_t, \ldots, f^T_t, \; f^2_t \cdot X_{it}, \ldots, f^T_t \cdot X_{it}.
\end{align*}
3 Nonlinear DiD
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data.
The Econometrics Journal, 26(3), C31-C66.
Assumption 3.1. Conditional index PT, staggered (CIPTS). For $t = 1, \ldots, T$,
\[ E[Y_t(\infty) \mid D_q, \ldots, D_T, X] = G\!\left( \alpha + \sum_{g=q}^{T} \beta_g D_g + X\kappa + \sum_{g=q}^{T} D_g X \eta_g + \gamma_t + X\pi_t \right), \]
with the normalizations $\gamma_1 \equiv 0$ and $\pi_1 \equiv 0$. Here, $G(\cdot)$ is a known, strictly increasing, continuously differentiable function.
With CIPTS, the ATTs can be written as
\[ \tau_{gt} = E(Y_t \mid D_g = 1) - E\!\left[ G\!\left( \alpha + \beta_g + \gamma_t + X(\kappa + \eta_g + \pi_t) \right) \mid D_g = 1 \right], \]
which shows that the $\tau_{gt}$ are identified if the parameters in the linear index are identified.
For estimation, we can use quasi-MLE (QMLE) in the linear exponential family (LEF) with mean function $G(\cdot)$.
Imputation estimation
1. For the chosen function $G(\cdot)$, use the $W_{it} = 0$ observations to estimate the parameters
\[ \alpha, \beta_q, \ldots, \beta_T, \kappa, \eta_q, \ldots, \eta_T, \gamma_2, \ldots, \gamma_T, \pi_2, \ldots, \pi_T \]
by pooled QMLE in the LEF. The explanatory variables are
\[ 1, D_{iq}, \ldots, D_{iT}, X_i, D_{iq} \cdot X_i, \ldots, D_{iT} \cdot X_i, f^2_t, \ldots, f^T_t, f^2_t \cdot X_i, \ldots, f^T_t \cdot X_i. \]
2. For cohort $g \in \{q, \ldots, T\}$, impute $Y_{it}(\infty)$ for the $W_{it} = 1$ observations:
\[ \hat{Y}_{igt}(\infty) \equiv G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right), \quad t = g, \ldots, T. \]
3. For $t = g, \ldots, T$, obtain the imputation estimator of $\tau_{gt}$:
\[ \hat{\tau}_{gt} = N_g^{-1} \sum_{i=1}^{N} D_{ig} \left[ Y_{it} - \hat{Y}_{igt}(\infty) \right] = \bar{Y}_{gt} - N_g^{-1} \sum_{i=1}^{N} D_{ig} \, G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right). \]
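A sketch of these imputation steps with an exponential mean, i.e., Poisson QMLE, which is one LEF member and suits nonnegative outcomes. It uses the hypothetical id, t, y, g columns from before plus a single time-constant covariate x; the exact choice of G is an assumption of the example, not of the notes.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def nonlinear_did_imputation(df):
    """Poisson QMLE on untreated observations (exponential G), then impute
    Y_it(inf) for treated observations and average within cohort-time cells.
    Assumes columns: id, t, y, g (first treated period; np.inf never treated), x."""
    df = df.copy()
    df["w"] = (df["t"] >= df["g"]).astype(int)
    cohorts = sorted(c for c in df["g"].unique() if np.isfinite(c))
    periods = sorted(df["t"].unique())

    def design(d):
        X = pd.DataFrame({"const": 1.0, "x": d["x"]}, index=d.index)
        for c in cohorts:
            dg = (d["g"] == c).astype(float)
            X[f"d{int(c)}"], X[f"d{int(c)}_x"] = dg, dg * d["x"]
        for s in periods[1:]:
            fs = (d["t"] == s).astype(float)
            X[f"f{s}"], X[f"f{s}_x"] = fs, fs * d["x"]
        return X

    ctrl = df["w"] == 0
    fit = sm.GLM(df.loc[ctrl, "y"], design(df.loc[ctrl]),
                 family=sm.families.Poisson()).fit()
    treated = df[df["w"] == 1]
    y0_hat = np.asarray(fit.predict(design(treated)))   # imputed untreated means G(.)
    att = (treated["y"] - y0_hat).groupby([treated["g"], treated["t"]]).mean()
    return att                                           # tau_hat_{g,t} by cohort-time
```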
Pooled estimation
1. Using all of the data, apply pooled QMLE in the LEF to estimate
\begin{align*}
(\alpha, \; &\beta_q, \ldots, \beta_T, \; \kappa, \; \eta_q, \ldots, \eta_T, \; \gamma_2, \ldots, \gamma_T, \; \pi_2, \ldots, \pi_T, \\
&\delta_{qq}, \delta_{q,q+1}, \ldots, \delta_{qT}, \; \delta_{q+1,q+1}, \ldots, \delta_{q+1,T}, \ldots, \delta_{TT}, \\
&\xi_{qq}, \xi_{q,q+1}, \ldots, \xi_{qT}, \; \xi_{q+1,q+1}, \ldots, \xi_{q+1,T}, \ldots, \xi_{TT})
\end{align*}
from
\begin{align*}
E(Y_{it} \mid D_{iq}, \ldots, D_{iT}, X_i, W_i) = G\Bigg[ \alpha &+ \sum_{g=q}^{T} \beta_g D_{ig} + X_i \kappa + \sum_{g=q}^{T} (D_{ig} \cdot X_i)\, \eta_g + \sum_{s=2}^{T} \gamma_s f^s_t + \sum_{s=2}^{T} (f^s_t \cdot X_i)\, \pi_s \\
&+ \sum_{g=q}^{T} \sum_{s=g}^{T} \delta_{gs} W_{itgs} + \sum_{g=q}^{T} \sum_{s=g}^{T} W_{itgs} \cdot \dot{X}_{ig}\, \xi_{gs} \Bigg],
\end{align*}
where now $\dot{X}_{ig} = X_i - \bar{X}_g$ are centred around cohort sample averages.
2. For $\hat{\tau}_{gt}$, obtain the average partial effect with respect to the binary variable $W_t$, evaluated at $D_g = 1$, $f^t_t = 1$, and all other cohort and time dummies set to zero. Average across the subsample with $D_{ig} = 1$ to get
\begin{align*}
\hat{\tau}_{gt} = N_g^{-1} \sum_{i=1}^{N} D_{ig} \Big[ &G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t + \hat{\delta}_{gt} + \dot{X}_{ig} \hat{\xi}_{gt} \right) \\
&- G\!\left( \hat{\alpha} + \hat{\beta}_g + X_i \hat{\kappa} + X_i \hat{\eta}_g + \hat{\gamma}_t + X_i \hat{\pi}_t \right) \Big].
\end{align*}
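A small numpy sketch of this average partial effect for an exponential G. The inputs (the fitted linear index without the treatment terms, the cohort-demeaned covariates, and the fitted delta_gt and xi_gt) are illustrative names, not notation from the notes.

```python
import numpy as np

def ape_tau_gt(index0_g, delta_gt, xdot_g, xi_gt, G=np.exp):
    """Average partial effect for cohort g in period t with an exponential G:
    mean of G(index + delta_gt + Xdot @ xi_gt) - G(index) over the D_ig = 1 units.
    index0_g: fitted linear index without the treatment terms (length N_g);
    xdot_g:   cohort-demeaned covariates for the same units (N_g x k array);
    delta_gt, xi_gt: fitted treatment-effect parameters for the (g, t) cell."""
    index1_g = index0_g + delta_gt + xdot_g @ np.atleast_1d(xi_gt)
    return np.mean(G(index1_g) - G(index0_g))
```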
4 Weighting and matching
4.1 IPW, IPWRA, and PSM
Lee, S. J., & Wooldridge, J. M. (2023). A Simple Transformation Approach to Difference-in-Differences
Estimation for Panel Data. Available at SSRN 4516518.
Rolling methods, common timing: assuming NA, CPT, and OV
1. For each unit $i$ and a given $t \ge q$, construct
\[ \dot{Y}_{it} = Y_{it} - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_{is}. \]
If the focus is on a constant effect, we can replace $\dot{Y}_{it}$ by $\bar{Y}_i$:
\[ \bar{Y}_i = \bar{Y}_{i,post} - \bar{Y}_{i,pre} = \frac{1}{T-q+1}\sum_{s=q}^{T} Y_{is} - \frac{1}{q-1}\sum_{s=1}^{q-1} Y_{is}. \]
2. For any treatment period $t \ge q$, apply standard TE methods to
\[ \left\{ (\dot{Y}_{it}, D_i, X_i) : i = 1, \ldots, N \right\}. \]
The methods can incorporate IPW, IPWRA (doubly robust), PSM, or machine learning; see the IPW sketch after this list.
3. Event study (long differencing) methods: construct $\tilde{Y}_{it} \equiv Y_{it} - Y_{i,q-1}$ and apply IPW, IPWRA (doubly robust), PSM, or machine learning to
\[ \left\{ (\tilde{Y}_{it}, D_i, X_i) : i = 1, \ldots, N \right\}. \]
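A sketch of step 2 using IPW, with the hypothetical id, t, y, d, x columns from before: estimate a logit propensity score on the cross section and apply the standard ATT weighting estimator to the rolled outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def ipw_att_rolling(df, q, t):
    """IPW estimator of tau_t applied to the rolled outcome Ydot_it.
    Assumes a balanced panel with columns: id, t, y, d, x (time-constant)."""
    pre = df[df["t"] < q].groupby("id")["y"].mean()
    cross = df[df["t"] == t].set_index("id").copy()
    cross["ydot"] = cross["y"] - pre                        # Y_it - pre-q average
    Z = sm.add_constant(cross[["x"]])
    ps = sm.Logit(cross["d"], Z).fit(disp=0).predict(Z)     # propensity scores
    d = cross["d"].to_numpy()
    ydot = cross["ydot"].to_numpy()
    p = np.asarray(ps)
    # ATT weighting: treated get weight 1, controls get p/(1-p), normalized
    w0 = (1 - d) * p / (1 - p)
    return np.sum(d * ydot) / np.sum(d) - np.sum(w0 * ydot) / np.sum(w0)
```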
Rolling methods, staggered interventions: assuming CNAS, CPTS, and OV
1. For a given $g \in \{q, \ldots, T\}$ and time period $t \in \{g, g+1, \ldots, T\}$, compute
\[ \dot{Y}_{itg} \equiv Y_{it} - \frac{1}{g-1}\sum_{s=1}^{g-1} Y_{is} = Y_{it} - \bar{Y}_{i,pre(g)}. \]
2. Choose as the control group the units with $D_{i,t+1} + D_{i,t+2} + \cdots + D_{iT} + D_{i\infty} = 1$ (or, if desired, a subset, such as only the never-treated group).
3. Using the subset of data with
\[ D_{ig} + D_{i,t+1} + D_{i,t+2} + \cdots + D_{iT} + D_{i\infty} = 1, \]
apply standard TE methods (such as linear RA, IPW, IPWRA, or matching) to the cross section
\[ \left\{ (\dot{Y}_{itg}, D_{ig}, X_i) : i = 1, \ldots, N \right\}, \]
with $D_{ig}$ acting as the treatment indicator.
4.2 Synthetic DiD
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic
difference-in-differences. American Economic Review, 111(12), 4088-4118.
Let the first $N_0$ units be controls and the last $N_1$ units be treated, $N = N_0 + N_1$.
Treatment starts in period $T_0 + 1$.
The TWFE estimator (basic DiD) of the ATT solves
\[ \min_{\tau, \mu, \alpha_i, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( Y_{it} - \mu - \tau W_{it} - \alpha_i - \beta_t \right)^2. \]
Arkhangelsky et al. (2021): SC (synthetic control) can be characterized as
\[ \min_{\tau, \mu, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}^{sc}_i \left( Y_{it} - \tau W_{it} - \mu - \beta_t \right)^2, \]
where they omit the unit fixed effects $\alpha_i$ and use unit-specific weights $\hat{\omega}^{sc}_i$.
Arkhangelsky et al. (2021) propose an extension/synthesis and call it synthetic DiD:
\[ \min_{\tau, \mu, \alpha_i, \beta_t} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\omega}^{sdid}_i \hat{\lambda}^{sdid}_t \left( Y_{it} - \mu - \tau W_{it} - \alpha_i - \beta_t \right)^2.
\]
Procedure (a code sketch of the resulting weighted regression appears at the end of this subsection):
1. Choose the weights to ensure proper "balance":
$\hat{\omega}^{sdid}_i$ chosen to make
\[ N_0^{-1} \sum_{i=1}^{N_0} \hat{\omega}^{sdid}_i Y_{it} \approx N_1^{-1} \sum_{i=N_0+1}^{N} Y_{it}, \quad t = 1, \ldots, T_0 \; \text{(control periods)}. \]
$\hat{\lambda}^{sdid}_t$ chosen to make
\[ T_0^{-1} \sum_{t=1}^{T_0} \hat{\lambda}^{sdid}_t Y_{it} \approx T_1^{-1} \sum_{t=T_0+1}^{T} Y_{it}, \quad i = 1, \ldots, N_0 \; \text{(control units)}. \]
Need to use a regularization scheme so weights are not too variable.
2. Inference
The asymptotics assume $T_0$ and $T_1$ both grow large.
Need to assume the time series are weakly dependent.
Assumes $N_0$ is large.
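A minimal numpy sketch of the final SDID step, taking the weights as given (i.e., computed by the regularized balancing step above, which is not shown): it solves the weighted two-way regression for $\tau$. The $N \times T$ arrays Y and W and the weight vectors are hypothetical inputs.

```python
import numpy as np

def sdid_tau(Y, W, omega, lam):
    """Weighted TWFE regression in the SDID objective, for given unit weights
    omega (length N) and time weights lam (length T): minimize over
    (mu, alpha_i, beta_t, tau) the weighted sum of squared residuals.
    Y, W: N x T arrays; returns tau_hat."""
    N, T = Y.shape
    rows = []
    for i in range(N):
        for t in range(T):
            # columns: intercept, alpha_2..alpha_N, beta_2..beta_T, tau
            x = np.zeros(1 + (N - 1) + (T - 1) + 1)
            x[0] = 1.0
            if i > 0:
                x[i] = 1.0                 # unit effect alpha_i (unit 1 omitted)
            if t > 0:
                x[N + t - 1] = 1.0         # time effect beta_t (period 1 omitted)
            x[-1] = W[i, t]                # treatment indicator
            rows.append(x)
    X = np.array(rows)
    y = Y.reshape(-1)
    w = np.sqrt(np.outer(omega, lam).reshape(-1))   # weighted least squares
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef[-1]                        # tau_hat
```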