10 - Difference-in-Differences (2/2)

Slides & Recap

This week, we will revisit the difference-in-differences framework and build upon it. Last week, our focus was on two groups: a treatment group and a control group. However, in real-world applications, you often encounter situations where units are treated at different times, resulting in multiple groups or cohorts. A substantial and recent body of literature has examined the so-called “staggered treatment adoption” and demonstrated that the standard estimator we used last week (two-way fixed effects, TWFE) is biased unless we account for treatment effect heterogeneity. Additionally, we must still adhere to the other assumptions relevant to the basic case, particularly parallel trends and no anticipation.

Consequently, when we observe units \(i\) across time periods \(t = 1,...,T_t\) and units adopt the binary treatment at different dates \(G_i \in (1, ..., T_t) \cup \infty\), we can extend parallel trends assumption by:

Assumption (A.stpt) “Parallel Trends”

\(\mathbb{E}[Y_{i,t}(\infty) - Y_{i,t-1}(\infty) | G_i=g] = \mathbb{E}[Y_{i,t-1}(\infty) - Y_{i,t}(\infty) | G_i=g'] \quad \forall g, g', t\).

Put differently, in absence of treatment, the average outcomes by cohort would be parallel. The interpretation of the parallel trends assumption does not change, the treatment must not have any effect on the cohort before being administered.

Assumption (A.stna) “No Anticipation”

\(\mathbb{E}[Y_{i,t}(g) - Y_{i,t}(\infty) | T_i=1] = 0 \quad\) or \(\quad Y_{i,t}(g) = Y_{i,t}(\infty) \quad \forall t < g\)

Practical example

Imagine you are working as an analyst for a streaming company. Over several weeks, you have introduced unskippable ads to boost ad revenue. However, you suspect that this change may have caused some customers to watch less content. Your task is to take the data and measure the impact of unskippable ads on customer viewing behavior.

# Load tidyverse
library(tidyverse)

# Read data
stream <- readRDS("stream.rds")

# Print first lines
print(stream |> head(10))

# A tibble: 10 × 10
    unit  time group minutes_watched   age gender account_tenure num_drama_watched num_comedy_watched num_action_watched
   <int> <int> <dbl>           <dbl> <dbl>  <dbl>          <dbl>             <dbl>              <dbl>              <dbl>
 1     1     1    10            95.5  29.4      0             90                 1                  2                  3
 2     1     2    10            93.5  29.4      0             90                 0                  1                  3
 3     1     3    10           167.   29.4      0             90                 1                  3                  1
 4     1     4    10           119.   29.4      0             90                 2                  1                  1
 5     1     5    10           117.   29.4      0             90                 5                  1                  2
 6     1     6    10            99.4  29.4      0             90                 0                  2                  2
 7     1     7    10           125.   29.4      0             90                 2                  1                  4
 8     1     8    10           129.   29.4      0             90                 2                  1                  1
 9     1     9    10           119.   29.4      0             90                 1                  4                  5
10     1    10    10           159.   29.4      0             90                 2                  1                  7

Data overview

This week, we’ll tackle a more realistic scenario compared to previous weeks. Instead of having all the details about your data upfront, you’ll be given the data and will need to uncover the details as you go.

Task 1: Answer the following questions:

How many periods did we observe?
How many units did we observe?
How many cohorts/treatment groups are in the data?
How many units are in each cohort?
How many treated, not-yet treated and untreated/never treated periods did we observe?

# Task 1: Answer the following questions:
# - How many periods?
unique(stream$time)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

# - How many units?
unique(stream$unit)

  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100

# - How many cohorts/treatment groups?
unique(stream$group)

[1] 10  0  9  8

# - How many units in each cohort?
stream |> group_by(group) |> summarise(n_unit = n_distinct(unit)) |> ungroup()

# - How many treated, not-yet treated and untreated/never treated periods
# do we observe?
# First add treatment and relative treatment indicator
stream <- stream |> 
  # create time-variant treatment indicator D_it
  mutate(d_it = case_when(
    group == 0 ~ 0,
    time >= group ~ 1,
    .default = 0)) |>
  # create relative event-time indicator D_it_k
  mutate(d_it_k = case_when(
    group == 0 ~ Inf,
    .default = time - group))

# Then group by treatment indicator to retrieve information
units_d <- stream |> group_by(group, d_it) |> summarise(n_unit = n_distinct(unit)) |> ungroup()
print(units_d)

# A tibble: 7 × 3
  group  d_it n_unit
  <dbl> <dbl>  <int>
1     0     0     25
2     8     0     25
3     8     1     25
4     9     0     25
5     9     1     25
6    10     0     25
7    10     1     25

units_k <- stream |> group_by(group, d_it_k) |> summarise(n_unit = n_distinct(unit)) |> ungroup()
print(units_k)

# A tibble: 46 × 3
   group d_it_k n_unit
   <dbl>  <dbl>  <int>
 1     0    Inf     25
 2     8     -7     25
 3     8     -6     25
 4     8     -5     25
 5     8     -4     25
 6     8     -3     25
 7     8     -2     25
 8     8     -1     25
 9     8      0     25
10     8      1     25
# ℹ 36 more rows

Task 2: Compute and plot the outcome for each cohort (treatment groups + control groups)

before and after the treatment,
and for each period.

# Task 2: compute plot evolution of average outcomes across cohorts.
# (a) 
stream |>
    group_by(group, d_it) |> 
    summarise(mean_y = mean(minutes_watched)) |> 
    ungroup()

# (b)
Y_gt_k <- stream |>
  group_by(group, time) |> 
  summarise(mean_y = mean(minutes_watched)) |> 
  ungroup() |> 
  mutate(group = if_else(group == 0, NA, group))
Y_gt_k

# One plot
ggplot(Y_gt_k, aes(
  x = time, 
  y = mean_y,
  color = as.factor(group))) + 
  geom_line() + 
  geom_point() +
  xlab("Week") + 
  ylab("Minutes watched") +
  geom_vline(aes(xintercept = group - .5, color = as.factor(group)), linetype = "dotted")

# Facet plot
ggplot(Y_gt_k, aes(
  x = time, 
  y = mean_y)) + 
  geom_point() +
  geom_line() +
  geom_vline(aes(xintercept = group - .5), linetype = "dotted", colour = "red") +
  xlab("Week") + 
  ylab("Minutes watched") +
  facet_wrap(~group)

Event Study

Let’s run an event study to assess the plausibility of our assumption of parallel trends and no anticipation.

Task 3: Without including covariates, run an event study and assess whether the assumption of parallel trends holds.

# Task 3: Run an event-study without covariates to assess the parallel trends
# assumption.
evt_stdy_wo <- fixest::feols(minutes_watched ~ i(d_it_k, ref = c(-1, Inf)) | unit + time, data = stream)
summary(evt_stdy_wo)

OLS estimation, Dep. Var.: minutes_watched
Observations: 1,500
Fixed-effects: unit: 100,  time: 15
Standard-errors: Clustered (unit) 
           Estimate Std. Error t value   Pr(>|t|)    
d_it_k::-9    -3.49        7.7   -0.45 6.5249e-01    
d_it_k::-8     5.30        6.6    0.80 4.2624e-01    
d_it_k::-7    -3.89        5.5   -0.71 4.8200e-01    
d_it_k::-6    -1.50        5.2   -0.29 7.7452e-01    
d_it_k::-5    -2.04        5.2   -0.39 6.9827e-01    
d_it_k::-4    -4.41        5.7   -0.77 4.4156e-01    
d_it_k::-3     1.18        5.2    0.23 8.2097e-01    
d_it_k::-2     0.82        4.6    0.18 8.5771e-01    
d_it_k::0    -39.17        4.1   -9.46 1.6377e-15 ***
d_it_k::1    -37.52        4.5   -8.31 5.1265e-13 ***
d_it_k::2    -25.11        4.8   -5.25 8.5675e-07 ***
d_it_k::3    -21.81        5.1   -4.26 4.6376e-05 ***
d_it_k::4    -16.34        5.1   -3.21 1.8012e-03 ** 
d_it_k::5     -0.88        5.4   -0.16 8.7014e-01    
d_it_k::6     -3.89        5.5   -0.71 4.8145e-01    
d_it_k::7    -14.60        7.2   -2.03 4.4699e-02 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 24.7     Adj. R2: 0.528101
             Within R2: 0.105374

fixest::iplot(evt_stdy_wo)

Task 4: Including covariates, run an event study and assess whether the assumption of parallel trends holds.

# Task 4: Run an event-study with covariates to assess the conditional parallel trends
# assumption.
evt_stdy_w <- fixest::feols(minutes_watched ~ i(d_it_k, ref = c(-1, Inf)) +
                              age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched
                            | unit + time, data = stream)

The variables 'age', 'gender' and 'account_tenure' have been removed because of collinearity (see $collin.var).

summary(evt_stdy_w)

OLS estimation, Dep. Var.: minutes_watched
Observations: 1,500
Fixed-effects: unit: 100,  time: 15
Standard-errors: Clustered (unit) 
                   Estimate Std. Error t value   Pr(>|t|)    
d_it_k::-9          -3.1174       7.71 -0.4045 6.8674e-01    
d_it_k::-8           5.4019       6.64  0.8137 4.1777e-01    
d_it_k::-7          -3.8795       5.49 -0.7065 4.8153e-01    
d_it_k::-6          -1.4750       5.26 -0.2802 7.7991e-01    
d_it_k::-5          -1.8686       5.26 -0.3551 7.2330e-01    
d_it_k::-4          -4.2636       5.71 -0.7472 4.5669e-01    
d_it_k::-3           1.0704       5.19  0.2061 8.3711e-01    
d_it_k::-2           0.7614       4.63  0.1643 8.6983e-01    
d_it_k::0          -38.9389       4.10 -9.4870 1.4396e-15 ***
d_it_k::1          -37.3805       4.49 -8.3319 4.6437e-13 ***
d_it_k::2          -25.1122       4.76 -5.2778 7.7309e-07 ***
d_it_k::3          -22.0022       5.10 -4.3140 3.7996e-05 ***
d_it_k::4          -16.2977       5.11 -3.1890 1.9122e-03 ** 
d_it_k::5           -0.6415       5.33 -0.1203 9.0447e-01    
d_it_k::6           -3.7607       5.49 -0.6850 4.9496e-01    
d_it_k::7          -14.4919       7.16 -2.0226 4.5809e-02 *  
num_drama_watched    0.7247       0.47  1.5329 1.2850e-01    
num_comedy_watched   0.0019       0.47  0.0039 9.9686e-01    
num_action_watched   0.4706       0.51  0.9271 3.5612e-01    
... 3 variables were removed because of collinearity (age, gender and account_tenure)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 24.7     Adj. R2: 0.528099
             Within R2: 0.107328

fixest::iplot(evt_stdy_w)

Task 5: Compute the static treatment effect using TWFE (w and w/o).

# Task 6: Compute the static treatment effect using TWFE (w and w/o).
twfe_wo <- fixest::feols(minutes_watched ~ i(d_it, ref = c(0)) | unit + time, data = stream)
twfe_w <- fixest::feols(minutes_watched ~ i(d_it, ref = c(0)) 
                        + age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched
                        | unit + time, data = stream)

The variables 'age', 'gender' and 'account_tenure' have been removed because of collinearity (see $collin.var).

summary(twfe_wo)

OLS estimation, Dep. Var.: minutes_watched
Observations: 1,500
Fixed-effects: unit: 100,  time: 15
Standard-errors: Clustered (unit) 
        Estimate Std. Error t value   Pr(>|t|)    
d_it::1      -26        3.3    -7.8 7.7816e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 25.4     Adj. R2: 0.509109
             Within R2: 0.059178

summary(twfe_w)

OLS estimation, Dep. Var.: minutes_watched
Observations: 1,500
Fixed-effects: unit: 100,  time: 15
Standard-errors: Clustered (unit) 
                   Estimate Std. Error t value   Pr(>|t|)    
d_it::1              -25.62       3.30   -7.75 8.0912e-12 ***
num_drama_watched      0.78       0.49    1.59 1.1463e-01    
num_comedy_watched    -0.10       0.48   -0.22 8.2804e-01    
num_action_watched     0.41       0.52    0.79 4.3306e-01    
... 3 variables were removed because of collinearity (age, gender and account_tenure)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 25.3     Adj. R2: 0.509125
             Within R2: 0.061248

Robust estimators

As discussed in the lecture, when treatment adoption is staggered, the two-way fixed effects estimator yields biased results in the presence of treatment effect heterogeneity, i.e., when the treatment effect changes over time. This occurs because some comparisons that should be excluded are still executed by TWFE. Andrew Goodman-Bacon was the first to notice and explain this issue. He decomposed the treatment effect into all possible 2x2 comparisons, showing which comparisons are valid and which are not when we cannot assume treatment effect homogeneity over time. The R package bacondecomp helps visualize these comparisons. Let’s use it to examine our case.

# Bacon-Decomposition
library(bacondecomp)

# What comparisons are included in TWFE?
df_bacon <- bacon(
  minutes_watched ~ d_it + 
    age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched,
  data = stream,
  id_var = "unit",
  time_var = "time")

                  type weight avg_est
1         Both Treated   0.24     -44
2 Treated vs Untreated   0.76     -20

# Plot comparisons
ggplot(df_bacon$two_by_twos) +
  aes(x = weight, y = estimate, shape = factor(type), color = factor(type)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, color = "black", linetype = "dotted") + 
  theme_minimal() +
  labs(x = "Weight", y = "Estimate", shape = "Type", color = "Type") + 
  theme(legend.position = "bottom")

Task 6: Use any method that is robust to treatment effect heterogeneity for staggered treatment adoption. Retrieve a static and a dynamic estimate and plot your result. Using the result from the Bacon-Decomposition, compare the estimate to the TWFE estimate.

Callaway & Sant’Anna (2021)

# (b) Doubly robust
drdid <- did::att_gt(
  yname = "minutes_watched",
  tname = "time",
  idname = "unit",
  gname = "group",
  xformla = ~ age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched,# + unit, #  ~ 1 
  data = stream,
  allow_unbalanced_panel = TRUE,
  control_group = "nevertreated", # "notyettreated"
  anticipation = 0,
  clustervars = "unit",
  est_method = "dr", # "reg", "ipw"
)
did::aggte(drdid, type="simple")


Call:
did::aggte(MP = drdid, type = "simple")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 


 ATT    Std. Error     [ 95%  Conf. Int.]  
 -18           4.2        -26        -9.5 *


---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

drdid_evt_sty <- did::aggte(drdid, type="dynamic")
did::ggdid(drdid_evt_sty)

Gardner, Thakral, Tô, and Yap (2024)

# (c) Two-Stage DiD
# Static model
twsdid <- did2s::did2s(
  stream,
  yname = "minutes_watched",
  first_stage = ~ age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched | unit + time, # only time-variant covariates
  second_stage = ~ i(d_it, ref = 0),
  treatment = "d_it",
  cluster_var = "unit"
)

Running Two-stage Difference-in-Differences
 - first stage formula `~ age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched | unit + time`
 - second stage formula `~ i(d_it, ref = 0)`
 - The indicator variable that denotes when treatment is on is `d_it`
 - Standard errors will be clustered by `unit`

twsdid

OLS estimation, Dep. Var.: minutes_watched
Observations: 1,500
Standard-errors: Custom 
        Estimate Std. Error t value   Pr(>|t|)    
d_it::1      -20        3.3    -5.9 4.8371e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 26.6   Adj. R2: 0.108268

# Event Study
twsdid_k <- did2s::did2s(
  stream,
  yname = "minutes_watched",
  first_stage = ~ age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched | unit + time, # only time-variant covariates
  second_stage = ~ i(d_it_k, ref = c(-1, Inf)),
  treatment = "d_it",
  cluster_var = "unit"
)

Running Two-stage Difference-in-Differences
 - first stage formula `~ age + gender + account_tenure + num_drama_watched + num_comedy_watched + num_action_watched | unit + time`
 - second stage formula `~ i(d_it_k, ref = c(-1, Inf))`
 - The indicator variable that denotes when treatment is on is `d_it`
 - Standard errors will be clustered by `unit`

fixest::iplot(twsdid_k)

Comparison

estimators_comparison <- tibble(
  "TWFE" = twfe_w$coeftable$Estimate[1],
  "Callaway & Sant’Anna" = did::aggte(drdid, type="simple")[[1]],
  "Gardner, Thakral, Tô, and Yap" = twsdid$coeftable[[1]]
)   

print(estimators_comparison)

# A tibble: 1 × 3
   TWFE `Callaway & Sant’Anna` `Gardner, Thakral, Tô, and Yap`
  <dbl>                  <dbl>                           <dbl>
1 -25.6                  -17.7                           -19.5

Assignment

Accept the Week 10 - Assignment and follow the same steps as last week and as described in the organization chapter.

Please also remember to fill out the course evaluation if you have not done so already.

Explain in your own words in 3-4 sentences, when and why the estimation of treatment effects using TWFE yields biased estimate in the case of staggered adoption and multiple treatment periods.
One week prior to introducing the unskippable ads, you notify your users in an email. Explain, how that could affect your estimation and the assumption it relies on. Independent of your conclusion, show how you can account for anticipation when using the estimator from the did2s package (Gardner, Thakral, Tô, and Yap (2024)).
Using the robust estimator from the did package (Callaway & Sant’Anna (2021)), perform a sensitivity analysis using the HonestDiD package. Additional to the lecture slides, you can find instructions on installation, functionality, and theory on GitHub. Run the sensitivity analysis based on “bounds on relative magnitudes” and explain the result in your own words (3-4 sentences). Please note that you have to set base_period = "universal" in att_gt().