Introduction

Recently, Correia, Luck and Verner (2020) (henceforth CLV) put forward a very interesting, policy relevant paper that, among other things, discuss whether Non Pharmaceutical Interventions (NPIs) mitigated the adverse economic effects of the 1918 Spanish Flu pandemic on economic growth (this is only one part of their paper, to be clear). They find some suggestive evidence that cities that intervened earlier and more aggressively during the 1918 Spanish Flu pandemic did not perform worse in terms of economic activity and, if anything, grow faster after the pandemic is over. As a result, the authors interpret these findings as suggestive evidence that NPIs may mitigate the adverse economic consequences of a pandemic.

Motivated by the findings of CLV, Lilley, Lilley and Rinaldi (2020) (LLR) decided to analyze similar data to shed light on the “robustness” of the findings reported by CLV. LLR find that a portion of the results in CLV is not robust to changes in the methodology initially adopted, in the sense that, in LLR words, “the estimated effect of NPIs on economic growth are a noisy zero; we can neither rule out substantial positive nor negative effects of NPIs on employment growth”. LLR make all their analysis and data publicly available at https://almlgr.github.io/. This is open-science at its best!

In this document, my main goal is to briefly revisit this on-going debate by using some of the difference-in-differences methods that I have been working on in the last couple of years with Brantly Callaway and Jun B. Zhao. This would hopefully allow me to better understanding the “source” of the discussion. Having said that, I would like to stress that this exercise is mostly for my own understanding of the methodological issues behind the debate. I do not have expert knowledge about the 1918 Spanish flu pandemic, and the approach I take here may be full of caveats. As so, please interpret everything I say here with a grain of salt.

In the rest of this note, I do the following:

  1. I summarize part of this debate, highlighting how the CLV and LLR arrive at somehow different findings.

  2. I attempt to explain some challenges behind the methodology adopted:

    i.The “treatment” variables underlying CLV and LLR analysis are not binary as is usually the case in “traditional” difference-in-difference (DiD) studies. Because of that, at least to me, it is not entirely clear what type of “parallel trends assumptions” these papers are relying on to identify causal effects. This may play a role on how we should assess the “reliability” of the results. In fact, I find that checking for zero pre-trends as a way to assess the credibility of the underlying assumptions may not be a good idea here, as several of the covariates are measured in “later” pre-treatment periods(e.g. in 1914 and 1917), so using them as a covariates for “earlier” pre-treatment periods (e.g. 1904 and 1899) may not be appropriate.

    1. In these “non-standard” DiD setups, at first it is not clear if the coefficient underlying the outcome regression models used by both CLV and LLR recover “easy-to-understand” treatment effect parameters when treatment effects are allowed to heterogeneous with respect to the treatment intensity. My concern is that the TWFE regression underlying the CLV and LLR analysis may be implicitly imposing some homonegeity and linearity conditions that perhaps are not warranted. The usual “best linear approximation”-type interpretation may not be warranted here as the goal is to estimate causal effects.

    2. The inference procedures behind both CLV and LLR are pointwise and not simultaneous. This means that, in practice, one may find statistically significant effects by chance as these pointwise confidence intervals do not account for multiple-testing problems.

  3. In order to partially address the challenges described in points 1 and 2 above, I discretize the treatment variables to map the analysis back to the more standard DiD analysis. Although this may be full of caveats and potential drawbacks, it allow us to use some methods that are perhaps more “robust”. The results from this exercise provide suggestive evidence favoring the CLV original findings that adopting more intensive NPIs can help alleviating the negative economic effects of a pandemic.

Summary of the “debate”

In this section, I try to briefly summarize part of the “debate” between CLV and LLR that relates to pre-trends and model specifications. This is a simplified version of the discussion, as I am not revisiting the potential issues related to redefinition of city borders brought up by LLR; I strongly recommend you to read the entire exchange between CLV and LLR to get all their points. Here, my goal is to provide a “broad picture” that is more related to econometrics. I will also replicate their findings, using their proposed regression specifications, and using the data that LLR made available here. Reproducible research, YAY!!!

Replicating CLV

Using data from 43 U.S. cities from 1909 to 1923, CLV find that NPIs reduced the adverse economic effects of the 1918 Spanish Flu pandemic. They use two different outcome measures: \((i)\) city level manufacturing (log) output, and \((ii)\) city level manufacturing (log) employment. They also use two different NPI measures: \((a)\) NPI intensity, measured as the total cumulative days of NPIs per city, and \((b)\) Speed of NPI, measured as the number of days between when the weekly excess death rate exceeds two times the baseline influenza and pneumonia death rate and the date that an NPI measure is activated; see Correia, Luck and Verner (2020) for additional details.

In order to arrive at these conclusions, CLV make use of the very popular and widely adopted TWFE event-study linear regression specification: \[ Y_{c,t} = \alpha_c + \tau_t + \sum_{j \not= 1914}\beta_j NPI_{c,1918} 1\{j=t\} + \sum_{j \not= 1914}\gamma_j X_{c} 1\{j=t\} + \varepsilon_{c,t} \] where \(Y_{c,t}\) is the log-outcome of interest, \(\alpha_c\) is the city fixed-effect, \(\tau_t\) is the year fixed-effect, \(\varepsilon_{c,t}\) is the idiosyncratic error term, and \(X_c\) is a vector of time-invariant covariates including log 1910 population, percentage of the city population employed in manufacturing in 1914 and influenza mortality in 1917, as well as state-level measures of the 1910 urban share, agricultural employment share and per capital income.

The parameters of interest in the aforementioned TWFE linear regression are the \(\beta_j\)’s, also known as the event-study coefficients. The \(\beta_j\)’s with \(j \ge1918\) are usually interpreted as average treatment effect parameters at time period \(j\), and the \(\beta_j\)’s with \(j<1918\) are usually used to asses the credibility of a ``parallel trends assumption’’ (more on this below).

In what follows, I run the regressions associated with these specifications using both outcomes and both NPIs measures and plot the \(\beta_j\) coefficients together with their pointwise 90% confidence intervals. For the confidence intervals, I use standard errors clustered at the city-level. I report 90% confidence intervals because, in our opinion, with 43 cities, it would perhaps be “too much” to expect 95% confidence intervals.

# load packages and set ggplot template
devtools::install_github("pedrohcgs/DRDID", dependencies = T)
devtools::install_github("bcallaway11/did", dependencies = T)
library(DRDID)
library(did)
library(haven)            # package to read Stata file
library(lfe)
library(fastDummies)
library(ggthemes)
library(tidyverse)
library(patchwork)
#theme_set(theme_bw())
set.seed(20200517)
#-------------------------------------------------------------------------------------------------
# import the data in long format (save after run Analysis.do )
sflu_orig <- haven::read_dta("sf_processed.dta")
# get data without missing
sflu <- subset(sflu_orig, (is.na(sflu_orig$DaysofNPI) == F) == 1)

# Get CLV subsample
sflu_correia <- subset(sflu, (sflu$Year >= 1909) * (sflu$Year <= 1924) == 1)
#-------------------------------------------------------------------------------------------------
# Estimate Event-Study regressions
# Interact covariates with times
# Days of NPI
covs_days = model.matrix(~ (DaysofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu_correia)
# Speed of NPI
covs_speed = model.matrix(~(SpeedofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu_correia)

# Drop interaction with 1914 (baseline) and also the intercept
covs_days = covs_days[,-grep(")1914$", colnames(covs_days))]
covs_days = covs_days[,-1]

covs_speed = covs_speed[,-grep(")1914$", colnames(covs_speed))]
covs_speed = covs_speed[,-1]

# Manufacturing output and days
twfe_days_out <- lfe::felm(CityManuOutput_ln ~ covs_days | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing employment and days
twfe_days_emp <- lfe::felm(CityManuEmp_ln ~ covs_days | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing output and speed
twfe_speed_out <- lfe::felm(CityManuOutput_ln ~ covs_speed | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing employment and speed
twfe_speed_emp <- lfe::felm(CityManuEmp_ln ~ covs_speed | Year + City| 0 | City,
                                    data = sflu_correia)

# function to make plots
makeplot1 <- function(model, title) {
  broom::tidy(model, conf.int = TRUE, conf.level = 0.90) %>%
    filter(row_number() <= 4) %>%
    mutate(year = c(1909, 1919, 1921, 1923)) %>%
    bind_rows(tibble(year = 1914, estimate = 0, std.error = 0, 
                     conf.low = 0, conf.high = 0)) %>% 
    filter(!is.na(estimate)) %>% 
    ggplot(aes(x = year, y = 100 * estimate)) +
    ggtitle(title) + 
    geom_hline(yintercept = 0,color = "red", linetype  = "dashed") +
    scale_x_continuous(breaks = c(1909, 1914, 1919, 1921, 1923)) +
    geom_pointrange(aes(ymin = 100*conf.low, ymax = 100*conf.high), show.legend = FALSE)+
    theme(plot.title = element_text(hjust = 0.5, size=12))+
    labs(x = "Year", y = "Coefficient Estimate (x 100)")

}
# make our four plots
plot1 <- makeplot1(twfe_days_emp, "NPI intensity and log manufacturing employment.")
plot2 <- makeplot1(twfe_speed_emp, "NPI speed and log manufacturing employment.")
plot3 <- makeplot1(twfe_days_out, "NPI intensity and log manufacturing output")
plot4 <- makeplot1(twfe_speed_out, "NPI speed and log manufacturing output")

# put into a single plot
plot1 +  plot3 + plot2 + plot4 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'Confidence intervals are based analytical formula. 90% pointwise confidence intervals displayed in black.')

These results replicate very well CLV findings. Indeed, the estimates from the TWFE linear regression model above suggest that cities that intervened earlier and more aggressively during the 1918 Spanish Flu pandemic did not perform worse in terms of economic activity and, if anything, grow faster after the pandemic is over. Furthermore, as the \(\widehat\beta_{1909}\) is not statistically different from zero in any of the four TWFE linear regressions, CLV interpret these findings as lack of evidence against their maintained parallel trend assumptions behind the TWFE specifications.

Replicating LLR

Now, time to discuss LLR. As mentioned before, they build on CLV analysis. First, LLR extends the city-level manufacturing data used by CLV by adding 1899, 1904, 1925 and 1927 to their 1909-1923 sample. The first step of LLR analysis

LLR is concern that cities with more aggressive NPIs were growing faster before 1918, which in turn would imply that CLV estimates could be confounded by these pre-trends, casting doubt on the credibility of CLV conclusions.

LLR analysis is very detailed and raise some potential issues with CLV conclusions. They start their analysis discussing how population growth is related to manufacturing employment growth, and how this could potentially bias the results of CLV. They provide motivating discussions about these issues, too, but I will not focus on that. Again, if you want to better understand this part of their debate, you should read the entire exchange between CLV and LLR here.

After that, LLR go on and run the same TWFE linear regression specification as in CLV, but using these additional years in their data. This would help one to “better assess” how pre-trends could be driving the original CLV findings. I am able to replicate the entire analysis very easily, which is always a plus! I display the point estimates and the pointwise 90% confidence intervals below. As before, standard errors are clustered at the city-level.

#-------------------------------------------------------------------------------------------------
# Interact covariates with times
# Days of NPI
covs_days2 = model.matrix(~ (DaysofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu)
# Speed of NPI
covs_speed2 = model.matrix(~(SpeedofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu)

# Drop interaction with 1914 (baseline) and also the intercept (automatically included in reg)
covs_days2 = covs_days2[,-grep(")1914$", colnames(covs_days2))]
covs_days2 = covs_days2[,-1]

covs_speed2 = covs_speed2[,-grep(")1914$", colnames(covs_speed2))]
covs_speed2 = covs_speed2[,-1]

# Manufacturing output and days
twfe_days_out2 <- lfe::felm(CityManuOutput_ln ~ covs_days2 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and days
twfe_days_emp2 <- lfe::felm(CityManuEmp_ln ~ covs_days2 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing output and speed
twfe_speed_out2 <- lfe::felm(CityManuOutput_ln ~ covs_speed2 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and speed
twfe_speed_emp2 <- lfe::felm(CityManuEmp_ln ~ covs_speed2 | Year + City| 0 | City,
                                    data = sflu)


# function to make plots
makeplot2 <- function(model, title) {
  broom::tidy(model, conf.int = TRUE, conf.level = 0.90) %>%
    filter(row_number() <= 8) %>%
    mutate(year = c(1899, 1904, 1909, 1919, 1921, 1923, 1925, 1927)) %>%
    bind_rows(tibble(year = 1914, estimate = 0, std.error = 0,
                     conf.low = 0, conf.high = 0)) %>% 
    filter(!is.na(estimate)) %>% 
    ggplot(aes(x = year, y = 100 * estimate)) +
    ggtitle(title) + 
    geom_hline(yintercept = 0,color = "red", linetype  = "dashed") +
    scale_x_continuous(breaks = c(1899, 1904, 1909, 1914, 1919, 1921, 1923, 1925, 1927)) +
    geom_pointrange(aes(ymin = 100*conf.low, ymax = 100*conf.high), show.legend = FALSE)+
    theme(plot.title = element_text(hjust = 0.5, size=12))+
    labs(x = "Year", y = "Coefficient Estimate (x 100)")
}

# make our plots
plot5 <- makeplot2(twfe_days_emp2, "NPI intensity and log manufacturing employment.")+ ylim(c(-0.42, 0.42))
plot6 <- makeplot2(twfe_speed_emp2, "NPI speed and log manufacturing employment.")+ ylim(c(-2.2, 2.5))
plot7 <- makeplot2(twfe_days_out2, "NPI intensity and log manufacturing output")+ ylim(c(-0.42, 0.42))
plot8 <- makeplot2(twfe_speed_out2, "NPI speed and log manufacturing output") +  ylim(c(-2.2, 2.5))

# put into one plot
plot5 + plot7 + plot6 + plot8 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'Confidence intervals are based analytical formula. 90% pointwise confidence intervals displayed in black.')

LLR argue that the results presented above are worrisome as they suggest very strong pre-trends, which, in turn, may be interpreted as evidence against the reliability of the underlying identification assumptions used by CLV. As so, following LLR, the reliability of CLV findings may be at stake; see Lilley, Lilley and Rinaldi (2020) for additional details on how they combine these results with other arguments to reach their conclusion.

In face of the aforementioned results, LLR then estimate a TWFE regression model which also includes the interaction between year and the treatment variable to, in their own words, “naively correct for pre-trend”: \[ Y_{c,t} = \alpha_c + \tau_t + \sum_{j > 1914}\beta_j NPI_{c,1918} 1\{j=t\} + \lambda \cdot t\cdot NPI_{c,1918} + \sum_{j \not= 1914}\gamma_j X_{c} 1\{j=t\} + \varepsilon_{c,t}. \] They argue, and I agree, that this is a widely used “robustness-check” to assess how robust the findings are with respect to the presence of linear pre-trends. Having said that, I must also say that, at least to my knowledge, there is no rigorously founded discussion about how this particular specification can deliver point-identified causal effect parameters when treatment effects may be heterogeneous. Thus, although very popular and perhaps useful, I would say that one should proceed with caution here–and LLR do!

With these caveats in mind, I proceed to replicate their results below. I only report the post-treatment parameters (as LLR did), and, as before, standard errors are clustered at the city-level.

#-------------------------------------------------------------------------------------------------
# Interact covariates with times, like before
# Days of NPI
covs_days22 = model.matrix(~ (DaysofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year) +
                             I(DaysofNPI * Year),
                         data = sflu)

# Speed of NPI
covs_speed22 = model.matrix(~(SpeedofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year) +
                             I(DaysofNPI * Year),
                         data = sflu)

# Drop intercept and interaction with 1914 (baseline)
covs_days22 = covs_days22[,-1]
covs_days22 = covs_days22[,-grep(")1914$", colnames(covs_days22))]
# Drop interactions betwen NPIs with years before 1914 
covs_days22 = covs_days22[,-c(2,3,4)]

# Do the same for speed
covs_speed22 = covs_speed22[,-1]
covs_speed22 = covs_speed22[,-grep(")1914$", colnames(covs_speed22))]
covs_speed22 = covs_speed22[,-c(2,3,4)]

# Manufacturing output and days
twfe_days_out22 <- lfe::felm(CityManuOutput_ln ~ covs_days22 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and days
twfe_days_emp22 <- lfe::felm(CityManuEmp_ln ~ covs_days22 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing output and speed
twfe_speed_out22 <- lfe::felm(CityManuOutput_ln ~ covs_speed22 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and speed
twfe_speed_emp22 <- lfe::felm(CityManuEmp_ln ~ covs_speed22 | Year + City| 0 | City,
                                    data = sflu)

# function to make plots
makeplot22 <- function(model, title) {
  broom::tidy(model, conf.int = TRUE, conf.level = 0.90) %>%
    filter(row_number() <= 6) %>%
     filter(row_number() >1) %>%
    mutate(year = c(1919, 1921, 1923, 1925, 1927)) %>%
    bind_rows(tibble(year = 1914, estimate = 0, std.error = 0,
                     conf.low = 0, conf.high = 0)) %>% 
    filter(!is.na(estimate)) %>% 
    ggplot(aes(x = year, y = 100 * estimate)) +
    ggtitle(title) + 
    geom_hline(yintercept = 0,color = "red", linetype  = "dashed") +
    scale_x_continuous(breaks = c(1914, 1919, 1921, 1923, 1925, 1927)) +
    geom_pointrange(aes(ymin = 100*conf.low, ymax = 100*conf.high), show.legend = FALSE)+
    theme(plot.title = element_text(hjust = 0.5, size=12))+
    labs(x = "Year", y = "Coefficient Estimate (x 100)")
}

# make our plots
plot51 <- makeplot22(twfe_days_emp22, "NPI intensity and log manufacturing employment.")+ ylim(c(-0.42, 0.42))
plot61 <- makeplot22(twfe_speed_emp22, "NPI speed and log manufacturing employment.")+ ylim(c(-2.2, 2.5))
plot71 <- makeplot22(twfe_days_out22, "NPI intensity and log manufacturing output")+ ylim(c(-0.42, 0.4))
plot81 <- makeplot22(twfe_speed_out22, "NPI speed and log manufacturing output") +  ylim(c(-2.2, 2.5))

# put into one plot
plot51 +  plot71 + plot61 + plot81 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'Confidence intervals are based analytical formula. 90% pointwise confidence intervals displayed in black.')

After adjusting for linear pre-trends, the above results suggest that we do not have enough evidence to conclude that cities which were more aggressive (NPI intensity) in implementing NPIs have different growth after the pandemic than those which were less aggressive. The same conclusion is reached when we compare cities which implemented NPIs sooner vs. later (NPI speed). Furthermore, the confidence intervals are relatively wide, making it hard to rule out negative effects of NPIs on economic activities. Nonetheless, as explicitly pointed out by LLR on page 7 of their paper, these estimates

“should be interpreted cautiously, and as illustrative, rather than as causal estimates. In the context of strong prevailing trends, there is no guarantee of what precise functional form trends driven by other phenomena take, nor to what extent they will continue”.

However, they do paint a potentially worrisome “fragility” of CLV results.

Based on these results and additional context-specific discussions about how how population growth may be driving the results, LLR conclude that CLV results using data from the 1918 Spanish Flu pandemic does not provide clear evidence of the effect of non-pharmaceutical interventions on economic growth. Potentially, there is a lot of uncertainty going on here to allow us to “be sure”.

The reply of CLV to LLR

CLV took the criticism raised by LLR very seriously, which is always a very good sign! Their full reply can be found in Correia, Luck and Verner (2020, reply) (henceforth CLV2). They provide point-by-point discussion on the issues raised by LLR, but again, I will focus only on a portion of there that are more related to pre-trends.

In this regard, CLV2 argue that looking for pre-trends 19 years before the policy took place may not be very informative in the context of the analysis, especially because the structure of the U.S. economy and U.S. cities changed quickly around the turn of the 20th century. I personally find this argument reasonable and compelling, though reasonable people can disagree on this.

Having said that, CLV2 did not stop here. Motivated by the arguments put forward by LLR that NPIs were correlated with population growth, CLV2 argue that including the log of population in 1900 as additional covariate in the TWFE linear model should help addressing this issue; see Correia, Luck and Verner (2020, reply) for additional detail.

The figure display the results using CLV2 “augmented” TWFE specification, with standard errors clustered at the city-level.

# generate log of population in 1900
sflu$Pop1900_ln = log(sflu$Pop1900)
#-------------------------------------------------------------------------------------------------
# Estimate Event-Study regressions
# Interact covariates with times
# Days of NPI
covs_days3 = model.matrix(~ (DaysofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917 +
                                   Pop1900_ln):factor(Year),
                         data = sflu)
# Speed of NPI
covs_speed3 = model.matrix(~(SpeedofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917 +
                                   Pop1900_ln):factor(Year),
                         data = sflu)

# Drop interaction with 1914 (baseline) and also the intercept
covs_days3 = covs_days3[,-grep(")1914$", colnames(covs_days3))]
covs_days3 = covs_days3[,-1]
covs_speed3 = covs_speed3[,-grep(")1914$", colnames(covs_speed3))]
covs_speed3 = covs_speed3[,-1]

# Manufacturing output and days
twfe_days_out3 <- lfe::felm(CityManuOutput_ln ~ covs_days3 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and days
twfe_days_emp3 <- lfe::felm(CityManuEmp_ln ~ covs_days3 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing output and speed
twfe_speed_out3 <- lfe::felm(CityManuOutput_ln ~ covs_speed3 | Year + City| 0 | City,
                                    data = sflu)

# Manufacturing employment and speed
twfe_speed_emp3 <- lfe::felm(CityManuEmp_ln ~ covs_speed3 | Year + City| 0 | City,
                                    data = sflu)

# make our three plots
plot9 <- makeplot2(twfe_days_emp3, "NPI intensity and log manufacturing employment.")
plot10 <- makeplot2(twfe_speed_emp3, "NPI speed and log manufacturing employment.")
plot11 <- makeplot2(twfe_days_out3, "NPI intensity and log manufacturing output")
plot12 <- makeplot2(twfe_speed_out3, "NPI speed and log manufacturing output")

# put into one plot
plot9 +  plot11 + plot10 + plot12 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'Confidence intervals are based analytical formula. 90% pointwise confidence intervals displayed in black.')

From this last set of plots, one may conclude that we do not have any evidence against violations of pre-trends after controlling for the vector of observed characteristics which includes the log of population in 1900. CLV2 then conclude that their results still “survive” LLR critique after they control for 1900 population growth.

However, LLR do not agree with this conclusion, as their other concerns were not addressed to their satisfaction. This is totally OK, but I think we are now ready to move on.

Challenges associated with the TWFE linear specification

I believe we are now ready to dig deeper into the underlying challenges associated with the TWFE regressions used by CLV and LLR!

When you read both CLV and LLR, you notice that these authors argue that, in order to interpret the \(\beta_j's\) estimates from the TWFE regressions as average causal effects, one need to rely on a “parallel trends assumption”. This is arguably the rationale behind using pre-treatment periods to “pre-test” the reliability of such an assumption. This is indeed very standard in difference-in-differences type of analysis. What is perhaps “non-standard” here is that the treatment variables, NPI intensity and NPI speed, are not binary variables but approximately continuous!

From my perspective, this difference raises several questions about the methodology that deserve more attention:

  1. It is not clear what is the parameter of interest underlying the analysis. This is in sharp contrast to the DiD setup with two groups, where the parameter of interest is well understood to be the \(ATT_{t}\) for all post-treatment periods \(t\), \(ATT_{t}=E[Y_t(1)-Y_t(0)|D=1]\).

  2. In the 2 group setup (discrete treatment), we understand that the parallel trends condition commonly invoked to identify the \(ATT_{t}\) for all post-treatment periods \(t\) is that \[ \text{For all post-treatment periods t, it follows that} \\ \mathbb{E}[Y_t(0)-Y_{t-1}(0)|D=1, X]=\mathbb{E}Y_t(0)-Y_{t-1}(0)|D=0,X]~ almost~surely. \] When we have a continuum of groups, it is not entirely clear the type of parallel trends assumption one wants to invoke. As so, it is not really clear if/how we should/need to restrict pre-trends.

  3. It is not entirely clear that the TWFE regression adopted by CLV and LLR lead to causally interpretable parameters as it relies on an arguably strong functional form assumption. More specifically, the underlying TWFE specification implicitly imposes that the effect of NPIs measures on the (log) outcomes is linear, which may be deemed too strong in several applications. Given that the main goal of the analysis is to recover causal effects, the usual motivation of using OLS because it provides the “best linear approximation” is not warranted. As recently emphasized by Sloczynski (2020),

“Ordinary least squares is “best” in predicting actual outcomes but causal inference is about predicting missing outcomes (…). In other words, the OLS weights are optimal for predicting `what is.’ Instead, we are interested in predicting “what would be” if treatment were assigned differently"

On top of these three important identification challenges, it is worth mentioning that the inference procedures adopted by CLV and LLR in their event-study plots are pointwise and do not acknowledge that we are interested in making inference about the evolution of the treatment effects. As so, they ignore important multiple-testing issues, which in turn suggest that one may be finding statistically significant results by chance.

Several points should also be made about using zero pre-trends as a way to “validate” the design:

  • It may be the case that the method identifying assumptions do not restrict pre-trends, implying that having a zero pre-trends may not be a necessary nor sufficient condition for the method.

  • Even if it were a necessary condition for identification, we need to be explicit that failing to reject the null hypothesis does not provide evidence in favor of the null hypothesis, as the test may have low power. It is plausible that this is the case here as we have data on 43 cities only, so sample size is fairly small. This is where context-specific knowledge can help us to decide on how to proceed.

  • In this particular context, we note that we use covariates measured in 1900 (log population), in 1910 (log population, state-level measures of urban share, agricultural employment share and per capital income.), in 1914 ( percentage of the city population employed in manufacturing), and in 1917 (influenza mortality). The pre-treatment periods are 1899, 1904, 1909 and 1914, implying that the covariates are measure “after” the outcomes. This can potentially create a problem of “bad controls”, potentially putting the reliability of pre-trends test at further stake. Here, we note that this latter point about pre-trends is not specific to the TWFE specification per se.

Semi-parametric event-study

In light of the potential challenges associated with the DiD-type analysis that exploits variation in treatment intensity via TWFE regression models, I believe it is reasonable to adopt alternative identification and estimation strategies. Unfortunately, I am not aware of any DiD-type procedure available in the literature that is suitable to setups like the one in this paper (Brantly Callaway, Andrew Goodman-Bacon and I are working on one!). An alternative way to proceed would be to discretize the treatment to avoid the pitfalls mentioned above. Although one may argue that this would be a less-than-optimal “solution”, following this path allows one to use methods that are better understood. This is the path I take in this section, though I do recognize that this has some drawbacks, too!

I start the analysis by discretizing the NPIs measures into above and below the median (“treated” and “comparison” groups, respectively). Given that we have only two treatment groups, we are now in a position where we can use some recent methods developed by Sant’Anna and Zhao (2020, JoE Forthcoming) and by Callaway and Sant’Anna (2019, new version coming soon!). More specifically, we can combine Sant’Anna and Zhao (2020) Doubly-Robust DiD estimators with the methodology put forward in Callaway and Sant’Anna (2019) to handle DiD with multiple time periods. These methods were design for binary treatment, so that is why I tool the path of discretizing treatments.

I follow this path because the procedure put forward by Callaway and Sant’Anna (2019) is (i) suitable to highlight treatment effect dynamics, (ii) accommodates different estimation methods as it clearly separate the identification from estimation/inference part of the analysis, and (iii) provide simultaneous confidence intervals for event-study plots, therefore avoiding multiple-testing problems. I choose to use the ATT estimator proposed by Sant’Anna and Zhao (2020) because it (v) allows for covariates to affect trends of treated and comparison groups differently, which add an additional layer of flexibility, (vi) it is more robust against model specification than alternative procedures, remaining consistent if \(either\) (but not necessarily both) propensity score or outcome regression models are correctly specified and (vii) uses the data in the most efficient way, i.e., the estimator is semiparametrically locally-efficient.

Next, I briefly revisit the relevant parts of Callaway and Sant’Anna (2019) (henceforth CS) and Sant’Anna and Zhao (2020) (henceforth SZ). I am a co-author of both papers, so have that in mind to understand my enthusiasm!

Revisiting CS and SZ

In order to estimate the average treatment effect for the treated cities (i.e. cities with NPI above the median), we make the following parallel trends assumption: \[ \text{For all post-treatment periods } t \ge 1918, \text{ it follows that} \\ \mathbb{E}[Y_t(0)-Y_{t-1}(0)|D=1, X]=\mathbb{E}[Y_t(0)-Y_{t-1}(0)|D=0,X]~ almost~surely. \] This assumption states that, conditional on a vector of pre-treatment covariates, the evolution of \(Y(0)\) is the same across cities above and below the median NPI. It is worth stressing that, here, \(Y(0)\) is not the potential outcome with zero treatment, but rather the potential outcome under the (average) treatment intensity of the cities with NPI below the media. In other words, here we are comparing the average potential outcome at time \(t\) associated with NPI speed/intensity being above its empirical median (\(\mathbb{E}[Y_t(1)|D=1],\)) with the counterfactual average potential outcome t time \(t\) for those same cities if they were to adopt the treatment intensity of the cities with NPI speed/intensity being below its empirical median, (\(\mathbb{E}[Y_t(0)|D=0]\)). The parameter of interest is therefore the \(ATT_t = \mathbb{E}[Y_t(1)-Y_t(0)|D=1]\). I apologize for the abuse of notation.

Now, I am ready to use CS and SZ. From their results, it then follows that, for all \(t \ge 1918\), we can identify the \(ATT_t\) by \[ ATT_t = \mathbb{E}\left[ \left( \frac{D}{\mathbb{E}\left[ D\right] } - \frac{\frac{p(X)\left( 1-D\right) }{1-p(X)}}{\mathbb{E}\left[ \frac{p(X)\left( 1-D\right) }{1-p(X)}\right]}\right) \left(Y_t - Y_{1914} - m_{0,Y_t - Y_{1914} }(X)\right) \right], \] where \(p(X)\equiv\mathbb{E}[D|X]\) is the propensity score, and \(m_{0,Y_t-Y_{1914}}(X) \equiv\mathbb{E}[Y_t - Y_{1914}|D=0,X]\) is the outcome regression of the evolution of the outcome from \(t=1914\) (last pre-treatment period) to \(t\) for the cities in the comparison group.

CS and SZ suggest estimating \(p(X)\) using a logit model, to estimate \(m_{0,Y_t-Y_{1914}}(X)\) using OLS for each \(t \ge1918\), and then replace the population expectations with their sample analogue. SZ also shows that, as long one of these models are correctly specified, the resulting estimator for the \(ATT_t\) will be consistent (hence the doubly robust terminology). If both of these models are correctly specified, then the estimator is semiparametrically efficient.

CS also discuss in great detail a bootstrap algorithm to make simultaneous confidence intervals that fully address the multiple testing problem. We use that, too!

Another potentially attractive feature of the CS-SZ procedure is that it does not rely on zero “pre-trends-type” assumptions for all pre-treatment periods, which may not be realistic in this particular scenario. In fact, as it is clear from the above formula, CS-SZ estimator (estimand) for the \(ATT_t\) does not make use of all pre-treatment outcomes to estimate the \(ATT_t\) for \(t \ge1918\). Of course, if one wants to assess the presence of pre-trends, one can evaluate \(ATT_t\) at \(t<1914\), too.

A potential challenge of using CS-SZ procedure here is the fact that we have only 43 cities, so both treatment groups are small. This implies that inference procedures may not be very precise. Well, I proceed with the analysis in any case, but we should be aware of this!

Results based on CS and SZ

Next, we put the CS-SZ estimator to work. I use the same set of covariates as in the last designs, i.e, I also include the log of the population in 1900 as covariate. This can be interpreted as a variant of the CLV-LLR-CLV2 specification, where I allow the same set of covariates used in CLV2 to have a “more flexible” effect on the outcomes of interest.

The results are displayed below. Standard errors are clustered at the city-level. Pointwise confidence interval are displayed in black, while simultaneous confidence intervals are displayed in red.

# Create binary treatment by considering above and below median treated vs not treated
tauquant = 0.5
sflu$DaysofNPI_treated = (sflu$DaysofNPI >= quantile(sflu$DaysofNPI, probs = tauquant))
sflu$SpeedofNPI_treated = (sflu$SpeedofNPI >= quantile(sflu$SpeedofNPI, probs = tauquant))
# Define "treatment" 
sflu$treated_days = ifelse(sflu$DaysofNPI_treated == 1, 1918, 0)
sflu$treated_speed = ifelse(sflu$SpeedofNPI_treated == 1, 1918, 0)
# Ensure non-factors (this may create issues with did package at the moment)
sflu$City <- as.numeric(sflu$City)
sflu$Year <- as.numeric(sflu$Year)
#---------------------------------------------------------------------------------------------------
# Estimate Event-Study 
# Manufacturing output and days
drdid_days_out <- did::att_gt(yname = "CityManuOutput_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_days",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu,
                              printdetails = F)

# Manufacturing employment and days
drdid_days_emp <- did::att_gt(yname = "CityManuEmp_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_days",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu,
                              printdetails = F)

# Manufacturing output and speed
drdid_speed_out <- did::att_gt(yname = "CityManuOutput_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_speed",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu,
                              printdetails = F)

# Manufacturing employment and $ speed
drdid_speed_emp <- did::att_gt(yname = "CityManuEmp_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_speed",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu,
                              printdetails = F)
#-------------------------------------------------------------------------------------------------
# Functions to get results with 1914 as reference (did package do not fix reference time-period yet)

mboot2 <- function(inf.func, model) {
  # needed variables
  data <- model$DIDparams$data
  idname <- model$DIDparams$idname
  clustervars <- model$DIDparams$clustervars
  biters <- model$DIDparams$biters
  tname <- model$DIDparams$tname
  tlist <- unique(data[,tname])[order(unique(data[,tname]))]
  alp <- model$DIDparams$alp
  panel <- model$DIDparams$panel
  
  # just get n obsevations (for clustering below...)
  ifelse(panel,
         dta <- data[ data[,tname]==tlist[1], ],
         dta <- data)
  
  n <- nrow(inf.func)
  
  # if include id as variable to cluster on
  # drop it as we do this automatically
  if (idname %in% clustervars) {
    clustervars <- clustervars[-which(clustervars==idname)]
  }
  
  # we can only handle up to 2-way clustering
  if (length(clustervars) > 1) {
    stop("can't handle that many cluster variables")
  }
  
  # bootstrap
  bout <- lapply(1:biters, FUN=function(b) {
    if (length(clustervars) > 0) {
      # draw Rademachar weights
      # these are the same within clusters
      # see paper for details
      n1 <- length(unique(dta[,clustervars]))
      Vb <- matrix(sample(c(-1,1), n1, replace=T))
      Vb <- cbind.data.frame(unique(dta[,clustervars]), Vb)
      Ub <- data.frame(dta[,clustervars])
      Ub <- Vb[match(Ub[,1], Vb[,1]),]
      Ub <- Ub[,-1]
    } else {
      Ub <- sample(c(-1,1), n, replace=T)
    }
    # multiply weights onto influence function
    Rb <- sqrt(n)*(apply(Ub*(inf.func), 2, mean))
    # return bootstrap draw
    Rb
  })
  # bootstrap results
  bres <- simplify2array(bout)
  # handle vector and matrix case differently, so you get nxk matrix
  ifelse(class(bres)=="matrix", bres <- t(bres), bres <- as.matrix(bres))
  # bootstrap variance matrix 
  V <- cov(bres)
  # bootstrap standard error
  bSigma <- apply(bres, 2,
                  function(b) (quantile(b, .75, type=1, na.rm = T) -
                                 quantile(b, .25, type=1, na.rm = T))/(qnorm(.75) - qnorm(.25)))
  # critical value for uniform confidence band
  bT <- apply(bres, 1, function(b) max( abs(b/bSigma)))
  crit.val <- quantile(bT, 1-alp, type=1, na.rm = T)
  
  list(bres=bres, V=V, se=bSigma/sqrt(n), crit.val=crit.val)
}

# Create event study, fixing 1914 as reference period for everyone
event.st <- function(model){
  inf.func.event <- as.matrix(cbind(-rowSums(model$inffunc[,1:3]), 
                                    -rowSums(model$inffunc[,2:3]),
                                    -(model$inffunc[,3]),
                                    #0, 
                                    model$inffunc[,4:8])
  )
  estimate <- c(-sum(model$att[1:3]), 
                -sum(model$att[2:3]),
                -(model$att[3]),
                0, 
                model$att[4:8])
  year = c(1899, 1904, 1909, 1914, 1919, 1921, 1923, 1925, 1927)
  
  bout <- mboot2(inf.func.event, model=model)
  se <- c(bout$se[1:3],0,bout$se[4:8])
  c <- bout$crit.val
  list(estimate = estimate, se = se, c = c, year = year)
}

# function to make plots
makeplot <- function(model, title) {
  ES_style <- event.st(model)
  tibble(estimate = ES_style$estimate, 
         year = ES_style$year,
         conf.low.u = ES_style$estimate - ES_style$c * ES_style$se,
         conf.high.u = ES_style$estimate + ES_style$c * ES_style$se,
         conf.low.p = ES_style$estimate - stats::qnorm(0.95) * ES_style$se,
         conf.high.p = ES_style$estimate + stats::qnorm(0.95) * ES_style$se
         ) %>%
    filter(!is.na(estimate)) %>% 
    ggplot(aes(x = year, y = estimate)) +
    ggtitle(title) + 
    geom_hline(yintercept = 0,color = "red", linetype  = "dashed") +
    scale_x_continuous(breaks = c(1899, 1904, 1909, 1914, 1919, 1921, 1923, 1925, 1927)) +
    geom_pointrange(aes(ymin = conf.low.u, ymax = conf.high.u), show.legend = FALSE, linetype= 1,
                    color = "red")+
    geom_pointrange(aes(ymin = conf.low.p, ymax = conf.high.p), show.legend = FALSE)+
    theme(plot.title = element_text(hjust = 0.5, size=12))+
    labs(x = "Year", y = "Coefficient Estimate")
}
# make our four plots
plot13 <- makeplot(drdid_days_emp, "NPI intensity and log manufacturing employment.")
plot14 <- makeplot(drdid_speed_emp, "NPI speed and log manufacturing employment.")
plot15 <- makeplot(drdid_days_out, "NPI intensity and log manufacturing output")
plot16 <- makeplot(drdid_speed_out, "NPI speed and log manufacturing output")
# put into one plot
plot13 + plot15 + plot14 + plot16 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'All confidence intervals are based on 9,999 bootstrap draws. 90% pointwise (simultaneous) confidence intervals displayed in black (red).')

The above results are somehow close to CLV2 results. They suggest that NPI intensity and speed had a positive effect on manufacturing employment. They also suggest a positive effect on manufacturing output, though these are more noisy. It is also interesting to see the “price” we pay for being robust against the multiple-testing problems is not that high: the simultaneous confidence intervals are not much wider than the pointwise ones. However, if one were to ignore the multiple-testing problem, one would obtain “stronger” evidence in favor of the effect of NPIs on manufacturing output which is perhaps not warranted,

In terms of violations of parallel pre-trends, we do not find any evidence against it using pre-tests. However, such a “conclusion” is full of caveats, as mentioned before. First, we use covariates measured in 1900 (log population), in 1910 (log population, state-level measures of urban share, agricultural employment share and per capital income.), in 1914 ( percentage of the city population employed in manufacturing), and in 1917 (influenza mortality). The pre-treatment periods are 1899, 1904, 1909 and 1914, implying that the covariates are measure “after” the outcomes. This can potentially create a problem of “bad controls”, potentially putting the reliability of pre-trends test at stake. Second, but not less importantly, failing to find evidence against pre-trends does not imply that we do have evidence in favor of the parallel trends assumption made here. In fact, the parallel pre-trends assumption made is fundamentally untestable, so a lot of care should be taken here in interpreting the point-estimates (and confidence intervals) for the pre-treatment periods above.

Ultimately, the above plots provides suggestive evidence that being aggressive/fast in NPIs may help alleviate the effects of the pandemic, if you believe on the untestable parallel trends assumption mentioned earlier.

A “robustness” check

Now, a point that is very easy to criticize from the above results is that we defined treatment and comparison units in an arbitrary way. Furthermore, such NPI policies were implemented together, implying that cities with \(D=0\) following one criteria may have \(D=1\) following the other criteria. That would be problematic because the comparison group would be a “compromised” comparison group.

In order to partially address this potential limitation, we can redefine the treatment group to be those cities under which their NPI speed and NPI intensity are above the median in both criteria. Likewise, the comparison group would be formed with the cities that are below the median of both NPI speed and NPI intensity. Six cities who were in different groups depending on the criteria used are therefore dropped: Albany, Denver, Indianapolis, Nashville, New Orleans and Rochester.

The results of this "robustness’’ check in presented below:

# Create treatment
tauquant3 = 0.5

sflu$treated = (sflu$DaysofNPI > quantile(sflu$DaysofNPI, probs = tauquant3)) +
  (sflu$SpeedofNPI > quantile(sflu$SpeedofNPI, probs = tauquant3))
  
# Subset data to exclude cities with different "treatment" status depending on NPI measure.
sflu3_days <- subset(sflu, sflu$treated != 1)
sflu3_days$treated_days = ifelse(sflu3_days$treated == 2, 1918, 0)
#-------------------------------------------------------------------------------------------------
# Estimate Event-Study
# Manufacturing output and above median on both
drdid_days_out3 <- did::att_gt(yname = "CityManuOutput_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_days",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu3_days,
                              printdetails = F)

# Manufacturing employment and $ days
drdid_days_emp3 <- did::att_gt(yname = "CityManuEmp_ln",
                              tname = "Year",
                              idname = "City",
                              first.treat.name = "treated_days",
                              xformla = ~ AgricultureEmpShare1910 +
                                ManufacturingEmpShare1914 +
                                UrbanPopShare1910  +
                                IncomePerCapita1910  +
                                Pop1910_ln  +
                                Mortality1917+
                                Pop1900_ln,
                              panel = TRUE,
                              estMethod = "dr",
                              bstrap = T,
                              biters = 9999,
                              cband = T,
                              alp = 0.1,
                              data = sflu3_days,
                              printdetails = F)
#-------------------------------------------------------------------------------------------------
# make our plots
plot21 <- makeplot(drdid_days_emp3, "NPI ( intensity and speed) and log manufacturing employment.")
plot22 <- makeplot(drdid_days_out3, "NPI ( intensity and speed) log manufacturing output")

# put into one plot
plot21 + plot22 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'All confidence intervals are based on 9,999 bootstrap draws. 90% pointwise (simultaneous) confidence intervals displayed in black (red).')

As we can see, the results are essentially unchanged in terms of post-treatment periods: we find suggestive evidence that NPI may mitigate the adverse economic consequences of a pandemic, especially in terms of unemployment. Again, the use of simultaneous confidence intervals have shown to be practically important, especially when analyzing the effect on manufacturing output.

Now, the picture about pre-trend is a more complicated one, especially if one puts more attention to the very last pre-treatment periods, 1899 and 1904. Although the results do not find any suggestive evidence against zero pre-trends, one may argue that this may be because the test lacks power, which is something that can not be rule-out here as we have 37 observation!

Personally, given that these are more than 10 years before the treatment took place and that the structure of the U.S. economy and U.S. cities changed quickly around the turn of the 20th century, I would tend to not put much emphasis on these data points, which is in agreement with the arguments made by CLV2. Here, again, reasonable people can disagree! Also, needless to say that the same caveats about pre-trends discussed before apply here, too.

I am sure you can do more robustness checks! But it is also important to keep in mind that there are only 43 cities available so “over-partitioning” the data into finer groups could be problematic for the reliability of the underlying statistical method being used. That is why I did not try many of these.

Conclusion

In summary, I find this debate about the effect of NPIs on economic growth fascinating. CLV is a very intriguing paper, and the authors make strong claims about the effectiveness of NPIs to alleviate the economic effects of the pandemic. LLR raised some important points about CLV analysis, suggesting that the “degree of certainty” implicit in CLV conclusions were not warranted. CLV2 took the criticism serious, and update their analysis based on it.

From my standing point, I was worried about some of the modelling assumptions behind the analysis. By playing with the data and using some of the DiD methods I have been working on recently, I find suggestive evidence that cities that intervened earlier and more aggressively during the 1918 Spanish Flu pandemic tend to grow faster after the pandemic is over, which is in line with CLV original findings.

Now, there is the debate about the “credibility of the design”, which necessarily depend on how much one trust the untestable assumption that, in the absence of the pandemic, the evolution of the outcomes among cities who intervened “earlier” and “more aggressively” during the 1918 Spanish Flu pandemic would have been the same as those cities who intervened “later” and “less aggressive”. This is a discussion that needs to depend expert knowledge about the specific context. Both CLV and LLR (and their replies to each other) make good points here.

At the end of the day, we need to read these discussions and make a subjective decision on whether we find the arguments raised by LLR or CLV more plausible. Here, I personally tend to agree more with CLV points. I find it hard that the redefinition of city boundaries would be able to completely explain these results to the point we should not be able to interpret it as suggestive evidence. But again, reasonable people may disagree on this. That is entirely OK, we are at least having this conversation, which should be interpreted as a win!

I hope you enjoyed reading this as much as I have enjoyed writing it.