The notation on pg 3935 in the handbook is quite sloppy, and so is the statement:
No. The correct regression is:
On pg 3935 of **USING RANDOMIZATION IN DEVELOPMENT ECONOMICS RESEARCH: A TOOLKIT**, the authors discuss how to weight an average treatment effect when the probability of treatment varies across strata.
I had such an issue, and so looked to this page for help. But the information provided on this page does not seem to be correct. Essentially, the authors say that weighting the treatment effect in each stratum (namely, the average difference between treated and untreated) by the probability of being in that stratum, conditional on treatment, is equivalent to running a regression of the outcome on the treatment dummy while controlling for stratum dummies (and all interactions between stratum dummies, if the sample was stratified along more than one dimension, say city and gender).
This is likely not the case, as the variance-covariance matrix from a weighted regression will not be the same as the one from a regression with additional dummies.
The ATE is measured by:
When the data are stratified, Duflo et al. suggest the following:
(duflo_toolkit.pdf, pg 3935), where, using data,
The authors use a continuous stratum variable in their equation (an integral sign). I’m not sure why, as strata are usually discrete. The authors leave it at the above, but I’ll walk you through what those weights mean first.
So, what is P(X=x|T)? It would be great if the authors spelled this out.
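For discrete strata, one natural reading (my interpretation, not notation taken from the handbook) is that P(X=x|T) is the share of all treated units that fall in stratum x, with the integral replaced by a sum over strata. The symbols N_{x,T} (treated units in stratum x), N_T (all treated units) and \hat{\tau}_x (the treated-minus-control difference in mean outcomes within stratum x) are labels I am introducing for this sketch:

```latex
\hat{P}(X = x \mid T) = \frac{N_{x,T}}{N_T},
\qquad
\hat{\tau}_x = \bar{Y}_{x,T} - \bar{Y}_{x,C},
\qquad
\hat{\tau}_{w} = \sum_{x} \hat{P}(X = x \mid T)\,\hat{\tau}_x
```

Under this reading the weights sum to one across strata, and strata that contribute more treated units count for more in the average.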
An example of computing these weights follows with my own data.
Here is a summary of observations from an experiment. You can see from this simple table that the probability of being treated differs sharply across strata. In stratum 1, 107 out of 648 individuals were treated, whereas in stratum 2, 98 out of 1,075 individuals were treated. So we need to correct for this if we’d like an average effect of the treatment variable.
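To make the weights concrete, here is a minimal sketch that computes them from the two strata quoted above. It assumes that P(X=x|T) means the share of all treated units falling in stratum x, and it uses only strata 1 and 2, since those are the only counts spelled out in the text; the variable names are mine.

```python
# Counts quoted in the text. Only strata 1 and 2 are given there,
# so this sketch is restricted to those two strata.
counts = {
    1: {"treated": 107, "total": 648},
    2: {"treated": 98, "total": 1075},
}

total_treated = sum(c["treated"] for c in counts.values())

# P(T=1 | X=x): share of units treated within each stratum.
p_treat = {s: c["treated"] / c["total"] for s, c in counts.items()}

# ASSUMED reading of P(X=x | T): share of all treated units in stratum x.
weights = {s: c["treated"] / total_treated for s, c in counts.items()}

for s in counts:
    print(f"stratum {s}: P(T=1|X)={p_treat[s]:.3f}, P(X|T=1)={weights[s]:.3f}")
```

With these two strata alone, the treatment probabilities are roughly 0.17 and 0.09, while the weights come out at roughly 0.52 and 0.48. Adding the remaining strata would change the weights, since they are shares of the total treated count.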
Handbook’s Suggested Methodologies
Suppose we’d like to estimate the effect of the treatment on an outcome. Our hypothesis is:
H0: The effect of the treatment on outcome is zero.
Duflo et al. give two methods of estimating the treatment effect that correct for the fact that the probability of treatment depends on the stratum:
1. Run a regression with controls for each stratum
2. Use weights in the form of:
Both (1) and (2) can be done in a regression form. Given that, how do the OLS estimator (or probit, since my primary outcome variable is binary) and its standard errors (clustered at the strata level) change for (1) and (2)? Is
General OLS estimators:
1. If we use controls, X=[treatment, strata1, strata2, strata3, strata4], Y=[outcome] (I guess we can either have a dummy for each stratum and suppress the intercept, or drop one stratum dummy).
2. If we use weights, then X=[treatment]’, Y=[outcome]’, and W=[vector of strata weights]’, for example,
Perhaps (1) and (2) could produce equivalent results for
In terms of the variance of beta,
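As a rough numerical check on whether (1) and (2) give the same point estimate, here is a minimal sketch using NumPy. It takes the two stratum counts quoted earlier and builds a noise-free outcome with made-up stratum baselines and stratum-specific treatment effects (those values are assumptions for illustration, not my data). For (2), it computes the weighted average of stratum differences directly and also as a weighted least squares regression; the control-unit weight p/(1-p) used there is a standard way to reproduce that average in regression form, and is my choice rather than something taken from the handbook.

```python
import numpy as np

# Stratum counts quoted in the post (strata 1 and 2 only).
n_total = {1: 648, 2: 1075}
n_treated = {1: 107, 2: 98}

# ASSUMED, made-up values for illustration: stratum baselines and
# stratum-specific treatment effects. No noise is added.
baseline = {1: 0.0, 2: 5.0}
effect = {1: 1.0, 2: 3.0}

strata, treat, y, p = [], [], [], []
for s in n_total:
    t = np.r_[np.ones(n_treated[s]), np.zeros(n_total[s] - n_treated[s])]
    strata.append(np.full(n_total[s], s))
    treat.append(t)
    y.append(baseline[s] + effect[s] * t)
    p.append(np.full(n_total[s], n_treated[s] / n_total[s]))  # P(T=1 | stratum)
strata, treat, y, p = map(np.concatenate, (strata, treat, y, p))

# Method (1): OLS of the outcome on treatment plus a dummy per stratum
# (no separate intercept, so the dummies are not collinear).
dummies = np.column_stack([(strata == s).astype(float) for s in n_total])
X1 = np.column_stack([treat, dummies])
beta_dummies = np.linalg.lstsq(X1, y, rcond=None)[0][0]

# Method (2), direct form: stratum differences in means, weighted by the
# share of treated units in each stratum.
n1_all = sum(n_treated.values())
weighted_avg = sum(
    (n_treated[s] / n1_all)
    * (y[(strata == s) & (treat == 1)].mean()
       - y[(strata == s) & (treat == 0)].mean())
    for s in n_total
)

# Method (2), regression form (an assumed weighting scheme): treated units
# get weight 1, control units get weight p / (1 - p).
w = np.where(treat == 1, 1.0, p / (1.0 - p))
X2 = np.column_stack([np.ones_like(treat), treat])
sw = np.sqrt(w)
beta_wls = np.linalg.lstsq(X2 * sw[:, None], y * sw, rcond=None)[0][1]

print(f"(1) stratum dummies:      {beta_dummies:.4f}")
print(f"(2) weighted average:     {weighted_avg:.4f}")
print(f"(2) weighted regression:  {beta_wls:.4f}")
```

With these made-up effects, the two forms of (2) agree at roughly 1.96, while the dummy regression gives roughly 2.00. The gap comes from the implicit weights: OLS with stratum dummies weights each stratum's effect in proportion to n·p·(1-p) rather than by the share of treated units, so the two only coincide when the treatment effect is the same in every stratum or the two sets of weights happen to match. This sketch says nothing about the standard errors, which is a separate question.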
More General Issues on Heterogeneous Treatment Effects
Stratifying a random sample can often be used to get a representative sample within strata, allowing the researcher to look at the treatment effect by stratum. Without stratification, if we subset the data to a certain city, for example, we can’t be sure that our sample within that city is representative of the population.
From a Bayesian perspective, this would be equivalent to saying that our prior distribution of the data is incomplete–we’re missing a whole matrix of individuals within that city who would respond differently to the treatment than the ones we happened to pick up in our sample that was not stratified by city.
In fact, some great Bayesians, like Andrew Gelman, have written about this issue from a Bayesian perspective: http://www.stat.columbia.edu/~gelman/research/published/bayes_management.pdf
There are several other methodologies out there for heterogeneous treatment effects that use machine learning methods rather than the chop, dice and data mine methods:
Imai’s work at Princeton essentially says, rather than look at all the interaction effects between treatments and strata to decide on which treatment is best for which strata, let’s reduce the problem to just a few treatments and strata a priori. The algorithm is here, but it’s essentially an optimization problem with an added constraint that dampens the effect of some interaction effects (hopefully, I’m getting this right):
Yet one more method, which I haven’t delved into, is by Grimmer et al.:
The traditional go-to method in the sciences of throwing in many interaction effects between treatment(s) and a stratum (or whatever you may be conditioning on) is fraught with two major issues:
1. It expands your parameter space, requiring a much larger sample to maintain the same power as before.
2. Designs don’t exogenously vary both the treatment AND the strata conditioned on, so the interaction cannot necessarily be interpreted as causal.
It’s time to consider new methods, I think.