Thursday, January 15, 2015

Bootstrapping Standard Errors: Methods and AR(1) processes

The famous Cameron/Gelbach/Miller paper goes through:

Nonparametric bootstraps
1. Pairs cluster bootstrap-se and bootstrap-t
2. Residual and wild bootstrap
Parametric bootstraps
3. Residual bootstrap

The first method is just to re-draw your sample (with replacement) and repeat your regression/correlation X number of times, then look at the distribution of all X estimates to get a nonparametric estimate of the variance. What's key here is that the sampling method mimics whatever issue you're trying to correct for. If it's clustering of standard errors, then you'll want a clustered sampler, i.e. one that resamples whole clusters rather than individual observations.
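The pairs cluster bootstrap-se can be sketched in a few lines of Python. This is only a minimal illustration on made-up data, not the paper's code: the function names (`ols_slope`, `make_clustered_data`, `pairs_cluster_bootstrap`), the simulated data and the 999 replications are all my own assumptions.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def make_clustered_data(n_clusters=20, n_per=10, beta=1.0, seed=1):
    """Hypothetical data: every cluster shares one common shock in y."""
    rng = random.Random(seed)
    data = {}
    for g in range(n_clusters):
        shock = rng.gauss(0, 1)  # common to everyone in cluster g
        pairs = []
        for _ in range(n_per):
            x = rng.gauss(0, 1)
            pairs.append((x, beta * x + shock + rng.gauss(0, 1)))
        data[g] = pairs
    return data

def pairs_cluster_bootstrap(data, n_reps=999, seed=0):
    """Resample whole clusters with replacement and re-estimate each time."""
    rng = random.Random(seed)
    ids = list(data)
    slopes = []
    for _ in range(n_reps):
        # Draw as many clusters as in the original sample, with replacement.
        drawn = [rng.choice(ids) for _ in ids]
        # Keep every (x, y) pair of each drawn cluster together.
        sample = [pair for g in drawn for pair in data[g]]
        xs, ys = zip(*sample)
        slopes.append(ols_slope(xs, ys))
    mean = sum(slopes) / n_reps
    se = (sum((s - mean) ** 2 for s in slopes) / (n_reps - 1)) ** 0.5
    return slopes, se

data = make_clustered_data()
slopes, se = pairs_cluster_bootstrap(data)
print("pairs cluster bootstrap SE of the slope:", round(se, 3))
```

The only clustering-specific step is the sampler: it draws cluster ids, not rows, so the within-cluster dependence is carried into every bootstrap sample.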

The second and third methods (in some sense) simulate "new data" on each run, and then proceed as in (1). Method 2 reassigns the initially estimated residuals (nonparametric), while method 3 randomly draws new residuals from an assumed distribution such as a normal (parametric, because it assumes a distribution on the residuals). Nonparametric is preferred to parametric, but isn't always possible (like in your case).

Method (2) is popular when the number of clusters is small: with few clusters, a pairs bootstrap just resamples the same handful of clusters repeatedly, which gives only a crude estimate of the true distribution.
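Here is a rough Python sketch of the wild variant of method 2 with cluster-level weights. It is a simplified bootstrap-se version under my own assumptions (Rademacher weights of +1/-1 per cluster, no null imposed, made-up data with 6 clusters), and the names `fit_line` and `wild_cluster_bootstrap_se` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Intercept and slope from a simple regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def wild_cluster_bootstrap_se(xs, ys, cluster_ids, n_reps=999, seed=0):
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    groups = sorted(set(cluster_ids))
    slopes = []
    for _ in range(n_reps):
        # One +1/-1 weight per cluster: every residual in a cluster is
        # flipped together, so within-cluster correlation is preserved.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [f + w[g] * e for f, e, g in zip(fitted, resid, cluster_ids)]
        # X is held fixed; only y is regenerated.
        slopes.append(fit_line(xs, y_star)[1])
    mean = sum(slopes) / n_reps
    se = (sum((s - mean) ** 2 for s in slopes) / (n_reps - 1)) ** 0.5
    return b, se

# Hypothetical data: 6 clusters of 15 observations with a common shock.
rng = random.Random(3)
cluster_ids, xs, ys = [], [], []
for g in range(6):
    shock = rng.gauss(0, 1)
    for _ in range(15):
        x = rng.gauss(0, 1)
        cluster_ids.append(g)
        xs.append(x)
        ys.append(1.0 * x + shock + rng.gauss(0, 1))

b_hat, se_wild = wild_cluster_bootstrap_se(xs, ys, cluster_ids)
print("slope:", round(b_hat, 3), "wild cluster bootstrap SE:", round(se_wild, 3))
```

Unlike the pairs version, every bootstrap sample here uses all of the original clusters and regressors, which is why it behaves better when there are only a handful of clusters to draw from.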

***Bootstrapping for an AR(1) process***

Now, in the case of getting se's with an AR(1) process there are two options: non-parametric and parametric. With non-parametric you'll want to mimic the AR(1) process via the sampling method. With parametric you'll mimic the AR(1) process by literally generating data from an AR(1) process (see slide 23/28:

I haven't come across time series sampling akin to clustered sampling, so to bootstrap the se's on an AR(1) process I don't see any possibility other than a parametric bootstrap.
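A minimal Python sketch of the parametric route, assuming normal innovations and an AR(1) with intercept estimated by OLS. The function names (`estimate_ar1`, `simulate_ar1`, `parametric_bootstrap_se_rho`), the "observed" series and its true coefficient of 0.6 are all made up for the example.

```python
import random

def estimate_ar1(y):
    """OLS of y_t on y_{t-1} with an intercept; returns (c, rho, sigma)."""
    lag, cur = y[:-1], y[1:]
    n = len(cur)
    ml, mc = sum(lag) / n, sum(cur) / n
    sxx = sum((l - ml) ** 2 for l in lag)
    rho = sum((l - ml) * (c - mc) for l, c in zip(lag, cur)) / sxx
    c0 = mc - rho * ml
    resid = [c - c0 - rho * l for l, c in zip(lag, cur)]
    sigma = (sum(e * e for e in resid) / (n - 2)) ** 0.5
    return c0, rho, sigma

def simulate_ar1(c0, rho, sigma, n, y0, rng):
    """Generate n observations from y_t = c0 + rho * y_{t-1} + N(0, sigma^2)."""
    y = [y0]
    for _ in range(n - 1):
        y.append(c0 + rho * y[-1] + rng.gauss(0, sigma))
    return y

def parametric_bootstrap_se_rho(y, n_reps=999, seed=0):
    rng = random.Random(seed)
    c0, rho, sigma = estimate_ar1(y)
    draws = []
    for _ in range(n_reps):
        # New data come from the *fitted* AR(1), started at the first
        # observed value, then the model is re-estimated on each series.
        y_star = simulate_ar1(c0, rho, sigma, len(y), y[0], rng)
        draws.append(estimate_ar1(y_star)[1])
    mean = sum(draws) / n_reps
    se = (sum((d - mean) ** 2 for d in draws) / (n_reps - 1)) ** 0.5
    return rho, se

# Hypothetical "observed" series with true rho = 0.6.
y_obs = simulate_ar1(0.0, 0.6, 1.0, 200, 0.0, random.Random(42))
rho_hat, se_rho = parametric_bootstrap_se_rho(y_obs)
print("rho:", round(rho_hat, 3), "parametric bootstrap SE:", round(se_rho, 3))
```

The normality of the innovations is exactly the distributional assumption that makes this parametric; if it is wrong, the bootstrap SE inherits the error.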

Attached is Stata code for a basic nonparametric X,Y pairwise se-bootstrap. You can find R code here: My particular code doesn't really help your specific case, but gives an example. 

Some other links I checked out to understand this better:

Session Effects and Treatment Effects Collinear in a Between Experiment

In between experiments, we look at the average effect of a treatment for a given session, and compare averages across sessions. Imagine we have two sessions, A and B, and two treatments, T1 and T2:
A is assigned T1
B is assigned T2

Here we can't tell whether a difference in outcomes was caused by the session or by the treatment. What would solve it is having at least 2 sessions for each treatment, for a total of four sessions, A, B, C, D:

A and B are assigned T1
C and D are assigned T2

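Now treatment no longer coincides with a single session, so we can see how much outcomes vary across sessions within the same treatment. (A full set of session dummies would still be collinear with treatment, since each session gets only one treatment.) But of course, 4 sessions are costlier than 2.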
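A quick way to see what each design identifies is to check the rank of the design matrix. The Python sketch below is my own illustration: the helper names (`matrix_rank`, `build_design`) and the 3 subjects per session are assumptions. One caveat it makes visible: even with four sessions, a full set of session dummies still absorbs the treatment dummy, because each session receives only one treatment. What the extra sessions buy is one estimable session contrast within each treatment.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination."""
    m = [[float(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if abs(m[r][col]) > tol), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def build_design(session_to_treatment, dummy_sessions, n_per_session=3):
    """Rows: [intercept, T2 dummy, one dummy per session in dummy_sessions]."""
    rows = []
    for s in sorted(session_to_treatment):
        t2 = 1 if session_to_treatment[s] == "T2" else 0
        dummies = [1 if s == d else 0 for d in dummy_sessions]
        for _ in range(n_per_session):
            rows.append([1, t2] + dummies)
    return rows

two_sessions = {"A": "T1", "B": "T2"}
four_sessions = {"A": "T1", "B": "T1", "C": "T2", "D": "T2"}

# Two sessions: the session B dummy is identical to the T2 dummy.
x_two = build_design(two_sessions, ["B"])
rank_two = matrix_rank(x_two)

# Four sessions, all session dummies but one: T2 = C + D, still deficient.
x_four_full = build_design(four_sessions, ["B", "C", "D"])
rank_four_full = matrix_rank(x_four_full)

# Four sessions, one session dummy within each treatment: full rank.
x_four_part = build_design(four_sessions, ["B", "D"])
rank_four_part = matrix_rank(x_four_part)

print(len(x_two[0]), "columns, rank", rank_two)
print(len(x_four_full[0]), "columns, rank", rank_four_full)
print(len(x_four_part[0]), "columns, rank", rank_four_part)
```

Whenever the rank is below the number of columns, regression software has to drop something, which is the "dummies dropped" symptom.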

One paper:
suggests that clustering will solve session effects. But a dummy for the session and clustering within the variance-covariance matrix are not the same thing. The latter will take care of interactions between individuals in a session, but not the average effect of being in session A versus session B. Perhaps session A was very noisy, so everyone was distracted, but no one talked to one another, so clustering wouldn't do much to control for a noisy session.

Responder A: 
Why do you disagree with session fixed effects?

I have 5 sessions and characteristics are evenly distributed across the sessions. Unless there is something big that I am unaware of (possible), the session effect should not vary by session. That is, any unobservable interaction should not be different from session to session and almost certainly would not be linear. Hence, I am inclined not to use session effects.

I am coming to this idea slowly. In general, I think session effects are a good default since people select into their session. But I am having collinearity problems when I include all my session effects, so that 2 dummies are dropped. Observables should not contribute differently to session effects since they are well distributed, so we are then talking about interactions, which is a whole other can of worms and makes me think linear session effects are not the way to go. My sample is probably too small to look at this rigorously.

How can I justify this approach to an audience, reader, or reviewer?

Responder B:
My issue with the session effect paper is that it totally confounds within-group correlations (clustering) with session effects (fixed effects). It says that feedback between subjects is a session effect. I see session effects as an average intercept shift in each session (classic fixed effect).
It also claims that clustering will handle dynamic session effects and heterogeneity in behavior.
Regarding your issue--I'll think out loud a bit.
I think session effects should account for anything you may not have controlled for or can't control for--e.g. there's the day of the week, the time, and maybe if there was a baby crying during the session (and you didn't know that) and people couldn't concentrate, etc.
Now, suppose you put in dummies to account for this session intercept. Why would some drop?
1. The session is perfectly collinear with another set of variables--like the treatment.
2. Suppose the session effect (relative to the left out session) is the same across a few sessions. That's feasible. It could be that the baby came to 2 sessions and not the others, and that's the only thing that differed between sessions. In that case, you wouldn't be able to estimate a different intercept for each session.
This must be an issue in cross-country papers that include country dummies? Country dummies must drop out... You might post/look here:

Responder B: 
1. I can't put in session dummies into my regs, i.e. fixed effects for session, because they coincide with my treatments perfectly and drop out.
2. Does it make any sense to have a session variable (or similarly a city variable) rather than a session or city fixed effect?
Namely, rather than having a dummy for each city or each session, one controls for city in a city variable that is coded as 1=NYC, 2=London, etc.
What does that variable mean? And does it make any sense? When you first learn regressions you use a city variable, but nowadays you use a dummy for each city.
Do you ever use one city, or country control rather than a dummy for each country or city?

Responder C:
Here is my take:
If you use an ordering of 1,2,3,4,5 for a region, it has to make sense, because 5 means a higher magnitude than 4, etc. Therefore there are only a few situations where you can go for an ordering instead of fixed effects, and a time trend is one such case. For regions, I'm afraid you have to go for fixed effects. The one alternative is to use a variable that captures something by region (so is region specific) and use that instead of region fixed effects. If this variable comes close to approximating your region-specific unobservable, you may be able to get away with it. And you can then say you control for what you think is the biggest issue, but you can't use region fixed effects as they kill your main variable.
That's my 2 cents.
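To make the ordering point concrete, here is a small Python sketch with made-up numbers. The outcomes, the third city (Paris) and the function names are all hypothetical. It compares a single integer-coded city variable with one dummy per city, under two arbitrary relabelings of the same cities.

```python
def linear_code_fit(codes, ys):
    """Regress y on an integer-coded variable; return (slope, SSR)."""
    n = len(ys)
    mx, my = sum(codes) / n, sum(ys) / n
    sxx = sum((c - mx) ** 2 for c in codes)
    b = sum((c - mx) * (y - my) for c, y in zip(codes, ys)) / sxx
    a = my - b * mx
    ssr = sum((y - a - b * c) ** 2 for c, y in zip(codes, ys))
    return b, ssr

def dummy_fit_ssr(labels, ys):
    """A dummy per city fits each city's own mean; return the SSR."""
    groups = {}
    for lab, y in zip(labels, ys):
        groups.setdefault(lab, []).append(y)
    means = {lab: sum(v) / len(v) for lab, v in groups.items()}
    return sum((y - means[lab]) ** 2 for lab, y in zip(labels, ys))

cities = ["NYC", "NYC", "London", "London", "Paris", "Paris"]
y = [10.0, 12.0, 2.0, 4.0, 7.0, 9.0]

# Two equally arbitrary codings of the same three cities.
coding_1 = {"NYC": 1, "London": 2, "Paris": 3}
coding_2 = {"NYC": 1, "Paris": 2, "London": 3}

slope_1, ssr_1 = linear_code_fit([coding_1[c] for c in cities], y)
slope_2, ssr_2 = linear_code_fit([coding_2[c] for c in cities], y)
ssr_dummies = dummy_fit_ssr(cities, y)

print("slope under coding 1:", slope_1, "SSR:", round(ssr_1, 3))
print("slope under coding 2:", slope_2, "SSR:", round(ssr_2, 3))
print("SSR with a dummy per city:", ssr_dummies)
```

The "effect of city" changes when the labels are shuffled, while the dummy fit does not depend on the labels at all. The integer coding forces the city differences onto a straight line in an order that means nothing.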