Thursday, January 15, 2015

Session Effects and Treatment Effects Collinear in a Between-Subjects Experiment

In between-subjects experiments, we look at the average effect of a treatment within a session and compare averages across sessions. Imagine we have two sessions, A and B, and two treatments, T1 and T2:
A is assigned T1
B is assigned T2

Here we can't tell whether a change in the outcome was caused by the treatment or by the session. One solution is to run at least two sessions per treatment, for a total of four sessions, A, B, C, and D:

A and B are assigned T1
C and D are assigned T2

Now session and treatment are no longer perfectly collinear, because each treatment is observed in more than one session. But of course, four sessions are costlier than two.
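The collinearity can be checked by looking at the rank of the regression design matrix. The sketch below is only an illustration: the helper names (matrix_rank, design), the three subjects per session, and the choice of which session dummies to include are all assumptions. In this toy design it also suggests a caveat: even with four sessions, a full set of session dummies still spans the treatment dummy, so only one extra session dummy per treatment can be included alongside it.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(treatment_of, session_dummies, subjects_per_session=3):
    """Rows of [constant, T2 dummy, one dummy per listed session]."""
    rows = []
    for session, treatment in treatment_of.items():
        for _ in range(subjects_per_session):
            row = [1, 1 if treatment == "T2" else 0]
            row += [1 if session == d else 0 for d in session_dummies]
            rows.append(row)
    return rows


# Two sessions: the session-B dummy is identical to the T2 dummy.
two = design({"A": "T1", "B": "T2"}, ["B"])
rank_two = matrix_rank(two)

# Four sessions, one extra session dummy per treatment: full column rank.
four_sessions = {"A": "T1", "B": "T1", "C": "T2", "D": "T2"}
four = design(four_sessions, ["B", "D"])
rank_four = matrix_rank(four)

# Four sessions with all three non-reference session dummies: one must drop.
four_all = design(four_sessions, ["B", "C", "D"])
rank_four_all = matrix_rank(four_all)

print("2 sessions:", len(two[0]), "columns, rank", rank_two)
print("4 sessions, dummies B and D:", len(four[0]), "columns, rank", rank_four)
print("4 sessions, dummies B, C, D:", len(four_all[0]), "columns, rank", rank_four_all)
```

A rank below the number of columns means at least one regressor is a linear combination of the others, which is when regression software drops a dummy.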

One paper: https://files.nyu.edu/gf35/public/print/Frechette_2011b.pdf
suggests that clustering will solve session effects. But a dummy for the session and clustering in the variance-covariance matrix are not the same thing. The latter accounts for interactions between individuals in a session (correlated errors), but not for the average effect of being in session A versus session B. Perhaps session A was very noisy, so everyone was distracted, but no one talked to one another. In that case, clustering wouldn't do much to control for a noisy session.
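A minimal sketch of this distinction, using made-up numbers: four sessions, one of which (D) has an assumed "noisy session" shock that shifts every subject's outcome. All names and values here are hypothetical, and the cluster-robust formula is the plain sandwich estimator with no small-sample correction. The sketch computes the treatment coefficient once and then two different standard errors for it.

```python
import math

# Hypothetical data: sessions A and B get T1 (treated = 0), C and D get T2 (treated = 1).
treated = {"A": 0, "B": 0, "C": 1, "D": 1}
# Assumed session-level shock: session D was "noisy" and shifts everyone's outcome.
session_shock = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.5}
TRUE_EFFECT = 2.0
noise = [-0.1, 0.0, 0.1]  # fixed individual deviations, so the example is deterministic

data = [(s, t, 10.0 + TRUE_EFFECT * t + session_shock[s] + e)
        for s, t in treated.items() for e in noise]


def mean(values):
    return sum(values) / len(values)


# OLS of the outcome on a constant and a treatment dummy reduces to group means.
y0 = [y for _, t, y in data if t == 0]
y1 = [y for _, t, y in data if t == 1]
intercept = mean(y0)
effect = mean(y1) - mean(y0)

n, n0, n1 = len(data), len(y0), len(y1)
resid = [(s, t, y - intercept - effect * t) for s, t, y in data]

# Conventional (homoskedastic) standard error of the treatment coefficient.
ssr = sum(u * u for _, _, u in resid)
se_conventional = math.sqrt(ssr / (n - 2) * (1 / n0 + 1 / n1))

# Cluster-robust (sandwich) standard error, clustering on session,
# without any small-sample correction.
det = n * n1 - n1 * n1
bread_row = (-n1 / det, n / det)  # second row of (X'X)^-1
var_cluster = 0.0
for s, t in treated.items():
    score_const = sum(u for sess, _, u in resid if sess == s)
    score_treat = t * score_const
    var_cluster += (bread_row[0] * score_const + bread_row[1] * score_treat) ** 2
se_cluster = math.sqrt(var_cluster)

print(f"estimated effect: {effect:.3f} (true effect {TRUE_EFFECT})")
print(f"conventional SE:  {se_conventional:.3f}")
print(f"clustered SE:     {se_cluster:.3f}")
```

In this toy data the estimated effect is 2.75 rather than 2.0 whichever standard error is reported. Clustering widens the standard error, but it does not remove the shift in the point estimate that comes from session D.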


Responder A: 
Why do you disagree with Session fixed effects?

I have 5 sessions and characteristics are evenly distributed across the sessions. Unless there is something big that I am unaware of (possible) the session effect should not vary by session. That is, any unobservable interaction should not be different from session to session and almost certainly would not be linear. Hence, I am inclined not to use session effects.

I am coming to this idea slowly. In general, I think session effects are a good default since people select into their session. But I am having collinearity problems when I include all my session effects, so that 2 dummies are dropped. Observables should not contribute differently to session effects since they are well distributed, so we are then talking about interactions, which is a whole other can of worms and makes me think linear session effects are not the way to go. My sample is probably too small to look at this rigorously.

How can I justify this approach to an audience, reader, or reviewer?

Responder B: My issue with the session effect paper is that it totally confounds within-group correlations (clustering) with session effects (fixed effects). It says that feedback between subjects is a session effect. I see session effects as an average intercept shift in each session (classic fixed effect).
It also claims that clustering will handle dynamic session effects and heterogeneity in behavior.
****
Regarding your issue--I'll think out loud a bit.
I think session effects should account for anything you may not have controlled for or can't control for--e.g. there's the day of the week, the time, and maybe if there was a baby crying during the session (and you didn't know that) and people couldn't concentrate, etc.
Now, suppose you put in dummies to account for this session intercept. Why would some drop?
1. The session is perfectly collinear with another set of variables--like the treatment.
2. Suppose the session effect (relative to the left out session) is the same across a few sessions. That's feasible. It could be that the baby came to 2 sessions and not the others, and that's the only thing that differed between sessions. In that case, you wouldn't be able to estimate a different intercept for each session.
This must be an issue in cross-country papers that include country dummies? Country dummies must drop out... You might post/look here: http://stats.stackexchange.com/questions


Responder B: 
1. I can't put session dummies, i.e. fixed effects for session, into my regressions, because they coincide perfectly with my treatments and drop out.
2. Does it make any sense to have a session variable (or similarly a city variable) rather than a session or city fixed effect?
Namely, rather than having a dummy for each city or each session, one controls for city with a single city variable that is coded as 1=NYC, 2=London, etc.
What does that variable mean? And does it make any sense? When you first learn regressions you use a city variable, but nowadays you use a dummy for each city.
Do you ever use a single city or country control rather than a dummy for each country or city?


Responder C:
Here is my take:
If you use an ordering of 1, 2, 3, 4, 5 for a region, it has to make sense, because 5 then means a higher magnitude than 4, and so on. There are therefore only a few situations where you can use an ordering instead of fixed effects; a time trend is one such case. For regions, I'm afraid you have to go with fixed effects. The one alternative is to use a variable that captures something region-specific and use that instead of region fixed effects. If this variable comes close to approximating your region-specific unobservable, you may be able to get away with it. You can then say that you control for what you think is the biggest issue, but that you can't use region fixed effects because they kill your main variable.
That's my 2 cents.
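To illustrate why an arbitrary 1, 2, 3 coding is hard to interpret, here is a small sketch with made-up city averages. The city names, the numbers, and the helper functions are all hypothetical, and each city is represented by a single average outcome to keep the arithmetic short. The single coded variable forces the city effect onto a straight line in the code, so relabeling the cities changes the fit, while city dummies simply reproduce each city's mean.

```python
# Hypothetical average outcomes by city; the numbers are made up.
city_mean = {"NYC": 5.0, "London": 1.0, "Tokyo": 3.0}


def slope_and_intercept(xs, ys):
    """Simple OLS of ys on xs with a constant."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_with_codes(codes):
    """Regress outcomes on a single integer-coded city variable."""
    cities = list(city_mean)
    xs = [codes[c] for c in cities]
    ys = [city_mean[c] for c in cities]
    slope, intercept = slope_and_intercept(xs, ys)
    fitted = {c: intercept + slope * codes[c] for c in cities}
    return slope, fitted


# Two equally arbitrary labelings of the same three cities.
slope_a, fitted_a = fit_with_codes({"NYC": 1, "London": 2, "Tokyo": 3})
slope_b, fitted_b = fit_with_codes({"London": 1, "NYC": 2, "Tokyo": 3})

# City dummies: one intercept per city reproduces each city's mean exactly,
# regardless of how the cities are labeled.
fitted_dummies = dict(city_mean)

print("coding A: slope", slope_a, "fitted", fitted_a)
print("coding B: slope", slope_b, "fitted", fitted_b)
print("dummies:  fitted", fitted_dummies)
```

With these numbers the coefficient on the coded variable even changes sign when two cities swap labels, which is one way to see that the coding only makes sense when the order carries real meaning, as with a time trend.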
