Monday, July 11, 2016

Power Calcs: R versus Stata



I found this nice post with R code for more complicated power calculations done via simulation, which is nice because it's not a black box.

http://egap.org/methods-guides/10-things-you-need-know-about-statistical-power

They provide three stages of power calculations:

  1. Two-sided power calculation - no clustering, covariates, or multiple treatment arms
  2. Two-sided power calculation with covariates - no clustering or multiple treatment arms
  3. Two-sided power calculation with multiple treatment arms

1. Two-sided power calculation - no clustering, covariates, or multiple treatment arms
First I compare (1) to Stata, and they report the same sample size: N = 500, split between T1 and control, assuming the SD is the same for both samples (which is generally an erroneous assumption: you expect your treatment group to have greater variation in outcomes. But so it is).

R
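A minimal sketch of the simulation approach (my condensed version of the EGAP-style code, assuming means of 60 and 65 and an SD of 20 to match the Stata call below):

    # Simulation-based power: no clustering, covariates, or multiple arms.
    # Parameters assumed to mirror the Stata call below: means 60 vs. 65, SD 20.
    set.seed(123)
    possible.ns <- seq(100, 1000, by = 50)    # candidate total sample sizes
    powers <- rep(NA, length(possible.ns))
    alpha <- 0.05
    sims <- 500
    for (j in seq_along(possible.ns)) {
      N <- possible.ns[j]
      significant <- rep(NA, sims)
      for (i in 1:sims) {
        Z <- sample(rep(c(0, 1), N / 2))             # random assignment, half per arm
        Y <- rnorm(N, mean = 60 + 5 * Z, sd = 20)    # treatment shifts the mean by 5
        significant[i] <- t.test(Y ~ Z)$p.value <= alpha
      }
      powers[j] <- mean(significant)
    }
    possible.ns[which(powers > 0.8)[1]]   # smallest N clearing 80% power; should land near 500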


Stata 12.1 (sampsi 60 65, alpha(.05) power(.8) sd(20))
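As a sanity check, R's built-in closed-form calculator lands in the same place (n here is reported per group, so the total is roughly 2 x 252 ≈ 505):

    # Closed-form two-sample power calculation; n is per group
    power.t.test(delta = 5, sd = 20, sig.level = 0.05, power = 0.8)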


2. Two-sided power calculation with covariates - no clustering or multiple treatment arms

Then I look at their calcs with covariates. The simulated outcome now includes covariates that make the raw data noisier. If you don't account for the reduction in residual SD that adjusting for those covariates buys you at the analysis stage, you overstate the required N. See the difference between sample_size = 1500 (unadjusted) and sample_size.cov = 500 (covariate-adjusted). (I didn't do this in Stata. I think power in Stata 13 lets you add in covariates, and I'm not bothering to write an ado file for it.)
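A hedged sketch of that comparison (illustrative parameter values of my own choosing, not the EGAP numbers): simulate an outcome where a covariate X drives much of the variance, then compare the power of the unadjusted and covariate-adjusted estimates at the same N.

    # Power with vs. without covariate adjustment (illustrative parameters).
    # Controlling for a prognostic covariate shrinks the residual SD.
    set.seed(123)
    sims <- 500
    N <- 500
    alpha <- 0.05
    sig.raw <- sig.cov <- rep(NA, sims)
    for (i in 1:sims) {
      X <- rnorm(N, mean = 50, sd = 20)           # prognostic covariate
      Z <- sample(rep(c(0, 1), N / 2))            # random assignment
      Y <- 60 + 5 * Z + X + rnorm(N, sd = 10)     # X accounts for most of the noise
      sig.raw[i] <- summary(lm(Y ~ Z))$coefficients["Z", "Pr(>|t|)"] <= alpha
      sig.cov[i] <- summary(lm(Y ~ Z + X))$coefficients["Z", "Pr(>|t|)"] <= alpha
    }
    mean(sig.raw)   # power ignoring the covariate
    mean(sig.cov)   # power after adjusting -- substantially higher at the same N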



3. Two-sided power calculation with multiple treatment arms


First, let's assume the same effect size (5) for each treatment group. If we consider making only one comparison (a t-test), then N ~ 600.


Stata 12.1 (sampsi 60 65, alpha(.05) power(.8) sd(20) ratio(3))

And that's what Stata's sampsi command gives too, N = 600, if we assume the same effect size for each treatment arm and no adjustment to the required p-value for multiple testing.


In R, the simulations that correct the p-values for multiple testing show that we don't achieve 80% power with a sample of 2,000 or less (possible.ns[which(power.fullranking > 0.8)[1]] returns NA, and you can see that the series plotted in blue is flat).
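The correction itself is easy to mimic. Here's a sketch with two treatment arms and a Bonferroni adjustment; the EGAP code considers several correction schemes and power definitions, and this "detect both effects" version is just one of them:

    # Power for two treatment arms with p-values corrected for multiple testing.
    set.seed(123)
    sims <- 500
    N <- 600                                       # three equal arms of 200
    alpha <- 0.05
    sig.all <- rep(NA, sims)
    for (i in 1:sims) {
      Z <- sample(rep(c("C", "T1", "T2"), N / 3))          # assignment to 3 arms
      Y <- rnorm(N, mean = 60 + 5 * (Z != "C"), sd = 20)   # effect of 5 in both arms
      p1 <- t.test(Y[Z == "T1"], Y[Z == "C"])$p.value
      p2 <- t.test(Y[Z == "T2"], Y[Z == "C"])$p.value
      p.adj <- p.adjust(c(p1, p2), method = "bonferroni")
      sig.all[i] <- all(p.adj <= alpha)            # require BOTH effects to show up
    }
    mean(sig.all)   # noticeably below 0.8 at N = 600 once you correct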

4. The final step, which they don't do, is to also adjust for clustering, so I add this in; a first sketch is below.
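One simple way to do it (my sketch, not theirs): assign treatment at the cluster level, add a cluster-level shock, and run the test on cluster means, which is valid if somewhat conservative. All parameter values here are made up for illustration.

    # Cluster-randomized power: treatment assigned by cluster, test on cluster means.
    set.seed(123)
    sims <- 500
    n.clusters <- 50                       # clusters, half assigned to treatment
    m <- 20                                # individuals per cluster
    alpha <- 0.05
    significant <- rep(NA, sims)
    for (i in 1:sims) {
      cluster.id <- rep(1:n.clusters, each = m)
      u <- rnorm(n.clusters, sd = 10)      # cluster-level shock (drives the ICC)
      Z.cluster <- sample(rep(c(0, 1), n.clusters / 2))
      Z <- Z.cluster[cluster.id]
      Y <- 60 + 5 * Z + u[cluster.id] + rnorm(n.clusters * m, sd = 20)
      Y.bar <- tapply(Y, cluster.id, mean)           # collapse to cluster means
      significant[i] <- t.test(Y.bar ~ Z.cluster)$p.value <= alpha
    }
    mean(significant)   # power at this cluster count and cluster size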
I'm still working on folding this into the full simulation code in R and rewriting it in Python... tbc.