Tuesday, July 19, 2011

Data Tools

In all my data work I've found that:

R produces great graphics, like confidence ellipses and such. But its statistical syntax has little logic to me. It's not matrix algebra, and it's not one-line code. Perhaps it's good for object-oriented programmers.

SQL is best for data aggregation, especially biggish data. If your data are in a relational schema, don't merge them in a statistical tool. Do it directly in SQL. A plain (inner) join in SQL keeps only the rows whose unique IDs match; unmatched rows show up only if you ask for them with a left or right join. STATA can often produce messy merges, dropping stuff or leaving lots of missings. Don't ask me why.
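To make the join point concrete, here's a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The tables (people, wages) and their rows are made up for illustration; they aren't from any real dataset. It just shows that an inner join keeps only matching IDs, while a left join keeps the unmatched row and fills in a missing value.

```python
import sqlite3

# Hypothetical toy tables, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wages  (id INTEGER, wage REAL);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO wages  VALUES (1, 10.0), (2, 12.5), (4, 9.0);
""")

# Inner join: only ids present in BOTH tables survive.
inner_rows = conn.execute(
    "SELECT p.id, p.name, w.wage FROM people p "
    "JOIN wages w ON p.id = w.id ORDER BY p.id"
).fetchall()

# Left join: every row of people survives; a missing wage becomes NULL (None).
left_rows = conn.execute(
    "SELECT p.id, p.name, w.wage FROM people p "
    "LEFT JOIN wages w ON p.id = w.id ORDER BY p.id"
).fetchall()

print(inner_rows)
print(left_rows)
```

The nice part is that you choose what happens to the unmatched rows by picking the join type, instead of finding out afterwards.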

SAS is good for data manipulation: transposing, reshaping.

STATA is good for built in stats. It just takes way fewer commands to run an estimation in STATA than in anything else. Hands down.

MATA and MATLAB are good for coding up your own estimators. If you want to change a maximum likelihood estimator for some particular distribution, your best bet is to find it in MATA (STATA's Matlab) or MATLAB code and alter it. R would be a bitch. So would SAS.
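Just to show what "alter the estimator" means, here's a rough sketch of a hand-coded maximum likelihood estimator. It's in plain Python, not MATA or MATLAB, purely for illustration. The exponential distribution, the function names (exp_loglik, maximize), the golden-section search, and the data are all my assumptions for the example. The idea is that swapping in another one-parameter distribution should mostly mean rewriting the log-likelihood function.

```python
import math

def exp_loglik(rate, data):
    """Log-likelihood of i.i.d. exponential data at the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)

def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

# Made-up sample, for illustration only.
data = [0.5, 1.2, 0.3, 2.0, 1.0]

# Numerical estimate: maximize the log-likelihood over the rate.
rate_hat = maximize(lambda r: exp_loglik(r, data), 1e-6, 10.0)

# For the exponential the MLE has a closed form (n / sum of data),
# which gives a check on the numerical answer.
closed_form = len(data) / sum(data)

print(rate_hat, closed_form)
```

A real estimator would use a proper optimizer and handle more than one parameter, but the structure (write the likelihood, hand it to a maximizer) is the part you end up editing.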

SPSS just sucks. Don't use it. Who wants to only click their way through life??
