schussman.com

Meaningful terms

I like this post by Drew Thomas on the language we use for regression and linear models. Drew argues that casual use of the term “regression” doesn’t adequately describe linear modeling:

“Regression” literally means “the act of going back.” If we accept this definition in this context, we have to have something to which we can return. Clearly, this implies discovering the mean – but chronologically, it can only mean discovering the cause, that which came before.

Linear modelling makes no explicit assumptions about cause and effect, a major source of headache in our discipline, but the word itself, consciously or otherwise, binds us to this fact.

It also grates on his sensibilities to hear “regress” used as a verb. “We regressed bar on foo” has always seemed like an awkward phrasing to me, as well. Drew acknowledges that changing terminology is a tough business, but the call to be more precise in the language of our methodologies and models is one that I can get behind.
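
To make the phrase concrete, here is a minimal sketch in Python (the posts here are about R, and the numbers below are toy values invented for illustration, not data from Drew's post). “Regressing bar on foo” conventionally means fitting bar ≈ a + b·foo by ordinary least squares, with bar as the response and foo as the predictor. Swapping the roles fits a different line, but nothing in either fit says which variable causes the other, which is Drew's point.

```python
def simple_ols(x, y):
    """Fit y ≈ a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-deviation of x and y, and squared deviation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data; "foo" and "bar" are placeholder names.
foo = [1.0, 2.0, 3.0, 4.0]
bar = [3.0, 5.0, 7.0, 9.0]

# "Regress bar on foo": bar is the response, foo the predictor.
print(simple_ols(foo, bar))  # (1.0, 2.0)

# "Regress foo on bar" swaps the roles and yields a different line.
print(simple_ols(bar, foo))  # (-0.5, 0.5)
```

In R the first fit would usually be written with a formula such as `lm(bar ~ foo)`, which arguably describes what is being done more plainly than the verb “regress.”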

useR! 2006

The second international conference of R users recently took place in Vienna, and the conference site has now posted slides and abstracts of both the keynotes and the regular presentations. There’s a ton of stuff there: Discussions of R for all sorts of statistical and graphics purposes, using R in teaching, and talks about R in a wide variety of disciplines and practices. It’s a gold mine.

R-Help, 1 April edition

The R-Help mailing list is a wealth of information. While I have no doubt (well, mostly) that the people who frequent the mailing list are all nice people in real life, woe betide the newbie who asks a question that is easily answerable by consulting one of a half-dozen arcane texts, conducting an exhaustive search of list archives, or using R’s internal help system. “Read the posting guide!” will be accompanied by a curt response that often suggests how truly easy it was to find this answer for anyone not still working on their own cell division. (Okay, it’s not really that bad, but it can certainly be an intimidating place due to the sheer number of super-smart residents who have little tolerance for perceived time-wasters.)

This all adds up to my not being quite sure how to take today’s April Fool’s posts by list heavyweight Frank Harrell.

I have never taken a statistics class nor read a statistics text, but I am in dire need of help with a trivial data analysis problem for which I need to write a report in two hours. I have spent 10,000 hours of study in my field of expertise (high frequency noise-making plant biology) but I’ve always thought that statistics is something that can be mastered on short notice.

Briefly, I have an experiment in which a response variable is repeatedly measured at 1-day intervals, except that after a plant becomes sick, it is measured every three days. We forgot to randomize on one of the important variables (soil pH) and we forgot to measure the soil pH. Plants that begin to respond to treatment are harvested and eaten (deep fried if they don’t look so good), but we want to make an inference about long-term responses.

There’s more, including a couple of helpful responses, so, you know, read the whole thing. The message ends with this conclusion, which is actually fairly representative of a good number of frantic help-me posts: “I would appreciate receiving a few paragraphs of description of the analysis that I can include in my report, and I would like to receive R code to analyze the data no matter which variables I collect. I do value your time, so you will get my everlasting thanks.”

Take-home message: Read the posting guide, design your analysis carefully, and don’t look crossways at Frank Harrell in a dark alley.

Zelig

Looking up some information on ordinal logit models in R today, I came across Zelig. Produced by Kosuke Imai, Gary King, and Olivia Lau, Zelig is a sort of R meta-package that wraps, for example, various functions of the MASS or VGAM packages into a smaller set of commands. The authors subtitled Zelig “Everyone’s Statistical Software,” with the idea that it will make R more accessible while not cutting off any of its power. Zelig appears to be a couple of years old now, but I hadn’t run into it before; maybe I’m hanging around on the wrong mailing lists?
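
The general idea of one front-end command sitting over many model-specific functions can be sketched as a dispatch table. This is a hypothetical Python analog, not Zelig's actual R interface: the `fit` function, the model names `"mean"` and `"ls"`, and the two toy back ends are all illustrative assumptions standing in for the package functions a real wrapper would delegate to.

```python
from typing import Callable, Dict, Sequence


def _fit_mean(x: Sequence[float], y: Sequence[float]) -> dict:
    """Hypothetical back end: intercept-only model that predicts the mean of y."""
    return {"intercept": sum(y) / len(y)}


def _fit_ls(x: Sequence[float], y: Sequence[float]) -> dict:
    """Hypothetical back end: simple least-squares line y ≈ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return {"intercept": my - slope * mx, "slope": slope}


# Registry mapping a short model name to the function that fits it.
BACKENDS: Dict[str, Callable[[Sequence[float], Sequence[float]], dict]] = {
    "mean": _fit_mean,
    "ls": _fit_ls,
}


def fit(model: str, x: Sequence[float], y: Sequence[float]) -> dict:
    """Single user-facing command: look up the back end by name and delegate."""
    try:
        backend = BACKENDS[model]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; available: {sorted(BACKENDS)}"
        ) from None
    result = backend(x, y)
    result["model"] = model  # record which back end produced the result
    return result


print(fit("ls", [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
# {'intercept': 1.0, 'slope': 2.0, 'model': 'ls'}
```

In this design, supporting another model (an ordinal logit, say) means registering one more back-end function, while users keep calling the same `fit` command.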

Nice find: Dataninja

Dataninja was just the right site to stumble across tonight. The production of an “economist and (future) economics PhD student,” Dataninja is packed full of good data and workflow stuff: Techniques to convert from spreadsheets to LaTeX code, tips for working with Stata, R pointers (including homemade reference cards), applescripts, programming tools, links to data sets, and more. As they say, read the whole thing.
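
As an illustration of the spreadsheet-to-LaTeX chore, here is a minimal Python sketch. It is not Dataninja's technique; it assumes the spreadsheet has already been exported to CSV, treats the first row as a header, left-aligns every column, and escapes only the common LaTeX special characters. The function name `csv_to_tabular` is hypothetical.

```python
import csv
import io

# Characters with special meaning in LaTeX text mode, mapped to safe escapes.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def _escape(cell: str) -> str:
    """Escape a cell character by character so escapes are never re-escaped."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in cell)


def _row(cells, ncols: int) -> str:
    """Pad a row to ncols cells and join it as one tabular line."""
    padded = list(cells) + [""] * (ncols - len(cells))
    return " & ".join(_escape(c) for c in padded) + r" \\"


def csv_to_tabular(csv_text: str) -> str:
    """Convert CSV text (first row = header) into a LaTeX tabular environment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    header, *body = rows
    lines = [r"\begin{tabular}{" + "l" * ncols + "}", r"\hline", _row(header, ncols), r"\hline"]
    lines.extend(_row(r, ncols) for r in body)
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Toy input for illustration only.
print(csv_to_tabular("group,share\nA_1,50%\nB,50%\n"))
```

Escaping character by character, rather than with chained string replacements, keeps the backslash rule from mangling the escapes produced for the other characters.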

It’s a great resource.


About, the short version

I’m a sociologist. This site is powered by Textpattern, TextDrive and the sociological imagination. For more about me and this site, see the long version.
