Tuesday, April 24, 2007

We got both kinds of statistics...

One of the things that consistently gets confused when analyzing baseball is the role of certain statistics. In particular, there is one distinction that is constantly blurred, to the great confusion of everyone involved: the difference between forward-looking and backward-looking statistics.

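I like to frame this difference as predictive statistics versus descriptive statistics. Predictive statistics are useful for projecting future performance, but often do not do a good job of describing past performance. Descriptive statistics are the reverse: they summarize what has already happened, but are often poor guides to what comes next.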

What are the characteristics of a good predictive statistic? A good predictive statistic should be relatively stable from year to year. It should be useful in models designed to predict and plan for upcoming seasons.
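One rough way to check "stable from year to year" is to correlate each player's value in one season with the same player's value in the next. The sketch below is only an illustration of that idea: the function names and the dictionary layout (player name mapped to a season total or rate) are assumptions for the example, not anything taken from a particular stats package.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_stability(season_a, season_b):
    """Correlate a statistic across two seasons.

    season_a and season_b are hypothetical dicts mapping player name
    to that player's value of the statistic in each season. Only
    players who appear in both seasons are compared.
    """
    players = sorted(set(season_a) & set(season_b))
    return pearson(
        [season_a[p] for p in players],
        [season_b[p] for p in players],
    )
```

A value near 1 means players tend to keep their relative standing from one season to the next, which is the property you want before leaning on a statistic for projections. A value near 0 means last year's number tells you little about this year's.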

What are the characteristics of a good descriptive statistic? A good descriptive statistic should inform about a player's past performance. A good set of descriptive statistics should provide a good feel for how valuable a player has been in the past.

A problem arises when people try to use descriptive statistics to predict and predictive statistics to describe. The former case is probably more common, because descriptive statistics have been around longer, and it has become common practice to use them to quantify future player value. It is quite common to see someone praise the acquisition of a player because "he's a run producer", meaning that he has had a lot of RBIs in the past.

Unfortunately, RBIs are not very predictive. They are far too context-sensitive to be useful in projecting player performance: a hitter's RBI total depends heavily on how often his teammates are on base when he comes up. RBIs are a descriptive statistic: they tell you what a player has done, and they do it quite well. One of the things I want to know about any given baseball game is which players got the big hits. RBIs help with this.

Perhaps even more annoying, because it occurs in the objective analysis community, which ought to know better, is the tendency to use predictive statistics to describe past performance. A statistic like DIPS (defense-independent pitching statistics; essentially a defense-independent ERA) is excellent for projecting future performance, but it doesn't really tell you what a player has done. Does anyone really care, excepting implications for future performance, whether Chien-Ming Wang throws a seven-inning, zero-run game with two strikeouts or eight strikeouts? Not really. But a statistic like DIPS will penalize a pitcher for not striking anyone out, regardless of how well the pitcher actually performed.
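To make the strikeout penalty concrete, here is a rough sketch using FIP (Fielding Independent Pitching), a well-known simplified estimator in the DIPS family. It is not the original DIPS calculation, and the league constant of 3.20 is an assumption (the real constant is recomputed each season). The walk total in the example is also invented purely for illustration.

```python
def fip_like(home_runs, walks, hit_by_pitch, strikeouts, innings,
             league_constant=3.20):
    """FIP-style defense-independent estimate of ERA.

    Uses the standard FIP weights (13 per HR, 3 per BB or HBP,
    -2 per K). league_constant is an assumed value; in practice it is
    set each season so that league FIP matches league ERA.
    """
    return (
        13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    ) / innings + league_constant


# Two hypothetical seven-inning, zero-run starts that differ only in
# strikeouts. The single walk is made up for the example.
two_k_start = fip_like(home_runs=0, walks=1, hit_by_pitch=0,
                       strikeouts=2, innings=7.0)
eight_k_start = fip_like(home_runs=0, walks=1, hit_by_pitch=0,
                         strikeouts=8, innings=7.0)

print(f"2 K start: {two_k_start:.2f}")
print(f"8 K start: {eight_k_start:.2f}")
```

Under these assumptions the two-strikeout start grades out around 3.06 and the eight-strikeout start around 1.34, even though both put the same zero on the scoreboard. That gap is useful for projection and misleading as a description of what happened in the game.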

Of course, not every statistic is purely descriptive or predictive. VORP does a good job going both ways, for example. However, one should always be mindful of the context in which a statistic is being used before using it to draw conclusions.

2 comments:

D.Cous. said...

Two things:

1.) Shouldn't it be "RsBI," and not "RBIs?"

2.) You should put a link in your sidebar to a glossary of terms, for those of us who had to learn stats independent of baseball.

3.) Surprise.

4.) Fear.

Unknown said...

1) Yes, it should be RsBI, but this annoys some people so much that I throw them a bone and say RBIs...

2) This is a good idea. I'll have to figure out how to do that.

3) FETCH...

4) ...THE COMFY CHAIR!!