Basebology (The Study of Baseball): We got both kinds of statistics...

Tuesday, April 24, 2007

One thing that consistently gets confused in baseball analysis is the role of certain statistics. In particular, one distinction is constantly blurred, to the great confusion of everyone involved: the difference between forward-looking and backward-looking statistics.

I like to frame this difference as predictive statistics versus descriptive statistics. Predictive statistics are useful for projecting future performance, but often do not do a good job of describing past performance.

What are the characteristics of a good predictive statistic? A good predictive statistic should be relatively stable from year to year. It should be useful in models designed to predict and plan for upcoming seasons.
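
One rough way to check the "stable from year to year" property is to correlate each player's value in one season with his value in the next. The sketch below is only an illustration: the function name `pearson` and the two lists of numbers are made up for the example and are not real player data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values of some statistic for five hypothetical players, NOT real data.
stat_year_1 = [0.320, 0.350, 0.380, 0.400, 0.330]
stat_year_2 = [0.325, 0.345, 0.370, 0.395, 0.340]

# A value near 1 suggests the statistic carries over from one season to the next.
print(round(pearson(stat_year_1, stat_year_2), 3))
```

A statistic whose year-to-year correlation is close to zero tells you little about next season, no matter how well it describes the last one.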

What are the characteristics of a good descriptive statistic? A good descriptive statistic should inform about a player's past performance. A good set of descriptive statistics should provide a good feel for how valuable a player has been in the past.

A problem arises when people try to use descriptive statistics to predict and predictive statistics to describe. The former case is probably more common, because descriptive statistics have been around longer, and it has become common practice to use them to quantify future player value. It is quite common to see someone praise the acquisition of a player because "he's a run producer," meaning that he has had a lot of RBIs in the past.

Unfortunately, RBIs are not very predictive. They are far too context-sensitive to be useful in projecting player performance, since a hitter's total depends heavily on how often his teammates are on base ahead of him. RBIs are a descriptive statistic: they tell you what a player has done, and they do it quite well. One of the things I want to know about any given baseball game is which players got the big hits. RBIs help with this.

Perhaps even more annoying, because it occurs in the objective analysis community, which ought to know better, is the tendency to use predictive statistics to describe past performance. A statistic like DIPS (defense-independent ERA, essentially) is excellent for projecting future performance, but it doesn't really tell you what a player has done. Does anyone really care, except for the implications for future performance, whether Chien-Ming Wang throws a seven-inning, zero-run game with two strikeouts or eight strikeouts? Not really. But a statistic like DIPS will penalize a pitcher for not striking anyone out, regardless of how well he actually performed.
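
To make the strikeout penalty concrete, here is a rough sketch using FIP, a later DIPS-style formula, as a stand-in. It is not necessarily the DIPS calculation referred to above. The function name `fip`, the one walk and zero home runs in each start, and the constant of 3.10 are all assumptions for illustration; the real constant varies by league and season.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """FIP, a DIPS-style estimator built only from HR, BB, HBP, and K.

    The constant varies by league and season; 3.10 is an assumed placeholder.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings + constant


# Two hypothetical seven-inning, zero-run starts.
# The single walk and zero home runs are assumed, not taken from any real game.
low_k = fip(home_runs=0, walks=1, hit_by_pitch=0, strikeouts=2, innings=7)
high_k = fip(home_runs=0, walks=1, hit_by_pitch=0, strikeouts=8, innings=7)

# Same runs allowed (zero), but the low-strikeout start grades out worse.
print(round(low_k, 2), round(high_k, 2))
```

Both starts are identical on the scoreboard, yet the formula rates them differently. That is exactly what you want from a projection tool and exactly what you don't want from a description of the game.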

Of course, not every statistic is purely descriptive or predictive. VORP does a good job going both ways, for example. However, one should always be mindful of the context in which a statistic is being used before using it to draw conclusions.

2 comments:

D.Cous. said...: Two things:

1.) Shouldn't it be "RsBI," and not "RBIs?"

2.) You should put a link in your sidebar to a glossary of terms, for those of us who had to learn stats independent of baseball.

3.) Surprise.

4.) Fear.

April 25, 2007 at 8:41 AM

Anonymous said...: 1) Yes, it should be RsBI, but this annoys some people so much that I throw them a bone and say RBIs...

2) This is a good idea. I'll have to figure out how to do that.

3) FETCH...

4) ...THE COMFY CHAIR!!

April 25, 2007 at 10:12 PM

Key Stats

ARP
Adjusted Runs Prevented

ARP measures the number of runs that a relief pitcher prevented from scoring beyond what an average relief pitcher would have prevented. ARP is adjusted for the situation in which the pitcher was used.

ISO
Isolated Power

ISO is the ratio of extra bases that a player has accumulated to the number of at-bats he has received. ISO is essentially a player's SLG minus his batting average. This has the effect of giving a player credit only for extra-base hits. ISO is not a useful measure of player value on its own, but it is a very effective measure of a player's extra-base ability.
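
As a quick sketch of that definition, the snippet below counts one extra base for a double, two for a triple, and three for a home run, then checks the result against SLG minus batting average. The function name `isolated_power` and the season line are made up for illustration.

```python
def isolated_power(at_bats, doubles, triples, home_runs):
    """Extra bases per at-bat: 1 per double, 2 per triple, 3 per home run."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    extra_bases = doubles + 2 * triples + 3 * home_runs
    return extra_bases / at_bats


# Hypothetical season line, not a real player.
ab, hits, doubles, triples, homers = 500, 150, 30, 5, 20
singles = hits - doubles - triples - homers

iso = isolated_power(ab, doubles, triples, homers)
avg = hits / ab
slg = (singles + 2 * doubles + 3 * triples + 4 * homers) / ab

# ISO matches SLG minus batting average, up to floating-point rounding.
print(round(iso, 3), round(slg - avg, 3))
```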

OBP
On Base Percentage

OBP is the ratio of the number of times a player reached base safely to the number of opportunities he had to reach base. It effectively measures a player's skill at not making outs. Since outs are a team's most precious commodity, OBP measures perhaps the most valuable and fundamental skill a player can have.
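
A minimal sketch of the ratio, assuming the standard formula in which hits, walks, and hit-by-pitches count as times on base, and at-bats, walks, hit-by-pitches, and sacrifice flies count as opportunities. The function name and the sample numbers are hypothetical.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by opportunities, using the standard OBP formula."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    if opportunities <= 0:
        raise ValueError("player has no opportunities to reach base")
    return times_on_base / opportunities


# Hypothetical season line, not a real player.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(round(obp, 3))
```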

OPS
On Base Plus Slugging Percentage

OPS is a crude metric that simply sums a player's on-base and slugging percentages. It is probably the most popular non-traditional measure of overall batting performance due to its simplicity. However, it has drawn criticism from performance analysts for its inaccuracy relative to other advanced metrics, and because it adds two rates with different denominators to produce a conceptually meaningless quantity. It is best used as a quick-and-dirty estimator of batting prowess.
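
The denominator complaint is easy to see in a small sketch. The version below assumes the standard OBP and SLG formulas; the function name `ops_components` and the season line are invented for the example.

```python
def ops_components(hits, doubles, triples, home_runs,
                   walks, hit_by_pitch, at_bats, sac_flies):
    """Return (obp, slg, obp_denominator, slg_denominator) for one stat line."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    obp_denominator = at_bats + walks + hit_by_pitch + sac_flies
    slg_denominator = at_bats
    obp = (hits + walks + hit_by_pitch) / obp_denominator
    slg = total_bases / slg_denominator
    return obp, slg, obp_denominator, slg_denominator


# Hypothetical season line, not a real player.
obp, slg, obp_den, slg_den = ops_components(
    hits=150, doubles=30, triples=5, home_runs=20,
    walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5,
)

# OPS just adds the two rates, even though they are per different things.
ops = obp + slg
print(round(ops, 3), obp_den, slg_den)
```

One rate is per opportunity to reach base and the other is per at-bat, so the sum has no natural unit. It still tracks offense well enough to be handy.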

SLG
Slugging Percentage

SLG is the ratio of total bases that a player has accumulated to the number of at-bats he has received. It is essentially a weighted batting average that gives a player more credit for extra-base hits.
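
A short sketch of the "weighted batting average" idea: batting average weights every hit as 1, while SLG weights each hit by the bases it is worth. The function names and the numbers are made up for illustration.

```python
# Bases credited per hit type in slugging percentage.
BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def slugging_percentage(at_bats, hit_counts):
    """Total bases per at-bat; hit_counts maps hit type to number of hits."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = sum(BASES[kind] * count for kind, count in hit_counts.items())
    return total_bases / at_bats


def batting_average(at_bats, hit_counts):
    """Hits per at-bat; every hit counts the same."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return sum(hit_counts.values()) / at_bats


# Hypothetical season line, not a real player.
season = {"single": 95, "double": 30, "triple": 5, "home_run": 20}
print(batting_average(500, season), slugging_percentage(500, season))
```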

UZR
Ultimate Zone Rating

UZR is a defensive metric that uses play-by-play data to determine how good a player's defense is. On Fangraphs, it is denominated in runs saved above average.

VORP
Value Over Replacement Player

VORP measures the number of runs that a player contributed above what a "replacement player" at the same position would produce. VORP considers only offensive contributions.

WARP
Wins Above Replacement Player

WARP measures the number of wins that a player contributed above what a "replacement player" at the same position would produce. WARP considers both offensive and defensive contributions.

WXRL
Win Expectancy added above Replacement adjusted for Lineup

WXRL measures the number of wins that a relief pitcher contributed above what a "replacement player" would produce. WXRL differs from WARP because it is adjusted for both the game situation in which the pitcher was used and the hitters that the pitcher faced.
