Sunday, April 15, 2007

Raison D'Être (or: How I Learned to Stop Yelling and Write About Todd Jones)

Blame Todd Jones.

It's really his fault that I've decided to cave in and start blogging. For whatever reason, he was the straw that broke this camel's back. It was the random, pointless, groundless praise of Todd Jones that sent me from simply existing as a slightly repressed, statistically inclined baseball fan to my new life as a full-blown blogger. It's his fault.

And so I've started a blog devoted to baseball. This is neither bold nor original; there are many, many excellent blogs which I will probably reference frequently. If a thought appears in this space that strikes you as original or relevant, it probably appeared elsewhere first. I'll do my best to cite sources.

So what was it about Mr. Jones that set me off? It was this quote (from this article), referenced by my father in conversation:

"The Tigers aren't looking for anyone to take Jones' job, not when [he] is 5-for-5 in saves this year, and 28-for-31 with a 1.26 ERA dating back to last June."

That Todd Jones has been on a roll in his last 31 save opportunities is evident from this blurb.

Unfortunately, the implication of this quote, that Jones is likely to continue this success, is not nearly so evident. In fact, the evidence presented is beyond poor. Frankly, it sucks.

It's also typical. This is the exact type of statistic that the media always uses to make a point. It's driven by the story. An aspiring beat writer wants to write a story about Todd Jones (or Joel Zumaya, as the case may be) and so he digs until he finds a statistic that proves the point of his article. It's a complete and total misuse of statistics. Worse, these stats are repeated ad nauseam by fans who don't know any better. At some point, I'm gonna post a baseball fan's primer on statistical analysis.

There are three main errors in this short quote:

  1. Saves are a worthwhile measuring stick for a relief pitcher's performance.

    Saves are rubbish. Instead of getting needlessly theoretical, let me simply illustrate with an example.

    Suppose you have a relief pitcher, your closer, who is guaranteed to post a 9.00 ERA. In fact, he does this by alternating one-inning appearances where he surrenders zero runs with one-inning appearances where he surrenders two runs. Unfortunately, he makes $12 million a year, and your general manager has mandated that he close all possible games this year to justify the expense.

    How many saves will your closer rack up? Naturally, this depends on how many chances he gets. These days, a closer on a decent team can easily get 40+ save opportunities. Let's arbitrarily choose 45 opportunities for our example. Being even more arbitrary, let's say that 15 of them are one-run save opportunities, 15 are two-run opportunities, and 15 are three-run opportunities.

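    We would expect our closer to convert half of the one- and two-run save opportunities: in his zero-run outings he converts them, and in his two-run outings he either loses the one-run lead or ties the game with the two-run lead. He will convert all of the three-run save opportunities. That works out to 7.5 + 7.5 + 15, so we expect him to end the season with fully 30 saves. All this despite the fact that he completely, totally sucks.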

    Todd Jones is easily twice this good. It's hard to be a closer and not be at least twice as effective as our mythical disaster of a closer. The number of saves he accumulates (like all closers) is primarily a function of how easy it is to record a save. I know you've always been told it's hard. It's not. I hope this example, while contrived, makes that clear. Almost any pitcher can accumulate a massive save total, given enough opportunities. This does not make him a good closer.

  2. Streak statistics are good predictors of future performance.

    Streak statistics are a great way to cherry pick only the information that helps you make your point. Why is what Todd Jones did prior to last June not relevant? In this case, it's not relevant because it makes Jones' numbers look worse and weakens the writer's point. Todd Jones in April, May and June of 2006: 2.25 ERA in 4.0 IP, 6.00 ERA in 15.0 IP, and 7.62 ERA in 13.0 IP. In this time, he blew only three of 24 save opportunities. Once again, it is totally possible to pitch like crap and rack up saves. It happens every year.

    Furthermore, streak statistics have been repeatedly shown to have virtually no predictive value. When trying to predict a streaking player's immediate future performance, it's best to ignore the streak altogether and simply use the best objective measure available (incorporating data from the streak, of course, just not the streak itself). For reference, read this book.

  3. Small Sample Size.

    This is probably the most consistent issue with conventional use of statistics. In the above sample, Todd Jones has pitched fewer than 40 innings. It's impossible to derive useful information on the basis of 40 innings pitched. It just isn't enough data. Even an entire season's worth of data isn't enough to effectively analyze a relief pitcher statistically. This is why relievers are so tough to predict and why they are consistently given contracts that end up being albatrosses (albatri?!).
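
For anyone who wants to check the arithmetic behind the mythical closer in point 1, here is a minimal Python sketch. It assumes exactly what the example assumes: every outing is one inning, the alternating pattern means half his outings allow zero runs and half allow two, and a save opportunity is blown whenever he allows at least as many runs as the lead (a tie counts as blown). The function and variable names are made up for illustration.

```python
from fractions import Fraction

# Hypothetical model of the mythical closer: one-inning outings that
# alternate between 0 and 2 runs, so each outcome is half his outings.
RUNS_PER_OUTING = {0: Fraction(1, 2), 2: Fraction(1, 2)}

# 45 save opportunities, split evenly by the size of the lead (in runs).
OPPORTUNITIES_BY_LEAD = {1: 15, 2: 15, 3: 15}


def era(dist):
    """ERA = earned runs per nine innings; every outing is one inning."""
    expected_runs_per_inning = sum(runs * p for runs, p in dist.items())
    return 9 * expected_runs_per_inning


def expected_saves(dist, opportunities):
    """A save counts only if he allows fewer runs than the lead;
    allowing as many runs as the lead ties the game (blown save)."""
    total = Fraction(0)
    for lead, count in opportunities.items():
        p_convert = sum(p for runs, p in dist.items() if runs < lead)
        total += count * p_convert
    return total


print(era(RUNS_PER_OUTING))                                    # 9
print(expected_saves(RUNS_PER_OUTING, OPPORTUNITIES_BY_LEAD))  # 30
```

Exact fractions keep the 7.5-save halves from picking up floating-point noise. Swap in a different run distribution and the same function tells you how many saves it buys; the three-run opportunities alone hand him 15 saves as long as he never allows three runs in an inning.
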
The proper use of statistics works like this (grossly simplified):
  1. Determine a problem that you would like to analyze statistically.
  2. Derive a statistic that, given reasonable assumptions, will describe the area you wish to analyze while minimizing interference from areas that you do not wish to measure.
  3. Calculate the statistic for each subject.
  4. Let the results speak for themselves, even if you don't like them.
  5. When presenting the information, supply necessary context.
It's the fourth and fifth points that are most consistently violated. When people don't like the results, they tend to throw them out and use a different statistic. When people find a statistic that makes their point, they often leave out the context so that it sounds more impressive.
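
Supplying the context the quote leaves out is mostly arithmetic. Monthly ERAs can't simply be averaged; they have to be weighted by innings pitched. The Python sketch below backs earned runs out of the April, May and June 2006 lines quoted in point 2 and pools them. It assumes each published ERA was rounded to two decimals, so ERA × IP / 9, rounded to the nearest whole run, recovers the earned runs. The names are hypothetical.

```python
# Todd Jones's April, May and June 2006 lines as quoted in point 2:
# (month, ERA, innings pitched). All three IP figures are whole innings,
# so baseball's ".1 = one third, .2 = two thirds" notation never comes up.
MONTHS = [("April", 2.25, 4.0), ("May", 6.00, 15.0), ("June", 7.62, 13.0)]


def earned_runs(month_era, ip):
    """Back out earned runs from ERA = 9 * ER / IP, assuming the published
    ERA was rounded, so the nearest whole number is the true count."""
    return round(month_era * ip / 9)


def pooled_era(splits):
    """Weight each split by its innings: 9 * total ER / total IP."""
    total_er = sum(earned_runs(month_era, ip) for _, month_era, ip in splits)
    total_ip = sum(ip for _, _, ip in splits)
    return 9 * total_er / total_ip


for month, month_era, ip in MONTHS:
    print(month, earned_runs(month_era, ip))  # April 1, May 10, June 11
print(pooled_era(MONTHS))                     # 6.1875

# The naive (and wrong) unweighted average of the three monthly ERAs:
naive = sum(month_era for _, month_era, _ in MONTHS) / len(MONTHS)
print(f"{naive:.2f}")                         # 5.29
```

Under that rounding assumption the three months pool to 22 earned runs in 32 innings, about a 6.19 ERA. The unweighted average understates it because the worst months carried the most innings.
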

In this case, let's look at two stats: Adjusted Runs Prevented (ARP), and Lineup Adjusted Wins Above Replacement (WXRL), both from Baseball Prospectus.

ARP measures how many runs a pitcher prevented compared to a completely average pitcher. It gives relievers credit for stranding the runners they inherit, and it charges them for the runners they leave on base for the next pitcher. It is adjusted for quality of competition and the park in which the pitcher was pitching. Details here.

WXRL is similar, but has two key differences: it measures a pitcher's contribution above a replacement level pitcher (who is much worse than an average pitcher), and it measures wins instead of runs, thus accounting for the game situation more completely.

Todd Jones, as measured by ARP and WXRL, since 2000 (picked for its roundness):

Year  Team        IP    ARP    WXRL
2000  DET       64.0   14.1   4.469
2001  DET/MIN   68.0   -8.0  -1.482
2002  COL       82.3    0.8   3.459
2003  BOS/COL   68.2  -18.1  -1.031
2004  CIN/PHI   82.1    8.4   3.465
2005  FLO       73.0   22.8   4.852
2006  DET       64.0    3.6   2.298

The thing to notice right away is the extreme amount of variation: in 2003 Todd Jones was probably the worst relief pitcher in baseball. In 2005, he was one of the best. For context, the best relief pitchers in baseball have around 30-40 ARP and usually 6-8 WXRL. Average ARP is around zero, by definition. Replacement level WXRL is around zero, by definition.
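
To put numbers on that variation, here is a short Python sketch that summarizes the ARP and WXRL columns of the table above. The values are copied straight from the table; the IP column is left out because it isn't needed here, and the helper name is made up.

```python
from statistics import mean, stdev

# (year, ARP, WXRL) for Todd Jones, 2000-2006, copied from the table above.
SEASONS = [
    (2000, 14.1, 4.469),
    (2001, -8.0, -1.482),
    (2002, 0.8, 3.459),
    (2003, -18.1, -1.031),
    (2004, 8.4, 3.465),
    (2005, 22.8, 4.852),
    (2006, 3.6, 2.298),
]

arp = [a for _, a, _ in SEASONS]
wxrl = [w for _, _, w in SEASONS]


def summarize(label, values):
    """Print the center and the season-to-season spread of one column."""
    print(f"{label}: mean {mean(values):.2f}, "
          f"worst {min(values)}, best {max(values)}, "
          f"spread {max(values) - min(values):.1f}, "
          f"std dev {stdev(values):.2f}")


summarize("ARP", arp)
summarize("WXRL", wxrl)
```

Over the seven seasons his ARP averages about 3.4 runs, close to the zero that marks an average pitcher. The gap between his best and worst seasons is about 41 runs.
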

So what is Todd Jones? He's a roughly average MLB relief pitcher. Like almost every other relief pitcher, in any given season he is capable of being abjectly horrible or more than adequate through nothing more than pure, random chance (from an analyst's perspective). He isn't the type of guy you build your pen around and he certainly isn't the guy you want to hold up as an example if you're trying to make the Tigers sound scary.
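
How much of that swing can pure chance produce? A toy Monte Carlo simulation in Python gives a feel for it. It assumes a hypothetical pitcher whose true talent never changes and whose innings are independent. Runs per inning follow a made-up distribution: 0, 1, 2 or 3 runs with weights 0.70, 0.17, 0.08 and 0.05, which works out to a 4.32 ERA. It ignores defense, parks, inherited runners and everything else ARP adjusts for, so treat it as an illustration of sampling noise, not a model of Todd Jones.

```python
import random

# Made-up per-inning run distribution (not fitted to real data).
# Expected runs per inning = 0*0.70 + 1*0.17 + 2*0.08 + 3*0.05 = 0.48.
RUNS = [0, 1, 2, 3]
WEIGHTS = [0.70, 0.17, 0.08, 0.05]
TRUE_ERA = 9 * sum(r * w for r, w in zip(RUNS, WEIGHTS))  # about 4.32


def simulate_era(rng, innings):
    """ERA over `innings` independent one-inning outings."""
    runs = rng.choices(RUNS, weights=WEIGHTS, k=innings)
    return 9 * sum(runs) / innings


def era_range(innings, n_seasons=1000, seed=2007):
    """Rough 5th and 95th percentile ERA across simulated seasons."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    eras = sorted(simulate_era(rng, innings) for _ in range(n_seasons))
    cut = n_seasons // 20
    return eras[cut], eras[n_seasons - 1 - cut]


for innings in (40, 64, 640):
    low, high = era_range(innings)
    print(f"{innings:4d} IP: 90% of seasons between {low:.2f} and "
          f"{high:.2f} (true ERA {TRUE_ERA:.2f})")
```

The 40- and 64-inning ranges should come out several times wider than the 640-inning one. A reliever's single season is nowhere near enough innings to separate talent from luck.
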

So that's it. That's my initial foray into the jungle of intarweb blogging.

Wish me luck.

Blame Todd Jones.

**EDIT** I can't get the formatting right. For some reason, the statistics mess up the default page format such that even when the stats block ends, the formatting doesn't return to normal. Weird.

**EDIT #2** OK. It seems that changing the template is good enough to fix the funky formatting. Plus, it just looks better anyway.

3 comments:

E. W. Lynch said...

First!

John Lynch said...

Touché.

E. W. Lynch said...

Directly after reading this, I walked back into the laundry where I work and listened to Todd Jones walk his first batter, and then give up three hits and two runs in the top of the ninth to blow a 3-1 lead. They would go on to lose the game in the tenth. What does that make Todd's WXRL just for today?