Friday, April 27, 2007

Baseball and Science

I'm going to take a step back from baseball for a second to do some meta-scientific thinking. This is important because so much of what I talk about when I talk about baseball is essentially scientific analysis. Therefore, it is important to understand what science is and what it provides us.

Webster's defines science as "a department of systematized knowledge as an object of study." That works, but I want to offer a much more concise definition.

Science: logical conclusions drawn from empirical observations.

This definition establishes the two key parts of science: the use of tests and observations to gather data and the analysis of that data to reach reasonable conclusions about what was tested.

It's important to understand the limitations of this approach. Science always entails assumptions, margins for error, limits in precision, and other dirty details associated with measuring something as complex as the real world. Science has to deal with all sorts of interferences from unexpected agents in the systems being tested.

This is not to say that the system is not powerful. Science is an absurdly powerful methodology when properly applied. We have accumulated massive amounts of useful knowledge through science.

But what does science actually leave us with? After all the tests and all the observations and all the analysis, science presents us with a model of reality. These models work given certain assumptions. Newtonian physics holds up quite well for many real world physics problems, but breaks down on the molecular and atomic levels. These models aren't reality, but they provide us with a necessary characteristic for any sort of practical application: simplicity.

The models that science provides us are always grossly simplified compared to what actually exists. Newtonian physics treats complex combinations of many different types of atoms as solid bodies. Objects don't have a "mass;" mass is a measurement designed to quantify something otherwise impossible for us to grasp. Without this simplicity it would be impossible to apply scientific findings in any meaningful way.

So what does all this have to do with baseball?

Science applied to baseball provides us with simplified models and measurements that allow us to analyze baseball in ways we never could before. In fact, in one way, baseball is so much better a subject for science than the physical world: baseball has been categorizing and measuring itself in detail for a hundred years. It's one of the most massive controlled experiments in history.

When I talk about Derek Jeter's VORP, it's the same as if I were talking about his mass. Derek Jeter doesn't "have" a VORP. He's a complex individual in a complex system. VORP is one way that we try to quantify his effect on that complex system. Is it a massive simplification? Absolutely. It would be useless if it weren't.

Science in baseball is often skewered for being a geeky simplification of a grand human endeavor. Of course it is! That doesn't limit its analytical power. We can still reach powerful conclusions from a simple model of baseball.

And with that, I'm going to begin talking about the model that we use for a baseball player, the observations that formed it, and how we can use it to draw conclusions about real baseball.

Thursday, April 26, 2007

A Real Life Example of Why Saves Suck

This item made it into the Newsstand on Baseball Think Factory, and I wanted to echo the thought here:

Dayn Perry writes about "being on pace" in this article. There, he has this tidbit about saves:

4. The Pace: Joe Borowski is on pace for 71 saves. This would comfortably shatter Bobby Thigpen's record 57 saves in 1990.

Why It Won't Happen: What's great about this is that Borowski is on pace for 71 saves and a 10.13 ERA. Talk about conflicting evidence. His season to date provides an object lesson in the flawed nature of the save statistic: you don't necessarily have to pitch well to pile them up.

What Will Probably Happen: Borowski, provided he keeps his job, racks up enough saves to give him the whiff of success (thanks to the quality of the team he's on). In reality, however, he'll be a below-average pitcher by closer standards.

I'm going to be following up on this later, but for now this is an excellent example that underscores my previous point: pitching well and getting saves do not necessarily go hand in hand.
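
For anyone who wants the arithmetic behind an "on pace" figure, here is a minimal sketch. It assumes a 162-game schedule and plain linear extrapolation; the function names and the sample inputs are made up for illustration and are not taken from the article.

```python
SEASON_GAMES = 162  # length of the MLB regular season


def on_pace(total_so_far, team_games_played, season_games=SEASON_GAMES):
    """Linearly extrapolate a counting stat to a full season."""
    if team_games_played <= 0:
        raise ValueError("team_games_played must be positive")
    return total_so_far * season_games / team_games_played


def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


# Hypothetical closer a few weeks into the season.
print(on_pace(5, 20))  # 40.5
print(era(9, 8))       # 10.125
```

Nine earned runs in eight innings is one line that rounds to a 10.13 ERA; the article does not give Borowski's actual totals, so treat that only as a reminder of how few innings sit behind an April pace.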

Tuesday, April 24, 2007

We got both kinds of statistics...

One of the things that gets consistently confused when analyzing baseball is the role of certain statistics. In particular, there is one distinction that is constantly blurred, to the great confusion of everyone involved: the difference between forward looking and backward looking statistics.

I like to frame this difference as predictive statistics versus descriptive statistics. Predictive statistics are useful for projecting future performance, but often do not do a good job of describing past performance.

What are the characteristics of a good predictive statistic? A good predictive statistic should be relatively stable from year to year. It should be useful in models designed to predict and plan for upcoming seasons.
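
One rough way to check "stable from year to year" is to correlate each player's value in one season with his value in the next. The sketch below assumes that approach; the numbers are toy values invented for the example, not real player data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values for five players in consecutive seasons.
year_one = [10, 20, 30, 40, 50]
stable_year_two = [12, 18, 33, 38, 52]  # tracks year one closely
noisy_year_two = [35, 10, 45, 20, 30]   # mostly context and chance

stable_r = pearson(year_one, stable_year_two)
noisy_r = pearson(year_one, noisy_year_two)
print(round(stable_r, 2), round(noisy_r, 2))  # 0.99 0.0
```

A statistic that behaves like the first series is a candidate for projection work; one that behaves like the second tells you what happened and little else.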

What are the characteristics of a good descriptive statistic? A good descriptive statistic should inform about a player's past performance. A good set of descriptive statistics should provide a good feel for how valuable a player has been in the past.

A problem arises when people try to use descriptive statistics to predict and predictive statistics to describe. The former case is probably more common, because descriptive statistics have been around longer, and it has become common practice to use them to quantify future player value. It is quite common to see someone praise the acquisition of a player because "he's a run producer", meaning that he has had a lot of RBIs in the past.

Unfortunately, RBIs are not very predictive. They are far too context sensitive to be useful in projecting player performance. RBIs are a descriptive statistic: they tell you what a player has done, and they do it quite well. One of the things I want to know about any given baseball game is which players got the big hits. RBIs help with this.

Perhaps even more annoying, because it occurs in the objective analysis community which ought to know better, is the tendency to use predictive statistics to describe past performance. A statistic like DIPS (defense independent ERA, essentially) is excellent for projecting future performance, but it doesn't really tell you what a player has done. Does anyone really care, excepting implications for future performance, if Chien-Ming Wang throws a seven-inning, zero run game with two strikeouts or eight strikeouts? Not really. But a statistic like DIPS will penalize a pitcher for not striking anyone out, regardless of how well the player actually performed.
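
To make the strikeout penalty concrete, here is a sketch using FIP (Fielding Independent Pitching), a simpler statistic in the same defense-independent family, as a stand-in. It is not the DIPS calculation itself. The constant of 3.10 is an assumed round figure (the real one is recalculated for each league and season), and the walk and home run totals are invented for the example.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from home runs, walks,
    hit batters, strikeouts, and innings pitched."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two hypothetical seven-inning scoreless starts that differ only in strikeouts.
two_k = fip(hr=0, bb=1, hbp=0, k=2, ip=7)
eight_k = fip(hr=0, bb=1, hbp=0, k=8, ip=7)
print(round(two_k, 2), round(eight_k, 2))  # 2.96 1.24
```

Both starts put the same zero on the scoreboard, yet the two-strikeout version grades out more than a run and a half worse. That is fine for projection and misleading as a description of the game.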

Of course, not every statistic is purely descriptive or predictive. VORP does a good job going both ways, for example. However, one should always be mindful of the context in which a statistic is being used before using it to draw conclusions.

Monday, April 16, 2007

Perspective and the Residue of Design

"Luck is the residue of design." - Branch Rickey

Could I possibly pick a more cliché quote with which to open a post? Probably not. Nonetheless, it's one of my favorite quotes and serves as an excellent introduction to a topic that tends to be very misunderstood: luck.

In baseball, attributing an event or series of events to "luck" is usually seen as either an excuse or an insult. It tends to offend people. It's often seen as a dodge to avoid tackling the real issue. Instead of analyzing underlying events, chalking something up to luck seems to be the easy way out.

The problem is that "luck" means different things to different people. Worse yet, it should mean different things to different people. These different perspectives are often the cause of misunderstandings when people start attributing "luck" to various events. Let's take a look at each perspective.
  • The Player's Perspective

    From a player's point of view, very little that actually occurs is lucky or unlucky because he is directly able to affect what's happening in the game. Sure, a bad bounce is bad luck and the opposing team's shortstop committing a throwing error is good luck, but the players directly influence events. Even the above examples require interaction from players: to be the beneficiary of a peculiar bounce, a batter must first put the ball into play. That's not luck; it's almost all skill.

  • The Manager's Perspective

    The manager's perspective is more removed than the player's perspective. For example, not only are bad bounces and fortuitous errors highly subject to chance, so are the things that his own players do. A manager can only send a pinch hitter to the plate; he can't will him to get a clutch hit.

    If I send my elite pinch hitter, Homer Offenwalker, a .450 hitter with power in close and late situations throughout his career (yes, I know this is unlikely to be a skill), to the plate with runners on second and third, two outs, down by one in the ninth inning, and he doesn't get a hit, was it a bad decision? Of course not. It's just unlucky that Homer came up empty that time. From Homer's perspective though, it's not luck. He failed, and a lot of that has to do with his skill as a hitter.

  • The Executive's Perspective

    Let's say you're the general manager of the Oakland A's. Your name is Billy Beane. Your shit doesn't work in the playoffs. You are famous for saying that the playoffs are essentially a crap shoot. Is this true?

    From your perspective, it mostly is true. Once a GM puts a team together, he has to sit back and watch it play. If his team clicks in the postseason, he's lucky. If it forgets to slide to avoid a tag at home plate, that's unlucky. As a GM, all you can do is put the best possible team on the field; you can't make them win.

    I feel that this is the crucial misunderstanding between a lot of fans, players, and analysts (objective or otherwise). When objective analysts talk about a team being lucky or unlucky, it's almost always a reference to the GM's perspective, not that of the individual players.

    A team winning more games or fewer games than expected from its run differential is highly subject to chance from an executive's perspective, but the players are the ones making it happen. They directly contribute to each outcome. From their perspective, little of what happens is luck or chance.

  • The Fan's Perspective

    Most fans naturally assume another perspective, usually the GM's or manager's, when talking about baseball. We put ourselves in another person's position in order to critique their decisions. Most of the critiques that I make here are from the GM's perspective.

    But in reality, my perspective is even more subject to chance. Even the players that a GM acquires or the ticket prices at the ballpark are beyond my control. The only thing I can do is fork over money to make my team competitive.
What Branch Rickey means when he says, "Luck is the residue of design," is that he has no direct control over the events viewed as lucky or unlucky, but that his decisions, his design, put the pieces in place to influence that outcome. Keeping this in mind when I, and others, refer to random chance in an event will go a long way towards understanding the meaning of the analysis in question.
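
The "expected from its run differential" figure mentioned under the executive's perspective is usually estimated with something like Bill James's Pythagorean expectation. The post doesn't say which estimator it has in mind, so the formula choice, the exponent of 2, the function names, and the run totals below are all assumptions for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_expectation(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual wins minus the wins the run totals would predict."""
    expected = pythagorean_win_pct(runs_scored, runs_allowed) * games
    return actual_wins - expected


# Hypothetical team: 800 runs scored, 700 allowed, 95 actual wins.
print(round(pythagorean_win_pct(800, 700) * 162, 1))      # 91.8
print(round(wins_above_expectation(95, 800, 700), 1))     # 3.2
```

From the GM's chair, those three or so extra wins are the part that gets called luck. The players who produced them would describe it differently.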

Sunday, April 15, 2007

Raison D'Être (or: How I Learned to Stop Yelling and Write About Todd Jones)

Blame Todd Jones.

It's really his fault that I've decided to cave in and start blogging. For whatever reason, he was the straw that broke this camel's back. It was the random, pointless, groundless praise of Todd Jones that sent me from simply existing as a slightly repressed, statistically inclined baseball fan to my new life as a full-blown blogger. It's his fault.

And so I've started a blog devoted to baseball. This is neither bold nor original; there are many, many excellent blogs which I will probably reference frequently. If a thought appears in this space that strikes you as original or relevant, it probably appeared elsewhere first. I'll do my best to cite sources.

So what was it about Mr. Jones that set me off? It was this quote (from this article), referenced by my father in conversation:

"The Tigers aren't looking for anyone to take Jones' job, not when [he] is 5-for-5 in saves this year, and 28-for-31 with a 1.26 ERA dating back to last June."

That Todd Jones has been on a roll in his last 31 save opportunities is evident from this blurb.

Unfortunately, the implication of this quote, that Jones is likely to continue this success, is not nearly so evident. In fact, the evidence presented is beyond poor. Frankly, it sucks.

It's also typical. This is the exact type of statistic that the media always uses to make a point. It's driven by the story. An aspiring beat writer wants to write a story about Todd Jones (or Joel Zumaya, as the case may be) and so he digs until he finds a statistic that proves the point of his article. It's a complete and total misuse of statistics. Worse, these stats are repeated ad nauseam by fans who don't know any better. At some point, I'm gonna post a baseball fan's primer on statistical analysis.

There are three main errors in this short quote:

  1. Saves are a worthwhile measuring stick for a relief pitcher's performance.

    Saves are rubbish. Instead of getting needlessly theoretical, let me simply illustrate with an example.

    Suppose you have a relief pitcher, your closer, who is guaranteed to post a 9.00 ERA. In fact, he does this by alternating one-inning appearances where he surrenders zero runs with one-inning appearances where he surrenders two runs. Unfortunately, he makes $12 million a year, and your general manager has mandated that he close all possible games this year to justify the expense.

    How many saves will your closer rack up? Naturally, this depends on how many chances he gets. These days, a closer on a decent team can get 40+ save opportunities easily. Let's arbitrarily choose 45 opportunities for our example. Being even more arbitrary, let's say that 15 of the opportunities are one-run saves, 15 are two-run saves, and 15 are three-run saves.

    We would expect our closer to convert a full half of the one- and two-run saves, since the two runs he allows every other outing erase either lead. He will convert all of the three-run save opportunities. We expect him to end the season with fully 30 saves. All this despite the fact that he completely, totally sucks.

    Todd Jones is easily twice this good. It's hard to be a closer and not be at least twice as effective as our mythical disaster of a closer. The number of saves he accumulates (like all closers) is primarily a function of how easy it is to record a save. I know you've always been told it's hard. It's not. I hope this example, while contrived, makes that clear. Almost any pitcher can accumulate a massive save total, given enough opportunities. This does not make him a good closer.

  2. Streak statistics are good predictors of future performance.

    Streak statistics are a great way to cherry pick only the information that helps you make your point. Why is what Todd Jones did prior to last June not relevant? In this case, it's not relevant because it makes Jones' numbers look worse and weakens the writer's point. Todd Jones in April, May and June of 2006: 2.25 ERA in 4.0 IP, 6.00 ERA in 15.0 IP, and 7.62 ERA in 13.0 IP. In this time, he blew only three of 24 save opportunities. Once again, it is totally possible to pitch like crap and rack up saves. It happens every year.

    Furthermore, streak statistics have been repeatedly shown to have virtually no predictive value. When trying to predict a streaking player's immediate future performance, it's best to ignore the streak altogether and simply use the best objective measure available (incorporating data from the streak, of course, just not the streak itself). For reference, read this book.

  3. Small Sample Size.

    This is probably the most consistent issue with conventional use of statistics. In the above sample, Todd Jones has pitched fewer than 40 innings. It's impossible to derive useful information on the basis of 40 innings pitched. It just isn't enough data. Even an entire season's worth of data isn't enough to effectively analyze a relief pitcher statistically. This is why they are so tough to predict and why they are consistently given contracts that end up being albatrosses (albatri?!).
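
The contrived closer from the first point can be tallied directly. This is a minimal sketch that assumes, as the example does, one-inning saves and an even split between his scoreless and two-run outings within each group of leads; the function name is my own.

```python
def expected_saves(opportunities_by_lead, runs_allowed_outcomes):
    """Expected saves when each outing is equally likely to produce
    any of the run totals in runs_allowed_outcomes.

    A one-inning save is converted only if the closer allows fewer
    runs than the lead he was handed.
    """
    total = 0.0
    for lead, chances in opportunities_by_lead.items():
        converted = sum(1 for runs in runs_allowed_outcomes if runs < lead)
        total += chances * converted / len(runs_allowed_outcomes)
    return total


closer = [0, 2]  # alternates scoreless innings with two-run innings
print(9 * sum(closer) / len(closer))                   # ERA: 9.0
print(expected_saves({1: 15, 2: 15, 3: 15}, closer))   # 30.0
```

Thirty saves in 45 chances from a pitcher with a 9.00 ERA. Change the mix of leads and the total moves, but it never gets anywhere near zero.
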
The proper use of statistics works like this (grossly simplified):
  1. Determine a problem that you would like to analyze statistically.
  2. Derive a statistic that, given reasonable assumptions, will describe the area you wish to analyze while minimizing interference from areas that you do not wish to measure.
  3. Calculate the statistic for each subject.
  4. Let the results speak for themselves, even if you don't like them.
  5. When presenting the information, supply necessary context.
It's the fourth and fifth points that are most consistently violated. When people don't like the results, they tend to throw them out and use a different statistic. When people find a statistic that makes their point, they often leave out the context so that it sounds more impressive.

In this case, let's look at two stats: Adjusted Runs Prevented (ARP), and Win Expectation above Replacement, Lineup-adjusted (WXRL), both from Baseball Prospectus.

ARP measures how many runs a pitcher prevented compared to a completely average pitcher. It gives pitchers credit for stranding inherited runners and it penalizes them for leaving runners on base. It is adjusted for quality of competition and the park in which the pitcher was pitching. Details here.

WXRL is similar, but has two key differences: it measures a pitcher's contribution above a replacement level pitcher (who is much worse than an average pitcher), and it measures wins instead of runs, thus accounting for the game situation more completely.

Todd Jones, as measured by ARP and WXRL, since 2000 (picked for its roundness):

Year  Team      IP     ARP     WXRL
2000  DET       64.0   14.1    4.469
2001  DET/MIN   68.0   -8.0   -1.482
2002  COL       82.3    0.8    3.459
2003  BOS/COL   68.2  -18.1   -1.031
2004  CIN/PHI   82.1    8.4    3.465
2005  FLO       73.0   22.8    4.852
2006  DET       64.0    3.6    2.298

The thing to notice right away is the extreme amount of variation: in 2003 Todd Jones was probably the worst relief pitcher in baseball. In 2005, he was one of the best. For context, the best relief pitchers in baseball have around 30-40 ARP and usually 6-8 WXRL. Average ARP is around zero, by definition. Replacement level WXRL is around zero, by definition.
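
As a quick sanity check on that variation, here is a short summary of the seven seasons listed above. It reads the last two columns of the table as ARP and WXRL, which is how the surrounding text uses them; the variable names are my own.

```python
from statistics import mean, stdev

# (season, ARP, WXRL) copied from the table above
seasons = [
    (2000, 14.1, 4.469),
    (2001, -8.0, -1.482),
    (2002, 0.8, 3.459),
    (2003, -18.1, -1.031),
    (2004, 8.4, 3.465),
    (2005, 22.8, 4.852),
    (2006, 3.6, 2.298),
]

arp = [row[1] for row in seasons]
wxrl = [row[2] for row in seasons]

worst = min(seasons, key=lambda row: row[1])
best = max(seasons, key=lambda row: row[1])

print(round(mean(arp), 1))   # 3.4
print(round(mean(wxrl), 2))  # 2.29
print(worst[0], best[0])     # 2003 2005
print(round(stdev(arp), 1))  # 13.6
```

A mean ARP a few runs above zero, with a season-to-season spread several times larger than the mean: the swings dwarf the signal.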

So what is Todd Jones? He's a roughly average MLB relief pitcher. Like almost every other relief pitcher, in any given season he is capable of being abjectly horrible or more than adequate through nothing more than pure, random chance (from an analyst's perspective). He isn't the type of guy you build your pen around and he certainly isn't the guy you want to hold up as an example if you're trying to make the Tigers sound scary.

So that's it. That's my initial foray into the jungle of intarweb blogging.

Wish me luck.

Blame Todd Jones.

**EDIT** I can't get the formatting right. For some reason, the statistics mess up the default page format such that even when the stats block ends, the formatting doesn't return to normal. Weird.

**EDIT #2** OK. It seems that changing the template is good enough to fix the funky formatting. Plus, it just looks better anyway.