Tuesday, July 17, 2007

By God, I love Jason Whitlock

I don't always agree with Mr. Jason Whitlock of the Kansas City Star. That being said, everyone should read what he says right here in answering questions #2 and #3.

Well said, sir. Well said.

Monday, July 16, 2007

Does anyone here know math?

ESPN's MLB page has a poll up asking viewers which team will be the next team in Major League baseball to lose its 10,000th game as a franchise. The choices are the Braves (9681 losses), Cubs (9425 losses), and Reds (9341 losses).

Only 47% of respondents have voted for the Braves, presumably because they are a perennially good team and the Cubs and Reds are not. There's only one problem: the Braves are 256 losses "ahead" of the Cubs, with only 319 losses to go. The Cubs would have to lose at nearly twice the rate of the Braves for about the next five years to catch them. Even if the Braves play at a 100-win-per-season pace until their 10,000th loss and the Cubs play at a 100-loss-per-season pace until theirs, the Braves will still get there first.

It will take an upset of epic proportions for the Braves not to be the next team to lose their 10,000th game. For whatever reason, people are just too dumb to infer this from even the most cursory examination of the numbers.
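A minimal sketch of the arithmetic, using the loss totals from the poll and assuming a 162-game season:

```python
# Loss totals from the ESPN poll.
braves_losses, cubs_losses = 9681, 9425
braves_to_go = 10000 - braves_losses   # 319 losses remaining
cubs_to_go = 10000 - cubs_losses       # 575 losses remaining

# Extreme scenario: the Braves win 100 games a year (so about 62 losses
# per 162-game season) while the Cubs lose 100 a year.
braves_seasons = braves_to_go / 62     # roughly 5.1 seasons
cubs_seasons = cubs_to_go / 100        # roughly 5.8 seasons

# Even then, the Braves reach 10,000 losses first.
print(braves_seasons < cubs_seasons)   # True
```

Even spotting the Cubs the most lopsided plausible pace, the Braves' head start decides the race.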

Saturday, July 14, 2007

Pass

Gary Sheffield has once again proven himself to be at best ignorant and at worst a racist. I really have nothing more to say. At this point, Gary does a better job at making my point than I do.

Thursday, July 12, 2007

Statistical noise and its implications

There's an interesting discussion going on here (may be Insider only) about the impact of injuries on player performance. The controversy stems from the claims of one of the posters in the comments section of the blog, MGL. MGL claims that there is no statistical evidence that playing hurt affects player performance. Not only is this highly counter-intuitive, but it's also hard to understand the details of MGL's claim, so I thought I'd take a crack at it.

For a little perspective, MGL, Mitchel G. Lichtman, is a well known statistical analyst. He's one of the co-authors of The Book and frequents many of the same baseball forums that I do (not that this puts me anywhere near MGL's caliber as an analyst). I rarely disagree with his analysis (except when it comes to Derek Jeter, for whom he appears to have an irrational hatred) because it's usually based on rock solid reasoning.

In this case, MGL cites a chapter from The Book in which he looks at hot and cold streaks among players. His conclusion is that no matter how you define a hot streak or a cold streak, there is no tendency for players in the midst of a streak to continue that streak going forward. In other words, the past year or three years of performance is a better predictor of the future performance of streaking players than their streak performance. MGL then reasons in the above blog post that if playing hurt significantly affected player performance, this trend could not exist. There would be a way to slice the data that showed a relationship between cold streaks and poor immediate future performance, because there would be a relationship: some of those in the midst of cold streaks would be hurt players. We cannot find this relationship, so either it doesn't exist or its effects are so small that they cannot be detected.

Most of the posters on the site have reacted negatively to this conclusion, and understandably so. Again, it's highly counterintuitive. However, I do think that it's correct, provided you understand exactly what is being claimed. Let's look closer.

Suppose I have a group of ten players. All of these guys have been struggling for two weeks, playing well below their statistical norms. However, five of these guys are hurt, and I hypothesize that this is what is dragging down their numbers. Therefore, I decide to look at the next week of performance for the players and see if it is still below their normal performance.

If being hurt is the cause of the poor performance of the five players and they are still hurt for the next week, then I would expect the numbers for the entire group to still be below expectations. Even if the other five rebound to normal performance, the poor performance of the hurt five will still drag down the numbers. The numbers as a whole will be less affected than they were, but they will still be affected. This will provide me with evidence that playing hurt affects performance.

There is a key clause in the above hypothesis: if the players are still hurt. Perhaps when I run the numbers, the players' totals are right back to normal. In that case, it could be that the hurt five are now healthy. So I could try reducing the cold streak to one week and looking at the next week of performance. If playing hurt was the cause of the poor performance, I would now surely see the group's aggregate performance dragged down, because I would be analyzing both the first week the five were hurt and the second week they were hurt.

In other words, if playing hurt does affect player performance, there must be some way to slice this data up such that I can see the effects of playing hurt on the whole group.

But why look at the whole group? Part of this is simply a data problem: we don't have good data on when a player is hurt, how long he is hurt, and to what degree he is hurt. Essentially, in our above example, we know that five players are hurt, but not which ones, and not for how long. The only way we can test for the effect is to look at the whole group of slumping players. The other part of this is a statistical issue. It is important that we look at the whole group, because this provides us with a large enough sample that the effects of pure, random, statistical noise are drastically reduced.

Now there is another problem with the above example. Since the sample is so small, it's possible that the five healthy players could all immediately get hot and cancel out the continued cold of the hurt players. I can compensate for this by examining 5000 players instead of ten. Now the odds of this happening are infinitesimal. I can have confidence that random fluctuation between the two periods under examination will not be the cause of failing to detect sustained poor performance by the hurt players in the group. If 2500 of these players are hurt, remain hurt, and it affects their performance, then surely I will see the aggregate numbers of the 5000 affected negatively. This is in essence what MGL has done.

The important thing to understand here is that an effect of playing hurt on player performance absolutely must show up to some degree under this analysis; if it doesn't, one of four things must be true:
  1. Playing hurt does not affect player performance.
  2. Everyone is always playing hurt. This is likely, to some degree, but is a worthless conclusion, for obvious reasons.
  3. Everyone who is hurt enough to have performance significantly affected isn't playing. The fact that players who are too hurt to play go on the disabled list probably can't account for all of our failure to detect the effect of playing hurt, but it certainly goes a long way.
  4. Playing hurt does affect player performance, but so much less and so much less often than random hot and cold streaks that its effect is drowned out in the analysis. This, in conjunction with point three, is the likely explanation. Those players who would be largely affected by injuries go on the disabled list. Those who remain are able to play much closer to their established level. If this weren't possible, they'd be on the disabled list.
It's important to understand this concept. If playing hurt significantly affected player performance, it is impossible that we would be unable to find a way to detect it by looking at cold streaks for all players and seeing if there is a lingering effect. This must be true. There is no way for the math not to work. If we have x players playing at expected level and y players playing below expected level, then the performance of x plus y together will also be below expectations, though less so than when looking at just group y. It is impossible for this to be otherwise.

Let's look back at the example. What if instead of five healthy players and five hurt players, we have five hundred healthy players and five hurt players? Suddenly, when the healthy players as a whole rebound to their expected level and the hurt players continue slumping, the aggregate numbers of the group barely reflect the continued slump. The large number of players slumping for random reasons has drowned out the players slumping for injury reasons. So playing hurt does affect performance, but it is so rarely the cause of poor performance that we would never, ever assume that a player's poor performance was caused by injury when our only information is that the player is performing poorly.
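The dilution is easy to quantify with the same hypothetical numbers as before (a .300 true-talent group, a 30-point injury penalty, 25 at-bats per week):

```python
import math

BASELINE, PENALTY, AB_PER_WEEK = 0.300, 0.030, 25
n_healthy, n_hurt = 500, 5
n = n_healthy + n_hurt

# True drag the five hurt players put on the group's average:
drag = n_hurt / n * PENALTY                       # about 0.0003

# One week of ordinary sampling noise on that same group average:
per_ab_sd = math.sqrt(BASELINE * (1 - BASELINE))
noise = per_ab_sd / math.sqrt(n * AB_PER_WEEK)    # about 0.004

print(drag < noise)  # True: the injury signal is an order of magnitude
                     # smaller than random statistical noise
```

At a 5-in-505 injury rate, the real effect is roughly a third of a point of batting average, buried under four points of week-to-week noise.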

It is this last point that is important. Ultimately, the argument isn't that playing hurt doesn't affect performance. It's that poor performance is so much more likely to be the result of random statistical fluctuation that one would never, ever infer that a player is playing hurt from his statistics alone.

Note that if we had a priori knowledge of which players were hurt and for how long, we could run the same test using only these players and see if they maintained their cold streak. This would be the correct way to answer the question of whether or not playing hurt affects player performance. However, it would still be the case that one could never conclude on the basis of a player's performance alone that he was hurt. We simply cannot find that trend no matter how hard we try.

Wednesday, July 11, 2007

Experience and the reality of disjoint skill sets

It is not uncommon to hear baseball players voice the opinion that those who have never played baseball have nothing to teach those who have. The implication is that those stats-geeks should shut up. They know nothing of "the right way to play the game" or "team chemistry" or "the little things."

This is a silly notion. It assumes that the skill set involved in playing baseball is the same as the skill set involved in analyzing baseball. In fact, this is almost guaranteed to be false. Playing baseball is an activity with a very aggressive form of natural selection on a player's physical talents. You have to be able to hit or pitch or field at an elite level to be a ball player. Analyzing baseball is a strategic activity that imposes none of the above physical survival constraints on those engaging in it. In order to excel at both, you must be both physically and strategically adept. Not every activity is like this. In chess, the physical survival constraint is practically non-existent: even a quadriplegic can play.

What's amazing to me isn't that these two baseball skill sets are disjoint. It's that people have a hard time accepting that they are. To make this easier, allow me to present the analogy that inspired this post:

When one sets out to build a really grand, tall, expensive building, one hires an engineer. This engineer designs the building from the ground up, probably in conjunction with many other people: architects, other engineers, etc. He has specialized training in constructing really grand, tall, expensive buildings. He also may have never picked up a hammer in his entire life.

So when the time comes to actually build the really grand, tall, expensive building, one will not ask the engineer to actually build it. One will hire a construction crew. These crews have a lot of hands-on experience in the practice of building really grand, tall, expensive buildings. They can frame, nail, drill, pound, pour, and weld way better than the engineer can. They know far more about the practice of construction than the engineer does.

So why don't we let these guys design and plan really grand, tall, expensive buildings? Because no matter how well you can frame, nail, drill, pound, pour, or weld, none of it will matter if your building has a fatal engineering flaw. These two skill sets are disjoint. Sure, you may find a brilliant engineer who is also a battle-hardened construction worker or a life-long construction worker who has a talent for engineering. However, you would never infer one from the other. Not unless you want your really grand, tall, expensive building to end up a twisted heap of steel and glass.

This is why it's so frustrating to see opinions and commentary from people whose only baseball experience is playing baseball presented as irrefutable fact. However talented a player is on the field, that tells us very little about his strategic and analytical skills.

Tuesday, July 10, 2007

A softball that I can't refuse...

Seriously, what do people not understand about objective statistical analysis?

There are a myriad of points to ridicule here, but I'd rather use this piece of hack job journalism to point out some favorite fallacies in anti-statistical ravings:
  1. Accuse sabermetricians of hating a player that they love. A quintessential straw-man attack, journalists love to bring up examples of players that they believe throw wrenches in the machinery of objective statistical analysis. Invariably, the players they bring up are universally recognized by objective analysts as great players. Here, this guy actually accuses sabermetricians of hating Ty Cobb. Ty. Cobb. He of the 194 career WARP3.
  2. Accuse sabermetricians of not watching real, live baseball games. Based on purely anecdotal evidence, I'd wager (and wager a lot) that most sabermetricians watch far, far more baseball than non-sabermetricians, even journalists covering baseball. You see, sabermetricians are people who are so crazy about baseball that they spend the time they aren't watching baseball thinking about baseball. For sabermetricians, baseball isn't a job, it's a passion.
  3. Accuse sabermetricians of loving a player who only has good stats. In this case, that player is Barry Bonds.

    ...

    ...

    ...

    Barry.

    Bonds.

  4. Accuse sabermetricians of not caring about the finer points of the sport. Let me ask you a hypothetical question. If I tell you that Pilot A has a 100% chance of getting you safely to your destination and that Pilot B has a 10% chance of getting you killed en route, which one will you choose?

    Now what if I tell you that Pilot A often has bumpy landings, but that Pilot B never does?

    Wait. That doesn't change your mind? You obviously don't care about the intangibles involved in flying a plane.
  5. Accuse sabermetricians of using slide rules (get it?! slide rules!!!!!!!!!!!!!!!!1111111), living in Mama's basement, and liking either Star Trek or Star Wars. Take that you fucking geeks!
Honestly, it makes me happy to know that one day the geeks will win because we are right. And the fact that we are right means that we can build a better, more efficient baseball team. And the fact that we can build a better, more efficient baseball team means that we will make more money. And owners of baseball teams love money. I expect at that point we will get to hear a lot of whining about the good old days, when real men ran baseball teams. And then all those guys will die, and we can all be happy again.

Friday, July 6, 2007

Alternative Statistics

"We keep a log on outfielders who charge the ball and who don't charge, who have accurate arms and powerful arms. We take all that into consideration. This is not a blind experiment to say hey, run until you get thrown out." - Los Angeles Angels' manager Mike Scioscia

Baserunning is often overrated. In particular, you hear broadcasters go nuts talking about the virtues of being aggressive. Most of the time, mindless aggressiveness leads to outs. In fact, it is the Angels themselves who seemed to popularize this style of play. When they won the World Series in 2002, people went bonkers over their "productive outs" approach, ignoring the fact that they pounded the snot out of the baseball in October.

I don't like the Angels. I don't like "productive outs." I don't like "small ball." Most of you probably already know this. However, this shouldn't blind one to otherwise sound research. If the Angels have a plan, and it's objectively sound, and they execute it, more power to them. The type of aggressiveness presented above isn't a bad idea at all. In fact, it's a great idea. That is the way that teams should be coaching their players. It's way better than the mindless aggression espoused by many commentators.

Someone who seeks to understand baseball can never dismiss out of hand research (as opposed to commentary) that runs counter to his or her predisposition or biases. The goal is always to incorporate new research with the old to constantly perfect the art and science of baseball strategy.

Monday, July 2, 2007

Incentive

It is a commonly held belief that the reason that many baseball records based on longevity are safe is that players make so much money compared to the early days of baseball, when many of these records were set, that they have little incentive to hang around long enough to break the record.

I'm pretty sure this would make baseball players the first group of people in recorded history to be disincentivized by the prospect of making massively more money.

Sunday, July 1, 2007

Misunderstanding "statheads"

Peter Abraham has a great blog, if you're a Yankee fan. As a beat writer for the Journal News, he has access to a lot of information that other fans do not. He uses his blog to publish this information almost instantaneously. If you need up-to-the-second Yankee information, there isn't a better source.

However, Mr. Abraham has this really bad habit of going out of his way to stick it to "statheads" over their comments on his blog. I think this reflects a general habit among the print media, particularly beat writers. Peter often responds to statistical criticism by referring to comments made by Yankee coaches or players. That's not too surprising. After all, it's his job to cover these people.

I do think that his criticism is generally misguided. For example, he's been on a crusade against people who supported more playing time for Josh Phelps recently. In his latest mailbag, he refers to a recent correspondence with a fan who had the audacity to suggest that he knew more, through statistics, about Phelps' defense at first base than Don Mattingly, defensive first baseman extraordinaire in his day and now the Yankees' bench coach.

Here's what's interesting about the exchange: it really doesn't matter how good Don Mattingly thinks Josh Phelps is at first base. What matters is Phelps' performance at first base. Statistics are often an excellent starting point for measuring performance, because they tend to be more objective. If we had a statistic (and we don't) that perfectly measured first base defense, and it showed that Josh Phelps was adequate there, Don Mattingly would be wrong. It's that simple. Since this stat doesn't exist, we do weight Don Mattingly's opinion higher, since he may be capable of observing things that our imperfect stats cannot detect. However, this is also where scouting observations can be misleading: Josh Phelps may look like complete trash as a first baseman, but that doesn't matter if he actually performs well there. This is the chief market inefficiency that Billy Beane was portrayed to have exploited in "Moneyball."

I'm not saying Phelps is a good first baseman. I suspect Mattingly is right; he probably isn't very good. The important thing to take away from this is that expert opinions cannot change objective truth. This is why it is important to always have some objective base for analysis and to know the limitations of this base. It allows you to put expert opinions in perspective. It would behoove Mr. Abraham to remember this when statheads start disagreeing with the baseball experts in the Yankee organization. No amount of expert hand-waving can turn a poor player into a productive one or a productive player into a poor one. No matter how you slice it, it's very, very hard to imagine a world in which Miguel Cairo is more productive at first base than Josh Phelps. The objective evidence is too strongly in Phelps' corner.

It's not hard to see why beat writers like Mr. Abraham lean towards the opinions of men like Don Mattingly and Joe Torre. For starters, they are eminently qualified to speak on the topic of baseball from the perspective of a player and coach. However, and I think this is the key point, Mr. Abraham and others in his profession have a vested interest in promoting the opinions of these men. If we stopped caring about what Joe Torre or Don Mattingly thinks, Pete Abraham would be out of a job. I don't think that beat writers intentionally promulgate opinions based on this pressure, but I do think that it surfaces subconsciously in their writing.

What's most aggravating about this anti-statistical bent is that it misunderstands the stathead position consistently. When I say that Alex Rodriguez has been 55.7 runs better than a replacement level player this year, as measured by VORP, this statement is unequivocally true. The question is whether or not VORP accurately measures what it purports to measure. Unfortunately, mainstream writers never tackle this question, instead choosing to throw quotes around from their subjects that have nothing to do with an objective evaluation of anyone's baseball playing ability.

If VORP were perfectly accurate, then it wouldn't matter one iota what Joe Torre, Don Mattingly, Brian Cashman, or anyone else thinks about a player's performance. They would be right or wrong inasmuch as their opinions agreed with those inferred from the correct use of VORP. Since VORP isn't perfectly accurate, their opinions matter, but how much? If VORP is 99% correct, or 95% correct, or 90% correct, how much weight should a person's opinion be given? This question can only be answered by a thorough analysis and understanding of VORP and other tools of its ilk. For whatever reason, the print media has largely chosen to duck this issue altogether. Pity. I would absolutely love to read a well-reasoned, objective criticism of statistical tools like VORP.
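One hedged way to make "how much weight?" concrete: if we could express the reliability of a stat and of an expert's eye as error variances (the numbers below are purely hypothetical), the standard move is a precision-weighted average:

```python
def blend(metric_value, metric_var, expert_value, expert_var):
    """Precision-weighted average: each source is weighted by the
    inverse of its error variance, so the more reliable source dominates."""
    w_metric, w_expert = 1 / metric_var, 1 / expert_var
    return (w_metric * metric_value + w_expert * expert_value) / (w_metric + w_expert)

# Hypothetical: the stat says a player is worth 50 runs, the expert says 30.
# If the stat is far more reliable (variance 1 vs. 25), the blend barely moves:
print(round(blend(50, 1, 30, 25), 1))  # 49.2
# If both are equally reliable, the blend splits the difference:
print(round(blend(50, 9, 30, 9), 1))   # 40.0
```

The point survives the made-up numbers: the right weight on an expert's opinion is a function of how accurate the stat actually is, which is exactly the question nobody in the print media wants to engage.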

I won't hold my breath.