Friday, August 3, 2007

A Quiz

Quick! Which player is on steroids:

Home Runs by Age:

Age: 20 21 22 23 24 25 26 27 28 29 30
Player A: 13 27 26 44 30 39 40 34 45 44 24
Player B: xx 16 25 24 19 33 25 34 46 37 33

Age: 31 32 33 34 35 36 37 38 39 40 41 42
Player A: 32 44 39 29 44 38 47 34 40 20 12 10
Player B: 42 40 37 34 49 43 46 45 45 5 26 20

I eagerly anticipate your responses.

Thursday, August 2, 2007

The Great Divide

Derek Jeter's defensive ability has long been a contentious subject among the more objectively inclined baseball analysts. There has yet to be a defensive metric, no matter how sophisticated, that can show Derek Jeter to be one of the best fielding shortstops in the league. Very few show him to be above average.

Yet, Derek wins award after award and draws rave reviews from many commentators, players, and managers. In the latest round of this puzzling affair, Derek has been voted the best defensive shortstop in a survey of American League managers by Baseball America (hat tip to RLYW for making me aware of this).

Before I dissect why this divide exists, allow me to make three disclaimers:
  1. Derek Jeter is my favorite baseball player of all time. On a related note, The Flip is probably my favorite baseball play of all time.
  2. I am not qualified to gauge Derek Jeter's defense with my eyes. I am only slightly more qualified to talk about his performance in various statistical measures.
  3. My personal belief is that Jeter's defense is somewhere around league average, probably below it. His strengths are his awareness of the game situation, reading pop-ups, charging slowly hit balls, and perhaps making that cool jump throw from the hole. His weakness is fielding groundballs hit up the middle. It is a very sizable weakness.
With that out of the way, I want to explore not the value of Jeter's defense nor his true talent at fielding, but rather why it is that this divide exists. I can think of a few reasons.

First, I really don't think that American League managers are very qualified to make these judgments. This has nothing to do with their ability to judge talent. It has to do with a simple reality: an opposing manager will see an opposing shortstop as many as 19 times during a season and as few as six. That's just not a very large sample on which to draw.

Secondly, managers are human beings just like the rest of us. They are aware of Derek Jeter's reputation. They've seen the highlights. They saw The Flip. It is impossible for them to completely separate this reputation, largely created by media members in need of an image to sell and a story to write, from their own observations. Confirmation bias and peer pressure set in. When an opposing manager sees Derek execute a jump throw against his team, that registers as evidence of Jeter's greatness. If a groundball rolls past a diving Jeter, that registers as evidence of Jeter's hustle and grit.

Finally, I can't shake the feeling that objective measures of Jeter's defense underrate him. Most of these measures are built primarily around the concept of range, Jeter's biggest weakness. These metrics have a harder time with pop-ups, for example, and that's one of Jeter's strengths. I don't think that these metrics are wrong enough to justify the opinions of the AL managers, but I do think the error is enough to move Jeter from atrocious to simply below average.

Then again, you should go back and read my disclaimers. I really, really, really don't want Derek Jeter to suck. At anything. Accepting that he probably wasn't a Gold Glove caliber shortstop, despite the adulation, is one of the hardest things I've ever had to do. I don't mean this to sound sappy or to equate it with decisions and actions that are both difficult and important. Certainly, baseball is unimportant in the grand scheme of things. Certainly, this is not one of the most important things I've ever done, not even close.

However, forcing yourself to accept something that you wish weren't true, to which you have a sizable emotional attachment, even though there exists a large body of people willing to confirm your bias, is a monumentally difficult task. For me, it represented a commitment to making sure that my opinions and beliefs were never based on what I wanted to be true, but only on that which could be shown to be true. The inability to do this is perhaps the cause of the majority of the problems in society today. I can only pray that my efforts to discover and accept the truth are successful, even if it is about something as silly as baseball.

Tuesday, July 17, 2007

By God, I love Jason Whitlock

I don't always agree with Mr. Jason Whitlock of the Kansas City Star. That being said, everyone should read what he says right here in answering questions #2 and #3.

Well said, sir. Well said.

Monday, July 16, 2007

Does anyone here know math?

ESPN's MLB page has a poll up asking viewers which team will be the next team in Major League baseball to lose its 10,000th game as a franchise. The choices are the Braves (9681 losses), Cubs (9425 losses), and Reds (9341 losses).

Only 47% of responders have voted for the Braves, presumably because they are a perennially good team and the Cubs and Reds are not. There's only one problem: the Braves are 256 losses "ahead" of the Cubs, with only 319 losses to go. The Cubs would have to lose at nearly twice the rate of the Braves for about the next five years to catch them. Even if the Braves play at a 100-win-per-season pace until their 10,000th loss and the Cubs play at a 100-loss-per-season pace until theirs, the Braves will still get there first.
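Even the extreme scenario is easy to sanity-check with a few lines of arithmetic. The loss totals are the ones quoted in the poll; the 62-loss pace simply assumes a 162-game season with 100 wins:

```python
# Back-of-envelope check of the race to 10,000 losses.
# Loss totals are those quoted in the ESPN poll; a season is 162 games.
TARGET = 10000
braves_losses, cubs_losses = 9681, 9425

# Extreme scenario: the Braves win 100 games a season (so they lose 62),
# while the Cubs lose 100 games a season.
braves_seasons = (TARGET - braves_losses) / 62   # about 5.1 seasons
cubs_seasons = (TARGET - cubs_losses) / 100      # about 5.8 seasons

print(braves_seasons < cubs_seasons)  # True: the Braves still get there first
```

Even spotting the Cubs a 100-loss pace against a 100-win pace, the Braves reach 10,000 first by more than half a season.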

It will take an upset of epic proportions for the Braves not to be the next team to reach 10,000 losses. For whatever reason, people are just too dumb to infer this from even the most cursory examination of the numbers.

Saturday, July 14, 2007

Pass

Gary Sheffield has once again proven himself to be at best ignorant and at worst a racist. I really have nothing more to say. At this point, Gary does a better job at making my point than I do.

Thursday, July 12, 2007

Statistical noise and its implications

There's an interesting discussion going on here (may be Insider only) about the impact of injuries on player performance. The controversy stems from the claims of MGL, one of the posters in the blog's comments section, who argues that there is no statistical evidence that playing hurt affects player performance. Not only is this highly counter-intuitive, but the details of MGL's claim are also hard to understand, so I thought I'd take a crack at it.

For a little perspective, MGL, Mitchel G. Lichtman, is a well known statistical analyst. He's one of the co-authors of The Book and frequents many of the same baseball forums that I do (not that this puts me anywhere near MGL's caliber as an analyst). I rarely disagree with his analysis (except when it comes to Derek Jeter, for whom he appears to have an irrational hatred) because it's usually based on rock solid reasoning.

In this case, MGL cites a chapter from The Book in which he looks at hot and cold streaks among players. His conclusion is that no matter how you define a hot streak or a cold streak, there is no tendency for players in the midst of a streak to continue that streak going forward. In other words, the past year or three years of performance is a better predictor of a streaking player's future performance than his streak performance. MGL then reasons in the above blog post that if playing hurt significantly affected player performance, this trend could not exist. There would be some way to slice the data that showed a relationship between cold streaks and poor immediate future performance, because a real relationship would exist: some of those in the midst of cold streaks would be hurt players. We cannot find this relationship, so either it doesn't exist or its effects are so small that they cannot be detected.
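The streak finding itself can be illustrated with a toy simulation. Assuming each at-bat is an independent coin flip at a fixed true talent (the .280 talent, 25-at-bat weeks, and .200 cold-streak cutoff below are illustrative assumptions, not numbers from The Book), players selected for a cold week hit right at their true talent the following week:

```python
import random

random.seed(1)

TALENT = 0.280  # illustrative fixed true talent
AB = 25         # at-bats per week

def week():
    """One week of at-bats as independent coin flips at the true talent."""
    return sum(random.random() < TALENT for _ in range(AB)) / AB

# Simulate two consecutive weeks for many identical players.
players = [(week(), week()) for _ in range(20000)]

# Select the "cold" players: those who hit under .200 in week one.
cold = [w2 for w1, w2 in players if w1 < 0.200]

next_avg = sum(cold) / len(cold)
print(round(next_avg, 3))  # right around .280: the streak has no predictive value
```

Because the weeks are independent by construction, selecting on a cold week tells you nothing about the next one, which is exactly the pattern MGL reports in the real data.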

Most of the posters on the site have reacted negatively to this conclusion, and understandably so. Again, it's highly counterintuitive. However, I do think that it's correct, inasmuch as you understand what is being claimed. Let's look closer.

Suppose I have a group of ten players. All of these guys have been struggling for two weeks, playing well below their statistical norms. However, five of these guys are hurt, and I hypothesize that this is what is dragging down their numbers. Therefore, I decide to look at the next week of performance for the players and see if it is still below their normal performance.

If being hurt is the cause of the poor performance of the five players and they are still hurt for the next week, then I would expect the numbers for the entire group to still be below expectations. Even if the other five rebound to normal performance, the poor performance of the hurt five will still drag down the numbers. The numbers as a whole will be less affected than they were, but they will still be affected. This will provide me with evidence that playing hurt affects performance.

There is a key clause in the above hypothesis: if the players are still hurt. Perhaps when I run the numbers, the players' totals are right back to normal. In that case, it could be that the hurt five are now healthy. I could then try reducing the cold streak to one week and looking at the next week of performance. If playing hurt was the cause of the poor performance, I would now surely see the group's aggregate performance dragged down, because I would be analyzing both the first week the five were hurt and the second week as well.

In other words, if playing hurt does affect player performance, there must be some way to slice this data up such that I can see the effects of playing hurt on the whole group.

But why look at the whole group? Part of this is simply a data problem: we don't have good data on when a player is hurt, how long he is hurt, and to what degree he is hurt. Essentially, in our above example, we know that five players are hurt, but not which ones, and not for how long. The only way we can test for the effect is to look at the whole group of slumping players. The other part of this is a statistical issue. It is important that we look at the whole group, because this provides us with a large enough sample that the effects of pure, random, statistical noise are drastically reduced.

Now there is another problem with the above example. Since the sample is so small, it's possible that the five healthy players could all immediately get hot and cancel out the continued cold of the hurt players. I can compensate for this by examining 5000 players instead of ten. Now the odds of this happening are infinitesimal. I can be confident that random fluctuation between the two periods under examination will not be the reason I fail to detect sustained poor performance by the hurt players in the group. If 2500 of these players are hurt, remain hurt, and it affects their performance, then surely I will see the aggregate numbers of the 5000 affected negatively. This is in essence what MGL has done.
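Here is a sketch of that 5000-player test, with made-up numbers (the .280 true talent and the 30-point injury penalty are illustrative assumptions):

```python
import random

random.seed(0)

N = 5000              # slumping players examined
HURT = 2500           # hypothetical: half of them are playing hurt
TRUE_AVG = 0.280      # illustrative true-talent batting average
HURT_PENALTY = 0.030  # assumed effect of playing hurt
AB = 25               # at-bats in the follow-up week

def week_avg(talent, ab=AB):
    """Simulate one week of at-bats as independent coin flips at `talent`."""
    hits = sum(random.random() < talent for _ in range(ab))
    return hits / ab

# Healthy players rebound to their true talent; hurt players stay suppressed.
samples = [week_avg(TRUE_AVG) for _ in range(N - HURT)] + \
          [week_avg(TRUE_AVG - HURT_PENALTY) for _ in range(HURT)]

group_avg = sum(samples) / N
print(round(group_avg, 3))  # comes out noticeably below .280
```

With 125,000 total at-bats, random noise in the group average is tiny, so an injury effect of this size would be unmistakable in the aggregate, just as the argument says.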

The important thing to understand here is that if player performance is negatively affected by playing hurt, either the effect absolutely must show up to some degree under this analysis, or one of four things must be true:
  1. Playing hurt does not affect player performance.
  2. Everyone is always playing hurt. This is likely, to some degree, but is a worthless conclusion, for obvious reasons.
  3. Everyone who is hurt enough to have performance significantly affected isn't playing. The fact that players who are too hurt to play go on the disabled list probably can't account for all of our failure to detect the effect of playing hurt, but it certainly goes a long way.
  4. Playing hurt does affect player performance, but so much less and so much less often than random hot and cold streaks that its effect is drowned out in the analysis. This, in conjunction with point three, is the likely explanation. Those players who would be largely affected by injuries go on the disabled list. Those who remain are able to play much closer to their established level. If this weren't possible, they'd be on the disabled list.
It's important to understand this concept. If playing hurt significantly affected player performance, it is impossible that we would be unable to detect it by looking at cold streaks for all players and seeing if there is a lingering effect. This must be true. There is no way for the math not to work. If we have x players playing at their expected level and y players playing below their expected level, then the combined performance of x plus y will also be below expectations, though less so than when looking at just group y. It is impossible for this to be otherwise.

Let's look back at the example. What if instead of five healthy players and five hurt players, we have five hundred healthy players and five hurt players? Suddenly, when the healthy players as a whole rebound to their expected level and the hurt players continue slumping, the aggregate numbers of the group barely reflect this continued slump. The large number of players slumping for random reasons has drowned out the players slumping because of injury. So playing hurt does affect performance, but it is so rarely the cause of poor performance that we would never, ever assume that a player's poor performance was caused by injury when our only information is that he is performing poorly.
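The dilution is just a weighted average, which is easy to verify directly (the .280 healthy and .250 hurt batting averages are illustrative assumptions carried over from nothing in particular, chosen to make the arithmetic concrete):

```python
# Dilution arithmetic for the example above. The .280 healthy average and
# .250 hurt-player average are illustrative assumptions.
TRUE_AVG = 0.280
HURT_AVG = 0.250

def group_average(healthy, hurt):
    """Aggregate batting average of a mixed group, weighted by head count."""
    return (healthy * TRUE_AVG + hurt * HURT_AVG) / (healthy + hurt)

print(round(group_average(5, 5), 4))    # 0.265  -- the injury effect is obvious
print(round(group_average(500, 5), 4))  # 0.2797 -- barely distinguishable from .280
```

Five hurt players among ten drag the group down 15 points; five among five hundred drag it down a fraction of a point, well inside ordinary noise.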

It is this last point that is important. Ultimately, the argument isn't that playing hurt doesn't affect performance. It's that poor performance is so much more likely to be caused by random statistical fluctuation that one would never, ever infer that a player is playing hurt from his statistics alone.

Note that if we had a priori knowledge of which players were hurt and for how long, we could run the same test using only these players and see if they maintained their cold streak. This would be the correct way to answer the question of whether or not playing hurt affects player performance. However, it would still be the case that one could never conclude, on the basis of a player's performance alone, that he was hurt. We simply cannot find that trend no matter how hard we try.

Wednesday, July 11, 2007

Experience and the reality of disjoint skill sets

It is not uncommon to hear baseball players voice the opinion that those who have never played baseball have nothing to teach those who have. The implication is that those stats-geeks should shut up. They know nothing of "the right way to play the game" or "team chemistry" or "the little things."

This is a silly notion. It assumes that the skill set involved in playing baseball is the same as the skill set involved in analyzing baseball. In fact, this is almost guaranteed to be false. Playing baseball is an activity with a very aggressive form of natural selection on a player's physical talents: you have to be able to hit or pitch or field at an elite level to be a ball player. Analyzing baseball is a strategic activity that imposes no such physical survival constraints on those engaging in it. To excel at both, you must be both physically and strategically adept. Not every activity is like this; in chess, the physical survival constraint is practically non-existent: even a quadriplegic can play.

What's amazing to me isn't that these two baseball skill sets are disjoint. It's that people have a hard time accepting that they are. To make this easier, allow me to present the analogy that inspired this post:

When one sets out to build a really grand, tall, expensive building, one hires an engineer. This engineer designs the building from the ground up, probably in conjunction with many other people: architects, other engineers, etc. He has specialized training in constructing really grand, tall, expensive buildings. He also may have never picked up a hammer in his entire life.

So when the time comes to actually build the really grand, tall, expensive building, one will not ask the engineer to actually build it. One will hire a construction crew. These crews have a lot of hands-on experience in the practice of building really grand, tall, expensive buildings. They can frame, nail, drill, pound, pour, and weld way better than the engineer can. They know far more about the practice of construction than the engineer does.

So why don't we let these guys design and plan really grand, tall, expensive buildings? Because no matter how well you can frame, nail, drill, pound, pour, or weld, none of it will matter if your building has a fatal engineering flaw. These two skill sets are disjoint. Sure, you may find a brilliant engineer who is also a battle-hardened construction worker or a life-long construction worker who has a talent for engineering. However, you would never infer one from the other. Not unless you want your really grand, tall, expensive building to end up a twisted heap of steel and glass.

This is why it's so frustrating to see opinions and commentary from people whose only baseball experience is playing baseball presented as irrefutable fact. However talented a player is at playing baseball, that doesn't tell us much about his strategic and analytical skills.