Thursday, July 12, 2007

Statistical noise and its implications

There's an interesting discussion going on here (may be Insider only) about the impact of injuries on player performance. The controversy stems from the claims of one of the posters in the comments section of the blog, MGL. MGL claims that there is no statistical evidence that playing hurt affects player performance. Not only is this highly counter-intuitive, but it's also hard to understand the details of MGL's claim, so I thought I'd take a crack at it.

For a little perspective, MGL, Mitchel G. Lichtman, is a well-known statistical analyst. He's one of the co-authors of The Book and frequents many of the same baseball forums that I do (not that this puts me anywhere near MGL's caliber as an analyst). I rarely disagree with his analysis (except when it comes to Derek Jeter, for whom he appears to have an irrational hatred) because it's usually based on rock-solid reasoning.

In this case, MGL cites a chapter from The Book in which he looks at hot and cold streaks among players. His conclusion is that no matter how you define a hot streak or a cold streak, there is no tendency for players in the midst of a streak to continue that streak going forward. In other words, the past year or three years of performance is a better predictor of the future performance of streaking players than their streak performance. MGL then reasons in the above blog post that if playing hurt significantly affected player performance, this trend could not exist. There would be a way to slice the data that showed a relationship between cold streaks and poor immediate future performance because there would be a relationship: some of those in the midst of cold streaks would be hurt players. We cannot find this relationship, so either it doesn't exist or its effects are so small that they cannot be detected.

Most of the posters on the site have reacted negatively to this conclusion, and understandably so. Again, it's highly counterintuitive. However, I do think that it's correct, inasmuch as you understand what is being claimed. Let's look closer.

Suppose I have a group of ten players. All of these guys have been struggling for two weeks, playing well below their statistical norms. However, five of these guys are hurt, and I hypothesize that this is what is dragging down their numbers. Therefore, I decide to look at the next week of performance for the players and see if it is still below their normal performance.

If being hurt is the cause of the poor performance of the five players and they are still hurt for the next week, then I would expect the numbers for the entire group to still be below expectations. Even if the other five rebound to normal performance, the poor performance of the hurt five will still drag down the numbers. The numbers as a whole will be less affected than they were, but they will still be affected. This will provide me with evidence that playing hurt affects performance.

There is a key clause in the above hypothesis: if the players are still hurt. Perhaps when I run the numbers, the players' totals are right back to normal. In this case, it could be that the hurt five are now healthy. I could then try reducing the cold streak to one week and looking at the next week of performance. If playing hurt was the cause of the poor performance, I would now surely see the group's aggregate performance dragged down, because I would be analyzing the first week the five were hurt and the second week they were hurt as well.

In other words, if playing hurt does affect player performance, there must be some way to slice this data up such that I can see the effects of playing hurt on the whole group.

But why look at the whole group? Part of this is simply a data problem: we don't have good data on when a player is hurt, how long he is hurt, and to what degree he is hurt. Essentially, in our above example, we know that five players are hurt, but not which ones, and not for how long. The only way we can test for the effect is to look at the whole group of slumping players. The other part of this is a statistical issue. It is important that we look at the whole group, because this provides us with a large enough sample that the effects of pure, random, statistical noise are drastically reduced.

Now there is another problem with the above example. Since the sample is so small, it's possible that the five healthy players could all immediately get hot and cancel out the continued cold of the hurt players. I can compensate for this by examining 5000 players instead of ten. Now the odds of this happening are infinitesimal. I can have confidence that random fluctuation between the two periods under examination will not be the cause of failing to detect sustained poor performance by the hurt players in the group. If 2500 of these players are hurt, remain hurt, and it affects their performance, then surely I will see the aggregate numbers of the 5000 affected negatively. This is in essence what MGL has done.
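
To put rough numbers on the sample size point, here is a minimal sketch in Python. It is my own illustration, not MGL's method. It assumes each player's deviation from his norm is independent and normally distributed, and the inputs (a 30-point drop for hurt players, 100 points of week-to-week noise per player) are made up for the example. The function name chance_slump_is_hidden is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chance_slump_is_hidden(n_players, hurt_share, hurt_deficit, noise_sd):
    """Probability that the group's average lands at or above its norm,
    i.e. that random noise completely hides the hurt players' drag.

    Assumes (for illustration only) that each player's deviation from his
    norm is independent and normal with standard deviation noise_sd, and
    that every hurt player is shifted down by hurt_deficit.
    """
    # The group's expected average is pulled down by the hurt share.
    expected_group_mean = -hurt_share * hurt_deficit
    # Noise in the group average shrinks with the square root of group size.
    standard_error = noise_sd / sqrt(n_players)
    group_mean = NormalDist(expected_group_mean, standard_error)
    return 1.0 - group_mean.cdf(0.0)


# Hypothetical inputs: half the group hurt, a 30-point drop for hurt
# players, and 100 points of week-to-week noise per player.
small = chance_slump_is_hidden(10, 0.5, 0.030, 0.100)
large = chance_slump_is_hidden(5000, 0.5, 0.030, 0.100)
print(f"10 players:   {small:.3f}")
print(f"5000 players: {large:.3g}")
```

Under these assumed inputs, the ten-player group hides the effect roughly three times in ten, while the 5000-player group essentially never does. Different inputs would change the exact figures, but not the direction.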

The important thing to understand here is that if player performance were meaningfully affected by playing hurt, it would have to show up to some degree under this analysis. Since it does not, one of four things must be true:
  1. Playing hurt does not affect player performance.
  2. Everyone is always playing hurt. This is likely, to some degree, but is a worthless conclusion, for obvious reasons.
  3. Everyone who is hurt enough to have performance significantly affected isn't playing. The fact that players who are too hurt to play are on the disabled list probably can't account for all of our failure to detect the effect of playing hurt, but it certainly goes a long way.
  4. Playing hurt does affect player performance, but so much less and so much less often than random hot and cold streaks that its effect is drowned out in the analysis. This, in conjunction with point three, is the likely explanation. Those players who would be significantly affected by injuries go on the disabled list. Those who remain are able to play much closer to their established level. If this weren't possible, they'd be on the disabled list.

It's important to understand this concept. If playing hurt significantly affected player performance, it would be impossible for us to be unable to detect it by looking at cold streaks for all players and seeing if there is a lingering effect. This must be true. There is no way for the math to not work. If we have x players playing at their expected level and y players playing below their expected level, then the performance of x plus y will also be below expectations, though less so than when looking at just group y. It is impossible for this to be otherwise.
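
The x plus y arithmetic is just a weighted average, which can be sketched in a few lines of Python. The function name and the 30-point shortfall below are my own hypothetical choices, and the sketch assumes the healthy players sit exactly at their expected level.

```python
def group_shortfall(x_healthy, y_hurt, hurt_shortfall):
    """Average shortfall of the combined group.

    Assumes the x_healthy players are exactly at their expected level
    (shortfall of zero) and each of the y_hurt players falls short of
    his expected level by hurt_shortfall.
    """
    total_players = x_healthy + y_hurt
    total_shortfall = x_healthy * 0.0 + y_hurt * hurt_shortfall
    return total_shortfall / total_players


# Five healthy and five hurt players, with a hypothetical 30-point
# shortfall for each hurt player.
combined = group_shortfall(5, 5, 0.030)
print(f"hurt players alone: {0.030:.3f}")
print(f"combined group:     {combined:.3f}")
```

With an even split, the combined group falls short by half as much as the hurt players alone. As long as y and the shortfall are both above zero, the result is always above zero and below the hurt players' own shortfall.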

Let's look back at the example. What if instead of five healthy players and five hurt players, we have five hundred healthy players and five hurt players? Suddenly, when the healthy players as a whole rebound to their expected level and the hurt players continue slumping, the aggregate numbers of the group barely reflect this continued slump. The large number of players slumping for random reasons has drowned out the players slumping for injury reasons. So playing hurt does affect performance, but it is so rarely the cause of poor performance that we would never, ever assume that a player's poor performance was caused by injury when our only information is that the player is performing poorly.
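
To see how thoroughly the effect gets drowned out, here is a rough Python sketch comparing the diluted shortfall with the random noise in the group average. The 30-point shortfall and 100 points of per-player noise are invented for illustration, the function name is hypothetical, and the sketch assumes independent noise for each player.

```python
from math import sqrt


def diluted_shortfall_and_noise(n_healthy, n_hurt, hurt_shortfall, noise_sd):
    """Return (diluted shortfall, standard error of the group average).

    Assumes healthy players average exactly their expected level, each
    hurt player falls short by hurt_shortfall, and every player carries
    independent random noise with standard deviation noise_sd.
    """
    n_total = n_healthy + n_hurt
    diluted = n_hurt * hurt_shortfall / n_total
    standard_error = noise_sd / sqrt(n_total)
    return diluted, standard_error


# Five hundred healthy players and five hurt ones, with hypothetical
# values for the hurt players' shortfall and the per-player noise.
diluted, standard_error = diluted_shortfall_and_noise(500, 5, 0.030, 0.100)
print(f"diluted shortfall: {diluted:.5f}")
print(f"noise in average:  {standard_error:.5f}")
print(f"noise / shortfall: {standard_error / diluted:.1f}")
```

With these made-up inputs, the noise in the group average is on the order of fifteen times larger than the shortfall the five hurt players contribute, so the aggregate numbers would look like an ordinary rebound.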

It is this last point that is important. Ultimately, the argument isn't that playing hurt doesn't affect performance. It's that poor performance is so much more likely to be caused by random statistical fluctuations that one would never, ever infer that a player is playing hurt from his statistics alone.
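
One way to frame this inference is with Bayes' rule, sketched below in Python. This is my own framing rather than anything from MGL's analysis, and all three probabilities are hypothetical: I reuse the five-in-505 ratio from the example above as the share of hurt players, and I simply assume that a hurt player slumps half the time and a healthy player slumps one time in five.

```python
def prob_hurt_given_slump(p_hurt, p_slump_if_hurt, p_slump_if_healthy):
    """Bayes' rule: chance that a slumping player is hurt, given the
    share of players who are hurt and how often each kind slumps."""
    hurt_and_slumping = p_hurt * p_slump_if_hurt
    healthy_and_slumping = (1.0 - p_hurt) * p_slump_if_healthy
    return hurt_and_slumping / (hurt_and_slumping + healthy_and_slumping)


# Hypothetical inputs: 5 of 505 players are hurt, hurt players slump
# half the time, healthy players slump one time in five.
posterior = prob_hurt_given_slump(5 / 505, 0.50, 0.20)
print(f"chance a slumping player is hurt: {posterior:.3f}")
```

Even though hurt players in this sketch slump far more often than healthy ones, a slumping player is hurt only a couple of percent of the time, because there are so many more healthy players who slump at random.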

Note that if we had a priori knowledge of which players were hurt and for how long, we could run the same test using only these players and see if they maintained their cold streak. This would be the correct way to answer the question of whether or not playing hurt does affect player performance. However, it would still be the case that one could never conclude on the basis of a player's performance alone that he was hurt. We simply cannot find that trend no matter how hard we try.
