- Fans completely misinterpreting the predictions of objective projection systems.
- "Experts" releasing hilariously skewed subjective predictions.
What am I talking about? Specifically this: people, whether interpreting objective projections or making their own subjective ones, fail to distinguish between saying that no individual team is likely to win 100 games and saying that the league is unlikely to see any team win 100 games; between not projecting any single pitcher to win 20 games and projecting the league to have no 20-game winners; between projecting that no single hitter is likely to hit 40 home runs and drive in 120 runners and projecting that the league is unlikely to see any hitter hit 40 home runs and drive in 120 runners.
Do you think all these things are the same? If you do, you are failing to understand a basic concept in probability. There's no shame in this. Probability is freaking confusing. Unless you're really, really well-versed in it, you're going to screw it up all the time. I know I do. Nonetheless, let's dig a little deeper into this problem.
Let's say I have a game of chance. You roll a 20-sided die. If you roll a natural 20, I will give you twenty dollars. If not, you will owe me one dollar. What are your expected winnings? Well, we would project you to owe me one dollar 95% of the time and win twenty dollars 5% of the time. That means your expected winnings from any single roll are exactly five cents (0.95 x -1.00 + 0.05 x 20.00 = 0.05). There are no ifs, ands, or buts about it. We would project you to win five cents.
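If you want to check that arithmetic yourself, here is a minimal sketch in Python. The names (`P_WIN`, `expected_winnings`, and so on) are just ones I made up for illustration, and I'm using exact fractions so floating-point rounding doesn't muddy a five-cent answer.

```python
from fractions import Fraction

# Payoffs of the dice game described above, in dollars.
# All names here are illustrative, not part of any library.
P_WIN = Fraction(1, 20)   # chance of rolling a natural 20 on a d20
WIN_PAYOUT = 20           # dollars you receive on a 20
LOSS_PAYOUT = -1          # dollars you owe on anything else


def expected_winnings(p_win, win_payout, loss_payout):
    """Probability-weighted average payoff of a single roll."""
    return p_win * win_payout + (1 - p_win) * loss_payout


ev = expected_winnings(P_WIN, WIN_PAYOUT, LOSS_PAYOUT)
print(f"Expected winnings per roll: ${float(ev):.2f}")
```

The function is nothing more than "multiply each payoff by its probability and add them up," which is all an expected value is.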
But what if 1,000 people play my game? Since the dice rolls are independent, we would project each of them to win five cents, just like in the individual game. But does this mean that we expect no one to win twenty dollars? Of course not! We would expect about 50 people (5% of 1,000) to win twenty dollars; we just can't predict which 50. Thus, each person expects to win five cents, even though roughly 50 people will win twenty dollars. In fact, the odds of not one person winning twenty dollars are less than one in 10,000,000,000,000,000,000,000.
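The two numbers in that paragraph are easy to reproduce. This is a rough sketch, assuming (as the game does) that every roll is independent; the variable names are my own. The chance that nobody wins is just the chance that one person loses, multiplied by itself 1,000 times.

```python
N_PLAYERS = 1000
P_WIN = 0.05  # chance that any one player rolls a natural 20

# Expected number of twenty-dollar winners across all players.
expected_winners = N_PLAYERS * P_WIN

# Probability that every single player misses, assuming independent rolls.
p_no_winner = (1 - P_WIN) ** N_PLAYERS

print(f"Expected number of winners: {expected_winners:.0f}")
print(f"Chance that nobody wins:    {p_no_winner:.2e}")
print(f"Less than one in 10^22?     {p_no_winner < 1e-22}")
```

Each player's individual projection is still five cents. The projection for "somebody, somewhere, wins twenty dollars" is a near certainty. Those are different questions with different answers.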
So what good is our expectation of a five-cent win? Simple: that's the single prediction that will give us the least error (in the squared-error sense) averaged over an infinite number of games. We can't improve on it because the only thing we haven't accounted for in our model of expectation is random chance. Note that this is different from the most likely outcome for one game. The most likely outcome from one game is that you lose a dollar. Nonetheless, we can't just ignore the (relatively) massive twenty-dollar payout, even if it is (relatively) rare. That's how expected value works. It aggregates all possibilities into one number that has properly weighted all outcomes.*
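To make "least error" concrete, here is a small sketch comparing two possible predictions for a single roll: the expected value (five cents) and the most likely outcome (losing a dollar). I'm assuming error is measured as average squared error, which is the sense in which the expected value comes out on top; the helper names are mine.

```python
# Outcomes of one roll of the dice game, mapped to their probabilities.
OUTCOMES = {-1.0: 0.95, 20.0: 0.05}


def mean_squared_error(guess):
    """Average squared miss if we predict `guess` for every roll."""
    return sum(p * (payoff - guess) ** 2 for payoff, p in OUTCOMES.items())


# The probability-weighted average of all outcomes.
expected_value = sum(p * payoff for payoff, p in OUTCOMES.items())

# The single outcome with the highest probability.
most_likely = max(OUTCOMES, key=OUTCOMES.get)

print(f"Expected value:      {expected_value:.2f}")
print(f"Most likely outcome: {most_likely:.2f}")
print(f"Squared error when predicting the expected value:  "
      f"{mean_squared_error(expected_value):.4f}")
print(f"Squared error when predicting the most likely one: "
      f"{mean_squared_error(most_likely):.4f}")
```

Predicting "you lose a dollar" is right 95% of the time, but the rare twenty-dollar miss is so large that it costs more, on average, than always predicting five cents.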
The situation applies in exactly the same way to baseball predictions. Some pitchers are going to win 20 games (probably); we just don't know which ones. Some teams are going to win 100 games; we just don't know which ones. Some hitters are going to hit 40 home runs and drive in 120 runners; we just don't know which ones.
That's why when fans react negatively to a projection system that predicts no pitcher to win 20 games, they are making a big mistake. And it's also why when you see Joe Expert predicting some team to win 100 games or some pitcher to win 20 games, he's making a big mistake. Yes, these events are probably going to happen, but it's a fool who thinks he can pick exactly which player or team is going to do it.
There are too many variables that go into a baseball season to determine which teams and players will hit the highs and lows. That's why any sane projection gives numbers that fit in a much narrower range than you will end up seeing in the real season. There's nothing wrong with this. There's nothing else you can do. If all that remains is random noise, then it will be impossible to improve on your projection's accuracy.** If you could eliminate it, it wouldn't be random, would it?
If you can, it often helps to look at player and team projections in terms of percentiles instead of mean expectation. This helps you get a sense of the uncertainty in the projection. You can more easily see the highs and lows and the variety of possible outcomes. However, if all you're looking at is the mean expectation, try to keep in mind my simple dice game. Even though you won't have the shape of the curve involved, you'll still understand that this is the expectation that will give you the least error when the real results finally come in.
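Here is a toy illustration of what a percentile view adds. Everything in it is an assumption made for the example: a hypothetical team that wins each of its 162 games independently with probability 0.54, simulated over 5,000 imaginary seasons. No real projection system is this crude, but the shape of the result is the point.

```python
import random
import statistics

GAMES = 162
WIN_PROB = 0.54    # hypothetical per-game win probability, not a real projection
N_SEASONS = 5000   # number of imaginary seasons to simulate

rng = random.Random(2024)  # fixed seed so the run is repeatable


def simulate_season(rng, games, win_prob):
    """Count wins in one season of independent coin-flip games."""
    return sum(rng.random() < win_prob for _ in range(games))


seasons = [simulate_season(rng, GAMES, WIN_PROB) for _ in range(N_SEASONS)]

mean_wins = statistics.mean(seasons)
# quantiles(n=10) returns the nine cut points between deciles.
deciles = statistics.quantiles(seasons, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

print(f"Mean projection: {mean_wins:.1f} wins")
print(f"10th / 50th / 90th percentile: {p10:.0f} / {p50:.0f} / {p90:.0f} wins")
```

The mean gives you one number; the percentiles show how far a perfectly ordinary season can land from it on luck alone.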
* Note that expected value is not necessarily the right variable to use for decision making. Value and utility are not necessarily the same thing. Furthermore, it is not generally true that the utility of the expected value of a particular choice is the same as the expected utility. For both baseball and my dice game, the two are likely to be close enough to be interchangeable. For more information, read about the St. Petersburg paradox (which is awesome, by the way).
** That's not to say that any projection system out there is truly left with nothing but random noise. There may be opportunities for genuine improvement, but these opportunities can never fully overcome randomness. Furthermore, these improvements must be derived through rigorous statistical and scientific processes, not someone's gut feeling or some arbitrary pattern they've pulled out of thin air. Those are not improvements.