Basebology (The Study of Baseball): Properly interpreting projections

Thursday, February 26, 2009

Properly interpreting projections

Every year, right about now, you see two related phenomena:

Fans completely misinterpreting the predictions of objective projection systems.
"Experts" releasing hilariously skewed subjective predictions.

What am I talking about? Specifically this: people, whether interpreting objective projections or making their own subjective projections, fail to discern the difference between saying that no individual team is likely to win 100 games and that the league is likely not to see any team win 100 games; the difference between not projecting a single pitcher to win 20 games and projecting the league not to have any 20-game winners; the difference between projecting that no single hitter is likely to hit 40 home runs and drive in 120 runners and projecting that the league is likely not to see any hitter hit 40 home runs and drive in 120 runners.

Do you think all these things are the same? If you do, you are failing to understand a basic concept in probability. There's no shame in this. Probability is freaking confusing. Unless you're really, really well-versed in it, you're going to screw it up all the time. I know I do. Nonetheless, let's dig a little deeper into this problem.

Let's say I have a game of chance. You roll a 20-sided die. If you roll a natural 20, I will give you twenty dollars. If not, you will owe me one dollar. What are your expected winnings? Well, we would project you to owe me one dollar 95% of the time and win twenty dollars 5% of the time. That means your expected winnings from any single die roll are exactly five cents (0.95 x -1.0 + 0.05 x 20.0 = 0.05). There are no ifs, ands, or buts about it. We would project you to win five cents.
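As a minimal sketch, the arithmetic above can be checked in a few lines of Python. The variable names are just illustrative, and exact fractions are used so that no floating-point rounding sneaks in.

```python
from fractions import Fraction

# The game from the post: a natural 20 on a 20-sided die wins $20,
# any other roll loses $1.
p_win = Fraction(1, 20)
payoffs = {20: p_win, -1: 1 - p_win}  # payoff in dollars -> probability

# Expected value: each payoff weighted by how often it happens.
expected_value = sum(amount * prob for amount, prob in payoffs.items())

print(expected_value)         # 1/20 of a dollar
print(float(expected_value))  # 0.05
```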

But what if 1,000 people play my game? Since the dice rolls are independent, we would project each of them to win five cents, just like in the individual game. But does this mean that we expect no one to win twenty dollars? Of course not! In fact, we would expect about 50 people to win twenty dollars; we just can't predict which 50 people will win. Thus, each person expects to win five cents, even though roughly 50 people will win twenty dollars. In fact, the odds of not one person winning twenty dollars are less than one in 10,000,000,000,000,000,000,000.
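A short sketch of the 1,000-player numbers, assuming (as the post does) that every roll is independent and every die is fair:

```python
n_players = 1000
p_win = 0.05  # chance of a natural 20 on a fair 20-sided die

# Average number of twenty-dollar winners across the whole group.
expected_winners = n_players * p_win

# Chance that all 1,000 independent rolls miss.
p_no_winner = (1 - p_win) ** n_players

print(expected_winners)       # about 50
print(p_no_winner)            # on the order of 5e-23
print(p_no_winner < 1e-22)    # True: rarer than one in 10^22
```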

So what good is our expectation of a five cent win? Simple: that's the expectation that will give us the least error (measured as squared error) over an infinite number of games. We can't improve on it because the only thing we haven't accounted for in our model of expectation is random chance. Note that this is different from the most likely outcome for one game. The most likely outcome from one game is that you lose a dollar. Nonetheless, we can't just ignore the (relatively) massive twenty dollar payout, even if it is (relatively) rare. That's how expected value works. It aggregates all possibilities into one number that has properly weighted all outcomes.*
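To make "least error" concrete, here is a rough sketch that scores three possible guesses for a single roll. It assumes error is measured as squared error, which is the sense in which the mean is the best single guess; the function name is made up for illustration.

```python
# The dice game from the post: lose $1 with probability 0.95,
# win $20 with probability 0.05.
outcomes = [(-1.0, 0.95), (20.0, 0.05)]  # (payoff, probability)

def mean_squared_error(guess):
    """Average squared miss if we write down `guess` for every roll."""
    return sum(prob * (payoff - guess) ** 2 for payoff, prob in outcomes)

expected_value = sum(prob * payoff for payoff, prob in outcomes)  # about 0.05
most_likely = -1.0  # the single most common result

candidates = [
    ("expected value", expected_value),
    ("most likely outcome", most_likely),
    ("jackpot", 20.0),
]
for label, guess in candidates:
    error = mean_squared_error(guess)
    print(f"{label:>20}: guess {guess:6.2f}, mean squared error {error:7.2f}")
```

The expected value comes out with the smallest squared error (about 20.95), ahead of the most likely outcome (about 22.05) and far ahead of guessing the jackpot. If you scored guesses by absolute error instead, the median (here, the one-dollar loss) would be the better guess, so the choice of error measure matters.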

The situation applies exactly the same way to baseball predictions. Some pitchers are going to win 20 games (probably), we just don't know which ones. Some teams are going to win 100 games, we just don't know which ones. Some hitters are going to hit 40 home runs and drive in 120 runners, we just don't know which ones.

That's why when fans react negatively to a projection system that predicts no pitcher to win 20 games, they are making a big mistake. And it's also why when you see Joe Expert predicting some team to win 100 games or some pitcher to win 20 games, he's making a big mistake. Yes, these events are probably going to happen, but it's a fool who thinks he can pick exactly which player or team is going to do it.

There are too many variables that go into a baseball season to determine which teams and players will hit the highs and lows. That's why any sane projection gives numbers that fit in a much narrower range than you will end up seeing in the real season. There's nothing wrong with this. There's nothing else you can do. If all that remains is random noise, then it will be impossible to improve on your projection's accuracy.** If you could eliminate it, it wouldn't be random, would it?

If you can, it often helps to look at player and team projections in terms of percentiles instead of mean expectation. This helps you get a sense of the uncertainty in the projection. You can more easily see highs and lows and see a variety of outcomes. However, if all you're looking at is the mean expectation, try to keep in mind my simple dice game. Even though you won't have the shape of the curve involved, you'll still understand that this is the expectation that will give you the least error when the real results finally come in.
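Here is a rough sketch of what a percentile view might look like for a single team. Everything in it is an assumption made for illustration: the team wins each of its 162 games independently with a fixed probability of 0.55 (a hypothetical number, not taken from any real projection system), so the spread shown is luck alone and ignores any uncertainty about how good the team really is.

```python
from math import comb

GAMES = 162
TRUE_WIN_PROB = 0.55  # hypothetical team strength, invented for this example

# Exact binomial probability of finishing with exactly k wins.
pmf = [
    comb(GAMES, k) * TRUE_WIN_PROB**k * (1 - TRUE_WIN_PROB) ** (GAMES - k)
    for k in range(GAMES + 1)
]

def percentile(q):
    """Smallest win total whose cumulative probability reaches q."""
    running_total = 0.0
    for wins, prob in enumerate(pmf):
        running_total += prob
        if running_total >= q:
            return wins
    return GAMES

mean_wins = sum(k * prob for k, prob in enumerate(pmf))
p_100_plus = sum(pmf[100:])  # chance of 100 or more wins

print(f"mean projection: {mean_wins:.1f} wins")
print(f"10th / 50th / 90th percentile: "
      f"{percentile(0.10)} / {percentile(0.50)} / {percentile(0.90)}")
print(f"chance of 100+ wins: {p_100_plus:.1%}")
```

The mean lands near 89 wins, yet this same team still has a small but real chance of reaching 100. That is the dice game again: the projection never says "100", but the possibility is sitting in the upper percentiles.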

* Note that expected value is not necessarily the right variable to use for decision making. Value and utility are not necessarily the same thing. Furthermore, it is not true that the utility of the expected value of a particular choice is the same as the expected utility. For both baseball and my dice game, the two are likely to be close enough to be interchangeable. For more information, read about the St. Petersburg paradox (which is awesome, by the way).

** That's not to say that any projection system out there is truly left with nothing but random noise. There may be opportunities for genuine improvement, but these opportunities can never fully overcome randomness. Furthermore, these improvements must be derived through rigorous statistical and scientific processes, not someone's gut feeling or some arbitrary pattern they've pulled out of thin air. These are not improvements.

2 comments:

D.Cous. said...: I'm not sure I understand (surprise). You seem to be claiming that predicting which specific teams will do well (say, the 100 wins) is a fool's errand, because of random chance, or luck. This may be true, but I think it's less true than you seem to be implying, because not all teams are the same. Isn't it more like a dice game where some of the dice are loaded, and are more likely (though not 100% likely, mind you) to roll twenties?

In that case, assuming that you can know which players have the loaded dice, wouldn't picking them be a totally legitimate exercise? Sure, they might just come up with nothing, and someone with a non-loaded die might still roll a twenty, but it's still more likely to work out that way than not.

Am I making sense here? Certainly, a lot of people are going to think that their team is the one with the loaded die (say, the best combination of players), and so you'll get a lot of bad predictions, but aren't you also likely to get some good ones, by the people who correctly guess the indicators of a team that is likely to succeed?

Also, who in the world has twenty-sided dice?

February 27, 2009 at 12:00 PM
John Lynch said...: Yes, you're absolutely right. Obviously, some teams are more likely than others to win 100 games. Nonetheless, given the choice between "The Boston Red Sox will win 100 games this year" and "A team that is not the Boston Red Sox will win 100 games this year," you should take the latter almost all the time. It's a rare, special team whose median projection involves winning 100 games.

When an analyst actually picks *the* team that will win 100 games or *the* pitcher that will win 20 games, they are going to be wrong (and deservedly so) more often than not.

Don't mistake the forest for the trees here. Obviously, we have more information about baseball than dice. Yes, if you put a gun to my head, I can make an intelligent guess about which team will win 100 games. Nonetheless, even if you sit down and pencil in the most likely team to win 100 games to actually win 100 games and the most likely pitcher to win 20 games to actually win 20 games, you will still be more wrong more often than the guy who doesn't pick any team to win 100 and doesn't pick any pitcher to win 20 but instead puts down a mean expectation.

Even if we load the dice, as you say, the story doesn't change. Even if some dice are five times as likely to roll 20s, we would never put down 20 as our expectation for those players, even though they are far more likely to actually roll 20 than those with unloaded dice. Thus, when comparing each player's expected result, we will still see no one projected to roll 20, even though it is a near certainty that someone will and even though some players are more likely than others to do so. If a dice "fan" were to start complaining that my system is broken because I don't predict anyone to roll a 20 when *obviously* someone has to, he's missing the point.
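The loaded-dice version can be sketched the same way. The setup below is made up for illustration: a table of 30 players, 3 of whom hold dice that roll a 20 five times as often as a fair die, with the other 19 faces assumed to share the remaining probability equally.

```python
from math import prod

FAIR_P20 = 1 / 20
LOADED_P20 = 5 * FAIR_P20  # "five times as likely to roll 20s"

def expected_roll(p20):
    """Mean roll for a d20 that shows 20 with probability p20.

    Assumption: faces 1 through 19 split the remaining probability equally.
    """
    other_faces_mean = sum(range(1, 20)) / 19
    return p20 * 20 + (1 - p20) * other_faces_mean

# Hypothetical table: 3 loaded dice followed by 27 fair ones.
players = [LOADED_P20] * 3 + [FAIR_P20] * 27

p_someone = 1 - prod(1 - p for p in players)     # anyone at all rolls a 20
p_favorite = players[0]                          # one specific loaded die
p_field = 1 - prod(1 - p for p in players[1:])   # everyone except that player

print(f"expected roll, fair die:   {expected_roll(FAIR_P20):.1f}")
print(f"expected roll, loaded die: {expected_roll(LOADED_P20):.1f}")
print(f"someone rolls a 20:        {p_someone:.1%}")
print(f"the favorite rolls a 20:   {p_favorite:.1%}")
print(f"the field rolls a 20:      {p_field:.1%}")
```

Under these assumptions the loaded die's expected roll is only 12.5, nowhere near 20, and the field is far more likely to produce a 20 than even the best-positioned single player.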

So, in summary:
* Yes, some teams are better than others.
* The field is almost always more likely to accomplish a particular task than even the most likely participant.
* Thus, a good projection will rarely predict any single team or player to reach an accomplishment that is common from the league's perspective but not from a team's perspective.

February 27, 2009 at 1:27 PM


Key Stats

ARP
Adjusted Runs Prevented

ARP measures the number of runs that a relief pitcher prevented from scoring above what an average relief pitcher would have prevented. ARP is adjusted for the situation in which the pitcher was used.

ISO
Isolated Power

ISO is the ratio of extra bases that a player has accumulated to the number of at bats he has received. ISO is essentially a player's SLG minus his batting average. This has the effect of giving a player credit only for extra base hits. ISO is not a useful measure of player value on its own, but is a very effective measure of a player's extra base ability.

OBP
On Base Percentage

OBP is the ratio of the number of times a player reached base safely to the number of opportunities he had to reach base. It effectively measures a player's skill at not making outs. Since outs are a team's most precious commodity, OBP measures perhaps the most valuable and fundamental skill a player can have.

OPS
On Base Plus Slugging Percentage

OPS is a crude metric that simply sums a player's on base and slugging percentages. It is probably the most popular non-traditional measure of overall batting performance due to its simplicity. However, it has drawn criticism from performance analysts for its inaccuracy relative to other advanced metrics and because it works by adding two numbers with different denominators together to produce a conceptually meaningless quantity. It is best used as a quick and dirty estimator of batting prowess.

SLG
Slugging Percentage

SLG is the ratio of total bases that a player has accumulated to the number of at bats he has received. It is essentially a weighted batting average that gives a player more credit for extra base hits.

UZR
Ultimate Zone Rating

UZR is a defensive metric that uses play-by-play data to determine how good a player's defense is. On Fangraphs, it is denominated in runs saved above average.

VORP
Value Over Replacement Player

VORP measures the number of runs that a player contributed above what a "replacement player" at the same position would produce. VORP considers only offensive contributions.

WARP
Wins Above Replacement Player

WARP measures the number of wins that a player contributed above what a "replacement player" at the same position would produce. WARP considers both offensive and defensive contributions.

WXRL
Win Expectancy added above Replacement adjusted for Lineup

WXRL measures the number of wins that a relief pitcher contributed above what a "replacement player" would produce. WXRL differs from WARP because it is adjusted for both the game situation in which the pitcher was used and the hitters that the pitcher faced.
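For the simple batting ratios in this list (SLG, ISO, OBP, OPS), a small sketch using the standard textbook formulas may help. The season totals are invented for illustration and do not belong to any real player.

```python
# Hypothetical season totals, made up purely for this example.
at_bats, singles, doubles, triples, home_runs = 500, 90, 30, 5, 25
walks, hit_by_pitch, sac_flies = 60, 5, 5

hits = singles + doubles + triples + home_runs
total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs

avg = hits / at_bats                  # batting average
slg = total_bases / at_bats           # total bases per at bat
iso = (total_bases - hits) / at_bats  # extra bases per at bat
obp = (hits + walks + hit_by_pitch) / (
    at_bats + walks + hit_by_pitch + sac_flies
)
ops = obp + slg                       # two ratios with different denominators

print(f"AVG {avg:.3f}  SLG {slg:.3f}  ISO {iso:.3f}")
print(f"OBP {obp:.3f}  OPS {ops:.3f}")
```

Note that ISO computed directly from extra bases matches SLG minus batting average, and that OPS adds a per-at-bat ratio to a ratio with a different denominator, which is the criticism mentioned in the OPS entry.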
