Wednesday, August 22, 2007


If one happens to be listening to a baseball game, one will often hear analysts talking about "confidence." For example, "He just doesn't have a lot of confidence in his fastball right now" or "He doesn't look very confident at the plate." Now, I normally try to take what broadcasters say with an epic grain of salt. However, in this case, their emphasis on confidence is well placed. Confidence is one of the most important aspects of analysis.

Unfortunately, we're talking about a different kind of confidence than they are.

When one deals in the realm of probability, which is to say, when one analyzes anything with unknowable information, one can never predict anything with 100% accuracy.

For example, let's say I have 1,000,000 consecutively numbered balls in a very large hopper. I draw a ball from the hopper and predict that the value on the ball I have drawn will not be exactly one. This is not a very dangerous prediction. In fact, I will be right 99.9999% of the time. However, I will also be wrong 0.0001% of the time. No matter how many balls I stick in the hopper, I will always have some chance of selecting that one ball that breaks my prediction.
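For the arithmetically inclined, here is a minimal sketch of the hopper in Python. The function name is my own invention for illustration, and it assumes every ball is equally likely to be drawn.

```python
from fractions import Fraction

def chance_of_wrong_prediction(num_balls: int) -> Fraction:
    """Chance of drawing the single ball that breaks the prediction.

    Assumes a fair draw: each ball is equally likely to come out.
    """
    return Fraction(1, num_balls)

# A bigger hopper shrinks the chance of being wrong, but never to zero.
for num_balls in (1_000_000, 1_000_000_000):
    p_wrong = chance_of_wrong_prediction(num_balls)
    print(f"{num_balls:>13,} balls: wrong {float(p_wrong):.7%} of the time")
```

Using exact fractions rather than floats makes the point literal: the result is small, but it is never equal to zero.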

The same is true in baseball. As the old adage goes, even a .300 hitter fails 70% of the time. Ignoring the banality of the adage, what it implies is important: we can never be 100% certain of a particular outcome in baseball.

Baseball involves two elements: skill and chance. Some prefer the term "luck" to "chance." I do not, because luck implies some moral element: "good" luck or "bad" luck. Rather, "chance" simply implies that there are things that are beyond a player's control.

Thus, whenever we are examining a player's performance record, we have to take care to account for the fact that some of that player's performance could be due to chance. So how much is due to chance and how much is due to skill? Again, we can never say for certain. However, we can leverage probability to tell us how much uncertainty there is in our estimate.

Let me illustrate what this looks like without any rigorous math, of which I assure you, Dear Reader, there is plenty. Suppose I have an apple sitting on my head and a bow and arrow with which someone must attempt to shoot the apple. Having studied this subject repeatedly, I know that an average person can hit the apple without killing me (success!) only 25% of the time. I am in quite a fix.

However, the evil sadist who has put me in this gruesome predicament has given me a way out: I can pick who will be the shooter from a list of anonymous citizens, who have all provided me with their career attempts and successes in the common practice of shooting an apple off of one's head. How should I make my choice? Should I simply choose the person with the best ratio of success to failure?

Of course not! I'll end up choosing someone who shot one apple off of one head. This person will have taken one attempt in their entire life and just happened to succeed. Now, not knowing anything else about this person, should I conclude that they are a 100% apple-shooter? Only if I'm suicidal!

The odds are that this anonymous shooter is actually an average Joe: he probably hits the apple one out of every four tries and happened to get lucky this one time. To be sure, he could be the greatest apple shooter who ever lived. However, I cannot have any confidence in this conclusion based on only one trial.

Let's say I have another anonymous archer who has made 5,000 apple-shooting attempts and succeeded at 4,500 of them. The odds of this guy being a normal 25% shooter are close to zero. Amazingly, it is not guaranteed that this guy is not an average shooter. After all, in a universe of infinite possibilities, some average shooter will hit 4,500 of 5,000 shots. However, I can say with much, much more confidence that this man has a real skill at shooting apples off of heads.
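Just how close to zero? Here is a rough sketch, assuming each shot is an independent trial with the same 25% chance of success (a binomial model, which is my assumption and not something an archer's record can guarantee). The probability is far too small for ordinary floating-point numbers, so the sketch works with logarithms; the function names are hypothetical.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(exactly k successes in n tries), success chance p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def log10_tail(k: int, n: int, p: float) -> float:
    """log10 of P(at least k successes in n tries), summed in log space."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    # Factor out the largest term so the remaining sum does not underflow.
    total = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return total / math.log(10)

# An average (25%) shooter hitting at least 4,500 of 5,000 apples.
exponent = log10_tail(4500, 5000, 0.25)
print(f"Probability is roughly 10^{exponent:.0f}")
```

Under those assumptions the chance is smaller than one in 10^2000: not zero, but nothing I would lose sleep over.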

Naturally, given a choice between the two, I'll pick the guy who's almost guaranteed to give me a 90% chance over the guy who has an ominously low chance of being 100% accurate.

So, how do we quantify this?

Well, that's where confidence comes into play. Confidence is the likelihood that the true value falls within a given range around our measured value. For example, after that one trial, I can say with 99% confidence that our shooter falls within the accuracy range of 100% ± 90%. Or I can say with 90% confidence that our shooter falls into the range of 100% ± 75%. Or I can say with 50% confidence that our shooter falls into the range of 100% ± 40%.

These numbers are made up, but they reflect the way confidence works: for any given sample we can increase our confidence in the range of values by increasing the size of the range. We can decrease the size of the range by decreasing our confidence level.

In the case of our apple shooters, because we only have one trial on which to base our conclusions about the man with 100% success, we will not be able to get a range of values useful enough to make a decision without destroying our confidence level. This ought to make intuitive sense to you. The more you increase the range, the more likely you are to be right, but the less useful the range is. The more you decrease the range, the more likely you are to be wrong, but the more useful the range is. With so little data, this guy is hardly more valuable than a shooter who's never taken a shot in his life. Without knowing anything else, it's very hard to conclude anything but that he's an average guy who got lucky once.

What of our man with 5,000 attempts? In his case, I might be able to say with 99% confidence that his skill is 90% ± 5%. Or I can say with 90% confidence that his skill falls into the range of 90% ± 2%. Or I can say with 50% confidence that his skill falls into the range of 90% ± 0.1%.

Why do we have more confidence with this guy? Because we have so many more attempts on which to base our conclusions. Again, it makes intuitive sense: the more data we have, the more confident we are.
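The ranges I quoted for both shooters were made up to show the pattern. For anyone who wants real numbers, here is a minimal sketch using the Wilson score interval, which is one common method for a success rate and is my choice for illustration, not the only option. It assumes independent shots, and the function name is hypothetical. Note that a Wilson range is not centered on the measured value, so it does not map neatly onto the "± x%" shorthand.

```python
import math
from statistics import NormalDist

def wilson_interval(successes: int, attempts: int, confidence: float):
    """Wilson score interval for a success rate (one method among several)."""
    # z is the standard-normal cutoff for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / attempts
    denom = 1 + z * z / attempts
    center = (p_hat + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / attempts
                         + z * z / (4 * attempts ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

shooters = {"1 for 1": (1, 1), "4,500 for 5,000": (4500, 5000)}
for label, (hits, tries) in shooters.items():
    for confidence in (0.50, 0.90, 0.99):
        low, high = wilson_interval(hits, tries, confidence)
        print(f"{label:>15} at {confidence:.0%} confidence: {low:.1%} to {high:.1%}")
```

By this method, the one-shot archer's 99% range is wide enough to include the average 25% shooter, while the veteran's range stays within a couple of points of 90%.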

Here is the important point: the only way to both increase confidence and decrease the range is to add more data points. Period.
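To see the effect of more data, here is a small sketch that holds the confidence level at 99% and the measured success rate at 90%, then varies only the number of attempts. It uses the simple normal approximation for the half-width of the range, which is my simplifying assumption and gets shaky for very small samples; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def half_width(p_hat: float, attempts: int, confidence: float) -> float:
    """Approximate '±' half-width of the range (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / attempts)

# Same measured rate, same confidence level, more and more attempts.
for attempts in (50, 500, 5_000, 50_000):
    width = half_width(0.9, attempts, 0.99)
    print(f"{attempts:>6,} attempts: 90% ± {width:.2%}")
```

Under this approximation the range shrinks with the square root of the sample size, so it takes a hundred times the data to make the range ten times tighter.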

Normally, when presenting statistics, researchers will choose a confidence level and let the range fall where it may. A 50% confidence level just isn't that useful. Ninety-nine or ninety-five percent are probably the most common.

So why have I spent so many words belaboring this point? Because we're about to explore some bona fide objective analysis and confidence is a foundational concept. Confidence is why it means so little when a guy is 5 for 7 lifetime off of a given pitcher. Confidence is why analysts are so skeptical about a clutch hitting skill. Confidence is why it's so hard to judge how good a reliever is statistically.
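As a taste of what's coming, take that 5 for 7. Here is a quick sketch of how often a hitter with a true .300 skill would go 5 for 7 or better against a pitcher purely by chance. It assumes each at-bat is an independent trial with the same 30% chance of a hit, which is my simplification, and the function name is hypothetical.

```python
from math import comb

def chance_of_at_least(hits: int, at_bats: int, true_rate: float) -> float:
    """P(at least `hits` hits in `at_bats` tries) under a binomial model."""
    return sum(
        comb(at_bats, k) * true_rate ** k * (1 - true_rate) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

p = chance_of_at_least(5, 7, 0.300)
print(f"A true .300 hitter goes 5 for 7 or better about {p:.1%} of the time")
```

That comes out to a bit under 3%. It sounds rare until you remember how many hitter-pitcher pairings there are in a season; some of them are bound to look like this with no special skill involved.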

So if you don't get confidence, you may want to reread the post (or find a better teacher), because if you don't, you may be the next guy to do this. Or at least, you won't get my upcoming posts.


die Amerikanerin said...

I don't get the last link.

John Lynch said...

Go check out Gary Matthews Jr.'s career stats and then ask yourself:

Why would I pay this man $50M over five years to play baseball for me?

His 2006 performance has "fluke" written all over it, but the Angels signed him to a large deal anyway. It's a gross misunderstanding of random statistical variation.

D.Cous. said...

Heh heh, you should've titled this post "I have confidence in sunshine, I have confidence in rain." Then again, you don't seem to be into the song lyric thing.