Wednesday, May 30, 2007

Does True Talent Work?

So far in our quest to create a model of a baseball player, we've come to a few key conclusions:
  1. Our model will never be perfect, but it will be simple and sufficiently accurate.
  2. Our model will use a player's "true talent" at a given skill, such as getting on base or hitting home runs.
  3. Our model will estimate "true talent" from existing data, under the assumption that this past data reflects a player's "true talent."
It's the last of these points that is the trickiest, and we'll save it for a later date. The first two more or less define what any good model should do: simplify a problem to the point that accurate conclusions can be drawn easily.

In fact, the first two points are all we need to start talking about hypothetical situations. For example, to drive home a point about the connection (or lack thereof) between closer performance and save totals, one might start by assuming that there exist closers with true run rates of 2.0, 4.0, and 6.0 runs allowed per nine innings. We would then be able to insert these closers into any number of situations and talk about the effect on their save totals.
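To make that concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the Poisson draw for an inning's runs allowed, the 60 save opportunities, the uniform one-to-three-run leads, and the crude rule that a save is any one-inning outing where the lead survives.

```python
import math
import random

def runs_allowed(ra9: float) -> int:
    """Draw one inning's runs allowed from a Poisson distribution
    with mean ra9 / 9 -- a deliberately crude stand-in."""
    lam = ra9 / 9.0
    u, k, p = random.random(), 0, math.exp(-lam)
    cumulative = p
    while u > cumulative:  # inverse-transform sampling
        k += 1
        p *= lam / k
        cumulative += p
    return k

def simulate_saves(ra9: float, opportunities: int = 60) -> int:
    """Hand a closer one-inning save chances with random 1-3 run
    leads; count how many leads survive."""
    saves = 0
    for _ in range(opportunities):
        lead = random.choice([1, 2, 3])
        if runs_allowed(ra9) < lead:
            saves += 1
    return saves

random.seed(2007)
for true_ra9 in (2.0, 4.0, 6.0):
    print(f"RA9 {true_ra9}: {simulate_saves(true_ra9)} saves")
```

Swap in different lead distributions or usage patterns and you can measure exactly how much (or how little) save totals separate closers of very different quality.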

But does this work? Does this simplification actually model player behavior? I think the answer is "Yes." Let's walk through why that is.

First, let's examine the problem from the point of view of the atomic unit of a plate appearance: the pitch. On any given pitch, there are a myriad of factors that go into determining the outcome of that pitch: the current wind speed, the hitter's mindset, the umpire's expiring parking meter, etc. Furthermore, there are a myriad of possible outcomes for each pitch: dribbler down the third base line, swinging over the pitch, wild pitch under the catcher's glove, etc.

A perfect model would take every possible factor into consideration and provide us with the exact outcome. By varying the inputs, we could study the effect of anything perfectly. Naturally, this is impossible.

So what can we do? First, we start eliminating the least useful factors. As we trim these from our model, our model's output starts changing. Now, instead of outputting the exact result, it outputs the range of possible results, each with an associated probability of occurring. After a while, we've trimmed our set of input factors down to a few very measurable, understandable things.
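In code, the trimming step is the move from a function that returns one exact result to a function that returns a distribution over results. A sketch, with invented inputs and probabilities:

```python
from typing import Dict

def trimmed_model(count: str, pitch_type: str) -> Dict[str, float]:
    """With most factors trimmed away, the model returns the range
    of possible results, each with a probability. Numbers invented."""
    if count == "0-2" and pitch_type == "slider":
        return {"swinging strike": 0.25, "ball": 0.35,
                "foul": 0.20, "in play": 0.20}
    return {"ball": 0.40, "called strike": 0.20,
            "foul": 0.15, "in play": 0.25}

print(trimmed_model("0-2", "slider"))
```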

Now we have another problem: no matter what input we provide, our output contains a near-infinite number of possible outcomes, each with a stupendously small probability of occurring. We tackle this problem in almost the same way. We identify common elements in the outcomes and group them accordingly. For example, a ten-foot bunt single on the third base side of the mound is roughly the same as an eleven-foot bunt single on the third base side of the mound. We call this new outcome "ten to eleven foot bunt single on the third base side of the mound" and combine the probabilities of all outcomes that fall in this category. As we go, we keep simplifying away outcomes that don't add any extra useful information.
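The grouping step looks something like this; the fine-grained outcomes and their probabilities are again invented for illustration:

```python
from collections import defaultdict

# Fine-grained outcomes with invented probabilities.
fine_grained = {
    "10-ft bunt single, 3B side":  0.00012,
    "11-ft bunt single, 3B side":  0.00011,
    "line-drive single to left":   0.0300,
    "line-drive single to center": 0.0350,
}

def group(outcome: str) -> str:
    """Collapse outcomes that differ in no useful way."""
    return "bunt single, 3B side" if "bunt" in outcome else "single"

grouped = defaultdict(float)
for outcome, p in fine_grained.items():
    grouped[group(outcome)] += p  # probabilities of merged outcomes add
print(dict(grouped))
```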

The point here isn't what we're simplifying toward. The point is that as we add unknowns to our equation, we introduce variability into the results, and that we can counteract this variability by focusing only on the relevant differences in outcome. Since every exact input produces a definite output, and every possible input carries a definite probability, the distribution of results we get is itself exact even though the particular result remains uncertain.
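In symbols (my notation, nothing canonical): if the model's output is exact once an input x is fully specified, then the probability of any grouped outcome is just a total-probability sum, and all of the uncertainty lives in which input actually occurs:

```latex
P(\text{outcome}) = \sum_{x \in \text{inputs}} P(x)\, P(\text{outcome} \mid x),
\qquad P(\text{outcome} \mid x) \in \{0, 1\}.
```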

As a trivial example, let's model the coin flip. We have factors such as the volume and shape of the coin, the density throughout the volume of the coin, and the force with which the coin is struck. When we account for all these factors, we successfully predict each coin flip. However, what if instead of giving the model one detailed physical description, I gave it every physical description, each weighted by the probability that it ends up as the actual input? Now my model gives me a whole host of outcomes, each with a probability of occurring.

Of course, I don't care about a ton of the outcomes in the coin flip. I don't care how far away it landed from where it was flipped. I don't care what angle the coin landed at. I don't care whether the coin was deformed at all during the flip. All I care about is whether the coin came up heads or tails. So I simplify all the results down into two groups: those that came up heads and those that came up tails. Each of these groups has a probability associated with it.

Of course, we only gave the model every possible input for the physical description factor. What if we gave it every possible input for every factor? After grouping our nearly infinite output set into heads and tails, we would now have a probability for heads and a probability for tails that no longer depend on any particular input to the model.
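Here is that whole chain in one sketch: a toy deterministic "physics," every input weighted by its likelihood (approximated by sampling), and the rich outcome set collapsed to the two groups we care about. The physics function is pure invention.

```python
import random

def lands_heads(force: float, density_skew: float) -> bool:
    """Toy deterministic physics: an exact physical description
    fully determines the outcome. Purely illustrative."""
    return (force * 1000.0 + density_skew) % 2.0 < 1.0

# Feed the model every description we might see, weighted by how
# likely each is (sampling approximates the weighting), then group
# the outcomes down to heads and tails.
random.seed(1)
trials = 100_000
heads = sum(lands_heads(random.uniform(1.0, 2.0),
                        random.uniform(-0.1, 0.1))
            for _ in range(trials))
print("P(heads) ~", heads / trials)
print("P(tails) ~", 1 - heads / trials)
```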

So why do we need the model anymore? Well, we can always get a better estimate by giving the model the information that we do know for certain. However, no matter what the inputs to the model end up being, the results are always just a set of outcomes and their associated probabilities. That's it.

This is why true talent level works. When we talk about "true talent level", what we are saying is: "Yeah. I know that I don't have a perfect model. But what if I did? And what if it gave me these probabilities? What would we then be able to say about the sacrifice bunt with a runner on second and no one out in a tie game in the bottom of the ninth?"
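Here is the kind of answer that becomes computable once the true talent levels are simply assumed. Every probability below is an invented input, and the inning model is as crude as possible (one lead runner, no double plays, no pinch hitters); the point is only that, given assumed talents, the bunt question reduces to arithmetic.

```python
import random

# Assumed true-talent inputs -- invented numbers, not measurements.
P_SCORING_HIT = 0.23  # hit that scores the lead runner
P_WALK        = 0.09  # batter reaches; lead runner holds
P_ADVANCE_OUT = 0.10  # out that moves the lead runner up a base
P_BUNT_OK     = 0.80  # sacrifice works: runner to third, batter out

def run_scores(bunt_first: bool) -> bool:
    """Runner on second, nobody out, tie game, bottom of the ninth:
    does the winning run score this inning?"""
    outs, runner = 0, 2
    if bunt_first:
        if random.random() < P_BUNT_OK:
            runner = 3
        outs = 1  # the batter is out either way in this crude model
    while outs < 3:
        r = random.random()
        if r < P_SCORING_HIT:
            return True
        r -= P_SCORING_HIT
        if r < P_WALK:
            continue  # no out recorded; lead runner holds
        r -= P_WALK
        outs += 1
        if r < P_ADVANCE_OUT and outs < 3:  # no run scores on out three
            if runner == 3:
                return True
            runner = 3
    return False

random.seed(9)
trials = 200_000
for bunt in (False, True):
    p = sum(run_scores(bunt) for _ in range(trials)) / trials
    print("bunt first:" if bunt else "swing away:", round(p, 3))
```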

True talent level starts at the end. It asks the user to make the assumption that if we did have the perfect model, it would have given us this answer. True talent level is the recognition that every event with unknown factors can be reduced to a series of results with associated probabilities. Because of this, we can talk about hypothetical situations ad nauseam simply by assuming that we know the true talent levels involved.

This brings us to point three. We'd much rather talk about real players than hypothetical ones. The really hard question isn't whether or not true talent level is a valid concept. The question is: given that I don't have the perfect model, how can I reliably estimate true talent level? The answer, like most of my posts, is sure to be long, boring, needlessly semantic, and far too metabaseball for everyone but myself.

Stay tuned!
