We've talked in previous posts about the need for an accurate player model for objective player analysis. Furthermore, we've established that this model should reflect a player's "true talent level"; that is, it should reflect a player's level of proficiency at a skill under average conditions.
The next step is to measure a player's true talent level. For example, if we are trying to study the effects of out-making among different players in baseball, we need a model of how players make outs. This model needs to accurately reflect the differences between players at this skill. Those players who don't make many outs should make few outs in our model. Those players who make many outs should make many outs in our model.
I know that all of this sounds trivial. In fact, you've probably already leaped many steps ahead. The reason I'm being so particular about these points isn't that they're hard to understand; it's to drive home the nature of the process. This is crucial for when we begin to model more complex behavior.
For something simple like out-making, our model can simply provide us with the probability that an out will be made given an opportunity to make an out. This sort of binary outcome (out, not out) is a well-explored problem and has many useful statistical properties.
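As a rough sketch of what such a binary-outcome model looks like, here is a tiny simulation in Python. The 60% out rate and the 600 opportunities are hypothetical numbers chosen for illustration, not anything measured from a real player:

```python
import random

def simulate_opportunities(out_probability, n_opportunities, seed=0):
    """Simulate n binary out/not-out outcomes for a player whose
    true talent is a fixed out probability (a Bernoulli model)."""
    rng = random.Random(seed)
    return [rng.random() < out_probability for _ in range(n_opportunities)]

# Hypothetical player with a 60% true out-making talent, 600 opportunities
outcomes = simulate_opportunities(0.60, 600)
observed_rate = sum(outcomes) / len(outcomes)
```

The observed rate will land near, but almost never exactly on, the true 60% -- which is exactly the gap between true talent and observed performance that the rest of this post is about.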
So how do we assign out-making probabilities to each player in our model? How do I know what Derek Jeter's true out-making talent level is?
The answer: I don't, nor can I ever know for certain.
This is a very unsatisfying answer, but like most things in science, we can cheat. We can estimate Derek's true talent level if we make a few assumptions.
First, we can assume that if Derek Jeter's true talent level at out-making is 60% (that is, he makes outs 60% of the time he's given the opportunity to), then in real life he will make outs around 60% of the time, given average conditions. Second, we can assume that even though he never sees "average conditions", he sees enough of a variety of conditions that they tend towards average in the aggregate.
These two points imply two other very important points. First, because we don't have an infinite number of trials on which to examine a player, our estimation is only guaranteed to be in the right neighborhood. There is probably going to be some error. Second, the more trials we have, the more likely it is that the overall conditions will tend towards average. This is not guaranteed and needs to be verified when doing research. This may lead to adjustments in the way data is analyzed. For example, a player who played half his games in the pre-humidor Coors Field cannot be said to have played under "average conditions" no matter how many games he plays.
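The first point -- that more trials shrink the error -- can be made concrete with the standard error of a binomial proportion. This is a standard statistical formula, shown here with an illustrative 60% talent level and hypothetical trial counts:

```python
import math

def out_rate_standard_error(p, n):
    """Standard error of an observed out rate when true talent is p
    and we observe n independent opportunities: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Tenfold more opportunities shrinks the error by a factor of sqrt(10):
se_100 = out_rate_standard_error(0.60, 100)      # roughly 0.049
se_10000 = out_rate_standard_error(0.60, 10000)  # roughly 0.0049
```

So over 100 opportunities we'd routinely see observed rates several percentage points away from true talent, while over 10,000 the observed rate pins the true talent down much more tightly.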
So after all this, what are we going to do? We are going to estimate a player's true talent level by looking at all the relevant data points in that player's career and assuming that they are an accurate reflection of his true talent. The more points we have, the more confident we can be in our estimation. So for Derek Jeter, we estimate that given an opportunity to make an out, he will do so 61% of the time (his OBP is 0.390 for his career). We can apply this process similarly to other true talent levels, like home run rate or stolen base success rate.
This is a crude first attempt at a model, but it does illustrate the process and leaves us with a couple of places to go. First, how can we quantify our confidence in our estimation? Second, how can we make adjustments to the data points to improve the model? These aren't easy questions. For now, the important point is that even this simple model provides us with a starting place for objective analysis.
As a final point, it is important to note that we can be confident in this model because if it were wrong the results would have to show up in the real world. If Derek Jeter's true talent level at out-making is actually 70%, it is massively, apocalyptically unlikely that he would actually only make outs 61% of the time over the course of his career. Now, perhaps his true talent level is actually 60.75% or 61.3%. This could be within our range of acceptable error. However, we are not likely to be significantly wrong. This point is important because when we begin turning our model into tools that can be used to make decisions, conclusions that we don't like can't be hand-waved away by saying that the underlying model must be significantly wrong in this case. It might be, but only a fool is going to bet against it.
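To put a number on "apocalyptically unlikely", we can ask how many standard errors away the observed 61% would be if the true talent were really 70%, using the normal approximation to the binomial. The career length of 12,000 opportunities below is a hypothetical round number, not Jeter's actual count:

```python
import math

def rate_z_score(p_true, p_observed, n):
    """How many standard errors p_observed sits from p_true,
    assuming n independent trials (normal approximation)."""
    se = math.sqrt(p_true * (1 - p_true) / n)
    return (p_observed - p_true) / se

# Hypothetical ~12,000 career opportunities: observing 61% when the
# truth is 70% would be more than 20 standard errors below expectation.
z = rate_z_score(0.70, 0.61, 12000)
```

A result twenty-plus standard errors from expectation is, for all practical purposes, impossible, which is why a true talent of 70% can be ruled out while 60.75% or 61.3% cannot.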
**EDIT** Fixed some typos and added some language to clarify some sentences.