Thursday, May 31, 2007

Another (Brief) View of True Talent

From comment #78 on this thread:
When we talk about finding a player’s “true talent” what we really mean is finding the best predictor of what that player will do in the future.
I couldn't have said it better myself. And Lord knows I've been trying.

Wednesday, May 30, 2007

Does True Talent Work?

So far in our quest to create a model of a baseball player, we've come to a few key conclusions:
  1. Our model will never be perfect, but it will be simple and sufficiently accurate.
  2. Our model will use a player's "true talent" at a given skill, such as getting on base or hitting home runs.
  3. Our model will estimate "true talent" from existing data, under the assumption that this past data reflects a player's "true talent."
It's the last of these points that is the trickiest, and we'll save it for a later date. The first two are more or less defining what it is that any good model should do: it should simplify a problem to the point that accurate conclusions may be easily drawn.

In fact, the first two points are all we need to start talking about hypothetical situations. For example, to drive home a point about the connection (or lack thereof) between closer performance and save totals, one might start by assuming that there exist closers with true runs per inning rates of 2.0, 4.0, and 6.0. We would then be able to insert these closers into any number of situations and talk about the effect on their save totals.

But does this work? Does this simplification actually model player behavior? I think the answer is "Yes." Let's walk through why that is.

First, let's examine the problem from the point of view of the atomic unit of a plate appearance: the pitch. On any given pitch, there are a myriad of factors that go into determining the outcome of that pitch: the current wind speed, the hitter's mind set, the umpire's expiring parking meter, etc. Furthermore, there are a myriad of possible outcomes for each pitch: dribbler down the third base line, swinging over the pitch, wild pitch under the catcher's glove, etc.

A perfect model would take every possible factor into consideration and provide us with the exact outcome. By varying the inputs, we could study the effect of anything perfectly. Naturally, this is impossible.

So what can we do? First, we start eliminating the least useful factors. As we trim these from our model, our model starts changing its output. Now, instead of outputting the exact result, it outputs the range of possible results each with an associated probability of occurring. After a while we've trimmed our set of input factors down to a few very measurable, understandable things.

Now we have another problem: no matter what input we provide, our output contains a near infinite number of possible outcomes, each with a stupendously small probability of occurring. We start tackling this problem almost the same way. We identify common elements in the outcomes and group them accordingly. For example, a ten-foot bunt single on the third base side of the mound is roughly the same as an eleven-foot bunt single on the third base side of the mound. We call this new outcome "ten to eleven foot bunt single on the third base side of the mound" and combine the probabilities of all outcomes that fall in this category. As we go, we keep simplifying away outcomes that don't add any extra useful information.
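To make the grouping step concrete, here is a minimal sketch. The outcomes and their probabilities are invented purely for illustration, and the names `fine_outcomes`, `group_outcomes`, and `coarse_label` are hypothetical. The only point is that grouping means adding up the probabilities of every outcome that lands in the same category.

```python
from collections import defaultdict

# Hypothetical fine-grained outcomes: (kind, distance in feet) -> probability.
# These numbers are made up; they only need to sum to 1.
fine_outcomes = {
    ("bunt single", 10): 0.002,
    ("bunt single", 11): 0.003,
    ("swinging strike", 0): 0.100,
    ("called strike", 0): 0.150,
    ("everything else", 0): 0.745,
}

def group_outcomes(outcomes, label):
    """Sum the probabilities of all fine-grained outcomes sharing a label."""
    grouped = defaultdict(float)
    for outcome, prob in outcomes.items():
        grouped[label(outcome)] += prob
    return dict(grouped)

def coarse_label(outcome):
    """Drop the distance and lump both kinds of strike together."""
    kind, _distance = outcome
    return "strike" if kind.endswith("strike") else kind

grouped = group_outcomes(fine_outcomes, coarse_label)
print(grouped)
```

The grouped probabilities still add up to 1. We have thrown away detail, not probability.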

The point here isn't what we're simplifying toward. The point is that as we add unknowns to our equation, we introduce variability to the results and that we can counteract this variability by focusing only on relevant differences in outcome. Since every exact input produces infallible output and each possible input has an infallible probability associated with it, our results are also infallible even if the exact result remains uncertain.

As a trivial example, let's model the coin flip. We have factors such as the volume and shape of the coin, the density throughout the volume of the coin and the force with which the coin is struck. When we account for all these factors, we successfully predict each coin flip. However, what if, instead of giving the model one detailed physical description, I give it every physical description along with the probability that each one will end up as input? Now my model gives me a whole host of outcomes, each with a probability of occurring.

Of course, I don't care about a ton of the outcomes in the coin flip. I don't care how far away it landed from where it was flipped. I don't care about what angle the coin landed at. I don't care about if the coin was deformed at all during the flip. All I care about is whether or not the coin came up heads or tails. So I simplify all the results down into two groups: those that came up heads and those that came up tails. Each of these groups has a probability associated with it.

Of course, we only gave the model every possible input for the physical description factor. What if we gave it every possible input for every factor? After grouping our nearly infinitely sized output set into groups of heads and tails, we would now have a probability for heads and a probability for tails that are independent of the input to the model.
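Here is a rough sketch of that idea. The "physics" is a toy I made up (the coin starts heads-up and changes face every half revolution), and the input ranges are assumptions, not measurements. What matters is the shape of the process: a model that is perfectly deterministic for any one input, fed every input according to how likely it is, with the results grouped into heads and tails.

```python
import random

def flip(spin_rate, air_time):
    """Toy deterministic model: the same inputs always give the same face.

    spin_rate is in revolutions per second, air_time is in seconds.
    The coin starts heads-up and changes face every half revolution.
    """
    half_turns = int(2 * spin_rate * air_time)
    return "heads" if half_turns % 2 == 0 else "tails"

def outcome_probabilities(n_trials=100_000, seed=0):
    """Feed the model randomly drawn inputs and group the results by face."""
    rng = random.Random(seed)
    counts = {"heads": 0, "tails": 0}
    for _ in range(n_trials):
        spin_rate = rng.uniform(15.0, 25.0)  # assumed input distribution
        air_time = rng.uniform(0.4, 0.6)     # assumed input distribution
        counts[flip(spin_rate, air_time)] += 1
    return {face: n / n_trials for face, n in counts.items()}

probs = outcome_probabilities()
print(probs)
```

Any single call to `flip` is completely predictable. Once the inputs are only known as a distribution, all that is left is a probability for each face.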

So why do we need the model anymore? Well, we can always get a better estimate by giving the model the information that we know for certain. However, no matter what the inputs to the model end up being, the results are always just a set of outcomes and their associated probabilities. That's it.

This is why true talent level works. When we talk about "true talent level", what we are saying is: "Yeah. I know that I don't have a perfect model. But what if I did? And what if it gave me these probabilities? What would we then be able to say about the sacrifice bunt with a runner on second and no one out in a tie game in the bottom of the ninth?"

True talent level starts at the end. It asks the user to make the assumption that if we did have the perfect model, it would have given us this answer. True talent level is the recognition that every event with unknown factors can be reduced to a series of results with associated probabilities. Because of this, we can talk about hypothetical situations ad nauseam simply by assuming that we know the true talent levels involved.

This brings us to point three. We'd much rather talk about real players than hypothetical ones. The really hard question isn't whether or not true talent level is a valid concept. The question is: given that I don't have the perfect model, how can I reliably estimate true talent level? The answer, like most of my posts, is sure to be long, boring, needlessly semantic, and far too metabaseball for everyone but myself.

Stay tuned!

Tuesday, May 29, 2007

Random Musings

  • I hope that one of the things that comes out of the Yankees' dreadful start this year is that the lie is finally put to the idea that a team can accept sub-par performance at a particular position because they are getting excellent performance at other, harder-to-fill positions. This has always been a bad idea in theory, since it wastes whatever advantage a team may be reaping by filling hard-to-fill positions with All-Star caliber talent, but now we see how it hurts you in practice. The Yankees are (shockingly) struggling to score runs. In particular, they've gotten terrible performance at second base, which was one of those positions that they expected would "pick up the slack" offensively for first base. Now they're getting production from neither second nor first, and it shows in the team's record. If they had acquired an adequate offensive first baseman, they would be much better prepared to suffer through Robinson Cano's craptastic start. You can't ever afford to punt a position because you expect others to make up for it.
  • Todd Jones Watch: he's third in the AL in saves with 15. Of course, he's blown three as well. He's pitched 21.1 innings, allowing 21 hits and 9 walks while striking out only 7 men. He's probably fortunate that his ERA is only 4.66. Again, pitching like crap, racking up saves. It's not that hard, folks.
  • The Replacement Level Yankees Weblog had a very telling post about Johnny Damon's range in centerfield this year. Normally, I'd blow something like this off as a small sample, but the graph is pretty stark and is supported by the observations of those following the Yankees regularly. Damon is obviously hurt and should not be playing centerfield. Again, it would be easier to stomach Melky's bat in the lineup if first base was hitting at even replacement level (which it isn't).
  • To this point, the Yankees have played (according to runs scored and runs allowed) like a 26-27 win team (with 21-22 losses). They've been roughly 5.5-6.5 games worse than one would expect. Would anyone be worried if this team were 27-21? No. This team is going to start playing well; it has just probably dug far too large a hole to get out of.
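One common way to turn runs scored and runs allowed into an expected record is the Pythagorean expectation. I'm assuming that formula here for illustration; it may not be exactly the calculation behind the numbers above. The run totals below are made up and are not the Yankees' actual totals, and `expected_wins` is a hypothetical helper.

```python
def expected_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Pythagorean expectation: win% = RS^x / (RS^x + RA^x), times games played."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# Hypothetical run totals over 48 games, for illustration only.
print(round(expected_wins(250, 225, 48), 1))
```

A team that scores exactly as many runs as it allows comes out at .500, which is a handy sanity check on the formula.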

Wednesday, May 23, 2007

Measuring True Talent

We've talked in previous posts about the need to have an accurate player model for objective player analysis. Furthermore, we've established that this model should reflect a player's "true talent level"; that is, it should reflect a player's level of proficiency at a skill under average conditions.

The next step is to measure a player's true talent level. For example, if we are trying to study the effects of out-making among different players in baseball, we need a model of how players make outs. This model needs to accurately reflect the differences between players at this skill. Those players who don't make many outs should make few outs in our model. Those players who make many outs should make many outs in our model.

I know that all of this sounds trivial. In fact, you've probably already leaped many steps ahead. The reason I'm being so particular about these points isn't that they're hard to understand; it's to drive home the nature of the process. This is crucial for when we begin to model more complex behavior.

For something simple like out-making, our model can simply provide us with the probability that an out will be made given an opportunity to make an out. This sort of binary outcome (out, not out) is a well-explored problem and has many excellent statistical properties.

So how do we assign out-making probabilities to each player in our model? How do I know what Derek Jeter's true out-making talent level is?

The answer: I don't, nor can I ever know for certain.

This is a very unsatisfying answer, but like most things in science, we can cheat. We can estimate Derek's true talent level if we make a few assumptions.

First, we can assume that if Derek Jeter's true talent level at out-making is 60% (that is, he makes outs 60% of the time he's given the opportunity to), then in real life he will make outs around 60% of the time, given average conditions. Secondly, we can assume that even though he doesn't ever see "average conditions", he sees enough of a variety of conditions that they tend towards average in the aggregate.

These two points imply two other very important points. First, due to the fact that we don't have an infinite number of trials on which to examine a player, our estimation is only guaranteed to be in the right neighborhood. There is probably going to be some error. Secondly, the more trials we have, the more likely it is that the overall conditions will tend towards average. This is not guaranteed and needs to be verified when doing research. This may lead to adjustments in the way data is analyzed. For example, a player who played half his games in the pre-humidor Coors Field cannot be said to have played under "average conditions" no matter how many games he plays.
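To put a rough size on "the right neighborhood," here is a small sketch. It assumes every opportunity is an independent trial with the same fixed out probability, which is the textbook binomial setup and a simplification of real baseball. The helper name `standard_error` is mine.

```python
import math

def standard_error(p, n):
    """Standard error of an observed rate when the true rate is p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# How far the observed out rate typically strays from a true 60% rate.
for n in (100, 1_000, 10_000):
    se = standard_error(0.60, n)
    print(f"{n:>6} opportunities: usually within +/- {2 * se:.3f} of 0.600")
```

The error shrinks with the square root of the number of opportunities, so it takes four times as much data to cut it in half.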

So after all this, what are we going to do? We are going to estimate a player's true talent level by looking at all the relevant data points in that player's career and assuming that they are an accurate reflection of a player's true talent. The more points we have, the more confident we can be in our estimation. So for Derek Jeter, we estimate that given an opportunity to make an out, he will do so 61% of the time (his OBP is 0.390 for his career). We can apply this process similarly for other true talent levels, like home run rate or stolen base success rate.

This is a crude first attempt at a model, but it does illustrate the process and leaves us with a couple of places to go. First, how can we quantify our confidence in our estimation? Secondly, how can we make adjustments to the data points to improve the model? These aren't easy questions. For now, the important point is that even this simple model provides us with a starting place for objective analysis.

As a final point, it is important to note that we can be confident in this model because if it were wrong the results would have to show up in the real world. If Derek Jeter's true talent level at out-making is actually 70%, it is massively, apocalyptically unlikely that he would actually only make outs 61% of the time over the course of his career. Now, perhaps his true talent level is actually 60.75% or 61.3%. This could be within our range of acceptable error. However, we are not likely to be significantly wrong. This point is important because when we begin turning our model into tools that can be used to make decisions, conclusions that we don't like can't be hand-waved away by saying that the underlying model must be significantly wrong in this case. It might be, but only a fool is going to bet against it.
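Here is a back-of-the-envelope check on "apocalyptically unlikely." It uses a normal approximation to the binomial and assumes 7,000 career opportunities, a round number I picked for illustration rather than Jeter's actual total. The helper names are hypothetical.

```python
import math

def z_score(observed_rate, true_rate, n):
    """Standard deviations between the observed rate and the assumed true rate."""
    se = math.sqrt(true_rate * (1 - true_rate) / n)
    return (observed_rate - true_rate) / se

def lower_tail(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

N_OPPORTUNITIES = 7_000  # assumed for illustration, not Jeter's actual total

# If the true out rate were 70%, how often would we see 61% or lower?
z_far = z_score(0.61, 0.70, N_OPPORTUNITIES)
print(z_far, lower_tail(z_far))

# If the true out rate were 60.75%, an observed 61% is unremarkable.
z_near = z_score(0.61, 0.6075, N_OPPORTUNITIES)
print(z_near, lower_tail(z_near))
```

Under these assumptions the 70% case sits more than sixteen standard deviations away from what we observed, while the 60.75% case is less than half of one.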

**EDIT** Fixed some typos and added some language to clarify some sentences.

Tuesday, May 15, 2007

As If On Cue...

...Baseball Prospectus has started a series of articles explaining their statistics. This dovetails nicely with my summary of my favorite statistics that I've just added here. In fact, the majority of my statistics come from BPro.

Here's the first installment. It's on VORP.

Sunday, May 13, 2007

Site Update

On the right hand side of the page, you will now notice a brief glossary of statistics that I use frequently. Hopefully, this will be updated as I reference new statistics.

Wednesday, May 9, 2007

Must Read

This is a fascinating look at infield defense. A couple of things stand out:
  • Very few ground balls are hit up the middle. This would imply that the least essential skill for a shortstop is going to his left, and for a second baseman, going to his right.
  • The Yankees were well above average in terms of fielding ground balls up the middle last year. This is surprising since most metrics rate Derek Jeter as very bad at this particular skill. Is Robby Cano off the charts at this particular skill? Did Yankee pitchers play out of their minds? Or is Derek actually good at this, all other analysis to the contrary? I really want to see this explained.
  • The Yanks' biggest weaknesses were between short and third and right down the third base line. This may imply that A-Rod has terrible range at third base and/or that Derek Jeter's jump throw from the hole is inadequate to make up for range in that direction.
  • Cleveland was unspeakably horrible.

Sunday, May 6, 2007

The Holy Grail

Before I delve too deeply into what we actually have as far as modeling player performance, I wanted to take a brief interlude to muse about what we would have, if it were possible.

Remember, the goal of developing a player model is to create a system that allows us to answer questions that we have about player value, strategic team decisions, tactical game decisions, etc. The better our system performs, the better we could answer these questions.

The system has three basic parts: the model itself, the input to the model, and the output from the model. By playing with the input to the model, we can isolate variables and test hypotheses. The results of the model will provide us with data for future decision making.

The perfect model for baseball would be physically based. We would input data about a player's height, weight, arm strength, leg strength, how muscles attached to bone, how their tendons were holding up, etc. The model would then be able to exactly simulate this player's performance. We would input the data for Mark McGwire and Randy Johnson and the model would tell us exactly how McGwire would perform against Johnson because it would be able to simulate 1,000,000 match ups between them perfectly.

Naturally, this is totally impossible (for now). We just don't have the data that we need, nor probably the computational power to be that precise. Trying to simulate synapses firing, muscle response, hand-eye coordination... It's a daunting task.

Similar to trying to simulate Manny Ramirez's mindset though, it's also unnecessary. Instead we can start our model from the moment that the player exerts influence on the model and stop it at the moment they cease to influence the model.

Instead of modeling Randy Johnson, we could model his fastball. We could measure its velocity, where it leaves his hand, the rotation on the ball, etc. How we got to that point would be largely inconsequential. We would have data that's much easier to measure and work with. We could also grossly simplify the physical simulation. Instead of running through a massive, complex simulation of Newtonian physics, we could instead measure what happens to baseballs that are thrown with a particular velocity, from a particular location, with a particular spin and see what the results of the play are. Our model would map the state of the model from the moment Randy Johnson ceases to exert influence on the baseball game to all the possible results of that play (with associated likelihoods, of course).
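A sketch of what that mapping could look like, using a handful of invented pitch records. Nothing here is real data, location is left out to keep it short, and the bin sizes and the names `bucket` and `outcome_table` are arbitrary choices of mine. The idea is just to bin pitches by their measurements and count what happened to the pitches in each bin.

```python
from collections import Counter, defaultdict

# Hypothetical pitch records: (velocity in mph, spin in rpm, result of the play).
pitches = [
    (96.2, 2250, "swinging strike"),
    (97.1, 2310, "foul"),
    (96.9, 2280, "swinging strike"),
    (88.1, 1450, "ground out"),
    (89.3, 1320, "single"),
]

def bucket(velocity, spin, velocity_step=2.0, spin_step=250):
    """Round a pitch's measurements down to the start of a coarse bin."""
    return (int(velocity // velocity_step) * velocity_step,
            int(spin // spin_step) * spin_step)

def outcome_table(records):
    """Map each bin to the observed relative frequency of each result."""
    counts = defaultdict(Counter)
    for velocity, spin, result in records:
        counts[bucket(velocity, spin)][result] += 1
    return {
        key: {result: n / sum(c.values()) for result, n in c.items()}
        for key, c in counts.items()
    }

table = outcome_table(pitches)
for key, results in table.items():
    print(key, results)
```

With only five pitches the frequencies are meaningless, of course. The table only starts to resemble real likelihoods once each bin holds a lot of pitches.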

This is where we're headed. Instead of trying to deal with blunt instruments that are prone to error, like hits and walks and home runs, we want to deal with really precise measurements, like how hard the ball leaves Albert Pujols' bat or how much break Phil Hughes' curveball has. Modeling these events frees us from ambiguities introduced by other actors in the system, like Carlos Beltran robbing Pujols of a double in the gap, or an umpire calling Hughes' wicked 12-6 curve a ball when it should be a strike. When we eliminate the factors beyond our subjects' control from the equation we get closer and closer to the perfect model.

Eventually, this data will be available; I have no doubt of this. We already have people analyzing play-by-play data moving in this direction. There is too much money at stake in baseball for teams not to invest heavily in collecting the most granular data they possibly can. The teams that do it first will have an incredible leg up.

We don't have this luxury, not yet. For now, we get to deal with hits and walks and home runs. Our player model will have to have ways of dealing with all the noise that goes along with them.

Friday, May 4, 2007

True Talent Level

Last post, I began talking about what the purposes and limitations of scientific analysis are. Essentially, the purpose of science is to develop models of the real world from the basis of empirical observations. We want to apply this idea to baseball players so that we can both analyze past performance to determine what is and has been valuable and to help predict future performance.

Of these two, the latter is probably more important in that it is useful for decision making. However, it's not too useful to make a good tool for predicting performance if one does not first know what performance is valuable.

So what do we want from a model of a baseball player? Essentially, we want a way of describing a player so that we can draw conclusions easily about how valuable that player is in many situations. Now, it's virtually impossible to create a model of a person. People are complex. We can't possibly model Manny Ramirez's mind and all of its quirks. It's impossible.

Fortunately, it's also not that useful. We don't care how Manny thinks. We care about how he performs. We only care about his mindset inasmuch as it translates to on-field performance.

The best place to start, therefore, is to try and quantify tangible baseball skills. For any given skill, we will call this quantification of that skill a player's "true talent level." This true talent level is the player's actual ability for a certain skill. For example, if we say that Alex Rodriguez's true talent level for hitting home runs is that he will hit one home run for every twelve plate appearances he receives, then we are saying that this is exactly how Alex will perform if we give him an infinite number of plate appearances. Of course, this is impossible, which is one of the reasons why a player doesn't have the exact same statistics every year.
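A quick simulation shows how much a finite season can wander even when the true talent level never moves. It assumes 600 plate appearances per season (my round number) and treats each plate appearance as an independent trial with a 1-in-12 chance of a home run. `simulate_season` is a hypothetical helper.

```python
import random

def simulate_season(hr_rate, plate_appearances, rng):
    """Count home runs when each plate appearance is an independent trial."""
    return sum(rng.random() < hr_rate for _ in range(plate_appearances))

rng = random.Random(2007)  # fixed seed so the run is repeatable
seasons = [simulate_season(1 / 12, 600, rng) for _ in range(10)]
print(seasons)
print(sum(seasons) / len(seasons))
```

The long-run average is 50 home runs per 600 plate appearances, but individual simulated seasons land above and below that with exactly the same underlying talent.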

The other reason is that a player's true talent level is always changing. Players play injured, age, and sometimes sleep funny. Furthermore, many things complicate our measurement of true talent level, not the least of which are the facts that players play in differing stadiums and face different pitchers. Therefore, when we talk about a player's true talent level, we are really talking about our best estimate of their true talent level, given various assumptions and data.

Often when talking about an abstract baseball point, I'll begin with assuming that we know exactly the true talent level of a player for a particular skill. We can do this because even if we can't exactly measure a player's true talent level, he still has one (as far as our model is concerned).

So our first task when analyzing players is to try and quantify their true talent level for various baseball skills. We'll look at the best way to do this later.