Basebology (The Study of Baseball): The Holy Grail

Sunday, May 6, 2007

The Holy Grail

Before I delve too deeply into what we actually have as far as modeling player performance, I wanted to take a brief interlude to muse about what we would have, if it were possible.

Remember, the goal of developing a player model is to create a system that allows us to answer questions that we have about player value, strategic team decisions, tactical game decisions, etc. The better our system performs, the better we can answer these questions.

The system has three basic parts: the model itself, the input to the model, and the output from the model. By playing with the input to the model, we can isolate variables and test hypotheses. The results of the model will give us data for future decision making.

The perfect model for baseball would be physically based. We would input data about a player's height, weight, arm strength, leg strength, how his muscles attach to bone, how his tendons are holding up, etc. The model would then be able to exactly simulate this player's performance. We would input the data for Mark McGwire and Randy Johnson and the model would tell us exactly how McGwire would perform against Johnson because it would be able to simulate 1,000,000 matchups between them perfectly.

Naturally, this is totally impossible (for now). We just don't have the data that we need, nor probably the computational power to be that precise. Trying to simulate synapses firing, muscle response, hand-eye coordination... It's a daunting task.

As with trying to simulate Manny Ramirez's mindset, though, it's also unnecessary. Instead, we can start our model at the moment the player begins to exert influence on the game and stop it at the moment he ceases to influence it.

Instead of modeling Randy Johnson, we could model his fastball. We could measure its velocity, where it leaves his hand, the rotation on the ball, etc. How we got to that point would be largely inconsequential. We would have data that's much easier to measure and work with. We could also grossly simplify the physical simulation. Instead of running through a massive, complex simulation of Newtonian physics, we could instead measure what happens to baseballs that are thrown with a particular velocity, from a particular location, with a particular spin and see what the results of the play are. Our model would map the state of the model from the moment Randy Johnson ceases to exert influence on the baseball game to all the possible results of that play (with associated likelihoods, of course).
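To make the idea of mapping a pitch's state to possible results concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the bucket sizes, the outcome labels, the function names (`pitch_bin`, `build_outcome_model`), and every number in `observations` are made up for the example, since this kind of data isn't available to us yet.

```python
from collections import Counter, defaultdict


def pitch_bin(velocity_mph, release_height_ft, spin_rpm):
    """Coarsely bucket a pitch so that similar pitches share one key.

    The bucket widths are arbitrary choices for this sketch.
    """
    return (
        int(velocity_mph // 2) * 2,        # 2 mph buckets
        round(release_height_ft * 2) / 2,  # nearest half foot
        int(spin_rpm // 200) * 200,        # 200 rpm buckets
    )


def build_outcome_model(observations):
    """Map each pitch bucket to the observed likelihood of each result."""
    counts = defaultdict(Counter)
    for velocity, height, spin, outcome in observations:
        counts[pitch_bin(velocity, height, spin)][outcome] += 1

    model = {}
    for key, tally in counts.items():
        total = sum(tally.values())
        model[key] = {outcome: n / total for outcome, n in tally.items()}
    return model


# Invented observations: (velocity, release height, spin, result of the pitch).
observations = [
    (96.1, 6.4, 2310, "strike"),
    (96.8, 6.5, 2350, "strike"),
    (97.3, 6.6, 2290, "ball_in_play"),
    (97.9, 6.4, 2380, "ball"),
    (88.2, 6.1, 1850, "ball_in_play"),
]

model = build_outcome_model(observations)

# Look up the results seen for pitches similar to a new one.
likelihoods = model[pitch_bin(96.5, 6.5, 2300)]
print(likelihoods)
```

Note that nothing here simulates physics. The sketch just counts what happened to similar pitches, which is the gross simplification described above. A real version would need far more observations per bucket before the likelihoods meant anything.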

This is where we're headed. Instead of trying to deal with blunt instruments that are prone to error, like hits and walks and home runs, we want to deal with really precise measurements, like how hard the ball leaves Albert Pujols' bat or how much break Phil Hughes' curveball has. Modeling these events frees us from ambiguities introduced by other actors in the system, like Carlos Beltran robbing Pujols of a double in the gap, or an umpire calling Hughes' wicked 12-6 curve a ball when it should be a strike. When we eliminate the factors beyond our subjects' control from the equation we get closer and closer to the perfect model.

Eventually, this data will be available; I have no doubt of this. People analyzing play-by-play data are already moving in this direction. There is too much money at stake in baseball for teams not to invest heavily in collecting the most granular data they possibly can. The teams that do it first will have an incredible leg up.

We don't have this luxury, not yet. For now, we get to deal with hits and walks and home runs. Our player model will have to have ways of dealing with all the noise that goes along with them.

1 comment:

D.Cous. said...: God: "ARTHUR, I WANT YOU AND YOUR KNIGHTS TO CONSTRUCT A USEFUL STATISTICAL MODEL OF A GIVEN BASEBALL PLAYER'S EFFECT ON GAME OUTCOMES."

Arthur: "Good idea, oh Lord."

God: "OF COURSE IT'S A GOOD IDEA!"

May 8, 2007 at 12:59 PM


Key Stats

ARP
Adjusted Runs Prevented

ARP measures the number of runs that a relief pitcher prevented beyond what an average relief pitcher would have prevented. ARP is adjusted for the situation in which the pitcher was used.

ISO
Isolated Power

ISO is the ratio of extra bases that a player has accumulated to the number of at bats he has received. ISO is essentially a player's SLG minus his batting average. This has the effect of giving a player credit only for extra base hits. ISO is not a useful measure of player value on its own, but is a very effective measure of a player's extra base ability.

OBP
On Base Percentage

OBP is the ratio of the number of times a player reached base safely to the number of opportunities he had to reach base. It effectively measures a player's skill at not making outs. Since outs are a team's most precious commodity, OBP measures perhaps the most valuable and fundamental skill a player can have.

OPS
On Base Plus Slugging Percentage

OPS is a crude metric that simply sums a player's on base and slugging percentages. It is probably the most popular non-traditional measure of overall batting performance due to its simplicity. However, it has drawn criticism from performance analysts for its inaccuracy relative to other advanced metrics and because it works by adding two numbers with different denominators together to produce a conceptually meaningless quantity. It is best used as a quick and dirty estimator of batting prowess.

SLG
Slugging Percentage

SLG is the ratio of total bases that a player has accumulated to the number of at bats he has received. It is essentially a weighted batting average that gives a player more credit for extra base hits.
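The rate stats defined above (ISO, OBP, OPS and SLG) are simple enough to sketch in a few lines of Python. The OBP function uses the standard official formula, (H + BB + HBP) / (AB + BB + HBP + SF), which these definitions don't spell out. The function names and the season line are hypothetical, invented only for illustration.

```python
def batting_average(hits, at_bats):
    return hits / at_bats


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def isolated_power(doubles, triples, home_runs, at_bats):
    """Extra bases per at bat: a single earns no extra bases."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard OBP: times on base over opportunities to reach base."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# A made-up season line, not any real player's.
at_bats, singles, doubles, triples, home_runs = 500, 90, 30, 5, 25
walks, hit_by_pitch, sacrifice_flies = 60, 5, 5
hits = singles + doubles + triples + home_runs

avg = batting_average(hits, at_bats)
slg = slugging(singles, doubles, triples, home_runs, at_bats)
iso = isolated_power(doubles, triples, home_runs, at_bats)
obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies)
ops = obp + slg  # adds two rates with different denominators

print(f"AVG {avg:.3f}  SLG {slg:.3f}  ISO {iso:.3f}  OBP {obp:.3f}  OPS {ops:.3f}")
```

With this line, ISO comes out equal to SLG minus batting average, as the ISO entry says it should. The OPS line also shows the objection to that stat: OBP is divided by plate-appearance-like opportunities and SLG by at bats, so the sum is not a rate of anything in particular.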

UZR
Ultimate Zone Rating

UZR is a defensive metric that uses play-by-play data to determine how good a player's defense is. On FanGraphs, it is denominated in runs saved above average.

VORP
Value Over Replacement Player

VORP measures the number of runs that a player contributed above what a "replacement player" at the same position would produce. VORP considers only offensive contributions.

WARP
Wins Above Replacement Player

WARP measures the number of wins that a player contributed above what a "replacement player" at the same position would produce. WARP considers both offensive and defensive contributions.

WXRL
Win Expectancy added above Replacement adjusted for Lineup

WXRL measures the number of wins that a relief pitcher contributed above what a "replacement player" would produce. WXRL differs from WARP because it is adjusted for both the game situation in which the pitcher was used and the hitters that the pitcher faced.
