Basebology (The Study of Baseball): "Statistical Goulash"

Tuesday, June 12, 2007

"Statistical Goulash"

That's the best phrase I've heard yet to describe ESPN's new player ratings. It's borrowed from Nate Silver's take on the endeavor, found here.

Essentially, the ratings are an arbitrary mish-mash of statistics. The author of the stat, Jeff Bennett, has assigned more or less arbitrary weights to a more or less arbitrary set of statistics. The resulting ratings aren't necessarily bad; in fact, they land pretty close to what other rating systems produce. It's the process that I object to, which is why I found this quote, taken from a chat with Mr. Bennett, extremely frustrating, obfuscating, and just wrong:

Ian, NYC: I don't understand what your point is behind this list. There are people who have put a heck of a lot of science and research into coming up with formulas like this (Win Shares, VORP, etc), while much of what you have selected here is totally arbitrary (why exactly 10% for BA, for example?), and by pretending this is somehow scientific degrades the whole field of work on this subject. Much of what you are including here has been proven to be no reflection on individual player quality (like saves, wins and RBI to a large extent), not to mention penalizing someone because they play on a bad team. Why should anyone take this list seriously?

Jeff Bennett: Ian, I think you hit on something. There is no such thing as the perfect way to evaluate a baseball player. Win Shares and VORP are great, but you can ask the same types of questions about their lists. This system is very fluid and puts players in perspective based on where they rank in the majors vs their peers. Nothing more scientific than that.

First, congratulations to Ian in New York City for his intelligent questions. Second, here's what's wrong with the response:

VORP, and to some degree Win Shares, both withstand the above questions infinitely better than Player Rating precisely because they are empirically derived. VORP is not an arbitrary series of weights. The weights and the weighted statistics used in the derivation of VORP are based on statistical research that attempts to model run scoring in baseball. ESPN's Player Rating does not have this force of research behind it.
There are a lot of things more scientific than a list that purports to put players in perspective relative to their peers. For example, you could have a list that actually does put players in perspective relative to their peers. I've blogged about science and baseball before. Under no acceptable definition of science can you simply assert that your results are correct. You must first show the logic and reasoning behind your results so that everyone else has a reason to accept them. We have no reason to believe the Player Rating list because the reasoning behind it is arbitrary and faulty. Jeff Bennett says that RBIs are 5% of a player's overall worth? It must be so! Tell us, Jeff. Where did you get that 5%? Until you do, there is nothing scientific about your statistic.
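To make the objection concrete, here is a minimal sketch of why unexplained weights matter. It is not ESPN's formula: the players, their percentile ranks, and both sets of weights are invented purely for illustration. The only point is that the same inputs produce opposite rankings depending on weights that nobody has justified.

```python
def composite_rating(percentiles, weights):
    """Weighted sum of a player's percentile ranks (0-100), one per statistic."""
    return sum(weights[stat] * percentiles[stat] for stat in weights)


def rank_players(players, weights):
    """Return player names ordered from highest to lowest composite rating."""
    return sorted(
        players,
        key=lambda name: composite_rating(players[name], weights),
        reverse=True,
    )


# Hypothetical percentile ranks for two made-up players.
players = {
    "Player A": {"BA": 90, "RBI": 85, "OBP": 60},
    "Player B": {"BA": 60, "RBI": 55, "OBP": 95},
}

# Two hypothetical weightings; neither comes from ESPN or from any research.
weights_1 = {"BA": 0.5, "RBI": 0.3, "OBP": 0.2}
weights_2 = {"BA": 0.2, "RBI": 0.1, "OBP": 0.7}

print(rank_players(players, weights_1))  # Player A comes out on top
print(rank_players(players, weights_2))  # Player B comes out on top
```

Without some research tying the weights to run scoring, there is no way to say which of those two orderings is the "right" one.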

In summary, I have to again concur with Mr. Silver: this is exactly the type of slavish number mangling that gives "statheads" a bad name. With any luck, people will be smart enough to reject ESPN's Player Rating.

**EDIT** I wanted to take a second to give Mr. Bennett his due for participating in the chat referenced above. Virtually all of the questions that he fielded were intensely critical, some far too personal, and certainly vitriolic. He could have simply fielded a bunch of softball questions, but he played hardball instead. It's very hard to subject something you created to that kind of criticism and for that he deserves a lot of credit.

1 comment:

E. W. Lynch said...: How much credit does he deserve though?

I'm thinking 5% for tolerance, 15% for thick-skinnedness, 10% Open-Mindedness, 30% Chat Endurance, and 45% heart. This gives him an overall credit score of 161.5

Way to go man!

June 14, 2007 at 11:20 AM


Key Stats

ARP
Adjusted Runs Prevented

ARP measures the number of runs that a relief pitcher prevented from scoring beyond what an average relief pitcher would have prevented. ARP is adjusted for the situation in which the pitcher was used.

ISO
Isolated Power

ISO is the ratio of extra bases that a player has accumulated to the number of at bats he has received. ISO is essentially a player's SLG minus his batting average. This has the effect of giving a player credit only for extra base hits. ISO is not a useful measure of player value on its own, but is a very effective measure of a player's extra base ability.
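As a rough sketch of the definition above, ISO can be computed straight from a hit breakdown. The function name and the stat line are made up for illustration.

```python
def isolated_power(singles, doubles, triples, home_runs, at_bats):
    """ISO = extra bases / at bats, which equals SLG minus batting average."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    hits = singles + doubles + triples + home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    extra_bases = total_bases - hits  # a single contributes no extra bases
    return extra_bases / at_bats


# Hypothetical stat line: 160 hits and 275 total bases in 500 at bats.
iso = isolated_power(singles=100, doubles=30, triples=5, home_runs=25, at_bats=500)
print(round(iso, 3))  # 0.23, the same as a .550 SLG minus a .320 average
```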

OBP
On Base Percentage

OBP is the ratio of the number of times a player reached base safely to the number of opportunities he had to reach base. It effectively measures a player's skill at not making outs. Since outs are a team's most precious commodity, OBP measures perhaps the most valuable and fundamental skill a player can have.
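For the curious, here is a short sketch using the standard OBP formula, (H + BB + HBP) / (AB + BB + HBP + SF). The definition above doesn't spell out the formula, so treat the exact terms as my assumption about what is meant; the stat line is hypothetical.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by the opportunities to reach base."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    if opportunities <= 0:
        raise ValueError("player has no opportunities to reach base")
    return times_on_base / opportunities


# Hypothetical stat line: on base 225 times in 570 opportunities.
obp = on_base_percentage(hits=160, walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(round(obp, 3))  # 0.395
```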

OPS
On Base Plus Slugging Percentage

OPS is a crude metric that simply sums a player's on base and slugging percentages. It is probably the most popular non-traditional measure of overall batting performance due to its simplicity. However, it has drawn criticism from performance analysts for its inaccuracy relative to other advanced metrics and because it works by adding two numbers with different denominators together to produce a conceptually meaningless quantity. It is best used as a quick and dirty estimator of batting prowess.
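A quick sketch shows how much information the sum throws away. Both batting lines below are invented for illustration.

```python
def ops(obp, slg):
    """OPS is just on base percentage plus slugging percentage."""
    return obp + slg


# Two hypothetical hitters with very different profiles.
patient_hitter = ops(obp=0.400, slg=0.400)  # gets on base, little power
free_swinger = ops(obp=0.300, slg=0.500)    # makes more outs, more power

# Both collapse to the same headline number.
print(round(patient_hitter, 3), round(free_swinger, 3))  # 0.8 0.8
```

The sum cannot tell these two hitters apart, which is why it is better treated as a quick estimate than as a precise measurement.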

SLG
Slugging Percentage

SLG is the ratio of total bases that a player has accumulated to the number of at bats he has received. It is essentially a weighted batting average that gives a player more credit for extra base hits.
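The "weighted batting average" idea is easy to see in a small sketch: batting average counts every hit once, while SLG counts each hit by the number of bases it was worth. The function names and the stat line are hypothetical.

```python
def batting_average(singles, doubles, triples, home_runs, at_bats):
    """Every hit counts once."""
    return (singles + doubles + triples + home_runs) / at_bats


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Each hit is weighted by its bases: 1, 2, 3, or 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical stat line over 500 at bats.
line = dict(singles=100, doubles=30, triples=5, home_runs=25, at_bats=500)
avg = batting_average(**line)
slg = slugging_percentage(**line)
print(round(avg, 3), round(slg, 3))  # 0.32 0.55
```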

UZR
Ultimate Zone Rating

UZR is a defensive metric that uses play-by-play data to determine how good a player's defense is. On FanGraphs, it is denominated in runs saved above average.

VORP
Value Over Replacement Player

VORP measures the number of runs that a player contributed above what a "replacement player" at the same position would produce. VORP considers only offensive contributions.

WARP
Wins Above Replacement Player

WARP measures the number of wins that a player contributed above what a "replacement player" at the same position would produce. WARP considers both offensive and defensive contributions.

WXRL
Win Expectancy added above Replacement adjusted for Lineup

WXRL measures the number of wins that a relief pitcher contributed above what a "replacement player" would produce. WXRL differs from WARP because it is adjusted for both the game situation in which the pitcher was used and the hitters that the pitcher faced.
