Tuesday, June 12, 2007

"Statistical Goulash"

That's the best phrase I've heard yet to describe ESPN's new player ratings. It's borrowed from Nate Silver's take on the endeavor, found here.

Essentially, the ratings are an arbitrary mish-mash of statistics. The author of the stat, Jeff Bennett, has assigned some more or less arbitrary weights to some more or less arbitrary statistics. The resulting ratings aren't necessarily bad; in fact, they're pretty close to what other rating systems produce. It's the process that I object to, which is why I found this quote, taken from a chat with Mr. Bennett, extremely frustrating, obfuscating, and just wrong:
Ian, NYC: I don't understand what your point is behind this list. There are people who have put a heck of a lot of science and research into coming up with formulas like this (Win Shares, VORP, etc), while much of what you have selected here is totally arbitrary (why exactly %10 for BA for example?), and by pretending this is somehow scientific degrades the whole field of work on this subject. Much of what you are including here has been proven to be no reflection on individual player quality (like saves, wins and RBI to a large extent), not to mention penalizing someone because they play on a bad team. Why should anyone take this list seriously?

SportsNation Jeff Bennett: Ian, I think you hit on something. There is no such thing as the perfect way to evaluate a baseball player. Win Shares and VORP are great, but you can ask the same types of questions about their lists. This system is very fluid and puts players in perspective based on where they rank in the majors vs their peers. Nothing more scientific than that.
First, congratulations to Ian in New York City for his intelligent questions. Second, here's what's wrong with the response:
  1. VORP and, to some degree, Win Shares both withstand the above questions infinitely better than Player Rating precisely because they are empirically derived. VORP is not an arbitrary series of weights. The weights and the weighted statistics used in the derivation of VORP are based on statistical research that attempts to model run scoring in baseball. ESPN's Player Rating does not have this force of research behind it.
  2. There are a lot of things more scientific than a list that purports to put players in perspective relative to their peers. For example, you could have a list that actually does put players in perspective relative to their peers. I've blogged about science and baseball before. Under no acceptable definition of science can you simply assert that your results are correct. You must first show the logic and reasoning behind your results so that everyone else has a reason to accept them. We have no reason to believe the Player Rating list because the reasoning behind it is arbitrary and faulty. Jeff Bennett says that RBIs are 5% of a player's overall worth? It must be so! Tell us, Jeff. Where did you get that 5%? Until you do, there is nothing scientific about your statistic.
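To make the difference between asserted and empirically derived weights concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the team totals are invented, the "true" run values of 0.5 per single and 1.4 per home run are toy numbers, and the helper names `fit_two_weights` and `sse` are hypothetical. It is not how VORP or ESPN's Player Rating is actually computed; it only shows that weights estimated from run-scoring data can be checked against that data, while hand-picked weights cannot justify themselves.

```python
# Toy illustration only: the team totals and the "true" run values below are
# invented for this sketch. They are not real baseball data, and they are not
# the weights used by VORP, Win Shares, or ESPN's Player Rating.

def fit_two_weights(x1, x2, y):
    """Least-squares weights (w1, w2) for y ~ w1*x1 + w2*x2, with no intercept.

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("inputs are collinear; weights are not identifiable")
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


def sse(weights, x1, x2, y):
    """Sum of squared errors when predicting y from the given weights."""
    w1, w2 = weights
    return sum((w1 * a + w2 * b - c) ** 2 for a, b, c in zip(x1, x2, y))


# Hypothetical season totals for five made-up teams.
singles = [900, 950, 1000, 1020, 880]
home_runs = [150, 200, 120, 180, 210]

# Runs generated from the assumed toy run values, so the "right" answer is known.
runs = [0.5 * s + 1.4 * h for s, h in zip(singles, home_runs)]

# Weights derived from the data versus weights that are simply asserted.
fitted = fit_two_weights(singles, home_runs, runs)
asserted = (0.10, 0.05)

print("fitted weights:", fitted)
print("error with fitted weights:", sse(fitted, singles, home_runs, runs))
print("error with asserted weights:", sse(asserted, singles, home_runs, runs))
```

The point of the comparison is the last two lines: the fitted weights come with a measurable error against run scoring, so anyone can ask how well they hold up. The asserted weights come with nothing but the say-so of whoever picked them.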
In summary, I have to again concur with Mr. Silver: this is exactly the type of slavish number mangling that gives "statheads" a bad name. With any luck, people will be smart enough to reject ESPN's Player Rating.

**EDIT** I wanted to take a second to give Mr. Bennett his due for participating in the chat referenced above. Virtually all of the questions that he fielded were intensely critical, and some were far too personal and certainly vitriolic. He could have simply fielded a bunch of softball questions, but he played hardball instead. It's very hard to subject something you created to that kind of criticism, and for that he deserves a lot of credit.

1 comment:

E. W. Lynch said...

How much credit does he deserve though?

I'm thinking 5% for tolerance, 15% for thick-skinnedness, 10% Open-Mindedness, 30% Chat Endurance, and 45% heart. This gives him an overall credit score of 161.5

Way to go man!