Tuesday, November 25, 2008

Best. Shape. Ever.

And it's not even Thanksgiving yet!
[New York Yankees second baseman Robinson] Canó, Long said, has dedicated himself to physical fitness and is in “immaculate condition.”
Let the good times roll!

Thursday, November 20, 2008

Pujols and Howard

From Thomas Boswell in the Washington Post:

Thirty years ago, I created the statistic Total Average. Now I'm almost ashamed to have been one of the original baseball geeks. Where did we go wrong?

This week, Albert Pujols won the NL MVP Award. Why? Mostly because he had a better OPS and VORP (Value Over Replacement Player) than Ryan Howard. Say what? Meanwhile, back in the real world, the Phils' first baseman had 48 homers and 146 RBI to Pujols' 37 homers and 116 RBI.

Earth to my baseball writing buddies: We all love the new numbers, but let's not worship false idols. When I published my Total Average numbers, I'd always emphasize that while stats were wonderful, common sense was better. When stats WILDLY contradict common sense, always doubt the stats. In the case of the goofy gap between Pujols' VORP of 96.8 and Howard's 35.3, my reaction is "Time to revisit VORP. If it can be this wrong, it's not as good as I thought it was."

Let me pull the key sentence out of that quotation:
When stats WILDLY contradict common sense, always doubt the stats.
No. This is very, very, very wrong. I know I've emphasised this over and over and over again, but it bears repeating: the proper way to use statistics is not to break them out when you already agree with them. The proper way to use statistics is to develop a model or a test that describes something useful a priori and then to let the results speak for themselves.

Let's be clear about this: anything less than that level of rigor is simply catering to your own predispositions. It's engaging in an activity that adds nothing useful to any discussion anywhere at any time. If you only use a statistic when it validates your point of view, you are being either naïve or intellectually dishonest.

Ah, but what if it's not your opinion that the statistic contradicts? What if it contradicts COMMON SENSE?

Rubbish. Ladies and gentlemen, boys and girls, there is no place in analysis for "common sense." All "common sense" means is that you believe that an idea is so basic that it does not require further argument. It fits some arbitrary set of criteria on which you have been conditioned to evaluate things. It says something about the person making the claim of "common sense" and nothing about the idea itself. "Common sense" is not an argument. It is a plea to avoid having to make an argument in the first place.

Common sense has its place, of course. One would rather not have to make an argument to oneself about why one should walk with the scissors pointed down or look both ways when crossing the street. But when ideas are being challenged, when ideas that you may think are common sense are being called into question, you cannot simply scream COMMON SENSE and expect that this is a rational rejoinder to your challenger. It just doesn't work that way.

Boswell has fallen back on common sense because that's the only argument he has left. He can't tell you why the numbers he hates are wrong from any real analytical perspective, so he has to claim that there is no reason to tell you why they are wrong. It's simply common sense.

The evidence that Boswell does provide forms only a narrow view of the two players. This type of mistake is easy to make. It's easy to focus on one or two of the most important issues in an argument and forget that all the other small issues, while lesser individually, may tip the balance in the aggregate. I've made this mistake on this blog many times, I know.

In this case, Boswell focuses on home runs and RBIs to argue that Howard's commanding lead in these categories is all you need to know. In fact, he explicitly exhorts his readership not to think beyond Howard's RBIs, simply because his lead is so large that nothing could possibly make up for it. He has to do this, because the moment you do the analysis you realize just how wrong that view is.

It's true. Ryan Howard had 48 home runs and 146 RBIs to Albert Pujols' 37 HR and 116 RBI. That is a large gap. If the only way to score in baseball were the three-run home run, Howard would be the MVP in a landslide.

What happens when you look deeper? Pujols had 187 hits, Howard only 153. Pujols had 44 doubles, Howard only 26. Pujols walked 104 times, Howard only 81. Pujols made 364 outs. Ryan Howard made 475.

And that, friends, is the real kicker. Ryan Howard made 111 more outs than Albert Pujols last year. That is a ton of outs. Albert Pujols didn't just produce runs himself. He provided 111 more opportunities for his teammates to produce runs as well. Think of what your baseball team could do with 111 extra opportunities.
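For anyone who wants to check the arithmetic, here is a minimal sketch in Python that lines up the totals quoted in this post and takes the differences. The dictionary layout and the `gaps` helper are my own illustration, not part of any real stats package, and the conversion to "games' worth" of outs simply assumes the usual 27 outs per team in a nine-inning game.

```python
# Season totals exactly as quoted in this post (2008).
pujols = {"hits": 187, "doubles": 44, "walks": 104, "outs": 364, "hr": 37, "rbi": 116}
howard = {"hits": 153, "doubles": 26, "walks": 81, "outs": 475, "hr": 48, "rbi": 146}


def gaps(a, b):
    """Return a - b for every stat in a (hypothetical helper for illustration)."""
    return {stat: a[stat] - b[stat] for stat in a}


pujols_minus_howard = gaps(pujols, howard)
for stat, diff in pujols_minus_howard.items():
    # Positive means Pujols had more; negative means Howard had more.
    print(f"{stat:>7}: {diff:+d}")

# Assumption: 27 outs per team in a nine-inning game.
extra_games_of_outs = -pujols_minus_howard["outs"] / 27
print(f"Howard's extra outs equal about {extra_games_of_outs:.1f} games' worth")
```

Howard comes out ahead only on the home run and RBI lines. Everything else, outs included, tilts toward Pujols.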

That's why the VORP numbers that Boswell disdains are so wacky. Yes, Ryan Howard produced a lot of runs himself, but he also drastically reduced his teammates' opportunities by using up a gigantic number of outs. In the end, that hurts run production. VORP accounts for that. Boswell's common sense does not.

The numbers don't get better for Howard as you keep digging. Pujols is one of the best fielders at first base. Howard is probably the worst. That counts. Boswell maintains that Howard was the king of the game-changing home run. Yet, when you account for the importance of the situation in the ball game systematically, you find that it's Pujols who was the better bet in the clutch with 6.39 Win Probability Added* to Howard's 2.37.

So in a perverse way, Boswell is right to complain that the difference between the two players' VORP is not accurate. In fact, it's far too kind to Ryan Howard. Once you get past the home runs and the RBIs it is crystal clear that Albert Pujols absolutely dominated Ryan Howard this year.

It's tempting to try to focus an argument down to one or two key sticking points. It's easier, clearer, and more concise. It's also poor analysis. One must account for everything. And when one finds one's opinion or common sense to be at odds with a more complete analysis**, only two legitimate options remain: demonstrate how the previous analysis fails or accept the results.

Boswell, like most contemporary baseball journalists, chooses instead to bury his head in the sand and complain about those geeks that are ruining baseball.

**EDIT** Joe Posnanski has written essentially the exact same post as I have over here, only his is better because he is a talented and professional writer and I am not.

* Win Probability Added is calculated by determining how likely a player's team was to win the game before and after his at-bat. The difference is credited to the player as WPA. You can think of WPA roughly as a raw number of wins that a player contributed to a team through his performance. Personally, I do not think that WPA demonstrates a useful skill (for reasons too lengthy to go into right now), but it does demonstrate that in those situations where the game could be greatly affected, Pujols was much more of a threat than Howard.
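As a rough sketch of that bookkeeping, assuming you already have the team's win probability before and after each plate appearance, the sum looks something like this in Python. The function name and the probabilities below are made up purely for illustration. Real win probabilities come from historical tables of game situations, which this sketch does not attempt to reproduce.

```python
def win_probability_added(plate_appearances):
    """Sum the change in team win probability over a batter's plate appearances.

    Each item is a (before, after) pair of win probabilities between 0 and 1.
    Hypothetical helper for illustration; not taken from any stats library.
    """
    return sum(after - before for before, after in plate_appearances)


# Hypothetical game: these probabilities are invented, not real data.
game = [
    (0.50, 0.47),  # routine out early in a tie game
    (0.45, 0.62),  # go-ahead double
    (0.60, 0.91),  # late three-run homer
]

wpa = win_probability_added(game)
print(f"WPA for this game: {wpa:+.2f}")
```

Sum those per-game figures over a season and you get totals on the scale of the 6.39 and 2.37 quoted above.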

** It would be wrong to construe from my argument that VORP is the be-all and end-all of analysis, or that there is only one way to perform a complete analysis. There can be many legitimate ways to analyze a problem that attempt to form a complete picture. They may even disagree. However, to not attempt a complete analysis is to pull the rug out from under yourself.

Wednesday, November 19, 2008

Rob Neyer on a roll

Great stuff from Rob today, though it is behind the Iron ESPN Curtain as usual (apparently Rob's blog is free now; awesome!):

What I'm not willing to say -- what I'll probably never be willing to say -- is that Joe Mauer deserved to finish behind Justin Morneau in the MVP balloting again. Two years ago, there was virtually no evidence that Morneau was more valuable than Mauer, yet Morneau finished first and Mauer finished sixth. This year, there is virtually no evidence that Morneau was more valuable than Mauer, and yet Morneau finished second and Mauer finished fourth.

Maybe that's a sign of progress. But for as long as I've been doing this, I've been told that I don't see enough games, that I don't know what it really takes to win, that I don't appreciate the little things that don't show up in the box scores.

And for as long as I've been doing this, every time the MVP voters have a choice between the guy with the power stats and the guy who does the little things, they pick the guy with the big numbers.

This is spot on, and represents perhaps the most absurd aspect of the false dichotomy between the old-timey, out-in-the-sun, scorecard-filling, sunflower-seed-chewing, team-bus-riding, player-interviewing journalist and the new-agey, basement-dwelling, cheese-puff-eating, Internet-surfing, stat-crunching über-geek, a dichotomy created entirely by those same journalists.

Journalists use numbers too. They have to. Everyone has to. They just use different (and inferior) numbers. It's left to the stat geek to attempt to pull the little things out of the vast expanses of data routinely ignored by the mainstream MVP voter. And what's the result? The result is that despite the whining by baseball journalists about how little baseball we watch, it ends up being the stat geek that advocates for an MVP candidate by trying to obtain a complete view of the player and the MVP voter who votes on the basis of gaudy numbers.

Monday, November 3, 2008

Insanity

It's become a common cliché, an oft-cited but unverifiable quote, but it remains relevant in baseball: "The definition of insanity is doing the same thing over and over again and expecting different results."

Why bring this up now? Because you can expect to see the sentiment Tim Kurkjian expresses here echoed a lot during the offseason (emphasis added):
But the Red Sox, one way or another, will contend next season because they have lots of money, lots of young pitching, lots of resources and a much healthier Josh Beckett.
What makes anyone think that Josh Beckett is a good bet to stay healthy over 162 games? I get tired of bringing it up (nb: not really), but Josh Beckett has had exactly one full, healthy, productive year in his entire career. Sure, when he's healthy he's generally effective and sometimes brilliant, but the fact is that he is always battling some ailment or another.

At some point, you have to stop counting on Beckett to be healthy and instead recognize that to continue to make a healthy Josh Beckett the keystone in your quest for a championship is essentially insane.