Basebology (The Study of Baseball)

Thursday, September 24, 2009

Let the games begin!

Which is to say, now that the Yanks are in the postseason, let's all start dissecting Alex Rodriguez like he was a stale frog in high school biology! Fangraphs' R. J. Anderson gets the party started with this fun piece. I can just smell the formaldehyde!

Monday, September 21, 2009

Sabermetric groupthink?

This post by J.C. Bradbury collects some comments on the notion of sabermetric groupthink: the idea that those of a sabermetric bent, like myself, tend to unjustly treat those who don't adhere strictly to sabermetric orthodoxy as morons. I'll leave aside the question of whether a particular sabermetric orthodoxy exists. Rob Neyer addresses that question well here. I want to make a more general point.

The overarching sabermetric philosophy is not related to baseball. The core sabermetric principle, as I understand it, is that baseball must be analyzed as a science. That's it. If your analysis of baseball is scientific, it is sabermetric. Of course, the "gotcha" here is that science is empirical. It is very, very hard to have science that spurns numbers of some kind or another because numbers are the language of empirics. Thus, sabermetrics tends to focus on numbers, on the quantitative over the qualitative.

Sabermetrics does not reject qualitative analysis. There is certainly a role for scouting and experience in baseball. No sabermetrician worth his salt disputes this. However, the use of qualitative analysis cannot be an excuse to flout systematic application of scientific principles. Qualitative analysis still must be backed up by empirical research. It must have a sound empirical basis and it must be vetted empirically.

How might a team do this? A good start is trying to systematically quantify how good your scouts are. Which scouts provide the best reports? How much information do these reports provide beyond what is available statistically? Can scouting data be incorporated into a useful model of player performance?

The point is that no matter what analysis you are undertaking, it must be systematic. You must know, in advance, how new data will inform your thinking. Too often this is not the case. Too often numbers are used ex post facto to provide faux-intellectual cover for unsystematic decisions. Too often numbers are used to confirm preexisting biases of those using them. Too often numbers are ignored when they provide evidence that runs counter to a cherished belief. Too often people use scouting and experience as escape hatches to avoid having to deal with the rigors of systematic, scientific analysis.

This is what causes sabermetricians to go crazy. It's not that we can't deal with scouts. It's not that we don't like baseball stories and anecdotes. It's not that we think men with experience have nothing to offer. Far from it. No, the problem is that we cannot stand the unsystematic, unscientific analysis that those in highly visible positions often engage in. It's lazy and worse: it's absolutely wrong. It must be shunned wherever it is found.

Let me close with a quotation from Malcolm Gladwell from this interview with Bill Simmons:

That's why I'm such a fan of the "Moneyball" generation of baseball GMs: It's not so much that their analytical tools are brilliant ways of predicting baseball success (and I have my doubts, sometimes), it's simply that they have an analytical tool. And when it comes to personnel evaluation, any tool is better than no tool...

Bingo. The merits of any particular tool, whether it be batting average, on base percentage, VORP, or scouting reports, are always up for debate. The important thing is that you have a tool and that you apply it systematically.

**EDIT** Here is a link to the Ken Rosenthal article that started it all. I like Ken. He does good work. Unfortunately, this article is an example of exactly what I'm talking about above. Ken throws out a bunch of numbers and throws in some other observations for good measure. And the result is... what exactly? How does he propose to use all this information to come up with a decision? Ken doesn't say.

Let me highlight this extended quote:

The first criterion for the award is "actual value of a player to his team, that is strength of offense and defense." Twenty-four of Mauer's 114 starts this season — more than one-fifth — have been at designated hitter, a position that requires no defense. Mauer also trails other candidates in the second criterion, number of games played.

When Mauer first stepped onto the field on May 1, the Twins already were 22 games into their season. Mauer obviously cannot be faulted for needing to recover from offseason kidney surgery, but two other MVP contenders — Tigers first baseman Miguel Cabrera and Jeter — have appeared in 141 and 139 games, respectively. Mauer has appeared in 120.

Am I nitpicking? Perhaps. But Mauer's absence in April, combined with his time at DH, raises the possibility another candidate may — repeat, may — be worthier. It certainly creates the opportunity for debate, which is my entire point.

Gee, if only we had a systematic way to weigh all these factors (playing time, quality of performance, positional adjustments, etc.) to come up with an answer to our question! Oh, shit, we do! We have tons of them, and they all originate in sabermetrics.

So is there still room for debate? Of course there is! None of these systems are complete. They all have weak spots. Some are better than others. We can debate the merits of any particular system until the cows come home. The point is that you can't just throw out a bunch of disjointed pieces of information and then pull an answer out of your ass, not if you want to claim any sort of validity to your answer. You must be able to establish ex ante how one can determine who the best player is and then you must let the results of that process, that system, provide you with the best answer.

A fact

There is no good reason to deny Joe Mauer the MVP.

Thursday, September 3, 2009

More MVP talk

My mother (of all people) points me to this article by Allen Barra in the Wall Street Journal echoing my thoughts on Derek Jeter's MVP candidacy. A few thoughts:

The tone of the article is pretty funny. It essentially acknowledges that Joe Mauer has been better and should win, but says, "Hey, Derek's been great for a while and has been robbed a couple times. Let's give him this year's award anyway, as a kind of lifetime achievement award." I can't get behind that reasoning, but at least it's honest.
Naturally, the article falls back on intangibles to make Derek's case. This leads to one of my new favorite baseball quotes:
"Some people will argue that intangibles don't exist, but in the ninth inning of close games everybody believes in them." - Marty Appel
It's not quite as pithy as "There are no atheists in foxholes," but the sentiment is the same, and likely equally true.
The article strangely does not note the strongest part of Derek's case: the fact that he plays shortstop and none of the other contenders do. You don't need intangibles to close the gap between Derek and a first baseman with better offensive numbers. You just have to understand the massive, massive value of playing a tougher position.
Derek's longevity really is incredible. More than anything, this is what will get him in the Hall of Fame one day.

Thanks for the link, Mom!

Saturday, August 29, 2009

Dear MVP Voters...

...if you must vote for a player on a playoff team, please vote for Derek Jeter. You guys go on and on all the time about how valuable he is to New York, and this may be his most valuable year yet. His defense has been very good: by UZR, only two shortstops have saved more runs this year with the glove. He's hit for average (0.329) and, for a shortstop, power as well (17 HRs). He's stolen 22 bases and only been caught 5 times. He's got all those intangibles.

You may be tempted to vote for Mark Teixeira. Mark's been a better hitter than Derek, but not by as much as you probably think. Don't be swayed by his RBI and HR totals. They are not spectacular totals for a first baseman. Furthermore, who do you think he's been driving in? Derek Jeter, that's who. First basemen who can do what Mark does are much more common than shortstops who can do what Derek's done this year.

Look, things can change in the next month, but I implore you to look beyond a few somewhat gaudy numbers and appreciate the man who really makes the Yanks go. You assure us all the time that the game isn't about numbers. That it's about people. Prove it. Vote for the guy whose value is tied up in more subtle things than RBIs and HRs. Vote for Derek Jeter.*

* Personally, I'd vote for Joe Mauer and then Ben Zobrist before Derek, but if you think being on a playoff team is important, Derek's your man.

**UPDATE** Never mind. Derek just sacrifice bunted with a two-run lead in the second inning, no one out, runners on first and second, and Jose Contreras struggling. This is a terrible, terrible play, for which Michael Kay is rightly excoriating him, noting that Derek is a great hitter having an MVP-type year. Al Leiter is trying to defend him, but it's a real reach. It doesn't matter that Derek leads off now (not that leadoff hitters sacrifice bunt a lot anyways, Al). It doesn't matter that it's small ball. You can't give up outs in that situation. You just can't.

Wednesday, August 26, 2009

Atrocious Wednesday Night Broadcast

I've been watching (on my DVR) tonight's broadcast of the Yankees and White Sox on ESPN2. The broadcast team is Dave O'Brien and Rick Sutcliffe. I mean no offense to either of these guys personally but...

This broadcast is seriously awful. It's completely unfocused and disjointed. Sutcliffe just rambles about whatever the hell he feels like, throwing out random and asinine assertions left and right. Every correlation is a causation. Every random anecdote "tells you something." It's just stunning to me that these guys can blabber on for three hours and say nothing important or relevant.

If I'm not mistaken, Sutcliffe chose Derek Jeter as his pick for greatest shortstop OF ALL TIME. I love DJ, and that's just flat out insane.

Also, it seems like A-Rod has fielded every single ball in this game. That's weird. And now Jorge Posada is hurt. That's just awesome. Crap.

Anyways, let me apologize for this post. I assure you it is more informative than tonight's broadcast.

Monday, August 24, 2009

The White Sox fail horribly

In the bottom of the second inning against Boston today and leading by the score of 2-0, the White Sox found themselves defending against a first and third situation with two outs. The runner on first, J.D. Drew, attempted to steal second base and the White Sox had him thrown out by thirty feet.

So what happened? They got him in a pickle and while they wasted a bunch of time trying to tag Drew out, the runner on third, Jason Bay, scored. The second baseman, Jayson Nix, even checked the runner on third before continuing with the pickle.

Now, it's not clear to me from the replay when Bay started running or how close the play might have been. It doesn't really matter though, because it's horrible baseball. If Bay was stealing at the same time as Drew, then the catcher should have simply held on to the baseball and put Bay in a pickle. Bay knows this. Everyone knows this. Thus, it stands to reason that Bay waited until he saw that Drew was in trouble. This is standard baseball fare: when a runner gets caught in a pickle, other runners try to advance.

What's really surprising is that the White Sox appeared to trade the run for the out. Let's look at the math:

Run Expectancy for possible outcomes:

Runner on 2nd, 3 out, 0 runs scored: 0.0. In this scenario, the White Sox throw home and get Bay out. I expect that this is what usually happens.

Runners on 1st and 3rd, 2 out, 0 runs scored: 0.54. In this scenario, the White Sox simply chase both runners back to their respective bags and fail to record an out. It is also our baseline scenario.

Runners on 2nd and 3rd, 2 out, 0 runs scored: 0.58. In this scenario, the White Sox prevent Bay from scoring, but allow Drew to take second. Note that it is almost no worse than the baseline and far, far better than either of the next two outcomes.

No one on base, 3 out, 1 run scored: 1.0. This is what actually occurred.

Runner on 2nd, 2 out, 1 run scored: 1.32. In this scenario, the White Sox fail to get either runner and they both advance, scoring Bay.

The White Sox would have saved 0.42 expected runs (1.0 versus 0.58) by simply hanging on to the baseball and letting Drew steal second! Then, after throwing through, they chose not to throw back home. Relative to what actually happened, which was itself the second-worst outcome, throwing home risked an extra 0.32 expected runs (the worst outcome, 1.32) for the chance to save 1.0 expected runs (the best outcome, 0.0). In order for holding the ball to be the right play, you would have to believe that you had less than about a 25% chance of throwing Bay out (the break-even point is 0.32 / 1.32, or roughly 24%). This strikes me as fantastically unlikely. It honestly seemed as if the White Sox were consciously trading a run for an out. If so, that is a gross misunderstanding of the costs and benefits involved.
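For anyone who wants to check the arithmetic, here is a rough sketch in Python. The numbers are the run expectancy values listed above; the dictionary keys and function names are my own labels, made up for illustration. It also assumes that a throw home can only end one of two ways (Bay is out, or both runners are safe and Bay scores), which is a simplification.

```python
# Runs scored plus expected future runs for each outcome, as listed above.
# The key names are hypothetical labels chosen for this sketch.
OUTCOMES = {
    "bay_out_at_home": 0.0,        # runner on 2nd, 3 out, 0 runs
    "both_runners_return": 0.54,   # baseline: 1st and 3rd, 2 out
    "drew_takes_second": 0.58,     # catcher holds the ball
    "drew_out_bay_scores": 1.0,    # what actually happened
    "both_safe_bay_scores": 1.32,  # worst case
}


def cost_of_throwing_through(outcomes):
    """Expected runs given up by the actual result versus holding the ball."""
    return outcomes["drew_out_bay_scores"] - outcomes["drew_takes_second"]


def break_even_probability(outcomes):
    """Chance of getting Bay at home where throwing home equals taking the out on Drew.

    Assumes the throw home has only two results: Bay is out, or both
    runners are safe and Bay scores.
    """
    actual = outcomes["drew_out_bay_scores"]
    gain = actual - outcomes["bay_out_at_home"]        # runs saved if Bay is out
    risk = outcomes["both_safe_bay_scores"] - actual   # extra runs if the throw fails
    return risk / (gain + risk)


if __name__ == "__main__":
    print(f"Cost of throwing through: {cost_of_throwing_through(OUTCOMES):.2f} runs")
    print(f"Break-even chance of getting Bay: {break_even_probability(OUTCOMES):.1%}")
```

If you think the chance of nailing Bay at the plate is anything above that break-even figure, the throw home is the better play under these assumptions.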

And as I am typing this, the epic FAIL continues. With the bases loaded and two outs in the bottom of the third with the White Sox now up 4-1, David Ortiz dribbles a ball up the first base line. It's a fantastically easy play for the first baseman, who can tag Ortiz out by about 15 feet with plenty of time to field the ball... except Jose Contreras, the pitcher, runs over and muffs the play by trying to field the ball himself despite the fact that he has a terrible angle on the ball and the runner. Run scores, no one out. It's actually stunning how bad a decision this is. Even if he picks up the ball cleanly, he has a tougher play than the first baseman.

He then walks in a run, throws a wild pitch to tie the game at four apiece, and serves up a three-run bomb to ~~Basil Rathbone~~ Mike Lowell for good measure. Thanks for nothing, White Sox!