Monday, November 12, 2007

Is a run scored equal to a run saved?

In my previous post on Hank Steinbrenner's comment that the game of baseball was "70 percent" pitching, I started with the assumption that saving a run is roughly equivalent to scoring a run. To me, this seems like a safe assumption. However, I've been kicking the idea around and have decided to examine it in more detail. I have come up with a few different ways to look at the question, and this post presents one of them. With luck, I'll be able to cover the other views in the near future.

The simplest way to examine this question is to use a modified version of Bill James' Pythagorean Theorem of Baseball to analyze how scoring and allowing runs influence a team's expected winning percentage. The "theorem," which gets its name from its resemblance to the Pythagorean Theorem proper, relates a team's winning percentage to its runs scored and runs allowed via the formula:

W% = RS^2 / ( RS^2 + RA^2 )

where W% is the team's expected winning percentage, RS is the number of runs the team scored, and RA is the number of runs the team allowed. Naturally, the relationship is not perfect (indeed, for this exercise I am using a slightly modified exponent for increased accuracy), but it does capture the essence of the relationship between run scoring and winning. For example, in 2007 the Yankees scored 968 runs and allowed 777. We would have expected them to win 98.3 games and lose 63.7. In reality, they won 94 and lost 68, underperforming expectations by 4.3 wins. Historically, a miss of this size is fairly typical.
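The formula is easy to check by hand or in a few lines of code. Here is a minimal sketch in Python; the function name `pythagorean_win_pct` is made up for illustration, and the default exponent of 2 is the one shown in the formula above, not the slightly modified exponent used for the figures in this post (that value is not stated here).

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 matches the formula as written in the post; the post's
    own numbers use a slightly different, unstated exponent.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162  # length of a regular season

# 2007 Yankees: 968 runs scored, 777 runs allowed.
yankees_pct = pythagorean_win_pct(968, 777)
yankees_wins = yankees_pct * GAMES

print(f"expected W%: {yankees_pct:.3f}")
print(f"expected record: {yankees_wins:.1f}-{GAMES - yankees_wins:.1f}")
```

With the plain exponent of 2 this comes out near 98.5 expected wins, close to but not identical to the 98.3 quoted above, which is what you would expect from a slightly different exponent.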

From this formula, we can examine what happens to a team's expected winning percentage when we vary runs scored and runs allowed. Let's jump into the data.

Here you can see a spreadsheet where I've calculated the expected winning percentages for all combinations of RS and RA between 500 and 1000 in increments of 25 runs. The first thing that you should notice is the line where RS is equal to RA. As it should be, this line shows us that a team that scores as many runs as it allows should always expect a .500 winning percentage. If you did not expect this, it may be time for a refresher on basic math.
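For anyone who wants to rebuild a table like this, the sketch below shows one way to do it. The layout (rows keyed by RA, columns by RS) and the names are illustrative choices, and the exponent of 1.81 is an assumption picked because it roughly reproduces the figures quoted in this post; the exact exponent behind the spreadsheet is not stated.

```python
EXPONENT = 1.81  # assumed value; the post's modified exponent is not given


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=EXPONENT):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# 500, 525, ..., 1000: the run totals covered by the table.
run_totals = list(range(500, 1001, 25))

# grid[ra][rs] is the expected winning percentage for a team that
# scored rs runs and allowed ra runs.
grid = {
    ra: {rs: pythagorean_win_pct(rs, ra) for rs in run_totals}
    for ra in run_totals
}

# Every cell on the RS == RA diagonal should be .500.
diagonal = [grid[r][r] for r in run_totals]
print(all(abs(pct - 0.5) < 1e-12 for pct in diagonal))
```

The diagonal check holds for any exponent, since RS and RA contribute identical terms when they are equal.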

Now then, pick any cell on the spreadsheet. Moving up one cell shows the effect on expected winning percentage of saving an additional 25 runs; moving right one cell shows the effect of scoring an additional 25 runs.

For example, if a team scored 900 runs and allowed 800, we would expect a winning percentage of 0.553, roughly a 90-win team over the course of a season. If that team were to save 25 more runs to become a 900 RS/775 RA team, its new expected winning percentage would be 0.567, a 92-win team. If that team were instead to score 25 more runs to become a 925 RS/800 RA team, its new expected winning percentage would be 0.565, also roughly a 92-win team. In this case, it appears that a run scored is very nearly equal to a run saved.
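The same comparison can be reproduced in code. This is a sketch under one loud assumption: the exponent of 1.81 is a guess that happens to reproduce the three percentages above to three decimal places, not a value taken from the spreadsheet.

```python
EXPONENT = 1.81  # assumed; chosen because it matches the figures above


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=EXPONENT):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162

baseline = pythagorean_win_pct(900, 800)   # the starting team
saved_25 = pythagorean_win_pct(900, 775)   # same offense, 25 fewer runs allowed
scored_25 = pythagorean_win_pct(925, 800)  # same defense, 25 more runs scored

for label, pct in [
    ("900 RS / 800 RA", baseline),
    ("900 RS / 775 RA", saved_25),
    ("925 RS / 800 RA", scored_25),
]:
    print(f"{label}: {pct:.3f} ({pct * GAMES:.1f} wins)")
```

Saving 25 runs edges out scoring 25 runs here, but only by about two thousandths of a point of winning percentage.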

Let's look at it a bit differently. Here you see a similar spreadsheet, but with different data. For each RS/RA combination, this spreadsheet shows the ratio of the gain in expected winning percentage from saving 25 additional runs to the gain from scoring 25 additional runs. Here we see that the effects of runs scored and runs saved on expected winning percentage vary depending on our baseline of RS and RA. Interestingly, teams that already outscore their opponents benefit more from saving additional runs. Teams that are outscored by their opponents benefit more from scoring additional runs.
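The ratio for any single cell can be sketched as follows. The helper name `save_to_score_ratio` is hypothetical and the exponent of 1.81 is again an assumption. As a sanity check on the pattern, differentiating the formula shows that for a very small step the ratio works out to RS/RA regardless of the exponent, so it sits above 1 when a team outscores its opponents and below 1 when it does not.

```python
EXPONENT = 1.81  # assumed; the post's modified exponent is not given


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=EXPONENT):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def save_to_score_ratio(runs_scored, runs_allowed, step=25):
    """Gain in W% from saving `step` runs divided by gain from scoring them."""
    base = pythagorean_win_pct(runs_scored, runs_allowed)
    gain_from_saving = pythagorean_win_pct(runs_scored, runs_allowed - step) - base
    gain_from_scoring = pythagorean_win_pct(runs_scored + step, runs_allowed) - base
    return gain_from_saving / gain_from_scoring


for rs, ra in [(900, 800), (800, 900), (1000, 500), (500, 1000)]:
    ratio = save_to_score_ratio(rs, ra)
    print(f"{rs} RS / {ra} RA: ratio {ratio:.2f} (RS/RA = {rs / ra:.2f})")
```

A ratio above 1 means the next 25 runs are worth more saved than scored; below 1 means the opposite.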

As a caveat, I note that while the differences at the margins appear extreme (the ratio approaches 2:1 depending on which side of 0.500 you are), the largest difference between scoring and saving an additional 25 runs is only 0.008 in expected winning percentage, or about 1.3 wins over a 162-game season. Furthermore, as with many statistical models, the extremes are where the Pythagorean model itself breaks down.
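To put a number on that caveat, one can scan every cell and record the gap between the two gains. This is a rough sketch, not the spreadsheet itself: the exponent of 1.81 is an assumption, and edge cells are allowed to step just outside the 500 to 1000 range (to 475 runs allowed or 1025 runs scored), which may differ from how the spreadsheet treats its borders.

```python
EXPONENT = 1.81  # assumed; the post's modified exponent is not given
GAMES = 162
STEP = 25


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=EXPONENT):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


run_totals = range(500, 1001, STEP)

# gaps[(rs, ra)] = (gain from saving 25 runs) - (gain from scoring 25 runs)
gaps = {}
for rs in run_totals:
    for ra in run_totals:
        base = pythagorean_win_pct(rs, ra)
        gain_from_saving = pythagorean_win_pct(rs, ra - STEP) - base
        gain_from_scoring = pythagorean_win_pct(rs + STEP, ra) - base
        gaps[(rs, ra)] = gain_from_saving - gain_from_scoring

# The cell where the choice between saving and scoring matters most.
worst_cell = max(gaps, key=lambda cell: abs(gaps[cell]))
largest_gap = abs(gaps[worst_cell])

print(f"largest gap at RS/RA = {worst_cell}: {largest_gap:.4f} W%")
print(f"over {GAMES} games: {largest_gap * GAMES:.1f} wins")
```

Under these assumptions the largest gap lands at roughly 0.008, in line with the figure above.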

From this data, it should be safe to conclude that, at least in terms of expected winning percentage, a run scored is on average equal to a run saved. Certainly, there is variation, but that variation is centered around a ratio of 1 RS to 1 RA. It would appear that Mr. Steinbrenner is incorrect.

There are definitely problems with the method. Primarily, we have examined expected winning percentage. If our expected winning percentage model does not itself capture the relationship between RS and RA, then our results will be poor. One of the ways we can examine this is to see if there is a relationship between over- or under-performing expectations and RS or RA. If there is, then this might indicate that our predictor is doing a poor job of capturing this relationship. Hopefully, I can follow up this post with an examination of this issue at a later date.

In order to address the problems of expected winning percentage versus actual winning percentage, my next view of the problem will try to answer the run scored versus run saved question from historical data. Stay tuned!

1 comment:

die Amerikanerin said...

I'll have you know, I studied all that very carefully, and I believe I got the idea. Yay for me!