Basebology (The Study of Baseball): The Uncanny Valley

Sunday, July 20, 2008

The Uncanny Valley

In robotics, there exists a hypothesis known as "the uncanny valley." In short, the hypothesis is that humans respond with increasing empathy to robots (and other objects) as they become increasingly humanlike, up until a point where they are almost human. At this point, humans instead experience strong repulsion, because they now perceive the characteristics that make the robots un-human instead of those that make them more human. However, this repulsion persists only over a narrow range, because as the robots become even more human we once again begin to identify strongly with them.

This hypothesis tends to ring true to me. Just look at the reaction that most people have to computer animated characters. We tend to obsess over the things that make them less human. We find them creepy, even if technically they are superbly animated. We don't have that reaction to simple hand-drawn animations. Indeed, much animation deliberately gives its characters unrealistic features in order to elicit an empathetic response.

This, of course, has nothing at all to do with baseball. Not obviously, anyway.

I began thinking about this concept as it pertains to baseball analysis. All baseball analysis is designed to simplify our view of baseball so that we can more easily extract information from it. As statistics become more and more complex, they begin to capture real baseball more and more accurately.

But will we reach a point where the models are so fine-grained that they miss the big picture? Can we create models that are too detailed? Will baseball models reach a point where they are so close to real baseball that they actually cease to tell us anything useful, instead providing us with either totally obvious or totally false conclusions?

One of the advantages of statistics is their coarse-grained nature. By not being bogged down in details, we can cut right to some important if general truths.

Obviously, I don't have an answer to these questions. And really, I probably don't need one. As long as we can measure the efficacy of the models we create, we should be able to distinguish between useful and useless models. Nonetheless, it's a question that's been eating at me for a few weeks. Thoughts?

3 comments:

Robert Lynch said...: So you're saying our models may get so good that they replicate baseball so well that there isn't really any benefit of actually having the model anymore?; July 22, 2008 at 7:54 AM
Robert Lynch said...: Or the idea is that because we've gotten so close to real baseball, like you said, where the model is the same we'll only learn obvious stuff and where the model is different we won't learn anything at all?

I think I'm starting to get it. You're wondering if baseball models will ever reach the point where they become super complex and cease to actually tell us anything, right?; July 22, 2008 at 11:01 AM
Anonymous said...: What I'm saying is that right now the statistical models that we use are really coarse-grained. They deal with large chunks of information. Someone's OBP is an aggregate of a whole ton of plate appearances. It is useful precisely because it doesn't seek to capture a whole bunch of minute details.

I'm wondering if we will reach a point where our models are trying to capture so much minute detail that, despite approaching real baseball even more closely than our current models, they will tend to lead to gross errors when used to draw conclusions about the minute details, precisely those areas where the more detailed model is supposed to succeed over the coarse model. If the model can't help us in the fine-grained areas, what good is it? We've already dealt with the coarse-grained areas quite well.

Note that if this truly were an uncanny valley these models would NOT be so close to baseball as to be indistinguishable. Those models would exist on the other side of the valley.; July 23, 2008 at 5:42 AM

Key Stats

ARP
Adjusted Runs Prevented

ARP measures the number of runs that a relief pitcher prevented beyond what an average relief pitcher would have prevented. ARP is adjusted for the situation in which the pitcher was used.

ISO
Isolated Power

ISO is the ratio of extra bases that a player has accumulated to the number of at bats he has received. Equivalently, ISO is a player's SLG minus his batting average. This has the effect of giving a player credit only for extra base hits. ISO is not a useful measure of player value on its own, but is a very effective measure of a player's extra base ability.
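As a rough illustration of the two descriptions above, the sketch below computes ISO once as extra bases per at bat and once as SLG minus batting average, and shows that they agree. The stat line is made up, and the function names are illustrative only, not taken from any library.

```python
def iso_from_extra_bases(doubles, triples, home_runs, at_bats):
    """Extra bases per at bat: 1 per double, 2 per triple, 3 per home run."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * home_runs) / at_bats


def batting_average(hits, at_bats):
    """Hits per at bat."""
    return hits / at_bats


# Hypothetical season line (made-up numbers, not a real player).
singles, doubles, triples, home_runs, at_bats = 100, 30, 5, 25, 550
hits = singles + doubles + triples + home_runs

iso_direct = iso_from_extra_bases(doubles, triples, home_runs, at_bats)
iso_by_subtraction = (
    slugging(singles, doubles, triples, home_runs, at_bats)
    - batting_average(hits, at_bats)
)

print(round(iso_direct, 3), round(iso_by_subtraction, 3))
```

The subtraction works because every hit contributes exactly one base to batting average's numerator, so removing it from total bases leaves only the extra bases.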

OBP
On Base Percentage

OBP is the ratio of the number of times a player reached base safely to the number of opportunities he had to reach base. It effectively measures a player's skill at not making outs. Since outs are a team's most precious commodity, OBP measures perhaps the most valuable and fundamental skill a player can have.
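For concreteness, here is a minimal sketch of that ratio using the conventional OBP formula, in which hits, walks, and hit-by-pitches count as times on base, and at bats, walks, hit-by-pitches, and sacrifice flies count as opportunities. The numbers are invented and the function name is an assumption made for this example.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by opportunities to reach base."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


# Hypothetical season line (made-up numbers, not a real player).
obp = on_base_percentage(
    hits=160, walks=70, hit_by_pitch=5, at_bats=550, sac_flies=5
)
print(f"{obp:.3f}")
```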

OPS
On Base Plus Slugging Percentage

OPS is a crude metric that simply sums a player's on base and slugging percentages. It is probably the most popular non-traditional measure of overall batting performance due to its simplicity. However, it has drawn criticism from performance analysts for its inaccuracy relative to other advanced metrics and because it works by adding two numbers with different denominators together to produce a conceptually meaningless quantity. It is best used as a quick and dirty estimator of batting prowess.
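The denominator complaint is easy to see in a small sketch. The numbers below are made up and the helper name is hypothetical. OBP is divided by opportunities to reach base, SLG by at bats, and OPS simply adds the two results.

```python
def ops_components(hits, walks, hit_by_pitch, sac_flies, total_bases, at_bats):
    """Return (obp, slg, ops) along with the two differing denominators."""
    obp_denominator = at_bats + walks + hit_by_pitch + sac_flies
    slg_denominator = at_bats
    obp = (hits + walks + hit_by_pitch) / obp_denominator
    slg = total_bases / slg_denominator
    return obp, slg, obp + slg, obp_denominator, slg_denominator


# Hypothetical season line (made-up numbers, not a real player).
obp, slg, ops, obp_den, slg_den = ops_components(
    hits=160, walks=70, hit_by_pitch=5, sac_flies=5,
    total_bases=275, at_bats=550,
)
print(f"OBP {obp:.3f} (per {obp_den}) + SLG {slg:.3f} (per {slg_den}) = OPS {ops:.3f}")
```

Because the two rates are measured per different units, their sum is not a rate per anything in particular, which is the sense in which critics call it conceptually meaningless.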

SLG
Slugging Percentage

SLG is the ratio of total bases that a player has accumulated to the number of at bats he has received. It is essentially a weighted batting average that gives a player more credit for extra base hits.

UZR
Ultimate Zone Rating

UZR is a defensive metric that uses play-by-play data to determine how good a player's defense is. On Fangraphs, it is denominated in runs saved above average.

VORP
Value Over Replacement Player

VORP measures the number of runs that a player contributed above what a "replacement player" at the same position would produce. VORP considers only offensive contributions.

WARP
Wins Above Replacement Player

WARP measures the number of wins that a player contributed above what a "replacement player" at the same position would produce. WARP considers both offensive and defensive contributions.

WXRL
Win Expectancy added above Replacement adjusted for Lineup

WXRL measures the number of wins that a relief pitcher contributed above what a "replacement player" would produce. WXRL differs from WARP because it is adjusted for both the game situation in which the pitcher was used and the hitters that the pitcher faced.
