Sunday, July 20, 2008

The Uncanny Valley

In robotics, there exists a hypothesis known as "the uncanny valley." In short, the hypothesis is that humans respond with increasing empathy to robots (and other objects) as those robots become increasingly humanlike, up until the point where they are almost human. At that point, people instead experience strong repulsion, because they now notice the characteristics that make the robots un-human rather than those that make them human. However, this repulsion only spans a narrow range of human likeness: as the robots become even more humanlike, we once again begin to identify strongly with them.

This hypothesis tends to ring true to me. Just look at the reaction that most people have to computer-animated characters. We tend to obsess over the things that make them less human. We find them creepy, even if technically they are superbly animated. We don't have that reaction to simple hand-drawn animations. Indeed, much animation deliberately gives characters wildly unrealistic features precisely in order to elicit an empathetic response.

This, of course, has nothing at all to do with baseball. Not obviously, anyway.

I began thinking about how this concept pertains to baseball analysis. All baseball analysis is designed to simplify our view of baseball so that we can more easily extract information from it. As statistics become more and more complex, they begin to capture real baseball more and more accurately.

But will we reach a point where the models involved are so fine-grained that they miss the big picture? Can we create models that are too detailed? Will baseball models reach a point where they are so close to real baseball that they actually cease to tell us anything useful, instead providing us with either totally obvious or totally false conclusions?

One of the advantages of statistics is their coarse-grained nature. By not being bogged down in details, we can cut right to some important if general truths.

Obviously, I don't have an answer to these questions. And really, I probably don't need one. As long as we can measure the efficacy of the models we create, we should be able to distinguish between useful and useless models. Nonetheless, it's a question that's been eating at me for a few weeks. Thoughts?
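One way to picture what "measuring the efficacy of the models" could look like is to score a coarse model and a fine-grained model on data neither has seen. The Python sketch below is only a toy under loud assumptions: the batters are simulated, the situational "buckets" are invented, and by construction the buckets carry no real signal, so it illustrates the worry rather than providing evidence about real baseball. The names `simulate_half`, `rate`, and `compare_models` are hypothetical.

```python
import random


def simulate_half(true_obp, n_pa, n_buckets, rng):
    """Simulate plate appearances as (bucket, reached_base) pairs.

    ASSUMPTION: the bucket (a made-up situational split) is assigned at
    random and has no effect on the outcome.
    """
    return [(rng.randrange(n_buckets), rng.random() < true_obp)
            for _ in range(n_pa)]


def rate(outcomes):
    """Fraction of plate appearances in which the batter reached base."""
    return sum(outcomes) / len(outcomes)


def compare_models(n_batters=200, n_pa=300, n_buckets=30, seed=0):
    """Return (coarse_error, fine_error): mean squared error on held-out PAs."""
    rng = random.Random(seed)
    coarse_sq_err = 0.0
    fine_sq_err = 0.0
    n_scored = 0
    for _ in range(n_batters):
        # ASSUMPTION: each batter has one fixed, invented on-base skill.
        true_obp = rng.uniform(0.28, 0.40)
        train = simulate_half(true_obp, n_pa, n_buckets, rng)
        held_out = simulate_half(true_obp, n_pa, n_buckets, rng)

        # Coarse model: one aggregate rate per batter.
        overall = rate([reached for _, reached in train])

        # Fine-grained model: a separate rate for every bucket.
        by_bucket = {}
        for bucket, reached in train:
            by_bucket.setdefault(bucket, []).append(reached)

        for bucket, reached in held_out:
            # Fall back to the aggregate if a bucket was never observed.
            if bucket in by_bucket:
                fine_pred = rate(by_bucket[bucket])
            else:
                fine_pred = overall
            coarse_sq_err += (overall - reached) ** 2
            fine_sq_err += (fine_pred - reached) ** 2
            n_scored += 1
    return coarse_sq_err / n_scored, fine_sq_err / n_scored


if __name__ == "__main__":
    coarse, fine = compare_models()
    print(f"coarse model error: {coarse:.4f}")
    print(f"fine-grained model error: {fine:.4f}")
```

In this contrived setup the fine-grained model splits each batter's sample into small, noisy pieces, so it should score worse on the held-out plate appearances. If the buckets carried real signal, the comparison could come out the other way, which is the point of measuring rather than guessing.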


rklllama said...

So you're saying our models may get so good that they replicate baseball so well that there isn't really any benefit of actually having the model anymore?

rklllama said...

Or is the idea that, because we've gotten so close to real baseball, like you said, where the model is the same we'll only learn obvious stuff, and where the model is different we won't learn anything at all?

I think I'm starting to get it. You're wondering if baseball models will ever reach the point where they become super complex and cease to actually tell us anything, right?

John Lynch said...

What I'm saying is that right now the statistical models that we use are really coarse-grained. They deal with large chunks of information. Someone's OBP is an aggregate of a whole ton of plate appearances. It is useful precisely because it doesn't seek to capture a whole bunch of minute details.
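To make the "aggregate" point concrete, here is a minimal Python sketch of OBP using the standard formula, (H + BB + HBP) / (AB + BB + HBP + SF). The function name `on_base_percentage` and the season stat line are made up for illustration and do not belong to any real player.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Collapse a batter's counting stats into a single on-base rate."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical season line: 590 qualifying plate appearances, one number out.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5,
                         at_bats=520, sacrifice_flies=5)
print(round(obp, 3))  # prints 0.364
```

Everything about the individual plate appearances (count, pitcher, ballpark, inning) is discarded on the way to that one number, which is what makes the statistic coarse-grained.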

I'm wondering if we will reach a point where our models try to capture so much minute detail that, despite coming closer to real baseball than our current models do, they lead to gross errors when used to draw conclusions about those minute details, which are precisely the areas where the more detailed model is supposed to improve on the coarse one. If the model can't help us in the fine-grained areas, what good is it? We've already dealt with the coarse-grained areas quite well.

Note that if this truly were an uncanny valley, these models would NOT be so close to baseball as to be indistinguishable. Those models would exist on the other side of the valley.