In robotics, there is a hypothesis known as "the uncanny valley." In short, it holds that humans respond with increasing empathy to robots (and other objects) as they become more human-like, up to a point where they are almost human. At that point, humans instead experience strong revulsion, because they now perceive the characteristics that make the robots inhuman rather than those that make them human. This revulsion is short-lived, however: as the robots become even more human, we once again begin to identify strongly with them.
This hypothesis rings true to me. Just look at the reaction most people have to computer-animated characters. We tend to obsess over the things that make them less human. We find them creepy, even when they are technically superb. We don't have that reaction to simple hand-drawn animations. Indeed, much animation deliberately exaggerates unrealistic characteristics precisely to elicit an empathetic response.
This, of course, has nothing at all to do with baseball. Not obviously, anyway.
I began thinking about this concept as it pertains to baseball analysis. All baseball analysis is designed to simplify our view of the game so that we can more easily extract information from it. As statistics become more and more complex, they capture real baseball more and more accurately.
But will we reach a point where our models become so detailed that they miss the big picture? Can a model be too detailed? Will baseball models come so close to real baseball that they cease to tell us anything useful, instead yielding conclusions that are either totally obvious or totally false?
One of the advantages of statistics is their coarse-grained nature. By not being bogged down in details, we can cut right to some important if general truths.
Obviously, I don't have an answer to these questions. And really, I probably don't need one. As long as we can measure the efficacy of the models we create, we should be able to distinguish useful models from useless ones. Nonetheless, it's a question that's been eating at me for a few weeks. Thoughts?