Thursday, July 21, 2011

Michael Humphreys' Wizardry:
Imperfect But Critical Analytics  

Errors using inadequate data are much less than those using no data at all. -- Charles "The Dudmaston Devastator" Babbage

Social systems are determined by technological systems -- Leslie A. White

In business as in Baseball, technology triggers innovation, which affects comparative advantage. The spread of relatively inexpensive business analytics tools in the late 1990s produced an immense cadre of people with varied combinations of skill, insight and training, ready to attack the kinds of large-dataset problems that have much to yield to such technology. Business, as always, lagged behind Baseball (Government did too, but, as usual, not quite as much), because contemporary corporate governance almost always follows financial aspirations, while Baseball's aspiration portfolio is broader.

Because Baseball is intrinsically far more accountable than the corporate workplace, it has embraced more advanced accountability engines, such as the video capture technologies, only recently capable of deployment, that can identify events such as true balls and strikes, the exact trajectory, speed and rotation of a pitched or batted ball, and the distance and vector paths a fielder takes to get to a batted ball.

The majority of the data loaded and analysed from these systems has decoded pitching for umpires, coaches and pitchers themselves. But the most important data loaded and analysed from these systems has been that aimed at decoding fielding. That's because judging fielding has been Baseball's most gaping knowledge lag. There are good data that give a glimpse of the value of batters and, to a lesser degree, pitchers, but all the analysis done to judge fielding before the contemporary high-tech systems has been interesting but under-infused with hard facts.

So the arrival of that technology has been good. But there's a sad, Internet-inspired trend in the background that dulls the advantage of this magnificent innovation. The data need to be kept proprietary, because the Internet has enabled people who don't respect intellectual property (the idea that inventors deserve compensation for their inventions, creators deserve compensation for their creations) to expropriate public instances of others' private property for their own private profit.

"Content wants to be free" is the mantra of the non-creative free-market types who want to reap the creations of creative people at their own whim for their own profit. It's a perfectly parasitical paradigm, perniciously peddled by pseudo-intellectual free-market rent-boys like Lawrence Lessig. In a society that values money > creativity, creativity will gravitate towards serving the purpose of money, so people with money but no creativity will buy creation while people with creativity will tend to constrain their focus to serve uncreative people with money.

This twin-killing has made it very difficult to achieve much with the Business Analytics tools our technological innovators have made possible, in part because of the ubiquity of the Internet intellectual-property-theft tools our technological innovators have also made possible. Beyond Baseball, probably in your own organization, innovation and mission advancement are stunted by the same trends.

If the companies that invested in the high-tech creations that have brought so much actionable information to baseball were not very protective of their data, they would be rich in insight and poor in money, with no chance of earning back their investment. So all this wonderful "batted ball" data that decodes fielding skill and enables baseball teams to make better, more informed decisions is kept proprietary, not shared with the vast cadre of analysts I described earlier. And so fewer informed ideas get tested, vetted, argued for and against -- that is, refined with the scientific method.

HUMPHREYS HATCHES A HARBINGER Into this scientific gap plunged Michael Humphreys, with an attempt to see what could be synthesised using only publicly-available data, and using what fragments of the proprietary data had been publicly-shared to "test" it.

The result was a few years of peer-review and dialogue that culminated in a book, Wizardry: Baseball's All-Time Great Fielders Revealed (Oxford University Press, 2011, New York). In it, Humphreys devises a system that approximates the knowledge that could be uncovered using proprietary systems.

It's a most noble effort, one I found flawed in some ways, but one that I believe achieves its very useful mission: putting ideas that benefit from the scientific and analytic method into a public dialogue. The book, therefore, is not an end in itself, but a means towards that end, an end that's very difficult to achieve in our finance-led society, which gravitates in the other direction.

Wizardry has two parts. Part I is a detailed, open-book description of Humphreys' analytic methods (which I like much for its insight and openness). Part II is a position-by-position and "era"-by-"era" application of the methods to name names and build stacked lists of bests and worsts (which I didn't like much).

On the eccentric side, he proposes pitchers' fielding be credited with infield pop-outs and shallow outfield flies with long hang time, rather than any individual fielders' numbers, as he views them as automatic outs and not really something with which you should credit an individual fielder or the rest of the team. Unless I mis-read his intent, I suspect this should be credited to the pitcher's pitching instead (like a strikeout is credited to his pitching and not to the catcher's or the pitcher's fielding). But it's an interesting and thought-provoking assertion.

And in tribute to the now-widely accepted but laughably wrong "Wisdom of Crowds" cult, he proposes at one point that the only way to posit one aspect of outfield defense is to take two existing obviously-flawed systems and make a simple average between the two. Yikes...that's like suggesting that averaging the coordinates of two pitches called balls is the best way to determine what a strike is.
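To make the objection concrete, here is a minimal sketch of my own, not anything taken from Humphreys' book. It assumes two hypothetical rating systems that share the same systematic flaw (a made-up `shared_bias` of 3 runs) and each add their own independent noise; every number and name in it is an illustrative assumption, not real fielding data. Under those assumptions, a simple average trims the random noise but leaves the shared flaw fully intact.

```python
import random
import statistics


def simulate(n_players=2000, shared_bias=3.0, noise_sd=5.0, seed=42):
    """Compare two flawed hypothetical systems with their simple average.

    All parameters are illustrative assumptions, not measured values.
    Returns {name: (mean_error, stdev_of_error)} for systems A, B and the average.
    """
    rng = random.Random(seed)
    errors = {"A": [], "B": [], "avg": []}
    for _ in range(n_players):
        # Unknown "true" fielding value for one imaginary player.
        true_value = rng.gauss(0.0, 10.0)
        # Both systems carry the SAME systematic flaw plus their own noise.
        est_a = true_value + shared_bias + rng.gauss(0.0, noise_sd)
        est_b = true_value + shared_bias + rng.gauss(0.0, noise_sd)
        est_avg = (est_a + est_b) / 2
        errors["A"].append(est_a - true_value)
        errors["B"].append(est_b - true_value)
        errors["avg"].append(est_avg - true_value)
    return {
        name: (statistics.mean(errs), statistics.stdev(errs))
        for name, errs in errors.items()
    }


if __name__ == "__main__":
    for name, (mean_err, sd_err) in simulate().items():
        print(f"{name:>3}: mean error {mean_err:+.2f}, spread {sd_err:.2f}")
```

In this toy setup the averaged rating is less noisy than either input, but its mean error stays close to the shared bias. Averaging only helps with the flaws the two systems do not have in common.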

Disputes like this aside, he's made his analysis something others can build on by making it an open systems effort and basing it on publicly-available data. It doesn't need to be perfect; it needs to be sufficient and good enough that others can build on it, and Humphreys' work is both.

BEYOND HUMPHREYS & BASEBALL I want to encourage you, if you are an analyst or have any management influence over analysis departments, to grab a copy of Wizardry and read it for ideas for your own efforts.

First, absorb how he spread his ideas around to different people with very different points of view and used their critiques to synthesise refinements to his own system. In your own shop, that could mean circulating the answers, and the questions they engender, to other departments with very different kinds of insights, or it could mean combining with other organizations that are not direct competitors to synthesise your mutual wisdom. It's not fully open source (though going fully open source is a strategy that was at least as effective as secured analysis for the Oakland A's Moneyball strategy), but it pushes the energy in that direction, towards the comparative gains openness has to offer.

Second, see how you can use available non-proprietary data to blend in with your own, the way Humphreys has. Analysts, I've found, too often restrict their span to their own perimeters of collected data.

Third, try posing naïve questions, Paul DePodesta style. Play around with your questions in eccentric (not out of this universe) ways, parallel to Humphreys' crediting pitchers' fielding with pop-up outs.

BALANCING SCIENCE & CREATIVITY In Baseball, at least, there are solutions to finding a workable balance. Don Malcolm has some ideas, hinted at here & to be described in full at some later date, I hope.

Beyond Baseball, as long as we as a society find money more worthy than anything else, the Internet and skilled lobbyists for Red China's industrial plutocrats ("we mustn't offend them, and free markets demand we respect their needs, and it makes everything so cheap to buy, so it feels good") make it seem inevitable that innovation and creativity must serve as the midwife to the uncreativity of finance. I'm pretty sure it's not inevitable, but it does require people working and thinking and acting like Humphreys, as well as being willing to pay innovators and creators instead of cheap idea cloners and purveyors of cheap toxic crap.

It just takes will and enforcing accountability, something Baseball does every minute.
