Wednesday, December 24, 2003

Earl Weaver & Stochastic Evolution:
Using Metrics Wisely  

Last week I was writing entries about oversimplifying metrics: the way organizations that don't have a natural appreciation of numbers get lured too heavily to a single metric (a TOGN) that can oversimplify reality enough to distort it.

I was not as clear as I needed to be. Some readers of mine who have a lovely internal corporate or government blog that I think they're keeping sub-rosa wrote about those posts. Their expressions made me realize I needed to clarify what I was getting at. What's below looks like a quote, but I'm paraphrasing their exact words to protect their identity (aloha-there...write me and tell me it's okay to put a pointer to you and I'll use your exact words and link to you).

I see the dangers of One Big Number thinking, but I see that to the same degree there are dangers in us not using productivity metrics. It's mandatory for the manager tracking performance to take into consideration the pattern the data paints, both for entire teams and for individual employees. My data, for example, indicated conclusively I wasn't delegating enough. I was operating with the impression I was succeeding at that. And then I saw the metric. Doh.

I think I gave the impression I was opposed to metrics or expected them to be highly elaborate. Let me try to be clear about management metrics and my experience with them.

A TOGN (The One Great Number) that attempts to boil down a baseball player or an employee into a single, apparently one-dimensional (though perhaps highly-inclusive) stat is a noble effort. But it has a couple of severe shortcomings, even if the observer can escape the seduction of simplifying one life into a single column of numbers and doesn't shut off the mind to more nuanced realities.


For one thing, a TOGN is foreordained to bleed out the shading, depth and context of individual performance. The recent Oakland A's, for example, pursued a TOGN strategy with pretty good results, but they ended up with a team that, while it pretty much accomplished what the TOGN was designed to do, was so limited in subtle respects such as fielding and fundamental baserunning that they found themselves losing some games they felt they couldn't afford to. They didn't dump the pursuit of players who fit their TOGN, they just started pursuing variations on the theme, knitting in some other skills at the cost of some incremental excellence as measured by their TOGN.

And for another thing, any TOGN, even researched to an extreme degree, is designed for the specific environment in which it was formulated. Evolution is inevitable, and it moves "the optimum" stochastically, that is, unpredictably, at an unpredictable rate and in an unpredictable direction, though usually not re-making every factor totally, overnight. An acute observer using performance measurement systems built on several numbers is more likely to catch the trend early than one glued to a TOGN hyperoptimized to the "now" it was designed for.


That was a key part of the uncommon brilliance of Earl Weaver, manager in a couple of stints for the Baltimore Orioles. He was the original quotable guy who pushed the theory of a single and a walk and a 3-run homer, a big-inning offensive theory which provided great results because a) it worked and b) other teams were still futzing around with little-ball. But baseball evolved during Weaver's career, with ownership changing the rules and the ball in ways that changed the benefit/cost ratio of various strategies. Weaver was able to tune his biases and metrics to be in harmony with the reality of what was going on in his work environment on the field. [One of the most remarkable accomplishments of Weaver's transcendence as a manager, something so few beyond baseball ever achieve, is that the lessons of his first year of managing were turned around radically right afterwards, and yet he adapted cleanly enough to lead his team to 109 wins the next year in a radically different environment. His first year was 1968, with rules changes that amplified pitching and devastated hitting -- league batting was .230/.295/.340 and league ERA was 2.98. In 1969, changes pushed walks up 44%, and the lessons that made you effective the year before just weren't applicable. It's nearly universal that outside of baseball, managers who learn an early lesson are lashed to it, for better and worse, like Ahab to the whale, like stock-brokers who came into the market in the middle 90s, like French cavalrymen who learned tactics fighting in third-world colonies and then had to fight in Europe in W.W. I, like American strategists who developed their skills in W.W. II and then faced third-world guerrillas in Vietnam. Not Weaver. Weaver never stopped using metrics, observing his team members, observing the larger environment. And he never complained about change; he simply used it as an advantage by seeing it more quickly and adapting to it more quickly.]

Bill James, the best known of the sabermetricians, both craves a TOGN measure and fully understands that time and change are the enemy of any TOGN's ability to deliver "truth". One of his early creations was a stat called Runs Created (RC), a valuable measure still widely used that took basic components of offense (hits, walks, total bases and at bats) and combined them in a formula that projected the run value of a player or team stat line. He came up with an original formula, but soon realized that while it held for the year he developed it, the components' value changed over time, so he ended up developing 14 different versions of the number that tweaked the component values for various seasons.
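For readers who haven't seen it, here's a minimal sketch of the basic version of Runs Created as it's generally published: (H + BB) x TB / (AB + BB). The function name and the stat line are made up for illustration, and this is only the basic form, not any of the season-tuned variants.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities == 0:
        # No at bats and no walks: nothing to project.
        return 0.0
    return (hits + walks) * total_bases / opportunities


# Hypothetical stat line, invented for this example.
print(round(runs_created(hits=150, walks=50, total_bases=250, at_bats=500), 1))  # 90.9
```

Notice the weights on the components are baked into the shape of the formula. When the run environment moves, the formula doesn't move with it, and that's the reason the versions kept multiplying.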

If you have enough different numbers, you're more likely to adapt to changing conditions than if you have just one. That's because within a set of diverse measures, at least one is likely to pick up the change in the environment before the others do. And seeing a pattern in them that constitutes real success (not just a high score on paper) means you're more likely to be able to use regression studies to redefine a metric for success in an evolving environment.

All that said, a TOGN is better than nothing. Too many big organizations fall prey to binary thinking. I worked for a supervisor at Microsoft who believed that because you couldn't find a single number that delivered absolute 100% Truth, no numbers told you anything of value. I'm not exaggerating. He wouldn't use project management tools because "in the end it'll never come out (exactly) the way the timelines tell you they will". Because it could never score 100, he believed it was a 0. True, he was extraordinarily cognitively underqualified to be a manager, but big organizations harbor lots of men and women who operate this way, and also tend to reward them with promotions. Give these metrics-rejectors an executive mandate to measure, though, and they'll have you measuring a thousand trivial factors every millisecond. Bang, Boom, Barf they go, from one binary extreme to another, never finding an effective, homeostatic point in the continuum from zero to infinity.


So, process-mavens, let me summarize my attempted clarification. Almost every kind of work that requires a deliverable .and. needs to be effective .and. is being managed should have some performance metrics developed to observe it. One can develop a time-specific TOGN as a centrepiece, but develop other numbers to give it texture and context and shading, and don't present the TOGN as a stand-alone number that lazy-minded execs can focus on while missing the point.

A great example of a textured metric is the fairly complex display Don Malcolm uses to describe starting pitcher performances, his Qmax tables. Here's an example of one for Greg Maddux, and here's a glossary to describe its components. It's not so complex it's crippling, but it's complex enough to slow down the rush-to-judgement types up the hierarchy who would misuse performance data out of laziness.

If the constituents of success started evolving because of rules changes, or because ownership started draining some of the Kickapoo Joy Juice out of the ball, or because a new informal standard strike zone took hold, that is, if what a starting pitcher did to help his team achieve wins changed, it would be obvious from a system like Qmax that the sweet spots in the matrix were shifting subtly, that the relative advantage of any given cell or group of cells was "worth" more or less.

That's a performance measurement system that is designed with texture, and evolution, built right in.
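To make the "shifting sweet spots" idea concrete, here's a rough sketch of how you might watch a grid of cells drift between two periods. This is not how Qmax itself is computed; the function names, the row and column buckets, the 0.10 threshold and the data are all hypothetical stand-ins for whatever two dimensions and cut-offs your own grid uses.

```python
from collections import defaultdict


def win_rate_by_cell(starts):
    """starts: iterable of (row, col, team_won) tuples -> {(row, col): win rate}."""
    tally = defaultdict(lambda: [0, 0])  # cell -> [wins, games]
    for row, col, team_won in starts:
        tally[(row, col)][0] += int(team_won)
        tally[(row, col)][1] += 1
    return {cell: wins / games for cell, (wins, games) in tally.items()}


def cell_shifts(before, after, threshold=0.10):
    """Cells present in both periods whose win rate moved by at least `threshold`."""
    old, new = win_rate_by_cell(before), win_rate_by_cell(after)
    return {
        cell: round(new[cell] - old[cell], 3)
        for cell in old.keys() & new.keys()
        if abs(new[cell] - old[cell]) >= threshold
    }


# Made-up starts: (row bucket, column bucket, did the team win?)
season_a = [(1, 1, True), (1, 1, True), (4, 4, True), (4, 4, False)]
season_b = [(1, 1, True), (1, 1, True), (4, 4, False), (4, 4, False)]
print(cell_shifts(season_a, season_b))  # {(4, 4): -0.5}
```

With real data you'd want far more than a handful of starts per cell before trusting any shift, but the point stands: a grid shows you which cells are moving, where a single number would just quietly get worse.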
