Management by Baseball

Monday, October 20, 2003

Metrics Mentor: Don Malcolm (As If Predicting the Marlins Wasn't Enough)

Don Malcolm is one of the most controversial baseball analysts around, and one who persistently breaks new ground in applying numeric analysis to the study of the game. At least one of his inventions offers a powerful analogy you can carry over from the study of baseball to the study of the performance of any individual, organization or system. I'll explain how this metrics mentor uses his system to apply contextual analysis to starting pitcher performance.

First, though, you need to know that he picked the Florida Marlins to be the big surprise team of the year. A fair number of people, me included, thought they could be very interesting this year; I had thought they could be very competitive in streaks. But according to an infographic shown during a televised playoff game, the Marlins actually had the best record in baseball this season after May 23rd. (I haven't verified this number, and I've found this unnamed network's numbers to be wrong a surprising number of times; they have a "We make it up as we go along...you just nod, Buckwheat" ethic.) The only commentator outside of South Florida who dared to predict a high level of competitiveness for this team was Don.

Malcolm's interesting public baseball commentary surfaced through his collaboration on The Big Bad Baseball Annual, a weighty yearly book that covered trends for every team and supported the commentary with metrics, both new and inherited from previous Annuals. The book suffered the fate of much of the deliquescing book business in The Permafrost Economy, but the work goes on, appearing on the book's website when Don has time.

His masterpiece this year, IMNSHO, was a series called Fish Fry, a periodic buffet of analyses of the Marlins' performance week by week. His 8-to-5 job got in the way of his regular writing after June, though there are a few entries after that. A couple of them I consider exemplary presentations of contextual analysis, including this one. And this link points at the first in the series, a table-setter. His writing is drizzled with amusing pop culture references and acerbic assaults on other baseball researchers and pundits he considers knee-jerk or shallow adherents of what he calls neo-sabermetrics, a school of study best embodied by the guys over at Baseball Prospectus.

Anyway, if you click on this Google search, you can find links to all of Don's Fish Fry analyses (check the summary text of each result to see which week it covers; there are a number of duplicate hits).

Masterful Metrics

The shortcoming I find most often in metrics presented as "truth" or "insight" is a lack of context. Whether it's averages that ignore equally important aspects such as consistency, or confusing the utility of counting stats (RBI, gross sales $, units failed) with that of rate stats (slugging average, net margin, percentage failed), the single most common failure among presenters is leaving out context. In Malcolm's world, this aspect is called shape.
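To make the point about averages concrete, here is a small Python sketch with made-up numbers (the two pitchers, their runs-per-start lists and the RBI figures are all hypothetical, not real data). The two pitchers share the same average but have very different shapes, and a counting stat and a rate stat rank the same two hitters in opposite order.

```python
from statistics import mean, pstdev

# HYPOTHETICAL runs allowed per start for two made-up pitchers.
# Both average exactly 3 runs per start; only the consistency differs.
steady = [3, 3, 2, 4, 3, 3, 2, 4]
erratic = [0, 7, 1, 6, 0, 5, 1, 4]


def summarize(runs):
    """Return (average, population standard deviation) of runs per start."""
    return mean(runs), pstdev(runs)


def per_opportunity(count, opportunities):
    """Turn a counting stat into a rate stat."""
    return count / opportunities


if __name__ == "__main__":
    for name, runs in (("steady", steady), ("erratic", erratic)):
        avg, spread = summarize(runs)
        print(f"{name}: mean {avg:.2f}, sd {spread:.2f}")

    # HYPOTHETICAL hitters: A leads in the counting stat, B in the rate stat.
    print(f"A: 100 RBI, {per_opportunity(100, 700):.3f} per PA")
    print(f"B:  90 RBI, {per_opportunity(90, 500):.3f} per PA")
```

Both pitchers come out at a mean of 3.00, but the standard deviations are 0.71 and 2.65: the average alone hides the shape. Likewise hitter A has more RBI, while hitter B drives in runs at the higher rate (0.180 versus 0.143 per plate appearance).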

So while other researchers try to capture a starting pitcher's performance in a single number (for example, the Game Score metric I referred to recently), Don uses QMAX, a two-dimensional matrix. On one dimension, he grades a start by "Stuff", a measure of the rate of hit prevention in that start; on the other, by "Command", a measure of the rate of walk prevention. Once a starter's outing is complete, you can file his performance in one of these pigeonholes. The table here is from his site, and it shows the aggregate ERA of starters in each box for 1994-96.
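Here is a minimal Python sketch of how a single start might be filed into a two-dimensional grid like QMAX. The score cutoffs and the names `qmax_box`, `grade`, `STUFF_CUTOFFS` and `COMMAND_CUTOFFS` are my own hypothetical stand-ins; Malcolm's actual boundaries are defined on his site and are not reproduced here. The point is only the shape of the idea: two independent rate scores, one for hit prevention and one for walk prevention, give each start a box instead of a single number.

```python
from bisect import bisect_left

# HYPOTHETICAL cutoffs, NOT Malcolm's published QMAX boundaries.
# Each list holds the inclusive upper bound of the per-inning rate for
# scores 1..6; any rate above the last bound scores 7.
STUFF_CUTOFFS = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75]    # hits per inning
COMMAND_CUTOFFS = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]  # walks per inning


def grade(rate, cutoffs):
    """Map a per-inning rate to a score from 1 (best) to 7 (worst)."""
    return bisect_left(cutoffs, rate) + 1


def qmax_box(outs, hits, walks):
    """File one start as a (stuff, command) box.

    Innings are counted as outs / 3 so that 6 2/3 innings is exact.
    Assumption: a start with no outs recorded lands in the worst box.
    """
    if outs == 0:
        return 7, 7
    innings = outs / 3
    return grade(hits / innings, STUFF_CUTOFFS), grade(walks / innings, COMMAND_CUTOFFS)


if __name__ == "__main__":
    print(qmax_box(21, 4, 1))   # 7 IP, 4 H, 1 BB
    print(qmax_box(15, 9, 4))   # 5 IP, 9 H, 4 BB
```

With these made-up cutoffs, the seven-inning, four-hit, one-walk start lands in box (2, 2), and the five-inning, nine-hit, four-walk start lands in (7, 6).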

The QMAX chart is divided into zones; the two most successful are shaded in this table. This matrix is an example of one thing researchers, or anyone presenting metrics, should always do: test the assumptions behind the numbers before presenting them. Look at this chart and you can see the underlying "meaning", the level of effectiveness, of each box. Malcolm has taken a commonly known measure of starting pitcher effectiveness, ERA, and shown how it maps onto his compound system. And yes, it seems to work: the two shaded areas in the upper left, the success squares, correspond to lower ERAs than the others. He then builds up an entire set of regions on the table that describe specific kinds of performances (with known aggregate results).
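The validation step described above can be sketched roughly like this in Python: pool the earned runs and outs of every start that landed in a box, then compute that box's ERA. If the grid means anything, the supposedly good boxes should show lower pooled ERAs. The function names, the sample starts and the `success` region are hypothetical illustrations, not Malcolm's data or his zone definitions.

```python
from collections import defaultdict


def era_by_box(starts):
    """Pool earned runs and outs per box and return {box: ERA}.

    starts: iterable of (stuff, command, outs, earned_runs) tuples.
    ERA = 9 * ER / IP, and with IP = outs / 3 that is 27 * ER / outs.
    """
    outs = defaultdict(int)
    runs = defaultdict(int)
    for stuff, command, o, er in starts:
        outs[(stuff, command)] += o
        runs[(stuff, command)] += er
    return {box: 27 * runs[box] / outs[box] for box in outs if outs[box] > 0}


def pooled_era(starts, boxes):
    """ERA pooled over every start whose box is in `boxes`."""
    o = sum(s[2] for s in starts if (s[0], s[1]) in boxes)
    er = sum(s[3] for s in starts if (s[0], s[1]) in boxes)
    return 27 * er / o if o else float("nan")


# HYPOTHETICAL sample starts and a HYPOTHETICAL "success" region.
starts = [(1, 1, 24, 1), (1, 2, 21, 2), (1, 1, 27, 0), (5, 6, 15, 5), (6, 5, 12, 6)]
success = {(1, 1), (1, 2), (2, 1), (2, 2)}

if __name__ == "__main__":
    for box, box_era in sorted(era_by_box(starts).items()):
        print(box, round(box_era, 2))
    rest = {(s[0], s[1]) for s in starts} - success
    print("success region:", pooled_era(starts, success))
    print("everything else:", pooled_era(starts, rest))
```

Pooling runs and outs, rather than averaging each start's ERA, keeps a two-inning disaster from counting as much as a complete game. On this toy sample the success region pools to an ERA of 1.125 and everything else to 11.0.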

You can show the results for an individual pitcher, a team or a league by counting the starts that fall into each pigeonhole, and the counts quickly reveal the shape of the pitcher's performance over time. This page shows a pair of tables of 13-game stretches of Greg Maddux starts, and the two contrast well. After a couple of minutes with the method, you can actually see that he collapsed late in the season, and how. If you tracked only ERA, for example, the collapse would have been harder to notice: by midseason so much data is already stored in the number that each start can change it only a little. And ERA indicates only, in general terms, what happened, while QMAX provides indications of why.
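A rough Python sketch of both halves of that argument, using hypothetical data: tallying boxes over consecutive 13-start stretches (the stretch size used in the Maddux tables) exposes a change in shape at once, while one bad start only nudges a season-long ERA. The helper names and all the numbers are mine, for illustration only.

```python
from collections import Counter


def stretch_tallies(boxes, size=13):
    """Split a season's (stuff, command) boxes, listed in game order, into
    consecutive stretches of `size` starts and count the starts in each box.
    The final stretch may be shorter if the season doesn't divide evenly."""
    return [Counter(boxes[i:i + size]) for i in range(0, len(boxes), size)]


def era(outs, earned_runs):
    """ERA from outs recorded: 9 * ER / (outs / 3)."""
    return 27 * earned_runs / outs


def era_after_start(season_outs, season_er, start_outs, start_er):
    """Return (ERA before, ERA after) adding one start to season totals."""
    before = era(season_outs, season_er)
    after = era(season_outs + start_outs, season_er + start_er)
    return before, after


if __name__ == "__main__":
    # HYPOTHETICAL season: a good first stretch, then a collapse.
    boxes = [(1, 1)] * 7 + [(2, 2)] * 6 + [(2, 2)] * 3 + [(5, 5)] * 10
    for n, tally in enumerate(stretch_tallies(boxes), start=1):
        print(f"stretch {n}:", dict(tally))

    # 150 IP at a 3.00 ERA, then a 5-inning, 5-run start.
    before, after = era_after_start(450, 50, 15, 5)
    print(f"ERA {before:.2f} -> {after:.2f}")
```

The second stretch is dominated by (5, 5) starts, which jumps out of the tally, while the disastrous start moves the season ERA only from 3.00 to 3.19.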

If you're interested in performance metrics, spend a little time with QMAX. Here's a glossary that explains the evaluation system, the shaded regions and their names. If your numeracy is very low, this might hurt to look at, but I'd suggest that if your numeracy is that low, you shouldn't be in the metrics game.

I can see a lot of applications for this kind of presentation. Salespeople, for example: one axis for ranges of sale size, the other for ranges of net margin. You would figure out what your goals are (Revenue? Weighted by the strategic importance of the product line? Net income?) and build a QMAX equivalent. A QMAX-style matrix doesn't have to be two-dimensional; it could have three or more dimensions, though that would be complex to work with. But this has a ton
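For the salesperson example, a minimal Python sketch might band each sale by size and by net margin and count sales per box, just as QMAX counts starts per box. The band edges, the function names and the deals are hypothetical; the right edges depend on which goal you decided to measure.

```python
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL band edges; choose ones that reflect your own goals.
SIZE_EDGES = [10_000, 50_000, 250_000]  # revenue per sale, dollars
MARGIN_EDGES = [0.05, 0.15, 0.30]       # net margin as a fraction


def band(value, edges):
    """Return a band index from 0 to len(edges); a value equal to an
    edge falls into the higher band. Higher bands mean bigger values."""
    return bisect_right(edges, value)


def sales_box(revenue, net_income):
    """File one sale as a (size band, margin band) box."""
    margin = net_income / revenue if revenue else 0.0
    return band(revenue, SIZE_EDGES), band(margin, MARGIN_EDGES)


def tally(deals):
    """deals: iterable of (revenue, net_income). Count sales per box."""
    return Counter(sales_box(r, n) for r, n in deals)


if __name__ == "__main__":
    # HYPOTHETICAL quarter for one salesperson.
    deals = [(8_000, 2_000), (120_000, 12_000), (300_000, 3_000), (9_000, 1_800)]
    for box, count in sorted(tally(deals).items()):
        print(box, count)
```

Here the two small, high-margin sales share box (0, 2), while the large, thin-margin sale sits alone in (3, 0); a single revenue total would blur those very different kinds of selling together.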

Finally, don't forget to test your assumptions before you institutionalise your system. In the case of QMAX, Malcolm aggregated ERA into the matrix's boxes, then shaded specific areas and named them. In the same way, you need to test what each box means against a measure you know to be important before you start assigning a "value" to it.
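One way to make "test what each box means" mechanical is sketched below in Python: pool a known-important outcome per box, then flag any neighboring boxes where the box that is supposed to be better does not actually do better. The convention that smaller indices mark the better corner (as in the QMAX upper left), the function names and the sample numbers are all assumptions for illustration; for a grid where bigger bands are better, such as the sales example, you would number the bands the other way or adapt the comparison.

```python
def outcome_by_box(records):
    """Weighted mean outcome per box.

    records: iterable of (box, outcome, weight), e.g. (box, ERA, outs)
    or (box, net income per deal, 1).
    """
    totals, weights = {}, {}
    for box, outcome, weight in records:
        totals[box] = totals.get(box, 0) + outcome * weight
        weights[box] = weights.get(box, 0) + weight
    return {box: totals[box] / weights[box] for box in totals if weights[box]}


def violations(means, lower_is_better=True):
    """List adjacent box pairs where the box nearer the supposedly best
    corner (smaller indices) does NOT have the better outcome."""
    bad = []
    for (a, b), value in means.items():
        for neighbor in ((a + 1, b), (a, b + 1)):
            if neighbor in means:
                other = means[neighbor]
                ok = value <= other if lower_is_better else value >= other
                if not ok:
                    bad.append(((a, b), neighbor))
    return bad


if __name__ == "__main__":
    # HYPOTHETICAL pooled ERAs per box.
    means = {(1, 1): 1.5, (1, 2): 2.5, (2, 1): 2.0, (2, 2): 1.8}
    print(violations(means))
```

In this made-up grid, box (2, 2) outperforms both of its "better" neighbors, so the check flags two pairs and tells you to look harder before assigning values. An empty list means only that the grid passed this one sanity check, not that the boxes are proven meaningful.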

If you're interested in performance metrics, Malcolm's mastery of context is a great model to emulate.

posted by j @ 10/20/2003 06:58:00 AM