Tuesday, March 15, 2005

Bill James & Adrienne Barbeau Confront
The Fog of Metrics & Post-Modernity  

A cynic is a man who will laugh at anything
so long as it isn't funny
-- Oscar Wilde

A pair of delusions dominates analysis. They are binary opposites, but they come from the same lack of self-discipline.

One is an optimistic delusion: that everything useful can be measured and that everything measurable must have value. I'll cover that in the next entry. This entry is about its opposite, the cynical delusion that if an effect can't be isolated or made predictable through measures, there must not be an effect at all. The cynical delusion has its roots in a cultural position that is on an up-trend, and it's becoming common in all endeavors. Nowhere, however, is it more transparent & instructive than in baseball. And nowhere has it been more elegantly dissected than by the master himself, Bill James, in the most recent SABR publication.

NOTE: Once a year, SABR (the Society for American Baseball Research) issues a collection of research papers under the title The Baseball Research Journal (the BRJ I'm talking about here is the 2005 edition, Volume 33). This year's is the best in the last decade for several reasons: well-chosen pieces, a balance of statistical and investigative content, crisp editing and very professional publication quality (which wasn't always the case before Jim Charlton became editor and started driving the team). It's apparently not yet available to the general public.

James' essay in the BRJ is called Underestimating the Fog, and it takes on the symptom of the delusion's up-trend in statistical baseball research coming from a school of thought Don Malcolm calls "neo-sabermetrics".

The underlying delusion many have when analysing measures, James asserts (and he includes himself as an occasional transgressor), is the assumption that if they can't prove something is significant, it doesn't exist as a real factor. From the BRJ article:

If you make up a list of the leading hitters in the National League in 1982 (or any other year) and check their batting averages in 1983 (or the follow-up year, whatever it is) you will quite certainly find that those hitters hit far better than average in the follow-up season. If you look at the stolen base leaders in the National League in 1982, you will find that those players continue to steal bases in 1983. If you look at the Hit By Pitch Leaders in 1982, you will find that those players continued to be hit by pitches in 1983. That is what we mean by a persistent phenomenon-that the people who are good at it one year are good at it the next year, as well.

If the opposite is true-if the people who do well in a category one year do NOT tend to do well in the same category the next year-that's what we mean by a transient phenomenon. Here today, gone tomorrow.

All "real" skills in baseball (or anything else) are persistent at least to some extent. Intelligence, bicycle riding, alcoholism, income-earning capacity, height, weight, cleanliness, greed, bad breath, the ownership of dogs or llamas and the tendency to vote Republican . . . all of these are persistent phenomena. Everything real is persistent to some measurable extent. Therefore, if something cannot be measured as persistent, we tend to assume that it is not real.

Some of the factors that serious, intelligent researchers have studied and rejected as real phenomena and see as merely transient include:

  • "clutch hitting",
  • "catcher ERA" -- the ability of an individual catcher to affect the runs-allowed effectiveness of a pitcher,
  • "individual platoon differential" -- the tendency of an individual hitter to have more success against pitchers who throw from one side than against those who throw from the other.

All three of these factors are widely accepted by old-line baseball management and to an even greater degree by announcers. Clutch hitting is a great example...in May you'll hear some radio mouthpiece getting all breathy about some batter's 2-for-6 performance this year as proof he's a clutch hitter with the bases loaded (ignoring both the minute sample size and the fact that he's 2-for-11 in other situations with a runner in scoring position). The logical flaw is that these numbers are merely background noise, "luck", call it what you will.

James goes on to describe one of his own studies, a 1988 examination of the last bulleted item.

One of the conclusions of that article was that "The platoon differential is not a weakness peculiar to some players. It is a condition of the game." I based this conclusion on the following research and logic. Suppose that you identify, in last year's platoon data, two groups of players: those who had the LARGEST platoon differentials, and those who hit better the wrong way (that is, left-handed hitters who hit better against left-handed pitchers, and right-handed hitters who hit better against right-handed pitchers.) Suppose that you then look at how those players hit in the FOLLOWING season. You will find that there is no difference or no reliable difference in their following-year platoon differentials. The players who had huge platoon differences in Year 1 will have platoon differences in Year 2 no larger than the players who were reverse-platoon in Year 1.

Individual platoon differences are transient, I concluded, therefore not real. Individual platoon differences are just luck. There is no evidence of individual batters having a special tendency to hit well or hit poorly against left-handed pitchers, except in a very few special cases.

In 1988, he came to the conclusion that because there was no way to prove statistically that the effect existed outside of a few exceptional players, it was logical to equate zero persistence with "luck" or random factors or background noise.

He concluded only a few years ago that the 1988 study was flawed. And that flaw, James suggests, was that his conclusion was based on random factors and background noise...the very thing he was trying to overcome with the study. Platoon differentials have noise in them. Take the batting average against left-handed pitchers, a number that reflects some level of skill and some level of luck, and the batting average against right-handed pitchers, a number that reflects the same kind of mixture. When you subtract one from the other, the skill components largely cancel but the luck doesn't: you're adding together all the randomness/luck/noise from both sets of numbers.
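To put a rough size on that pile-up of noise, here's a minimal Python sketch. It is not from James' essay. It assumes each at-bat is an independent coin flip weighted by the hitter's true average (a simple binomial model), and the hitter, his averages and his at-bat counts are all made up for illustration. Under those assumptions the luck in each split is its standard error, and the luck in the differential is the two standard errors combined.

```python
import math

def batting_average_se(true_avg: float, at_bats: int) -> float:
    """Standard error of an observed batting average under a binomial model
    (assumption: every at-bat is an independent trial with the same odds)."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)

def differential_se(avg_vs_lhp: float, ab_vs_lhp: int,
                    avg_vs_rhp: float, ab_vs_rhp: int) -> float:
    """Standard error of (average vs LHP) minus (average vs RHP).
    The variances of the two splits add, so the noise in the
    difference is larger than the noise in either split alone."""
    se_l = batting_average_se(avg_vs_lhp, ab_vs_lhp)
    se_r = batting_average_se(avg_vs_rhp, ab_vs_rhp)
    return math.sqrt(se_l ** 2 + se_r ** 2)

# Hypothetical right-handed hitter: .290 in 150 AB vs lefties,
# .263 in 400 AB vs righties (a 27-point observed differential).
se_l = batting_average_se(0.290, 150)
se_r = batting_average_se(0.263, 400)
se_diff = differential_se(0.290, 150, 0.263, 400)

print(f"noise vs LHP:          {se_l * 1000:.0f} points")
print(f"noise vs RHP:          {se_r * 1000:.0f} points")
print(f"noise in differential: {se_diff * 1000:.0f} points")
```

With these invented numbers, the one-season noise in the differential comes out on the order of 40 points of batting average, bigger than the 27-point differential itself, and that's before comparing one season against another.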

In the case of platoon differential, the league-average figure is about 27 points of batting average, roughly .0275. James believes "the randomness is operating on a vastly larger scale than the statistic can accommodate," and that the "luck" is about 10 times the size of the factor being measured (platoon differential). Of course a good researcher can't nail the factor; it's overwhelmed by the noise. Then when you compare platoon differentials from multiple seasons to evaluate persistence, you're adding in more randomness. Rather than neutralizing the transient, you're overwhelming the truth inherent in the numbers.
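A small simulation makes the same point. This is only a sketch under assumptions of my own, not a reconstruction of James' 1988 study: 300 hypothetical hitters, each with a real, fixed platoon differential drawn from a bell curve centered on .0275 with an assumed spread of 10 points, playing two seasons of 150 at-bats with the platoon advantage and 400 without it. Every number and function name here is invented for illustration.

```python
import math
import random
import statistics

def simulate_differentials(n_players=300, mean_diff=0.0275, sd_true=0.010,
                           ab_opp=150, ab_same=400, base_avg=0.265, seed=2005):
    """Simulate two seasons of observed platoon differentials.
    Every player has a REAL, fixed differential (assumed spread: sd_true)."""
    rng = random.Random(seed)
    true_diffs, year1, year2 = [], [], []
    for _ in range(n_players):
        d = rng.gauss(mean_diff, sd_true)   # this player's true differential
        true_diffs.append(d)
        seasons = []
        for _ in range(2):
            # At-bats with the platoon advantage (true average = base + d)
            hits_opp = sum(rng.random() < base_avg + d for _ in range(ab_opp))
            # At-bats without it (true average = base)
            hits_same = sum(rng.random() < base_avg for _ in range(ab_same))
            seasons.append(hits_opp / ab_opp - hits_same / ab_same)
        year1.append(seasons[0])
        year2.append(seasons[1])
    return true_diffs, year1, year2

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

true_diffs, year1, year2 = simulate_differentials()
r = pearson(year1, year2)

print(f"group differential, year 1: {statistics.fmean(year1):.4f}")
print(f"spread of TRUE differentials: {statistics.pstdev(true_diffs):.4f}")
print(f"year-to-year correlation of observed differentials: {r:.3f}")
```

Under these assumptions the group average should land close to the .0275 that was built in, while the year-to-year correlation of individual differentials should be weak enough that a persistence test would struggle to tell it from zero, even though every simulated hitter has a real differential. The 10-point spread is a guess; a wider spread would make the individual effect easier to see and a narrower one harder.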

But that doesn't mean in practice it doesn't happen and that managers shouldn't apply the knowledge that platoon differential exists as a general rule that embodies itself to some (probably varying) degree in each individual batter. If a right-handed hitter, in the general case, has a .0275 (about 1 in 36) better chance of success against a left-handed pitcher than a left-handed hitter would have, a manager would be foolish to ignore the effect as a general case. The .0275 is the difference between a batting average of .276 and .303, not a giant difference, but one worth trying to harvest in key situations.
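For anyone who wants to check the arithmetic, a few lines of Python will do it. The `expected_extra_hits` helper and the 100-at-bat example are mine, added only for illustration.

```python
PLATOON_EDGE = 0.0275  # league-average differential quoted above

def expected_extra_hits(at_bats: int, edge: float = PLATOON_EDGE) -> float:
    """Expected additional hits from having the platoon advantage
    in the given number of at-bats (a simple expected-value estimate)."""
    return at_bats * edge

# The gap between the two batting averages in the text
gap = 0.303 - 0.276

# How many at-bats it takes, on average, to gain one extra hit
at_bats_per_extra_hit = 1 / PLATOON_EDGE

print(f"gap between .303 and .276: {gap:.3f}")
print(f"at-bats per extra hit: {at_bats_per_extra_hit:.1f}")
print(f"extra hits in 100 favorable at-bats: {expected_extra_hits(100):.2f}")
```

One extra hit in every 36 or so favorable at-bats isn't much on any given night, which is why the edge only pays off when a manager harvests it consistently.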

One of the giants who originated sabermetrics is Dick Cramer, a professional data analyst who's been doing it for a long time and had great success both in his vocation, pharmaceutical biochemical analysis, and in baseball. Cramer had impaled himself on the same fallacy with a study of clutch hitting. And James gets to the core delusion:

We ran astray because we have been assuming that random data is proof of nothingness, when in reality random data proves nothing. In essence, starting with Dick Cramer's article, Cramer argued that "I did an analysis which should have identified clutch hitters, if clutch hitting exists. I got random data; therefore, clutch hitters don't exist."

Just because you can't prove something with the numbers doesn't mean it doesn't exist. It doesn't mean you should give up -- that which you're trying to measure could be important (if a team could raise its batting average 27 points in general, it would yield them, all things being equal, another 16 runs a season, generally, about 3 or 4 extra wins, which is nothing to turn one's nose up at). But if you can't measure it well enough to isolate it and use it as a predictor it doesn't mean you should decide it doesn't exist.

Baseball players (as a group) have a platoon differential and the numbers indicate it. Individual players have a platoon differential, though the individuals' numbers aren't able to indicate it.

The binary impulse around metrics (on the one hand, they are everything, and if you can't measure it it ain't real; on the other, you can follow the numbers and still be wrong, so why bother?) leads to more cratered projects and organizations than I have room to list.

Good analysis requires skepticism, but it seems the post-modern trend is away from skepticism and towards cynicism. Cynicism can be very entertaining (I'm much more entertained by listening to Lewis Black or Christopher Hitchens than I am listening to Barack Obama or Ralph Reed). Cynicism may get someone attention in the media or in the classroom or the boardroom, but it doesn't do a great job of furthering useful analysis.

I had a boss once, Swish Nicholson, who was juggling multiple projects. He was highly trained and exceptionally skilled at the content of what his department produced, while completely untrained in management of any stripe. In my spare time I attempted to build a project management foundation for him, simple tools he could use to rationalize the chaotic environment and daily events that made his life (and the life of everyone who worked for him) miserable.

Swish hated the thought of having to manage projects. Whenever I'd track the staff's use of time and then draw up coordinated schedules that would create ways to get more work done in the same amount of calendar time, he would hammer them. "You can't tell me that this item will be delivered in 36 days and with an adequate quality, you're just guessing," he'd say, "and therefore, this is just a pile of lies".

There is, of course, a lot of room between predictive abilities that would let you declare the winner of the 2005 World Series with the winning scores and a pile of lies, enough room to do doughnuts with a lardass Dodge Ram 2500.

I went through the exercise that proved that the team as a whole could make fairly predictable progress based on his decisions and external factors. But because I couldn't predict to the day when some giant piece of work would be finished (too many external factors and "luck") Nicholson saw the data as useless as opposed to a place to start.

In your own workplace, don't be fooled by studies that fail to deliver significant results sensitive enough to build great plans on. Don't be fooled into thinking the factors you studied don't exist at all; keep in mind that what you're examining may be something you haven't been able to isolate from randomness or from other factors that you haven't neutralized. Don't give up asking a question you believe has a valuable answer just because you haven't found that answer yet.

As Bill James says, the absence of proof that a significant relationship exists is not proof that it doesn't exist. Or, as he wrote in this little Freudian story to conclude the BRJ essay:

...A sentry is looking through a fog, trying to see if there is an invading army out there, somewhere through the fog. He looks for a long time, and he can't see any invaders, so he goes and gets a really, really bright light to shine into the fog. Still doesn't see anything.

The sentry returns and reports that there is just no army out there-but the problem is, he has underestimated the density of the fog. It seems, intuitively, that if you shine a bright enough light into the fog, if there was an army out there you'd have to be able to see it--but in fact you can't. That's where we are: we're trying to see if there's an army out there, and we have confident reports that the coast is clear-but we may have underestimated the density of the fog. The randomness of the data is the fog. What I am saying in this article is, that the fog may be many times more dense than we have been allowing for.

Let's look again; let's give the fog a little more credit. Let's not be too sure that we haven't been missing something important.
