Wednesday, September 22, 2004
A Minnesota Twins Knowledge Management Lesson:
Johan Santana Meets Supernatural Carlos Santana
All organizations, baseball or not, get virtuoso performances out of contributors. Successful ones nurture the achiever, but to continue benefiting from the virtuosity, to really make the most out of it, they need to understand how the contributor achieved that performance so they can help her and others recreate those achievements.
That process, analysing the basis for success (and failure) and then cloning it, is the essence of knowledge management (long definition here, with some background here).
A lot of metrics and analysis is flat. Let me use a baseball example based on the American League's best pitcher this year, the Minnesota Twins' Johan Santana. His virtuosity is remarkable, almost like that of his American namesake Carlos Santana, he of the transcendent, always recognizable riffs (Isn't it amazing that even if he's just backing up someone who plays a completely different kind of music, you can always pick him out? I've been trying to master just one of his leads for over two years now, and I'm still trying to figure out a few of the transitions which seem to defy the limits of five fingers and a Fender).
You can get a ton of data on Johan Santana. There's the really simple, flat metric presentation (courtesy of MLB.COM):
         W   L   ERA    G  GS  CG  SHO  SV  SVO     IP    H    R   ER  HR   BB   SO
2004    15   6  3.06   28  28   1    1   0    0  188.0  136   68   64  24   47  213
Career  38  18  3.68  145  69   1    1   1    1  584.1  499  254  239  65  213  611
This tells you what he's done on a seasonal level, like the usual accounting statements non-profits and businesses put together. You can tell he's having a very successful year within a short but successful career. But it isn't actionable. You can't infuse success into the group by telling Johan or his fellow Twins hurlers to please go out and earn an ERA of 3.06 & strike out a lot more guys than they walk.
Better, there's the sophisticated annual historical record (courtesy of Bigleagueplayers):
Last 3 years  Team    G  GS   W   L     IP    H    R   ER  HR   BB    K   ERA  WHIP   BAA
2002           MIN   27  14   8   6  108.1   84   41   36   7   49  137  2.99  1.23  .212
2003           MIN   45  18  12   3  158.1  127   56   54  17   47  169  3.07  1.10  .216
2004           MIN   29  29  16   6  195.0  137   68   64  24   48  224  2.95  0.95  .196
Career              146  70  39  18  591.1  500  254  239  65  214  622  3.64  1.21  .228
This adds some interesting rate (quality) measures at the end: WHIP (walks plus hits allowed per inning; anything under 1.3 is good, and anything under 1.0 is totally superb) and BAA (the composite batting average achieved against him). It's more illuminating and more specific: you now have a glimmering into why he's won so many games. Batters don't hit much against him, he strikes out more than a batter per inning, and his walks must be relatively low because his WHIP is under 1. Again, not actionable in a significant way.
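For anyone who wants to check the rate stats rather than take them on faith, here's a minimal Python sketch of the arithmetic behind the WHIP column. It assumes the usual definition (walks plus hits, divided by innings pitched) and the box-score convention that the digit after the point counts outs, so 108.1 means 108 1/3 innings. The function names are my own, made up for illustration; the inputs are copied from the three-year table above.

```python
def innings_pitched(ip: str) -> float:
    """Convert box-score innings ("108.1") to a real number of innings.

    The digit after the point counts outs, not tenths: "108.1" is 108 1/3.
    """
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3


def whip(hits: int, walks: int, ip: str) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched(ip)


def k_per_9(strikeouts: int, ip: str) -> float:
    """Strikeouts per nine innings (above 9 means more than one per inning)."""
    return 9 * strikeouts / innings_pitched(ip)


# year: (IP, H, BB, K), copied from the three-year table
seasons = {
    2002: ("108.1", 84, 49, 137),
    2003: ("158.1", 127, 47, 169),
    2004: ("195.0", 137, 48, 224),
}

for year, (ip, h, bb, k) in seasons.items():
    print(f"{year}  WHIP {whip(h, bb, ip):.2f}  K/9 {k_per_9(k, ip):.1f}")
```

The WHIP values it prints round to the 1.23, 1.10 and 0.95 in the table, which is a decent sign the table's inputs and its rate columns agree with each other.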
Better yet, there are the really fine "split stats" (here courtesy of Bigleaguers again) that lump performances into various components to see if there are specific strengths and weaknesses in the overall performance.
             G  GS   W   L  SV  CG  SHO     IP    H   R  ER  HR  BB    K   ERA  WHIP   BAA
Total       32  32  19   6   0   1    1  217.0  151  68  64  24  49  254  2.65  0.92  .195

vs. Left    32   0   0   0   0   0    0   53.1   37   -   -   5   8   52     -  0.84  .195
vs. Right   32   0   0   0   0   0    0  163.2  114   -   -  19  41  202     -  0.95  .195

Home        20  20  11   4   0   1    1  137.1   95  42  40  14  31  167  2.62  0.92  .194
Away        12  12   8   2   0   0    0   79.2   56  26  24  10  18   87  2.71  0.93  .196

Day         17  17   9   5   0   0    0  117.1   83  38  36  13  34  138  2.76  1.00  .199
Night       15  15  10   1   0   1    1   99.2   68  30  28  11  15  116  2.53  0.83  .189

Grass       10  10   7   1   0   0    0   66.1   45  19  18   9  16   74  2.44  0.92  .190
Turf        22  22  12   5   0   1    1  150.2  106  49  46  15  33  180  2.75  0.92  .197

Indoors     21  21  12   4   0   1    1  145.1   98  44  42  15  31  174  2.60  0.89  .190
Outdoors    11  11   7   2   0   0    0   71.2   53  24  22   9  18   80  2.76  0.99  .204

April        5   5   1   0   0   0    0   28.1   30  18  17   5   8   24  5.40  1.34  .273
May          6   6   1   3   0   0    0   32.2   42  22  21   6  11   30  5.79  1.62  .313
June         5   5   4   1   0   0    0   37.2   21  10  10   5   6   46  2.39  0.72  .160
July         6   6   3   2   0   1    1   46.0   14   7   6   4  15   61  1.17  0.63  .095
August       6   6   6   0   0   0    0   43.1   29  11  10   4   7   52  2.08  0.83  .188
September    4   4   4   0   0   0    0   29.0   15   0   0   0   2   41  0.00  0.59  .150
I highlighted in Pepto-Bismol pink some junk results in here -- numbers that should not have been presented because they can't tell you anything. There are almost no turf parks left...the Twins' park is one of them, so all this can tell you is how he pitched at home (already presented) along with a few crumbs of how he pitched elsewhere.
But here is some actionable information if you were looking to deconstruct Santana's strengths & weaknesses to try and help him amplify the good and dampen the not-so-good. But hark, his splits are extraordinary. I'm going to use BAA (batting average hitters achieve against him) as the bellwether for this. If you can believe the numbers, BAA by left-handed hitters is .195, and by right-handers .195. He doesn't just crush lefties (many left-handed pitchers can do this), but he is equally transcendent against righties. Okay. ¿What else?
Home and away. Many pitchers learn to take advantage of their home park. But again, he's essentially the same home and on the road, allowing .194 and .196 BAA. This tells you he's either so overwhelming that it doesn't matter where he's pitching or alternately, he's figured out how to use almost any park he's in to his advantage (or both at the same time).
You can also see if you look at the last splits, by month, his season started with a mediocre April, an Epicacal May and consistent excellence since. Was he injured? Did he learn something? Does he do poorly in cold weather? Probably not the latter, since his smallish sample September is a cool weather performance month, and he looks veritably Nazgulish (uh, 41 strikeouts & 2 free passes in 29 innings). But this is a data set that leads us to better understanding of the components of the success, as well as a set of questions with which to follow-up.
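One way to put a number on that break between the first two months and the rest is to pool the monthly rows and recompute ERA for each stretch. This is just a sketch under the standard definition (earned runs per nine innings); the helper names are hypothetical, and the innings and earned runs are copied from the monthly split rows above. It counts outs as whole numbers so the thirds of an inning add up exactly.

```python
def outs_recorded(ip: str) -> int:
    """Box-score innings ("28.1" = 28 innings plus 1 out) as a count of outs."""
    whole, _, extra = ip.partition(".")
    return 3 * int(whole) + int(extra or 0)


def era(earned_runs: int, outs: int) -> float:
    """Earned runs per nine innings; nine innings is 27 outs."""
    return 27 * earned_runs / outs


# (month, IP, ER) copied from the monthly split rows
monthly = [
    ("April", "28.1", 17),
    ("May", "32.2", 21),
    ("June", "37.2", 10),
    ("July", "46.0", 6),
    ("August", "43.1", 10),
    ("September", "29.0", 0),
]


def era_over(months: set) -> float:
    """Pooled ERA across the named months (not an average of monthly ERAs)."""
    rows = [row for row in monthly if row[0] in months]
    outs = sum(outs_recorded(ip) for _, ip, _ in rows)
    earned = sum(er for _, _, er in rows)
    return era(earned, outs)


early = era_over({"April", "May"})
late = era_over({"June", "July", "August", "September"})
print(f"April-May ERA:      {early:.2f}")
print(f"June-September ERA: {late:.2f}")
```

On these rows the pooled figures come out to roughly 5.61 for April-May and 1.50 from June on. It doesn't answer any of the "why" questions, but it does show how sharp the turn was.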
But again, this is more about Santana's excellence than what he does to be excellent. How does he achieve that kind of performance, and more importantly, what is he doing that others might emulate?
SELF-PROCLAIMED SPORTS DORK SETH STOHS GETS US TO ACTIONABLE
To find that out, you have to break it down to its unit level. If you're an analyst, you need to examine what he does in a game, inning-by-inning. You probably can go down as far as individual pitches and no farther. In non-baseball work, you'll have to pick your own level, but start small and get your hands dirty in data instead of just looking at bigger pictures, because you might see patterns in individual-event data you might otherwise miss.
I found Seth Stohs' website thanks to Aaron's Baseball Blog. Stohs has masterfully analysed Johan Santana's most recent pitching performance, and although he's blended in some gushy star-eyed fannish superlatives, I'll snip those puppies out in the interest of the stodgy, academic flatness of prose for which I'm known. He's analysed it by looking at the game the way a competitor would, pitch by pitch. He answers the questions:
- What kinds of pitches did Santana throw?
- At what counts did he throw which pitches?
- What effect did he get from each?
Here's some Seth showing how really good analysts present data and information both (rasty formatting inherited...I'll try to clean some of it up, but be warned, Seth's intelligence is a lot higher than his HTML skill) with my comments in bold:
One of the things that people define "Ace" with is a guy who, when the team needs a win, gets the win. Sunday, the Twins didn't really need the win, but obviously the team would prefer to go into Chicago on a positive note. I can't imagine a more positive note... Yesterday was as dominant a pitching performance as I have seen. Here are the basic numbers:
                IP   H   R  ER  BB  SO
Johan Santana  8.0   7   0   0   0  14
Incredible. Impressive. Amazing... Enough superlatives? The 14 strikeouts was his career high. He completely shut down and baffled a hot Orioles team. His scoreless inning streak increased to 30 (snip). It was his fourth straight start in which he didn't allow a run. It was his 12th consecutive start where he got a win. (snip)
Let's dive into Johan's pitching performance yesterday. (snip) I charted each of his 103 pitches and noted the type of pitch, whether it was a ball or strike and what the speed of the pitch was. Again, the speed of the pitch comes from what showed on Fox Sports Net. Here are just some interesting things to note from the game.
Of the 103 pitches that Santana threw, 80 of them (78%) were strikes. 67% is generally considered good. I don't know if I've seen this impressive a percentage before. [he looks at a basic indicator and then puts it into context for others]
Here is a breakdown of the type of pitch that Santana threw, (snip).
Fastball ...........57 (55.3%)
Change Up ..........21 (20.4%)
Curveball/Slider ...25 (24.3%)
[Note, he lumped two smaller categories together, Curve and slider, either because he couldn't tell the difference, or because alone either was too small to take into consideration.]
Here are the number of pitches he threw each inning and the type of pitch:
1st inning - 13 pitches (7 fastball, 3 curveball, 3 changeup)
2nd inning - 13 pitches (7 fastball, 4 curveball, 2 changeup)
3rd inning - 14 pitches (8 fastball, 2 curveball, 4 changeup)
4th inning - 12 pitches (5 fastball, 4 curveball, 3 changeup)
5th inning - 15 pitches (10 fastball, 2 curveball, 3 changeup)
6th inning - 11 pitches (6 fastball, 2 curveball, 3 changeup)
7th inning - 15 pitches (9 fastball, 4 curveball, 2 changeup)
8th inning - 10 pitches (5 fastball, 4 curveball, 1 changeup)
Total ......103 pitches (57 fastball, 25 curveball, 21 changeup)
[Good breakdown here. The average # of pitches it takes a pitcher to get through an inning is around 15-16, and as you look through this table, you can see the inning he labored the most was 15 throws. You can also see he kept mixing his pitches up. More detail needed, but Seth will give that to us; this is a good foundation.]
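If you chart a game yourself, it's worth cross-checking the tally before drawing conclusions from it. Here's a small Python sketch of that check, using the per-inning counts from the table above; the variable names are mine, and the (fastball, curveball, changeup) ordering is just how I chose to lay out the rows.

```python
# One row per inning, 1st through 8th: (fastball, curveball, changeup)
by_inning = [
    (7, 3, 3),
    (7, 4, 2),
    (8, 2, 4),
    (5, 4, 3),
    (10, 2, 3),
    (6, 2, 3),
    (9, 4, 2),
    (5, 4, 1),
]

# Pitches thrown in each inning (row sums)
per_inning = [sum(row) for row in by_inning]

# Pitches of each type over the whole game (column sums)
totals = [sum(column) for column in zip(*by_inning)]

print("per inning:", per_inning)
print("by type (FB, CB, CU):", totals)
print("game total:", sum(per_inning))
print("heaviest inning:", max(per_inning), "pitches")
print("average per inning:", round(sum(per_inning) / len(per_inning), 1))
```

The row and column sums agree with the table's Total line (103 pitches; 57, 25 and 21 by type), and no inning goes past 15 pitches.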
It was interesting to me that Santana seemed to be stronger as the game went on. [Good analysts insert their opinions, presented as opinions and not as capital T Truth, as well as just crunching numbers, as Seth does here] His best, most dominant inning may have been when he struck out the side on 10 pitches in the 8th inning. [Good analysts point out highlights] Santana was consistent with his fastball throughout, but check out the velocity of the pitches he threw by inning (Note - please recall that I did not differentiate between a curveball and a slider. I think Santana threw more sliders late in the game):
Inning       Fastball  Curveball  Changeup
1st inning     91.7      81.3       77.3
2nd inning     90.6      78.5       77.0
3rd inning     91.6      80.0       79.3
4th inning     92.4      82.8       76.7
5th inning     92.6      82.0       77.3
6th inning     92.5      85.0       77.7
7th inning     92.9      80.5       83.5
8th inning     93.0      84.3       79.0
[Seth doesn't give you conclusions on this, but it's presented so well that other analysts, in this case, Yours Turley, can provide some insight. Santana's fastball and changeup were both faster at the end of the game than at the beginning. He was throwing easy, not his hardest, through the early going. As the game wore on, he either started tiring and throwing harder to compensate, or intentionally sped things up to put the hitters' timing off.]
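To make the early-versus-late comparison a little less eyeball-dependent, here's a rough Python sketch that averages the first three and last three innings for each pitch type. The readings are the per-inning figures from the velocity table above; the three-inning window and the function name are my own arbitrary choices, and the averages are not weighted by how many of each pitch he threw in an inning.

```python
# Per-inning velocity readings, 1st through 8th inning, from the table
velocity = {
    "fastball": [91.7, 90.6, 91.6, 92.4, 92.6, 92.5, 92.9, 93.0],
    "curve/slider": [81.3, 78.5, 80.0, 82.8, 82.0, 85.0, 80.5, 84.3],
    "changeup": [77.3, 77.0, 79.3, 76.7, 77.3, 77.7, 83.5, 79.0],
}


def late_minus_early(readings: list, window: int = 3) -> float:
    """Mean of the last `window` innings minus mean of the first `window`."""
    early = sum(readings[:window]) / window
    late = sum(readings[-window:]) / window
    return late - early


for pitch, readings in velocity.items():
    print(f"{pitch:13s} {late_minus_early(readings):+.1f}")
```

All three differences come out positive, so the late innings were the faster ones across the board, not only for the fastball.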
Did Santana alter the pitches he threw each time through the batting order? The O's had four hits the first time through the order. They had just one the second time through and two hits the third time through the order. Santana struck out the top two hitters in the Orioles lineup, Brian Roberts and Tim Raines, in their 4th plate appearances. [Seth knows but isn't telling, because he believes most readers will already know, that hitters tend to perform better against pitchers the 3rd and 4th times they face them in a game: the batter has already seen what the pitcher has to throw today and can better guess which pitch is coming as well as time it better. If a pitcher Ks a batter in a 4th appearance, the hurler either has so much mojo that day, or is so unpredictable, that the hitter is helpless in the face of that quality.]
Time Through Order   FB    FB%   CB    CB%   CU    CU%   Total
1st                  19  57.6%    8  24.2%    6  18.2%      33
2nd                  18  52.9%    7  20.6%    9  26.5%      34
3rd                  16  55.2%    8  27.6%    5  17.2%      29
4th (2 batters)       4  57.1%    2  28.6%    1  14.3%       7
So what does this show? (snip) Putting all of this together, it really just verifies the fact that Santana is willing to throw any pitch at any time. He will throw the fastball a little more than half the time, and the rest of the time, he picks between the curveball and changeup, and all three pitches are incredible.
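The percentages in a table like this are easy to recompute from the raw counts, which is a habit worth having with any hand-built chart. A minimal Python sketch, using the pitch counts per trip through the order from the table above (the function name is hypothetical):

```python
# Pitches per trip through the order: (fastball, curve/slider, changeup)
passes = {
    "1st": (19, 8, 6),
    "2nd": (18, 7, 9),
    "3rd": (16, 8, 5),
    "4th": (4, 2, 1),
}


def pitch_shares(counts: tuple) -> list:
    """Each pitch type as a percentage of that row's own total."""
    total = sum(counts)
    return [round(100 * n / total, 1) for n in counts]


for label, counts in passes.items():
    fb, cb, cu = pitch_shares(counts)
    print(f"{label}: FB {fb}%  CB {cb}%  CU {cu}%  (of {sum(counts)})")
```

The key detail is that each row is divided by its own total, not by a neighbor's. Recomputed this way, the fastball share stays between about 53% and 58% on every trip through the order, which is the steadiness Seth is describing.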
Santana was so dominant yesterday. He only had one 3 ball count. Actually, he had just three 2 ball counts the whole game. Just again to illustrate how unpredictable Santana is, take a look at the pitches he threw on each count: [FB = fastball, CB = curve or slider, CU = change up]
Count  FB  CB  CU
0-0    17  10   2
0-1    12   5   2
0-2     7   2   6
1-0     4   1   2
1-1     8   2   1
1-2     8   2   7
2-1     1   0   0
2-2     0   2   0
3-2     0   1   0
It is interesting to me to see that Santana does seem to throw more breaking balls with two strikes. Of the 14 strikeouts, seven came with the changeup, four with a curveball and three with the fastball. So, what is his strikeout pitch? Any of the three. [Great stuff. Conclusion supported by data. Note, on the one 3-2 count he got to all game, Santana had the unmitigated gall to throw a curve...an outside-the-Bachs move if there ever was one] [Seth didn't tell you one of the most important things, which perhaps he overlooked, but again, his presentation was so thorough, no plaque, no junk, that it jumps right out for other analysts. I'll get to this in my next paragraph]
Stohs doesn't point out that there are missing counts here, most importantly 2-0 and 3-1 (3-0 never shows up either). That's really important. Because those are THE two Red Meat hitter's counts, the counts on which batters have statistically the best chances for success (intermediate and beginning fans, if your team has runners on base and the count gets to either 2-0 or 3-1, this is a time to start rhythmic clapping, even if the P.A. system operator doesn't know to put on the claptrack). Hitters try to work a pitcher to get to a 2-0 or 3-1 count because it means the pitcher is overwhelmingly likely to throw a fastball (for most hitters, the easiest, relatively, to hit).
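Spotting what is absent from a table is the kind of thing a few lines of code do more reliably than eyeballs. Here's a hedged Python sketch that lists every ball-strike count a plate appearance can pass through and reports the ones Seth's table never reaches; it also compares how often Santana threw the fastball with two strikes versus earlier in the count. The data is copied from the count table above, and the names are my own inventions.

```python
from itertools import product

# count -> (fastball, curve/slider, changeup), from the count table
pitches_by_count = {
    "0-0": (17, 10, 2),
    "0-1": (12, 5, 2),
    "0-2": (7, 2, 6),
    "1-0": (4, 1, 2),
    "1-1": (8, 2, 1),
    "1-2": (8, 2, 7),
    "2-1": (1, 0, 0),
    "2-2": (0, 2, 0),
    "3-2": (0, 1, 0),
}

# Every count a pitch can be thrown on: 0-3 balls, 0-2 strikes
all_counts = [f"{balls}-{strikes}" for balls, strikes in product(range(4), range(3))]
never_reached = [c for c in all_counts if c not in pitches_by_count]
print("counts never reached:", never_reached)


def fastball_share(counts: list) -> float:
    """Fastballs as a fraction of all pitches thrown on the given counts."""
    fastballs = sum(pitches_by_count[c][0] for c in counts)
    total = sum(sum(pitches_by_count[c]) for c in counts)
    return fastballs / total


two_strike = [c for c in pitches_by_count if c.endswith("-2")]
earlier = [c for c in pitches_by_count if not c.endswith("-2")]
print(f"fastball share with two strikes: {fastball_share(two_strike):.1%}")
print(f"fastball share otherwise:        {fastball_share(earlier):.1%}")
```

The missing counts it reports are 2-0, 3-0 and 3-1, and the fastball share drops noticeably once he has two strikes on a hitter, which lines up with Seth's observation about what Santana throws when he's ahead.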
BEYOND BASEBALL
This is a classic model of doing analysis and presenting the data behind it. Regardless of what line of work you're in, if you want to communicate both to dedicated peer analysts and more raw outside observers, slipstreaming Stohs' style here is of great value. Some of the key elements:
- Present data, starting with the big picture
- Drill down and present what's significant
- Provide conclusions clearly labeled as such
This is actionable. This is the kind of detail that makes knowledge management possible, and that, in turn, makes future success, both for the individual contributor and the team as a whole more likely.
In the next entry, I'll show you a recent illustration of the Anti-Stohs, a big ugly failure of metrics, and how to avoid such fact-plaque.