In Defense Of Human Error

David Cameron is a good baseball analyst (or I wouldn't bother critiquing him), but I think he's dead wrong about this.

There always seems to be a crowd that is skeptical that anything could be quantified, just because the possibility for error exists. With defensive stats, the question always comes up of whether one man's line drive is another man's fliner, or whether someone collecting data will record a ball as hit deep in the infield rather than in the shallow outfield. And I'm quite certain that people collecting the data make mistakes from time to time. The question that no one bothers to answer is how often those people make mistakes and how much those mistakes matter. There's a good reason no one answers these questions--we don't get to watch the play-by-play scorers from Baseball Info Solutions (or whichever your favorite service is) in action, so we can't really say one way or another how well they are doing.

My major beef, though, is with the idea that no human error is involved in the collection of offensive statistics.

When we talk about something like on base percentage, it is a statistic based on indisputable factual results - Player X reached base Y times in Z plate appearances. There’s no gray area - it happened, it was recorded, and no one disagrees.

Emphasis mine. Anyone who has ever watched a baseball game (including David Cameron) knows that this is incorrect, whether or not they realize it. Nearly every game, an umpire makes a call that someone doesn't like. Hell, umpires made enough bad calls on HR/not HR that we've decided to institute a replay system whose very existence proves that people doubt the supposedly "indisputable" nature of offensive statistics. In the case of nationally televised games, not only do some people disagree with the recorded result, but sometimes millions of people disagree with the recorded outcome.

And if we talk about hits, which every offensive valuation system known to man includes, we have to start worrying about whether or not a scorekeeper decided that a batted ball should be a hit or an error. Even just limiting the jury to sports broadcasters, it is clear that not everyone agrees with the decisions that scorekeepers make.

The only reason that anyone actually believes that these are the "incontrovertible facts" is that we've all agreed ahead of time who gets to make the decisions. This is tantamount to saying that if MLB were to choose Baseball Info Solutions as its official batted ball judgment team that suddenly defensive statistics would become "incontrovertible facts."

So, there is human error in collecting the data behind both defensive and offensive stats, but guess what? THAT'S OKAY. That some human error exists in collecting the data does not inherently make it the most important source of variability in the statistics that we collect.

There are a few reasons that I am really comfortable in saying that the variability in defensive statistics is not mainly due to human error in the data collection:

1. Yes, some defensive stats disagree on which defenders are good and which are bad, but these differences exist even for stats that use the same exact set of data, whether it is from BIS or whoever. My intuition is that because none of these stats are really open source, and some are highly proprietary, we're not really advancing towards a consensus on how to value the various bits of information that we are given. (From a personal standpoint, I kind of get this--it's much more fun to tinker around with the formulas to improve them than it is to sit down, spell everything out (anyone who has had to write documentation knows that it is not fun), cut through all of the erroneous criticisms to get to the real criticisms, and make the hard choices on where you were wrong and should incorporate someone else's viewpoint.)

2. Not all of the all-inclusive offensive stats give us the same picture of how valuable a hitter is. If our offensive stats, supposedly built on infallible data, can disagree, it seems as though we are holding defensive stats to an unfair standard.

3. Sample size.

4. Sample size.

5. Sample size.

6. Sample size.

7. Sample size.

8. Sample size.

9. Sample size.

10. Sample size.

Let me expound a bit, using Justin Morneau, everyone's favorite Kent Hrbek clone. RZR is one of the most straightforward defensive stats that I consider to be reasonable (though it's not really as good as +/- or UZR.) If you track each kind of batted ball for a season, you can figure out which types of batted balls are fielded by a particular fielder over 50% of the time. RZR then defines that as the fielder's zone. RZR is the number of successful plays made (Plays) on balls in the fielder's zone divided by the total number of balls in that zone (BIZ). After that, the Hardball Times also reports plays made out of the defender's zone (OOZ). For Morneau over the last five years, we have:

Year BIZ Plays RZR OOZ
2004 60 46 .767 20
2005 132 120 .909 42
2006 128 98 .766 71
2007 223 171 .767 18
2008 183 128 .699 22

The average RZR in 2008 for all first basemen was .739. Overall, Morneau looks pretty good compared to that average, coming in at a five-year average of .775 (563 plays on 726 balls in zone) and beating the average in all but last year. Three of the five RZRs are actually remarkably similar to one another.

But what I want to focus on here is his total opportunities. Over five seasons, where he compiled over 2,500 cumulative at-bats, Morneau had a mere 726 balls in his zone--726 plays where an average first baseman had a better-than-even shot at making the play. That's only 145 chances per season.
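For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes RZR from the BIZ and Plays columns of the Morneau table. The function and variable names are my own labels, not anything published by the Hardball Times.

```python
# Morneau's first-base zone data, copied from the table above: year -> (BIZ, Plays)
MORNEAU = {
    2004: (60, 46),
    2005: (132, 120),
    2006: (128, 98),
    2007: (223, 171),
    2008: (183, 128),
}


def rzr(biz, plays):
    """RZR = plays made on balls in zone / balls in zone."""
    return plays / biz


def pooled_rzr(seasons):
    """Multi-season RZR: total plays divided by total balls in zone."""
    total_biz = sum(biz for biz, _ in seasons.values())
    total_plays = sum(plays for _, plays in seasons.values())
    return total_plays / total_biz


for year, (biz, plays) in MORNEAU.items():
    print(year, f"{rzr(biz, plays):.3f}")

total_biz = sum(biz for biz, _ in MORNEAU.values())
print("pooled RZR:", f"{pooled_rzr(MORNEAU):.3f}")
print("BIZ per season:", round(total_biz / len(MORNEAU)))
```

Pooling the plays and chances weights each season by its number of chances, which is why the multi-season figure is not simply the mean of the five yearly RZRs.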

Now let's look at Morneau's hits and extra-base hits from 2007-8 if we divide his at-bats into chunks close to the 60/132/128/223/183 breakdown of his BIZ totals. (Span is the span of games I chose. 463-496, for instance, means Morneau's 463rd through 496th major league career games played.)

AB H AVG XBH Span
60 18 .300 7 (413-428)
134 37 .276 18 (439-462)
130 41 .315 17 (463-496)
222 56 .252 22 (497-555)
181 48 .265 12 (556-606)

Now here we have Morneau's average split up into sample sizes comparable to his defensive data, and all of a sudden his incontrovertible batting average is jumping all over the place. Is he an average hitter or a superstar? Apparently batting average--really at the heart of all offensive stats out there--is a completely useless statistic that has been completely soiled by the human error of umpires misjudging safe/out calls.
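To put a rough number on how far a rate can wander by chance alone, one simple approach is to treat every chance as an independent trial with the same success probability and compute the binomial standard error. That is a simplifying assumption of mine (real chances vary in difficulty), and the "true" rates below are round numbers picked for illustration, so take this as a sketch rather than a proper model.

```python
import math


def binomial_se(rate, n):
    """Standard error of an observed rate over n independent trials."""
    return math.sqrt(rate * (1.0 - rate) / n)


def approx_95_interval(rate, n):
    """Normal-approximation 95% interval: rate +/- 1.96 standard errors."""
    margin = 1.96 * binomial_se(rate, n)
    return rate - margin, rate + margin


# Sample sizes are Morneau's yearly BIZ totals from the table above.
# The assumed true rates (.775 RZR, .275 AVG) are illustrative, not estimates.
for label, true_rate in (("RZR", 0.775), ("AVG", 0.275)):
    for n in (60, 132, 128, 223, 183):
        low, high = approx_95_interval(true_rate, n)
        print(f"{label} n={n:3d}: {low:.3f} to {high:.3f}")
```

Because the standard error shrinks with the square root of the sample size, the 60-chance interval comes out nearly twice as wide as the 223-chance one, for the fielding rate and the batting average alike.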

Certainly I can find positions with larger sample sizes. [Okay, it looks like first base is the lowest, which I should have suspected because that's where everyone puts their worst defender, the Cardinals notwithstanding.] Center fielders and middle infielders tend to get more opportunities than anyone else (which is at the heart of why they are the most important positions, but says nothing in and of itself about how difficult they are to play), but even then, the effective sample sizes are pretty small.

If we look at MLB as a whole at each position, here is the number of BIZ per 140 games played (with a game estimated as 8.5 innings).

BIZ/140G -- Position
352 -- 2B
350 -- SS
292 -- 3B
288 -- CF
241 -- RF
228 -- LF
180 -- 1B
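The normalization behind that table can be sketched as follows, assuming the scaling is done by innings played (a guess at the method, not a description of how any data provider computes it). The example inputs are made-up placeholders, not real league totals.

```python
INNINGS_PER_GAME = 8.5  # the estimate used in the text
GAMES = 140


def biz_per_140_games(total_biz, total_innings):
    """Scale a BIZ count to a 140-game workload at 8.5 innings per game."""
    return total_biz * (GAMES * INNINGS_PER_GAME) / total_innings


# Hypothetical inputs for illustration only: 250 BIZ over 1000 innings.
print(biz_per_140_games(250, 1000.0))
```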

I nearly went into a huge RZR tangent, but I'm here to talk sample size. At best, we're looking at about 352 data points*, and some positions are clearly going to be more problematic than others. For fairness, I'll re-run the Morneau analysis, but with Torii Hunter.

*My tangent would have involved talking about how at some positions there are more "gimme" plays than there are at others, so not every BIZ is equally useful to us.

Year BIZ Plays RZR OOZ
2004 287 236 .822 65
2005 222 185 .833 33
2006 330 295 .894 48
2007 384 342 .891 47
2008 289 257 .889 93

Note that the average RZR for center fielders in 2008 was .922, and they had on average 80 OOZ plays per 140 games (with a game estimated as 8.5 innings). Now if we take approximately the first 287/222/330/384/289 ABs from each of those seasons, we get:

AB H AVG XBH Span
288 78 .271 34 (692-768)
220 61 .277 26 (830-886)
329 90 .274 28 (928-1016)
382 110 .288 50 (1075-1178)
287 81 .282 32 (1235-1309)

Torii's actually pretty consistent with these endpoints, basically as consistent as his season totals. Then again, his RZR is fairly consistent, too, if you accept that something changed between '05 and '06. I'd like to use his injury history to explain that away, but '06 was the season where he probably played the most games while clearly hobbled with an injury. (Though in general, I think his play in the field suffered from him playing through injuries.)

At any rate, in the very best case scenario, you're looking at about half the sample size for defense that you get on offense, and on top of that, I think there are a lot of plays out there (especially in the outfield) that don't inform us as much as a typical at-bat. We can sit back and blame STATS or BIS all day, but ultimately I see no reason to blame them for the variability--we just have less data to work with and we need to figure out how to work under that limitation.
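One way to see the size of that limitation is to ask how many chances it would take to pin a rate down to a given margin. The sketch below uses the textbook normal-approximation formula n = z^2 * p * (1 - p) / m^2, under my simplifying assumption that chances are independent and equally difficult. The .02 margin and the example rates are arbitrary illustrations; the per-season workloads are the BIZ/140G figures for first base and center field from the position table.

```python
import math


def chances_needed(rate, margin, z=1.96):
    """Trials needed so that a ~95% interval is roughly rate +/- margin."""
    return math.ceil(z * z * rate * (1.0 - rate) / (margin * margin))


def seasons_needed(rate, margin, chances_per_season):
    """Convert the required number of chances into seasons of playing time."""
    return chances_needed(rate, margin) / chances_per_season


# Illustrative rates and margin; workloads from the BIZ/140G table.
print("chances for +/-.02 at a .74 rate:", chances_needed(0.74, 0.02))
print("seasons at first base (180 BIZ):", round(seasons_needed(0.74, 0.02, 180), 1))
print("seasons in center field (288 BIZ):", round(seasons_needed(0.90, 0.02, 288), 1))
```

Under these assumptions a first baseman needs many seasons of chances before his zone rating is known to within a couple of points, which is the sample-size problem in a nutshell.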

Last but not least--there is still time to let Casey Blake walk, Bill Smith. Back out while you can! I know it stings when you let one get away, but remember the sage words of The Hold Steady: There's always other boys, and you can make them like you!

8 comments to In Defense Of Human Error

  • Granted I don't spend a lot of time thinking about defensive metrics and their limitations, but the small sample size issue hadn't occurred to me.

    Thanks, ubes, good job laying it all out and giving me something relevant to consider when I do think about these things.

    Unrelated note:

    They are pouring through defensive studies and seeing that below-average defenders like Ramirez and Burrell in the field depreciate their offensive numbers because of what they give up.

    Pouring - (verb) to make a liquid or granular solid flow from a container.

    Poring - (verb) to read or study with steady attention; to meditate or ponder intently.

    That is one of those things that gets me riled up every time.

  • I'll ditto Big Mak. Very nicely done.

    The sample size thing is, I guess, obvious once somebody goes to the trouble to make the case (which is often the situation with Good Ideas). Many PA (~30 pct in the AL, adding in HBP) end in True Outcomes, which cuts down the potential chances for fielders.

  • Offensive stats are so incontrovertible that official scorers will even change them at a later date. So that they are even more incontrovertible.

  • So, basically a player at a "premium" defensive position will have somewhere around twice as many PAs as balls in his zone during the year. A player at a non-premium defensive position will have 4 or 5 times as many PAs as BIZ. It would seem to me that sacrificing offense for defense is a really stupid idea. Especially when you consider how many routine outs there are in a game, plays that will be made 99 percent of the time no matter who is out there (assuming positioning isn't totally out of whack).

    So, maybe the angst over Harris playing SS was/is overblown. And the applause for signing Everett was way overblown.

    Also, I'm re-thinking my excitement of watching Gomez and Span playing CF and LF. I'm thinking it would be a much better idea to start the season with Gomez in AAA and Denard in CF with Delmon in LF and Cuddyer in RF. Not pretty defense, but certainly the most ideal offensive alignment.

    • so, for a premium defensive position, 200 PA for every 100 BIZ. Assuming an OBP of .350 and a defensive failure rate of .10 (errors plus balls not fielded that "should" have been), that's about 70 safeties for every 10 BIZ missed outs.

      for a non-premium defensive position, call it 450 PA for every 100 BIZ. Again assuming an OBP of .350 and failure rate of .10, that's about 158 safeties for every 10 missed outs.

      Hence, comparing a good-glove, no-stick SS (say, a .300 OBP and a .05 failure rate) to its opposite (say, a .400 OBP and a .15 failure rate), the tradeoff over a 600-PA season would be in the ballpark of 180 safeties to 15 defensive failures vs. 240 safeties to 45 defensive failures. Does that sound about right?

    • A player at a non-premium defensive position will have 4 or 5 times as many PAs as BIZ. It would seem to me that sacrificing offense for defense is a really stupid idea.

      It's not nearly that black and white. I would rephrase to say that elite bats have an impact so great that even an elite defender can't have that sort of value. Look at Albert Pujols, for instance. He had a 98.7 VORP this last season--far and away more than anyone else in baseball. With Pujols, obviously you don't have to sacrifice defense, because he's a good defender, but even if he were a -20-run defender (which is basically as bad as you can be relative to your position before they move you to a different position), he could still have an MVP case. Conversely, no one with a merely average bat can possibly be so good on defense that he would be worth 100 runs over replacement level.

      All of those PAs give players more opportunity to distinguish themselves from other players, but not that many players really distinguish themselves that much. There were about 192 players between 5 and 30 VORP last year. For all of those players, defense matters, because the difference between an above-average defender and a below-average defender can easily be 10-20 runs.

      The correct way to look at things is to convert everything to runs and then see whether or not the offense outweighs the defense or vice versa. Hard and fast rules like "don't sacrifice offense for defense" are not going to lead to good conclusions. If anything, the Rays this last year showed us how much difference correctly aligning your defense can make, and it was a huge difference.

      Especially when you consider how many routine outs there are in a game, plays that will be made 99 percent of the time no matter who is out there (assuming positioning isn't totally out of whack).

      Be careful to avoid this trap. Just as there is a practical lower bound for the number of plays that a defensive player will make because he has some easy plays to make, there is a practical lower bound for how well a player selected for the major leagues will hit. Surely a good portion of Mario Mendoza's .215 batting average was built on meatballs that practically anyone would have hit. We need to focus on how players distinguish themselves from each other--at basically every position there is a 10-20-run swing between being a good defender and a poor defender, and at the extremes you get 30-40-run differences between really good and really bad defenders at a given position.

      So, maybe the angst over Harris playing SS was/is overblown. And the applause for signing Everett was way overblown.

      I do not think that this is the case at all. For players with non-elite bats, their defense very much matters. Everett in one of his seasons from say 2003-2007 is about 40 runs better than Harris in a typical Harris season. That's huge. (And at least personally, the Everett signing was good in large part because we're not paying Everett anymore, which is more than we can say for Lamb. Everett's contract was ridiculously low-risk and there's almost no way that a contract like that can really hurt your team.)

      Also, the difference between Everett and Harris is about as big of a defensive value gap at any one position as you can find. Elite defenders are usually about +20 runs relative to their position and terrible defenders are usually about -20 runs relative to their position. 40 runs is a really big deal--only 40 players in all of baseball managed a VORP of higher than 40 last year. Now, if you want to talk about the impact of Punto versus, say, Tolbert at shortstop, you're probably down to a 10-15-run difference, which is a lot less of a big deal. Everett and Harris are really extreme cases. Everett grades out as essentially the most valuable defender in the era of keeping detailed defensive stats and Harris grades out essentially as badly at shortstop as Manny is in LF.

      Also, I'm re-thinking my excitement of watching Gomez and Span playing CF and LF. I'm thinking it would be a much better idea to start the season with Gomez in AAA and Denard in CF with Delmon in LF and Cuddyer in RF. Not pretty defense, but certainly the most ideal offensive alignment.

      None of these guys is such a good hitter that we can ignore their defensive value. Over the course of a full season, Delmon is conservatively a -10-run left fielder.* Span is conservatively a +10-run corner outfielder and an average CF. Gomez is conservatively a +10-run center fielder.** So a Delmon-Span alignment is collectively about -10 and a Span-Gomez alignment is collectively about +20, a 30-run gap on defense. Is Delmon 30 runs better than Gomez with the bat? He wasn't last year--StatCorner has Gomez as -20 with the bat and Delmon as -3 with the bat, a difference of about 17 runs.

      (*Delmon is possibly worse than this. PMR has him as the 2nd-worst LF in baseball, and John Dewan's +/- had him as the worst left fielder in baseball at -25 plays, and outfield plays are more valuable than infield plays (tend to go more often for XBH), so that's probably more like -15 runs.)

      (**Dewan has Gomez at +32 plays for 2008, which is probably around +20 runs.)

      Obviously, Delmon could improve at the plate next year, but so could Gomez. Also, if you are more confident that Gomez is very good in CF and Delmon is very bad in LF, you could come up with a difference on defense of about 35 runs, and I would be very, very surprised if Delmon was 35 runs better than Gomez at the plate next year. Ultimately, though, you can platoon all of the outfielders collectively, both offensively and defensively (much like I suggested platooning Punto and Harris last year) to take advantage of their various strengths and weaknesses, and I think that would be the best thing for the Twins.
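The alignment comparison in the reply above comes down to adding up run estimates. Here is a minimal Python sketch of that bookkeeping, using the conservative defensive figures and the StatCorner batting figures quoted in the reply. The dictionary layout and function name are my own, and Cuddyer in right field is left out because he is the same in both alignments.

```python
# Run estimates quoted in the reply (runs relative to average at the position).
DEFENSE = {
    ("Delmon", "LF"): -10,
    ("Span", "LF"): 10,
    ("Span", "CF"): 0,
    ("Gomez", "CF"): 10,
}
BATTING = {"Delmon": -3, "Gomez": -20}  # StatCorner figures from the reply


def alignment_runs(assignments, bats):
    """Return (defensive runs, batting runs, total) for one alignment."""
    defense = sum(DEFENSE[pair] for pair in assignments)
    offense = sum(BATTING[name] for name in bats)
    return defense, offense, defense + offense


# Span bats in both alignments, so only Delmon's or Gomez's bat is counted.
delmon_span = alignment_runs([("Delmon", "LF"), ("Span", "CF")], ["Delmon"])
span_gomez = alignment_runs([("Span", "LF"), ("Gomez", "CF")], ["Gomez"])
print("Delmon LF / Span CF:", delmon_span)
print("Span LF / Gomez CF:", span_gomez)
```

Swapping in the less conservative +/- based estimates from the footnotes only requires changing the values in the two dictionaries.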