Just how inconsistent is your offense?

September 5th, 2007 by ubelmann

Since the very beginnings of baseball, fans have complained about their offense being too inconsistent. Well, I don't know that for certain, but at the end of the post, I'll give you an example of a team that I think will convince you.

If someone says an offense is good or bad, it's easy to point at how many runs they've scored as a measure of their quality. However, when someone claims an offense is inconsistent, there aren't any readily available stats to address that question.

Ideally, our measuring stick for consistency would be really simple. The simplest measure I can think of is the standard deviation of runs scored. If you've ever taken (or at least passed) a stats and probability class, you probably know about this. Those interested in the details of calculating the standard deviation should look elsewhere, but it's fairly easy to understand conceptually.

Say we have three hypothetical teams that each score 5 runs per game. The Steady Stans score exactly 5 runs in every game. Since every observation is exactly the same as the mean, their standard deviation is zero. The Jekyll/Hyde Psychopaths score 0 runs in half of their games and 10 runs in half of their games. Their standard deviation is five. The Average Joes score anywhere from 0 to 10 runs in each game, and thus have a standard deviation somewhere between zero and five.
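For anyone who wants to check those three numbers, here is a minimal sketch using Python's standard library. The game counts are made up, and the Average Joes are assumed to score each total from 0 to 10 equally often, which is only one of many ways to fit the description.

```python
from statistics import mean, pstdev

# Hypothetical run totals; only the shape of each distribution matters.
steady_stans = [5] * 10          # exactly 5 runs every game
jekyll_hyde = [0, 10] * 5        # 0 runs half the time, 10 runs the other half
average_joes = list(range(11))   # assumption: 0 through 10, each equally often

for name, runs in [("Steady Stans", steady_stans),
                   ("Jekyll/Hyde", jekyll_hyde),
                   ("Average Joes", average_joes)]:
    # pstdev is the population standard deviation, in units of runs
    print(f"{name}: mean={mean(runs):.1f}, std dev={pstdev(runs):.2f}")
```

All three teams average 5 runs per game, but the spread around that average is very different.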

So basically, the larger your standard deviation is, the more inconsistent your offense is. This jibes with the idea of consistency because we tend to think of something as consistent when it is nearly the same each time out and we think of things as inconsistent when wildly different outcomes are commonplace.

Consistency as measured by standard deviation

Armed with the run distribution for each team and a spreadsheet, I can calculate each team's standard deviation:


Team StdDev Consistency
TEX 3.8 Inconsistent
NYA 3.7 Inconsistent
DET 3.6 Inconsistent
BOS 3.6 Inconsistent
COL 3.5 Inconsistent
MIN 3.5 Inconsistent
OAK 3.4 Inconsistent
KCA 3.4 Inconsistent
ATL 3.4 Inconsistent
SLN 3.3 Inconsistent
SEA 3.2 Typical
SDN 3.2 Typical
ANA 3.2 Typical
TBA 3.2 Typical
PIT 3.2 Typical
CLE 3.1 Typical
TOR 3.1 Typical
PHI 3.0 Typical
FLO 3.0 Typical
CIN 3.0 Typical
LAN 3.0 Consistent
SFN 2.9 Consistent
HOU 2.9 Consistent
ARI 2.9 Consistent
NYN 2.9 Consistent
MIL 2.9 Consistent
CHN 2.8 Consistent
CHA 2.8 Consistent
WAS 2.7 Consistent
BAL 2.5 Consistent

The standard deviation is in units of runs. As a simple but rough way to classify the offenses, I divided the 30 teams into 3 groups of 10: inconsistent, typical, and consistent. The Twins, as we might expect, fall in the inconsistent category.

Henry Blanco Syndrome

Astute readers will have noticed that many of the above inconsistent offenses score a lot of runs. The Yankees, Tigers, and Red Sox have essentially the three best offenses in the AL and all are considered inconsistent by this measure. It turns out that this seems to be a pattern:

[Graph: Runs Per Game vs. Standard Deviation]

We could do a linear regression analysis here, but just by looking at the graph, we can see a pretty strong relationship between a team's runs scored per game and the standard deviation of their runs scored. [Technical side note: A Poisson distribution's variance equals its mean, but the variance for each team here is roughly 1.5 to 3 times as large as the mean, so the run scoring distribution doesn't seem to be all that well approximated by a Poisson distribution.]
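The Poisson side note can be checked roughly with a few lines of Python. This is only a sketch: the R/G and StdDev pairs are copied from the tables in this post for a handful of teams, and since the standard deviations are rounded to one decimal, the ratios are approximate.

```python
# (R/G, StdDev) pairs copied from this post's tables; StdDev is rounded there.
teams = {
    "NYA": (5.88, 3.7),
    "TEX": (5.02, 3.8),
    "MIN": (4.55, 3.5),
    "WAS": (3.96, 2.7),
    "BAL": (4.61, 2.5),
}

def variance_to_mean(rpg, sd):
    """Variance divided by mean; this would be about 1 for a Poisson distribution."""
    return sd ** 2 / rpg

ratios = {team: variance_to_mean(rpg, sd) for team, (rpg, sd) in teams.items()}
for team, ratio in ratios.items():
    print(f"{team}: variance/mean = {ratio:.2f}")
```

Every ratio comes out above 1, which is the sense in which team run scoring is more spread out than a Poisson model would predict.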

For those who prefer charts, I divided the offenses into three groups (good, average, bad) according to how many runs per game they scored:


Team R/G StdDev Quality Consistency
NYA 5.88 3.7 Good Inconsistent
DET 5.54 3.6 Good Inconsistent
PHI 5.50 3.0 Good Typical
BOS 5.31 3.6 Good Inconsistent
ANA 5.15 3.2 Good Typical
COL 5.08 3.5 Good Inconsistent
CLE 5.05 3.1 Good Typical
TEX 5.02 3.8 Good Inconsistent
ATL 4.99 3.4 Good Inconsistent
SEA 4.94 3.2 Good Typical
CIN 4.88 3.0 Average Typical
FLO 4.84 3.0 Average Typical
NYN 4.83 2.9 Average Consistent
MIL 4.79 2.9 Average Consistent
TBA 4.76 3.2 Average Typical
BAL 4.61 2.5 Average Consistent
PIT 4.57 3.2 Average Typical
CHN 4.57 2.8 Average Consistent
MIN 4.55 3.5 Average Inconsistent
TOR 4.52 3.1 Average Typical
KCA 4.51 3.4 Bad Inconsistent
SLN 4.51 3.3 Bad Inconsistent
SDN 4.50 3.2 Bad Typical
LAN 4.48 3.0 Bad Consistent
OAK 4.45 3.4 Bad Inconsistent
HOU 4.40 2.9 Bad Consistent
ARI 4.28 2.9 Bad Consistent
SFN 4.26 2.9 Bad Consistent
CHA 4.17 2.8 Bad Consistent
WAS 3.96 2.7 Bad Consistent

This gets me finally to Henry Blanco Syndrome. It is easy for a hitter to be consistent, or at least, it's easy for a hitter to be consistently bad. Because Blanco's mean is low (a .220 batting average) it's easy for him to be consistently close to his mean.

The same thing generally holds true for team offense. If you have a crappy offense that averages one run per game by always scoring 0, 1, or 2 runs, it has a very low standard deviation--it's consistently bad. Now, if you want an offense that averages 7 runs per game to have the same standard deviation, it is going to have to score essentially 6, 7, or 8 runs in every game. I think we can all agree that it'd be a lot harder to score at least 6 runs in every game than it would be to average 7 runs per game overall.

This raises some questions:

1) Is standard deviation a good way to measure offensive inconsistency? To a degree, I think that the standard deviation is giving us an accurate reflection of how fans/players/managers/etc. view the game. Pitching and defense are highly valued because good pitching and defense are by nature consistent, while good offense is less valued because it can "disappear."

2) How much do teams control their consistency? Just by looking at the data above, one might hypothesize that there's no difference between having a high scoring average and having a high standard deviation. Thanks to GreekHouse's excellent three-part series on Piranhas versus Sharks, we have some indication that teams that depend mostly on singles are less consistent than teams that depend mostly on home runs, even if they have the same scoring average. Of course, GreekHouse examined the most extreme cases possible, so it remains to be shown how large we should expect that effect to be in a real life offense.

3) Can we devise a measure of consistency that is (mostly) independent of scoring average? The simplest thing to do here is to just divide the standard deviation by the scoring average (runs per game). I'll call this 'scaled consistency.'


Team R/G StdDev/(R/G) Quality ScaledConsistency
OAK 4.45 0.77 Bad Inconsistent
MIN 4.55 0.76 Average Inconsistent
KCA 4.51 0.76 Bad Inconsistent
TEX 5.02 0.76 Good Inconsistent
SLN 4.51 0.74 Bad Inconsistent
SDN 4.50 0.72 Bad Inconsistent
COL 5.08 0.69 Good Inconsistent
SFN 4.26 0.69 Bad Inconsistent
PIT 4.57 0.69 Average Inconsistent
TOR 4.52 0.69 Average Inconsistent
WAS 3.96 0.68 Bad Typical
ATL 4.99 0.68 Good Typical
BOS 5.31 0.67 Good Typical
ARI 4.28 0.67 Bad Typical
TBA 4.76 0.67 Average Typical
CHA 4.17 0.66 Bad Typical
LAN 4.48 0.66 Bad Typical
SEA 4.94 0.66 Good Typical
HOU 4.40 0.65 Bad Typical
DET 5.54 0.65 Good Typical
FLO 4.84 0.63 Average Consistent
NYA 5.88 0.63 Good Consistent
ANA 5.15 0.63 Good Consistent
CLE 5.05 0.62 Good Consistent
CIN 4.88 0.61 Average Consistent
CHN 4.57 0.61 Average Consistent
MIL 4.79 0.60 Average Consistent
NYN 4.83 0.59 Average Consistent
PHI 5.50 0.55 Good Consistent
BAL 4.61 0.54 Average Consistent

This might be over-correcting the problem, since no bad offenses manage to get listed as consistent, but at least one good, average, and bad offense gets classified as inconsistent.
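As a rough sketch of how the scaled consistency number is computed (statisticians would call it the coefficient of variation), here it is in Python for six of the teams. The inputs are the rounded R/G and StdDev values from the earlier table, so a result can differ from the table above by about 0.01.

```python
# (R/G, StdDev) pairs copied from the earlier table; StdDev is rounded there.
teams = {
    "NYA": (5.88, 3.7),
    "PHI": (5.50, 3.0),
    "TEX": (5.02, 3.8),
    "BAL": (4.61, 2.5),
    "OAK": (4.45, 3.4),
    "WAS": (3.96, 2.7),
}

def scaled_consistency(rpg, sd):
    """Standard deviation expressed as a fraction of the scoring average."""
    return sd / rpg

scaled = {team: scaled_consistency(rpg, sd) for team, (rpg, sd) in teams.items()}

# Largest ratio first, i.e. least consistent relative to scoring level.
ranking = sorted(scaled, key=scaled.get, reverse=True)
for team in ranking:
    print(f"{team}: {scaled[team]:.2f}")
```

Note how the Yankees drop from near the top of the raw standard deviation list to the lower half once their scoring average is taken into account.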

There are other questions to ask here regarding how this correlates to team power, how much of this is just dumb luck, and just how valuable it is to be consistent anyway. But those are questions for another day.

Back in my day, we walked uphill both ways

I earlier asserted that since the beginnings of professional baseball, fans have complained about inconsistent offenses. As supporting evidence, I present the 1871 Troy Haymakers (great name, by the way). They finished third (of nine) in scoring that season with 12.1 R/G, well above the league average of 10.5 R/G. How they got there is a different story altogether, though. Check out their runs scored by game:

5, 29, 3, 5, 25, 20, 8, 5, 11, 3, 33, 37, 9, 3, 10, 13, 11, 4, 17, 4, 5, 5, 17, 16, 13, 3, 13, 5, 19 (And yes, the numbers on the board are correct.)

Of their 29 games all season, the Haymakers played in 2 one-run games and 17 games decided by 5 or more runs. I think it's safe to say that their fans considered the offense (and probably the whole team) a tad inconsistent.
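For the curious, the same standard deviation treatment can be applied to the Haymakers' game-by-game totals listed above. A minimal sketch, using the population standard deviation (whether the 2007 figures in this post were computed the same way is an assumption):

```python
from statistics import mean, pstdev

# 1871 Troy Haymakers runs scored by game, as listed in this post.
runs = [5, 29, 3, 5, 25, 20, 8, 5, 11, 3, 33, 37, 9, 3, 10, 13, 11, 4, 17,
        4, 5, 5, 17, 16, 13, 3, 13, 5, 19]

rpg = mean(runs)      # runs per game
sd = pstdev(runs)     # population standard deviation, in runs
scaled = sd / rpg     # standard deviation relative to the scoring average

print(f"games={len(runs)}, R/G={rpg:.1f}, std dev={sd:.1f}, scaled={scaled:.2f}")
```

The 29 games average out to the 12.1 R/G quoted above, with a standard deviation of a bit over 9 runs.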

For those old-timers out there yearning for the days when men were men and pitchers finished what they started, I present the Haymakers' John Mullin, a lefty who started every game that season and pitched all but one of the team's 250 innings on defense. His 12/75 SO/BB ratio doesn't look terribly appealing, though, even considering that the league as a whole had 175 SO and 393 BB. (And yes, perhaps I'm distracting myself with the oddities of baseball in the 1800's, but at the moment it's more amusing than perusing Twins statistics.)



This entry was posted by ubelmann on Wednesday, September 5th, 2007 at 3:00 pm and is filed under Guest Writers, MLB, Minnesota Twins, Stat Geekery, ubelmann. It is one of 640 entries by the author. We are no longer accepting Letters to the Editor on this post.



26 Letters to the Editor

Rhubarb_Runner replied on September 5, 2007 at 3:14:17 pm

I wondered why you weren't an active participant in today's gamelog. Nice to see you spent your time more wisely than the rest of us.

SBG replied on September 5, 2007 at 3:17:17 pm

Nah, this has been in the can for a few days. ubelmann saw today's lineup and went out to smell the roses for an afternoon.

twayn replied on September 5, 2007 at 3:41:14 pm

We should probably all take some time to sit under a cork tree these days. Is Ubelmann's first name Ferdinand, by chance?

 
brianS replied on September 5, 2007 at 4:03:08 pm

it was either that, or this would have happened to his head:

E-6 replied on September 5, 2007 at 4:14:07 pm

Come on. Try a little harder...

brianS replied on September 5, 2007 at 4:16:29 pm

oohh. very nice.

 
twayn replied on September 5, 2007 at 4:17:23 pm

Scanners?

E-6 replied on September 5, 2007 at 4:31:26 pm

The one and only.

SBG replied on September 5, 2007 at 4:40:17 pm

Sorry to, you know, comment on the substance of this post, but good, average, and bad based on runs/game? What, no park factor or league factor adjustment?

ubelmann replied on September 5, 2007 at 5:15:08 pm

Using value tags like good and bad wasn't really what I was going for. I think I probably should've gone with high, average, and low run scoring teams.

It seems as though high-scoring teams are more inconsistent by nature--how much of that you want to attribute to the run-scoring environment and how much you want to attribute to the offense itself isn't something that I have enough data to address right now. The role of park factors here seems to be tricky and I spent quite a bit of time on this the way it was, so I decided to sweep it under the rug, or at least table it until later.

 
 
brianS replied on September 5, 2007 at 5:12:21 pm

His 12/75 BB/SO ratio doesn’t look terribly appealing, though, even considering that the league as a whole had 175 SO and 393 BB. (And yes, perhaps I’m distracting myself with the oddities of baseball in the 1800’s, but at the moment it’s more amusing than perusing Twins statistics.)

But foul balls were not counted as strikes pre-1900, right? Having a K/BB ratio of 6:1 when the league was 1:2 overall looks pretty darned impressive to me.

ubelmann replied on September 5, 2007 at 5:15:49 pm

Sorry, I typed it backwards. It was 75 walks and 12 strikeouts.

 
 
GreekHouse replied on September 5, 2007 at 8:09:56 pm

I agree that you need to somehow account for the number of runs a team scores per game. I'm not an expert on stats (yet), but I did have one idea. If you try normalizing the data before calculating the standard deviation by dividing all your data points by R/G, this might give you the metric you want. This way, all teams will be averaging 1 "scoring unit" per game making the SDs more comparable.

GreekHouse replied on September 5, 2007 at 8:42:27 pm

On second thought, I think this gives the same answers as you have above. Not really much help. :)
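A quick way to see why the two approaches agree: dividing every game's runs by R/G before taking the standard deviation is the same as taking the standard deviation first and dividing by R/G afterward. A small sketch with made-up run totals:

```python
from statistics import mean, pstdev

# Hypothetical game-by-game run totals, for illustration only.
runs = [2, 7, 0, 5, 11, 3, 4, 8]

rpg = mean(runs)

# Normalize each game to "scoring units", then take the standard deviation.
sd_of_normalized = pstdev([r / rpg for r in runs])

# Take the standard deviation in runs, then divide by the scoring average.
scaled_sd = pstdev(runs) / rpg

print(f"{sd_of_normalized:.4f} vs {scaled_sd:.4f}")
```

The two numbers match because the standard deviation scales linearly when every data point is multiplied by the same constant.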

GreekHouse replied on September 5, 2007 at 8:56:20 pm

On an unrelated note, I learned the hard way that baseball runs aren't Poisson distributed (through a failed experiment in sports betting). The reason is very similar to Sharks vs. Piranhas. The Sharks have a run distribution that is very well approximated by a Poisson distribution because all of their runs are coming one at a time. On the other hand, the Piranhas tend to score runs in bunches and so Poisson fails.

Team run distribution can be very closely approximated by an average of Poissons. The result is pretty close, but I think that a better approximation can be made by somehow accounting for both runners LOB and R/G.

brianS replied on September 6, 2007 at 12:44:26 am

I have the runs by team-game-inning data (downloaded from retrosheet) and will be working out the observed distributions (tomorrow perhaps). then maybe we can talk.

ubelmann replied on September 6, 2007 at 12:48:04 am

How are you harnessing the retrosheet data? I've been meaning to use that stuff forever now, but I'm having trouble getting over the initial hump of figuring out what format their files are in.

brianS replied on September 6, 2007 at 12:58:46 am

they give the formatting for the game logs here

the data is just ASCII with a bunch of comma-delimited fields. Unfortunately, the line score data is a single field whose length varies (extra innings; short games) and includes both integers, the letter "x" for last half-innings not played, and left and right parentheses (surrounding double-digit run totals).

so far, I've done nothing all that intelligent.

I should have said I have the game log data for 2006. I just grabbed the file, extracted the "visitor", "home", "visitor line score" and "home line score" fields and cleaned them up by hand.

right now, I have a text file where each record is a game and the fields are visitor name, visitor runs in each of innings 1-max observed innings for the season, home name, home runs in each inning.

I'm sure there are vastly more efficient ways to do this, but I haven't dedicated the clock cycles to think it through yet.
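One possible way to handle the line score field described above, sketched in Python. It relies only on the description in this comment (single digits, parentheses around double-digit innings, "x" for a half-inning not played); the function names and the sample string are hypothetical, and the real field layout should be checked against Retrosheet's own format notes.

```python
import re

# One inning is "(NN)" for a multi-digit total, a single digit, or "x" if not played.
_INNING = re.compile(r"\((\d+)\)|(\d)|([xX])")

def parse_line_score(field):
    """Split a line score string into per-inning runs (None for an unplayed half)."""
    innings = []
    for multi, single, unplayed in _INNING.findall(field):
        if multi:
            innings.append(int(multi))
        elif single:
            innings.append(int(single))
        else:
            innings.append(None)
    return innings

def total_runs(field):
    """Total runs in the game, ignoring unplayed half-innings."""
    return sum(r for r in parse_line_score(field) if r is not None)

sample = "0010(10)20x"  # hypothetical line score, not taken from a real game
print(parse_line_score(sample), total_runs(sample))
```

Because the result is a plain list, extra-inning and shortened games simply produce longer or shorter lists with no special handling.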

ubelmann replied on September 6, 2007 at 12:46:30 am

That's a good link. I'll have to look into that more later.

 
brianS replied on September 6, 2007 at 12:48:18 am

Team run distribution can be very closely approximated by an average of Poissons.

of course, the sum of N independent Poisson random vars is itself a Poisson random var with a parameter equal to the sum of the N component parameters. So, saying that team run distribution can be closely approximated by an average of Poissons is pretty much saying that team run distributions can be closely approximated by a Poisson.

brianS replied on September 6, 2007 at 1:09:07 am

so, why does Poisson fail?

here's why. Underlying the observed runs data are batter-by-batter "bases advanced" processes. Poisson provides a good model of bases advanced for crappy hitters but not so much for sluggers.

try it with Nick Punto for 2007; I did -- lambda of 0.33; you get the following
0 bases: 71.9 pct of the time
1 base: 23.7 pct of the time (singles + walks + hbp)
2 bases: 3.9 pct
3 bases: 0.43 pct
4 bases: 0.07 pct

that produces a predicted distribution verrrrry close to the observed distribution of outcomes

Not so much with Morneau (lambda of 0.5575)
0: 57.26 pct
1: 31.9 pct
2: 8.9 pct
3: 1.65 pct
4: 0.25 pct

too few failures and HRs (particularly HRs), too many one-, two- and three-base outcomes.

so, the more the offense relies on HR hitters, the poorer an "average of Poissons" model of bases advanced by the offense will be because the tail of the distribution is just too thin.
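The predicted percentages above come from the Poisson probability formula, and they can be reproduced approximately with the sketch below. The two lambda values are the ones quoted in this comment; the observed outcome distributions are not given here, so the sketch only covers the predicted side.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Lambda values (average bases advanced) as quoted in the comment above.
hitters = {"Punto": 0.33, "Morneau": 0.5575}

for name, lam in hitters.items():
    print(name)
    for bases in range(5):
        print(f"  {bases} bases: {100 * poisson_pmf(bases, lam):.2f} pct")
```

The probability of 4 bases shrinks very quickly under this model, which is the thin-tail problem for home run hitters described above.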

 
GreekHouse replied on September 6, 2007 at 10:04:27 am

Are you sure about that?

Just look at the example from the article. We're taking the average of Poissons with mean lambda, lambda-2, and lambda+2. Suppose this is equal to another Poisson with mean mu. Then, the means of the two distributions must be the same and so mu=lambda, but looking at the graphs of the two show that they're clearly not the same.

brianS replied on September 6, 2007 at 10:58:00 am

Well, I'm pretty sure about the first part (sum of independent Poisson-distributed random vars is a Poisson-distributed random var).

I haven't thought hard about the more general linear combination thing (I'm not a good enough mathematician to do it without thinking hard). But it sounds about right, don't it? the average is just a special case of a linear combo.

GreekHouse replied on September 6, 2007 at 4:25:13 pm

You're right about the sum part, but the linear combination part doesn't work. You need not look further than my example above. The difference is that a linear combination is just a straight sum, whereas the sum of Poissons is really a convolution. For example, with two independent random variables X,Y we have:

P(X+Y=2) = P(X=0)*P(Y=2) + P(X=1)*P(Y=1) + P(X=2)*P(Y=0)

This will not in general be the same as [P(X=2)+P(Y=2)]/2.
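The distinction can be checked numerically. The sketch below compares the two operations: adding two independent Poisson variables (a convolution, which gives another Poisson) and averaging the distributions of Poissons with means lambda-2, lambda, and lambda+2 (a mixture, which does not). The value lambda = 4.5 is an arbitrary choice for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def sum_pmf(k, lam1, lam2):
    """P(X + Y = k) for independent Poissons: a convolution of the two pmfs."""
    return sum(poisson_pmf(i, lam1) * poisson_pmf(k - i, lam2) for i in range(k + 1))

def mixture_pmf(k, lams):
    """Average of several Poisson pmfs (each game drawn from one of them at random)."""
    return sum(poisson_pmf(k, lam) for lam in lams) / len(lams)

lam = 4.5        # hypothetical scoring average
ks = range(60)   # the probability of 60 or more runs is negligible at these means

# Sum of Poisson(2.0) and Poisson(2.5) versus a single Poisson(4.5).
max_gap = max(abs(sum_pmf(k, 2.0, 2.5) - poisson_pmf(k, lam)) for k in ks)

# Mixture of Poissons with means lam - 2, lam, lam + 2.
lams = [lam - 2, lam, lam + 2]
mix_mean = sum(k * mixture_pmf(k, lams) for k in ks)
mix_var = sum((k - mix_mean) ** 2 * mixture_pmf(k, lams) for k in ks)

print(f"largest gap between sum and Poisson({lam}): {max_gap:.2e}")
print(f"mixture mean = {mix_mean:.2f}, mixture variance = {mix_var:.2f}")
```

The sum matches a single Poisson to within floating-point error, while the mixture keeps the same mean but has a variance well above it, so it cannot be a Poisson.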

 
 
 
 
 
 
 
brianS replied on September 6, 2007 at 1:28:17 am

Speaking of inconsistent offense...

Here's one theory. My alternative: everyone thinks they will suck.

Rhubarb_Runner replied on September 6, 2007 at 6:37:40 am

I knew this would happen. You've been talking Poisson, and now you've gotten into poison.

 
 
