Baseball strikeouts analysis

Stuart Zussman
Aug 28, 2018

In my last story on a time series of strikeouts in baseball, I noted that one potential explanatory factor for the increase in strikeouts over time is the introduction of specialty pitchers. The hypothesis is that tired starters are being removed earlier in games and fresh relief arms brought into games. I threatened another post to consider this hypothesis, and here it is!

First, for this analysis, the summary-level MLB data won’t suffice, as we will want to look at strikeouts by inning (rather than just in aggregate) to determine:

(1) do starters get strikeouts more or less frequently than relievers?

(2) do strikeouts by starters wane as a game progresses?

We will use the incredible Retrosheet* data (event files specifically) for this analysis. Because I am a Mac user, I do not use the Retrosheet event readers (.EXE files) but instead have written pretty simple file parsers in both Ruby and Python.
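For readers who want to try this without the Windows tools, here is a minimal sketch of that kind of parser in Python. It is not the author's code. It assumes the standard Retrosheet event-file layout: `start` and `sub` records of the form `start,playerid,"name",team,batting_order,field_position`, where team 0 is the visitors, team 1 is the home team, and field position 1 is the pitcher; and `play` records of the form `play,inning,batting_team,batter,count,pitches,event`, where strikeout event codes begin with `K` (e.g. `K` or `K+WP`). The function name `count_strikeouts` and the sample records are hypothetical. It counts the pitcher from the `start` records as the starter and any pitcher entering through a `sub` record as a reliever, and it only produces the strikeout numerators. The innings-pitched denominator requires counting outs, which is left out here.

```python
import csv
from collections import defaultdict


def count_strikeouts(lines):
    """Count strikeouts by (pitcher role, inning) from Retrosheet event-file lines."""
    counts = defaultdict(int)
    role = {0: None, 1: None}  # role of each team's current pitcher: 0 = visitors, 1 = home
    for fields in csv.reader(lines):  # csv handles the quoted player names
        if not fields:
            continue
        kind = fields[0]
        if kind == "id":
            # A new game starts: forget the previous game's pitchers.
            role = {0: None, 1: None}
        elif kind in ("start", "sub") and fields[5] == "1":
            # Field position 1 is the pitcher; fields[3] is the team.
            role[int(fields[3])] = "starter" if kind == "start" else "reliever"
        elif kind == "play":
            inning, batting_team, event = int(fields[1]), int(fields[2]), fields[6]
            if event.startswith("K"):
                # The pitcher is on the fielding team, i.e. the team not at bat.
                counts[(role[1 - batting_team], inning)] += 1
    return dict(counts)


# Made-up records for illustration only.
sample = [
    'id,XXX201604050',
    'start,visp001,"Visiting Pitcher",0,0,1',
    'start,homp001,"Home Pitcher",1,0,1',
    'play,1,0,bat00001,32,BCBFBS,K',
    'play,1,1,bat00002,01,CX,S7/L',
    'sub,relp001,"Relief Pitcher",0,0,1',
    'play,1,1,bat00003,12,CFBS,K+WP',
]
print(count_strikeouts(sample))  # {('starter', 1): 1, ('reliever', 1): 1}
```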

Starting with the 1976 season, here is the strikeout-per-inning rate as games progress, by type of pitcher:

A few observations:

(1) Starters have their highest strikeout rates early in the game, and their rate tails off as the game goes on.

(2) Relievers have consistently higher strikeout rates per inning than do starters.

(3) Obviously, as the game progresses, the overall average rate moves away from the starter trend and toward the reliever trend, since relievers pitch a growing share of the later innings.
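Observation (3) comes down to arithmetic. For a given inning, the overall rate is total strikeouts over total innings pitched, which equals an average of the starter rate and the reliever rate weighted by each group's share of those innings (w for starters, 1 − w for relievers). As relievers take a larger share, the average slides toward their rate. Here is a small check in plain Python lists. The function names are hypothetical and the numbers are invented toy values, not figures from the Retrosheet data:

```python
def per_inning_rates(strikeouts, innings):
    """Strikeouts per inning pitched, element by element (index 0 = 1st inning)."""
    return [k / ip if ip else 0.0 for k, ip in zip(strikeouts, innings)]


def overall_rate(k_start, ip_start, k_relief, ip_relief):
    """Combined rate for one inning number: all strikeouts over all innings pitched."""
    return (k_start + k_relief) / (ip_start + ip_relief)


# Invented toy numbers for a single inning number (not real data).
k_s, ip_s = 60, 100.0   # starters
k_r, ip_r = 40, 50.0    # relievers
rate_s, rate_r = k_s / ip_s, k_r / ip_r
w = ip_s / (ip_s + ip_r)  # starters' share of the innings pitched

combined = overall_rate(k_s, ip_s, k_r, ip_r)
# The combined rate equals the innings-weighted average of the two role rates.
assert abs(combined - (w * rate_s + (1 - w) * rate_r)) < 1e-12
print(round(rate_s, 3), round(rate_r, 3), round(combined, 3))  # 0.6 0.8 0.667
print(per_inning_rates([30, 25, 20], [50.0, 50.0, 40.0]))  # [0.6, 0.5, 0.5]
```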

Let’s look at 1996 now.

Same trends as before except item (3) above is more pronounced now due to the heavier use of relievers earlier in games. Also, strikeout rates are higher now. Finally, we also see a climb in the reliever strikeout rate as the game gets into the late innings, possibly due to the increasing use of relief specialists who can “give it their all” knowing they are not generally pitching for more than an inning or two.

Now let’s look at 2016.

Again, all of the trends are present. Also, very clearly, strikeout rates are higher now. Starters continue to tail off during the game.

In conclusion, it does seem that the higher strikeout rates are at least partially due to two trends: relievers, who get higher strikeout rates than starters, entering games earlier, and the use of relief specialists who can “give it their all” when in the game.

I also feel obligated to examine that pesky trend we see in 1976, 1996, and 2016, where the strikeout rate for starters ticks up in the 8th or 9th inning. Why does that green line tick up in each case? My hypothesis: the starters who remain in the game are probably winning and, knowing this could be their last inning, are letting it rip, since they do not need to conserve energy for additional innings. Also, maybe the losing team is swinging for the fences in the hope of catching up in the late innings.

Finally, for some added fun, let’s look at a couple of other breakdowns: strikeouts by month, and strikeouts in day vs. night games.

First by month and pitcher role for 2016:

Some variability is seen across the months, and a next step would be to normalize by the number of games in each month. It would be interesting to draw this out across several seasons to see whether May and August really have higher strikeout rates in general. I am doubtful but open to the possibility!
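One way to do that normalization is to compute average strikeouts per game (both teams combined) for each month. Here is a minimal sketch that assumes the standard Retrosheet `info,date,YYYY/MM/DD` and `info,daynight,...` records and treats any `play` event code starting with `K` as a strikeout. The function names and sample records are hypothetical. The same function handles the day vs. night split if you group on the `daynight` field instead of the month.

```python
import csv
from collections import defaultdict


def strikeouts_per_game(lines, group_of):
    """Average strikeouts per game, grouped by a game-level attribute.

    `group_of` maps a game's info dict (e.g. {"date": "2016/05/01"}) to a group label.
    """
    games = []  # one [info_dict, strikeouts] entry per game
    for fields in csv.reader(lines):
        if not fields:
            continue
        if fields[0] == "id":
            games.append([{}, 0])
        elif fields[0] == "info" and games:
            games[-1][0][fields[1]] = fields[2] if len(fields) > 2 else ""
        elif fields[0] == "play" and games and fields[6].startswith("K"):
            games[-1][1] += 1

    totals = defaultdict(lambda: [0, 0])  # group -> [strikeouts, games]
    for info, ks in games:
        group = group_of(info)
        totals[group][0] += ks
        totals[group][1] += 1
    return {group: ks / n for group, (ks, n) in totals.items()}


def by_month(info):
    # Dates look like "2016/05/01"; the middle piece is the month.
    return int(info["date"].split("/")[1])


# Made-up records for illustration only.
sample = [
    "id,AAA201604100", "info,date,2016/04/10", "info,daynight,day",
    "play,1,0,b1,32,BCBFBS,K", "play,1,0,b2,02,CS,K",
    "id,BBB201604120", "info,date,2016/04/12", "info,daynight,night",
    "play,1,0,b1,12,CBS,K",
    "id,CCC201605020", "info,date,2016/05/02", "info,daynight,night",
    "play,1,0,b3,00,X,S7/G",
]
print(strikeouts_per_game(sample, by_month))  # {4: 1.5, 5: 0.0}
print(strikeouts_per_game(sample, lambda info: info["daynight"]))  # {'day': 2.0, 'night': 0.5}
```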

Then by day vs. night:

More recently, strikeout rates have been higher in day games than in night games. That is consistent with the common wisdom about the difficulty of hitting in the shadows, i.e., when the pitcher’s mound is in the sun and the batter’s box is in the shadows.

Note that I have cross-checked my total strikeouts against Baseball Reference season totals. Always a good idea when working with huge volumes of bottom-up data and one’s own data processing methods. I also used data from the Chadwick Baseball Bureau to confirm mid-game partial inning counts for starters and relievers.
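A cross-check like this is easy to automate. Here is a minimal sketch. It assumes you have entered the reference totals (for example, season or team totals from Baseball Reference) into a dictionary yourself. The function name, the `tolerance` parameter, and the toy numbers are all hypothetical.

```python
def cross_check(computed, reference, tolerance=0):
    """Return {key: (computed, reference)} for every key whose totals disagree.

    Keys present on only one side are reported with None for the missing value.
    Keys must be mutually sortable (e.g. all strings or all ints).
    """
    mismatches = {}
    for key in sorted(set(computed) | set(reference)):
        mine, theirs = computed.get(key), reference.get(key)
        if mine is None or theirs is None or abs(mine - theirs) > tolerance:
            mismatches[key] = (mine, theirs)
    return mismatches


# Hypothetical toy totals, purely to show the output shape.
mine = {"team_a": 10, "team_b": 12}
reference = {"team_a": 10, "team_b": 13, "team_c": 7}
print(cross_check(mine, reference))  # {'team_b': (12, 13), 'team_c': (None, 7)}
```

An empty result means every total matched within the tolerance.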

I am using Python again for the bulk of this analysis, but this time, for the challenge of it, I am only using Python lists rather than NumPy ndarrays or pandas DataFrames.

*The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
