Bill James' Runs Created formula (and by extension, Win Shares) credits a batter with additional runs created for each hit and home run he hits with men on base, beyond the number he would have been expected to have based on his overall batting average and home run rate and his number of at bats with men on base. It has been about twenty-five years since James added this situational component to the formula, so I don't recall the exact explanation, but I believe he found that making these adjustments at the team level improved the accuracy of the runs created formula.
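To make the idea concrete, here is a rough sketch in Python of how such a situational bonus could be computed. The function names and the exact form of the adjustment are my illustration, not James's published formula (which, as I recall, involves hits with runners in scoring position and home runs with men on base, folded into the full technical RC equation):

```python
def basic_rc(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def situational_bonus(h, hr, ab, h_mob, hr_mob, ab_mob):
    """Illustrative situational adjustment: hits and homers with
    men on base beyond what the batter's overall rates would
    predict for his men-on-base at bats. A sketch of the concept,
    not James's actual formula."""
    expected_h = (h / ab) * ab_mob
    expected_hr = (hr / ab) * ab_mob
    return (h_mob - expected_h) + (hr_mob - expected_hr)

# A .300 hitter who hit .350 with men on base gets credit for the
# "extra" hits and homers beyond his overall rates:
bonus = situational_bonus(150, 25, 500, 70, 12, 200)  # 12.0
```

The point of the sketch is only that the bonus is computed relative to the batter's own overall rates, so it rewards the *distribution* of his production across base states, not its total.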
At the team level, if you find excess production with men on base, you have detected evidence that the team sequenced its offensive events in a more optimal manner than it otherwise might have. This is intuitively easy to grasp in extreme cases, though much harder to see when the examples are commonplace. Any baseball fan can appreciate that three singles in an inning will usually produce a run, while one single (and nothing else) in each of three consecutive innings will rarely produce one. Traditional runs created formulas look only at seasonal or game totals, and have no information regarding the way in which the events were clustered, so it's not surprising that the accuracy of the estimate can be improved by taking this information into account.
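The three-singles example can be made concrete with a toy simulation. The baserunner model below is deliberately crude and entirely my own assumption (every single advances each runner exactly two bases; walks, extra-base hits, and productive outs are ignored), but it captures the asymmetry between clustered and spread-out events:

```python
def runs_from_inning(events):
    """Score one inning under a crude model: 'S' is a single that
    puts the batter on first and advances every runner two bases;
    'O' is an out; three outs end the inning."""
    first = second = third = False
    runs = outs = 0
    for ev in events:
        if ev == 'S':
            # runners on second or third advance two bases and score
            runs += int(second) + int(third)
            # runner on first moves to third; batter takes first
            first, second, third = True, False, first
        else:
            outs += 1
            if outs == 3:
                break
    return runs

# Three singles clustered in one inning plates a run...
clustered = runs_from_inning(['S', 'S', 'S', 'O', 'O', 'O'])  # 1
# ...while one single per inning over three innings plates none.
spread = sum(runs_from_inning(['S', 'O', 'O', 'O']) for _ in range(3))  # 0
```

Both scenarios contain exactly three singles and nine outs, so any formula fed only totals must assign them the same estimate; only the clustering differs.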
However, extrapolating this to the individual level to conclude that individual batters who hit well with runners on base have contributed more to their teams takes a leap of faith that I believe glosses over some important mathematical and philosophical considerations. At the team level, it doesn't matter which individuals are responsible for the events that place runners on base and which are responsible for the positive events that occur once they are there. The team benefits, scoring more runs than it otherwise would have had these events not been clustered in the same inning.
But the runners reaching base are a necessary precondition for the batter to come along and drive them in. If one built a team of players who performed better with runners on base (and thus, if their overall statistics were held equal, worse with the bases empty), they would have fewer opportunities with men on base than a team of players who performed equally well regardless of whether men were on base, and many fewer than a team of players who performed worse with men on base but had equally good statistics overall. You might begin to see the philosophical rabbit hole here, because the assertion of "overall" presupposes some mixture of bases empty/runners on base. But if batters have different abilities in these situations (and for the purpose of this post I am assuming that they do), the proportion of bases-empty to men-on-base at bats that combines to produce "overall" statistics will vary based on the composition of the team.
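One way to see the dependence on opportunity mix: take a batter with fixed situational abilities and vary only the share of his at bats that come with men on base. The numbers below are invented purely for illustration:

```python
def overall_ba(ba_empty, ba_mob, ab_empty, ab_mob):
    """'Overall' batting average as the opportunity-weighted mix
    of bases-empty and men-on-base performance."""
    hits = ba_empty * ab_empty + ba_mob * ab_mob
    return hits / (ab_empty + ab_mob)

# The same split abilities (.250 empty, .300 with men on) yield
# different "overall" lines depending on how often his teammates
# put runners on ahead of him:
weak_lineup   = overall_ba(0.250, 0.300, 500, 100)  # ~.258
strong_lineup = overall_ba(0.250, 0.300, 400, 200)  # ~.267
```

The batter's abilities are identical in both cases; only his teammates changed. That is the sense in which "overall" statistics are not a pure measure of the individual.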
Are you confused yet? I am, and I think you should be too. I do not think that the purveyors of situation-infused value metrics have properly grappled with the question of how high-leverage situations arise (this also applies to late-game relievers and the value of their innings relative to those of other pitchers, particularly starting pitchers). One might be tempted to say "let's look at what RE24 or WPA say", but these metrics cannot untangle the knot: they inherently accept a "real-time" view of value in which the value of any particular event is defined by its value as best we can measure it in the moment. This is the context in which the human beings involved, from the players to the fans to the reporters (if they are human), experience the game, since we are neither time travelers nor God-like beings free from the constraints of linear time. But there is no obvious inherent reason why we must adopt that perspective when retroactively assigning value to events that transpired in a previously completed game.
To retreat from haughty talk that might belong in a late-night sophomore dorm session, I will close by admitting that I do not have an easy answer to offer to the very ill-defined issues that I'm raising. But I think this is an issue where one must carefully consider whether extrapolating from the team to the player level is warranted. In order for Player A to demonstrate his superior ability to hit with runners on base, Player B must reach base. If Player A is deficient in his ability to hit without runners on base, he will cash in more of the opportunities created by Player B but will also on occasion create fewer opportunities for Player C to produce with runners on. In giving extra credit to Player A for performing better with men on base, are you asserting that Player B's contributions are less valuable even though they were a necessary precondition for Player A's performance? Are you sure that you have accounted for all of the ways in which the performance of one player in the lineup impacts the opportunities provided to the other eight? When you pass the credit for good sequencing at the team level down to the player level, why does the batter who cleans up get credit while the batter who set him up doesn't, even though both were necessary for optimal sequencing to be achieved?
This is a great point, and I think it applies to some efforts to use WPA and similar metrics to measure player value as well. When offensive events cluster to a team's advantage (or disadvantage), why should only the back-end players be credited/debited?
To take Bill's favorite example, Dave Parker led the NL in runs created in 1978, in part because he hit very well with men on base (and RISP). But it's also true that Moreno (45%) and Taveras (38%) finished 1st and 8th in the league in frequency of scoring when not hitting a HR (RS%). Why not reward them for getting on base at such propitious times?
And to the extent that LH hitters like Parker so benefit from having a runner on 1B, why give that credit to Parker rather than the player who gave him that edge?