[Yes, it's been nearly two years since I started in on this topic. But, hey, better late than never.]
In Part 1, we set out the basics for a comparison problem: who’s better, Mickey Mantle (maybe the best switch hitter ever) or Chipper Jones (maybe the best switch hitter today)? Part 2 addressed the problem of peak value (advantage: Mantle) versus career value (advantage: to be determined — though Jones has added to his stats a bit in the two seasons since I wrote the original installments).
Now we turn to the real core of the problem — how to judge two players in the context of different eras of baseball.
Look at the picture of Goose Goslin here. Goslin was a fine outfielder for the Washington Senators, and he was good for a long time. When he retired at age 37 in 1938, he had collected 2,735 hits, 1,483 runs, 1,609 RBIs, 500 doubles, and 4,325 total bases while batting .316 in 2,287 big-league games. Those numbers are outstanding in any era of the game.
But the baggy flannels remind us just how different the game was in Goslin’s day. Higher batting averages were common. (The entire National League batted better than .300 in 1930.) Pitchers recorded far fewer strikeouts on average than they do today, but pitched complete games far more often. With the exception of a few Native Americans and light-skinned Latinos, the major leagues were the province of white men. Physical therapy and reconstructive surgery to heal injuries were rudimentary and haphazard at best. Few players lifted weights, and most of them needed off-season jobs to make ends meet. (Even in the 1960s, young Nolan Ryan pumped gas over the winter while he was a member of the Mets organization.)
To understand how numbers from then compare to numbers from today, we have to make the same kinds of adjustments that economists make when they recalculate nominal dollars as real dollars to account for inflation (or deflation) over time. If economists didn’t make these adjustments, prices or salaries from earlier eras would seem ridiculous when put alongside values from today. If we don’t make these adjustments for baseball, we’ll end up thinking that Ty Cobb and Rogers Hornsby — each of whom batted above .400 for a season three times — were incomparably better than today’s hitters, which is similarly ridiculous.
At this point, I’m not going to write several hundred or several thousand words about the evolution of park-adjusted, era-adjusted, and similar stats. I will just say that, if you look under the sections labeled “Player Value–Batting” and “Neutralized Batting” in any Baseball Reference player record, you’ll discover a whole new world of analysis. That analysis reflects many years of hard work by people who love baseball, and who have serious mathematical skills, as they have tried to figure out how to make fair comparisons between players of different eras.
Using Statistics as Blunt Instruments
In the old days, batting average was the ultimate measure of batters. If a guy was a lifetime .300 hitter, he was considered a good hitter. If he batted .310, he was considered to be that much better than a .300 hitter, Q.E.D.
Ah, but baseball isn’t so simple as that. Sure, if all I know about two players is their batting averages, then I’ll take the guy with the higher number — and hope that Mr. .310 isn’t a pure singles hitter while Mr. .300 is leading the American League in extra-base hits and walks (neither of which is measured by batting average). If all I know about two power hitters is how many RBIs they’ve each racked up over the past five years, of course I’ll pick the guy with the higher total — and hope that it isn’t a case of an above-average hitter looking better than a great one by virtue of the teammates who bat in front of him.
But we don’t live in that hypothetical information-starved world, and our analysis of baseball is just much, much better now. Our understanding of what hitters do that puts runs on the board is better. Our knowledge of how certain home parks artificially inflate or deflate hitters’ numbers is better. Our knowledge of how pitching and defense keep runs off the board is better. So it doesn’t work to stick with the blunt instruments that Grandpa used 70 years ago to compare Mel Ott to Goose Goslin.
Sticking with the old way would be like meeting a neurosurgeon, here in 2011, who’s not interested in these “newfangled” CAT and PET and MRI scans for diagnosing aneurysms. Just think about what your reaction would be if your doctor said, “X-rays and exploratory surgery were good enough when I started in medicine in 1960, and they’re good enough now.” You’d walk out immediately, because it would be insane to marry yourself to lesser tools just for the sake of . . . what, tradition? Habit? Obtuseness?
We’re better than that.
In Praise of Nuance
If you want to compare a bunch of hitters in the simplest way possible, start with these two statistics:
- On-Base Percentage — It has been demonstrated beyond reasonable doubt that OBP is the stat that tracks most linearly to runs scored for a team, because everything that improves OBP gives your team another baserunner without costing it an out. Everything that detracts from OBP costs your team an out, shortening or ending your team’s turn at bat and therefore reducing its chances of scoring more runs. Related: the list of the top 100 players ever by career OBP is a decent starting point for making your shortlist of the best hitters ever (once you use historical context to eliminate the players who compiled big numbers in the crazy world of 1800s baseball).
- Slugging Average — This number expresses the average number of total bases that a hitter achieves per at-bat. So a .610 slugging average means that the hitter — and this would be a great one like Albert Pujols — averages 61% of a base for every at-bat. Singles help this number just like they help batting average, but doubles help twice as much, triples three times as much, and home runs four times as much.
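To make the two definitions concrete, here is a minimal Python sketch. It assumes the standard formulas: OBP as times on base (hits, walks, hit-by-pitches) divided by at-bats plus walks, hit-by-pitches, and sacrifice flies, and slugging average as total bases divided by at-bats. The `BattingLine` class, the function names, and both stat lines are invented for illustration; they are not real players' numbers.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One hitter's counting stats (hypothetical container for this sketch)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0

    @property
    def singles(self) -> int:
        return self.hits - self.doubles - self.triples - self.home_runs

    @property
    def total_bases(self) -> int:
        # A single counts once, a double twice, a triple three times, a homer four.
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)


def batting_average(line: BattingLine) -> float:
    return line.hits / line.at_bats


def on_base_percentage(line: BattingLine) -> float:
    times_on_base = line.hits + line.walks + line.hit_by_pitch
    chances = (line.at_bats + line.walks
               + line.hit_by_pitch + line.sacrifice_flies)
    return times_on_base / chances


def slugging_average(line: BattingLine) -> float:
    return line.total_bases / line.at_bats


# Two made-up seasons: a .310 singles hitter and a .300 hitter with power and walks.
mr_310 = BattingLine(at_bats=500, hits=155, doubles=15, triples=2,
                     home_runs=3, walks=25)
mr_300 = BattingLine(at_bats=500, hits=150, doubles=40, triples=5,
                     home_runs=35, walks=90)

for name, line in (("Mr. .310", mr_310), ("Mr. .300", mr_300)):
    print(f"{name}: AVG {batting_average(line):.3f}  "
          f"OBP {on_base_percentage(line):.3f}  "
          f"SLG {slugging_average(line):.3f}")
```

With these invented lines, Mr. .310 comes out at a .343 OBP and a .366 slugging average, while Mr. .300 comes out at .407 and .610. Batting average alone ranks them the wrong way around.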
If you’re comparing hitters from one era, you can go with the raw numbers — unless one or more of the men played his home games in a park that had a big effect, positive or negative, on these numbers. (This is a big deal, for instance, in considering the current Hall of Fame candidacy of Larry Walker, who piled up big numbers in the hitters’ haven of Coors Field in Denver.) We know beyond any doubt that some ballparks favor hitters and some favor pitchers, and in fact the analysis in this vein has gotten sophisticated enough that it’s easy to tell which fields, say, are neutral in terms of batting average but depress batters’ ability to collect doubles and triples.
If you’re comparing hitters across eras, you’re well-served to consider the broader environments they played in. In 1930, the National League as a whole batted better than .300. Yet in 1968, Carl Yastrzemski won the A.L. batting title by virtue of being the only man in the league to crack .300. That’s not because big-league hitters forgot how to hit in the intervening 38 years, but because conditions radically favored hitters in 1930 and radically favored pitchers in 1968.
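The simplest version of that era adjustment can be sketched in a few lines of Python: divide a hitter's average by his league's average and scale so that 100 means league-average. This is only an illustration of the idea, not how Baseball Reference or any published metric is computed (those also adjust for ballparks and more). The function name and the league averages below are round-number assumptions chosen for the example.

```python
def relative_average(player_avg: float, league_avg: float) -> float:
    """Return an index where 100 means exactly league-average.

    Hypothetical helper for illustration; real era-adjusted stats
    account for park effects and other context as well.
    """
    if league_avg <= 0:
        raise ValueError("league_avg must be positive")
    return 100.0 * player_avg / league_avg


# Assumed, illustrative environments: a hitter-friendly league batting .300
# and a pitcher-friendly league batting .230.
hitter_in_high_offense_era = relative_average(0.330, 0.300)
hitter_in_low_offense_era = relative_average(0.301, 0.230)

print(f"{hitter_in_high_offense_era:.0f}")  # 110: 10% better than his league
print(f"{hitter_in_low_offense_era:.0f}")   # 131: 31% better than his league
```

Under these assumptions, the .301 hitter stands further above his peers than the .330 hitter does, which is the same logic economists use when they convert nominal dollars to real dollars.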
Ever More Context
It’s tempting to look for the single magic number that captures a player’s value in a nutshell. But it’s dangerous. Even the best of the modern stats (VORP, WAR, Win Shares, etc.) have their drawbacks, despite trying to take into account players’ ballparks, eras, defensive positions, levels of defensive skill, baserunning abilities, and so on. (Again, I’ll save you several hundred words on why it means a good deal more to hit like Mickey Mantle when you’re playing superior defense in center field, as Mantle did, than it does to hit like Chipper Jones when you’re playing solid-but-unspectacular defense at third base, as Jones did. But it matters.) Yet these stats, at the very least, have the virtue that they try to account for context.
Baseball fans will never quote book, chapter, and verse on a player’s “park-adjusted Wins Above Replacement” like they do for batting titles, home runs, RBIs, and the like, which is fine by me since the older numbers are (a) simpler to remember, (b) totemic in some cases (Ted Williams’ .406, Babe Ruth’s 714), and (c) reflective of the game’s history. Just so long as we don’t kid ourselves that the old, raw numbers and the blunt-instrument thinking behind them — the X-rays from 1960 — are as good as the newer, sharper, contextualized modes of analysis. They aren’t. They can’t be.
It’s not the X-ray’s fault that it conveys less information about the aneurysm than the MRI does. But it’s true.
Mantle vs. Jones
Confession time: when I started this series of posts in 2009, I was smarting from a blunt-instrument verdict delivered by an acquaintance of mine, a big Braves fan, who hit me over the head with Jones’s batting average, waved away Mantle’s many other achievements, and told me that I couldn’t just assert Mantle’s superiority in the face of the numbers (like lifetime batting average) that proved that Chipper was better than The Mick. Part of the reason I let this final post languish for so long was that I wanted to let go of the idea of convincing this hard-bitten fan of the inescapable errors in that position. This was less about baseball analysis than it was about trying to pick my battles, and not trying to argue with someone whose mind was already closed to alternate interpretations.
I tried to approach the writing of the earlier posts — and this one — with an open mind. What I found verified the opinion shared by me and, I’m going to guess, well over 99% of serious baseball analysts: that Mantle was clearly superior to Jones in terms of peak value (see the previous posts for more on that). But I also discovered that Jones was a lot closer to Mantle than I thought in terms of career value. So I did learn something, even if my initial conclusion — that there’s no way Chipper’s as good as Mickey — still holds up.
What I’d like to convince you of now is that there’s room in baseball analysis for all kinds of numbers, including old favorites like batting average, RBIs, and even pitchers’ wins. They tell us what happened: Greenberg hit a double and drove in Gehringer from first. But the old favorites, love them though we do, simply don’t tell us as much about a player’s performance as the newer, more contextualized numbers do, especially across eras. They can’t.