For as long as I've covered Fantasy Baseball, the one factor I'd say most separates the pros from the normies is their understanding of sample size.
In the first few weeks of every season, certain players would have outlier performances, people would write in saying they're ready to overreact to them, and I'd tell them they can't because of -- you guessed it -- sample size.
Lather, rinse, repeat.
But that's in a normal season that spans 162 games. What do we do about the weird season that was, covering only 60? It still goes in the ledger as a complete season, and everything that came before it feels like ancient history. (Need a way to keep busy? Try jotting down 10 things that happened in 2019.) But normally, we'd say that anything that happened over the course of 60 games doesn't pass the sample size test.
Or would we?
Eno Sarris wrote an excellent piece for The Athletic last June, back when the most common proposals being bandied about were 48 games and 89 games. He wondered how much more "legitimate" an 80-game season would be than a 50-game season, and while I won't go into all of his thorough and convincing analysis, I will point out that he put the tipping point at 60 games. By his data, that's when a season becomes more or less legit, and by total coincidence, that's how long the season ended up being.
Of course, his analysis was done on a team level and not an individual level. Introduce more examples of something, and you're sure to get more exceptions. Still, it stood out to me because it backed up what I myself noticed when trying to come up with examples of crazy stat lines that would illustrate to my audience just how messed up things would be in a greatly reduced season. Back when 50 games seemed most likely, I was able to cobble together a list of players whose stat lines looked unrecognizable a mere 50 games into the previous season, but then when I tried to repeat the process for the eventual 60-game proposal, I came away frustrated.
Between 50 and 60 games in 2019, Martin Perez's ERA had jumped from 2.95 to 3.72. Rafael Devers' home runs had gone from six to nine, putting him on a more reasonable power pace in a season that ended with 32. The leaderboards no longer looked like a jumble of names but showed the right players on the right pages, even if not in exactly the right order. Sure, there were exceptions. I can cite well-worn examples like Yu Darvish and Jack Flaherty going bananas in the second half or Jose Ramirez's extended blip, but the bottom line is that the league trends through the first 60 games of 2019 simply weren't nonsensical enough to back up my point. I ended up illustrating it some other way.
Turns out that for the vast majority of players, 60 games may be enough to establish the general tenor of their season. And just to be intellectually consistent, I've long said that the appropriate time to begin questioning your assumptions about players (as opposed to kneeling before the altar of sample size) is six weeks in. A 60-game season put us closer to 10 weeks.
But wait ... should the 60 games in question be subject to the same scrutiny as any other 60-game stretch? There's also the small matter of the 2020 season being quite unlike any that preceded it. Players like J.D. Martinez and Josh Bell were among those who complained about a lack of video access due to health and safety protocols, and the absence of a firm timeline made a mess of training regimens. Players were forced to ramp up, then shut down, then ramp back up suddenly, and some were able to handle it better than others.
We're talking in circles here. It goes without saying that shaving off 60 percent of a normal season robbed the numbers of some of their legitimacy, but your response to those numbers ultimately depends on how much of a sample size zealot you are.
Let's consider some of the approaches you could take.
Treat 2020 like it didn't even happen
This approach has been adopted by "the smartest among us," which I put in quotation marks because I'm not so sure they're on the right side of the argument this time. They're the true zealots who believe that reality only manifests on their terms. And a 60-game season ain't it.
It comes from a principled place, but it's principled to a fault, treating players like a dice throw or something else with a predictable probability when players are, of course, living, breathing people. And as people, they're constantly making adjustments, putting their outcomes in a perpetual state of flux.
Every year, a considerable number of players get markedly better, and a considerable number get markedly worse, and by far the most common interval during which this change occurs is the offseason. It doesn't mean that every player doing something different through 60 games is in fact changed, but any analyst worth his salt should be able to sniff out where a substantive change actually took place and where it didn't.
If resting on principle means pretending like Corey Seager, Randy Arozarena, Marcus Semien and Madison Bumgarner are exactly who they were leading up to 2020, then it's clearly the wrong track to take.
Treat 2020 like you would any other season
If the previous approach to 2020 numbers is lazy, then this one is just plain silly, but I suspect it's a trap many will fall into. It goes beyond the fact that recency bias is human nature -- we're most inclined to believe what we just saw, after all -- and owes more to the idea that we're most comfortable doing what we've always done. And I don't think it's a stretch to say that when preparing for the start of a new season, the previous season's stats are, if not always predictive, most certainly revealing. Typically, you wouldn't be steered too terribly wrong by relying solely on them.
While I've already made a case for the idea that they're still pretty revealing after only 60 games, to whatever extent that's true, it's not true with a great deal of precision. The general direction a player was headed may be reflected in the numbers, but some remain cartoonish in magnitude, having not been tamped down by the normalization of a full-length season. Trevor Bauer may indeed be great again, but a 1.73 ERA is most certainly too good to be true. Adam Duvall may be a noteworthy source of cheap power, but he's nonetheless unlikely to sustain a 45-homer pace.
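That "45-homer pace" is just linear extrapolation. A minimal sketch of the arithmetic, using rough, illustrative 2020 totals for Duvall (around 16 homers in 57 games):

```python
def full_season_pace(stat: float, games_played: int, season_length: int = 162) -> float:
    """Scale a counting stat from a partial season linearly to a full one."""
    return stat / games_played * season_length

# Roughly Duvall's 2020 line, used here purely for illustration:
pace = full_season_pace(16, 57)
print(round(pace))  # about 45 homers over a full 162-game season
```

The linear scaling is exactly what makes short-season numbers "cartoonish in magnitude" -- every hot 60-game stretch gets multiplied by nearly three.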
And then there's someone like Teoscar Hernandez, whose career as a fringy starter with modest power upside took a sudden turn for the studly in 2020.
Given that it came in spite of some ghastly plate discipline, including a 30.4 percent strikeout rate that would normally put success out of reach for all but a select few, it's possible he's a directional outlier whose numbers would have come crashing down if the season had played out like normal. Chris Bassitt, whose 2.29 ERA came in spite of some underwhelming strikeout numbers, may fall into the same category.
Look to each player's past 162 games
I'm calling this one the Frank Stampfl approach since our intrepid podcast host is the first one I heard propose it. And if you're looking for a catch-all that allows you to assess every player on equal terms, this one makes more sense than the previous two. It accounts for the 60-game season that was but packs on sample size by including the final 102 games of 2019.
Of course, it would be a more elegant solution if it were a more practical one, but to my knowledge, there isn't a website that can provide that information with just a few clicks. (If you know of one, please share.) Frank himself has resorted to calculating players individually as needed, which isn't a workable plan for the entire player pool. Maybe if you're good with spreadsheets, you can piece together enough leaderboards to make it a viable option, but it's a reach for the average Joe (or Frank, as it were).
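If you do go the spreadsheet route, the blend itself is simple addition of counting stats across the two windows. A sketch, with entirely hypothetical stat lines for a single player:

```python
def past_162(last_102_of_2019: dict, full_2020: dict) -> dict:
    """Sum each counting stat across the final 102 games of 2019 and
    the 60-game 2020 season, yielding one 162-game sample."""
    all_stats = set(last_102_of_2019) | set(full_2020)
    return {stat: last_102_of_2019.get(stat, 0) + full_2020.get(stat, 0)
            for stat in all_stats}

# Hypothetical totals, just to show the mechanics:
blended = past_162({"HR": 20, "RBI": 60}, {"HR": 10, "RBI": 25})
# blended now holds HR: 30, RBI: 85
```

Rate stats like ERA or batting average would need to be recomputed from the blended counting stats (earned runs and innings, hits and at-bats) rather than averaged directly.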
Plus, there's still the matter of when players undergo their biggest transformation, which is typically in between seasons, so even though a 162-game sample is preferable to a 60-game sample, the particular breakdown being used here oversamples the 2019 version of each player — one that may now be obsolete.
My take is that it's fine as a starting point if you can reasonably come by the data, but as with any system that puts every player in the same box, eventually you're going to have to go case by case.
Take it player by player
Ultimately, this is the method I've come closest to implementing, being cognizant of the reduced sample size and taking big changes in production with the appropriate grain of salt, but also having the confidence to identify the makings of a more lasting trend.
Part of it's relying on built-in assumptions, like if a pitcher begins earning more swings and misses, particularly if it corresponds with a velocity spike or new pitch, I take it as strong evidence of a breakout. But those assumptions may also be player-specific. If a player started down a path in 2020 that confirmed what I already believed about him, I might lean into those beliefs even harder. Some might call that "confirmation bias" -- which isn't totally fair since true confirmation bias would also mean ignoring countervailing evidence -- but even if you want to call it bias of some sort, it's a pretty useful one. All supporting evidence only bolsters a claim.
And likewise, if something a player did during this amorphous season seems totally out of character for him — particularly if it's a blip shared by others — I wouldn't be so quick to ascribe it to him permanently. The bloated strikeout rates for Christian Yelich, Devers, Gary Sanchez, Nick Castellanos and Josh Bell come to mind. It stands to reason it would be more widespread in a season with the sort of unique challenges that might have contributed to it (lack of video access, hurried buildup, etc.).
Just let ADP guide you
Might it be that there's no actual secret? Could it be that the key to success coming off a 60-game season is what it always is in Fantasy Baseball?
Has it ever been about who crunches data the hardest or has the most predictive prognostication powers? Those factors are most often the focus of Fantasy Baseball analysis, thus making it seem so, and I wouldn't deny they can help around the margins.
But they also miss the forest for the trees, overcomplicating a game that's simple enough to be accessible to those willing to give it the requisite attention. To succeed in Fantasy Baseball, you need only a loose idea what each player could do, a general concept of where each is most likely to go and then, armed with that knowledge, a willingness to capitalize on whatever inefficiencies avail themselves.
To put it another way, it's about understanding the marketplace and then seizing on value wherever it presents itself. You don't actually have to know much because you can just rely on conventional wisdom and your leaguemates to occasionally drift from said wisdom.
Part of my approach to Fantasy Baseball analysis is to remind myself I can't actually predict the future. Nobody can, because if he could, he'd use that power for something more lucrative than this silly pastime. My calls aren't going to be more right or wrong than those of anyone else who approaches this exercise rationally.
So the bottom line is I shouldn't take my individual assessments of players so seriously, regardless of the season's length. I shouldn't be so convinced of a bust pick that I can't embrace an obvious discount. I shouldn't be so convinced of a breakout pick that I jump ahead three rounds to grab him. I have to stay open-minded about the full range of a player's outcomes and not just the singular outcome I've prescribed for him. An example: I wasn't supremely confident in Lance Lynn coming off an age-32 breakout last year but ended up drafting him in a lot of leagues anyway just because nobody else was either. It turned out OK.
So we can tie ourselves up in knots trying to pinpoint the most optimal way to attack the weirdest season on record and still wind up with our usual share of misses, or we can just draft according to the same process that's always brought us success.
No need to reinvent the wheel here.