Back in July 2014, I wrote for Newsweek about Statcast, the latest conquest spearheaded by MLB and its collection of technological whizzes, known appropriately as MLB Advanced Media. The big conclusion back then: "Big data is about to change how baseball is managed, analyzed and consumed." Nearly three years later, it's worth asking: has big data -- has Statcast -- changed how baseball is managed, analyzed, and consumed?
Using the most literal definition of the word "changed," the answer is yes. Teams are incorporating the Statcast numbers into fancy simulations to evaluate prospective acquisitions; fans and writers are dissecting elements of the same data to judge those acquisitions; broadcasters are talking their way through replays and overlays sprinkled with Statcast graphics; and so on. Heck, the next Dickson Baseball Dictionary will almost certainly feature Statcast-inspired terms like these:
Exit velocity: "The speed of a baseball after it is hit by a batter."
Launch angle: "The vertical angle at which the ball leaves a player's bat after being struck."
Catch probability: "The likelihood that a batted ball to the outfield will be caught."
Spin rate: "The rate of spin on a baseball after it is released."
Barrel: "Assigned to batted-ball events whose comparable hit types (in terms of exit velocity and launch angle) have led to a minimum .500 batting average and 1.500 slugging percentage."
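The last definition, for what Statcast calls a "barrel," is concrete enough to sketch. The Python below is a minimal, brute-force reading of that one sentence, not MLB's actual implementation. It assumes a pool of historical batted balls, treats any ball within a hypothetical tolerance of 1 mph and 1 degree as "comparable," and counts every comparable ball as one at-bat. The names BattedBall and is_barrel are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Total bases per outcome; any outcome not listed here counts as an out (0 bases).
BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


@dataclass(frozen=True)
class BattedBall:
    exit_velocity: float  # mph
    launch_angle: float   # degrees
    outcome: str          # e.g. "single", "home_run", "out"


def is_barrel(ev: float, la: float, history: Sequence[BattedBall],
              ev_tol: float = 1.0, la_tol: float = 1.0) -> bool:
    """Hypothetical barrel check: gather historical batted balls within
    ev_tol mph and la_tol degrees, treat each as one at-bat, and require
    at least a .500 batting average and 1.500 slugging percentage."""
    comparable = [b for b in history
                  if abs(b.exit_velocity - ev) <= ev_tol
                  and abs(b.launch_angle - la) <= la_tol]
    if not comparable:
        return False  # no comparable hit types, so nothing to judge by
    at_bats = len(comparable)
    hits = sum(1 for b in comparable if b.outcome in BASES)
    total_bases = sum(BASES.get(b.outcome, 0) for b in comparable)
    batting_average = hits / at_bats
    slugging = total_bases / at_bats
    return batting_average >= 0.500 and slugging >= 1.500


if __name__ == "__main__":
    history = [
        BattedBall(105.0, 28.0, "home_run"),
        BattedBall(105.5, 27.5, "home_run"),
        BattedBall(104.5, 28.5, "out"),
        BattedBall(105.0, 29.0, "double"),
        BattedBall(80.0, 10.0, "single"),
        BattedBall(80.0, 10.0, "out"),
        BattedBall(80.5, 10.5, "out"),
    ]
    print(is_barrel(105.0, 28.0, history))  # hard, well-elevated contact
    print(is_barrel(80.0, 10.0, history))   # softer line drive
```

Scanning the whole history for every batted ball keeps the sketch close to the wording of the definition; a real system would more plausibly precompute the qualifying exit-velocity and launch-angle region once and then test each new ball against it.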
Easy peasy, end of story. Except that, while Statcast has given us new prisms through which to view baseball, it has also attracted critics. Here, then, is the tougher question: why isn't everyone on board with The New Big Thing? Much of it comes down to accuracy and availability.
"It is incredibly powerful, when it works and is released to the public," FiveThirtyEight baseball columnist Rob Arthur said to CBS Sports. "The trouble is that it doesn't always work, and often isn't released to the public."
Arthur has firsthand knowledge of Statcast's shortcomings. Last August, he uncovered Statcast's difficulty with tracking batted balls that had "atypical trajectories," a group comprising more than 10 percent of all balls put into play. An equally concerning controversy surfaced this April, when a league-wide velocity bump was revealed to be a byproduct of MLB switching its pitch-velocity measurements from PITCHf/x to Statcast without notice. There's also the matter of ballpark bias in exit-velocity readings. Still, MLBAM remains optimistic.
"What gets a little bit lost is while there are certainly some variations and park effects, it seems like it's actually less than it was previously with PITCHf/x," said MLB.com's Mike Petriello. "I've seen some of the articles you talk about, I do think some of the criticisms were fair, we certainly had a little bit of a bumpy transition as we were going from the old system to the new this year. But overall, I think things have really settled down. We know there are certain things the system does very well and certain things the system doesn't do as well. The goal is to be continuously improving on that."
Petriello joined MLB.com after years at his own Los Angeles Dodgers site, as well as FanGraphs. He's familiar with the baseball analytics community, and presumably with its desire for a scientific approach to all things. Trust is earned through independent replication and verification, nothing else; even God has to provide p-values. Predictably, the community has embraced neither Statcast's inaccuracies nor its near-black-box secrecy. Although the sheer volume of data (a single game produces more raw data than the Library of Congress adds to its web archive each month) would make it difficult for most public-facing analysts to manage, everyone wants the chance to look for themselves.
"I do understand why people would want the entire database opened up immediately, and the truth is a lot of this stuff is so new that we need to vet what it means," Petriello said. "I think over time, we're absolutely going to be opening up more and more."
Precedent doesn't help MLBAM with its calls for patience, given PITCHf/x's widespread availability. Tech-savvy outsiders examined, compiled, and corrected that data to good effect, producing findings on catcher framing, the left-handed strike zone, and the injury nexus. Teams may not necessarily approve of handing the public the keys to the Statcast kingdom, but they've benefited from letting plebeians play with PITCHf/x: the people behind each of those findings were hired by front offices.
To MLBAM's credit, it has steadily released new features and measures. Before joining MLBAM, Daren Willman was the hobbyist behind the Baseball Savant website. These days, Savant houses a wealth of Statcast data, including real-time exit velocity, launch angle, and spin rate, which was inconceivable not long ago. That steady roll-out has encouraged some, like Arthur, to believe that MLBAM will deliver on its promise to someday make everything accessible.
"I'm more hopeful than I was a year or two ago. Before they released [exit velocity and launch angle], I thought it was possible they'd keep all of the data to themselves, and it looks like that's not going to happen. That they keep releasing bits here and there -- even though it's less than I'd want -- makes me believe that eventually, we are going to be able to see more of the full data set, if not perhaps the whole thing."
There are other debates to be had about Statcast. A popular one is whether the same supposedly wise analysts who used to preach about small sample sizes are now falling victim to them in a rush to look smart with the newest (and least tested) metrics. Another is whether Statcast's purpose is education or entertainment, and whether it can be both. But some have simply accepted Statcast at face value: a well-branded technological marvel that can improve the baseball-consuming experience while remaining a work in progress.
Take Casey Boguslaw. Growing up, he'd sort his baseball cards by ERA, batting average, or whatever fit the day's mood. His interest in stats led him to a mathematics degree at the University of Illinois, and he now works as a financial analyst for an auto salvage company. His passion for baseball has led him to write about it at RO Baseball, where he tracks Barrel FIP, a Shelley-esque stitching together of an old sabermetric concept and a new one. The inspiration behind the metric predates even Shelley: the desire for greater understanding.
"Kyle Hendricks was somebody that really, I think, boggled the minds of a lot of analytics people because he was doing great things on the field but none of the numbers were really showing that," Boguslaw said.
"What Statcast really showed you ... when I watched Kyle Hendricks pitch, sure, he was not striking out as many batters as, let's say, Max Scherzer, but when batters were hitting the ball, it wasn't hit very hard. He was getting weak contact, he was getting pop-ups. Now, we can really pull to the tenth of the degree and the hundredth of the angle of what each batted ball he is doing, and we can compare them to players like Scherzer."
In the past, Boguslaw's observation would have remained just that: an observation. Now it's testable, tangible, and real. More baseball skills that traditional stats fail to capture should come into view soon, too. "It's like the bottom of the second inning really, so far as all this goes," Petriello said. "We just have such a huge backlog of ideas and it's really on us to prioritize what we want to get out there next."
Figuring out what comes next, and how to approach it, is as much a challenge as anything for the MLBAM team. Petriello ticked off several items on its to-do list: infield defense, the non-framing parts of catcher defense, and baserunning.
By comparison, satisfying everyone's appetite for more might be the easy part.