Baseball Prospectus director of technology Harry Pavlidis took a risk when he hired Jonathan Judge.
Pavlidis knew that, as Alan Schwarz wrote in The Numbers Game, "no corner of American culture is more precisely counted, more passionately quantified, than performances of baseball players." With a few clicks here and there, you can find out that Noah Syndergaard's fastball revolves more than 2,100 times per minute on its way to the plate, that Nelson Cruz had the game's highest average exit velocity among qualified hitters in 2016 and myriad other tidbits that seem ripped from a video game or science fiction novel. The rising ocean of data has empowered an increasingly important actor in baseball's culture: the analytical hobbyist.
That empowerment comes with added scrutiny -- on the measurements, but also on the people and publications behind them. With Baseball Prospectus, Pavlidis knew all about the backlash that accompanies quantitative imperfection. He also knew the site's catching metrics needed to be reworked, and that it would take a learned mind -- someone who could tackle complex statistical modeling problems -- to complete the job.
Pavlidis had a hunch that Judge "got it" based on the latter's writing and their interaction at a site-sponsored ballpark event. Soon thereafter, the two talked over drinks. Pavlidis' intuition was validated. Judge was a fit for the position -- better yet, he was a willing fit. "I spoke to a lot of people," Pavlidis said. "He was the only one brave enough to take it on."
Judge was more than brave enough -- he proved capable at the ensuing wonk work, allowing BP to unveil its revamped catching metrics in February 2015. Months later, Judge topped this accomplishment by revealing an unusual metric, Deserved Run Average (DRA), that could change life for the fan and hobbyist alike by fundamentally altering how baseball is analyzed.
Specifically, Judge has introduced to the community a framework that allows us to view baseball as more complicated than the usual one-on-one affair between a pitcher and a batter. The sport has always been more complex than that, even when evaluative tools suggested otherwise.
Judge has redefined the boundaries of what analysts can quantify by demonstrating how a metric can incorporate seemingly countless variables, ranging from the catcher and umpire to the weather and stadium. A knowledge-hungry optimist can look at Judge's work and see a future in which other, previously opaque areas are explored using similar techniques.
Yet the most fascinating part of Judge's story might be the man behind the metric.
Jonathan Judge: The stathead lawyer
During the day, Judge is a 41-year-old litigation partner at Schiff Hardin, a prominent law firm headquartered in Chicago. The first sentence of his biography notes he "believes that analytics are an important part of cutting-edge legal advice." (One example Judge offers of his belief: evaluating the statistical equivalence and fairness of government fines imposed on companies, insurers and the like using historical data.) Mosey on down the page and you can read his papers on various subjects -- predicting consumer product safety commissions; an examination of multistate market conduct penalties; and similar gossip like that.
But when the sun drops, Judge focuses on baseball. Though that might sound like a normal cycle for most fans -- especially those who dabble in statistical research and writing -- there is a difference: Judge is possibly, kinda sorta, almost certainly a genius.
Big claims require big evidence. Consider, then, that Judge has never received certification in a mathematical field. His degrees are in law and music -- he's a trained pianist who lists Martha Argerich and Murray Perahia among his favorite performers.
How did a lawyer-cum-pianist become a significant player in baseball analytics? Mostly through self-teaching.
"A lot of Google, Wikipedia, Stack Exchange, a few textbooks and then papers I continue to find interesting," Judge said of his learning materials. "I work on it almost every night in some form or another."
Judge, who rediscovered his childhood baseball fandom thanks to the 2008 Milwaukee Brewers, tiptoed into statistical principles a few years back, when he was looking for a "non-BS" way to solve problems. From there, the relationship between the game and numbers became evident.
"Baseball and statistics kept reinforcing each other," Judge said. "I wanted to know more about how baseball worked, which required more statistics knowledge, which in turn led to being better at statistics, and then understanding more about baseball."
Perhaps that inherent reinforcement between hobbies explains how Judge picked up on sophisticated methods as quickly as he did. Or perhaps there's something else in play.
One person who spoke with me about Judge described him as an autodidact (self-taught) -- seemingly more of a fact than an opinion. Others familiar with Judge and his work expressed similar sentiments.
"He's probably the fastest learner I've ever been around," Pavlidis said. "He freaks us out."
That Judge did all this while holding down his demanding job shouldn't be lost on anyone, either.
"He picked up a huge amount of statistical and programming knowledge in a very short amount of time -- while maintaining a full-time job as a lawyer," said Rob Arthur, a former colleague of Judge's and now a staff writer at FiveThirtyEight. "How he did it, I will never know."
Judge's education went beyond running a linear regression in Excel, or learning how to code in SQL or R. His proverbial fastball is the concept of mixed modeling -- "a statistical model containing both fixed effects and random effects," according to Wikipedia. In layman's terms, it's complicated.
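For readers who want the notation behind that definition, a generic random-intercept mixed model can be written as below. This is the textbook form, not DRA's actual specification, and every symbol here is an illustrative assumption: y is the outcome of play i involving pitcher j, the beta term collects fixed effects such as the park, and u is the pitcher's random effect.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Because each pitcher's effect is assumed to be drawn from a shared distribution centered on zero, estimates for pitchers with few plays are pulled toward the league average.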
Though Judge wasn't the first to introduce mixed modeling to baseball -- Max Marchi, now with the Cleveland Indians, used it to quantify framing -- he has become the most visible practitioner due to DRA, a metric that, if only in methodology, could change everything.
The metric that could change baseball analytics
"I think it's the most sophisticated and probably the most accurate measure of pitcher quality that's publicly available right now," Arthur said. "The mixed models that make up the core of DRA allow you to adjust for a lot of the factors that we've known to affect pitching, but haven't been able to measure or integrate into our pitching metrics. So factors like the framing skill of the catcher, the quality of the opposition, home or away, the park, et cetera.
"That level of statistical rigor hasn't been in the mainstream of sabermetrics," Arthur said about DRA's mixed modeling application. "Until now."
You can find a full explanation of DRA (and expansive leaderboards) elsewhere. But the gist behind DRA is that pitchers are tougher to analyze than hitters due to the confluence of variables involved in every play. Even a called strike requires a passive batter, a well-positioned catcher and an accurate umpire. There's significant room for error, and the most commonly used pitching statistics -- ERA and FIP -- are flawed in large part due to their incorrect assumptions about the elements a pitcher controls.
DRA, which attempts to adjust for every possible factor, was born from Pavlidis' desire to upgrade BP's one-stop pitching metric. Judge was tasked with channeling the same magic on pitchers that he had used on catchers. His vision for the new measure was relatively straightforward.
"I felt like existing statistics were not answering the real question people wanted to know," Judge said. "Too often, sports statistics just summarize the outcomes of plays on which the player was one participant, and give certain players all of the credit for what happened. What we really want to know is each player's most likely contribution to those plays.
"Mixed modeling allows us to do something that was otherwise very difficult: put all players into one structure, regardless of how many or how few plays they've been involved in, and put a value on all of them. It incorporates some natural skepticism about each player, and treats them as average until they prove they are sufficiently above or below average to justify being treated differently."
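The "average until proven otherwise" behavior Judge describes can be sketched in a few lines of Python. This is a minimal illustration of partial pooling in a random-intercept model, not DRA's code: the function name, the pitcher labels, the run values and both variance numbers are hypothetical, the variances are assumed known (a real mixed model estimates them from the data), and the league average is approximated by the plain mean of all observations.

```python
from statistics import mean


def shrunken_estimates(runs_by_pitcher, within_var, between_var):
    """Pull each pitcher's raw average toward the league average.

    The weight n / (n + within_var / between_var) is the standard
    random-intercept shrinkage factor when both variances are known.
    Few observations -> weight near 0 -> estimate stays near average.
    """
    all_obs = [r for obs in runs_by_pitcher.values() for r in obs]
    league_mean = mean(all_obs)  # simplification: unweighted grand mean
    estimates = {}
    for name, obs in runs_by_pitcher.items():
        n = len(obs)
        weight = n / (n + within_var / between_var)
        estimates[name] = league_mean + weight * (mean(obs) - league_mean)
    return league_mean, estimates


# Hypothetical data: runs allowed per outing for two made-up pitchers.
runs_by_pitcher = {
    "veteran": [3.0] * 100 + [4.0] * 100,  # 200 outings, raw average 3.5
    "rookie": [1.0] * 5,                   # 5 outings, raw average 1.0
}

# Assumed variances, chosen only for illustration.
league_mean, estimates = shrunken_estimates(
    runs_by_pitcher, within_var=9.0, between_var=0.25
)

for name, value in estimates.items():
    print(f"{name}: {value:.2f} (league average {league_mean:.2f})")
```

The rookie's sparkling raw average is mostly discounted because five outings are weak evidence, while the veteran's estimate barely moves from his raw average.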
You might question why any baseball statistic needs to be so complicated -- or so rooted in esoteric mathematics. But it's the latest in a march away from calculation convenience and toward evaluative accuracy -- basically, everyone wants to be a general manager, hence the need for near-industry-caliber measurements.
"It's better than where the industry was five to 10 years ago, and it's up there among the better public metrics," said one front office member who works on the research side. "I think it's a little below where we are and at least some other teams."
But perhaps not all teams. Baseball Prospectus sells data streams to nearly a third of the league. These packages -- purchased by the usual hardcore quant suspects -- tend to include improved PITCHf/x classifications, player projections and DRA and its components -- an aspect that CEO Sean Neugebauer called a "big selling point."
"We don't use DRA, but I think that's because we have more data sources than are out there in the public," the front office member said. "We try to do the same exact thing with more sophisticated data.
"I think it's tremendous work that Judge has done in this realm. It gets a lot closer to what we're trying to accomplish than what has previously existed."
While other front-office members interviewed for this piece were split on DRA, the public-facing side seems to approve. The top sabermetric websites, Baseball Prospectus and FanGraphs, are often viewed as competing in the same market. (Full disclosure: I've worked for both.) The sites seldom share anything, including proprietary metrics. Yet DRA's allure is such that FanGraphs recently acquired a license to display the metric (among others) on its site, just months after expressing interest in such an arrangement.
"Jonathan Judge, Harry Pavlidis and the team of guys over at BP have been doing really interesting work for years now, and we've always had a lot of respect for them and what they've been doing," FanGraphs managing editor David Cameron said. "DRA is certainly interesting, in that it is attempting to solve the difficult puzzle of separating responsibility for run prevention between pitchers and fielders.
"That nut has been particularly hard to crack in baseball over the years, and we want to promote any quality work that attempts to help us understand how to evaluate pitchers and fielders better than we have in the past."
The desire to promote quality work also inspired the deal on BP's side.
"Our main mission is to further the understanding of the game of baseball," Neugebauer said. "We think DRA is the best metric for evaluating pitcher performance, and the more we can spread the word about it the better. Keeping it exclusive or locking it up behind our paywall seems against the interests of baseball fans."
The context game
If Bill James' theory is correct that good metrics are those that support common sense 80 percent of the time and surprise you the rest of the time, then DRA passes the test. Take a look at the best individual pitchers and the list jibes with common sense -- you would expect to see a bunch of elite closers on a list like this, same as you would with ERA, strikeout rate or any other measure that inherently rewards quality over quantity. The surprise is that Clayton Kershaw -- the world's best starter, but a starter all the same -- snuck into the top five.
Here is the leaderboard for the best teams:
|Team|ERA|DRA|
|---|---|---|
|New York Yankees|4.16|3.76|
|Los Angeles Dodgers|3.70|3.90|
The Indians' pitching staff -- banged-up rotation and all -- showed how dominant it could be during their run to the World Series, so their placement at the top isn't a surprise. Ditto for the Yankees' and Dodgers' rankings -- both teams, it should be noted, employed two of the top five pitchers for more than half the season.
The Cubs finishing that low, however? That's a little surprising, and illustrative of the key difference between DRA and ERA -- the former takes into account Chicago's all-time great defense and attempts to suss out how much credit the gloves should receive; the latter makes no attempt at such nuance.
But what about DRA's downsides? As you would suspect, there are concerns about the math: whether the data is being correctly interpreted, whether the model is "overfit" to that data, and whether it's a descriptive or predictive measure -- is the intent to say what happened, or what should have happened? That numbers from previous games are being re-evaluated as a pitcher's season progresses seems to suggest the answer is a combination of the two. There are other nits to pick, too.
"It's the only metric I know of that makes a principled, sophisticated effort to stitch together what we know about pitching without taking shortcuts," said Frank Firke, a data scientist at Oscar Insurance who moonlights as a baseball statistics hobbyist. "Whatever flaws it has are much less glaring than those contained in ERA, FIP or the other pitching metrics.
"That sophistication definitely cuts both ways, however. It's not easy for even a knowledgeable fan to understand which factors cause a pitcher's DRA to diverge from the more conventional statistics."
The proprietary aspect hurts, too, since it requires maintaining a high level of trust in the model and the modelers.
"I'm disappointed that the code and underlying data aren't easily available for other researchers to work with," Firke said. "Opening it up for others to play around with would likely help fix some of the issues about transparency and flexibility."
Other obvious drawbacks to DRA are the higher barrier to entry, both in analyzing and understanding the methodology and numbers involved, and its newness to the world -- its actual reliability remains to be determined. Still, DRA's unusual mixed-modeling approach does allow for one positive in that vein: a more protean nature. That is an important quality, given that DRA's complexities require Judge and company to keep tweaking the measure with every bit of new data that informs them of its pros and cons.
"The crucial thing is that your metric be flexible. When you have a design that can't or won't change, it's destined to fail eventually, if only because baseball is constantly changing, and our knowledge of it is as well," Arthur said. "And in some ways, the framework of DRA is better equipped to handle that kind of shift, simply because of that complexity -- there are more knobs to be fiddled with, more parameters to be tuned."
It's not just that the methodology behind these metrics is changing, either -- the goal of them is changing, too.
To hear Judge explain it, old stats were too focused on player outcomes; DRA is more concerned about player likelihoods. That change could improve our understanding of players' so-called true talent levels, thereby making projections more accurate.
"DRA and our other recent statistics focus on trying to extract that piece of information," Judge said. "Not surprisingly, when you can isolate a player's actual individual contribution, your numbers get more reliable and more accurate in forecasting what comes next."
So, what does come next? In theory, DRA is built to last -- or, at least, to evolve with the times; meanwhile, the metric's mixed-modeling methodology is likely to serve as inspiration for new creations -- the plan, per those in the know, is to tackle catcher game-calling and other defensive aspects next. That's well and good, but what does the future hold for Judge, DRA's father and, for all intents and purposes, the progenitor of the mixed-modeling movement?
Everyone knows about Bill James, who ascended from bean-counting on graveyard shifts to self-publishing books to working for the Boston Red Sox, and most know that before Nate Silver became famous for predicting elections, he sharpened his forecasting claws on ballplayers.
But in recent years, analytical hobbyists -- the best ones, anyway -- haven't been permitted time to build their legends. Rather, when someone like Dan Turkenkopf or Mike Fast (two other former BP researchers) unearths an important discovery, they and their work are ushered into a front office before we can get to know them.
The bad news for those hoping Judge becomes the new James or Silver -- a highly visible linchpin -- is that he already works for a team. Multiple sources have confirmed over the past year that he consults with the Tampa Bay Rays. The good news, however, is that neither Judge nor his work has disappeared from the public sphere, which in turn has allowed Judge to make a greater, longer-lasting impact than those made by comparable contemporary analysts.
Better yet, Judge might not be going anywhere in the near future.
"I enjoy what I do now, and it's important to remember that front-office work involves a tremendous time commitment," Judge said about joining a team on a full-time basis. "But in the right situation and at the right time, I think that could be an interesting thing to do."
If or until that right situation and right time arise, Judge will continue to serve as a different kind of mixed model -- one who exists in and impacts multiple realms, in and outside of baseball.