The surprising places MLB teams get their information from in the post-Moneyball era

When Brian Cartwright was a teenager, he spent his gap year biking around Johnstown, Pennsylvania. He would go road by road, verifying the lay of the land. Once he finished surveying the city’s six-plus square miles, he produced a street map that would become of particular interest to the local utility companies.

“My sales pitch was to stress the accuracy,” Cartwright wrote in an email. “I was the guy known around town for having the answers.”

Unlike many teenage predilections, Cartwright’s commitment to accuracy and to having the answers endured as he aged. He went off to the University of Pittsburgh, where he majored in geography and minored in math and computer science. Those degrees have since led him to a 30-plus-year stint with a Virginia-based photogrammetry company that is part of Quantum Spatial, the largest geospatial firm in North America. His responsibilities include managing “a group specializing in preparing aeronautical obstruction surveys” for the Federal Aviation Administration.


Geography is not Cartwright’s only lifelong fascination. His obsession with baseball statistics dates back to his Little League days, when he would borrow the scorer’s notebook and calculate metrics -- splits and situational numbers, mostly. He would later become the head scorer and statistician for a summer league, all the while learning more about sabermetrics by reading Bill James, Pete Palmer, Baseball Prospectus and the like. Eventually, he took to writing about baseball online, and even developed a projection system (“Oliver,” named after the chimpanzee) that has been featured on The Hardball Times and FanGraphs.

Cartwright’s preoccupations with geography and baseball intersect in an obvious way, since both are about discovery -- of land, of sport -- but also in a subtle way, one most would not assume: Each involves compiling and shipping data to larger, more powerful entities. Just as he spends his days supplying the FAA with information through geodatabases, he focuses at night on accumulating, cleaning, organizing and routing data to Major League Baseball teams.

That’s because Cartwright is one of baseball’s many unexpected data sources.


The hidden game

Billy Beane has been the face of the modern GM since the publication of Moneyball. (Getty Images)

In recent years, major-league teams have done a poor job hiding the fact they are incorporating more data into their decision-making processes -- some of it so complex that it sails over observers’ heads like a well-struck line drive, and some of it (like MLB’s agreement with WHOOP) reported on as news. It’s all but common knowledge that the modern front office loves data, and loves to chase the competitive advantages that can arise from properly wielding and interpreting that data. Yet, while most everyone has heard about the industry’s budding obsession, there is a secretive aspect to it: Just where, precisely, is the information coming from?

To an extent, the answer is from league partners and sources -- like MLB Advanced Media, the cash cow responsible for MLB.tv and Statcast. Each morning, teams receive data bundles from the league that contain play-by-play files from the previous night’s major- and minor-league games, as well as vital player contract and health details. Teams supplement that internal information through external means, leaning on individuals -- including Cartwright -- and companies, both traditional and new-school, to stock their larder.

“For us, in a lot of the things we do now, external data is extremely important,” Minnesota Twins director of baseball research Jack Goin told CBS Sports. “As you get Trackman and PITCHf/x and all those things that are not just in the majors but at your minor-league affiliates, it becomes extremely important -- whether it’s player development plans or evaluating players, things like that. So, it’s become incredible for us, really.”

The tools Goin references, Trackman and PITCHf/x, are just two examples of baseball’s increased emphasis on data, albeit two of the most popular ones. Teams use Trackman’s radar-based system to learn more granular details about pitchers, like their spin rate and the depths of their release points. PITCHf/x, meanwhile, is familiar to anyone who has watched or read about a baseball game over the past decade -- it’s the pitch-tracking system that informs those fancy strike-zone displays used on broadcasts and in blog posts. Both are upgrades from the less-than-halcyon days when STATS Inc. provided limited pitch-related information.

Those who pay attention to such things are likely familiar with other data sources. Numerous teams buy projections (and proprietary metrics) from stats-rich websites like the aforementioned Baseball Prospectus. Some teams rely on a subset of Prospectus’ stats team for their pitch classifications. There are services out there like TruMedia -- an analytics platform used by teams and bloggers alike.

Multiple front-office types interviewed for this piece pointed to Baseball Info Solutions and Inside Edge as other examples of popular, necessary data vendors -- both companies employ a legion of video scouts who collectively chart an exhaustive number of details from every big-league game. Asking teams to build staffs capable of doing the same work would be impractical. As such, teams are content to pay for the services, especially since they can purchase data for less than it would cost to hire even an underpaid intern.

Practicality is another one of the main reasons teams pursue external data. The explanation is straightforward: Time and manpower are finite resources; ergo, any shortcut that can help teams maximize their supply of one (or both) of those is invaluable. Teams are making more money and adding more baseball operations employees by the year, yet the velocity of incoming data exceeds the hiring speed -- the individuals already on board, meanwhile, are needed to work on projects more important than filling their teams’ rudimentary data needs.

“There’s only so many people these teams have on staff, and they keep growing and growing,” said Graham Goldbeck, who, before becoming the manager of data analytics and operations at SMT (née Sportvision), the provider of PITCHf/x, interned for the data-savvy and cash-strapped Oakland Athletics and Tampa Bay Rays. “But even then, there’s probably a lot more they’re doing aside from this other external data. There are other companies where they collect data and just put it in an easy-to-understand form and deliver to teams. It’s easier to pay that person a one-time fee, whether having to have someone on staff go through and figure out how to do all that when they could be doing other stuff.”

Asking another company to clean and organize data might seem like a waste of resources, but teams are leaps and bounds ahead of where they were five, seven or even 10 years ago, when many lacked the computing power, technological savvy and/or interest in processing the internal data. 

“Before, we needed that third party to package, to calculate, to distribute to us in reasonable amounts of time because most teams weren’t employing people to do those items -- there were probably just a handful of teams that were doing that,” Goin said. “We needed to get this information distributed to us quickly -- precalculated -- for our internal databases.

“That would be the biggest change, really. We used to pay companies to regurgitate MLBAM data -- data that we already owned. But we needed somebody to do something with it, then send it to us. Now, probably all 30 teams, or close to all 30 teams, have at least one or multiple people on staff to do that portion of it.”

Baseball’s changing landscape has altered the role and products of third-party data sources in multiple senses. Consider Goin’s Twins, who underwent a front-office makeover during the offseason. Teams in similar situations have been known to lean more on external data sources while building up their own staffs and systems. Creating a modern quantitative infrastructure is a time-consuming process, after all, and requires hiring the right individuals and crafting the right vision.

Part of that right vision includes knowing what information has legitimate meaning.

“The kinds of data we report and have been reporting for years is, I don’t want to say useless, that’s too strong of a word,” said Vince Gennaro, who presides over SABR while serving as a team consultant and MLB Network regular. “But outcome statistics like OPS, strikeout rates, walk rates, that data has very limited informational value compared to what I would call the process data, which is things like the velocity, spin rate, spin axis of the pitch, or the exit velocity of a batted ball.

“I think the reason they would buy information is because the information available in the public domain is simply inadequate to be predictive. And they’re looking for competitive advantage -- a competitive advantage versus other people who might be relying upon that data.”

All explanations are derivatives of the basic truth: Teams buy external data because they don’t have a choice.


The specialists 

Top prospect Andrew Benintendi had outstanding statistics at Arkansas. (USATSI)

It sounds like a bad joke, but it isn’t: Cartwright’s first MLB client slid into his DMs.

First, some background. Cartwright supplies teams with Oliver projections and statistics from various leagues, including the collegiate and foreign ranks. (Cartwright also provides the latter to Baseball-Reference.com.) His original team client, surprisingly enough, was Tigres del Licey, of the Dominican Winter League. To this day, he supplies Licey with reports throughout the year on players they have interest in, as well as the quintessential in-season advance scouting package (spray charts, splits, projections, etc.) focused on their Caribbean League opponents.

Cartwright’s big-league break came after he tweeted he was developing code to pull Korea Baseball Organization statistics. He received that fateful direct message minutes later, which read, “Hi -- we met at a conference. I work for a team and we’d be interested in acquiring your data.”

Cartwright and the team struck a deal. He received enough money from the arrangement to complete his coding, while the team obtained the statistics and a copy of the code -- he was able to avoid exclusivity, and now provides information to 15 teams, with others showing interest.

Predictably, the league’s familiarity with Cartwright has led teams to offer him employment.

“If I was 30 years younger, I’m certain I’d be working for a team, but it was much harder for me with a good salary, a mortgage and growing kids. I did one special project for a team that lasted several months and required me to sign a NDA,” he wrote. “That team did offer to hire me as an analyst, but their budget would only allow about half of what I was already making.”

Like Cartwright, Jeff Sackmann and Kent Bonham are hobbyists-turned-vendors who have passed on exclusivity. The aptly named College Splits came about when Bonham, an Omaha native and avid college baseball fan, asked Sackmann if it would be possible to create an NCAA version of his now-defunct Minor League Splits website. It was. Sackmann, who is responsible for the technical wizardry, waved his handy wand and presto -- College Splits was born. Teams began inquiring about licensing the data the day the site was announced on The Hardball Times. Almost a decade has passed since, and Sackmann and Bonham now provide information to 25 teams.


Those who pull statistics from other leagues, foreign or collegiate, often employ a technique referred to as “scraping.” There are tutorials all around the internet on how to scrape data -- for example, here’s one authored by Houston Astros employee Mike Fast. The gist is this: With a little technical training and enough time and patience, just about anyone can learn to pull data from websites.
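For readers curious what scraping looks like in practice, here is a minimal sketch in Python. It is not the code Cartwright, Sackmann or any team actually uses: the HTML snippet, the column names and the `TableScraper` and `scrape_table` helpers are all invented for illustration. A real scraper would also have to download the page, handle messy markup and respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a stats page; a real scraper would download this.
SAMPLE_HTML = """
<table id="batting">
  <tr><th>Player</th><th>AB</th><th>H</th><th>BB</th></tr>
  <tr><td>Player A</td><td>120</td><td>42</td><td>15</td></tr>
  <tr><td>Player B</td><td>98</td><td>25</td><td>9</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that sits inside a cell; skip whitespace between tags.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's labels."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record["Player"], record["H"], "hits in", record["AB"], "at-bats")
```

Pulling one table is the easy part. The work the vendors describe lies in doing this reliably for hundreds of games a day and cleaning up whatever comes back.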

So why do teams pay Sackmann and Bonham? Again, it goes back to practicality -- it’s easier to pay the existing specialists than to turn generalist staffers into specialists, especially when the specialists are willing to work with the teams on technical matters.

“It’s a huge amount of work,” Sackmann wrote in an email to CBS Sports. “Some Saturdays and Sundays during the college season, there are over 500 games among NCAA teams alone. Parts of the process can be fully automated, but providing a clean data feed for all of those teams in a timely manner is a daunting task. We’ve spent a decade developing our software to make all this possible, which means teams don’t have to worry about it, and they can task their in-house developers with more important, higher-level concerns.”

What do teams do with the data they receive from Sackmann and Bonham? At this point, anyone who has read Moneyball has visions dancing through their heads of Paul DePodesta frothing at Jeremy Brown’s on-base percentage. But in addition to raw data, Sackmann and Bonham offer more detailed packages. Think along the lines of proprietary defensive and base-running metrics, for starters. Their vision of how teams ought to employ their data is similarly nuanced.

“At the amateur level, scouting is a big part of the equation, and as a fan it’s easy to make the mistake of seeing scouting and analytics as separate spheres. Many clubs use college data as much to facilitate the work their scouts do as to create the next uber-stat to maximize their return on draft picks,” Sackmann wrote. “With nearly every MLB team getting up-to-date amateur data, it’s almost like the stock market: Simply having access to real-time quotes doesn’t in itself make you competitive, but not having that access would be a real hindrance.”

Teams appear to feel the same way about avoiding strategic jeopardy. Every data provider interviewed for this piece reported increased demand for their product over the past several years. That uptick could be credited to front offices’ desire to cover their tails or to look statistically hip in the Big Data era -- and perhaps there is some of that going on. But Cartwright, Sackmann and Bonham had another theory: The beefier client lists are due to the league’s cross-pollination.

“We often pick up new clients, or expand a relationship with an existing client, when somebody switches jobs,” Sackmann wrote. “Sometimes those are the high-profile moves you hear about in the press, when some big-name analytics-friendly assistant general manager takes his new employer in a more stats-driven direction. More often it’s something under the radar.

“It’s basically the story of analytics in general -- it takes time for every org to get on board, but once they do, they never fully go back.”


The future

Theo Epstein and the Cubs are one team known to have invested in KinaTrax technology. (USATSI)

In fact, teams are marching forward, upward and onward into uncharted territory -- territory where the information supplied veers from the preconceptions of baseball data. It stands to reason that in order for third-party providers to remain necessary, they have to stay ahead of teams, offering products that the organizations cannot easily produce themselves. This includes technology that could in turn birth new data.

“In many ways, the league has leveled out in terms of skill sets and approach to analyzing data,” said Jeff Long, who covers the technology and gear beat for Baseball Prospectus. “We’re way past the Moneyball era, and now the differences between teams are much more nuanced when it comes to leveraging the data that’s around the game.

“As a result, one of the ways that teams can get an advantage over others is by finding new data sources, and applying their analytical skills to these new inputs. Each new data source could provide insight into performance, injury or myriad other areas of interest.”

Perhaps more so than the fruits of neurological analysis or virtual-reality training, providing insight into injuries could be a game-changer -- Billy Beane himself has said that “staying healthy is the new inefficiency” -- particularly as it pertains to pitchers, since clubs are losing millions every season due to arm trouble. Gaining understanding of the secrets of pitching mechanics, or anything that can lead to better conservation and/or forecasting efforts, would give a team a huge leg up in negotiations and acquisitions. Teams are determined to do a better job with their pitchers but, as Jeff Passan chronicled in The Arm, the fix is unclear.

It has been said that when the student is ready, the teacher will appear; naturally, companies have surfaced in recent years, each hoping they can provide teams with the necessary instruments -- and data -- to solve the body’s riddles. Arguably the best known among those companies is Motus, which counts 27 teams on its client list. Motus’ biomechanical chops have come in handy in the physical therapy and workplace safety fields, and seem to be catching on in baseball. One reason: the advent of new technology.

In the past, Motus would set up a mobile biomechanics lab in a team’s facility, where they would analyze the team’s pitchers for potential red flags. Everything has been simplified since Motus rolled out the closest thing it has to a killer app -- the motusTHROW.

The motusTHROW is essentially a compression sleeve embedded with a 3D sensor. The company’s website touts the sleeve as the “first tool aimed specifically at combating UCL tears that lead to Tommy John Surgery.”

“With every throw, it records biomechanical data, such as the force on the elbow joint,” wrote Joe Nolan, Motus’ co-founder and CEO. “As a pitcher wears the sleeve throughout training and their season, they gain invaluable workload data to guide their training volumes to optimize their arm health and strength.”


Whether the motusTHROW is preventing UCL tears is unclear at this point, but the bloated client list suggests teams think it’s worth buying all the same. Other companies are hoping to achieve similar popularity, including KinaTrax, a startup that sells a markerless motion capture system reliant upon high-speed imaging sensors that are mounted throughout a stadium to capture video at 300 frames per second. In theory, such equipment will help teams develop a better grasp of biomechanical aspects that could improve training and injury prevention.

Finding a signal in the resulting noise is going to be difficult in the short term. Founder Steven Cadavid, who first encountered the technology while attempting to improve early detection of autism in children by tracking their movements, noted that in 2016 “we processed the kinematic data for over 60,000 pitches performed by both home and visiting team pitchers.”

Here’s a promising sign for KinaTrax: Two of its clients are the trailblazing Rays and the World Series champion Chicago Cubs. Both are suspected to have dedicated specialists handling the data.

While there are obvious advantages to be had in beating everyone else into the pool, Long warned about the potential risks associated with being first -- and not just the ethical ones.

“The rub here though is that teams need to figure out how to communicate the information and insights they’re capturing more effectively to players and coaches,” Long said. “You can have the most analytically savvy front office in baseball, but if you can’t use that information to help your players get better, you’re severely limiting the upside of your efforts. It’s one thing to have a fleet of Motus sleeves or Rapsodos capturing data left and right. Figuring out how to use that data to add value and being able to get coaches and players on board with changes, tweaks, etc., is another.”

In other words, the technology will not yield a high return on investment by itself. The benefits will not be enjoyed until years down the road, when teams have better systems in place for processing and communicating the new information. This could include adding more staff.

“If you’re blowing it out you need people managing the devices where they’re being used, people QAing and analyzing the data, people working with coaches and players to understand the data/insights, and people identifying new devices to use and/or how existing devices fit into the organization’s player development process,” Long said. “Early on a small, scrappy team can manage this, but as it scales it can become a cumbersome effort. The potential payoff is massive so, in my opinion, teams would be smart to look into these opportunities sooner rather than later.”


Hitting oil

Matthew Silverman and Andrew Friedman each chase inefficiencies using new-school data. (USATSI)

Knowing where teams get their information from invites a follow-up: Where are the competitive advantages to be found if everyone is gravitating toward the same data and the same technology?

“That goes to the core of what every baseball researcher and analytics team’s job is,” Goin said, “and that’s to extract unique pieces of information from data sources and then provide that to the decision makers in an easy-to-digest, quick way that they can look at these things, look at these pieces of info that we pull out and hopefully make the best decisions based off of that portion of the evaluation process.”

Besides, the idea that everyone is on equal footing is misplaced, per some in the know. For instance, Gennaro believes the information gap between teams is larger now than during the Moneyball days -- due to the amount of information available, as well as the shift from outcome-based data to process-based data. That change has put less-sophisticated teams further behind the curve due to how they think and how they’re able to implement their thoughts.

“When I was at PepsiCo, I was running the Doritos brand. I’ll never forget what our CEO said to me. He said, ‘An ad agency can never deliver you a better ad than you demand.’ In other words, if you ask the wrong question, or pose the question in a way that lowers the bar, you’re not going to get anything better than that,” Gennaro said. “For example -- and this is probably the most basic example -- but figuratively I’ll say it this way: if you say who’s a better player, Player X or Player Y, that’s not the right question -- it’s who is the more effective player in the context of your team.

“Right now, we have teams out there, who, when they evaluate a player, they’re taking their 2017 schedule, they are prototyping the opposing pitcher array -- perhaps, if they’re really sophisticated, even assuming what the seventh, eighth, and ninth innings look like against those teams -- and they’re simulating a batter’s performance, a prospective acquisition, his performance against that pitching opposition, in those ballparks. Because they’re not looking at his stats, they’re looking at his exit velocity and his launch angle.

“If you hit the ball 86 mph to let’s call it straight-away right field at Yankee Stadium at a 32-degree launch angle, depending on the wind, that probably drops into the first couple of rows. If you hit that same velocity and launch angle at AT&T Park, Hunter Pence is taking two steps in to field it. So, all of those things are being incorporated into the analytics of the most sophisticated teams.”

In the same vein, there are advantages to be had other than asking the right questions and applying the data in a smart, informative way. For instance, getting the data, or the technology required to get the data, before everyone else gets the data is viewed as a key -- and is a reason why teams will seek exclusivity with certain individuals or companies. Occasionally, data providers serving the entire league will hear from teams that they’re offering too much information, thereby undercutting savvy teams’ potential advantage.

“We generally provide teams with raw data -- the x, y position of players and the ball,” Goldbeck said. “When you start delving further with, well, we can provide teams with times to first base, or catcher pop times or something, you take the x, y, z data and make something more of it … some teams want that, some teams will tell us, ‘Hey, we don’t want you to do that because the more stats and analytics you provide on top of the data, that takes away from our competitive advantage. We feel our strength is in our people who can go through this data and make more sense of it and find more hidden value in it.’ I think that’s where they feel the advantage comes from.”
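To make Goldbeck's distinction concrete, here is a rough Python sketch of turning raw position samples into a derived stat such as time to first base. Everything about the data layout is an assumption made for illustration: the `(seconds, x_feet, y_feet)` sample format, home plate sitting at the origin, and the `time_to_first` helper are not SMT's actual feed or method. The sketch also simplifies by treating the runner's straight-line distance from home as the distance covered.

```python
import math

BASE_PATH_FT = 90.0  # distance from home plate to first base


def time_to_first(samples, contact_time=0.0):
    """Estimate when a runner reaches first base.

    samples: list of (seconds, x_feet, y_feet) with home plate at (0, 0).
    This layout is a hypothetical one, chosen only for the example.
    Returns seconds elapsed since contact_time, or None if the runner
    never covers 90 feet within the samples.
    """
    prev_t = prev_d = None
    for t, x, y in samples:
        d = math.hypot(x, y)  # straight-line distance from home plate
        if prev_d is not None and prev_d < BASE_PATH_FT <= d:
            # The runner crossed 90 feet between two samples, so
            # interpolate linearly to estimate the crossing time.
            frac = (BASE_PATH_FT - prev_d) / (d - prev_d)
            return prev_t + frac * (t - prev_t) - contact_time
        prev_t, prev_d = t, d
    return None


def on_first_base_line(distance_ft):
    """Place a point on the first-base line at the given distance from home."""
    unit = 1.0 / math.sqrt(2.0)
    return distance_ft * unit, distance_ft * unit


# Made-up samples, one per second, for a runner heading up the line.
samples = [
    (float(t), *on_first_base_line(d))
    for t, d in enumerate([0.0, 20.0, 45.0, 70.0, 95.0])
]

elapsed = time_to_first(samples)
print(f"Estimated time to first: {elapsed:.2f} seconds")
```

The arithmetic is simple, which is the point: some teams would rather do this kind of work themselves than have a vendor hand the same finished number to all 30 clubs.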

Each team’s makeup comes into play, too. Conviction and an appetite for risk are required for a team to chase new findings -- to literally put its money where its research and development staff says to put its money. Resolve is a necessity to keep plugging away, long after the latest competitive advantage has been wiped out by the rest of the league. The lone constant for front offices, especially those on the bleeding edge, is the hunt for the next big thing, which encompasses seeking out any and all potential advantages.

In a way, third-party data has created a collective action problem. Every team is behaving rationally by gobbling up as much information as possible. But if everyone is acting rationally, then, as the theory goes, everyone is getting worse results for their troubles.

Even so, the teams who resist partaking in the hunt are falling behind. Any front office doing the bare minimum, possibly to save face, is missing the point. Pursuing external data is no longer a choice; it is a necessity -- not to get ahead, but to keep up with most of the league.

The same forces that compel teams to push forward ensure that there’s a place for third-party data providers. Whether it’s old-school methods like game-charting and website scraping, or new-school technology ripped from a science fiction movie, teams will continue to seek out their products. All with the hope that they provide a street map to every team’s intended destination: the World Series.
