For me, writing a blog post is something I rarely do all in one sitting. On Tuesday I usually start brainstorming ideas and outlining what I want to cover, then I flesh out the post on Wednesday and Thursday, and finish it up on Friday and over the weekend. This week I knew I wouldn't have much time to write on Tuesday and Wednesday, so I probably should have started writing on Monday night, but I was a little busy watching the Yankees game.

But my night was not entirely without the mention of technology - throughout the game, there were mentions of statistics and other interesting information from MLB Statcast AI powered by AWS. As a baseball fan who watches game broadcasts sometimes, I've heard of MLB Statcast and seen/heard Statcast data used by broadcasters to discuss a player's capabilities and performance, as well as to predict what may happen with a given batter or play. I know that Statcast has changed the way people talk about game statistics, but I never really thought about what the technology behind it is. In fact, I always assumed that "powered by AWS" meant that AWS was a sponsor of the technology and used it as advertising, but didn't think the technology had anything to do with AWS.

Boy, was I wrong about that. After some encouragement from my brother, I decided to do a little quick research into the technology behind Statcast. It turns out, when they say "powered by AWS," they really mean powered by AWS - Statcast uses Amazon SageMaker, Amazon EC2, Amazon S3, AWS Lambda, Amazon CloudFront, and Amazon ElastiCache. In fact, AWS lists Major League Baseball as one of its featured case studies for Machine Learning/Artificial Intelligence, and if you go to https://aws.amazon.com/statcastai/, you can find a lot of information about how Statcast uses AWS products to discover new insights into the game. Powered by AWS isn't just a chance for AWS to get its name in front of baseball fans - it's Major League Baseball using AWS's machine learning products to create a new experience for fans, which is pretty interesting to me.

I'm not the biggest of machine learning enthusiasts, but I do find it somewhat interesting, so when I discovered this connection between machine learning and my favorite sport, I knew I had to find out a little more. One of the best resources I found about this was a video included in the MLB Advanced Media case study on Amazon, in which Joe Inzerillo, the CTO of (what was then) MLBAM, talks about the technology.

One of the first questions I had about the technology was how they get the information to the AWS services - and that question was answered in the video. The stadiums each have two stereoscopic machine vision devices and a Doppler radar installed, and data is collected from these devices at the ballpark and sent to the AWS cloud for processing. From what I understand, MLB had considered installing equipment to process the data at each stadium, but ultimately that would have been costly and the equipment would have sat unused from November (October in some stadiums) through April, so they decided instead to use a cloud solution.

Once the data is sent to AWS, points of data become players and plays through computations done in Amazon EC2 (Elastic Compute Cloud). The raw data is stored in Amazon S3 (Simple Storage Service) buckets, and information is temporarily stored in Amazon ElastiCache to allow quick retrieval for analysis. AWS Lambda is used to trigger event-based computing to analyze the data, because a lot of the computations being made depend on what data is coming in. The work being done in AWS Lambda is supplemented by some data analysis done in Amazon Kinesis. After the data has been collected and analyzed, it is distributed to the teams, the broadcasters, and the fans via Amazon CloudFront.
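To make the event-based piece a little more concrete for myself, I put together a rough sketch of what a Lambda-style function reacting to new data landing in S3 might look like. To be clear, this is my own guess and not MLB's actual code: the bucket name, the "radar/" and "camera/" key prefixes, and the step names are all hypothetical. The only real parts are the general shape of an S3 event notification (a list of "Records" with a bucket name and a URL-encoded object key) and the usual handler(event, context) entry point.

```python
from urllib.parse import unquote_plus


def route(bucket, key):
    """Pick an analysis step from the key prefix (prefixes are hypothetical)."""
    if key.startswith("radar/"):
        step = "ball-tracking"
    elif key.startswith("camera/"):
        step = "player-tracking"
    else:
        step = "ignored"
    return {"bucket": bucket, "key": key, "step": step}


def handler(event, context):
    """Lambda-style entry point: called with one S3 event notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(route(bucket, key))
    return results


if __name__ == "__main__":
    # A hand-written sample event, trimmed to the fields the handler reads.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-raw-data"},
                    "object": {"key": "radar/game+1/pitch%3A42.json"}}},
            {"s3": {"bucket": {"name": "example-raw-data"},
                    "object": {"key": "camera/game+1/frame-0001.bin"}}},
        ]
    }
    for result in handler(sample_event, None):
        print(result)
```

The point of the sketch is just that nothing runs until data shows up, and what runs depends on what kind of data it is, which is how I understand the "event-based" part of the setup.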

Some of the information broadcasters get from Statcast AI involves things that haven't even happened yet. Want to know the likelihood of the runner on first successfully stealing second? What about how a batter matches up against a pitcher he has never seen before? Statcast uses Amazon SageMaker to predict these and many more baseball probabilities. Machine learning opens up a new set of statistics to be discussed and analyzed, and MLB is using SageMaker to train models to predict new factors in the game.

This is just a basic overview of the technology that I was able to get through a little bit of research (it's hard to find time to do more detailed research during the playoffs). There's a lot more going on than I've described here, and if, like me, you're a baseball fan who loves technology, I think it's definitely worth doing some additional research into MLB Statcast AI and how it works. I've listed some resources below, but for a deeper dive into the technology, it may be worth doing your own Google search to find out what information is available. If you've used AWS's machine learning products before, you probably know a lot more than I do about how this all works, and can probably make some interesting inferences about how MLB is using the technology and what they can do with it next.

One of the things I love about technology is how it can have an impact on so many things, big and little. There are some things about the game of baseball that (in my opinion, at least) should never change, but the use of technology and the addition of Statcast to how the game is analyzed and broadcast has enhanced my experience as a fan. For example, while I think it is overused, exit velocity is an interesting statistic, and I'm glad it's something we can measure now - and that's just one of the many new pieces of information I hear during every game. I'm hoping that over the coming weeks, as the Yankees continue to work their way towards their 28th World Series Championship, I can continue to learn more about Statcast and how Major League Baseball leverages technology to change and enhance the game experience.

Major Resources Used In This Post