Analyzing Baseball Data with R

4.27 (44 ratings by Goodreads)
By (author) Max Marchi, By (author) Jim Albert


With its flexible capabilities and open-source platform, R has become a major tool for analyzing detailed, high-quality baseball data. Analyzing Baseball Data with R provides an introduction to R for sabermetricians, baseball enthusiasts, and students interested in exploring the rich sources of baseball data. It equips readers with the necessary skills and software tools to perform all of the analysis steps, from gathering the datasets and entering them in a convenient format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to the type of data structures and exploratory and data management capabilities of R. They also cover the traditional graphics functions in the base package and introduce more sophisticated graphical displays available through the lattice and ggplot2 packages. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, runs expectancy, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and fielding measures. Each chapter contains exercises that encourage readers to perform their own analyses using R. All of the datasets and R code used in the text are available online.

This book helps readers answer questions about baseball teams, players, and strategy using large, publicly available datasets. It offers detailed instructions on downloading the datasets and putting them into formats that simplify data exploration and analysis. Through the book's various examples, readers will learn about modern sabermetrics and be able to conduct their own baseball ...

Product details

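  • Format: Paperback | 352 pages
  • Dimensions: 154.94 x 233.68 x 27.94mm | 476.27g
  • Publisher: Taylor & Francis Inc
  • Imprint: CRC Press Inc
  • Publication City/Country: Boca Raton, United States
  • Language: English
  • Illustrations note: 50 black & white illustrations, 18 black & white tables
  • ISBN10: 1466570229
  • ISBN13: 9781466570221
  • Bestsellers rank: 529,568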

About Max Marchi and Jim Albert

Max Marchi is a baseball analyst with the Cleveland Indians. He was previously a statistician at the Emilia-Romagna Regional Health Agency. He has been a regular contributor to The Hardball Times and Baseball Prospectus websites and has consulted for MLB clubs.

Jim Albert is a professor of statistics at Bowling Green State University. He has authored or coauthored several books and is the editor of the Journal of Quantitative Analysis of Sports. His interests include Bayesian modeling, statistics education, and the application of statistical thinking in ...

Review quote

"There are some great resources out there for learning R and for learning how to analyze baseball data with it. In fact, a few pretty smart people wrote a fantastic book on the subject, coincidentally titled Analyzing Baseball Data with R. I can't say enough about this book as a reference, both for baseball analysis and for R. Go and buy it." -Bill Petti, The Hardball Times, September 2015

"The authors present a potpourri of well-conceived case-studies that give insight into both the game's complexity and R's simplicity. Virtually no previous knowledge of statistical theory and software is required to master the data analyses and to follow the explications in this book ... The authors' style of writing is pleasurable and bespeaks their passion for the game. Narratives and R commands are so smoothly intermingled that the source code hardly disturbs the flow of reading, and a wealth of graphs break up the grey. ... A great asset of the book is that it encourages the reader to learn the ropes of sabermetrics by actually running the example analyses on one's own computer." -Journal of the Royal Statistical Society, Series A, 2015

"If you are interested in statistics, especially baseball statistics, you will find this book fascinating and very useful. It provides many details, websites, and useful descriptions for using the R programming environment. This is not only a book on statistics; there are many references to famous player statistics, making this a very enjoyable book to read. And even if you don't like baseball but still find statistics very exciting, then this book provides a great introduction to R that can be used for any other type of statistical data set." -IEEE Insulation Magazine, November/December 2014

"I have spent most of the past decade working in baseball as a statistical analyst for the New York Mets. ... This type of employment can be highly valued, especially among quantitatively inclined college students who are coincidentally passionate baseball fans. It is from these students from whom I am most frequently asked, 'what book would you recommend for someone who wants to get started in sabermetrics?' Invariably, my response has been [Jim Albert and Jay Bennett's] Curve Ball. I have a new response. ... I always felt that Curve Ball was the best place for a budding sabermetrician to start ... However, it later dawned on me that while Curve Ball provided a sound framework for thinking probabilistically about baseball, I devoted a huge proportion of my time at work to computer programming. ... In their new book, Albert and Max Marchi, a native Italian who now works for the Cleveland Indians, have closed the loop by offering the aspiring sabermetrician a blueprint. ... The reader who digests this book alongside her keyboard will emerge as a practicing sabermetrician - having knowledge of the key ideas in sabermetric theory, a historical understanding of from whence those ideas came, and the practical ability to compute with baseball data. It is a sabermetric workshop in paperback." -Ben S. Baumer, International Statistical Review (2014), 82

Table of contents

  • The Baseball Datasets: Introduction; The Lahman Database: Season-by-Season Data; Retrosheet Game-by-Game Data; Retrosheet Play-by-Play Data; Pitch-by-Pitch Data
  • Introduction to R: Introduction; Installing R and RStudio; Vectors; Objects and Containers in R; Collection of R Commands; Reading and Writing Data in R; Data Frames; Packages; Splitting, Applying, and Combining Data
  • Traditional Graphics: Introduction; Factor Variable; Saving Graphs; Dot Plots; Numeric Variable: Stripchart and Histogram; Two Numeric Variables; A Numeric Variable and a Factor Variable; Comparing Ruth, Aaron, Bonds, and A-Rod; The 1998 Home Run Race
  • The Relation between Runs and Wins: Introduction; The Teams Table in Lahman's Database; Linear Regression; The Pythagorean Formula for Winning Percentage; The Exponent in the Pythagorean Formula; Good and Bad Predictions by the Pythagorean Formula; How Many Runs for a Win?
  • Value of Plays Using Run Expectancy: The Runs Expectancy Matrix; Runs Scored in the Remainder of the Inning; Creating the Matrix; Measuring Success of a Batting Play; Albert Pujols; Opportunity and Success for All Hitters; Position in the Batting Lineup; Run Values of Different Base Hits; Value of Base Stealing
  • Advanced Graphics: Introduction; The lattice Package; The ggplot2 Package
  • Balls and Strikes Effects: Introduction; Hitter's Counts and Pitcher's Counts; Behaviors by Count
  • Career Trajectories: Introduction; Mickey Mantle's Batting Trajectory; Comparing Trajectories; General Patterns of Peak Ages; Trajectories and Fielding Position
  • Simulation: Introduction; Simulating a Half Inning; Simulating a Baseball Season
  • Exploring Streaky Performances: Introduction; The Great Streak; Streaks in Individual At-Bats; Local Patterns of Weighted On-Base Average
  • Learning about Park Effects by Database Management Tools: Introduction; Installing MySQL and Creating a Database; Connecting R to MySQL; Filling a MySQL Game Log Database from R; Querying Data from R; Baseball Data as MySQL Dumps; Calculating Basic Park Factors
  • Exploring Fielding Metrics with Contributed R Packages: Introduction; A Motivating Example: Comparing Fielding Metrics; Comparing Two Shortstops
  • Appendix A: Retrosheet Files Reference
  • Appendix B: Accessing and Using MLBAM Gameday and PITCHf/x Data
  • Bibliography
  • Index

Further Reading and Exercises appear at the end of each ...

Rating details

44 ratings
4.27 out of 5 stars
  • 5 stars: 43% (19)
  • 4 stars: 43% (19)
  • 3 stars: 11% (5)
  • 2 stars: 2% (1)
  • 1 star: 0% (0)