College Splits > About

About the Data

Our splits and play-by-play data doesn't match official totals perfectly, and in many cases it may never do so. However, our foremost goal isn't to
perfectly replicate official totals; it's to provide something of value beyond that, hence the exhaustive split and situational statistics.

That said, you may be interested to know why our data doesn't match, and whether the problems can be solved. There are three main issues involved here:

1. Data agreement. Most of the time, the play-by-play descriptions we get our data from weren't compiled by the official scorer. Thus, judgment calls (errors, fielder's
choices, earned/unearned runs and RBIs) may not be correct. Further, it's likely that there are more than a handful of mistakes, simply because humans are scoring the games.

2. Missing data. Hard as we try, we don't have play-by-play for every single Division 1 game. A number of even the most prominent Division 1 baseball schools still
do not provide the necessary data. But we're close: a month into the season, we have well over 95% of games accounted for, and we're constantly hunting down missing games.
Best of all, there are very few missing games among the top 30-50 teams. But despite all that, 100% isn't going to happen.

3. Processing errors. Obviously, it's a lot easier to score a game and count up runs, hits, and RBIs than it is to write a program that parses
play-by-play descriptions and extracts those stats from the log. We're constantly tweaking our software, but there are probably mistakes in there, particularly with judgment-call
stats such as earned runs and RBIs. Events on the bases (SBs, CSs, etc.) can also be problematic.
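
To give a rough sense of why this kind of processing is error-prone, here is a minimal Python sketch of pulling hits and RBIs out of play-by-play text. It is not our actual software. The line format, the HIT_VERBS table, and the function names parse_play and tally are all assumptions made up for illustration, and real play-by-play wording varies from school to school.

```python
import re
from collections import defaultdict

# Hypothetical verbs that count as hits; real play-by-play wording varies.
HIT_VERBS = {"singled": "1B", "doubled": "2B", "tripled": "3B", "homered": "HR"}

# Matches "2 RBI", "3 RBI", or a bare "RBI" (treated as one).
RBI_PATTERN = re.compile(r"(?:(\d+)\s+)?RBI")


def parse_play(line):
    """Return (batter, hit_type or None, rbi_count) for one play description.

    Assumes the first word is the batter's last name and the second word
    is the verb describing the play. Returns None for lines that are too short.
    """
    words = line.split()
    if len(words) < 2:
        return None
    batter = words[0]
    verb = words[1].rstrip(",.;")
    hit_type = HIT_VERBS.get(verb)
    rbis = 0
    match = RBI_PATTERN.search(line)
    if match:
        rbis = int(match.group(1)) if match.group(1) else 1
    return batter, hit_type, rbis


def tally(lines):
    """Sum hits and RBIs per batter across a list of play descriptions."""
    totals = defaultdict(lambda: {"H": 0, "RBI": 0})
    for line in lines:
        parsed = parse_play(line)
        if parsed is None:
            continue
        batter, hit_type, rbis = parsed
        if hit_type is not None:
            totals[batter]["H"] += 1
        totals[batter]["RBI"] += rbis
    return dict(totals)


# Made-up sample log, not real game data.
sample_log = [
    "Smith singled to left field, RBI; Jones scored.",
    "Jones doubled to right center, 2 RBI.",
    "Smith struck out swinging.",
    "Smith reached on an error by ss.",
]
totals = tally(sample_log)
```

Even this toy version shows where things go wrong: a two-word last name, an unfamiliar verb, or a log that never mentions the RBI explicitly would all produce a wrong count without any warning.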
