Published on January 28, 2019 8:41 PM by dbo.
CFLdb is happy to introduce Canadian Football Datasette as a resource for Canadian Football researchers. The datasette provides a data source to perform queries on game data and export the results, or export the available games for study and research using their own (advanced) tools and techniques. In addition, the datasette site provides a simple API to interface with the data programatically for advanced users. This experiment is provided to the community with the request for feedback to determine the value and feasibility of expanding the information in the datasette.
The datasette consists of regular season and playoff games of the IRFU, WIFU, CFC and CFL along with playoffs games for other leagues competing for the Grey Cup (including service teams during the Second World War years), all Grey Cup games, and the Dominion Championship games for 1907 and 1908 for a total of 6,264 games. This is roughly equivalent to the regular season games for the year-by-year standings and playoff game results listed in the CFL Guide and Record Book (previously CFL Facts, Figures and Records). There may be playoff games included or missing in the datasette compared to the 2018 CFL Guide and Record Book; see below for how to report corrections.
The information available for each game includes year, game number, week, game event, game date/time, visitor team, visitor score, home team, home score, overtime, quarter scores by team (when available) and more. See below for detailed list of fields and explanation and caveats on their contents.
The information available consists of the 1907-2011 standard game information provided to CFLdb Statistics by the CFL in 2012. Data from 2012 to 2018 has been compiled from official CFL sources. Updates in upstream sources for historical data may not have been updated in the published data. Where data is not available, fields will be empty. See the field list below for details on which fields may not contain information for all games. Data is provided as-is and without warranty. Consumers of the data accept the terms of the site, and accept their responsibility for validating data or accept an unknown level of inaccuracy in the data.
The data is licensed with the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. This requires all published usage of the data to attribute the source, and if you remix, transform or build upon the data, you must distribute your contributions under the same license. This is not optional, using the data in any works requires attribution and distributing the data under the same license if you remix, transform or build upon the data in any way. If this is not agreeable, then the Canadian Football Datasette data should not be downloaded and used.
Canadian Football Datasette is a read-only database of Canadian Football game data fronted by an interface for querying the information. At the introduction of the datasette, there is a single source of information to query — games. From the home page, click on the games link to see the data in the games source.
From the games source, all games are shown by default. The results can be paged through with 100 results displayed per page. To get the most out of a datasette, filtering the data source provides a way to explore the data, search for patterns and examine smaller result sets.
There are two methods of filtering or querying information in sources: the simple interface and the SQL interface.
The simple interface allows filtering by multiple fields and ordering the data. At the top of the page, select a column, comparison and value to filter by. Multiple fields can be filtered on to build complex queries; click the Apply button to apply the filter, and another field filter row will be added to further narrow the results if required. Sorting can be applied to a single field only. There is no limit on the results for the simple interface, all results will be available, paged in 100 game increments. The page forward and backward buttons are found at the bottom of the result table.
By default all columns are returned in the result set, and this cannot be changed.
Facets provide an interface for quickly exploring and filtering the data. A list of fields suggested for facets appear above the result table. Clicking on a facet will show a list of values for that field along with a count of rows with that value. Clicking on a value from that list will show only records for that value, allowing fast drill-down into the data. Suggested facets are listed for fields with few unique values. If more than 30 unique values exist, only the most frequent 30 values will be shown and an ellipsis will appear at the end of the list to indicate there are more values not listed. Facets can be combined with filters to quickly explore subsets of data. For instance, the year field is not facetted (there are over 100 years in the data source), but a filter can be used to select all years less than 1958, for example, and then the home team facet used to show the counts for each team, with the ability to quickly filter to each team’s home games.
Note: The SQL Interface is for advanced, technical individuals only. No support is provided. While technical minded self-learners are encouraged to explore the SQL interface using the many resources available on the Internet, others should avoid frustration and stick to the simple interface.
To initiate the SQL Interface, click the View and Edit SQL button from the game data source page. This can be done after any simple interface filter is applied to use those filters and sorting as a starting point for the SQL statement in the editor.
The SQL statement editor allows the use of any valid SQL SELECT statement to show results. This includes limiting fields displayed, filtering using WHERE clause, ordering results by multiple fields, using aggregate functions and GROUP BY, and performing calculations.
For example, the following query will show all games where the home team score more than 42 points than the visiting team, ordering the results by year, restricting the returned columns, and adding a calculated column with the score differential.
select year, game_event, game_datetime, visitor, home, (home_score - visitor_score) as score_differential from game where (home_score - visitor_score) > 42 order by year
SQL queries are limited to a maximum 2000 non-paged results. Queries should be used for drilling into the data, not returning all results. If a condition occurs more than 2000 times, it is considered not unique enough to be targeting the condition. Queries can use the LIMIT statement to limit results, and page through results manually. See the documentation links below for more information. If this limitation is a problem, consider downloading the data and importing into the tool of your choice to perform the data manipulation needed.
Experienced SQL users will be able to use the SQL interface to create advanced queries to explore and mine the data source for answers to almost any question found in the data.
The complete data or subset from any filter/query can be downloaded in json or csv format. This is recommended for technical individuals only and how-to instructions are not detailed here.
For more information on advanced datasette topics, see the following (for advanced, technical individuals only):
The Canadian Football Datasette is provided as an experiment. As such, there are no guarantees over the availability of the site nor its long term future (see below). While open to the public, the site is prevented from being indexed by search engines at this time. If the datasette initiative continues after its experiment period, I commit to updating it with the results of new seasons following the conclusion of each season.
Civil and thoughtful feedback is requested from the research community. Is the datasette a helpful resource? What information is missing that would make the datasette more valuable? What other areas, beside games, would provide the most value in a public datasette?
Think that data is included or excluded in error? Found an issue with data in the collection? Courteous and civil corrections are welcome. Here is an example:
I’ve been using the Canadian Football datasette and believe it is missing these games from the collection that the CFL considers part of its and Grey Cup history.
[list of games]
My source for this is the [name/link source]. I believe these games should be included to make the datasette match official sources.
Make adjustments to the above if the correction is a deletion or report of inaccurate data, and feel free to paraphrase, but be civil, humble, concise and complete. Reports can be made by email or the contact form found on the main CFLdb contact page. All reports should clearly indicate the games by id, date and participants if in the database, and include all data if a requested addition (a csv file is acceptable for this). I will not perform research to fill in data for partial addition requests, nor guess or read minds as to what games in the database are being referred to.
A list of fields, their purpose and other details on the data contained is found below. All fields without values are blank/null. This may be because the information is not available (attendance, quarter scores), or because it is not applicable (game_series for regular season games).
CFLdb owes its existence to the Canadian Football research community and there have been many requests for making the data on CFLdb Statistics available to researchers and developers to create their own works. The Canadian Football Datasette is the first step in providing the data and an API for researchers to try. Feedback from this initiative will drive whether the datasette direction is continued and expanded. Finding a way to make the data public and accessible with the minimal amount of technical knowledge is the goal, and an assessment on the results of this experiment will be made in June, 2019. A lack of feedback will indicate a lack of interest and usage, and the datasette will disappear and I will return back to the drawing board in search of a better idea to accomplish this goal. Feedback that provides improvements, better ideas, or clearer understanding of the needs will lead to improvements in the datasette or a pivot towards something better.