February 2019 Release Notes

Published on March 2, 2019 3:42 PM by dbo.

The February 2019 release notes detail the small number of updates made to the CFLdb family of sites this month. Read on to learn more.

The following changes have been made to CFLdb Statistics since the last report:

  • Data: Fixed September 13th, 1980 Edmonton @ BC attendance number (from 21,980 to 30,793), recompiled 1980 attendance aggregate data (thanks Colin!).
  • Data: Various other data updates (900 updates, additions and deletions this month).
  • Bug: Fixed a bug that caused a text name search to be wrongly parsed as a date and return date results.
  • Bug: Fixed two bugs that caught possible error conditions rather than returning an error.

Along with numerous other cosmetic, data, performance, administrative, or infrastructure related bug fixes and enhancements (5 commits this month).

Thanks to contributor Colin this month.

Portrait of a Contributor

This month’s contributor, Colin Gardiner, showcases the traits of a community of strangers who are building a source of Canadian Football data available to all, owned by all.

Like many across this nation, Colin attended one CFL game a year during his teenage years. Each year Colin and his dad would travel from Victoria to a Lions game at Empire Stadium against the powerful Eskimos, who always drew a crowd. Colin took notes on the experiences, including the attendance and atmosphere. When reviewing CFLdb Statistics, Colin noticed the attendance listed for the Sept. 13th, 1980 game he attended did not match his memory, so he referred to the notes he had kept all this time. They confirmed a “crowded” stadium and a virtual sellout, which didn’t jibe with the number listed for that one game, though all the other games he had notes on matched the site. Colin reached out and graciously explained the situation, providing the game and details. I was easily able to confirm there was an error by checking sources, which matched Colin’s number, and then correct the information.

Finding this data entry error among thousands of games through audits of any kind is an almost impossible task for one person. Such an error, unbeknownst to me, would never have been found. Yet it was found and corrected, because one person saw it, suspected it was wrong and reported it. That game page now has the correct attendance, and the 1980 season attendance figures are updated to reflect the corrected info. The story of how it was found and corrected is fantastic to me and shows the power of crowdsourcing.

CFLdb has a base of information sourced from the CFL, from my records, from other major contributions by researchers, and from contributions by many individuals. Game data provided by the CFL has been electronically audited against existing data and is the most reliable. Attendance data has been contributed by CFLdb and other individuals, which leads to potential issues both in the original capture of the information and in its data entry into CFLdb Statistics.

Individuals can contribute so much by finding these small errors or contributing missing information. In theory, the small contributions from all individuals could add up to or exceed the total contributions from the major contributors. Crowdsourcing the information in this way will complete and validate the record in a way no individual can. I do not profit from this data; I sell no ads or memberships and pay all site costs out of my pocket. This ensures that contributors feel they are at the same level as the steward of the site, and are not being exploited for their time and research. The Canadian Football Datasette is the first experiment to provide this data back to the community for use and consumption, and I encourage all to test it out and provide feedback before the June deadline.

What about you? Maybe you don’t have any notes. Maybe the data you have collected is already present. There are other ways to contribute. Wonder why the site is missing information like Miss Grey Cup winners, lists of team trainers, broadcast crews or awards like Friday Night Gladiator (the requested list goes on)? If you have the information, contribute it. I believe the number of data collectors and note takers like Colin out there is greater than the number made known to others. If you aren’t or weren’t a collector, but feel the need to complete the record, go do the research to contribute to the site. If a reasonably complete set of data is contributed for an area not currently present, rather than a single item (e.g. a single Miss Grey Cup winner), I will see it gets added so others can benefit and add to it. In this way, one source for the football record gets completed. Not interested in historical research, but still want to contribute? Validation of the information in the FAQs and other resources is always welcome. Please reach out, make yourself known and name your interests. Finding roles for the willing is never an issue. Contributors can always decide to remain anonymous if they choose, and make no commitment to complete any work they express interest in.

Datasette Update

About one month into the Datasette experiment, there are some things to report. First, in terms of usage, 35 unique individuals visited the Datasette site, with only a handful of these spending enough time to put the site through its paces.

One visitor provided feedback on the site. Dial H. respectfully made a case for normalizing (a database term) the score data so it could be searched by team without distinguishing between home or visitor location.

The request was considered, but ruled out at this time for a number of reasons:

  1. Presently, for a single game a team can be in either the home/home_id or the visitor/visitor_id columns (but not both), with the corresponding score data in the home_score/visitor_score columns. Moving this data to another table with team_id, score and location (home/visitor) columns would allow the query Dial wishes, but it means two rows would be returned for each game unless some fancy querying is used (outside the capability of the web interface). This would take the games data away from the experience most people expect.
  2. The main concern was needing two queries to find all games for a team with specific score criteria. This is true for the web interface, but the effort needed to run the query on visitor_id and then change to home_id is minimal. There are two result sets (and two exports, if needed); however, this additional effort provides valuable information (occurrences distinguished between home and visitor), even if not required by the researcher. Through the SQL interface, an OR condition in the WHERE clause or a UNION query can accomplish the task in one query, so those with the knowledge, or who are willing to seek help on the advanced interface, have a way to conduct this research.
  3. Changing the structure of the data schema for the Datasette is not desired at this time. If more people express value in this data format, it may be added as a transform beside the existing format. However, I would prefer others download the Datasette, transform the data themselves, and publish that to the community as their contribution. I believe working with the data helps expose patterns and challenge the assumptions of researchers, as opposed to simply asking questions (queries) to confirm presumed answers (confirmation bias). This may not be true for simple queries like the proposed example, but it comes into play when amateur researchers take on more complex questions without thinking to challenge their assumptions or confirm their results. I prefer researchers do some of the work themselves rather than wishing others would provide the data and tools for them to query. Doing so also allows them to recognize the effort made in making the data available, versus naively believing all data is captured and available on the Internet without any human labour. This understanding makes them more likely to contribute back, rather than disregarding the efforts of others that they benefit from because they don’t believe a person was involved.
  4. With the above reasons, the data is presented in the traditional format used by the CFL and other researchers. That format must be made available to maintain the ability to exchange data and updates.
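
The single-query options mentioned in point 2, and the two-rows-per-game effect described in point 1, can be sketched as follows. This is a minimal illustration only, not the Datasette's actual schema: the table name games, the game_id column, the team ids and all scores are invented here, and only the home_id, visitor_id, home_score and visitor_score column names come from the discussion above. It uses Python's built-in sqlite3 module against a throwaway in-memory database.

```python
import sqlite3

# Hypothetical miniature of the games data. The table name, game_id column,
# team ids and scores are invented for illustration; only the home/visitor
# column names are taken from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (game_id INTEGER PRIMARY KEY, home_id TEXT, "
    "visitor_id TEXT, home_score INTEGER, visitor_score INTEGER)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
    [
        (1, "BC", "EDM", 21, 35),
        (2, "EDM", "BC", 40, 17),
        (3, "BC", "WPG", 38, 10),
        (4, "SSK", "BC", 12, 36),
        (5, "EDM", "WPG", 45, 3),
    ],
)
params = {"team": "BC", "min_score": 35}

# Option 1: one WHERE clause with an OR condition covering both locations.
or_sql = """
SELECT game_id FROM games
WHERE (home_id = :team AND home_score >= :min_score)
   OR (visitor_id = :team AND visitor_score >= :min_score)
ORDER BY game_id
"""

# Option 2: UNION ALL of the home query and the visitor query, which also
# keeps track of where the team was playing.
union_sql = """
SELECT game_id, 'home' AS location, home_score AS score FROM games
WHERE home_id = :team AND home_score >= :min_score
UNION ALL
SELECT game_id, 'visitor' AS location, visitor_score AS score FROM games
WHERE visitor_id = :team AND visitor_score >= :min_score
ORDER BY game_id
"""

# The normalized shape from point 1: every game appears twice, once per team.
normalized_sql = """
SELECT game_id, home_id AS team_id, home_score AS score, 'home' AS location
FROM games
UNION ALL
SELECT game_id, visitor_id, visitor_score, 'visitor' FROM games
"""

or_rows = conn.execute(or_sql, params).fetchall()
union_rows = conn.execute(union_sql, params).fetchall()
normalized_rows = conn.execute(normalized_sql).fetchall()

print(or_rows)               # games where the team scored at least min_score
print(union_rows)            # same games, with location and score
print(len(normalized_rows))  # twice the number of games
```

With this made-up data, both options find the same two games in one query, while the normalized form returns ten rows for five games, which is the doubling that point 1 is concerned about.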

Overall, 35 trials and feedback from one individual is not a bad start considering it was only mentioned in an introductory article and release notes. Part of the experiment is to see how this offering will spread through the research community, or if the community is too loosely knit and protective to work together on source information available to all.

For suggestions to be acted upon, I need substantial indication from other researchers that the suggestion is valuable before I commit time to make changes. I want to spend my limited time delivering the most value to all researchers, not making changes useful to a single person only. This means those with suggestions need to drum up support in the community and get others to voice their support for the proposal to CFLdb (see below for contact info).

Contributions, Corrections and Suggestions

Send all contributions, corrections and suggestions to me via methods found on the Contact page. All contributions are credited, with more prominent credit for the largest statistics contributors on the CFLdb Statistics home page. The contributor list will, therefore, reflect the true owners of the site, those that research, compile and contribute data and corrections.

Meta

This article is categorized under cfldb.ca and tagged with release-notes.