Exploratory Data Analysis of Chicago Club Attendance

Ping Hu was gracious enough to share the data set that he used for some of his recent work on bridge ratings. I used it to create an Exploratory Data Analysis (EDA), which is contained in the PDF file linked in this article.

Here is the Too Long / Didn't Read (TLDR) version:

I focused my analysis on a data set that contains four years' worth of data tracking which partnerships played at which clubs during this stretch of time. The dataset included:

  • 4,029 unique games that occurred at different clubs
  • 13,039 unique partnerships

Based on this dataset, it appears that, at the partnership level, there is very little diffusion across clubs. Even in a fairly dense urban center like Chicago, bridge is played in a relatively small number of closed groups, with the same partnerships contesting the same events. [Note: To some extent, this is an artifact of the data set, which ONLY includes club games. It doesn't include regionals, sectionals, and the like, where we would expect to see much more mixing across players.]
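One way to put a number on this lack of diffusion is to count how many distinct clubs each partnership appears at. The sketch below assumes the data can be reduced to (partnership, club) pairs, one per session played; that record layout, the function names, and the toy data are hypothetical and are not taken from Ping's data set or from the EDA.

```python
from collections import defaultdict


def clubs_per_partnership(records):
    """Map each partnership to the set of distinct clubs where it played.

    `records` is an iterable of (partnership_id, club_id) pairs, one per
    session. This layout is an assumption made for illustration.
    """
    clubs = defaultdict(set)
    for partnership, club in records:
        clubs[partnership].add(club)
    return clubs


def single_club_share(records):
    """Fraction of partnerships that were only ever seen at one club."""
    clubs = clubs_per_partnership(records)
    if not clubs:
        return 0.0
    single = sum(1 for seen in clubs.values() if len(seen) == 1)
    return single / len(clubs)


# Hypothetical toy records, NOT drawn from the Chicago data set.
toy = [
    ("A-B", "club1"), ("A-B", "club1"),
    ("C-D", "club1"), ("C-D", "club2"),
    ("E-F", "club3"), ("G-H", "club3"),
]
print(single_club_share(toy))  # 0.75
```

A share close to 1.0 on real data would correspond to the "closed groups" pattern described above.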

To me, at least, this leads to a number of observations.

First: This significantly increases the difficulty of creating useful ratings systems and is potentially an argument against creating a ratings scheme.

Second: If bridge is (primarily) being played in isolated populations, then this should be reflected in the ratings structure. For example, it might make sense to have a three-part ratings structure. Players would have:

  • One rating that is specific to their local club
  • A second rating that is specific to performance in regionals / sectionals
  • A third rating that is specific to results in national events

It's entirely possible that the sectional / national level ratings should be specific to the class of events that you are playing in. [You don't have the same player base competing in both the Gold Rush and the Blue Ribbon Pairs.]

Please note: I know that lots of people are going to hate this suggestion and say that it's too complicated, yada yada yada. Balanced against this:

  1. This is consistent with reality: it reflects the way that the game is actually played
  2. This will yield more accurate results within the various bands
  3. This will simplify the data storage and processing requirements by orders of magnitude
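A rough sketch of how the three-part structure might be stored, assuming each tier is split into independent pools (a club id for club ratings, an event class for tournament ratings). The class, tier names, pool keys, and the 1500 starting rating are all hypothetical, and the sketch says nothing about how ratings would actually be computed. It only shows that each pool holds just the players who compete in it, which is where the storage and processing savings would come from.

```python
from enum import Enum

# Hypothetical starting value for a player with no history in a pool.
DEFAULT_RATING = 1500.0


class Tier(Enum):
    CLUB = "club"                              # rating specific to one local club
    SECTIONAL_REGIONAL = "sectional_regional"  # sectional / regional results
    NATIONAL = "national"                      # national events


class TieredRatings:
    """Keep an independent rating pool for each (tier, pool_key) pair.

    pool_key is a club id for Tier.CLUB, or an event class (for example a
    hypothetical "gold_rush" or "blue_ribbon" label) for tournament tiers.
    """

    def __init__(self):
        # (tier, pool_key) -> {player_id: rating}
        self._pools = {}

    def rating(self, tier, pool_key, player):
        # A player unseen in this pool starts at the default; ratings held
        # in other pools are deliberately ignored.
        return self._pools.get((tier, pool_key), {}).get(player, DEFAULT_RATING)

    def set_rating(self, tier, pool_key, player, value):
        self._pools.setdefault((tier, pool_key), {})[player] = value

    def pool_size(self, tier, pool_key):
        # Number of players who have a rating in this pool.
        return len(self._pools.get((tier, pool_key), {}))


store = TieredRatings()
store.set_rating(Tier.CLUB, "club1", "alice", 1620.0)
print(store.rating(Tier.CLUB, "club1", "alice"))            # 1620.0
print(store.rating(Tier.NATIONAL, "blue_ribbon", "alice"))  # 1500.0
```

Keeping the pools separate means a strong club rating does not leak into, say, a national event-class rating; whether that is desirable is exactly the design question raised above.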


Third: I think that this illustrates why it is critical that the accuracy of various predictive models be evaluated against data sets that are consistent with the way in which the model is being applied. It's much easier to train a model against a data set like the Chicago set than it is to get good results against a data set that contains a radically unbalanced mixture of club games and more evenly distributed events like sectionals and regionals.
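To illustrate the evaluation point, here is a minimal sketch that reports a model's accuracy separately for each class of event as well as overall. The row format (segment label, predicted result, actual result), the segment names, and the toy numbers are assumptions for illustration only, not results from any real model.

```python
from collections import defaultdict


def accuracy_by_segment(rows):
    """Return {segment: accuracy}, plus an "overall" entry.

    Each row is (segment, predicted, actual). The segment label, e.g.
    "club" or "tournament", is a hypothetical tag for the event class.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, predicted, actual in rows:
        # Count every row both in its own segment and in the overall total.
        for key in (segment, "overall"):
            totals[key] += 1
            hits[key] += predicted == actual
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical toy data: club rows dominate, tournament rows are scarce.
toy = (
    [("club", "A", "A")] * 9 + [("club", "A", "B")]
    + [("tournament", "A", "B"), ("tournament", "B", "B")]
)
scores = accuracy_by_segment(toy)
print(scores["club"], scores["tournament"])  # 0.9 0.5
print(round(scores["overall"], 3))           # 0.833
```

On this toy data the overall figure looks respectable even though the model does no better than a coin flip on the tournament rows, because the club rows swamp the sample.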

