
Early this year I was asked by the ACBL BOD Bridge Committee to use ACBL tournament data to do a ratings study. The details of my rating system were presented in a couple of earlier posts on this site. For people who are interested in the theoretical aspects of rating systems, the following article from Bridge World is a good start.

https://www.bridgeworld.com/indexphp.php?page=/pages/readingroom/esoterica/bridgeratingsystem.html

My methodology is similar but uses a slightly different formula. This system has been used by the Common Game for four years; now I am applying it to ACBL tournament data. I'm going to present part of the results from this study here, since another recent post generated a lot of interesting discussion about ratings.

First, I want to highlight some characteristics of this rating system:

- The rating is at the pair/partnership level, not for individual players.
- The rating is calculated from board-level results: each board is compared against the other tables that played the same board.
- IMPs are separated from Matchpoints: a pair can have different IMP and MP ratings.
- The rating is a relative scale used to measure players' strength. For example, a difference of 400 rating points gives the higher-rated pair about a 10% edge per board in Matchpoints (or about 1 IMP/board at IMP scoring).
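
To make the scale concrete, here is a minimal sketch of how a rating difference could be translated into a per-board expectation, assuming the linear correspondence quoted above (400 points ≈ a 10% matchpoint edge). The function names are my own illustration, not part of the actual system.

```python
# Sketch: reading the relative rating scale, assuming the linear
# correspondence stated above (400 rating points ~ a 10% matchpoint edge).
# Function names are illustrative placeholders, not the actual system.

def expected_mp_edge(rating_diff: float) -> float:
    """Expected matchpoint advantage (fraction of a board)
    for the higher-rated pair, given the rating difference."""
    return 0.10 * rating_diff / 400.0

def expected_board_score(rating_diff: float) -> float:
    """Expected matchpoint score on one board (0.5 = field average)."""
    return 0.5 + expected_mp_edge(rating_diff)

print(expected_board_score(400))  # a 400-point gap -> about a 60% board
print(expected_board_score(0))    # equal ratings -> 50%
```

Under this reading, a 240-point gap corresponds to about a 6% edge, which matches the session-score spread discussed later.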

The data used in this study comes from the ACBL tournament database on AWS. Thanks to the ACBL IT department, in particular Julia McGaughy, Tim Crosas, and Mark Turnage, for helping me get access to this data and making this study possible.

For this study, the data includes ACBL tournament data from the 2014 Fall NABC to the 2019 Summer NABC. The tournaments include NABCs, NAP district finals, regionals, and sectionals. Cruise tournaments and online games are excluded, as are tournaments limited to NLMs or Seniors. Because only pair games had board-level results available in the ACBL database, the rating uses data from pair games only. Further filters were applied to limit the data to selected events: for NABCs, all nationally rated events are included; for NAP district finals, only Flight A and Flight B; for other NABC, regional, and sectional events, only open events and mid-flight events with at least 3 flights. Gold Rush and all I/N games were excluded.

This data set covers about 88,600 players and 173,000 different pairs, of which 28,000 had a regular rating (see definition below).

**Rating Basics and Initial Seeding**

As previously described, the rating algorithm starts by calculating each player/pair's expected score per board based on their pre-game rating. It then compares this with the actual score and generates an adjustment. After each game, the total adjustment is applied to each pair to produce a post-game rating. Once a player/pair has played 200 boards, the rating becomes a regular rating; with fewer than 200 boards, it is considered a provisional rating.
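
The update cycle described above could be sketched roughly as follows. The K-factor, the data layout, and the expected-score function here are placeholders of mine, not the actual formula used in the study; the expected-score placeholder just reuses the linear 400-points-per-10% scale mentioned earlier.

```python
# Rough sketch of the per-game update described above.
# K, the data layout, and expected_score are illustrative placeholders,
# not the actual formula used in the study.

REGULAR_THRESHOLD = 200  # boards needed before a rating counts as "regular"

def expected_score(rating: float, field_rating: float) -> float:
    # Placeholder mapping the rating gap to an expected matchpoint
    # fraction per board (clamped to 0.0 - 1.0); 400 points -> +0.10.
    return min(1.0, max(0.0, 0.5 + (rating - field_rating) / 4000.0))

def post_game_rating(rating: float, boards, k: float = 8.0) -> float:
    """boards: list of (actual_score, field_rating) tuples, one per board."""
    adjustment = 0.0
    for actual, field in boards:
        adjustment += k * (actual - expected_score(rating, field))
    # The total adjustment is applied once, after the game.
    return rating + adjustment
```

A pair's rating would then switch from provisional to regular once its cumulative board count reaches `REGULAR_THRESHOLD`.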

For initial seeding, players were assigned an initial rating based on event type. A typical regional open pair event has an initial rating of 1500; if there is a concurrent Gold Rush event, the initial rating is set to 1550. A typical sectional open pair is set to 1300. Each nationally rated event has its own initial rating; my previous post used regression to study ratings across different national pair events, and I used those results here. Once enough players in an event already have a pre-game rating, this initial rating becomes less important: an unrated player's pre-game rating is estimated from the boards he/she played against rated players, making the default initial rating largely irrelevant.
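
As a small configuration sketch, the default seeding described above could be captured in a lookup table. Only the numeric values come from the text; the event keys and fallback default are my own labels.

```python
# Default seeding values quoted above; event keys and the fallback
# default are my own illustration, not ACBL terminology.

INITIAL_RATING = {
    "regional_open_pairs": 1500,
    "regional_open_pairs_with_gold_rush": 1550,  # concurrent Gold Rush event
    "sectional_open_pairs": 1300,
}

def initial_rating(event_type: str, default: int = 1500) -> int:
    """Seed an unrated player/pair based on the event they enter."""
    return INITIAL_RATING.get(event_type, default)
```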

**Rating History of a Pair**

A rating history is kept for every pair. The following figure shows one pair's rating over a one-year span. They started around the beginning of 2015 (the early days of the data coverage). Even though their initial rating was assigned a default value, after one tournament it quickly converged to a value close to their real strength and did not change much for the rest of the year.

In the following study, only data from pairs with a regular rating is used. Since most ACBL pair games are scored by matchpoints, the rating here is derived from matchpoint data only.

**Expected Score vs Actual Score**

As previously explained, the rating algorithm can calculate each pair's expected score based on their pre-game rating and the movement data (who they play against, and who else plays the same boards). We can therefore compute an expected score (as a session percentage) and compare it with the actual score. The next figure shows a 3D histogram of Actual Score vs Expected Score.

The same data is shown in the next figure as a contour plot. It shows a strong correlation between the actual score and the prediction.

The next figure shows a histogram of (Actual Score - Expected Score). The sigma of this distribution is about 5.93%, in agreement with other studies showing an uncertainty of about 6%. This is probably intrinsic, since each event/session has only 24-28 boards; with more boards the sigma would be smaller, theoretically proportional to 1/sqrt(N).
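
The 1/sqrt(N) scaling can be checked with a quick numeric sketch, assuming boards are independent and backing a per-board sigma out of the observed ~5.93% at a typical 26-board session. The 26-board assumption is mine (the text gives a 24-28 range).

```python
import math

# Quick check of the 1/sqrt(N) scaling: assuming independent boards,
# a session of N boards has sigma_session = sigma_board / sqrt(N).
# sigma_board is backed out of the observed ~5.93% at N = 26 (my choice
# within the 24-28 range quoted above).

N_OBSERVED = 26
SIGMA_OBSERVED = 0.0593

sigma_board = SIGMA_OBSERVED * math.sqrt(N_OBSERVED)  # roughly 30% per board

def session_sigma(n_boards: int) -> float:
    return sigma_board / math.sqrt(n_boards)

print(round(session_sigma(26), 4))  # recovers the observed 0.0593
print(round(session_sigma(52), 4))  # doubling the boards shrinks sigma by sqrt(2)
```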

**What does this mean?**

The above result means that even the best predictive system will still have a large uncertainty, as is true in other sports. It also means that if players' strengths are within a certain range, they all have a fair chance to win. For reference, the 6% spread in session score corresponds to a rating difference of 240 points. If an event has players spread over a much larger range, the low-rated players would have no chance at all. Tournament organizers might want to limit event entries to players within a certain strength range to keep the event competitive.

The next figure shows the distribution of pair ratings for the entire data set. Most fall into the 1000-2200 range, which is a very large spread.

**Are masterpoints related to rating?**

Masterpoints are a frequently used metric for players in the ACBL. However, they are not a measure of a player's strength. Some studies suggest that a pair's strength is better measured by the geometric mean of the two players' masterpoints. I experimented with a few different ways of looking at this relationship and found it to be basically true. I present my results as follows.

For each player I define an S variable,

S = 100 * ln(MP)

where MP is the player's masterpoint total, ln is the natural log, and 100 is an arbitrary scale factor I chose to make plotting in Excel convenient. For a pair, the S value is simply the average of the two players' S values (which corresponds to the geometric mean of their masterpoints).
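
In code, the S variable and its pair version look like this; the sketch also shows why averaging two S values is the same as taking 100·ln of the geometric mean of the masterpoints.

```python
import math

# Sketch of the S variable defined above. Averaging the two players'
# S values equals 100*ln of the geometric mean of their masterpoints,
# since (ln a + ln b) / 2 = ln(sqrt(a * b)).

def s_value(mp: float) -> float:
    return 100.0 * math.log(mp)  # natural log, scaled by 100

def pair_s(mp_a: float, mp_b: float) -> float:
    """Average of the two players' S values."""
    return (s_value(mp_a) + s_value(mp_b)) / 2.0

def pair_s_geometric(mp_a: float, mp_b: float) -> float:
    """Equivalent form via the geometric mean of the masterpoints."""
    return s_value(math.sqrt(mp_a * mp_b))

print(round(s_value(500)))   # about 621, matching the "about 620" below
print(round(s_value(3000)))  # about 801
```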

In the following plot, players' masterpoints are taken from the ACBL's August 2019 publication, while their rating is taken from their last tournament in the data set. Unfortunately, the ACBL tournament database does not record a player's masterpoint total at the time each tournament was played, so the match in time is not exact, but it should be good enough for studying the relationship.

The results show that rating does have a statistical relationship with the S value (and hence with masterpoints). However, the range of ratings at each S value is still large, especially for low-masterpoint players. For reference, 500 masterpoints corresponds to an S value of about 620, and 3,000 masterpoints to about 800. Because of the data selection (open pair games only, and each pair needs a regular rating), this data set contains only fairly frequent tournament players.

Another observation from this distribution is that the lower right side is almost empty: either these players do not exist, or they exist but do not attend tournaments.

Many other implications could be drawn from this study, and I welcome any suggestions.
