All comments by Ping Hu
@John, clubs running Commongame have their player ratings in the Commongame database. It has been running for over four years.

When I first developed my system, Chris Champion and I tried to compare results from the same set of game files. Our results had some differences. We knew they would differ because he only uses game scores and I use board results. I'm not familiar enough with his methodology to know whether he derives individual ratings from pair ratings or vice versa.

If you are interested in your rating, PM me and I'll send you some files.
Oct. 27, 2019
@Robert, if a pro only plays with one client, I doubt any rating system can distinguish who is the better player. Power Rating also requires you to play with at least 3 different partners. If you play with multiple clients and your clients also play with other partners, it is not difficult to figure out who is better.

The problem with individual ratings is that I could not find a way to determine/measure them objectively. That does not mean it is impossible to define/derive some “individual rating” from pair ratings for a specific purpose. The ways I could come up with to measure an individual player are to look at their declarer play in the same contract, and at the results on defense when the player is the opening leader. However, these only measure a specific skill. If you want to know who is the better player in a partnership, just look at the scores when each one is declarer.

The last question you asked is why we need such a rating system when we already have Power Rating and NGS. Certainly this depends on what you are going to use it for. With another rating system you could probably get a good estimate of a player's strength. However, if I want a handicap on a per-board basis, it is not possible (Power Rating only uses game-level data, so the handicap is for the entire game). In Commongame we have observed that some club games had a very strong pair; if the movement was such that some pairs played against this pair but others did not, those who didn't would have a 2% advantage over those who did. A single strength-of-field factor applied to the entire game, rather than board by board, would not be able to distinguish them.

In my OP I also discussed the uncertainty of score prediction and its implication for tournament organizers. It is well known that if you put very strong players and very weak players in the same event, nobody is happy, and it drives away the weak players. I can see this in our regional KO, which basically cannot get enough teams to start a game now. The lack of high-masterpoint but not high-rating pairs in my OP data plot may also indicate that those players stopped playing in tournaments. I think one possible solution is to bracket players by rating. In my OP the data showed that if the rating spread is 240, the players' expected scores would be within one standard deviation. Tournament organizers could choose their own criteria, but a good rating system could provide a quantified measure to solve their problem.

There could be other usages I have not thought of.
Oct. 27, 2019
@Brenda, contact L4C support about your problem. I know it cannot handle the USEBIO file created from my Swiss team program either. Right now the IT department is busy handling all the different game types from ACBLscore game files, so USEBIO is not a high priority.
Oct. 27, 2019
@Adam, if you read my reply to Richard (3rd comment in this thread), you can see that in your scenario the pro's expected score would be 50%.
Oct. 27, 2019
This is a long story. I first proposed this system to ACBL in 2015. The ranking committee reviewed it but did not take any action. Their interest at that time was creating a couple of new masterpoint ranks, Ruby and Sapphire, to keep people playing.

When Bahar was CEO, Chris Champion and I got e-mails from him and it seemed we might do a study, but we never had a concrete plan. Furthermore, I think Bahar had an idea of his own to rate individual players based on how they play certain boards. After his abrupt departure, nothing happened.

From my point of view, ACBL has known about different rating systems for years. Chris has maintained his system for years. However, my system needs board results, and that data was never available to me. The MP committee would have an interest in seeing results from my system when they want to decide on some kind of measure for Strength Of Field.

As to how a rating system could affect players, several EBU players have said their NGS generated a lot of interest among players, and they did not see a large number of players stop playing, as some people speculated.
Oct. 25, 2019
I think you are asking to see the data used in making the 3D histogram. If you are interested, PM me with your e-mail and I will send you the file.

Here are some numbers for an expected score of 45-46% (I could not find a good way to insert a table).
Actual Score (%) | Counts
40-41 | 856
41-42 | 981
42-43 | 1088
43-44 | 1143
44-45 | 1244
45-46 | 1231
46-47 | 1163
47-48 | 1189
48-49 | 1099
49-50 | 993
Oct. 25, 2019
The Masterpoint Committee is interested in this study. They are considering awarding masterpoints based on strength of field.
Oct. 25, 2019
My early study included Gold Rush and other NLM events. You can find them in my technical note on the Commongame website.

For this study, requested by the BOD Bridge Committee, they are only interested in Open games.
Oct. 25, 2019
No. Online tournaments and some other special tournaments, like cruises, are not included.
Oct. 25, 2019
My understanding of NGS is that it uses weighted results of players over a fixed period of time. So even if you don't play, your rating can change over time.

The Elo methodology is different. All your past results are reflected in your current rating. If you don't play, it does not change. In chess ratings there is also a “floor”: once a player reaches a certain level, he gets a floor below which his rating will never drop. The floor rises as the player's highest rating reaches new levels.
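The floor mechanism can be sketched as follows. This is a simplified illustration, not the exact USCF rule; the function name, step, and gap parameters are my own choices for the sketch.

```python
# Hedged sketch of a chess-style rating floor: once a player's peak
# rating is high enough, the published rating can never fall more than
# a fixed gap below the peak band. The step/gap values here are
# illustrative, not the actual USCF parameters.
def apply_floor(new_rating, peak_rating, floor_step=100, floor_gap=200):
    # Round the peak down to its band, then set the floor a fixed
    # gap below that band.
    floor = (peak_rating // floor_step) * floor_step - floor_gap
    return max(new_rating, floor)

print(apply_floor(1350, 1650))  # peak 1650 -> floor 1400, rating held at 1400
print(apply_floor(1500, 1650))  # above the floor, rating unchanged
```

Because the floor only depends on the peak, a long losing streak stops moving the published rating once the floor is hit, which is exactly the "never drop below it" behavior described above.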
Oct. 25, 2019
@Jeff, a quick update. I tried filtering on the date range to see how the mean value would change. It does move the value. I tried the most recent 2.5 years and 1 year, and the mean value went from the original 0.38 to 0.36, then 0.34.

I also tried filtering on only players with fewer than 200 boards, and the mean value changed to -0.33%. This confirms it is a bias introduced by data selection.
Oct. 24, 2019
I prefer just calling it an Elo rating for now to distinguish it from other rating methods. In the future it should be THE BRIDGE RATING, a worldwide system, not repeating chess's mistake of having multiple country-based rating systems, which created conversion problems.
Oct. 24, 2019
The quality of field is built in. The first day of the Platinum Pairs this year had an average rating of 2170; you need about 2300 to score 54% in that field. A club game can vary a lot. Suppose it has an average of about 1300; then you only need 1700 to score a 60% game.
Oct. 24, 2019
@Jeff, the Excel spreadsheet gave me a mean of 0.38% and a sigma of 5.93%. There might be an additional skew when Excel makes the plot; I'm not sure how it generates the counts for each bin. I think it tends to shift to the right. Someone familiar with Excel could comment and check whether I did anything wrong.

Removing some board results and recalculating scores would need a lot of computing work. It could be a good research project for a graduate student. Some other things I'm thinking of trying are to use only the last two years' data, or to somehow filter on events where the majority of players had a regular rating. Finally, it is also possible that my K-factor is small, with the effect that high-rated players take longer to reach their proper rating.

You are absolutely correct with your observation. I also noticed it and intended to investigate it further.
Oct. 24, 2019
I'm glad someone noticed it. The actual mean is about 0.3%. My guess is that this is due to data filtering. Here I'm focused on pairs with a regular rating (>200 boards), so pairs that did not play a lot were not counted. As I mentioned in the OP, this is only 27K pairs out of a total of 173K.

Since the expected score calculation depends not only on the players themselves but also on their opponents, my guess is that there was a systematic over-estimate for those pairs that played <200 boards.

I showed one pair's history that started with an initial value of about 1600 and went up to 2300. Imagine another pair who played poorly; their curve would go down, and they would probably stop playing soon. However, their ratings in those games also contributed to the calculation for the regular pairs here, and their true strength was probably lower. If I made the same plot for these players, there would be a negative mean.
Oct. 24, 2019
It is a histogram. The X-axis is expected score, the Y-axis is actual score, and Z is the count. Each pair in each event is one data point. There were 400K+ data points, and each one is a count. As you would expect, most of them are centered around 50-50.

The standard deviation is calculated using delta = actual score - expected score, without filtering on what range the actual and expected scores fall in.
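The calculation described above can be sketched in a few lines. The data points below are invented for illustration; the real study used 400K+ (expected, actual) pairs.

```python
# Minimal sketch of the spread calculation described above: for every
# (pair, event) data point take delta = actual - expected score (in
# percent), then take the mean and population standard deviation of
# the deltas with no filtering. The sample data here is made up.
from statistics import mean, pstdev

points = [(45.2, 44.1), (51.0, 52.3), (48.7, 48.0), (55.4, 57.9)]  # (expected, actual)

deltas = [actual - expected for expected, actual in points]
print(round(mean(deltas), 2), round(pstdev(deltas), 2))
```

With the real data this mean is the ~0.3% bias discussed in the neighboring comments, and the pstdev is the quoted sigma of about 5.93%.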
Oct. 24, 2019
You are talking about the usage of this system. That's not the purpose of my study.

If you want to see how good a player is, you only need to look at your own rating in the system. You may have a dozen different partners and a different rating with each of them; these usually fall within a range. What I see from the data is that most partnerships only have a few games. If you play with someone and did not score well, you probably would not play with them again. The regular rating usually comes from the partner with the higher rating in this range. The only exception is partnering with a spouse.
Oct. 24, 2019
Yes, the actual score depends on a lot of factors. “Luck” is part of the game. The rating system does not try to uncover each factor; it just takes the score “as is”, completely objectively.

I would like to explain in a little more detail how the predicted score is calculated here, because this is a key difference between my system and others.

In a chess game, you can simply calculate the expected result from the two players' ratings. If a player's rating is higher than his opponent's by 400, he is expected to win 90% of the time, so the expected score is 0.9 (a win is 1, a tie is 0.5, and a loss is 0).
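For reference, the standard Elo expectation curve used in chess is a logistic function of the rating gap, which reproduces the ~90% figure quoted above:

```python
# Standard chess Elo expectation: with rating gap d,
# E = 1 / (1 + 10 ** (-d / 400)).
# A 400-point advantage gives roughly 0.91; equal ratings give 0.5.
def chess_expected(d):
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

print(round(chess_expected(400), 3))  # ~0.909
print(chess_expected(0))              # 0.5
```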

In bridge it is much more complicated. When you make 3NT and score 400, is that a good score or a bad score? In a duplicate game it needs to be compared against at least one other table, so it involves 4 pairs. If we score by matchpoints, the possible scores are 1, 0.5, and 0, similar to chess, but the expected score must be calculated from the 4 pairs' ratings. Let's say we are calculating for pair A at table 1, and their rating is 400 higher than their opponents'. This gives an expected score of 0.6 from the formula in my system. However, this assumes the other table has two pairs with the same rating. In reality this is not true. If the pair sitting in the same direction at the other table also has a rating 400 higher than their opponents', then these factors cancel and the expected score would be 0.5.

In duplicate bridge, a board can be played many times in a session, so we can generate many comparisons from one board and the statistics accumulate very quickly. To calculate expected scores for a session, you just add these scores over each board and each comparison. This should address your concern #3 and your question #2 in previous comments.
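The two-table cancellation described above can be sketched like this. This is my own reconstruction, not the actual formula from the technical note: the only constraints taken from the text are that a net 400-point edge maps to 0.6 and that equal edges at both tables cancel to 0.5, so I use a linear placeholder curve clipped to [0, 1].

```python
# Hedged sketch of one matchpoint comparison between two tables.
# Pair A (with opponents opp_a) is compared against pair C sitting the
# same direction at the other table (with opponents opp_c). The net
# rating edge is A's edge minus C's edge; the linear slope below is a
# placeholder calibrated so that net +400 -> 0.6, as stated in the text.
def board_expected(a, opp_a, c, opp_c, slope=0.1 / 400):
    net = (a - opp_a) - (c - opp_c)
    return min(1.0, max(0.0, 0.5 + slope * net))

print(board_expected(2000, 1600, 1500, 1500))  # A is +400, other table flat -> 0.6
print(board_expected(2000, 1600, 1900, 1500))  # both +400, edges cancel -> 0.5
```

Summing this quantity over every comparison on every board of a session gives the session's expected score, as the comment describes.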
Oct. 24, 2019
@Richard, my algorithm is based on the Elo rating methodology. The detailed formula can be found on the Commongame reference page.
http://thecommongame.com/PingHu/PingHuRatingTechnical.html

It is similar to the chess rating, but not like Power Rating or the EBU grade system. In the latter two, only games from a certain timeframe are used. In an Elo system, all of a player's games are considered and the rating reflects them. When a new game result is evaluated, the system calculates an expected score and compares it with the actual score to generate an adjustment, which results in a new rating. I'll explain the details of how to calculate the expected score in my answer to your next set of comments. The adjustment is controlled by a K-factor, as in the chess rating. In my algorithm the K-factor depends on the number of boards the player has played and the player's current rating. Players with fewer than 200 boards and with low ratings can have a large K-factor. If the K-factor is too large, the rating change from one game can be large, which makes the rating unstable. If it is too small, the rating changes very slowly and may take a long time to adjust to reflect the player's strength when the pre-game rating is off (i.e. a new player or after a long break).

You can think of the adjustment from the most recent game as carrying more weight than any previous game in this system; as adjustments accumulate, the old games' weight becomes less and less. When we build a real system the K-factor will need to be tuned, and that will be a continuing process. USCF has used its chess system for over 60 years but still changed the K-factor a few years ago.
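One update cycle of the kind described above can be sketched as follows. The K schedule here is invented for illustration; the text only says K depends on boards played (with 200 as the provisional cutoff) and on the current rating, not what the actual values are.

```python
# Hedged sketch of one Elo-style update: rating moves by K times the
# gap between actual and expected score. The K values below are made
# up; only the shape (bigger K for provisional and lower-rated
# players) comes from the description above.
def k_factor(boards_played, rating):
    if boards_played < 200:   # provisional rating: move fast
        return 64
    return 32 if rating < 1800 else 16  # established: move slower

def update(rating, boards_played, actual, expected):
    # actual and expected are session scores on the same 0..1 scale.
    return rating + k_factor(boards_played, rating) * (actual - expected)

print(round(update(1500, 120, 0.55, 0.50), 1))  # provisional, K=64 -> 1503.2
```

Because each update folds the new result into the running rating, older games are never discarded; their influence simply decays as later adjustments pile on top, which is the weighting behavior described above.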

I'll answer your question 2 below. For question 3, the data is from the 27K pairs with regular ratings, as I described in the OP. The actual vs. predicted score plot has over 400K data points.
Oct. 24, 2019
The algorithm was developed by me, and Jay owns the Commongame data.

I think the ideal solution for bridge would be a worldwide rating system, unlike chess, where each country has its own system. When you have multiple systems you have to deal with the problem of converting one to another. Chess players know this problem.

If we were to have one system, I hope ACBL takes the lead, since most of the top players in the world play in its tournaments.
Oct. 17, 2019