I prefer to just call it the Elo rating for now, to distinguish it from other rating methods. In the future it should be THE BRIDGE RATING, a single worldwide system, so that we do not repeat chess's mistake of having multiple country-based rating systems, which created conversion problems.
Oct. 24
The quality of the field is built in. The first day of the Platinum Pairs this year had an average rating of 2170. You need about 2300 to score 54% in that field. Club games can vary a lot. Suppose one has an average of about 1300; then you only need about 1700 to score a 60% game.
Oct. 24
@Jeff, the Excel spreadsheet gave me a mean of 0.38% and a sigma of 5.93%. There might be an additional skew from the way Excel made the plot; I'm not sure how it generates the counts for each bin. I think it tends to shift to the right. Someone familiar with Excel could comment and check whether I did anything wrong.

Removing some board results and recalculating scores would take a lot of computing work. It could be a good research project for a graduate student. Some other things I'm thinking of trying are to use only the last two years' data, or to somehow filter on events where the majority of players in the event had a regular rating. Finally, it is also possible that my K-factor is too small, which would make it take longer for high-rated players to reach their proper rating.

You are absolutely correct with your observation. I also noticed it and intended to investigate it further.
Oct. 24
I'm glad someone noticed it. The actual mean is about 0.3%. My guess is that this is due to data filtering. Here I'm focused on pairs with a regular rating (>200 boards), so the pairs that did not play a lot were not counted. As I mentioned in the OP, this is only 27K pairs out of a total of 173K.

The expected score calculation depends not only on the players themselves but also on their opponents, so my guess is that there was a systematic overestimate for those pairs that played <200 boards.

I showed one pair's history that started with an initial value of about 1600 and went up to 2300. Imagine another pair who played poorly; their curve would go down, and they would probably stop playing soon. However, their ratings in those games also contributed to the calculation for the regular pairs here, while their true strength was probably lower. If I made the same plot for these players, there would be a negative mean.
Oct. 24
It is a histogram. The X-axis is the expected score, the Y-axis is the actual score, and Z is the count. Each pair in each event is one data point. There were 400K+ data points, and each one is a count. As you would expect, most of them are centered around 50-50.

The standard deviation is calculated simply from delta = actual score - expected score, without filtering on the range of the actual or expected scores.
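To make the calculation concrete, here is a minimal sketch of the mean and sigma of delta. The four data points are made up purely for illustration, the function name `delta_stats` is hypothetical, and the choice of population standard deviation is an assumption (a spreadsheet may use the sample version instead).

```python
from statistics import mean, pstdev

# Made-up (expected %, actual %) values, one per pair per event.
# These are illustrative only, not data from the study.
results = [(50.0, 52.0), (55.0, 49.0), (48.0, 54.0), (60.0, 58.0)]

def delta_stats(pairs):
    """Return (mean, sigma) of delta = actual - expected, with no range filter."""
    deltas = [actual - expected for expected, actual in pairs]
    # pstdev is the population standard deviation (an assumption here).
    return mean(deltas), pstdev(deltas)

mu, sigma = delta_stats(results)
print(f"mean = {mu:.2f}%, sigma = {sigma:.2f}%")
```

A positive mean would indicate that these pairs scored above their predictions on average, which is the kind of offset discussed above.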
Oct. 24
You are talking about the usage of this system. That's not the purpose of my study.

If you want to see how good a player you are, you only need to look at your own rating in the system. You may also have a dozen different partners and a different rating with each one of them. These ratings usually fall in a range. What I see from the data is that most partnerships only have a few games. If you play with someone and do not score well, you probably will not play with them again. The regular rating usually comes from a partner with a higher rating in this range. The only exception is partnering with a spouse.
Oct. 24
Yes, the actual score depends on a lot of factors. “Luck” is part of the game. The rating system does not try to uncover each factor; it just takes the score “as is”, completely objectively.

I would like to explain in a little more detail how the predicted score is calculated here, because this is a key difference between my system and others.

In chess, you can simply calculate the expected result from the two players' ratings. If a player's rating is 400 higher than the opponent's, he is expected to win about 90% of the time, so the expected score is about 0.9 (a win is 1, a tie is 0.5 and a loss is 0).
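For reference, the chess case can be sketched with the commonly cited logistic form of the Elo expected score. The function name `elo_expected` is just a label for this example.

```python
def elo_expected(rating, opponent_rating):
    """Standard Elo expected score: 1 / (1 + 10^((opponent - own) / 400))."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

# A 400-point edge gives 10/11, i.e. roughly the 0.9 quoted above.
print(round(elo_expected(2000, 1600), 3))
# Equal ratings give exactly 0.5.
print(elo_expected(1800, 1800))
```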

In bridge it is much more complicated. When you make 3NT and score 400, is it a good score or a bad score? In a duplicate game it needs to be compared against at least one other table, so it involves 4 pairs. If we score by matchpoints, the possible scores are 1, 0.5 and 0, similar to chess. The expected score must be calculated from the 4 pairs' ratings. Let's say we are calculating for pair A at table 1, and their rating is 400 higher than their opponents'. This gives an expected score of 0.6 from the formula in my system. However, this assumes that the two pairs at the other table have the same rating as each other. In reality this is not true. If the pair sitting in the same direction at the other table also has a rating 400 higher than their opponents, then these factors cancel and the expected score would be 0.5.

In duplicate bridge, a board can be played many times in a session, so we can generate many comparisons from one board. The statistics accumulate very quickly. To calculate the expected score for a session, you just have to add up these scores over each board and each comparison. This should address your concern #3 and your question #2 in previous comments.
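The four-pair comparison can be sketched roughly as follows. This is not the formula from my system (that one is on the Commongame reference page); it is a stand-in logistic curve whose scale is chosen only so that a 400-point edge gives 0.6 and equal edges at both tables cancel to 0.5, matching the two cases described above. The names `SCALE`, `expected_comparison` and `expected_session` are hypothetical, and dividing by the number of comparisons to get a fraction is also an assumption.

```python
import math

# Hypothetical scale: picked so that a net 400-point edge yields 0.6.
SCALE = 400.0 / math.log10(1.5)

def expected_comparison(own, opp, other_same_dir, other_opp):
    """Expected matchpoint (0..1) for one comparison against one other table."""
    # Net edge: our edge at our table minus the same-direction pair's edge at theirs.
    edge = (own - opp) - (other_same_dir - other_opp)
    return 1.0 / (1.0 + 10 ** (-edge / SCALE))

def expected_session(boards):
    """Add up expected scores over every board and every comparison.

    boards: list of (own, opp, [(other_same_dir, other_opp), ...]) tuples.
    Returns the total divided by the number of comparisons.
    """
    total, count = 0.0, 0
    for own, opp, other_tables in boards:
        for same_dir, other_opp in other_tables:
            total += expected_comparison(own, opp, same_dir, other_opp)
            count += 1
    return total / count

# Other table has two equally rated pairs: about 0.6.
print(round(expected_comparison(1900, 1500, 1500, 1500), 3))
# Other table has the same 400-point edge: the factors cancel to 0.5.
print(round(expected_comparison(1900, 1500, 1900, 1500), 3))
```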
Oct. 24
@Richard, my algorithm is based on the Elo rating methodology. The detailed formula can be found on the Commongame reference page.
http://thecommongame.com/PingHu/PingHuRatingTechnical.html

It is similar to the chess rating, but not like the Power Rating or the EBU grade system. In the latter two, only games from a certain timeframe are used. In the Elo system, all of a player's games are considered and the rating reflects them. When a new game result is evaluated, the system calculates the expected score and compares it with the actual score to generate an adjustment, which results in a new rating. I'll explain the details of how to calculate the expected score in my answer to your next set of comments. The adjustment is controlled by a K-factor, as in the chess rating. In my algorithm the K-factor depends on the number of boards a player has played and on the player's current rating. Players with fewer than 200 boards and a low rating can have a large K-factor. If the K-factor is too large, the rating change from one game can be large, which makes the rating unstable. If it is too small, the rating changes very slowly and may take a long time to adjust to the player's strength when the pre-game rating is off (e.g. for a new player or after a long break).

You can think of it this way: the adjustment from the most recent game carries more weight than that from any previous game, and as games accumulate, an old game's weight becomes less and less. When we build a real system the K-factor will need to be tuned, and this will be a continuing process. USCF has used a rating system for chess for over 60 years but still changed the K-factor a few years ago.
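The effect of the K-factor can be sketched with the generic Elo update, new rating = old rating + K × (actual − expected). The K values below are illustrative only; the actual schedule in my algorithm (which depends on boards played and current rating) is not reproduced here, and `update_rating` is just a name for this example.

```python
def update_rating(rating, expected, actual, k):
    """Generic Elo-style update; scores are fractions between 0 and 1."""
    return rating + k * (actual - expected)

# Same result (60% actual vs 50% expected) under two illustrative K values.
small_k = update_rating(1600, 0.50, 0.60, k=100)
large_k = update_rating(1600, 0.50, 0.60, k=400)
print(round(small_k, 1), round(large_k, 1))
```

With the larger K the rating moves four times as far from the same game, which is the stability versus responsiveness trade-off described above.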

I'll answer your question 2 below. For question 3, the data is from 27K pairs with a regular rating, as I described in the OP. The actual vs. predicted score plot has over 400K data points.
Oct. 24
The algorithm was developed by me, and Jay owns the Commongame data.

I think the ideal solution for bridge would be a worldwide rating system, unlike chess, where each country has its own system. When you have multiple systems you have to deal with the problem of converting from one to another. Chess players know this problem.

If we were to have one system, I hope ACBL takes the lead since most of the top players in the world play in its tournaments.
Oct. 17
@Randy, as long as each player has a unique ID, you can identify them and rate them properly. In Commongame, players who are ACBL members are identified by their ACBL number. Those who are not ACBL members are identified by their Commongame ID.

The rating is at the pair level, as I described in my letter to the Bridge Bulletin. So each player could have multiple ratings.
Oct. 17
An alternative solution is to design a playing table where players place each card in a designated location. A camera could capture the card, and a program could decode it and display it automatically.

Having the playing card in a fixed spot could also reduce potential cheating.
Oct. 17
I use 4NT as to play if the response shows 0 or 1 keycards. The next suit bid (in this case 5) asks for the trump Q and kings, similar to your continuation after RKC.
Oct. 16
@Monty, if you are interested in your rating, just log into your Commongame account. For each game you played, you'll see a couple of handicap values at the end. PR_Handicap is based on Chris's Power Rating. CG_Handicap is based on my rating. You can compare them with the actual score and draw your own conclusion.

As others have noted, there is a large uncertainty in predicting the score for a one-session game. My study showed it is 5-6%, so it takes a lot of statistics to draw conclusions. I think this is an inherent factor in the game of bridge. The only way to narrow the uncertainty is to increase the number of boards played. Theoretically the uncertainty is proportional to the inverse of the square root of N (the number of boards).
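As a rough illustration of the 1/sqrt(N) scaling: if the uncertainty is 6% over some reference number of boards, quadrupling the boards halves it. The 25-board reference session below is an assumption for the example only, and `scaled_sigma` is a hypothetical helper name.

```python
import math

def scaled_sigma(sigma_ref, n_ref, n):
    """Scale an uncertainty measured over n_ref boards to n boards (1/sqrt(N))."""
    return sigma_ref * math.sqrt(n_ref / n)

# Assumed reference: 6% over a 25-board session.
for boards in (25, 100, 400):
    print(boards, scaled_sigma(6.0, 25, boards))
```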
Oct. 16
Kind of, but I doubt my company would spend its resources to maintain a blacklist of websites. It must use some third-party service for this list of blocked websites. They classified Power Rating as a gambling site.
Oct. 16
When I clicked on the Power Rating web link from my office, it was blocked by our company's network. It gave me a message with the following information.

User: 10.8.88.125
URL: www.bridgepowerratings.com/
Category: gambling
Application: web-browsing

It thinks I'm trying to reach a gambling site. I recall that when my son was in high school, I had him ask whether it was possible to have a bridge club there. The school administrator denied it because they thought bridge was associated with gambling. We have a serious problem with our image. Where are our PR guys?
Oct. 16
The rating used by Commongame, which is based on my system, uses per-board data. It takes into account who you played against and who else played the same board.
Oct. 12
This does not sound like a good rating system to me. It does not measure strength, which should be relatively stable.
Oct. 11
In that case the best strategy is just to play your own game and not worry about others. I think mixed scoring will create some variation for players ranked in the middle.

There were discussions about players getting lucky on certain boards. If they were lucky they could score a top in either MP or IMP. Some players commented that if they got “fixed” in IMP scoring, they could not recover. So I suggest letting them choose their scoring method. Did you find any case in your game where a player's MP score was consistently better than their IMP score?
Oct. 10
Thomas, your real opponents are not those who sit at your table but those who play the same board at other tables. It is more important to know how they would do than the pair you play against.
Oct. 10
Could we set up a pair game and let players choose which scoring method they want to be scored by? Some players will choose MP and others IMP. That would be very interesting.
Oct. 10
