To anyone who doesn't agree with me, let's try another test. Concentrate on the case where West has 4 hearts and East has 2. The initial odds that West holds the J are 2:1. If you think the odds should not change after the first round of hearts, suppose we play another round and both defenders again follow low. Would you agree now that the chance West has the J is 100%? Or by your argument is it still 2:1?

I did misread Charles's cases, in which the opponents always play their lowest spot cards. I agree that under this rule the probability changes from 75%. However, once you see the cards played, you are on one of the paths ("cases"), and that eliminates the other possibilities that existed before the cards were played. If you don't agree, just think about what happens if you play another round of hearts.
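
The two-round thought experiment is easy to verify by brute force. Below is a minimal sketch, assuming all 4-2 layouts start out equally likely and conditioning only on which cards the defenders have shown (not on how they choose among equivalent spots):

```python
from itertools import combinations

HEARTS = {'J', '6', '5', '4', '3', '2'}  # the six outstanding hearts

def pr_west_has_jack(west_played, east_played):
    """P(West holds the J) among 4-2 layouts consistent with the plays."""
    consistent = []
    for combo in combinations(sorted(HEARTS), 4):
        west = set(combo)
        east = HEARTS - west
        # A layout survives only if each defender holds every card he played.
        if set(west_played) <= west and set(east_played) <= east:
            consistent.append(west)
    return sum('J' in w for w in consistent) / len(consistent)

print(pr_west_has_jack(['4'], ['2']))            # after one round: 0.75
print(pr_west_has_jack(['4', '5'], ['2', '3']))  # after two rounds: 1.0
```

After two rounds East's hand is fully exposed, so every layout placing the J with East is eliminated and the probability reaches 100%.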

Yes, I think you have a valid point that I should study rating results computed per board versus from session results. I could do that once I accumulate a lot of data and learn exactly how the other methods that use session data do their calculations. However, my understanding is that other rating methods, like the EBU's, only use session data. In the ACBL, knockout events do not keep board-by-board IMP scores, so it is hard to find a lot of IMP game data.

Charles, let me use your scenario but restate it as follows: there are 6 possible cases for which spot cards are played; let's calculate the probability for each.

Case 1: West plays the 4 & East plays the 2. E1 would be West holding J4xx with East holding 2x; E2 would be West holding 6543 and East holding J2. Here Pr(E1) = 75%.

Case 2: West plays the 3 & East plays the 2. E1: J3xx vs 2x, E2: 6543 vs J2; again Pr(E1) = 75%.

Case 3: West plays the 2 & East plays the 3. E1: J2xx vs 3x, E2: 6542 vs J3; Pr(E1) = 75%.

Case 4: West plays the 2 & East plays the 4. E1: J2xx vs 4x, E2: 6532 vs J4; again Pr(E1) = 75%.

Case 5: West plays the 2 & East plays the 5. E1: J2xx vs 5x, E2: 6432 vs J5; Pr(E1) = 75%.

Case 6: West plays the 2 & East plays the 6. E1: J2xx vs 6x, E2: 5432 vs J6; again Pr(E1) = 75%.

We could construct more cases for which spot cards are played on the first round, but you will find that in each case the probability that West has the J (E1) is 75%.

The issue is that once the first-round cards are played, you are in one of these cases. The rest of the cases can no longer happen. We live in a universe, not a multiverse. Once an event has happened and you are in this "case", you cannot revert time and get into the other "cases".
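
These six cases can be checked mechanically. Here is a minimal sketch that counts the layouts consistent with each pair of plays, under the assumption that every surviving 4-2 layout is treated as equally likely; it deliberately ignores how often a defender would choose one spot card over another:

```python
from itertools import combinations

HEARTS = ['J', '6', '5', '4', '3', '2']

def pr_e1(west_card, east_card):
    """P(West holds the J) among 4-2 layouts where West could have
    played west_card and East could have played east_card."""
    layouts = [set(c) for c in combinations(HEARTS, 4)]  # West's 4 cards
    consistent = [w for w in layouts
                  if west_card in w and east_card not in w]
    return sum('J' in w for w in consistent) / len(consistent)

for w, e in [('4', '2'), ('3', '2'), ('2', '3'),
             ('2', '4'), ('2', '5'), ('2', '6')]:
    print(f"West {w}, East {e}: Pr(E1) = {pr_e1(w, e):.0%}")
```

In every case 4 layouts survive, 3 of them with the J on the 4-card side, reproducing the 75% figure above.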

Lior, I get your point that an IMP score implies a comparison between different players.

However, when the score is not calculated at board level but as a per-session average, say 0.2 IMP/board, a lot of things become indistinguishable. For example, a 12 IMP swing at board level would be averaged out when you look at the session score. In addition, you don't have a single opponent to compare against, and their effects get averaged.

At session level, an average-plus game would show 0.2 IMP/board and 53%, but the two could be quite different at board level. I did calculate both an IMP rating and an MP rating. They are correlated, but NOT strongly correlated. I'm not surprised that the EBU found their results strongly correlated at session level, because they lack the data granularity to see the difference. The problem I have now is that I don't have enough IMP game data: most of my study uses MP game data to compute an IMP rating. Ideally you would compute the IMP rating from IMP game data and compare it against an MP rating computed from MP game data.

Wei-Bung, you mixed up probability with fact. When an event has not happened, it has a probability. When an event has happened, it is a fact, and a fact is 100%. Before East played the ♥2, it could have been in either East's hand or West's hand, and you could calculate each probability. Once he played the ♥2, it is a fact; it is 100%. You cannot take the card back and put it in West's hand. So no matter how you calculate, the probability that this card is in West's hand is ZERO. If your calculation still counts the possibility that West holds this card, it is wrong. Those 1/6 and 1/4 probabilities both change to 1 after the card is played.

Charles, you just made a self-contradictory statement. First you said "every card played may change the odds, yes", then you said the probability (after playing one round of hearts) "must apply in advance". If I read it correctly, you meant the odds should not change.

I explained in a previous response why the odds change after one round of hearts. You have seen two cards, and that eliminates the possibility that those cards lie on the opposite sides. If you would rather think in terms of the original position, you now know the position of two of the six cards; the unknown is the 4 cards left. It does not matter how many choices the opponents had for the first trick. Once they played whichever cards they selected, those plays are 100%. The other choices no longer exist.

Lior, as a former particle physicist, I concur with your comments about statistical models and variance. I disagree with the last part of your comments, because bridge is different from chess.

In chess, both players start from the same position (you could argue that White and Black make a difference, but the chance of drawing White or Black is equal). A chess game's result is definitive: win, loss, or draw. So you can compute a chess rating from a single game result, because both players start equal and the result can be compared with the prediction from their ratings.

Bridge is different. Every board is a different hand. If you only know a score of 170, you cannot judge whether it is a good score or a bad score; it has to be compared against others who played the same board. To calculate a rating, I chose to compare at board level, where the players have exactly the same position (same hand). The only difference is that they face different opponents, but this is factored into the calculation. If you calculate from session-aggregated results, as the EBU does, you introduce another systematic uncertainty that increases the variance. Finally, MP and IMP require different strategies, and I don't think one can be converted to the other in a definitive way. If you put these two types of games into one rating, it introduces yet another systematic uncertainty. When the variance due to these systematic uncertainties is large enough, it makes the rating result less meaningful.

The probability definitely changes after you play one round of hearts, as it eliminates many possible layouts that existed in the initial position but become impossible once 4 cards have been played.

For example, with an initial 4-2 distribution, there are 5 possible holdings containing the J on the 2-card side and 10 on the 4-card side. Once you play one round of hearts, each side has shown a card. The possibility that those cards were on the other side is eliminated. You now know the position of 2 of the original 6 cards, so the remaining possibilities change: there is only 1 holding where the J is on the short side, and 3 where it is on the long side.
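
As a quick check, the counts above follow directly from binomial coefficients (assuming the six hearts are the J plus five spot cards):

```python
from math import comb

# Before the first round: 6 hearts split 4-2.
print(comb(5, 1))  # 2-card holdings containing the J: 5
print(comb(5, 3))  # 4-card holdings containing the J: 10

# After one round each defender has shown one spot card,
# so 4 unknown cards remain, split 3-1.
print(comb(3, 0))  # the lone remaining short-side card is the J: 1 way
print(comb(3, 2))  # 3-card long-side holdings containing the J: 3
```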

I'm not sure this is a case where you can use the normal probability calculation, because of the discards. If the opponents' discards are random and we assume you know East has the ♣Q, then in this 3-card ending the odds that West has the other ♣ are 3:2. In addition, there is a 25% chance that East started with the ♥J, so it favors playing for the drop.

However, if you read West's discard of the ♣9 as his last ♣ (why not discard the ♣2 if he could?), then hearts are 4-2, the chance West has the ♥J is 3:1, and you should take the finesse.

What if West discards a ♥ instead of the ♣9? If he started with ♥Jxxxx, by restricted choice he has to discard a ♥, so it seems to be even stronger evidence for the finesse.

Tim, as a matter of fact, the USCF requires 26 games to establish a regular rating; before that it is considered "provisional". Let's assume that after their 26 games one player was rated 1220 and the other 1180, but they were really at the same level. As long as they continue to play, their ratings will continue to change and fluctuate around 1200. As Robin pointed out, there is an inherent measurement "error". Everyone can have a good game or a bad game, in chess or in bridge. So when we use game results to adjust ratings, there is a built-in uncertainty that makes a rating fluctuate around the "true rating".

The page below shows a section's ratings from last year's chess World Open: http://www.uschess.org/msa/XtblMain.php?201407068692.7 As you can see, a rating change of ±20 is very normal. For players who performed very well, ratings went up by over 200 points (they probably did improve; if you click on a name, you can see that player's rating history).

Tim, it is true that a rating should measure current ability. The problem is how to measure it. All we can see is how well you did in a game: you scored 420 on board 1, -100 on board 2, and so on. You might have a good game one day and a bad game the next. Even if we average over a large number of games, there will still be an uncertainty.

There is a concept of "current performance rating". You can think of it as what an unrated player would get after his first game. However, it has a very large uncertainty. Rating algorithms that only take a certain number of recent games will have players' ratings fluctuate widely.

What I do in my system is calculate an expected score/handicap for every board based on the player's pre-game rating, his opponents' ratings, and the ratings of every other player who played the same board. Then I compare the actual score with this expected score/handicap. If the player's score is better, he gets a positive adjustment; if it is worse, a negative adjustment.

The idea is that a rating should reflect a player's "current ability", and we use his most recent games to validate it. If he performs exactly as expected, there is no adjustment. If his performance is better than his rating predicts, his rating increases; if it is lower, it decreases. So the rating should eventually converge on the player's "ability".
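
A minimal sketch of this kind of update rule, for concreteness. The function names, the K value, and the clamping are illustrative assumptions, not the actual formulas of this system; the 400-point-per-10% scale matches the handicap described elsewhere in these comments:

```python
def expected_score(pair_rating, opp_rating, spread=400.0, edge=0.10):
    """Expected matchpoint fraction: 50% plus ~10% per `spread`
    rating points of advantage, clamped to [0, 1]."""
    frac = 0.5 + edge * (pair_rating - opp_rating) / spread
    return min(1.0, max(0.0, frac))

def adjusted_rating(pair_rating, opp_rating, actual_score, k=32.0):
    """Move the rating toward performance: positive adjustment when the
    pair beats its expected score, negative when it falls short."""
    delta = actual_score - expected_score(pair_rating, opp_rating)
    return pair_rating + k * delta
```

For example, a pair rated 1500 that scores 55% against equal-rated opposition would gain k × 0.05 rating points, while an exactly-as-expected result leaves the rating unchanged.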

This depends on what the rating is meant to measure. I think a rating should measure playing ability. If a player with an established rating stops going to tournaments for 3-5 years and then resumes playing, has his playing ability changed? Should he/she be treated as a "new" player? I think the reasonable assumption is that his/her ability did not change.

For players who continue to play, their most recent game results are used to adjust their rating, so older games automatically get less and less weight. There is also a formula that produces different values for high-rated players than for low-rated players: it allows a high-rated player's rating to change less than a low-rated player's.

As Robin pointed out, part of a rating is error/uncertainty. One good or bad game results in a rating change even though the player's ability did not change much; the next game it changes back. This variance is the "error" and can be controlled by the K factor.
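
For concreteness, here is one way a K factor can be tiered by rating and experience. The thresholds below are illustrative, loosely modeled on common chess practice; they are not the actual values used by this system or the USCF:

```python
def k_factor(rating, games_played, provisional_games=26):
    """Larger K => faster rating swings. Provisional players move
    quickly; established, high-rated players move slowly.
    All thresholds here are illustrative assumptions."""
    if games_played < provisional_games:
        return 40   # provisional: rating still converging
    if rating >= 2400:
        return 10   # established expert: very stable
    if rating >= 2100:
        return 20
    return 32       # established but lower-rated: moderately stable
```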

Robin, I agree with you in general that a rating consists of two parts: a "true" rating and an error/uncertainty. However, handling historical games is a different issue.

In the EBU scheme, historical games have a weight that automatically decays with time. In my system, as in chess rating systems, a player's rating stays where it is if he stops playing. So if you stop playing for a year, your rating will stay the same.

Once a player (in my system, the pair) has an established rating, a new game has less weight, especially for higher-rated players, so the rating tends to be stable. Only when players consistently perform above or below their expected level does their rating move in one direction. If you are interested, I could e-mail you my document with the details. How much the rating changes from the most recent game is determined by a K factor. I think I mentioned that the US Chess Federation adjusted their formula two years ago. They have had their rating system for 70 years and are still changing the parameters. So even with a rating system in place, we could still fine-tune some parameters.

In chess ratings and in my rating system, the time-based weight of a game is implicit. The effects of all historical games are already contained in the current rating. The most recent game results in an adjustment, so it carries more weight than historical games, but we don't go back and re-rate all the historical games.

The EBU explicitly rates historical games at different rates. Other rating methods may only count a certain number of the most recent games.

When I said all games are counted equally, I meant that games from different events (whether an NABC or a club game) are treated the same in the calculation. A board played at 3 tables and one played at 100 tables will have different weights, because the calculation compares each score against all other tables: 3 tables gives only 2 comparisons, while 100 tables gives 99, so the latter automatically carries more weight. This comes purely from how many times a board is played, regardless of whether it is a national or a club event, so a worldwide game using the same duplicated boards could carry the most weight. However, I also have a normalization formula that caps how much weight a single board can have, so it does not skew the result.
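
A sketch of the weighting just described, with a hypothetical cap parameter standing in for the normalization formula (the cap value is an illustrative assumption):

```python
def board_weight(n_tables, cap=50):
    """A board played at N tables yields N-1 score comparisons;
    the cap keeps one very widely played board from dominating."""
    return min(n_tables - 1, cap)

print(board_weight(3))    # 2 comparisons
print(board_weight(100))  # capped: 50 instead of 99
```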

A rating should have some predictive value. In chess, if a player is rated 400 points higher than his opponent, he/she is expected to win 90% of the games (if I remember correctly). In my system, I chose a 400-point difference to give the higher-rated pair a handicap of 1 IMP per board, or 10% in matchpoints (55% vs 45%). These are parameters used in the calculation and can be adjusted during the study period. A valid study would be to rate a universe of pairs and see how their ratings change over time. The assumption is that a pair's playing ability does not change over time, so if they are rated correctly their rating will never change. In practice this may not be true, because some pairs improve over time.
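
The chess figure quoted above is consistent with the standard Elo expectancy formula, which gives 10/11 ≈ 91% at a 400-point difference:

```python
def elo_expected(rating_diff):
    """Standard Elo winning expectancy for the higher-rated player;
    a 400-point edge gives 10/11, roughly the 90% quoted above."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(round(elo_expected(400), 3))  # 0.909
print(round(elo_expected(0), 3))    # 0.5
```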

In terms of performance variance, it should average out statistically. This is one of the reasons I calculate the result at the per-board level. If you only have one score per game, you have only one data point, like 55%; when I calculate at board level, there are 20-30 data points. Even so, there will be some variation. A new pair with a provisional rating is allowed to vary its rating by a large amount; this is determined by the K factor. Once a pair has an established rating, this factor is reduced, and it is also reduced for higher-rated pairs. The assumption is that strong/established pairs are more stable and their ability changes slowly, so one bad/good game is more likely a statistical anomaly than a real change in ability.

Ping Hu

You made an assumption that the ♣9 was a restricted-choice play 50% of the time. You also assumed a 50-50 chance of East holding the ♣2.
