My nitpick would be that I don't think you need the spectators. Shireen and Andy make sense as the hosts, but the others mostly seem to just sit there in silence, and it's a bit weird. Or they forget to mute their microphones, and become a distraction. If they have any questions, they can type them into the chat.

But I think the format is great overall - bridge has needed this sort of thing for years! Thanks to Shireen and everyone for setting it up.
April 2
The EBU grades individuals, whereas Richard's looking at partnerships. I can believe that people mix between clubs a whole lot more than partnerships do.

Worth pointing out that diffusion isn't a binary “this will work / this won't work” situation. If the diffusion in the ACBL is worse than in the EBU then that just means it'll take longer. And it'll still get better over time. And, in a way, if it doesn't improve then that's proof that it doesn't matter! It's only an issue if people come to the club from outside and are met with unrealistic local grades, but that's the very thing that's not happening.
Feb. 18
Are there really only 19 games per week in Chicago?
Feb. 17
That's at the tournament level. We don't grade individual boards.

We keep this data for all sessions played, so that we can display a nice green plus or a nasty red minus next to a player's results to show whether they did better or worse than expected. I simply averaged it all.
Feb. 14
Mean of (actual score - predicted score) = -0.003%
Standard deviation = 6.45%
Mean of abs(actual score - predicted score) = 5.13%

This is since the start of 2019, so entirely a test set. Just under 2 million data points. Mature grades only.

I'm pretty sure the mean will always be close to zero, though, no matter how bad the algorithm is. So it's the variance that's more interesting, and it seems the NGS is about half a percent worse in this respect than Ping's system. Probably not enough for anyone to ever notice, but certainly not nothing.
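For anyone who wants to run the same three numbers on their own results, here's a rough sketch. It is not the NGS code: the function name and the toy scores are made up for illustration, and it only assumes you have actual and predicted percentages paired up session by session. The toy data also shows the point about the mean: a lazy predictor that says 50% for everyone still gets a mean error of zero, and it's only the spread that gives it away.

```python
import statistics

def residual_summary(actual, predicted):
    """Return (mean, standard deviation, mean absolute error) of actual - predicted."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)  # population standard deviation
    mae = statistics.fmean([abs(r) for r in residuals])
    return mean, sd, mae

# Made-up session scores (percent), not real EBU data.
actual = [62.0, 55.0, 50.0, 45.0, 38.0]

# A lazy predictor: everyone is expected to score 50%.
naive = residual_summary(actual, [50.0] * len(actual))

# A hypothetical predictor that knows something about the players.
better = residual_summary(actual, [58.0, 53.0, 50.0, 47.0, 42.0])

print("naive  mean=%.3f sd=%.3f mae=%.3f" % naive)
print("better mean=%.3f sd=%.3f mae=%.3f" % better)
```

Both predictors have a mean error of zero on this data, but the second one has a much smaller standard deviation and mean absolute error.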
Feb. 14
I do think you're more interested in the “cool stuff”, Richard, than actually getting a system off the ground. And there's nothing wrong with that - some of the tech you've been talking about is fascinating - but people should take your advice with a pinch of salt.

The NGS is not perfect, but it does have the advantage of actually existing.
Feb. 11
They get used for seeding, stratification and handicapping to some degree but really the main use is that the members like them and find them interesting.
Feb. 11
What you're talking about is something we call “diffusion”. If you have an isolated pocket of players, say a club whose members never play elsewhere and who don't welcome guests, then the grading scheme will assume they have the same average strength as the rest of the population. Without people to compare with, you don't know how good they are.

This isn't a problem in England - no such clubs exist, although it took some clubs longer than others to get there in the early days. It may well be more of a problem in the USA, though. Or it might take longer to become a non-problem.

We did some things in the early days to help diffusion along. Like giving increased weight to events where populations are likely to mix - national and county events.

We also had the luxury of two years' head start on the NGS. We started capturing session data in 2010, but the NGS itself wasn't ready until 2012. This meant we could launch it with everyone's grades fully formed (well, for the regular players anyway).
Feb. 11
I looked into this once, and you're more likely to go 5 off in 3NT than to make it opposite a passed hand.
Feb. 7
Some information here: https://www.brianbridge.net/cafe/

But basically, you play a few boards in a cafe with a cup of tea and a bun. Then you have a nice walk to the next venue and repeat.
Feb. 1
One thing worth flagging up is how the top four pairs all contained players who are… I hesitate to use the word “young”, but certainly young in a bridge context - 40ish or below. And in stage two of the trials it was the top five pairs.

Could be the start of a new era in English bridge.
Jan. 8
They went up a small amount - less than half a percent on average.

There's no reason to doubt that's because they've improved in the last two years, though. After all, the ones who didn't improve might be less likely to make it this far, so there's some selection bias.
Jan. 7
This was the strongest event the EBU has run since the NGS started, with an average field strength of 68.06. The previous strongest was the same trial two years ago, with a weak-sauce 67.14.
Jan. 7
“Perhaps the L&E should consider apologising for having a rule that only benefits Secretary Birds as well.”

Are you for real? You just admitted that there was no Secretary Bird and you'd made most of the story up!
Dec. 6, 2019
Not a good argument.
Nov. 25, 2019
Full details of the NGS are here:

https://www.ebu.co.uk/documents/miscellaneous/ngs/full-guide.pdf
Nov. 22, 2019
Steve: NGS grades are the percentage you would be expected to score playing with an identical partner in an average field. So a difference of 0.6 isn't very much.

If you look only at pairs whose grades differ by 5+ (23% of the total) then the average distance of the partnership grade from the average of the two players' grades goes up, but only to 0.74. If you look at those who differ by 10+ (4%) it's 0.78. So still very close.

It seems to work out pretty neatly. The most disparate partnership in the EBU is someone with a 60 grade playing with someone with a 28 grade. Their partnership grade is 45, which is pretty close to the exact midpoint of 44.
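The arithmetic behind that comparison, as a quick sketch. It assumes "distance" means the absolute gap between the partnership grade and the simple average of the two individual grades; the function name is just for illustration.

```python
def midpoint_gap(grade_a, grade_b, partnership_grade):
    """Return the midpoint of two individual grades and how far
    the partnership grade sits from it."""
    midpoint = (grade_a + grade_b) / 2
    return midpoint, abs(partnership_grade - midpoint)

# The most disparate partnership mentioned above: 60 and 28, partnership grade 45.
midpoint, gap = midpoint_gap(60, 28, 45)
print(midpoint, gap)  # 44.0 1.0
```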
Nov. 21, 2019
For what it's worth, the NGS partnership grades are extremely close to the average of the two individuals. The mean difference is around 0.6, and something like 96% of them are separated by less than 2.
Nov. 20, 2019
I don't know if I'll change your mind, and I can't speak for how things would work in other markets, but the experience of the EBU has been positive. The NGS is very popular here, and there certainly hasn't been any big drop-off of players since it launched in 2012. The NGS ranking page is the third most visited page on our website (behind the home page and My EBU) and we still get loads of queries from people eagerly investigating their grade history.

Obviously not everybody loves it, and there was one person on the other thread who said he had quit playing because of it. That's a shame, of course, but thankfully he seems to be an outlier.

I'm curious, though, why your arguments against a bridge rating system don't apply equally to chess? Why are chess players happy to find out how bad they are, but you want to protect bridge players from this reality?
Oct. 25, 2019
“I can not help but believe that it's possible to improve over these sorts of naive algorithms.”

I'm sure you're right, Richard, and I certainly defer to you in the realm of data science, but do you really think it could improve by enough that anyone would notice? We hate to admit it, but bridge is a pretty random game.

The average difference between a pair's actual score and the NGS prediction (which we call AbovePar) is pretty close to zero, so by that measure we're very accurate. But the standard deviation of AbovePar is about 6.5, which is huge, and means our ability to predict your score on a given night is kinda poor. If we say a pair will score 50% it's definitely not weird to see them scoring both 40% and 60% in a fairly short period of time.
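To put a rough number on "not weird", here's a back-of-envelope sketch. It assumes the prediction errors are roughly normally distributed around zero with a standard deviation of 6.5, which is my simplification rather than anything measured, and the function name is made up.

```python
import math

def prob_outside(band, sd):
    """P(|X| >= band) for X ~ Normal(0, sd): the chance a session lands
    at least `band` points away from the prediction, in either direction."""
    return math.erfc(band / (sd * math.sqrt(2)))

# Chance that a pair predicted at 50% scores 40% or lower, or 60% or higher.
p = prob_outside(10.0, 6.5)
print(round(p, 2))  # about 0.12
```

Under that assumption a ten-point miss turns up in roughly one session in eight, so a regular pair would expect to see several in a season.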

Can you do much better than that? Maybe if you can say things like “there are a lot of N/S slam hands in this set, so this pair will do worse than normal” or “this pair always choke against pair X, but this time they're skipping them so they'll do a bit better”, but beyond that level of detail I don't think so. I think that's just the nature of bridge.

“Our bots have hacked into your Facebook profile and noticed you've just been dumped; according to our analysis, that's worth half a top.” :-)

I see the appeal of running your multi-year evaluations, hiring big data analysts and machine learning experts and buying up a bunch of cloud processing time to crunch the numbers, but I'm sceptical that you'll end up with something that the average member will be able to distinguish from the boring old NGS. Which actually exists. I'd be fascinated to be proven wrong, though!
Oct. 16, 2019
