All comments by Richard Willey
We could do this, but

1. Designing and building one-off hardware is expensive
2. Shipping these tables all over the place is even more so (look at what the WBF spends shipping screens around)

If you want to go down this path, having people play on tablets is the way to go, especially since you can physically separate players as well…
12 hours ago
If you are in the world of non parametric modelling, you let the Neural Network or the Dynamic Bayesian Network or whatever figure out what is what.

If you are in a world of parametric modeling, you need to specify how you want to go and treat this.

In either case, this will almost certainly include a mixture of

1. Assigning a provisional rating to the unknown pair (or the pair that contains an unknown player)

2. Decreasing the level of certainty that you have regarding the provisional pair's rating, which means that

A. The rating changes of the unknown pair will be more dynamic
B. The rating changes of the “known” pair should be less dynamic
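
Purely as an illustration (the numbers and the Elo-style update rule below are invented for the sketch, not a specific proposal), weighting each pair's update by its own uncertainty produces both effects at once:

```python
def update(rating, uncertainty, opponent_rating, score, k=32.0):
    """score is the matchpoint fraction (0.0 - 1.0) achieved against the opponent."""
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    # Scale the step by this pair's uncertainty: more uncertain -> bigger move.
    step = k * uncertainty * (score - expected)
    # Shrink the uncertainty a little each time a result is observed.
    return rating + step, max(0.1, uncertainty * 0.9)

established = update(1800.0, 0.2, opponent_rating=1500.0, score=0.45)
provisional = update(1500.0, 2.0, opponent_rating=1800.0, score=0.55)
print(established)   # the well-known pair's rating barely moves
print(provisional)   # the provisional pair's rating moves a lot
```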
13 hours ago
There are all sorts of companies that provide blacklisting as a service.

The company that I work for (Akamai) has several products in this space.
Oct. 16
I agree that there is an enormous amount of variance in bridge.

I also agree that simpler models are to be preferred to complex ones. If it turns out that the NGS / Power Ratings / whatever is able to provide comparable accuracy to a more sophisticated approach, then I am all for using the simpler model.

However, if we are going to pretend that we care about an accurate ratings system then I think that we should also evaluate the accuracy of the various options.
Oct. 16
Once again, before trialing either of these proposals I think that it makes sense to evaluate their accuracy compared to alternative methods.

There have been incredible advances in data science over the past 20 years. I cannot help but believe that it's possible to improve on these sorts of naive algorithms.
Oct. 16
@marty

First: Some, but certainly not all, ACBL Online games use robots. As I mentioned earlier, you could in theory develop three different models:

1. Robot games only
2. Human games only
3. Both game types combined

I find type 2 most interesting

Second: The Common Game matchpoints across a large field; however, players only play boards against other pairs in their own club. I worry that this lack of mixing would result in a model that worked great for the Common Game but might encounter real problems if someone from club A were suddenly playing boards against a different set of opposing pairs.
Oct. 14
I've never really taken a close look at The Common Game

If they provide scores on a per-board basis, I don't see why it couldn't be used. However, I do have some concerns that you have a large number of separate pools of players and that it might be difficult to construct accurate global ratings. So, you might be able to develop very accurate ratings of how well player foo does at their normal club and have no idea how well player foo might do playing at some different club.

In contrast, the online games would seem to have one very large pool of players with much more mixture between players.
Oct. 14
I am suggesting that we objectively measure Power Ratings, the EBU National Grading System, whatever, and then be able to make an informed decision about which of these systems is worth building upon.

Why don't I trust any of the existing systems?

Well, the big reason is that none of them seems cognizant of the fact that the world has changed in the last 20 years and approaches that might have looked really good a couple decades back seem laughable…

When it comes to data science, “Hey look what we came up with 15 years ago” really isn't much of a recommendation
Oct. 14
Couple quick thoughts here

1. I think that some folks on this thread are gravely mistaken about the degree to which working on a ratings system would preempt other work. It's not as if the folks that you'd want doing the heavy lifting here are the same ones that you need working on propping up sagging clubs or doing marketing

2. I think that the ACBL could get a significant “bang” for a fairly limited investment.

Here's how I'd proceed

A. Partner with BBO to create a data set that folks could use to validate different approaches for generating a ratings system. From my perspective, you'd want BBO to do a few things

First: Provide you with, say, one year's worth of data from the ACBL's online pair games


Second: Anonymize this same dataset (change the names of various players to some random string). It's probably not absolutely necessary, but this doesn't cost anything and it might preempt a few complaints

Third: Publish the data from these events and allow anyone who wants to download it to do so

Finally: Announce that you'll be running a contest in, say, six months' time to evaluate the accuracy of whatever ratings systems people have developed.

The first year's worth of data will be your training set.
The next six months' worth of data will be your validation set.
Then use the next month's worth of tournament results as your test set.
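
As a rough sketch of that chronological split (the record layout and field names below are invented, not an actual BBO export format):

```python
from datetime import date

# Toy stand-in for the anonymized per-session data.
sessions = [
    {"pair": "anon_001", "date": date(2016, 3, 1), "score": 0.52},
    {"pair": "anon_002", "date": date(2017, 2, 10), "score": 0.61},
    {"pair": "anon_003", "date": date(2017, 7, 15), "score": 0.48},
]

def chronological_split(records, train_end, validation_end, test_end):
    train = [r for r in records if r["date"] <= train_end]
    validation = [r for r in records if train_end < r["date"] <= validation_end]
    test = [r for r in records if validation_end < r["date"] <= test_end]
    return train, validation, test

train, validation, test = chronological_split(
    sessions,
    train_end=date(2016, 12, 31),      # first year: training set
    validation_end=date(2017, 6, 30),  # next six months: validation set
    test_end=date(2017, 7, 31),        # following month: test set
)
```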

If you want to generate some external interest, post the contest on Kaggle or some such and put up a prize… say, $15K or so.
Oct. 14
> Masterpoints was a brilliant invention back in
> the day and while it is far from perfect in assessing
> a players true ability, nothing proposed comes close.

I think that your claim is ridiculous.

1. Masterpoints accumulate over time. Players' skill levels top out and eventually decline
2. Masterpoint allocations are wildly inconsistent over time, geography, and venue

I think that it's hard to think of a worse way of describing performance
(As a way to convince people to tithe to Memphis… Probably quite a bit better)

> It seems to me that those proposing the rating
> system just want some way to show how good
> they are without having to earn oodles of MP's.

I certainly don't. Rather, I find the problem itself interesting…
Oct. 14
Hmmm

I have absolutely no idea, but I am going to guess that it is someone talking about Susan Rice.
Oct. 14
If you developed a system that could be used for F2F play, it would be trivial to extend this to online play. (The hard parts with all of this are the issues surrounding inertia, politics, and data collection/cleaning).

Arguably, one might want to have a separate rating for F2F play and online play. Once again, this is easily done. Hell, might as well have a third rating that includes both…
Oct. 14
@marty

I agree that bridge results are inherently noisy. (A year or so ago I posted an analysis that applied a bootstrap to the results from a recent championship, which showed the degree of variance in these results.) Where I think that we disagree is whether this excuses using methods that (potentially) add even more noise into the system.
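
For what it's worth, a minimal sketch of that kind of bootstrap (the board scores here are synthetic, purely to show the shape of the calculation):

```python
import numpy as np

# Resample one pair's per-board matchpoint percentages and look at how wide
# the interval on their session average really is.
rng = np.random.default_rng(0)
board_scores = rng.uniform(0.0, 1.0, size=26)   # one session of 26 boards

resampled_means = [
    rng.choice(board_scores, size=board_scores.size, replace=True).mean()
    for _ in range(10_000)
]
low, high = np.percentile(resampled_means, [2.5, 97.5])
print(f"session average: {board_scores.mean():.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```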

Moreover, while I agree that parametric modeling techniques are the “gold standard”, these are really best suited for situations in which you have some kind of physical law to use when you're specifying your model. In this case, no one can provide a reasonable explanation why a two-year window is better than an eighteen-month window, or why linear decay is better than a cliff, or why you like this magic ratio of platinum to gold to silver points… Sure, you COULD play around and test a whole bunch of different potential models and see which one seems to be working best; however, at that point you're pretty much back to non-parametric modeling. You're just doing so in an extremely slow and inefficient manner.
Oct. 13
I've seen all sorts of comments / complaints about the way that seeding is currently done… It happens all the time.

I have also heard that the same set of pros are unwilling to invest in the basic type of record keeping that would be necessary to implement such a system. (For example, recording the scores of individual rounds from team games rather than simply noting the aggregate score)
Oct. 12
I return to my original comment in this thread:

Folks need to decide what the purpose of a rating system is.

1. If you want a rating system that is supposed to be an accurate measure of performance, then you need some way to evaluate its accuracy. And it seems self-evident that you want to be predicting how well people play

2. If you want a rating system to make people feel good about themselves or convince them to spend more money in order to become a platinum life master or go “Clear” or become “Operating Thetan” or … well, then you're welcome to make up whatever you want
Oct. 12
Sorry to have ignored these questions for a bit… Flu shot hit me kinda hard.

1. With respect to your question about black box algorithms… I don't know what the right answer is. I certainly appreciate that end users are going to want to understand how ratings are generated and may very well be skeptical of a black box algorithm. With this said and done, in my experience even simple linear models are far too complicated for the average end user to understand. So, if we're screwed either way, we might as well go down with style.

2. With respect to your question about accuracy:

The approach that I suggest is intended to generate a model that can be used for prediction.

If some set of pairs who are included in the model were to play a session against one another tomorrow, I want to be able to generate as accurate a set of MP scores as possible for those worthies.

As an outgrowth of this effort, it's also possible to create a “rating”. This rating would reflect the score that a given pair would expect to achieve against a random field.
Oct. 12
Hi Michael,

I took a look at the pages that you are referencing. I'm used to seeing much more explicit testing than what is provided here.

The only thing that is being presented is a set of tables showing that the PR ratings are more accurate in ranking the results of different types of games than Match Point totals. While this is all fine and dandy, it doesn't seem like a particularly hard bar to meet.

From my own perspective, I would much rather see

1. A system that was able to generate predictions about what percentage game people would have, rather than an ordering of results (see the sketch after this list)

2. This information being used to parameterize the model. I don't consider these issues to be minor details. Getting these types of questions correct is critical to producing an accurate model. (And the difficulty in specifying them all is why I suspect that a non-parametric modeling technique is the right way to go)
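
As a toy illustration of the scoring rule I have in mind for point 1 (all of the numbers are invented):

```python
# Compare predicted session percentages against what the pairs actually scored.
def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

predicted = [0.54, 0.48, 0.61, 0.45]   # model's predicted game for four pairs
actual = [0.57, 0.44, 0.58, 0.49]      # what those pairs actually scored
print(mean_absolute_error(predicted, actual))  # lower is better
```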
Oct. 11
I am shocked! Shocked I say to hear allegations that electronic signalling devices are being used to cheat in card games.
Oct. 11
These are all reasons why it is better to start by rating pairs rather than partnerships
Oct. 10
FWIW, I spent a bit of time thinking about how a bridge rating system might be implemented in this day and age. Here's the approach that I would want to experiment with.

Note that this presumes that

1. The primary goal is to create an accurate ratings system
2. End users are willing to accept a black box model

My natural inclination is to treat this as a hidden variable model

1. We have a whole bunch of categorical data: The names of different pairs
2. We have a whole bunch of board results: What were the results when pair foo1 played pair foo2 at such and such a time

Can we use this to generate a good predictive model? (What result do we expect if pair foo1 played pair foo3, and how accurate are these predictions?)

This problem is compounded by the fact that some boards are naturally flat while others have a lot of opportunity for swings, so we'll want to include the variance of the board results as an input as well as the pair IDs. (If we wanted to be really sophisticated we could even include data about the hands themselves. Is this hand suitable for a weak NT opening? Is this hand suitable for a strong club opening…?)

I suspect that the best option is to use some of the modern heavy-duty machine learning algorithms. (A recurrent neural network or some kind of dynamic Bayesian network could both be reasonable choices)

We can train the network on some subset of the data, then evaluate the accuracy of its predictions on the remainder…
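
To make that concrete, here is a very rough sketch along these lines. It uses a simple feedforward embedding model rather than the recurrent network or dynamic Bayesian network mentioned above, and every name, size, and number in it is made up:

```python
import torch
import torch.nn as nn

class BoardResultModel(nn.Module):
    def __init__(self, n_pairs, dim=16):
        super().__init__()
        # One learned "hidden" vector per pair.
        self.pair_embedding = nn.Embedding(n_pairs, dim)
        self.head = nn.Sequential(
            nn.Linear(2 * dim + 1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()
        )

    def forward(self, ns_pair, ew_pair, board_variance):
        x = torch.cat(
            [self.pair_embedding(ns_pair),
             self.pair_embedding(ew_pair),
             board_variance.unsqueeze(-1)],
            dim=-1,
        )
        return self.head(x).squeeze(-1)  # predicted matchpoint fraction for NS

model = BoardResultModel(n_pairs=5000)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step on fabricated board results.
ns = torch.tensor([17, 42])
ew = torch.tensor([99, 3])
variance = torch.tensor([0.04, 0.21])
actual_ns_score = torch.tensor([0.55, 0.38])

optimizer.zero_grad()
loss = loss_fn(model(ns, ew, variance), actual_ns_score)
loss.backward()
optimizer.step()
```

The same loop would then be run over the historical board results, holding out the later sessions for validation and testing as described earlier in the thread.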

Presuming that we are able to create a good predictive model, we can then use this to generate a rating scheme. Hypothetically, I could use the predictive model to generate an enormous number of virtual tournaments.

Select 54 pairs at random
Generate a set of boards (actually the variance for a set of boards)
Have the pairs play virtual rounds against one another

Repeat this a very large number of times, calculate each pair's average score across this set of events, and use this as their rating….
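
Something along these lines, assuming a trained predict_score(pair_a, pair_b, board_variance) function has come out of the previous step (the one below is just a dummy placeholder):

```python
import random

def virtual_tournament_ratings(all_pairs, predict_score, n_events=1000, field=54, boards=24):
    """Monte Carlo ratings: a pair's rating is its average predicted score
    over every randomly drawn virtual event it was entered in."""
    totals = {p: [] for p in all_pairs}
    for _ in range(n_events):
        entrants = random.sample(all_pairs, field)                       # 54 pairs at random
        variances = [random.uniform(0.0, 0.3) for _ in range(boards)]    # "a set of boards"
        for pair in entrants:
            opponents = [q for q in entrants if q != pair]
            session = [predict_score(pair, random.choice(opponents), v) for v in variances]
            totals[pair].append(sum(session) / boards)
    return {p: sum(s) / len(s) for p, s in totals.items() if s}

# Dummy stand-in for the predictive model from the previous step.
pairs = [f"pair_{i}" for i in range(200)]
dummy_predict = lambda a, b, variance: min(1.0, max(0.0, random.gauss(0.5, variance)))
ratings = virtual_tournament_ratings(pairs, dummy_predict, n_events=200)
```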
Oct. 10