All comments by Piotr Lopusiewicz
I was measuring things in IMPs (in comparison to double dummy) rather than in tricks, as all imported vugraph hands were played at IMPs and that seemed more relevant to me. The standard deviation was 3.5 IMPs per hand, which works out to about 78 IMPs per 500 hands. You've got 100 tricks per 500 hands; that might be around 1 standard deviation, maybe a bit more, so quite possible to get.
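For the record, here is a rough sketch of the scaling behind those numbers (my own illustration, not the original scripts, which are lost; the 3.5 IMPs per hand figure is the one quoted above, and treating hands as independent is an assumption):

```python
# Rough sketch: how a per-hand standard deviation scales to a 500-hand sample.
# The 3.5 IMPs/hand figure is the one quoted above; independence of hands is assumed.
import math

sd_per_hand = 3.5      # SD of (actual result - double dummy) in IMPs, per hand
n_hands = 500

# For independent hands the SD of the total grows with the square root of n.
sd_total = sd_per_hand * math.sqrt(n_hands)
print(f"SD over {n_hands} hands: about {sd_total:.0f} IMPs")   # ~78 IMPs
```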
Feb. 23
Piotr Lopusiewicz edited this comment Feb. 23
It's likely a sample-size issue. I mentioned above the comparison done on a much bigger sample (all vugraph hands of selected top players), and Richard Pavlicek did similar work before me as well:
http://www.rpbridge.net/8j45.htm
Feb. 23
It's the defenders who have the +0.5 IMP per hand, not the declarer.
There is a lot of uncertainty in declarer play: you don't know much about the defenders' shapes and key-card locations, while they know a lot more about declarer's hand and can convey crucial information to each other as well. The defenders, assuming decent signaling and a decent level of play, usually play double dummy from trick 3 or 4 onwards, while the declarer is very often kept in the dark to the very end.
Feb. 23
I ran it in the past; sadly, I don't have the scripts and files anymore, so you will have to trust me on it.
There is no declarer's advantage in bridge at a decent level. The results are very similar to double-dummy play, with the exception of 1NT contracts, where declarers do a bit better. I don't remember how grand slams fared. After the opening lead the defenders have an advantage. You may be interested in some statistics I published a long time ago:
https://www.bridgebase.com/forums/topic/44448-who-is-the-best-declarer-in-the-world/

While they don't answer your question directly, they show a defenders' advantage after the opening lead of about +0.5 IMPs per hand in comparison to double dummy (this post: https://www.bridgebase.com/forums/topic/44448-who-is-the-best-declarer-in-the-world/page__view__findpost__p__530987).

By the way, the “declarer's advantage” concept makes no sense to me. The defenders know more about the hand from the bidding and can signal as well. It's only natural that they have an advantage in the card play, since they have more information.
Feb. 23
Piotr Lopusiewicz edited this comment Feb. 23
They are not forcing, but xxxx xx xx KQTxx is too weak a hand for that. The normal range is a good 7 up to 11 points.
Nov. 6, 2015
I don't really know why they have 12-15 on the convention card for 1C, etc. It's incorrect. I did the analysis for the Polish forum before, but I can repeat it here if there is a need. The analysis is based on what actually happened at the table (on vugraph) rather than on what they claim on the convention card.
Nov. 4, 2015
>>If the PBU keeps the title, they speak for all of Poland. If the Polish players don't replace the PBU with people that will do the right thing, then all of Polish players will have supported the actions of the PBU.

Oh, so then every American is responsible for invading Iraq. Good to know.
Nov. 2, 2015
A very small nitpick, but it's important to get the ranges right (and here the correct ranges are, in some cases, even more incriminating).

1C = 12-14 balanced, 15+ natural, or 18+ any (not 12-15 and not 19+)

This is important because there was a discussion about a 1C - 1S - 2C - 3C hand, which is different in Polish Club than in Precision (1C - 1S - 2C is weaker than a strong club opening).

They play a classical 15-17 1NT with some 14-counts mixed in. With 15 points and natural clubs they open 1C. In fact, even 14 points with 6+ clubs is borderline between 1C and 2C for them.
Nov. 2, 2015
Piotr Lopusiewicz edited this comment Nov. 2, 2015
>>Suppose I divide my data into two parts (“training set” and “test set”). I use the “training set” to form a hypothetical pattern which I will search for in the test set. Since I didn't examine the test set yet, from the point of view of the null hypothesis that the data is randomly generated I can imagine the test set was generated after I chose the hypothesis.

Yes, but you are not going to convince anyone, because there is no way to prove you didn't look at your “test set” beforehand. You might also have heard some clues from people who have seen it.

>>If their attempts are genuinely independent

That would work, but:
1) there is no way to ensure that, because anyone interested in bridge at this point has already heard some rumors - some of those rumors coming from your test set;
2) you lose whatever evidence you spent on the “training set”;
3) once you have used your training set to test one hypothesis, you can't use it again for another (because, according to your methodology, you are no longer able to “derive” a hypothesis from the training set).

>>Even if the data are 100 % true, they do not contain the objective truth concerning cheating. It's all about interpretations of the data and there can never be truly 100% certainty about their correctness. That's statistics.

Of course.

>>How do you include untested hypotheses in your calculations? You don't know the outcome of them

I described it. I try to estimate the order of magnitude of the number of hypotheses that would be accepted as convincing. That way I know whether finding a hypothesis of a given kind is convincing enough (if that set is huge, then finding just one isn't convincing, even if it happens to be the first one tested).
I don't need to know anything about the untested hypotheses. All I need to know is how many of them would be accepted as evidence (more or less); that way I can tell how big a coincidence it is that one of them checks out. You may think of it as the odds of one of “those” things happening by pure chance.

When you find an interesting pattern, you need to know how many other patterns like it are possible in order to estimate how unlikely your finding is.


>>Of course two different observers can reach different conclusions if they work with different sets of hypotheses. There is no objective truth regarding conclusions in the data. That's statistics.

The problem is that the same observers could reach different conclusions just by testing in different orders, as one of them would stop testing after finding a hypothesis that matches.
What I am describing is equivalent to testing all of them, which is better because it avoids the unnecessary luck involved in choosing the order in which to test the hypotheses.

>>If the 7th hypothesis gives a match, there is usually no need to look further.

This is the element of luck I am talking about. It doesn't matter whether it's the 7th or the 7 billionth one.

>> Even if the predictions are satisfactory

There is nothing left to predict. Everything is already done, the data has been produced, and there is no way to get more.
The job is to assess how likely the suspicious things are to happen at random and then use that to adjust our a priori belief that those players are guilty - that's statistics.

>>As has been mentioned by others, the best test of a hypothesis is its predictive capability on new data.

Yes, but that doesn't apply here, as we won't get any new data.

>>But it is not a question of 10 billion hypotheses, not even 1000 or 100. Maybe 2 or 3.

I would invite you to read my post again. It seems we agree about everything, but you somehow try to frame it in a classroom method which has obvious shortcomings here. Of course we want to test as many hypotheses as possible, and how many we test shouldn't influence our view of the matter.
Do you think it matters that we struck gold on our 7th try instead of our 700 billionth, just because we were fooling around with silly hypotheses for a day?

It doesn't. What matters is how many plausible signals there are. Would it be evidence if Balicki showed four spades? Three diamonds? An ace in a higher suit? An ace in a lower suit? Lavinthal? Of course it would. Now you've found one of those things and you say it's, say, 1 in 50,000 for it to happen by chance. Here comes the catch: there are 1,000 similar things which might have happened and you've just found one - not very convincing. Find 16 hands instead of 12, though, and the odds are now 1 to 11,000,000 (or something), but there are still only 1,000 hypotheses - that's convincing.

Try to think about what mechanism makes the data convincing. It's not about striking gold fast or testing only 2 or 3 hypotheses (maybe we were drunk and tested silly ones, wasting our test set; shouldn't we be allowed to draw the correct conclusion the next day even though we already know the set?). It will become quite clear.
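To make the mechanism concrete, here is a minimal sketch of the kind of adjustment I mean (my own illustration; the 1 in 50,000, 1 to 11,000,000 and 1,000-hypothesis figures are just the example numbers above, and treating the candidate patterns as roughly independent is an assumption):

```python
# Sketch: how convincing is "one of the plausible patterns matched"?
# The figures below are only the illustrative numbers from the post.

def chance_any_pattern_matches(p_single, n_plausible):
    """Chance that at least one of n_plausible candidate patterns matches
    by pure luck, assuming rough independence (about n * p for small p)."""
    return 1 - (1 - p_single) ** n_plausible

# 12 hits: ~1 in 50,000 for one pattern, but ~1,000 plausible patterns exist.
print(chance_any_pattern_matches(1 / 50_000, 1_000))       # ~0.02  -> not convincing

# 16 hits: ~1 in 11,000,000 for one pattern, same 1,000 plausible patterns.
print(chance_any_pattern_matches(1 / 11_000_000, 1_000))   # ~9e-05 -> convincing
```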
Oct. 15, 2015
>>Piotr, you appear to have strong opinions on this subject.

Yes, because I am tilted by seeing methods designed for situations where the experiment can be repeated misapplied to the problem at hand.

>> Sadly, what you are posting is nonsense.

I've posted several arguments as to why the proposed method leads to absurdities. I invite you to do the same with mine, or maybe just point out one part of what I posted that is nonsensical.

>>Dividing datasets into test sets, training sets, and validation sets is fundamental to statistics and machine learning

Yes, but the posters above misapply it. I know it's taught in some courses, and I know you are used to it, but it is wrong here; just see how it leads to absurd conclusions.
You will lose if you argue that way, because there is no way to prove you didn't look at the data, and it doesn't matter anyway. You can look as much as you want and test whatever you want; reality won't change because of it.

>>Moreover, other than you, I don't recall anyone claiming that you need to throw away a data set if you accidentally look at the whole thing.

Saying that it matters how many hypotheses you tested or how you came up with them is the same thing.

Kit says he left some matches aside on purpose, to verify.
Many posters above claimed that it matters how many hypotheses we tested. Just read those posts.

Instead of patronizing me with your use of R this morning (as you can guess, I work with probability on a daily basis as well), try to make one argument for why what I wrote is nonsense. Feel free to defend the “it matters how many hypotheses we tested” idea as well; here is why it's incorrect:

>>Imagine arguing that in a court of law: “Judge, we tested 7 hypotheses and the 7th checked out - PROOF”, and then the opposing side says: “We tested 10 billion hypotheses and the 10 billionth one was the one that checked out - NOT CONVINCING ENOUGH”. They can even produce the 10 billion hypotheses they've tested (including the location of the 5 of diamonds and the 8 of clubs, etc.). How are you going to convince anyone that your way is better? Are you going to argue that you really, honestly, tried just 7 things and therefore your conclusion is more valid than the one made by a side which tested 10 billion things?

What you would do to win that argument is discredit most of those 10 billion hypotheses as not being plausible ways of cheating. For example, finding Balicki signalling the location of the card triplet 4c 5d 8h when vulnerable, or the location of the 7d when not vulnerable, wouldn't be convincing even if it checked out over 200 boards. Why not? Because there are billions of ways to create such hypotheses and they are useless for cheating at bridge. That is the real reason they wouldn't be accepted as proof - not that you haven't tested them.

Following your way would also lead to absurd situations: say you've tested your 7 hypotheses and one checked out - you are convinced. Now you are afraid to test any more, because with every additional one you would become less convinced! This is of course nonsense. Whether you look or don't look, after you have already used something you considered a valid method, is meaningless to your conclusion, but if we were to apply that “classical way” we would run into that paradox.



Oct. 15, 2015
>>No! That is never the right approach. You must use your data to develop a simple, working test hypothesis, and check that against new data. Kit is trying to do just that.

I consider it incorrect, although I grant you that this is not clear-cut and there is a debate about it.
Which part of the data you use to develop a hypothesis and which part you test it on changes neither the data nor the reality. It's voodoo thinking. This kind of thinking implies that if you accidentally looked at the whole dataset you could no longer use it as evidence, which is of course nonsense.
Your way works if there is a way to repeat the experiment - here there isn't.

>>Plausible hypotheses don't matter, it's the number of hypotheses that are tested that matters

This is incorrect. It would mean that if you somehow tested 1 billion silly hypotheses (which is possible with a computer program), the data would be worthless as evidence. The point is that whether you look or not, or test or not, influences neither the data nor the reality.
I realize that's what they teach you in some stats courses, but it's wrong here.

>>Moreover, we should only consider hypotheses that we have actually tested, not hypotheses we might test.

This introduces an unnecessary element of luck. The order in which we choose to test hypotheses influences neither reality nor the data, so it shouldn't influence the strength of the evidence.
All of that works well if we can repeat the experiment, but once the data is there, there is no way to get more. We need to estimate how strong the evidence is, and the way to do that is to estimate the number of plausible hypotheses.

If we were to follow the procedure “only hypotheses actually tested matter”, then two observers with the same prior knowledge and the same views on bridge/life/probability, or even the exact same brain, looking at the same evidence could arrive at completely different conclusions depending on the order in which they choose to test the hypotheses.
It therefore can't be a correct procedure.

Imagine arguing that in a court of law: “Judge, we tested 7 hypotheses and the 7th checked out - PROOF”, and then the opposing side says: “We tested 10 billion hypotheses and the 10 billionth one was the one that checked out - NOT CONVINCING ENOUGH”. They can even produce the 10 billion hypotheses they've tested (including the location of the 5 of diamonds and the 8 of clubs, etc.). How are you going to convince anyone that your way is better? Are you going to argue that you really, honestly, tried just 7 things and therefore your conclusion is more valid than the one made by a side which tested 10 billion things?

What you would do to win that argument is discredit most of those 10 billion hypotheses as not being plausible ways of cheating. For example, finding Balicki signalling the location of the card triplet 4c 5d 8h when vulnerable, or the location of the 7d when not vulnerable, wouldn't be convincing even if it checked out over 200 boards. Why not? Because there are billions of ways to create such hypotheses and they are useless for cheating at bridge. That is the real reason they wouldn't be accepted as proof - not that you haven't tested them.

Following your way would also lead to absurd situations: say you've tested your 7 hypotheses and one checked out - you are convinced. Now you are afraid to test any more, because with every additional one you would become less convinced! This is of course nonsense. Whether you look or don't look, after you have already used something you considered a valid method, is meaningless to your conclusion, but if we were to apply that “classical way” we would run into that paradox.

It seems to me there is a lot of confusion between methods which are valid in an environment where you can repeat the experiment and get more data, and the actual problem at hand: evaluating how convincing the data we already have is when we can't get any more of it.
I hope the thought experiment above sheds some light on it, although I am not optimistic. You will lose in court if you go the “we derived the hypothesis using 5 matches and verified it on 5 more matches” route, because there is no way to verify that you actually did that, nor should it matter anyway.
Oct. 15, 2015
Piotr Lopusiewicz edited this comment Oct. 15, 2015
I am probably more conservative by nature; I think 1,000 is a reasonable number. It doesn't really matter anyway: if there aren't any false positives, the evidence is overwhelming either way.
Oct. 13, 2015
>>In the beginning no one thought Boye had F-S and that he was being a drama queen. Of course, that was not the case. Then his team got F-N. This was so successful that he got the Germans to confess, which saved hundreds of man hours for Boye, Ish, Brad etc. There is no reason to doubt that he has the goods on B-Z and is still just trying to change the process and repair bridge.

Justin, the only difference is that in the case of FN and FS I (and the public) was presented with the evidence and was able to verify it myself. In the case of BZ we are still getting there. For example, the latest five-fingers stuff is more than enough for me if no false positives are found and the claimed number of hits (more than 17, I believe) is confirmed.
Oct. 13, 2015
>>The number of possible hypotheses may be infinite.

Yes, and here comes the judgement call.
I think the most accurate way is to think about a set of “reasonable hypotheses”, the ones we would believe to be plausible; for example, no one would take “look, five fingers mean either the 8 of spades or the 5 of diamonds or 2 aces!” seriously. The way to go about it is to estimate the number of plausible hypotheses and then compare that number to the odds we get. If we can get about 16 or 17 hits on, say, a 200-hand sample, those odds would be overwhelming to any reasonable person (I believe!). They aren't that overwhelming with 12 hits (as it's reasonable to assume there are about 1,000 reasonable hypotheses, and odds of 1 to 50,000 are then uncomfortably close).

Anyway, does anyone have the current count? Are there any confirmed false positives? If not, it looks like game over, as I counted more than 17 hands confirmed already.
Oct. 13, 2015
>>However, I think Kit's formula is only slightly wrong, while I carelessly overlooked a significant error in your approach. Namely you assume even in your null hypothesis that the observed number of signals is the number that will be given, not a valid assumption.

I don't assume anything. I just want to know the chances that 12 randomly produced signals all hit hands with 5-card suits, so I can see how much of a coincidence that is.
Other than that, I +1 Kurt Haggblom's post above and encourage you to read it.
Oct. 13, 2015
Justin, I am sorry, but you are just wrong here.
The standard most BW users are happy to accept as “statistical proof” is ridiculous, and most of the arguments expressed here are about that.

I want BZ banned as much as the next guy, and I am fine with an “all those shenanigans at the table are enough to ban them without further ado” standard. Let's just not pretend that something like “wow, a 1-in-17,000 shot happened, that MUST BE PROOF!” is in any way more worthwhile than patting yourself on the back while claiming B is a jerk and should be banned.

It's frightening how many ignorant views about probability are acclaimed here. It would be easy to convict completely innocent people using that kind of argument.

I am personally quite convinced BZ are cheaters; I just refuse to accept incorrect arguments based on statistics done in a lazy manner.

Also, Kit didn't say it's proof in his view; he just said it looks promising. I agree. I think the placement of the bidding cards looked even more promising, and I am willing to accept that the two combined are enough to ban them. I don't think it's “beyond reasonable doubt”, but it's good enough.

The main reason Tomasz and I are “defending them”, or more accurately demanding a higher standard of proof, is that we need it for our Polish scene to be cleaned up. As long as the case leaves any doubt there will always be defenders, and nothing will change on our local scene.

It was completely, utterly different with FN and FS. The evidence was enough to be 99.999% sure (or whatever it was) that they were cheaters, without any previous exposure to their behavior. It's not that clear here.

I am tilted by your comment and similar ones because I am the last person in the world to defend Balicki. I think it's ridiculous that his behavior was allowed for so long; I called him out on it during a major tournament in Poland and was ostracized by the TD and his friend for that. I really want him exposed, banned and preferably sued into oblivion, but for that there needs to be evidence beyond a reasonable doubt - and there isn't yet.
Oct. 13, 2015
>>If you want these comments to mean anything concrete on the issue at hand, you have to be more specific. As in what affects the 1 to 17,000 number and by how much.

It's pretty simple: there are more hypotheses than just the 5-card-suit ones. If you try 1,000 of them (and I believe coming up with 1,000 basic hypotheses about what information to convey isn't that difficult; in fact, probably more than 1,000 have been tested already), then the odds shrink. Seeing one of your unlikely shots come in doesn't make it a very unexpected occurrence if you would also accept other shots of the same kind (for example, if you would be equally convinced by a full palm touching the table showing an odd distribution, or a shortness).

Now, there are ways to quantify it: one is described by Jeannie and one by me. Both are based on the same principle; it's just a debate about which should be applied, but they attempt to measure the same thing.
Oct. 13, 2015
It's not equivalent.
If there were 44 hands with a 5-card suit out of 100 (just an example; we need to know the actual number), then the chance of hitting 12 equals the number of ways to pick 12 from those 44 divided by the number of ways to pick 12 out of 100. This is a different number from the chance of hitting 12 in a row.
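A minimal sketch of that calculation, using the illustrative numbers above (44 boards with a 5-card suit out of 100, 12 random signals; the real board counts would have to be plugged in):

```python
# Sketch: chance that 12 randomly placed signals all land on boards
# that actually contain a 5-card suit. Numbers are the examples from the post.
from math import comb

def p_all_signals_hit(signals, boards_with_suit, total_boards):
    # ways to choose the signalled boards only from the qualifying ones,
    # divided by all ways to choose them from the whole sample
    return comb(boards_with_suit, signals) / comb(total_boards, signals)

print(p_all_signals_hit(12, 44, 100))    # the 44-of-100 example above, ~2e-05
print(p_all_signals_hit(12, 100, 100))   # edge case: every board qualifies -> 1.0
print(p_all_signals_hit(12, 12, 100))    # edge case: exactly 12 qualify -> ~1e-15
```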

In the post above I described how Kit's method fails at edge cases (which is enough to show it's not correct).

>>A player does not need to cheat on every hand where he has that type of hand, in order to be cheating.

Of course. Notice that in those examples my method produces even smaller numbers than the 1 to 17,000 mentioned.
Oct. 13, 2015
>> As Michal says: you have to count all of the hypotheses which have been considered up to that point. In practical terms, this means that you need to use a lot more examples to test the final hypothesis before you can have any confidence that it will generalise to unseen examples.

I think you should count all plausible hypotheses of the kind we are considering. Whether you look at the data or not doesn't change it, nor does it change reality. It also doesn't matter how quickly you hit on testing “5-card suit” rather than, say, “3 odd suits, 1 even one”.
I realize this is a bit of a philosophical discussion, but something irks me about the degrees-of-freedom or leave-some-boards-to-test-the-prediction approach. I find it very superficial.

The conclusion we arrive at, though, is clear: you need more data, as 1 to 17,000 (or whatever it is) isn't really convincing once you realize there could be 1,000 or so similar hypotheses to test.
Oct. 13, 2015
Imagine that, by some weird coincidence, all 100 of the boards we reviewed contained a 5-card suit in Balicki's hand.
You follow your reasoning, conclude that it's 1 to 17,000 for that to happen, and walk away convinced, while in fact there was not a shred of evidence, since it had a 100% probability of happening by pure chance.

Imagine now that among the 100 boards we reviewed there are exactly 12 with a 5-card suit in B's hand and the signal matched all of them. The probability of that happening by chance is very low (1 with 15 zeros to 1).
It follows that a correct method needs to take into account the number of hands on which B had a 5-card suit.

What we want to calculate is this: “how likely is it that 12 gestures occurring randomly over 100 hands all match hands containing a 5-card suit?”
I hope this is clear.
Oct. 13, 2015
Piotr Lopusiewicz edited this comment Oct. 13, 2015