Statistical Models for Testing Illegal Communication

I have been involved in the investigation of cheating for over two years. I think it is important to comment on the use of serious statistical arguments in the prosecution of alleged wrongdoers. I am not a lawyer. I respect all the people in the EBL and the rest of the bridge community who have spent time on these cases. The vast majority (if not all) of that time was given pro bono. Also, there were no precedents for cases like this, and there were limits on the time I could give to them.

Up to now I have felt constrained not to say too much because of pending cases. I now feel free to speak, although I will try to restrict my discussion to facts that are either in the public record or originally came from me. Although at times I will hint at “what should have been done”, this is not intended as criticism of anyone. The idea is to learn for the future.

The opinions in this article are my own. I have not been in contact with any representative of the European Bridge League since the release of the CAS ruling.

I think this article is important because most people do not know enough about the investigations and hearings of these cases to make informed judgments. There may be future cases that involve statistics in one way or another, so it is important to know how to handle them. I am going to emphasize the role of serious statisticians in the analysis and prosecution (and defense) of cases.

This is the first of two articles. This article will explain the statistical issues in the two major cases (Fantoni-Nunes and Fisher-Schwartz) with some hope of generalizing to future cases. The second will take a closer look at the CAS ruling in the Fantoni-Nunes case, with the particular goal of correcting the errors and misunderstandings expressed in the CAS report.

I will end with my major recommendation, but since it is so important I will state it here as well.

  • Statistical experts need to be part of committees investigating and bringing charges about bridge players. It is not sufficient to ask them to make some high-school (or higher level) mathematical calculations of probabilities under certain conditions. They must be involved from the beginning to the end. What they say should strongly affect the actual charges brought against players.

It is clear now in retrospect that few people understand the proper role of statisticians in cases like this. In fairness to them, in most cases, statistical “experts” are used more to confuse people than anything else. What I am talking about is how statisticians should be used.

ABOUT MYSELF

I am a professor at the University of Chicago working in probability and statistics. I do not have any particular specialization in cheating at games (I do not believe it is necessary) and have never been an “expert” witness at a trial. I did not receive (or want) any payment for advice I gave in the bridge investigations. Although I have played little tournament bridge in the last twenty years, I consider myself a good player (competitive in US Flight A tournaments) but far from world-class (WC). On some issues, such as hand evaluation, handling competitive auctions, tricky defenses, and knowledge of systems, I realize that I do not have WC abilities and must take that into account when such expertise is needed. I do feel that I know enough to recognize when my bridge abilities are insufficient for a statistical analysis and when I need to call on the advice of WC players.

As late as the summer of 2015, I was unaware of any cheating allegations against any of the players who have been investigated recently. I first learned that Fisher-Schwartz were suspected through the posts of Boye Brogeland in August 2015, and I first learned that Fantoni-Nunes were suspected through the very revealing post of Kit Woolsey in early September of that year.

It is unlikely that I will ever spend enough time on bridge to be able to compete at the world level. Hence, the eradication of cheating brings no personal gain to me except my satisfaction as a “fan”. My justifications for spending time on these issues are academic (there have been interesting questions), “outreach”, and as a hobby. I view this article as part of outreach: as a mathematician I am supposed to occasionally explain areas of my field to non-mathematicians. In order to keep it from becoming too technical, I will use almost no mathematical formulas. My emphasis is not on the mathematical calculations but on the assumptions upon which these calculations are made. I have chosen to post this as an article on Bridge Winners since this seems to be the best outlet to reach serious bridge players.

In the comment section for this article, I will be happy to answer serious questions (even basic questions for those with limited statistical background) and I welcome serious comments and criticisms from those with strong statistical backgrounds. Of course, any representatives from the defense are welcome to express differing opinions. I would like to focus on the statistical issues and how they can be presented to a lay audience. I do not think there need to be comments about the relationship between bridge and the Olympic movement here; they can be expressed elsewhere.

CONSULTING FOR EBL

I was asked to comment on the Fisher-Schwartz (referred to as F-S) and Fantoni-Nunes (F-N) cases for the EBL in late December 2015. (There were some other cases as well, but I want to focus on these two because the statistical arguments were much more clear cut.)

After preparing my reports in December, I made the rookie-teacher mistake of assuming that, since the recipients seemed happy and did not ask questions, they understood what I wrote. One of my most important points was that I was not just doing mathematical calculations but was also advising on what charges the committee could file that would be appropriately backed up by statistical evidence. I expected that this would affect the statement of the exact charges. I had no legal experience, so I did not know how the procedures were going to run. I laid out a simple statistical model, which I describe below, to test whether signaling is happening. Then, in separate reports, I made recommendations on how to apply it in the two cases. The model was intended to be as mathematically simple as possible; only after the model was fixed were the probability calculations done.

In constructing the model, two things were taken into consideration.

  • What was common to both cases was the realization that the code may not be completely understood. As I explained from day one, the goal is not to prove that a certain code is exactly what is being used. One is only showing sufficient agreement with the conjectured code to be satisfied that there is illegal communication. (If the conjecture is far from the true code, the test is still valid, but it is not going to detect any illegal signaling.)
  • It is not necessary that signaling is done all the time. If there are enough occurrences of an action, then one can focus on the meaning of that particular action.

THE BINOMIAL MODEL

I will now present the model that was sometimes referred to as the “binomial model” in the reports. It is mathematically simple, allowing for easy calculations. Although it is simple, it is relatively robust. This is a statistical term indicating that even if the assumptions are not exactly correct, the calculations are still valid.

  • We assume there are a finite number of actions. An action is something the player does that the other player can see. It might be the case that an action is always made (e.g., if a lead is placed in a certain orientation). We also allow the possibility of no action.
  • We may choose to observe only a subset of the collection of actions and consider other actions as “no action” (or more precisely an action we are not considering). For example, in the case of F-N, we can consider the orientation of the leads as an action, classifying them as horizontal, vertical, or in between (diagonal). We can focus only on the leads that are horizontal or vertical. The diagonal leads are an action we are not considering.
  • There is also a conjectured code (which I will call the conjecture). If a player chooses to make an action, then it expresses something about his hand. For the moment, we will assume that the conjecture is precise, that is, we can determine from the hand what action the conjecture will indicate, provided the player chooses to make an action. (I am not asserting that our conjecture is precisely correct, only that we can tell with complete assurance what action our conjecture would predict if one is made.) This will not always be the case.
  • We then keep track of how many times the actual action of the player matches that predicted by the conjecture when one of the actions we are studying is made.
  • Probability calculations are done based on this, but I will not discuss how this is done at the moment. I want to focus on the assumptions.
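The bookkeeping in the last two bullets can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the reports; the `Observation` record, the `count_matches` function and the H/V/D labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Observation:
    """One hand as seen on video (hypothetical record layout)."""
    action: Optional[str]  # the visible action, or None for "no action"
    predicted: str         # the action the conjecture says this hand calls for


def count_matches(observations: Iterable[Observation], studied: set) -> tuple:
    """Return (matches, trials), counting only hands whose action is in `studied`."""
    matches = trials = 0
    for obs in observations:
        if obs.action not in studied:
            continue  # no action, or an action we chose not to analyze (e.g. diagonal)
        trials += 1
        if obs.action == obs.predicted:
            matches += 1
    return matches, trials


# Toy data in the F-N style: H and V are studied, D (diagonal) is ignored.
hands = [
    Observation("H", "H"),
    Observation("V", "H"),
    Observation("D", "V"),
    Observation(None, "H"),
    Observation("V", "V"),
]
print(count_matches(hands, {"H", "V"}))  # (2, 3)
```

Only the hands with a studied action count as trials; everything else is treated as “no action”, exactly as in the framework above.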

This is a general framework under which many things can be tested. I now describe how I recommended it be applied in the relevant cases.

FISHER-SCHWARTZ

  • For F-S and the movement of boards, the actions are the unusual placements of the board before an opening lead. Boards with no unusual placement (the board is not touched by the partner of the opening leader before the lead) are not considered. The conjecture is that the choice of placement indicates the suit the signaler would prefer partner to lead.
  • This is not quite a precise conjecture; however, it is pretty close. If one only considers hands where they chose to signal, the suit wanted is pretty clear. There was ambiguity on one or two hands, and the original conjecture was not exactly the same as the actual code, but given the amount of data this did not invalidate the test.
  • Focusing only on the boards where unusual actions were made removed a lot of judgment from the analysis. There is significantly more expert disagreement on a particular hand for the question “should I signal for a diamond if I can?” than asking on a hand with an unusual action, “given that I signaled, which suit did I ask for?”
  • As it turns out, F-S occasionally made the unusual actions when sitting E-W, which allowed these boards to be considered as well.
  • There are other possible codes for F-S that were not mentioned (by me, at least) because they did not occur often enough for the test to be powerful. One is where the action is “touching the bottom of the shirt after the lead” and the conjecture is “singleton”.

FANTONI-NUNES

  • Here I recommended that the actions be the placement of a card horizontally or vertically when leading from a suit with more than one card.
  • There were some diagonal leads and while they might have meaning we were not analyzing them.
  • The number of leads with singletons was insufficient to try to crack the code on these. It would make sense that F-N had a way to show them, but we did not know it. The number of singleton leads that come up is sufficiently small that it is hard to analyze them. This is especially true if one has to crack the code and then check it.
  • The test checks only the leads placed horizontally or vertically from non-singleton suits. It does not address whether any other signaling is going on.

Before proceeding, it is worth emphasizing that these analyses need little “expert bridge opinion”. There have been other potential messages, such as “I have a weak hand for the bidding up to this point”, that require the player sending the message to make a bridge judgment. This makes statistical analysis much more difficult because it requires collecting expert (in this case WC) opinions about the value of a hand at that point, and this in turn may require a deep understanding of the players’ bidding system. Even with all of this, there will be variance in expert opinions (if experts did not have different opinions about hands, bridge would be a dull game!), and this creates a lot of “noise” that makes it hard to have a powerful test. Testing hypotheses that do not require expert opinion is a lot easier. For F-S, the first analyses considered a “five option” signal where not moving the tray meant no preference. Since the decision between signaling and not signaling on a close hand (do I want to ask for a spade or should I leave it to partner?) requires bridge judgment, this made the analysis trickier.

MAKING PROBABILITY CALCULATIONS

In standard statistical testing, after one makes a “null hypothesis”, one computes the probabilities of unusual results given the null hypothesis. These calculations are often very straightforward (perhaps using a computer to do the arithmetic), but it is key to understand what assumptions are needed and how the calculations will vary if the assumptions are not exactly correct. For the moment we will make the “fresh data” assumption. By this I mean that the conjecture has already been made and we are observing new hands that were not used in making the conjecture. “New” can mean hands literally played after the conjecture was made (e.g., there was one session of hands played by Reese and Schapiro in Buenos Aires after the conjecture about the heart signaling had been made) or, in these cases, it can mean watching videos of matches that were not used in developing the conjecture. (I will discuss below how to deal with data that is not fresh.) We can now do the following.

  • Watch the hands and note on which hands an action that we are testing for is made; make sure the hand is one for which we are testing as well (in F-N case, this means a non-singleton lead); and then record if the play matches the conjecture. As an example, let us say there are 82 matches out of 85 hands.

That’s it. Our data is how many times the plays match the conjecture. We compare this to what can be roughly stated as

  • How many matches would there be if the actions were random?

Then you compute the probability that there would be at least 82 matches, and you state the result. The tiny numbers like .000000000000… given for F-N were computed this way. These are easy calculations, which I will not do here. Instead I want to focus on the assumptions behind the calculations and on their robustness, that is, how strongly the results depend on the particular assumptions.
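The tail probability described here is a standard binomial calculation. The sketch below is a minimal illustration: it treats each studied hand as an independent trial that matches with probability at most q (the parameter defined in the next paragraphs) and sums the binomial probabilities of 82 or more matches out of 85. The 82-of-85 figures are the illustrative numbers from the text, q = 0.6 is the bound the author reports using for F-N, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q).

    n: number of studied hands, k: observed number of matches,
    q: per-hand match probability (an upper bound under the null hypothesis).
    """
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


# Illustrative numbers from the text: 82 matches out of 85 hands, with q = 0.6.
p_value = binomial_upper_tail(85, 82, 0.6)
print(f"P(at least 82 of 85 matches) <= {p_value:.2e}")
```

Using `math.comb` keeps the binomial coefficients exact; for large samples one would more likely call a library routine such as SciPy's `scipy.stats.binom.sf(k - 1, n, q)`, which returns the same upper tail.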

  • I was perhaps a little more technical than I should have been in my report, so I will try to simplify a bit here. There is one important number, which I will call q: the maximum, over all actions, of the probability that the deal of the cards will make the player want to make that action. In the example of F-N for horizontal (H) and vertical (V) leads from non-singletons, if one follows the conjecture, there is a certain fraction of hands on which the leader would want to lead H, and on the rest the leader would want to lead V (given that the lead is H or V). This number is hard to determine exactly, but it can be estimated.

While q is hard to estimate, we do not need to know it exactly.

  • If one uses a value q that is larger than the true value, then the probability of an unusually large number of matches will increase. So as long as we choose q larger than the actual value, we will be giving upper bounds on the probability.

As an example, suppose there was a code where one always led H except if one held AK of diamonds and led spades in which case one led V. Then rarely would one want to lead V and the number q would be close to 1. If we did the binomial test then we would find that almost all the time we have matches: on almost every hand the leader plays H and when the lead is a spade, he does not have AK of diamonds. The probability of almost all matches is very high in this case and that is because q is near 1.
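The effect of a lopsided code can be checked with a short simulation. The sketch below assumes a hypothetical player who leads H on every hand, ignoring his cards, and a hypothetical conjecture that calls for V on only 2% of deals (standing in for the AK-of-diamonds rule), so q = 0.98. It estimates how often such a player would show at least 82 matches in 85 hands, and compares a more balanced code with q = 0.6. The function name and parameters are illustrative.

```python
import random


def fraction_with_many_matches(n_hands: int, min_matches: int, p_code_wants_v: float,
                               trials: int = 2000, seed: int = 0) -> float:
    """Estimate P(matches >= min_matches) for a player who always leads H.

    The player never looks at his hand, so no information is passed; a match
    happens exactly when the conjecture also calls for H on that deal.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        matches = 0
        for _ in range(n_hands):
            predicted = "V" if rng.random() < p_code_wants_v else "H"
            if predicted == "H":  # the player's action is always "H"
                matches += 1
        if matches >= min_matches:
            hits += 1
    return hits / trials


lopsided = fraction_with_many_matches(85, 82, p_code_wants_v=0.02)  # q = 0.98
balanced = fraction_with_many_matches(85, 82, p_code_wants_v=0.40)  # q = 0.60
print(lopsided, balanced)
```

With q near 1, most simulated innocent players reach 82 matches (the exact binomial value is about 0.91), while with q = 0.6 none do in 2,000 runs. A match count means little until q has been bounded.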

The data indicated that H leads were somewhat more likely than V leads for F-N, perhaps with a frequency as high as .57, but this was a rough estimate.

  • I used q=.6 as an upper bound for the probabilities. (For F-S, where there were four possible actions, I chose q=1/3 as a safe upper bound.)

The number that we are actually interested in is the probability that there is a match. This exact probability depends both on the probability that the conjectured code wants to make a certain play (the q above) and the probability distribution of how the player chooses to play the cards. As has been noted, there are other players who do not place their lead the same way every time, and it is hard to tell how they choose. I was perhaps imprecise when I wrote in my report that the “null hypothesis” was that the players were playing randomly; a better formulation would have been that the orientation of the lead is independent of the hand. Consequently:

  • If the null hypothesis is true, then calculations using q will give upper bounds for the actual probabilities. We do not need to know what the usual “random” placement of cards for F-N is.
  • When calculating with a value of q larger than the true value, the probabilities are larger than the actual probabilities; therefore if one gets .000000000001 using a larger q, the actual probability is less than .000000000001.
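The step from “the orientation is independent of the hand” to “the match probability is at most q” is a weighted-average argument. If the player's choice does not depend on the cards, the match probability is the sum over actions of P(player makes a) × P(conjecture calls for a), and a weighted average of the conjecture's probabilities cannot exceed their maximum, q. The sketch below checks this numerically; the code probabilities (0.57/0.43 for H/V, loosely based on the rough F-N estimate, and an invented four-way split for F-S) are assumptions for illustration only.

```python
def match_probability(player: dict, code: dict) -> float:
    """P(match) when the player's action is independent of the hand.

    player[a]: probability the player makes studied action a
    code[a]:   probability the deal makes the conjecture call for action a
    Independence lets the joint probability factor into a sum of products.
    """
    return sum(player.get(a, 0.0) * c for a, c in code.items())


def worst_case_q(code: dict) -> float:
    """q = the largest probability that the conjecture calls for any one action."""
    return max(code.values())


fn_code = {"H": 0.57, "V": 0.43}  # hypothetical H/V split for F-N
q = worst_case_q(fn_code)
for p_h in (0.0, 0.3, 0.5, 0.8, 1.0):  # any hand-independent habit of the player
    assert match_probability({"H": p_h, "V": 1 - p_h}, fn_code) <= q + 1e-12

fs_code = {"S": 0.30, "H": 0.25, "D": 0.25, "C": 0.20}  # hypothetical four-way split for F-S
print(worst_case_q(fs_code) <= 1 / 3)  # True: 1/3 is a safe bound for this split
```

The bound is attained only by a player who always makes the action the conjecture favors most, which is why nothing needs to be known about the player's actual habits.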

While this is fairly straightforward, there are three issues to deal with.

QUALITY OF DATA

This may not be worth a full page, but, of course, one needs to collect the data. In cases of, say, horizontal vs. vertical vs. diagonal there may be some debate. For the F-N case, Nic Hammond handled these issues and I will not comment on them myself except to say that the statistical test is valid, even with some errors in the data, if the errors are made independently of the result. In other words, whoever decides what is H vs. V vs. D should be doing this without looking first as to how it matches the conjecture.
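Why the independence of recording errors matters can be seen with two small formulas. In the sketch below (all numbers hypothetical), a player leads H 70% of the time regardless of his hand, the conjecture calls for H on 55% of deals (so q = 0.55), and the person classifying leads misreads 10% of them. An honest classifier's errors do not depend on the conjecture, so the match probability stays below q; a classifier who, on ambiguous leads, records whatever the conjecture predicts pushes it above q. The function names are illustrative.

```python
def honest_match_prob(p_lead_h: float, p_code_h: float, flip: float) -> float:
    """Errors swap H and V with probability `flip`, independently of the hand."""
    p_recorded_h = p_lead_h * (1 - flip) + (1 - p_lead_h) * flip
    return p_code_h * p_recorded_h + (1 - p_code_h) * (1 - p_recorded_h)


def peeking_match_prob(p_lead_h: float, p_code_h: float, flip: float,
                       ambiguous: float) -> float:
    """Same, except that a fraction `ambiguous` of leads is recorded by first
    checking the conjecture, so every one of those leads counts as a match."""
    return (1 - ambiguous) * honest_match_prob(p_lead_h, p_code_h, flip) + ambiguous


q = max(0.55, 0.45)
honest = honest_match_prob(0.70, 0.55, flip=0.10)
peeking = peeking_match_prob(0.70, 0.55, flip=0.10, ambiguous=0.20)
print(f"q = {q}, honest = {honest:.3f}, peeking = {peeking:.3f}")
```

Here the honest classifier gives a match probability of about 0.52, below q, while peeking on one lead in five raises it to about 0.61, above q, so the upper bound would no longer hold.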

CRACKING CODES AND FRESH DATA

This is a serious and somewhat complicated issue mathematically. As most of you know, the main reason that cheating systems were found for a number of pairs is that bridge matches were finally put on sufficiently good video that they could be watched and re-watched. At least for the 2014 European championship, as far as I know, the videos were there primarily for “fan” interest so that bridge players around the world could see the players. The policy had the unexpected consequence that videos were available to be reviewed by many people for signs of illegal communication.

In trying to break the codes, a number of videos were watched. There are many possible codes that one could test especially considering that there are many different possible innocent-looking mannerisms. If we are reusing the same data, we must take this into consideration. The calculations mentioned above with the very small probabilities assume that we are testing only one code.

Let me give two examples. First, imagine that you enter a raffle with 25 million entrants and there will be one winner selected. Then the chance that you will win is one in 25 million, but the chance that somebody wins is one. If you first draw the ticket and then consider the person who won, you might say there was only a one in 25 million chance that she would win and conclude that something is wrong. Of course, this reasoning is incorrect; care is needed not to confuse “the probability that I win the lottery” with “the probability that somebody will win the lottery”. Similarly, there are so many different possible patterns of actions that some of them must correlate somehow with some aspects of the players’ hands.

Of course, if you first found the winner and then saw that she also won the next lottery (“fresh data”), then one should be very suspicious. Looking at fresh data is comparable to first seeing who won the first lottery and then checking whether that person won the next one. In between these two cases, suppose I realize that I know the lottery winner and ask what the probability is that I would know the winner. It would be the probability of any one person winning times the number of people I know who played. In our situation, the analogue of “number of people that I know” is “number of codes that would be tried before finding one that worked well”.

One can react completely in the opposite direction and decide that one can never use the data that the hypothesis was collected from. I claim this is going too far. Suppose that a state had a simple lottery where each day a three digit number is chosen. Zeroes are allowed, so there are 1000 possible numbers. If you choose that number for that day, you win. Upon looking at last year’s winning numbers you notice that the middle digit of every number in June and July was a 3. That is SO unlikely, that even though there is no fresh data, we should be able to conclude that something suspicious is going on. The way to give rough (and I only know how to give rough) estimates for this is to guess how many possible “suspicious patterns” that one would note. One then adds the probabilities of getting any of those suspicious patterns. In this case it would be very small.
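The lottery arithmetic can be done exactly. The sketch below computes the chance that the middle digit is 3 on all 61 days of June and July, then applies the “add up the suspicious patterns” idea with a deliberately generous, hypothetical count of one million patterns that one might have noticed.

```python
from fractions import Fraction

days = 30 + 31  # June has 30 days, July has 31
p_digit = Fraction(1, 10)  # each middle digit is 0-9, equally likely
p_pattern = p_digit ** days  # middle digit is 3 on every one of those days

n_patterns = 10**6  # hypothetical number of "suspicious patterns" one might notice
bound = n_patterns * p_pattern  # adding their probabilities gives an upper bound

print(float(bound))
```

Even if the guessed number of patterns is off by many orders of magnitude, the bound stays astronomically small.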

For F-N, in my original report, I decided to give a rough number of 1,000,000 for the number of possible codes that might be tried before one was found with such a strong correlation. If each possibility had a 10^(-14) chance of fitting this well, this gave a chance of at most 10^(-8) that some code would have been found to match very well. This was a very rough estimate. One consideration is that one need only consider relatively simple codes that would actually be useful. One part of the testimony in F-N was expert witness testimony about the usefulness of the conjectured code. This was one of the reasons to have bridge expert testimony; there is a second reason in the next section.
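Written out, the adjustment in this paragraph is the union (Bonferroni) bound: if each of N candidate codes would fit the data this well with probability at most p under the null hypothesis, the chance that at least one of them does is at most N p. With the rough figures above:

```latex
P\bigl(\text{some code among } N \text{ fits}\bigr)
  \le \sum_{i=1}^{N} P\bigl(\text{code } i \text{ fits}\bigr)
  \le N\,p = 10^{6}\cdot 10^{-14} = 10^{-8}.
```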

In the case of the signaling, we tried whenever possible to use fresh data. In the case of F-N, we found out exactly which vugraph matches Maaijke Mevius watched in deriving her hypothesis. In the case of F-S, we took advantage of the fact that some vugraph matches were not used in deriving the hypothesis, nor were any of the boards on which F-S played E-W. These gave some very good “fresh data” estimates of the probability.

PROBABILITY OF CHEATING

The statistical test and the probability calculations give (an upper bound for) the probability that there would be so many matches if the lead placements were independent of the placements predicted by the conjecture. In the notation of the CAS appeal, this is P(E | H), the probability that the exceptional event (evidence) would happen given that the hypothesis of independence is true. The probability that we want is P(H | E); in other words, we have observed the evidence (an incredible percentage of matches with the conjecture) and we want to know the probability that the pair is “innocent”. Regardless of whether one is using “comfortable satisfaction” or “reasonable doubt”, one is trying to figure out how likely it is that the pair is innocent given what we have seen. If this probability is not too low, then we want to conclude “not guilty” in the sense of “not proven guilty”.
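The gap between P(E | H) and P(H | E) is Bayes' rule. The sketch below only illustrates why a tiny P(E | H) is not itself the probability of innocence: the prior P(H) and the probability P(E | not H) of the evidence if the pair were signaling are hypothetical inputs, precisely the quantities the text says are essentially impossible to compute. The function name is illustrative.

```python
def posterior_h(p_e_given_h: float, prior_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1 - prior_h))


# Hypothetical inputs: P(E | H) = 1e-8 (the rough adjusted F-N figure from the text),
# a strong presumption of innocence P(H) = 0.999, and P(E | not H) = 0.5.
print(posterior_h(1e-8, 0.999, 0.5))
```

With these made-up inputs the posterior is about 2 in 100,000, a different number from P(E | H); changing the prior or P(E | not H) changes it further, which is why the final step requires judgment rather than pure calculation.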

These probabilities are essentially impossible to compute and this is probably one of the reasons that people do not try to quantify the probability needed for “comfortable satisfaction” or “reasonable doubt”. Possible factors in trying to determine this in our cases are:

  • The utility of the conjectured code (would one consider using it?)
  • Other possible suspicions that have arisen for the pair.

In my report, I used my bridge experience as well as my understanding of the data (but I did not use any previous suspicions for F-N because I was not aware of them), to state that I believed beyond a reasonable doubt that they were exchanging signals. I certainly did not say that it was conclusive exactly what the code was. However, in order for the strong correlations to be there, it would really have to be very close to the conjectured code but with perhaps some additional aspects or rules for special hands (and rules perhaps for occasional deviation from the conjecture).

SUMMARY

Having outlined the basics of the analysis, I would like to make some concluding statements.

  • The statistical model we used was very simple. The mathematics needed is learned in a first college course in statistics or probability and for some students in high school.
  • However, even with the simple statistical model, there are a number of questions about appropriateness of the model, whether it applies to this case, and in one case choosing a parameter (the q mentioned above).
  • The statistical model determines what charge one can bring. In particular, it is not being asserted that the entire code is known and that this is all the players are doing. All we are concluding is that there is an improper exchange of information, on some of the players’ actions, that is close to what we describe.
  • Some people have suggested much more complicated models to analyze statistics. The model chosen to study one aspect of Balicki and Żmudziński (B-Z), which did not result in a conviction in the EBL hearing, was more complicated, and frankly, although the evidence was compelling, it was nowhere near as convincing as the cases for F-S and F-N. (There were further charges for B-Z concerning their defensive plays, but I have not been involved in the analyses of these.)
  • One of the reasons that the B-Z evidence was not so strong was that it relied on expert (WC) bridge opinion. In general, analysis of any conjecture that includes judgment (“is this a strong hand for the bidding now”) will be much more difficult.
  • If one has a more complex model, the mathematics may be too difficult for non-experts, but one should still be able to explain the model’s assumptions to a non-specialist. The issues discussed here still need to be addressed: quality of data, freshness of data, and deducing the probability of cheating from the probability of the evidence given a null hypothesis.

CONCLUSION

In order to defend the reliability of this analysis it is critical for the investigating committee to understand what is going on. This leads to my strongest recommendation.

  • Statistical experts need to be part of the committee investigating and bringing charges about bridge players. It is not sufficient to ask them to make some high-school (or higher level) mathematical calculations of probabilities under certain conditions. They must be involved from the beginning to the end. What they say should strongly affect the actual charges brought against players.

The CAS panel report indicates that the panel did not understand the statistical model. I am unable to say how much of this misunderstanding was caused by the presentation of the case and how much by the fact that the panel was not qualified to evaluate the information. In my next article, I will discuss a number of specific passages in the CAS panel report for F-N that indicate a failure to understand the statistical methods used to analyze the case.
