QSSIM - Qualifying System Simulation
Robert Parker
November 1997

---Tourney data---
The data set consists of about 110,000 games played among about 3700
players over a four-year period (Nov. '92 to Sept. '96).

---Robots---
32 of the top players have been replaced by robots with fixed, known
abilities.  Each game involving a robot is simulated, with P(Win) =
CumulativeLogistic(parameter=B,diff=RobotAbility-OpponentRating).
The parameter B has been estimated from the '92-'96 data to be
B=1/156.
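
A minimal sketch of how one robot game could be simulated.  It assumes that
"CumulativeLogistic" means the standard logistic CDF, 1/(1+exp(-B*diff));
the post does not spell out the parameterization.  The function names p_win
and simulate_game are mine, for illustration only.

```python
import math
import random

B = 1.0 / 156.0  # logistic parameter quoted above


def p_win(robot_ability, opponent_rating, b=B):
    """P(robot wins), ASSUMING the standard logistic CDF in b * diff."""
    diff = robot_ability - opponent_rating
    return 1.0 / (1.0 + math.exp(-b * diff))


def simulate_game(robot_ability, opponent_rating, rng=random):
    """Return True if the robot wins one simulated game."""
    return rng.random() < p_win(robot_ability, opponent_rating)
```

Under this assumption an evenly matched robot wins half the time, and a
robot 156 points above its opponent wins with probability 1/(1+exp(-1)),
about 0.73.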

---Method---
a.  Randomly assign the 32 abilities 2100, 2095, ..., 1945 to the 32 robots.
b.  In all '92-'96 games, replace 32 players with robots.
c.  Sequentially rate games from '92 to 950600. (That's June 0, 1995.)
d.  Start Qualifying Period.
e.  Sequentially rate games from 950600 to 960600.
f.  End Qualifying Period
g.  Calculate the following QS methods:

               A. Iterate only when #games >= 50

   1. OPRmleI (iteration for robots only,
               ratings curve stddev = estimated from data)
   2. IOPR  (iteration, ratings curve stddev = 200*sqrt(2))
   3. IOPRmle (iteration, ratings curve stddev = estimated from data)

               B. If iterating, iterate only when #games >= 30

   4. OPRmleI (iteration for robots only,
               ratings curve stddev = estimated from data)
   5. IOPR  (iteration, ratings curve stddev = 200*sqrt(2))
   6. IOPRmle (iteration, ratings curve stddev = estimated from data)

               C. Iteration for robots only

   7. OPRmleHI  (iteration when #games>=30,
                 ratings curve stddev = estimated from data,
                 Opp Strength = Max rating during Qualifying Period)

               D. No iteration

   8. OPR   (no iteration, ratings curve stddev = 200*sqrt(2))
   9. OPRmle  (no iteration, ratings curve stddev = estimated from data)
  10. HI  (Peak rating during Qualifying Period)
  11. RAT (Current rating at end of Qualifying Period)
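
Step (a) of the method can be sketched as follows.  This is only an
illustration: the function name assign_abilities, the robot identifiers, and
the seed argument are hypothetical and not taken from the actual simulation
code.

```python
import random


def assign_abilities(robot_ids, seed=None):
    """Randomly pair the 32 fixed abilities with the 32 robots."""
    abilities = list(range(2100, 1940, -5))  # 2100, 2095, ..., 1945
    if len(robot_ids) != len(abilities):
        raise ValueError("expected exactly %d robots" % len(abilities))
    rng = random.Random(seed)  # seed is optional, for repeatable runs
    rng.shuffle(abilities)
    return dict(zip(robot_ids, abilities))
```

Each replication of the simulation would call this once, so that a given
robot slot (and hence a given real player's schedule) gets a different
ability from run to run.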


---Statistics---
Two statistics are calculated:
1. Kendall's tau, a measure of the correlation between the known robot ranks
and the QS-assigned ranks.  Higher is better.
2. n-out-of-10: The number of the known Top 10 robots who are ranked among
the Top 10 by the QS.  Higher is better.
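
Both statistics are easy to compute from the known abilities and the scores a
QS assigns.  The sketch below uses the simplest form of Kendall's tau (tau-a,
with tied pairs counted as neither concordant nor discordant); the post does
not say which variant was used, so treat that choice as an assumption.  The
function names are illustrative.

```python
from itertools import combinations


def kendall_tau(true_scores, qs_scores):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(true_scores)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (true_scores[i] - true_scores[j]) * (qs_scores[i] - qs_scores[j])
        if prod > 0:
            s += 1      # pair ordered the same way by both
        elif prod < 0:
            s -= 1      # pair ordered oppositely
    return s / (n * (n - 1) / 2)


def n_out_of_10(true_scores, qs_scores, k=10):
    """Count how many of the true top k are also in the QS top k."""
    idx = range(len(true_scores))
    true_top = set(sorted(idx, key=lambda i: true_scores[i], reverse=True)[:k])
    qs_top = set(sorted(idx, key=lambda i: qs_scores[i], reverse=True)[:k])
    return len(true_top & qs_top)
```

Here true_scores[i] is robot i's known ability and qs_scores[i] is whatever
number the QS uses to rank robot i (higher meaning better in both lists).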

---Points about the simulation---

Robot abilities are kept fixed over the entire 4-year period.

Of the 8695 games involving robots over the 4-year period, 958 were robot
vs. robot.

The Qualifying Period was 950600 to 960600.  Each robot played at least 50
games within that period.

The problem of assigning initial ratings for the Robots was avoided:
Each was treated as a new player starting in '92.

---A note on calculation of OPR---

I've actually used the logistic distribution instead of the normal
distribution as the ratings curve in the calculation of OPR.  There is very
little difference in the rankings from the two methods: using both curves
to calculate OPR for WSC97, only three sets of rankings are affected:
#'s 23 and 24 are switched, #'s 39 and 40 are switched, and #'s 44, 45, and
46 are jumbled.  More on that later.

---Conclusions---

Here are averages and 90% confidence intervals for the two statistics and
the 11 methods, in the same order as the list under Method.  See a graph of
this at
http://www.math.unm.edu/~rparker

                     tau                           n-out-of-10

 #  Method     Upper     Lower     Ave           Upper     Lower     Ave
 1  OPRmleI    0.5821    0.5669    0.5745        7.1309    6.8791    7.0050
 2  IPR        0.5777    0.5618    0.5697        7.0813    6.8187    6.9500
 3  IPRmle     0.5778    0.5620    0.5699        7.0776    6.8224    6.9500
 4  OPRmleI    0.5821    0.5669    0.5745        7.1309    6.8791    7.0050
 5  IPR        0.5748    0.5588    0.5668        7.0416    6.7884    6.9150
 6  IPRmle     0.5742    0.5581    0.5661        7.0230    6.7670    6.8950
 7  OPRmleHI   0.5740    0.5578    0.5659        7.0362    6.7738    6.9050
 8  OPR172     0.5817    0.5661    0.5739        7.1003    6.8497    6.9750
 9  OPRmle     0.5806    0.5651    0.5729        7.1061    6.8539    6.9800
10  HI         0.5804    0.5621    0.5713        7.2825    7.0375    7.1600
11  RAT        0.6325    0.6171    0.6248        7.5460    7.3240    7.4350
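
The bounds above are symmetric about the average to within rounding, which is
what a normal-approximation interval over repeated simulation runs would give.
The sketch below shows that calculation; it is an assumption about how such
intervals can be formed, not a record of the exact procedure used here, and
the function name mean_ci90 is illustrative.

```python
import math
import statistics

Z_90 = 1.645  # approximate two-sided 90% quantile of the standard normal


def mean_ci90(values):
    """Return (lower, average, upper) for a 90% normal-approximation CI."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean - Z_90 * se, mean, mean + Z_90 * se
```

Here values would hold one statistic (tau or n-out-of-10) from each
replication of the simulation for a single method.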

The only measure that distinguishes itself is RAT, the rating at the end
of the qualifying period.  Looking at n-out-of-10, HI has a slight (but
insignificant) lead over all except RAT.

The n-out-of-10 statistic is probably more pessimistic than we would see in
the real-world situation, due to the assignment of the robot abilities in
5-point steps.  (I conjecture, without evidence so far, that the top players
are actually separated by more than 5 points of ability.)  That is, I suspect
that a QS will actually be able to do better than correctly choosing 7 of
the top 10.

It seems that the particular details of implementation of OPR (iteration vs.
no iteration, how to measure Opp Strength, stddev of ratings curve) have
little effect on the overall performance of the system.  This conclusion
depends, however, on the tournament participation behavior of the 32 top
players who were replaced with robots.  Whether a player might, through
clever scheduling, manipulate one of these systems is the subject of further
study.

--END POST--