Aalice.349 net.chess utzoo!decvax!ucbvax!mhtsa!alice!ken Thu Jan 7 14:31:21 1982 chess strength This is in response to Gillogly's recent comments on computer chess strength (and projections of same). I performed the following experiment: There are 7 programs named P3, P4, ... P9. P3 searches the standard (Chess 4.X clone) 3 plies and then enters a quiescence capture search with full evaluation at the leaves. (Getting out of check does not count as a ply.) P4, P5 ... are identical programs that search progressively deeper. Program Px played 20 games with Py (all x,y; x not equal y). Each program played both sides of 10 randomly selected starting opening positions. The positions were chosen by random number generator selecting positions from ECO (Encyclopedia of Chess Openings) that had 10 moves played by each side and were judged (by mini-max) to be "equal" or "unclear" or "with compensation". Altogether there were 420 games played. ((7x7 - 7)/2 * 20.) The performance rating of each program was calculated (on an arbitrary base) by calculating what rating a program needed to obtain the expected performance actually obtained. This was then iterated until it converged to 1 rating point for all programs. Here is the crosstable: P3 P4 P5 P6 P7 P8 P9 P3 -- 8 0 0 0 0 0+ P4 12 -- 5 0+ 0 0 0 P5 20 15 -- 3+ 3 0+ 0 P6 20 19+ 16+ -- 4 1+ 1+ P7 20 20 17 16 -- 5 4 P8 20 20 19+ 18+ 15 -- 5+ P9 19+ 20 20 18+ 16 14+ -- The performance rating of the programs (with P4 chosen as the base of 0) P3 -134 P4 0000 P5 0335 P6 0591 P7 0796 P8 0973 P9 1093 If you plot the performance data, you will note that P3 performs better than expected. This is true. Due to edge effects in my chess hardware, P3 performs approximately 3.2 plies on the average. This is why P4 was chosen as zero. Other empirical results: 1. Newborn: double the power = 100 rating points. If you assume 5.5 branching factor, this means that a ply will yield 250 points. In the linear range of P4-P7 (in fact the range observed by Newborn in existing programs) this relation holds. 2. Thompson: rating = 400 * nodes ** (1/8) Cahlander: rating = 400 * 2**(1/8) * m**(d/16) m = legal moves, d = depth (ply) This formula predicted the performance up to 2000, but obviously is not valid above that. 3. Gillogly: linear in time, 80 points/30 minutes. This is based on few games over a very small time range (factor of 5). I think it was optimistic to extrapolate these results. I urge curve-fitters to guess at the nature of the P4-P9 ratings. I also would like a discussion of the relevance of the whole experiment. Ken Thompson ----------------------------------------------------------------- gopher://quux.org/ conversion by John Goerzen of http://communication.ucsd.edu/A-News/ This Usenet Oldnews Archive article may be copied and distributed freely, provided: 1. There is no money collected for the text(s) of the articles. 2. The following notice remains appended to each copy: The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996 Bruce Jones, Henry Spencer, David Wiseman.