Aucbvax.5298
fa.space
utzoo!decvax!ucbvax!space
Tue Nov 24 03:22:30 1981
SPACE Digest V2 #40
>From OTA@S1-A Tue Nov 24 03:17:50 1981

SPACE Digest                                      Volume 2 : Issue 40

Today's Topics:
          STS-1 -- "The Bug Heard 'Round the World"
----------------------------------------------------------------------

Date: 24 Nov 1981 01:24:40-PST
From: decvax!duke!unc!smb at Berkeley
In-real-life: Steven M. Bellovin
To: decvax!duke!unc!space@Berkeley
Subject: STS-1 -- "The Bug Heard 'Round the World"

There's a very interesting article on just what delayed the launch of
STS-1 in the October 1981 issue of SOFTWARE ENGINEERING NOTES.  It's
written by John R. Garman, the deputy chief of the Spacecraft Software
Division at the Johnson Space Center.  I won't try to summarize the
article -- it's fairly complex, and describes how the 4
identically-programmed computers and the backup computer with
different software co-exist.  But the origin of the bug is interesting.

The problem was caused when a time delay in an initialization
subroutine was changed to avoid problems during system
reconfigurations; this affected the system's idea of what the time of
day was, and hence affected the scheduling of certain asynchronous
processes.  (Because all 4 computers must have *identical* ideas of
what time it is, they use the operating system's timer queue; hence,
any use of the timer before the other initialization code ran could
cause trouble.  The real TOD clock is used only during cold-starts of
the first computer.)  The nature of this change was such that there
was only a 1 in 67 chance of a failure.

"No 'mapping' analyzer built today could have found that linkage.
Testing might have.  But the window wasn't opened until late in the
test program (relative to this code), and even then, *most*
simulations didn't go through the expense of initializing 'from
scratch'.
And even where they did, it would have to have been in a lab with a
reasonably accurate model of the telemetry system *plus* a simulation
or test involving both PASS [Primary Avionics Software System] *and*
BFS [Backup Flight Control System], and it would still be fighting the
low probability.  Even then, the temptation would be to try
again....and never be able to repeat it; and never be sure it wasn't a
'funny' in the lab set-up.... or a similar problem fixed by another
software change.  That, in fact, apparently did happen in one of the
labs....about 4 months prior to the flight..

"And then, on *the* day that the first GPC [General Purpose Computer]
was turned on, 30 hours before scheduled launch, we hit the
problem......"

------------------------------

End of SPACE Digest
*******************

-----------------------------------------------------------------
gopher://quux.org/ conversion by John Goerzen
of http://communication.ucsd.edu/A-News/

This Usenet Oldnews Archive article may be copied and distributed
freely, provided:

1. There is no money collected for the text(s) of the articles.

2. The following notice remains appended to each copy:

The Usenet Oldnews Archive: Compilation Copyright (C) 1981, 1996
Bruce Jones, Henry Spencer, David Wiseman.