Computer-aided maintenance explained

Computer-aided maintenance (not to be confused with CAM, which usually stands for computer-aided manufacturing) refers to systems that use software to organize the planning, scheduling, and support of maintenance and repair. A common application is the maintenance of computers themselves, whether hardware or software. Such systems also apply to other complex systems that require periodic maintenance: they can remind operators that preventive maintenance is due, or even predict when maintenance should be performed based on recorded past experience.
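The reminder and prediction ideas can be illustrated with a minimal Python sketch. It is not taken from any particular maintenance product: the function names, the use of the mean gap between past failures, and the safety_factor parameter are all assumptions chosen for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def next_due(last_service, interval_days):
    """Fixed-interval preventive maintenance: due date is last service + interval."""
    return last_service + timedelta(days=interval_days)


def predicted_interval(failure_dates, safety_factor=0.8):
    """Estimate a service interval (in days) from recorded past failures.

    Assumption: take the mean gap between consecutive failures and shorten it
    by a hypothetical safety factor so that service lands before the expected
    failure. Real systems may use far more elaborate models.
    """
    ordered = sorted(failure_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two failure dates")
    return int(mean(gaps) * safety_factor)


def is_due(last_service, interval_days, today):
    """Return True when the operator should be reminded."""
    return today >= next_due(last_service, interval_days)


# Illustrative data only: three recorded failures of one machine.
failures = [date(2023, 1, 1), date(2023, 4, 11), date(2023, 7, 20)]
interval = predicted_interval(failures)
print(interval, next_due(failures[-1], interval))
```

The fixed-interval function covers the reminder case, while predicted_interval shows one simple way past experience could feed the schedule.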

Computer-aided configuration

The first computer-aided maintenance software came from DEC in the 1980s to configure VAX computers. The software was built using the techniques of artificial-intelligence expert systems, because configuring a VAX required expert knowledge. During research the software was called R1; it was renamed XCON when placed in service. Fundamentally, XCON was a rule-based configuration database written as an expert system using forward-chaining rules. As one of the first expert systems pressed into commercial service, it created high expectations, which did not materialize as DEC lost its commercial pre-eminence. [1]
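Forward chaining can be illustrated with a minimal Python sketch. This is a toy and not XCON's actual rule base or implementation: the rule format, the component names, and the forward_chain function are hypothetical. The idea is to start from known facts (the ordered components) and fire rules repeatedly until no rule adds anything new.

```python
# Hypothetical rules: each rule is (set of required facts, fact to add).
# The component names are illustrative and are not taken from XCON itself.
RULES = [
    ({"cpu"}, "needs_backplane"),
    ({"needs_backplane"}, "needs_cabinet"),
    ({"disk_drive"}, "needs_disk_controller"),
    ({"needs_disk_controller", "needs_backplane"}, "needs_controller_slot"),
]


def forward_chain(initial_facts, rules):
    """Apply rules repeatedly until no rule can add a new fact."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule when all its conditions hold and it adds something new.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


order = {"cpu", "disk_drive"}
derived = forward_chain(order, RULES) - order
print(sorted(derived))
```

The loop reasons from data toward conclusions, which is what distinguishes forward chaining from backward chaining, where the engine starts from a goal and searches for supporting facts.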

Help desk software

Help desks frequently use help desk software that captures the symptoms of a bug and relates them to fixes stored in a fix database. One problem with this approach is that the understanding of the problem is embodied in the database rather than in a person, so solutions are not unified.

Strategies for finding fixes

  1. The bubble-up strategy simply records pairs of symptoms and fixes. The most frequent pairing is then presented as a tentative solution and attempted. If the fix works, that fact is recorded in a solutions database, along with the configuration of the presenting system.
  2. Oddly enough, shutting down and booting up again manages to 'fix,' or at least 'mask,' a bug in many computer-based systems; thus a reboot is the remedy for distressingly many symptoms in a fix database. A reboot often works because it clears the contents of RAM. However, the same set of actions is likely to produce the same result again, which shows a need to refine the startup applications (which launch into memory) or to install the latest fix or patch for the offending application.
  3. Currently, most expertise in finding fixes lies with human domain experts, who sit at a replica of the computer-based system, 'talk through' the problem with the client in order to duplicate it, and then relate the fix.
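The bubble-up strategy in item 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any real help desk product: the FixDatabase class, its method names, and the sample symptoms and fixes are all hypothetical.

```python
from collections import Counter, defaultdict


class FixDatabase:
    """Toy fix database that counts symptom/fix pairs (hypothetical design)."""

    def __init__(self):
        self._pairs = defaultdict(Counter)  # symptom -> Counter of fixes
        self.solutions = []  # confirmed (symptom, fix, configuration) records

    def record_pair(self, symptom, fix):
        """Record one observed pairing of a symptom with a fix."""
        self._pairs[symptom][fix] += 1

    def suggest(self, symptom):
        """Return the most frequently paired fix, or None if the symptom is unknown."""
        fixes = self._pairs.get(symptom)
        if not fixes:
            return None
        return fixes.most_common(1)[0][0]

    def confirm(self, symptom, fix, configuration):
        """Store a fix that worked, with the configuration of the presenting system."""
        self.solutions.append((symptom, fix, configuration))


db = FixDatabase()
db.record_pair("screen freezes", "reboot")
db.record_pair("screen freezes", "reboot")
db.record_pair("screen freezes", "update video driver")

tentative = db.suggest("screen freezes")
print(tentative)
db.confirm("screen freezes", tentative, {"os": "example-os 1.0"})
```

Keeping the confirmed solutions separate from the raw pair counts mirrors the two stages described in the list: a tentative suggestion first, then a record of what actually worked and on which configuration.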

References

  1. Virginia E. Barker and Dennis E. O'Connor. Expert systems for configuration at Digital: XCON and beyond. Communications of the ACM, 32(3):298–318, March 1989.