Winner-take-all in action selection explained

Winner-take-all is a computer science concept that has been widely applied in behavior-based robotics as a method of action selection for intelligent agents. Winner-take-all systems work by connecting modules (task-designated areas) in such a way that when one action is performed it stops all other actions from being performed, so only one action is occurring at a time. The name comes from the idea that the "winner" action takes all of the motor system's power.^[1] ^[2] ^[3]

History

In the 1980s and 1990s, many roboticists and cognitive scientists were attempting to find faster and more efficient alternatives to the traditional world-modeling method of action selection.^[4] In 1982, Jerome A. Feldman and D.H. Ballard published "Connectionist Models and Their Properties", which referenced and explained winner-take-all as a method of action selection. Feldman's architecture functioned on the simple rule that, in a network of interconnected action modules, each module sets its own output to zero if it reads a higher input than its own in any other module.^[5] In 1986, Rodney Brooks introduced behavior-based artificial intelligence. Winner-take-all architectures for action selection soon became a common feature of behavior-based robots, because selection occurred at the level of the action modules (bottom-up) rather than at a separate cognitive level (top-down), producing a tight coupling of stimulus and reaction.^[6]
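The rule attributed to Feldman above can be illustrated with a minimal Python sketch. The function name `winner_take_all` and the use of a plain list of activation values are assumptions made for illustration, not details taken from the 1982 paper.

```python
from typing import List


def winner_take_all(activations: List[float]) -> List[float]:
    """Apply the suppression rule once to a list of module activations.

    Each module compares its own value with every other module's value and
    sets its output to zero if any other value is strictly higher.
    """
    values = list(activations)
    outputs = []
    for i, own in enumerate(values):
        others = values[:i] + values[i + 1:]
        # The module suppresses itself when it sees a stronger competitor.
        suppressed = any(other > own for other in others)
        outputs.append(0.0 if suppressed else own)
    return outputs


# Hypothetical activations for three action modules.
result = winner_take_all([0.2, 0.9, 0.5])
print(result)  # [0.0, 0.9, 0.0]
```

In this sketch, modules that tie for the highest value all keep their output, so a real system would need an extra tie-breaking step to guarantee a single winner.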

Types of winner-take-all architectures

Hierarchy

In the hierarchical architecture, actions or behaviors are programmed in a high-to-low priority list, with inhibitory connections between all the action modules. The agent performs low-priority behaviors until a higher-priority behavior is stimulated, at which point the higher behavior inhibits all other behaviors and takes over the motor system completely. Prioritized behaviors are usually key to the immediate survival of the agent, while behaviors of lower priority are less time-sensitive. For example, "run away from predator" would be ranked above "sleep." While this architecture allows for clear programming of goals, many roboticists have moved away from the hierarchy because of its inflexibility.^[7]
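A fixed-priority scheme of this kind can be sketched in Python as follows. The `Behavior` class, the numeric priority convention (larger number means higher priority), and the sensor key `predator_visible` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Behavior:
    name: str
    priority: int  # assumed convention: larger number = higher priority
    is_triggered: Callable[[Dict[str, bool]], bool]


def select_behavior(behaviors: List[Behavior],
                    sensors: Dict[str, bool]) -> Optional[str]:
    """Return the highest-priority triggered behavior; all others are inhibited."""
    for behavior in sorted(behaviors, key=lambda b: b.priority, reverse=True):
        if behavior.is_triggered(sensors):
            return behavior.name
    return None


BEHAVIORS = [
    # "sleep" is always available as the low-priority default.
    Behavior("sleep", 1, lambda s: True),
    Behavior("run away from predator", 2,
             lambda s: s.get("predator_visible", False)),
]

print(select_behavior(BEHAVIORS, {"predator_visible": False}))  # sleep
print(select_behavior(BEHAVIORS, {"predator_visible": True}))   # run away from predator
```

Because the ordering is fixed in advance, the agent cannot reorder priorities in response to context, which is one way to read the inflexibility mentioned above.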

Heterarchy and fully distributed

In the heterarchy and fully distributed architecture, each behavior has a set of pre-conditions that must be met before it can be performed, and a set of post-conditions that will be true after the action has been performed. These pre- and post-conditions determine the order in which behaviors must be performed and are used to causally connect action modules. This enables each module to receive input from other modules as well as from the sensors, so modules can recruit each other. For example, if the agent's goal were to reduce thirst, the behavior "drink" would require the pre-condition of having water available, so the "drink" module would activate the module in charge of "find water". The activations organize the behaviors into a sequence, even though only one action is performed at a time. The distribution of larger behaviors across modules makes this system flexible and robust to noise.^[8] Some critics of this model hold that any existing set of division rules for the predecessor and conflictor connections between modules produces sub-par action selection. In addition, the feedback loop used in the model can in some circumstances lead to improper action selection.^[9]
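The recruiting of one module by another through pre- and post-conditions can be sketched in Python. This is a deliberately simplified backward-chaining illustration, not the activation-spreading mechanism evaluated in the cited literature; the `Module` class, `select_action` function, and condition strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Module:
    name: str
    preconditions: FrozenSet[str]
    postconditions: FrozenSet[str]


def select_action(modules: List[Module], state: Set[str], goal: str,
                  _visiting: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the single module to run now in order to move toward `goal`."""
    if goal in state or goal in _visiting:
        return None  # goal already holds, or we are in a dependency cycle
    for module in modules:
        if goal not in module.postconditions:
            continue
        missing = module.preconditions - state
        if not missing:
            return module.name  # executable right now
        # Otherwise recruit a module that can establish a missing pre-condition.
        for condition in sorted(missing):
            recruited = select_action(modules, state, condition,
                                      _visiting | {goal})
            if recruited is not None:
                return recruited
    return None


MODULES = [
    Module("drink", frozenset({"water available"}), frozenset({"thirst reduced"})),
    Module("find water", frozenset(), frozenset({"water available"})),
]

print(select_action(MODULES, set(), "thirst reduced"))                # find water
print(select_action(MODULES, {"water available"}, "thirst reduced"))  # drink
```

Only one module name is returned per call, which mirrors the point that behaviors form a sequence even though a single action runs at any moment.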

Arbiter and centrally coordinated

In the arbiter and centrally coordinated architecture, the action modules are not connected to each other but to a central arbiter. When behaviors are triggered, they begin "voting" by sending signals to the arbiter, and the behavior with the highest number of votes is selected. In these systems, bias is created through the "voting weight", or how often a module is allowed to vote. Some arbiter systems take a different spin on this type of winner-take-all by using a "compromise" feature in the arbiter. Each module is able to vote for or against each smaller action in a set of actions, and the arbiter selects the action with the most votes, meaning that it benefits the most behavior modules.

The compromise approach can be seen as violating the general rule, established by Brooks, against creating representations of the world in behavior-based AI. By performing command fusion, the system creates a larger composite pool of knowledge than is obtained from the sensors alone, forming a composite inner representation of the environment. Defenders of these systems argue that forbidding world-modeling puts unnecessary constraints on behavior-based robotics, and that agents benefit from forming representations and can still remain reactive.
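The voting and compromise schemes described in this section can be sketched together in Python. Two assumptions are made for the example: the "voting weight" is modeled as a numeric multiplier on a module's ballot rather than as a voting frequency, and votes are numbers where positive means "for" and negative means "against". The function name `arbitrate`, the module names, and the candidate actions are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def arbitrate(votes: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> str:
    """Sum weighted votes per candidate action and return the top-scoring one."""
    totals: Dict[str, float] = defaultdict(float)
    for module, ballot in votes.items():
        for action, vote in ballot.items():
            # Modules without an explicit weight default to 1.0 (assumption).
            totals[action] += weights.get(module, 1.0) * vote
    return max(totals, key=totals.get)


# Each behavior module votes for (+) or against (-) every candidate action.
votes = {
    "avoid obstacle": {"turn left": 1.0, "go straight": -1.0, "turn right": 0.5},
    "follow path":    {"turn left": 0.2, "go straight": 1.0,  "turn right": 0.4},
    "seek goal":      {"turn left": -0.5, "go straight": 1.0, "turn right": 0.8},
}
weights = {"avoid obstacle": 2.0, "follow path": 1.0, "seek goal": 1.0}

winner = arbitrate(votes, weights)
print(winner)  # turn right
```

Here "turn right" wins although it is no module's first choice, which is the compromise behavior described above. The sketch raises `ValueError` if no votes are cast and breaks ties by insertion order; a real arbiter would need explicit policies for both cases.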

Notes and References

[1] Schilling, M., Paskarbeit, J., Hoinville, T., Hüffmeier, A., Schneider, A., Schmitz, J., Cruse, H. (Sept. 17, 2013). A hexapod walker using a heterarchical structure for action selection. Frontiers in Computational Neuroscience, 7.
[2] Öztürk, P. (2009). Levels and types of action selection: The action selection soup. Adaptive Behavior, 17.
[3] Koch, C., Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Retrieved from http://papers.klab.caltech.edu/104/.
[4] Jones, J.L. (2004). Robot programming: A practical guide to behavior-based robotics. The McGraw Hill Companies, Inc.
[5] Ballard, D.H., Feldman, J.A. (1982). Connectionist models and their properties. Cognitive Science, 6, 205-54.
[6] Brooks, R.A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2, 14-23. Retrieved from https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=56.
[7] Rosenblatt, J.K. (1995). DAMN: A distributed architecture for mobile navigation. Retrieved from http://www.robotics.usc.edu/~maja/teaching/cs584/papers/damn.pdf.
[8] Blumberg, B.M. (1996). Old tricks, new dogs: Ethology and interactive creatures. Retrieved from ProQuest Dissertations & Theses Database.
[9] Tyrrell, T. (Mar. 1, 1994). An evaluation of Maes’ bottom-up mechanism for behavior selection. Adaptive Behavior, 2, 307-348.