[ISEA96] Paper: Antonio Camurri – Multimodal Environments

Abstract

Long Paper

Multimodal Environments (MEs) are a family of systems capable of establishing creative, multimodal user interaction and exhibiting dynamic, real-time, adaptive behaviour. In a typical scenario, one or more users are immersed in an environment that allows them to communicate by means of full-body movement, including dance and gesture, and possibly by singing, playing, etc. Users get real-time feedback from the environment in terms of sound, music, visual media, and actuators in general (e.g., movements of semi-autonomous mobile systems). MEs are therefore a sort of extension of Augmented Reality environments, integrating intelligent features; from another viewpoint, a ME is a sort of prolongation of the human mind and senses. From an Artificial Intelligence perspective, MEs are populated by agents capable of changing their reactions and their "social interaction" rules over time. A user's gesture can mean different things in different situations and can produce changes in the agents populating the ME. MEs embed multi-level representations of different media and modalities, as well as representations of communication metaphors and of analogies to integrate modalities. MEs open new niches of applications, from art (including music, dance, and theatre), to culture (interactive museums), to entertainment (interactive discotheque, "dance karaoke"), to a number of industrial applications, many still to be discovered.

In the paper, we present a flexible ME architecture and four applications of it that we recently developed for art, music, entertainment, and culture: the SoundCage Interactive Music Machine, the HARP-Vscope, the HARP-DanceWeb, and the Theatrical Machine. The HARP-Vscope is a ME application for tracking full-body human movement by means of on-body wireless sensors, for gesture recognition and real-time control of computer-generated music and animation. The SoundCage Interactive Music Machine (IMM) is based on a set of spatial sensors arranged in a sort of "cage", designed to track overall, full-body human movement features without requiring any on-body device or constraint. The HARP-DanceWeb uses a different human movement acquisition system, based on ultrasound sensor technology, which can be used both in stand-alone installations and integrated with the SoundCage IMM. The Theatrical and Museums Machine is a quite different application, consisting of one or more semi-autonomous mobile robots capable of performing tasks such as acting as a cicerone in a museum or as a robot-actor on stage in theatre/dance/music events and art installations. Such systems include audio output and a small on-board computer for basic, low-level processing (managing a sort of arc-reflex behaviour), as well as radio links for (i) the remote supervision computer, (ii) the sound/music channel, and (iii) possible further control of fixed devices placed in the area.

The systems described in this paper have been developed with the partial support of the Esprit Basic Research Action Project 8579 MIAMI (Multimodal Interaction for Advanced Multimedia Interfaces). They have been used since 1995 in concerts and various events (theatre, museums), and have been selected by the Industry CEC Commission for presentation in live demonstrations at EITC'95 (European Information Technology Conference and Exhibition, Brussels Congress Centre, 27-29 November 1995).
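The abstract's central architectural idea, agents whose reaction to a gesture depends on the current situation and whose interaction rules can change over time, can be illustrated with a minimal sketch. The following Python fragment is a hypothetical illustration only, not the paper's actual architecture: the names MEAgent, react, adapt, and the gesture/situation labels are invented for the example.

```python
# Hypothetical sketch (assumption, not the paper's implementation):
# an agent interprets the same gesture differently depending on the
# current "situation", and its reaction rules can be changed over time.

from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A reaction maps a gesture intensity (0..1) to media control parameters.
Reaction = Callable[[float], dict]

@dataclass
class MEAgent:
    # Reaction rules keyed by (gesture label, situation label).
    rules: Dict[Tuple[str, str], Reaction] = field(default_factory=dict)

    def react(self, gesture: str, intensity: float, situation: str) -> dict:
        """Interpret a gesture in the context of the current situation."""
        rule = self.rules.get((gesture, situation))
        if rule is None:
            return {}  # no reaction defined for this context
        return rule(intensity)

    def adapt(self, gesture: str, situation: str, new_rule: Reaction) -> None:
        """Change a 'social interaction' rule over time."""
        self.rules[(gesture, situation)] = new_rule

# Example: the same raised-arm gesture drives music tempo in a "dance"
# situation but drives lighting level in a "museum" situation.
agent = MEAgent()
agent.adapt("raise_arm", "dance", lambda x: {"music_tempo": 90 + 60 * x})
agent.adapt("raise_arm", "museum", lambda x: {"light_level": x})

print(agent.react("raise_arm", 0.8, "dance"))   # {'music_tempo': 138.0}
print(agent.react("raise_arm", 0.8, "museum"))  # {'light_level': 0.8}
```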