Multisensor INTerpretation of behaviours and situations for an INTelligent INTervention in complex and dynamic environments (2011-2013)

Research project within the framework of the VI National Plan for Scientific Research, Development and Technological Innovation 2008-2011 (Resolution of 30 December 2009; Boletín Oficial del Estado of the 31st, in its Title II, Chapter I)

Part of the coordinated project INT3: "INTerpretación multisensorial de comportamientos y situaciones para una INTervención INTeligente en entornos complejos y dinámicos", in collaboration with the LoUISE and GRAM groups of the universities UCLM and UAH

Thematic project

Scene description languages

The answer to the needs presented above is to use a scene description language, or intermediate language, in which the user can describe the scenario and highlight the characteristics of particular relevance to the application, usually forming ontologies with concepts drawn from an analysis of the problem. These languages are flexible, expressive and abstract, and allow the background and each of the entities of interest to be modeled. They usually start from basic features (e.g. pixels), compose them into regions of interest (blobs or silhouettes) and compose these in turn into objects of interest. The set of objects, together with the background model and the camera model, describes the scene. During execution, the system keeps track of the state of the entities. The term "state" usually denotes the aggregation of all the properties and relationships of all relevant entities at a given time. Events, on the other hand, are triggered by changes detected in the state of an entity, and complex events are built from simpler ones. Depending on the author, some refer to primitive and composite states, as well as primitive and composite events (called "activities" when they refer only to physical objects) [Bremond et al., 2004], while others speak of primitive events (an isolated event), single-thread events (several sequential events) and multi-thread events (several events with partial and/or total overlaps) [Bolles & Nevatia, 2004]. Inference rules can be added within the same language to describe physical and logical entities in simple terms, so that, as far as possible, the operator does not have to indicate how the required characteristics are obtained. The scene description language uses low-level tools transparently to the user, creating an abstraction layer in which even several physical and/or software agents can jointly support the intermediate representation.
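The relationship between states, primitive events and composite events described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the syntax or semantics of any of the cited languages: the class names (PrimitiveEvent, SequenceEvent), the "zone" property and the example scenario are all assumptions made for the illustration. A state is taken to be a mapping from entities to their properties, a primitive event fires when a condition on one entity becomes true, and a composite (single-thread) event fires when its parts occur in sequence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: entity name -> property name -> value.
State = Dict[str, Dict[str, object]]


@dataclass
class PrimitiveEvent:
    """Fires when a condition on one entity changes from false to true."""
    name: str
    entity: str
    predicate: Callable[[Dict[str, object]], bool]

    def detect(self, previous: State, current: State) -> bool:
        before = self.predicate(previous.get(self.entity, {}))
        after = self.predicate(current.get(self.entity, {}))
        return after and not before


@dataclass
class SequenceEvent:
    """Composite event: its parts must fire one after another, in order."""
    name: str
    parts: List[str]
    _next: int = 0  # index of the part we are currently waiting for

    def feed(self, fired: List[str]) -> bool:
        # Advances by at most one part per time step.
        if self.parts[self._next] in fired:
            self._next += 1
            if self._next == len(self.parts):
                self._next = 0
                return True
        return False


def run(states: List[State],
        primitives: List[PrimitiveEvent],
        composites: List[SequenceEvent]) -> List[Tuple[int, List[str]]]:
    """Compare consecutive states and log the events fired at each step."""
    log = []
    for t in range(1, len(states)):
        fired = [p.name for p in primitives
                 if p.detect(states[t - 1], states[t])]
        fired_composites = [c.name for c in composites if c.feed(fired)]
        log.append((t, fired + fired_composites))
    return log


# Hypothetical scenario: one person moving through three zones.
states = [
    {"p1": {"zone": "outside"}},
    {"p1": {"zone": "door"}},
    {"p1": {"zone": "inside"}},
]
primitives = [
    PrimitiveEvent("reaches_door", "p1", lambda e: e.get("zone") == "door"),
    PrimitiveEvent("enters_room", "p1", lambda e: e.get("zone") == "inside"),
]
composites = [SequenceEvent("enters_through_door",
                            ["reaches_door", "enters_room"])]

log = run(states, primitives, composites)
print(log)
```

In this sketch the operator only names the conditions of interest; how the "zone" property is obtained from pixels and blobs is left to the lower layers, which is the kind of separation the abstraction layer is meant to provide.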
Some works address tracking on the basis of generic characteristics, such as appearance [Zhong et al., 2000], while other groups seek to use all the available knowledge to reduce ambiguity [Vacchetti et al. 2004]. Thus, the European project "Cognitive Vision Systems" [CVS] focuses its research on categorization, recognition, learning, interpretation and integration in vision systems for situated intelligent systems. Nagel (2004) proposes a hierarchy in which different levels of representation can be distinguished, each referring to a different type of knowledge. Some research groups use simplified human models for tracking and characterization [Wang et al., 2003]. Ellipsoid-based models have also been used for robust tracking of humans at medium distance [Zhao & Nevatia, 2004]. The W4 system [Haritaoglu, 2000] uses vertical histograms to distinguish groups of people.
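As an illustration of what a vertical histogram of a silhouette is, the following Python sketch counts the foreground pixels in each column of a binary mask and then counts the separate runs of tall columns. It is not the W4 procedure, whose details are not given here; the function names, the toy mask and the min_height threshold are assumptions chosen for the example.

```python
from typing import List


def vertical_histogram(mask: List[List[int]]) -> List[int]:
    """Count foreground pixels (1s) in each column of a binary mask."""
    return [sum(column) for column in zip(*mask)]


def count_runs(histogram: List[int], min_height: int) -> int:
    """Count maximal runs of adjacent columns that reach min_height."""
    runs, inside = 0, False
    for value in histogram:
        if value >= min_height and not inside:
            runs += 1
        inside = value >= min_height
    return runs


# Toy foreground mask (hypothetical): two tall shapes joined by a thin bridge.
mask = [
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
]

histogram = vertical_histogram(mask)
print(histogram)                 # [0, 4, 3, 1, 1, 4, 4, 0]
print(count_runs(histogram, 3))  # 2
print(count_runs(histogram, 1))  # 1
```

With a low threshold the whole blob is seen as a single region, while a higher threshold separates the two tall shapes, which suggests why a column-wise projection can help when several people merge into one blob.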

Surveillance supported by mixed reality

The success of a surveillance system depends largely on the robustness of its tracking algorithm. Tracking objects in a scene raises several problems, including partial and/or total occlusion, changes in lighting, shadows, a changing background, background motion and camera movement, slowly moving objects, and objects that appear in and disappear from the scene. In general, the methods used for tracking depend on the assumptions made for each specific task. The most common assumptions are: the number and type of objects are known in advance; the types of occlusion produced (other states) can be modeled for a particular task; the background is modeled; the background model makes it possible to detect certain occlusions; there is continuity in appearance; there is continuity in movement; and there is spatial and temporal continuity. Many tracking systems work only moderately well, and only in limited environments where the background image is not very dynamic and the targets are clearly separated. Even in these cases, it is especially desirable to develop a human identification subsystem.
As a valid alternative, an explicit model of the scene can be built, either by unsupervised learning [Collins et al., 2001] or manually [Xu & Ellis, 2006]. The term virtual reality was coined in 1989 [Lanier et al., 1989] to denote an interactive, computer-generated simulation that involves all the senses and can be explored, viewed and manipulated in real time, giving the feeling of presence in the environment. Today it is generally agreed that this technology allows acceptable substitutes for real objects or environments to be created. Unlike virtual reality, where the user is immersed in a synthetic world that replaces the real one, the augmented reality paradigm is intended not to replace the real world but to complement it, allowing the user to see the real world with virtual objects superimposed on it or composited with it. The user therefore has an "enhanced" view of reality, perceiving virtual objects that coexist with real ones in the same three-dimensional space [Azuma, 1997; Ikeuchi, 2001].

Distributed multisensory surveillance

The availability of new types of wireless networks and of a large number of sensing devices with increased computing capabilities allows increasingly sophisticated surveillance systems to be implemented [Conci et al., 2005]. These systems consist of networks of sensors (video cameras, microphones, other sensors, etc.) able to work in omnidirectional or directional mode (oriented in the three dimensions) [Boult et al., 1999], which can be mounted on mobile platforms (monitored devices that can move around the environments under observation) or fixed ones (anchored at specific points of the environment) [Molina et al., 2003]. An important part of such systems is their control.
Traditionally, the control of a surveillance system has followed a centralized configuration. The sensors report to a central controller, which decides what to do and transmits commands to the remote devices. Although this solution is conceptually simple to design, it has several limitations regarding robustness and scalability. These limitations stem from the rigid, hierarchical, centralized architecture. For example, if there are failures or intrusions in the communication network, some areas under surveillance can be left uncovered. A major event can also cause a flood of alarms and lead to a collapse of the control system, hampering its ability to decide and react. These reasons lead to considering new, more decentralized and distributed architectures. This distribution has to take two main issues into account. On the one hand, the various components of the system must have a certain degree of autonomy, so that they can make decisions locally. This makes it easier to solve problems that arise when components become isolated, and it reduces communications in the system, which also improves overall performance. On the other hand, the coordination of these distributed components must be taken into account. Coordination improves system performance, for example when assessing the significance of events captured by several sensors, when keeping the mobile parts of the system secure, or when several effectors collaborate to solve a problem.
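The two requirements discussed above, local autonomy and coordination, can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not an architecture proposed by the project: the SensorNode class, the threshold values, the node names and the quorum rule are all hypothetical. Each node decides locally whether its reading is alarming, sends only alerts (not raw readings) to its neighbours, and treats an event as significant when its own alert is backed by enough neighbours.

```python
from typing import List, Set


class SensorNode:
    """Toy sensor node for a single round of sensing (illustrative only)."""

    def __init__(self, name: str, threshold: float) -> None:
        self.name = name
        self.threshold = threshold           # hypothetical local alarm level
        self.neighbours: List["SensorNode"] = []
        self.heard: Set[str] = set()         # neighbours that raised an alert
        self.alerting = False

    def sense(self, reading: float) -> None:
        """Local decision: no central controller is consulted."""
        self.alerting = reading >= self.threshold
        if self.alerting:
            # Only alerts travel over the network, not raw readings.
            for node in self.neighbours:
                node.heard.add(self.name)

    def confirmed(self, quorum: int = 1) -> bool:
        """Coordinated decision: own alert backed by `quorum` neighbours."""
        return self.alerting and len(self.heard) >= quorum


# Three fully connected nodes (hypothetical names and readings).
a = SensorNode("cam_a", threshold=0.5)
b = SensorNode("cam_b", threshold=0.5)
c = SensorNode("mic_c", threshold=0.5)
for node in (a, b, c):
    node.neighbours = [other for other in (a, b, c) if other is not node]

for node, reading in ((a, 0.9), (b, 0.8), (c, 0.1)):
    node.sense(reading)

print([node.confirmed() for node in (a, b, c)])  # [True, True, False]
```

If the link to one neighbour fails, the remaining nodes still decide locally and can still confirm each other's alerts, which is the kind of robustness that a single central controller cannot offer.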

Surveillance systems modeled as multi-agent systems

One way to meet the requirements of decentralization, autonomy and coordination is through agent technology. From the point of view of this technology, a multisensory intelligent surveillance system would be considered a multi-agent system (MAS) [Remagnino et al., 2004]. Agents are distributed software components with the autonomy to make their own decisions and the ability to perceive and act on their environment. The distribution of intelligence that a MAS provides makes it possible to address the issues that arise when developing a multisensory intelligent surveillance system: (a) bandwidth, (b) productivity, (c) speed, (d) robustness, (e) autonomy and (f) scalability [Bradshaw, 1997].
The use of agents in surveillance systems has some precedents in the literature. For example, Monitorix [Abreu et al., 2000] is a MAS-based video traffic surveillance system in which vehicles are tracked by means of a traffic model and learning algorithms that adjust the model parameters. The VSAM (Video Surveillance and Monitoring) group has developed a multi-camera surveillance system that allows a human operator to monitor activities through a set of active video sensors [Boult et al, 1999]. The system can automatically detect people and vehicles and locate them on a geospatial model. Recently, another group proposed a MAS architecture for interpreting the dynamics of a scene by fusing the information captured by multiple cameras. Among multisensory MAS for surveillance is the work of Molina and colleagues [Molina et al., 2004], which uses fuzzy logic to evaluate the priorities of multisensory tasks in defense surveillance applications, all supported by a MAS for the reasoning logic.
As a modeling approach, INGENIAS [Pavon et al., 2005], an agent-based methodology that covers analysis, design and implementation and is supported by tools, is an excellent alternative. The methodology and its tools allow developers to obtain the implementation automatically, so that most of the effort goes into specifying the functionality of the surveillance system and into its development.

Integration of mobile robots with other fixed sensors

In surveillance applications, mobile robots are used to provide a dynamic view that reacts to situations. That is, they carry sensors and actuators so that these can be placed in the most favorable or necessary position at any time. The system must therefore take into account the current position and orientation of the robot, which are needed to integrate the mobile sensors properly into the overall surveillance scheme.
The work of Collins and colleagues [Collins et al., 2001] addresses the problem of a network of active, cooperative visual sensors for detecting and tracking vehicles and humans. The system operates in real time as an active security and surveillance system. Bhanu and Zou (2004) use a time-delay neural network (TDNN) to fuse audio and video information in order to detect a person moving in a scene that contains other people. Desnoyer and colleagues [Desnoyer et al., 1990] attempt an integration that uses stochastic methods to model the environment from the data captured by a robot with multiple types of sensors. Wu and colleagues [Wu et al., 2003] propose a multi-camera surveillance system for detection, representation and recognition in video streams from a parking lot. Another kind of approach treats the data from sensor configurations probabilistically [Kumar et al., 2004].

UNED - Universidad Nacional de Educación a Distancia ®