Two!Ears replaces current thinking about auditory modeling with a systemic approach in which human listeners are regarded as multi-modal agents that develop their concept of the world through exploratory interaction. The goal of the project is to develop an intelligent, active computational model of auditory perception and experience in a multi-modal context. Our novel approach is based on a structural link from binaural perception to judgment and action, realized by interleaved signal-driven (bottom-up) and hypothesis-driven (top-down) processing within an innovative expert-system architecture. The system achieves object formation based on Gestalt principles, meaning assignment, knowledge acquisition and representation, learning, logic-based reasoning, and reference-based judgment. More specifically, the system assigns meaning to acoustic events by combining signal- and symbol-based processing in a joint model structure, integrated with proprioceptive and visual percepts. It is therefore able to describe an acoustic scene much as a human listener would: in terms of the sensations that sounds evoke (e.g., loudness, timbre, spatial extent) and of their semantics (e.g., whether a sound is unexpected or comes from a familiar voice). Our system will be implemented on a robotic platform that will actively parse its physical environment, orient itself, and move its sensors in a humanoid manner. The system has an open architecture, so that it can easily be modified or extended. This is crucial, since the cognitive functions to be modeled are domain- and application-specific. Two!Ears will have a significant impact on the future development of ICT wherever knowledge and control of aural experience are relevant. It will also benefit research in related areas such as biology, medicine, and sensory and cognitive psychology.