Past Projects

  1. ZAM: Zero-resource keyword recognition for Audio Mass-Data (Zero-Resource Schlagworterkennung bei Audio-Massendaten)
    Runtime: 01.12.2016 – 31.08.2017
    Role: Coauthor Proposal, Beneficiary, Principal Investigator
    Partners: University of Passau and others
    To process mass audio data captured by a range of diverse sensors, technical solutions in the field of keyword recognition are investigated. The project shows which approaches simplify, accelerate, and optimise audio analysis as well as the associated manual work processes. The major aim is to significantly reduce human workload through the highest possible degree of automation, with two points of focus: 1) training with limited to no resources (“zero resource”), and 2) determining how low the audio quality may be while still allowing reasonable, highly automatic processing.
  2. VocEmoApI: Voice Emotion detection by Appraisal Inference (#230331)

    EU Horizon 2020 ERC Proof of Concept Grant (PoC 2015) – 46% acceptance rate in the call
    Runtime: 01.11.2015 – 30.04.2017
    Role: Coauthor Proposal, Beneficiary, LEAR
    Partners: audEERING GmbH
    The automated sensing of human emotions has gained a lot of commercial attention lately. For facial and physiological sensing, many companies offer first professional products. Recently, voice analytics has become a hot topic, too, with first companies emerging for the telecom, entertainment, and robot markets (e.g. Sympalog, audEERING, Aldebaran, etc.). Current vocal emotion detection approaches rely on machine learning, where emotions are identified based on a reference set of expression clips. The drawback of this method is the need to rely on a small set of basic, highly prototypical emotions. Real-life emotion detection application fields such as clinical diagnosis, marketing research, media impact analysis, and forensics and security require subtle differentiations of feeling states. VocEmoApI will develop proof-of-concept software for vocal emotion detection based on a fundamentally different approach: focusing on vocal nonverbal behavior and sophisticated acoustic voice analysis, it exploits the building blocks of emotional processes – a person’s appraisal of relevant events and situations, which triggers the action tendencies and expressions that constitute an emotional episode. Evidence for emotion-antecedent appraisals will be continuously tracked in running speech. The approach can infer not only basic emotion categories but also much finer distinctions such as subcategories of emotion families and subtle emotions. The development of VocEmoApI draws extensively on the results of the applicant’s Advanced Grant, providing a solid theoretical basis. Market analysis through marketing research partners will be conducted, and the prototype software will be utilized to promote the technology and to estimate a product value based on feedback from industry contacts. A massive impact of VocEmoApI on large markets such as household robotics, public security, clinical diagnosis and therapy, call analytics, and marketing research is to be expected.
  3. MixedEmotions: Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets (#644632)

    EU Horizon 2020 Innovation Action (IA) – 12.5% acceptance rate in the call
    Runtime: 01.04.2015 – 31.03.2017
    Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: NUI Galway, Univ. Polit. Madrid, University of Passau, Expert Systems, Paradigma Tecnológico, TU Brno, Sindice Ltd., Deutsche Welle, Phonexia SRO, Adoreboard, Millward Brown
    MixedEmotions will develop innovative multilingual, multimodal Big Data analytics applications that analyze a more complete emotional profile of user behavior, using data from mixed input channels: multilingual text data sources, A/V signal input (multilingual speech, audio, video), social media (social networks, comments), and structured data. Commercial applications (implemented as pilot projects) will be in Social TV, Brand Reputation Management, and Call Centre Operations. Making sense of accumulated user interaction from different data sources, modalities, and languages is challenging and has not yet been fully explored in an industrial context. Commercial solutions exist, but they do not address the multilingual aspect in a robust and large-scale setting, do not scale up to the huge data volumes that need to be processed, and do not integrate emotion analysis observations across data sources and/or modalities on a meaningful level. MixedEmotions will implement an integrated Big Linked Data platform for emotion analysis across heterogeneous data sources, different languages, and modalities, building on existing state-of-the-art tools, services, and approaches, and enabling the tracking of emotional aspects of user interaction and feedback at the entity level. The MixedEmotions platform will provide an integrated solution for: large-scale emotion analysis and fusion on heterogeneous, multilingual text, speech, video, and social media data streams, leveraging open-access and proprietary data sources and exploiting social context by leveraging social network graphs; and semantic-level emotion information aggregation and integration through the robust extraction of social semantic knowledge graphs for emotion analysis along multidimensional clusters.
  4. Speech Emotion Recognition using Nonlinear Dimensionality Reduction Methods
    China Scholarship Council
    Runtime: 01.08.2014 – 31.07.2016
    Role: Supervisor
    Partners: TUM
  5. Semi-Supervised Learning in the Analysis of Continuous Speaker Emotion and Personality
    China Scholarship Council
    Runtime: 01.08.2011 – 31.07.2015
    Role: Supervisor
    Partners: TUM
  6. In-car music recommendation system based on driver’s emotion
    TUM University Foundation Fellowship
    Runtime: 01.05.2014 – 30.04.2015
    Role: Supervisor
    Partners: TUM
  7. PROPEREMO: Production and Perception of Emotions: An affective sciences approach (#230331)
    FP7 ERC Advanced Grant
    Runtime: 01.03.2008 – 28.02.2015
    Role: Participant
    Partners: University of Geneva (PI Klaus Scherer), TUM, Free University of Berlin
    Emotion is a prime example of the complexity of human mind and behaviour, a psychobiological mechanism shaped by language and culture, which has puzzled scholars in the humanities and social sciences over the centuries. In an effort to reconcile conflicting theoretical traditions, we advocate a componential approach which treats event appraisal, motivational shifts, physiological responses, motor expression, and subjective feeling as dynamically interrelated and integrated components during emotion episodes. Using a prediction-generating theoretical model, we will address both production (elicitation and reaction patterns) and perception (observer inference of emotion from expressive cues). Key issues are the cognitive architecture and mental chronometry of appraisal, neurophysiological structures of relevance and valence detection, the emergence of conscious feelings due to the synchronization of brain/body systems, the generating mechanism for motor expression, the dimensionality of affective space, and the role of embodiment and empathy in perceiving and interpreting emotional expressions. Using multiple paradigms in laboratory, game, simulation, virtual reality, and field settings, we will critically test theory-driven hypotheses by examining brain structures and circuits (via neuroimagery), behaviour (via monitoring decisions and actions), psychophysiological responses (via electrographic recording), facial, vocal, and bodily expressions (via micro-coding and image processing), and conscious feeling (via advanced self-report procedures). In this endeavour, we benefit from extensive research experience, access to outstanding infrastructure, advanced analysis and synthesis methods, validated experimental paradigms as well as, most importantly, from the joint competence of an interdisciplinary affective science group involving philosophers, linguists, psychologists, neuroscientists, behavioural economists, anthropologists, and computer scientists.
  8. ASC-INCLUSION: Integrated Internet-Based Environment for Social Inclusion of Children with Autism Spectrum Conditions (#289021)

    EU FP7 Specific Targeted Research Project (STREP)
    Runtime: 01.11.2011 – 31.12.2014
    Role: Coordinator, Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: University of Cambridge, Bar Ilan University, Compedia, University of Genoa, Karolinska Institutet, Autism Europe, TUM, Koc University, Spectrum ASC-Med
    Autism Spectrum Conditions (ASC, frequently referred to as ASD – Autism Spectrum Disorders) are neurodevelopmental conditions characterized by social communication difficulties and restricted and repetitive behaviour patterns. Current studies suggest that 1% of the population might fit an ASC diagnosis. Alongside their difficulties, individuals with ASC tend to have intact and sometimes superior abilities to comprehend and manipulate closed, rule-based, predictable systems, such as computerized environments. This affinity for the computerized environment has led to several attempts to teach emotion recognition and expression as well as social problem solving to individuals with ASC using computer-based training.
    In the last decade, web applications have been increasingly used for social interaction, forming online communities and social networks. Anecdotal reports of the emergence of online autistic communities and the use of forums and virtual worlds show the great promise the internet holds for better inclusion and social skills training for people with ASC. Since intervention in ASC has been shown to be more effective when provided early in life, using the internet as a platform to support younger individuals with ASC could significantly promote their social inclusion.
    The project aims to create and evaluate the effectiveness of such an internet-based platform, aimed at children with ASC (and other groups, such as children with ADHD or socially neglected children) and those interested in their inclusion. The platform will combine several state-of-the-art technologies in one comprehensive virtual world, including analysis of users’ gestures and facial and vocal expressions using a standard microphone and webcam, training through games, text communication with peers and smart agents, animation, and video and audio clips. The user’s environment will be personalized according to an individual profile and sensory requirements, and designed to be motivating. Carers will be offered their own supportive environment, including professional information, reports on the child’s progress and use of the system, and forums for parents and therapists.
  9. Novel Approaches for Large Vocabulary Continuous Speech Recognition
    China Scholarship Council
    Runtime: 01.08.2010 – 31.07.2014
    Role: Supervisor
    Partners: TUM
  10. U-STAR: Universal Speech Translation Advanced Research
    Academic Cooperation
    Runtime: 01.06.2012 – 31.03.2013
    Role: Consortial Partner
    Partners: TUM and others.
    The Universal Speech Translation Advanced Research Consortium (U-STAR) is an international research collaboration formed to develop network-based speech-to-speech translation (S2ST) with the aim of breaking language barriers around the world and enabling spoken communication across different languages.
  11. GLASS: Generic Live Audio Source Separation
    Industry Cooperation with HUAWEI TECHNOLOGIES within the HUAWEI Innovative Research Program (HIRP)
    Runtime: 01.01.2011 – 31.12.2013
    Role: Principal Investigator, Author Proposal
    Partners: TUM and HUAWEI
    GLASS develops new ways of separating audio sources, e.g., for crystal-clear speech communication, by means of machine intelligence and advanced separation algorithms.
  12. Kontextsensitive automatische Erkennung spontaner Sprache mit BLSTM-Netzwerken (#SCHU2508-4/1)
    (“Context-Sensitive Automatic Recognition of Spontaneous Speech by BLSTM Networks”)
    Funded by the DFG (German Research Foundation)
    Runtime: 01.03.2011 – 28.02.2014
    Role: Principal Investigator, Author Proposal
    Partners: TUM
    Despite numerous advances in the field of automatic speech recognition, the recognition performance and robustness of today’s speech recognition systems are not sufficient to serve as a basis for natural, spontaneous spoken human-machine interaction. The goal of this research project is therefore to improve the accuracy of systems for recognising natural, fluent speech by means of novel pattern recognition methods. Since the efficiency of human speech recognition rests above all on the intelligent exploitation of long-range context information, approaches that take context into account at the feature level are pursued. Starting from so-called tandem speech recognisers, in which neural networks for phoneme prediction are combined with dynamic classifiers, bidirectional Long Short-Term Memory (BLSTM) networks are employed for this purpose. In contrast to the phoneme estimators currently used in tandem systems, the BLSTM principle allows an optimal amount of context information to be incorporated into the prediction. Since recent successes in context-sensitive phoneme recognition and keyword detection underline the effectiveness of the BLSTM approach, a corresponding further development of continuous speech recognition systems is highly promising.
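    As a rough illustration of the tandem principle described above, the following is a minimal sketch of a BLSTM phoneme-posterior estimator in Python (PyTorch is assumed; the feature type, layer sizes, and phoneme inventory are illustrative assumptions, not the project’s actual configuration):

      # Minimal sketch: BLSTM phoneme-posterior estimator for a tandem ASR
      # front end (assumptions: PyTorch, framewise acoustic features and
      # phoneme labels; all dimensions are illustrative).
      import torch
      import torch.nn as nn

      class BLSTMPhonemeEstimator(nn.Module):
          def __init__(self, n_features=40, n_hidden=128, n_phonemes=45):
              super().__init__()
              # A bidirectional LSTM reads the whole utterance, so every frame's
              # prediction can draw on long-range past *and* future context.
              self.blstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
                                   num_layers=2, batch_first=True,
                                   bidirectional=True)
              self.out = nn.Linear(2 * n_hidden, n_phonemes)

          def forward(self, x):
              # x: (batch, time, n_features) acoustic feature frames
              h, _ = self.blstm(x)
              # Framewise phoneme log-posteriors; in a tandem system these (or
              # the hidden activations) are appended to the acoustic features
              # passed to the dynamic classifier, e.g. an HMM-based recogniser.
              return torch.log_softmax(self.out(h), dim=-1)

      # Example: log-posteriors for one 300-frame utterance
      model = BLSTMPhonemeEstimator()
      frames = torch.randn(1, 300, 40)
      log_posteriors = model(frames)  # shape: (1, 300, 45)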
  13. Nichtnegative Matrix-Faktorisierung zur störrobusten Merkmalsextraktion in der Sprachverarbeitung (#SCHU2508-2/1)
    (“Non-Negative Matrix Factorization for Robust Feature Extraction in Speech Processing”)
    Funded by the DFG (German Research Foundation)
    Runtime: 01.06.2010 – 31.05.2013
    Role: Principal Investigator, Author Proposal
    Partners: TUM
    The main goal of this research project is to make the recognition of speech and music signals more robust against noise. Its distinguishing feature is the integration of features based on Non-Negative Matrix Factorisation (NMF). NMF, a technique for data reduction, has recently become increasingly popular in signal processing. It usually decomposes a spectrogram into two factors: the first contains a spectral ‘basis’ of the signal, the second the activity of the basis vectors over time. In this research project, features are derived from the second factor that can complement existing architectures for speech and music processing. First experiments on NMF feature extraction for the noise-robust recognition of spoken letter sequences in the car proved significantly superior to conventional methods and highly promising. Within the project, this method is to be improved by further developing the NMF and, in particular, prepared for use in real-time capable speech recognition systems, including for fluent speech. Finally, the described NMF features are to be applied in further fields such as emotion recognition, the detection of non-linguistic vocalisations such as laughter or coughing in speech, and chord recognition, with the aim of increasing current recognition quality and noise robustness.
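    As a rough illustration of the feature extraction described above, the following minimal sketch factorises a magnitude spectrogram and takes per-frame features from the activation matrix (librosa and scikit-learn are assumed for convenience; the file name and all parameters are illustrative, not the project’s configuration):

      # Minimal sketch: NMF-based feature extraction from a magnitude
      # spectrogram (assumptions: librosa + scikit-learn; illustrative
      # parameters, hypothetical input file).
      import numpy as np
      import librosa
      from sklearn.decomposition import NMF

      y, sr = librosa.load("utterance.wav", sr=16000)
      V = np.abs(librosa.stft(y, n_fft=512, hop_length=160))  # (freq, time)

      # Factorise V ~ W @ H: W holds the spectral 'basis', H the activity
      # of each basis vector over time.
      nmf = NMF(n_components=20, init="nndsvd", max_iter=500)
      W = nmf.fit_transform(V)   # (freq, components): spectral basis
      H = nmf.components_        # (components, time): activations

      # Per-frame NMF features taken from the activation matrix H; these can
      # be appended to conventional features (e.g. MFCCs) of the same frames.
      nmf_features = H.T         # (time, components)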
  14. Highly Robust Interest and Emotion Recognition from Speech
    China Scholarship Council
    Runtime: 01.08.2011 – 30.09.2012
    Role: Supervisor
    Partners: TUM
  15. SEMAINE: Sustained Emotionally coloured Machine-humane Interaction using Nonverbal Expression (#211486)

    EU FP7 STREP
    Runtime: 01.01.2008 – 31.12.2010
    Role: Principal Investigator, Coauthor Proposal (highest ranked in the call), Project Steering Board Member, Workpackage Leader
    Partners: DFKI, Queen’s University Belfast (QUB), Imperial College of Science, Technology and Medicine London, University of Twente, University Paris VIII, CNRS-ENST, TUM
    SEMAINE deals with real-time, robust, non-verbally competent conversations between a conversational agent and a human user.
  16. TCVC: Talking Car and Virtual Companion
    Industry Cooperation with Continental Automotive GmbH
    Runtime: 01.06.2008 – 30.11.2008
    Role: Principal Investigator, Author Proposal
    Partners: TUM and Continental Automotive GmbH
    TCVC provides expertise on emotion in the car with respect to a requirements analysis, potential and near-future use cases, a technology assessment, and a user acceptance study.
  17. ICRI: In-Car Real Internet
    Industry Cooperation with Continental Automotive GmbH
    Runtime: 01.06.2008 – 30.11.2008
    Role: Principal Investigator, Author Proposal
    Partners: TUM and Continental Automotive GmbH
    ICRI aims at benchmarking internet browsers on embedded platforms as well as at developing an integrated multimodal demonstrator for internet access in the car. The investigated modalities comprise handwriting, touch gestures, and natural speech in addition to conventional GUI interaction. The focus lies on MMI development with an embedded realisation.
  18. cUSER 2
    Industry cooperation with Toyota
    Runtime: 08.01.2007 – 31.07.2007
    Role: Principal Investigator, Coauthor Proposal
    Partners: TUM and Toyota
    The aim of the cUSER follow-up project is to establish a system that interprets human interest by combined speech and facial expression analysis based on multiple input analyses. Besides aiming at the highest possible accuracy through subject adaptation, class balancing strategies, and fully automatic segmentation by individual audio and video stream analysis, cUSER 2 focuses on real-time application through cost-sensitive feature space optimization, use of graphics processing power, and high-performance programming methods. Furthermore, feasibility and real recognition scenarios will be evaluated.
  19. cUSER
    Industry cooperation with Toyota
    Runtime: 01.08.2005 – 30.09.2006
    Role: Coauthor Proposal, Project Steering, Senior Researcher
    Partners: TUM and Toyota
    The aim of this project was an audiovisual approach to the recognition of spontaneous human interest. For a maximally robust estimate, information from four sources was combined, for the first time reported in the literature, by a synergistic early feature fusion tolerant to the failure of individual streams: firstly, speech is analyzed with respect to acoustic properties based on a high-dimensional prosodic, articulatory, and voice quality feature space, plus a linguistic analysis of the spoken content by ASR and bag-of-words vector space modeling; secondly, visual analysis provides patterns of facial expression by Active Appearance Models and of movement activity by eye tracking. Experiments were carried out on a video database of 10.5 h of spontaneous human-to-human conversation collected throughout the project, containing more than 20 subjects balanced by gender and age class. Recordings were made under diverse comfort and noise conditions, and multiple levels of interest were annotated within a rich transcription. Experiments aimed at person-independent, robust real-life usage and showed the high potential of such a multimodal approach. Benchmark results were further provided for manual transcription versus fully automatic processing.
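    The early-fusion idea described above can be illustrated by the following minimal sketch, which simply concatenates per-segment feature vectors from the four streams before training a single classifier (synthetic data, illustrative dimensions, and scikit-learn are assumptions; this is not the project’s actual pipeline):

      # Minimal sketch: early feature fusion of four information streams
      # (assumptions: per-segment features already extracted; synthetic data
      # and dimensions are purely illustrative).
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      n_segments = 200
      rng = np.random.default_rng(0)

      acoustic   = rng.normal(size=(n_segments, 384))   # prosodic/articulatory/voice quality
      linguistic = rng.normal(size=(n_segments, 1000))  # bag-of-words from ASR output
      facial     = rng.normal(size=(n_segments, 60))    # Active Appearance Model parameters
      gaze       = rng.normal(size=(n_segments, 8))     # eye-tracking activity statistics
      interest   = rng.integers(0, 3, size=n_segments)  # level-of-interest labels

      # Early fusion: concatenate all streams into one feature vector per
      # segment and train one classifier on the joint space.
      fused = np.concatenate([acoustic, linguistic, facial, gaze], axis=1)
      scores = cross_val_score(SVC(kernel="linear"), fused, interest, cv=5)
      print("mean accuracy:", scores.mean())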
  20. CEICES: Combining Efforts for Improving automatic Classification of Emotional user States
    Research Initiative within the EU FP6 Network of Excellence (NoE) HUMAINE
    Runtime: 2005 – 2008
    Role: Invited Research Expert
    Partners: Friedrich Alexander Universität Erlangen-Nürnberg (FAU), Fondazione Bruno Kessler (FBK, formerly ITC-IRST), Universität Karlsruhe (UKA), Universität Augsburg (UA), LIMSI-CNRS, Tel Aviv University (TAU), Tel Aviv Academic College of Engineering (AFEKA), TUM
    CEICES is a cooperation between several sites dealing with the classification of emotional user states conveyed via speech; the initiative was taken within the European Network of Excellence HUMAINE. The database used within CEICES is a German corpus with recordings of 51 ten- to thirteen-year-old children communicating with Sony’s AIBO pet robot. Conceptualization, design, and recordings were done at the originator site FAU. The approach followed within CEICES was as follows: the originator site provided speech files, a phonetic lexicon, manually corrected word segmentation and F0 values, emotional labels, the definition of train and test samples, etc. The data was annotated at the word level. All partners committed themselves to share with all the other partners their extracted feature values together with the necessary information (which acoustic or linguistic phenomenon a feature models, the format of the feature values, the classifier used, etc.). Thus, each site could assess the features provided by all other sites together with its own features, aiming at a repertoire of optimal features.
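    A minimal sketch of how such pooled feature sets might be joined and ranked is given below (pandas and scikit-learn are assumed; the file names, table layout, and ranking criterion are hypothetical and not the actual CEICES tooling):

      # Minimal sketch: pool per-site feature tables on a shared segment ID
      # and rank the pooled features (assumptions: hypothetical CSV layout,
      # mutual information as one possible ranking criterion).
      import pandas as pd
      from sklearn.feature_selection import mutual_info_classif

      site_files = ["features_siteA.csv", "features_siteB.csv", "features_siteC.csv"]

      # Each CSV: one row per word segment, a 'segment_id' key, a 'label'
      # column (emotion class), and that site's feature columns.
      tables = [pd.read_csv(f).set_index("segment_id") for f in site_files]
      labels = tables[0]["label"]
      pooled = pd.concat([t.drop(columns=["label"], errors="ignore") for t in tables],
                         axis=1, join="inner")

      # Rank all pooled features by mutual information with the emotion labels
      # as one step towards a repertoire of optimal features.
      scores = mutual_info_classif(pooled.values, labels.loc[pooled.index])
      ranking = pd.Series(scores, index=pooled.columns).sort_values(ascending=False)
      print(ranking.head(20))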
  21. FERMUS: Fehlerrobuste Multimodale Sprachdialoge
    (“Error-Robust Multimodal Speech Dialogues”)
    Industry cooperation with BMW, DaimlerChrysler, Siemens, VDO Automotive
    Runtime: 01.03.2000 – 30.06.2003
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and BMW, DaimlerChrysler, Siemens, VDO
    The primary intention of the FERMUS project was to identify and evaluate various strategies for a dedicated analysis of potential error patterns during human-machine interaction with information and communication systems in upper-class cars. To reach this goal, we employed a broad set of advanced, mainly recognition-based input modalities, such as interfaces for natural speech and dynamic gesture input. In particular, emotional patterns of the driver were integrated to generate context-adequate dialog structures.
  22. SOMMIA: Speech-oriented Man-Machine-Interface in the Automobile
    Industry Cooperation with SiemensVDO Automotive
    Runtime: 01.06.2000 – 30.09.2001
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and SiemensVDO Automotive
    The SOMMIA project focused on the design and evaluation of an ergonomic and generic operation concept for a speech-based MMI integrated into a car MP3 player or comparable automotive applications. In addition, the system was subject to several economic and geometric boundary conditions: a two-line display with 16 characters per line and a small set of LEDs, and a speaker-independent full-word recognizer with an active vocabulary of 30 to 50 words. Nevertheless, the interface had to meet high requirements: its handling should be easy to learn, comfortable and, above all, intuitive and interactively explorable.
  23. ADVIA: Adaptive Dialogverfahren im Automobil
    (“Adaptive Dialogue Methods in the Automobile”)
    Industry cooperation with BMW
    Runtime: 01.07.1998 – 31.12.2000
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and BMW
    In modern luxury-class vehicles, and, owing to the steady decline in prices, already in mid-range vehicles, numerous electronic devices for increasing comfort coexist. Some examples are: air conditioning, navigation and telematics systems, audio components such as CD changers and radio, car phone, on-board computer, and much more. Since most of these components are computer-based in some way, many functions can be conceived and implemented. If each of these functions were given its own mechanical switch, the number of controls would be on the order of large concert organs or studio mixing consoles, only on the small surface of the vehicle cockpit. If, in addition, the devices are supplied by different manufacturers, each one looks different and follows a company-specific operating logic. The result: despite all the functionality, the flood of technology becomes unusable. Of course, vehicle manufacturers reacted long ago and developed multifunctional interfaces with a display and few controls, with the actual devices distributed spatially over the vehicle and communicating with the control unit via a bus system. However, the next problem then arises: how to design such a control unit, how to integrate speech, sound, and gestures, and how to make it easier for the driver to handle all this technology. This is where we come in. For our research, a specially equipped driving simulator is available that allows usability experiments with vehicle controls. Here, an MMI (Man-Machine Interface) is simulated by a computer. Through Wizard-of-Oz experiments, we can try out new techniques in a kind of rapid prototyping, in which a “wizard” observes the test subjects and operates the experiment computer in the background. The test subjects thus get the impression of actually triggering actions with a gesture or a sentence. If certain operating methods prove advantageous, they can actually be implemented, for example on the institute’s network. This approach may appear odd at first, but it has proven absolutely effective.
