Past Projects

  1. Affect.AI: Voice analysis for Randomised Controlled Trials
    MedTech Superconnector (MTSC) Accelerator Programme Pilot Project
    Runtime: 06.01.2020 – 30.04.2021
    Role: Principal Investigator
    Partners: Imperial College London
    The project develops voice analysis based on digital vocal biomarkers of depression, for use in randomised controlled trials in the context of depression.
  2. AUDEO: Audio-Based Country-of-Origin Recognition of Migrants (Audio-basierte Herkunftsland-Erkennung von Migranten)
    BMBF IKT2020-Grant (Forschungsprogramm Zivile Sicherheit – Anwender-innovativ: Forschung für die zivile Sicherheit)
    Runtime: 01.06.2019 – 31.05.2021
    Role: Beneficiary
    Partners: Bundespolizeipräsidium, Hochschule für Medien, Kommunikation und Wirtschaft GmbH, audEERING GmbH
    The aim of the project is to develop legally robust, accurate voice-analysis software for the simplified, objective, and real-time determination of the 10 most relevant countries of origin of persons in a migration context.
  3. HOL-DEEP-SENSE: Holistic Deep Modelling for User Recognition and Affective Social Behaviour Sensing
    EU Horizon 2020 Marie Skłodowska-Curie Actions Individual Fellowship (MSCA-IF 2017)
    Runtime: 01.10.2018 – 31.03.2021
    Role: Coauthor Proposal, Coordinator, Beneficiary, Supervisor
    Partners: University of Augsburg, Massachusetts Institute of Technology, Technische Universität München
    The “Holistic Deep Modelling for User Recognition and Affective Social Behaviour Sensing” (HOL-DEEP-SENSE) project aims at augmenting affective machines such as virtual assistants and social robots with human-like acumen based on holistic perception and understanding abilities. Social competencies comprising context awareness, salience detection and affective sensitivity present a central aspect of human communication, and thus are indispensable for enabling natural and spontaneous human-machine interaction. Therefore, with the aim to advance affective computing and social signal processing, we envision a “Social Intelligent Multi-modal Ontological Net” (SIMON) that builds on technologies at the leading edge of deep learning for pattern recognition. In particular, our approach is driven by multi-modal information fusion using end-to-end deep neural networks trained on large datasets, allowing SIMON to exploit combined auditory, visual and physiological analysis. In contrast to standard machine learning systems, SIMON makes use of task relatedness to adapt its topology within a novel construct of subdivided neural networks. Through deep affective feature transformation, SIMON is able to perform associative domain adaptation via transfer and multi-task learning, and thus can infer user characteristics and social cues in a holistic context. This new unified sensing architecture will enable affective computers to assimilate ontological human phenomena, leading to a step change in machine perception. This will offer a wide range of applications for health and wellbeing in future IoT-inspired environments, connected to dedicated sensors and consumer electronics.
By verifying the gains through holistic sensing, the project will show the true potential of the much sought-after emotionally and socially intelligent AI, and herald a new generation of machines with hitherto unseen skills to interact with humans via universal communication channels.
  4. Sentiment Analyse
    Industry Cooperation with BMW AG
    Runtime: 01.05.2018 – 30.04.2021
    Role: Principal Investigator
    Partners: University of Augsburg, BMW AG
    The project aims at real-time internet-scale sentiment analysis in unstructured multimodal data in the wild.
  5. An Embedded Soundscape System for Personalised Wellness via Multimodal Bio-Signal and Speech Monitoring – 7% acceptance rate in the call
    ZD.B Fellowship
    Runtime: 01.01.2018 – 31.12.2020
    Role: Supervisor, Co-Author Proposal
    Partners: University of Augsburg
    The main research aim is to explore how diverse multimodal data can inform the production of personalised embedded soundscapes, and how such digitally produced soundscapes can improve human wellness. As highlighted by ZD.B Digital Health / Medicine, digitisation in health care shows great potential. The proposed system could be effective in a variety of scenarios, including nervousness. Imagine the hours before an important presentation, as the presenter’s nerves are building. The presenter could use a smart-device application to provide a speech instance (whilst monitoring pulse). The application returns a (user-dependent) soundscape which clinically reduces the negative feeling. To explore this, the project will be divided into 3 phases (detailed in section 5), each a fundamental part of the development of such wellness systems. Questions will arise pertaining to both human audible and speech perception, with observations of current ‘norms’ in data science, contributing to the ethics involved in artificial intelligence.
  6. OPTAPEB: Optimierung der Psychotherapie durch Agentengeleitete Patientenzentrierte Emotionsbewältigung (Optimising Psychotherapy through Agent-Guided, Patient-Centred Emotion Coping) (#V5IKM010)
    BMBF IKT2020-Grant (Forschungsprogramm zur Mensch-Technik-Interaktion: Technik zum Menschen bringen – Interaktive körpernahe Medizintechnik)
    Runtime: 01.11.2017 – 31.10.2020
    Role: Beneficiary
    Partners: Universität Regensburg, Fraunhofer IIS, VTplus GmbH, Ambiotex GmbH, NTT GmbH, eHealthLabs, audEERING GmbH
    OPTAPEB aims to develop an immersive and interactive virtual reality system that assists users in overcoming phobias. The system will allow users to experience phobia-inducing situations and will record this emotional experience and the user’s behaviour. Various levels of emotional reaction will be monitored continuously and in real time by the system, which applies sensors based on innovative e-wear technology, speech signals, and other pervasive technologies (e.g. accelerometers). A further goal of the project is the development of a game-like algorithm to control the user’s experience of anxieties through exposure therapy and to adapt the course of the therapy to the user’s needs and the current situation automatically.

  7. ACLEW: Analyzing Child Language Experiences Around the World (HJ-253479) – 14 winning projects in total
    T-AP (Trans-Atlantic Platform for the Social Sciences and Humanities, along with Argentina (MINCyT), Canada (SSHRC, NSERC), Finland (AKA), France (ANR), the United Kingdom (ESRC/AHRC), and the United States (NEH)) Digging into Data Challenge, 4th round
    Runtime: 01.06.2017 – 31.05.2020
    Role: Principal Investigator, Co-Author Proposal
    Partners: Duke University, École Normale Supérieure, Aalto University, CONICET, Imperial College London, University of Manitoba, Carnegie Mellon University, University of Toronto
    An international collaboration among linguists and speech experts to study child language development across nations and cultures to gain a better understanding of how an infant’s environment affects subsequent language ability.

  8. Evolutionary Computing: The Changing Mind
    HiPEDS EPSRC, Imperial College, Industry integrated Centre for Doctoral Training (CDT)
    Runtime: 01.04.2017 – 31.03.2021
    Role: Supervisor
    Partners: Imperial College London
    This project aims to (1) innovate upon NeuroEvolution of Augmenting Topologies (NEAT), (2) permit function extraction for Transfer Learning, (3) find ways to merge evolutionary computation with broader systems and (4) deploy methods using the latest processing technology – “NeuroMorphic” computing chips.
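The NEAT innovation named in goal (1) revolves around structural mutations that grow a network's topology while tracking innovation numbers. A minimal sketch of the classic "add node" mutation is given below; the genome layout, field names, and counter are illustrative assumptions, not the project's actual code:

```python
import random

def add_node_mutation(genome, innovation_counter):
    """NEAT-style structural mutation (sketch): split a random enabled
    connection A->B into A->new and new->B. `genome` is a list of
    connection genes {'in', 'out', 'weight', 'enabled', 'innov'};
    this data layout is hypothetical, chosen for illustration."""
    conn = random.choice([g for g in genome if g['enabled']])
    conn['enabled'] = False                       # disable the split link
    new_node = max(max(g['in'], g['out']) for g in genome) + 1
    # incoming link gets weight 1.0, outgoing keeps the old weight,
    # so the network's behaviour is approximately preserved (as in NEAT)
    genome.append({'in': conn['in'], 'out': new_node, 'weight': 1.0,
                   'enabled': True, 'innov': next(innovation_counter)})
    genome.append({'in': new_node, 'out': conn['out'],
                   'weight': conn['weight'],
                   'enabled': True, 'innov': next(innovation_counter)})
    return new_node
```

The global innovation counter is what later allows NEAT to align genomes during crossover, which is also the hook goal (2) would use to extract reusable sub-functions for transfer learning.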
  9. ZAM: Zero-resource keyword recognition for Audio Mass-Data (Zero-Resource Schlagworterkennung bei Audio-Massendaten)
    Runtime: 01.12.2016 – 31.08.2017
    Role: Coauthor Proposal, Beneficiary, Principal Investigator
    Partners: University of Passau and others
    To process mass audio data captured by a range of diverse sensors, technical solutions within the field of keyword recognition shall be investigated. It shall be shown which approaches simplify, accelerate, and optimise audio analysis as well as optimise manual work processes. The major aim is thereby to significantly reduce the human workload by utmost automation, given the following focus: 1) limited to no resources (“zero resource”) for training, and 2) answering the question of how low audio quality can be while still allowing reasonable, highly automated processing.
  10. Deep Learning Speech Enhancement
    Industry Cooperation with HUAWEI TECHNOLOGIES
    Runtime: 12.11.2016 – 11.11.2018
    Role: Principal Investigator, Author Proposal
    Partners: University of Passau, University of Augsburg, HUAWEI TECHNOLOGIES
    The research target of this project is to develop state-of-the-art methods for speech enhancement based on deep learning. The aim is to overcome limitations in challenging scenarios that are posed by non-stationary noise and distant speech with a potentially moving device and potentially limited power and memory on the device. It will be studied how deep learning speech enhancement can successfully be applied to multi-channel input signals. Furthermore, an important aspect is robustness and adaptation to unseen conditions, such as different noise types.
  11. Deep Learning Methods for End-to-End Modeling of Multimodal Phenomena (#1806264)
    HiPEDS EPSRC / Imperial College, Industry integrated Centre for Doctoral Training
    Runtime: 01.10.2016 – 30.09.2020
    Role: Supervisor, Co-author Proposal
    Partners: Imperial College London
    Automatic affect recognition in real-world environments is an important task towards complete interaction between humans and machines. The main challenges that arise towards that goal are the uncontrolled conditions that exist in such environments, and the various modalities with which emotions can be expressed. Over the last 10 years, several advances have been made in determining emotional states with the use of Deep Neural Networks (DNNs). To this end, in this project we investigate developing methods and algorithms utilizing DNNs for the classification of audio-visual phenomena, including audio-visual speech recognition and audio-visual behavior understanding and subject characterization.
  12. EngageME: Automated Measurement of Engagement Level of Children with Autism Spectrum Conditions during Human-robot Interaction (#701236) – 14.4% acceptance rate in the call
    EU Horizon 2020 Marie Skłodowska-Curie Actions Individual Fellowship (MSCA-IF 2015)
    Runtime: 01.09.2016 – 31.08.2019
    Role: Coauthor Proposal, Coordinator, Beneficiary, Supervisor
    Partners: University of Augsburg, University of Passau, Massachusetts Institute of Technology
    Engaging children with ASC (Autism Spectrum Conditions) in communication-centred activities during educational therapy is one of the cardinal challenges posed by ASC and contributes to its poor outcome. To this end, therapists recently started using humanoid robots (e.g., NAO) as assistive tools. However, this technology lacks the ability to autonomously engage with children, which is key for improving the therapy and, thus, learning opportunities. Existing approaches typically use machine learning algorithms to estimate the engagement of children with ASC from their head pose or eye gaze inferred from face videos. These approaches are rather limited for modeling atypical behavioural displays of engagement of children with ASC, which can vary considerably across children. The first objective of EngageME is to develop novel machine learning models that can, for the first time, effectively leverage multi-modal behavioural cues, including facial expressions, head pose, vocal and physiological cues, to realize fully automated, context-sensitive estimation of engagement levels of children with ASC. These models build upon dynamic graph models for multi-modal ordinal data, based on state-of-the-art machine learning approaches to sequence classification and domain adaptation, which can adapt to each child, while still being able to generalize across children and cultures. To realize this, the second objective of EngageME is to provide the candidate with cutting-edge training aimed at expanding his current expertise in visual processing with expertise in wearable/physiological and audio technologies, from leading experts in these fields.
EngageME is expected to bring novel technology/models for endowing assistive robots with the ability to accurately ‘sense’ engagement levels of children with ASC during robot-assisted therapy, while providing the candidate with a set of skills needed to become one of the leading researchers in the emerging field of affect-sensitive assistive technology.
  13. DE-ENIGMA: Multi-Modal Human-Robot Interaction for Teaching and Expanding Social Imagination in Autistic Children (#688835) – 6.9% acceptance rate in the call
    EU Horizon 2020 Research & Innovation Action (RIA)
    Runtime: 01.02.2016 – 31.07.2019
    Role: Coauthor Proposal, Beneficiary, Principal Investigator, WP Leader
    Partners: University of Twente, Savez udruzenja Srbije za pomoc osobama sa autizmom, Autism-Europe, IDMIND, University College London, University of Augsburg, University of Passau, Institute of Mathematics Simion Stoilow of the Romanian Academy, Imperial College London
    Autism Spectrum Conditions (ASC, frequently defined as ASD — Autism Spectrum Disorders) are neurodevelopmental conditions, characterized by social communication difficulties and restricted and repetitive behaviour patterns. There are over 5 million people with autism in Europe – around 1 in every 100 people, affecting the lives of over 20 million people each day. Alongside their difficulties, individuals with ASC tend to have intact and sometimes superior abilities to comprehend and manipulate closed, rule-based, predictable systems, such as robot-based technology. Over the last couple of years, this has led to several attempts to teach emotion recognition and expression to individuals with ASC, using humanoid robots. This has been shown to be very effective as an integral part of the psychoeducational therapy for children with ASC. The main reason for this is that humanoid robots are perceived by children with autism as being more predictable, less complicated, less threatening, and more comfortable to communicate with than humans, with all their complex and frightening subtleties and nuances. The proposed project aims to create and evaluate the effectiveness of such a robot-based technology, aimed at children with ASC. This technology will make it possible to realise robust, context-sensitive (such as user- and culture-specific), multimodal (including facial, bodily, vocal and verbal cues) and naturalistic human-robot interaction (HRI) aimed at enhancing the social imagination skills of children with autism. The proposed work will include the design of effective and user-adaptable robot behaviours for the target user group, leading to more personalised and effective therapies than previously realised. Carers will be offered their own supportive environment, including professional information, reports of the child’s progress and use of the system, and forums for parents and therapists.
  14. U-STAR: Universal Speech Translation Advanced Research
    Academic Cooperation
    Runtime: 01.01.2016 – 30.09.2017
    Role: Consortial Partner
    Partners: University of Passau and 36 further partners – cf. homepage
    The Universal Speech Translation Advanced Research Consortium (U-STAR) is an international research collaboration entity formed to develop network-based speech-to-speech translation (S2ST) technology, with the aim of breaking language barriers around the world and implementing vocal communication between different languages.
  15. Promoting Early Diagnosis of Rett Syndrome through Speech-Language Pathology
    (Akustische Parameter als diagnostische Marker zur Früherkennung von Rett-Syndrom) (#16430)
    Österreichische Nationalbank (OeNB) Jubiläumsfonds
    Runtime: 01.11.2015 – 31.10.2019
    Role: Main Cooperation Partner
    Partners: Medical University of Graz, Karolinska Institutet, Boston Children’s Hospital and Harvard Medical School, University of Passau, Imperial College London, Victoria University of Wellington
  16. VocEmoApI: Voice Emotion detection by Appraisal Inference (#230331)
    EU Horizon 2020 ERC Proof of Concept Grant (PoC 2015) – 46% acceptance rate in the call
    Runtime: 01.11.2015 – 30.04.2017
    Role: Coauthor Proposal, Beneficiary, LEAR
    Partners: audEERING GmbH
    The automated sensing of human emotions has gained a lot of commercial attention lately. For facial and physiological sensing, many companies offer first professional products. Recently, voice analytics has become a hot topic, too, with first companies emerging for the telecom, entertainment, and robot markets (e.g. Sympalog, audEERING, Aldebaran, etc.). Current vocal emotion detection approaches rely on machine learning where emotions are identified based on a reference set of expression clips. The drawback of this method is the need to rely on a small set of basic, highly prototypical emotions. Real-life emotion detection application fields, such as clinical diagnosis, marketing research, media impact analysis, and forensics and security, require subtle differentiation of feeling states. VocEmoApI will develop a proof-of-concept software for vocal emotion detection based on a fundamentally different approach: Focusing on vocal nonverbal behavior and sophisticated acoustic voice analysis, it exploits the building blocks of emotional processes – a person’s appraisal of relevant events and situations which trigger action tendencies and expressions which constitute an emotional episode. Evidence for emotion-antecedent appraisals will be continuously tracked in running speech. The approach can infer not only basic emotion categories but also much finer distinctions such as subcategories of emotion families and subtle emotions. The development of VocEmoApI draws extensively from the results of the applicant’s Advanced Grant, providing a solid theoretical basis. Market analysis through marketing research partners will be conducted and the prototype software will be utilized to promote the technology and estimate a product value based on feedback from industry contacts. A massive impact of VocEmoApI on large markets such as household robotics, public security, clinical diagnosis and therapy, call analytics, and marketing research is to be expected.
  17. EmotAsS: Emotionsensitive Assistance System (#16SV7213)
    BMBF IKT2020-Grant (Sozial- und emotionssensitive Systeme für eine optimierte Mensch-Technik-Interaktion)
    Runtime: 01.06.2015 – 31.05.2018
    Role: Coauthor Proposal, Beneficiary, Principal Investigator
    Partners: University of Bremen, University of Augsburg, University of Passau, vacances Mobiler Sozial- und Pflegedienst GmbH, Martinshof (Werkstatt Bremen), Meier und Schütte GmbH und Co. KG.
    The aim of the project is to develop and investigate emotion recognition and its use in interaction processes in sheltered workshops for individuals with disabilities. A system is therefore to be developed which reliably recognises the emotions of people with disabilities during their everyday work routine and responds and reacts appropriately to them. The findings are to be transferred to further fields of application, and tested in particular for communication with dementia patients.
    (Original German description: Emotionen und deren Erkennung in der gesprochenen Sprache sind für die erfolgreiche Mensch-Technik- Interaktion wichtig, insbesondere bei Menschen mit Erkrankungen oder Behinderungen. Ziel des Projekts ist es, Emotionserkennung und deren Nutzung für Interaktionsprozesse in Werkstätten für behinderte Menschen zu entwickeln und zu untersuchen. Es soll daher ein System entwickelt werden, das sicher Emotionen bei Menschen mit Behinderungen in der Sprache erkennt und angemessen und unterstützend auf diese reagiert. Die Erkenntnisse sollen auf ein weiteres Anwendungsgebiet übertragen und in der Kommunikation mit Demenzerkrankten erprobt werden.)
  18. MixedEmotions: Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets (#644632)
    EU Horizon 2020 Innovation Action (IA) – 12.5% acceptance rate in the call
    Runtime: 01.04.2015 – 31.03.2017
    Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: NUI Galway, Univ. Polit. Madrid, University of Passau, Expert Systems, Paradigma Tecnológico, TU Brno, Sindice Ltd., Deutsche Welle, Phonexia SRO, Adoreboard, Millward Brown
    MixedEmotions will develop innovative multilingual multi-modal Big Data analytics applications that will analyze a more complete emotional profile of user behavior using data from mixed input channels: multilingual text data sources, A/V signal input (multilingual speech, audio, video), social media (social network, comments), and structured data. Commercial applications (implemented as pilot projects) will be in Social TV, Brand Reputation Management and Call Centre Operations. Making sense of accumulated user interaction from different data sources, modalities and languages is challenging and has not yet been fully explored in an industrial context. Commercial solutions exist, but they do not address the multilingual aspect in a robust and large-scale setting, do not scale up to the huge data volumes that need to be processed, and do not address the integration of emotion analysis observations across data sources and/or modalities on a meaningful level. MixedEmotions will implement an integrated Big Linked Data platform for emotion analysis across heterogeneous data sources, different languages and modalities, building on existing state of the art tools, services and approaches that will enable the tracking of emotional aspects of user interaction and feedback on an entity level. The MixedEmotions platform will provide an integrated solution for: large-scale emotion analysis and fusion on heterogeneous, multilingual, text, speech, video and social media data streams, leveraging open access and proprietary data sources, and exploiting social context by leveraging social network graphs; semantic-level emotion information aggregation and integration through robust extraction of social semantic knowledge graphs for emotion analysis along multidimensional clusters.
  19. SEWA: Automatic Sentiment Estimation in the Wild (#645094)
    EU Horizon 2020 Innovation Action (IA) – 9.3% acceptance rate in the call
    Runtime: 01.02.2015 – 31.07.2018
    Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: Imperial College London, University of Augsburg, University of Passau, PlayGen Ltd, RealEyes
    The main aim of SEWA is to deploy and capitalise on existing state-of-the-art methodologies, models and algorithms for machine analysis of facial, vocal and verbal behaviour, and then adjust and combine them to realise naturalistic human-centric human-computer interaction (HCI) and computer-mediated face-to-face interaction (FF-HCI). This will involve development of computer vision, speech processing and machine learning tools for automated understanding of human interactive behaviour in naturalistic contexts. The envisioned technology will be based on findings in cognitive sciences and it will represent a set of audio and visual spatiotemporal methods for automatic analysis of human spontaneous (as opposed to posed and exaggerated) patterns of behavioural cues including continuous and discrete analysis of sentiment, liking and empathy.
  20. ARIA-VALUSPA: Artificial Retrieval of Information Assistants – Virtual Agents with Linguistic Understanding, Social skills, and Personalised Aspects (#645378)
    EU Horizon 2020 Research & Innovation Action (RIA) – 9.3% acceptance rate in the call
    Runtime: 01.01.2015 – 31.12.2017
    Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: University of Nottingham, Imperial College London, CNRS, University of Augsburg, University of Twente, Cereproc Ltd, La Cantoche Production
    The ARIA-VALUSPA project will create a ground-breaking new framework that will allow easy creation of Artificial Retrieval of Information Assistants (ARIAs) that are capable of holding multi-modal social interactions in challenging and unexpected situations. The system can generate search queries and return the information requested by interacting with humans through virtual characters. These virtual humans will be able to sustain an interaction with a user for some time, and react appropriately to the user’s verbal and non-verbal behaviour when presenting the requested information and refining search results. Using audio and video signals as input, both verbal and non-verbal components of human communication are captured. Together with a rich and realistic emotive personality model, a sophisticated dialogue management system decides how to respond to a user’s input, be it a spoken sentence, a head nod, or a smile. The ARIA uses special speech synthesisers to create emotionally coloured speech and a fully expressive 3D face to create the chosen response. Back-channelling, indicating that the ARIA understood what the user meant, or returning a smile are but a few of the many ways in which it can employ emotionally coloured social signals to improve communication. As part of the project, the consortium will develop two specific implementations of ARIAs for two different industrial applications. A ‘speaking book’ application will create an ARIA with a rich personality capturing the essence of a novel, whom users can ask novel-related questions. An ‘artificial travel agent’ web-based ARIA will be developed to help users find their perfect holiday – something that is difficult to do with existing web interfaces such as those of tripadvisor.
  21. Automatic General Audio Signal Classification
    China Scholarship Council
    Runtime: 01.09.2014 – 31.08.2018
    Role: Supervisor
    Partners: TUM
  22. Speech Emotion Recognition using Nonlinear Dimensionality Reduction Methods
    China Scholarship Council
    Runtime: 01.08.2014 – 31.07.2016
    Role: Supervisor
    Partners: TUM
  23. In-car music recommendation system based on driver’s emotion
    TUM University Foundation Fellowship
    Runtime: 01.05.2014 – 30.04.2015
    Role: Supervisor
    Partners: TUM
  24. iHEARu: Intelligent systems’ Holistic Evolving Analysis of Real-life Universal speaker characteristics (#338164)
    FP7 ERC Starting Grant (StG) – 8.6% acceptance rate in the call (7% in Computer Science)
    Runtime: 01.01.2014 – 31.12.2018
    Role: Author Proposal, Principal Investigator and Grant Holder
    Partners: University of Augsburg, University of Passau, TUM
    Recently, automatic speech and speaker recognition has matured to the degree that it has entered the daily lives of thousands of Europe’s citizens, e.g., on their smart phones or in call services. During the next years, speech processing technology will move to a new level of social awareness to make interaction more intuitive, speech retrieval more efficient, and lend additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and further sectors. To reach this goal, rich speaker traits and states such as age, height, personality and physical and mental state, as carried by the tone of the voice and the spoken words, must be reliably identified by machines. In the iHEARu project, ground-breaking methodology including novel techniques for multi-task and semi-supervised learning will deliver for the first time intelligent holistic and evolving analysis in real-life conditions of universal speaker characteristics which have been considered only in isolation so far. Today’s sparseness of annotated realistic speech data will be overcome by large-scale speech and meta-data mining from public sources such as social media, crowd-sourcing for labelling and quality control, and shared semi-automatic annotation. All stages from pre-processing and feature extraction, to the statistical modelling will evolve in “life-long learning” according to new data, by utilising feedback, deep, and evolutionary learning methods. Human-in-the-loop system validation and novel perception studies will analyse the self-organising systems and the relation of automatic signal processing to human interpretation in a previously unseen variety of speaker classification tasks. The project’s work plan gives the unique opportunity to transfer current world-leading expertise in this field into a new de-facto standard of speaker characterisation methods and open-source tools ready for tomorrow’s challenge of socially aware speech analysis.
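The multi-task learning idea at the core of iHEARu – several speaker characteristics learned jointly through one shared representation – can be sketched in a few lines. The tiny network, the two proxy tasks, and all sizes below are illustrative assumptions, not the project's actual models:

```python
import numpy as np

def multitask_train(X, y_a, y_b, hidden=16, epochs=500, lr=0.1, seed=0):
    """Toy multi-task regressor: one shared tanh layer, two linear heads.

    Gradients from both task losses flow into the shared weights, so
    each task benefits from structure learned for the other. Returns
    (mse_before, mse_after) per task to show the joint improvement."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.5, (d, hidden))     # shared representation
    Wa = rng.normal(0.0, 0.5, (hidden, 1))    # head for task A
    Wb = rng.normal(0.0, 0.5, (hidden, 1))    # head for task B
    mse = lambda P, Y: float(np.mean((P - Y) ** 2))
    H = np.tanh(X @ W)
    before = (mse(H @ Wa, y_a), mse(H @ Wb, y_b))
    for _ in range(epochs):
        H = np.tanh(X @ W)
        ea, eb = H @ Wa - y_a, H @ Wb - y_b   # residuals per task
        dH = (ea @ Wa.T + eb @ Wb.T) * (1.0 - H ** 2)
        W -= lr * (X.T @ dH) / n              # shared update: both tasks
        Wa -= lr * (H.T @ ea) / n
        Wb -= lr * (H.T @ eb) / n
    H = np.tanh(X @ W)
    after = (mse(H @ Wa, y_a), mse(H @ Wb, y_b))
    return before, after
```

The same principle scales up to deep networks where dozens of speaker attributes share most of their layers and only split at task-specific output heads.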
  25. U-STAR: Universal Speech Translation Advanced Research
    Academic Cooperation
    Runtime: 01.06.2012 – 31.03.2013
    Role: Consortial Partner
    Partners: TUM and others.
    The Universal Speech Translation Advanced Research Consortium (U-STAR) is an international research collaboration entity formed to develop network-based speech-to-speech translation (S2ST) technology, with the aim of breaking language barriers around the world and implementing vocal communication between different languages.
  26. ASC-INCLUSION: Integrated Internet-Based Environment for Social Inclusion of Children with Autism Spectrum Conditions (#289021)
    EU FP7 Specific Targeted Research Project (STREP)
    Runtime: 01.11.2011 – 31.12.2014
    Role: Coordinator, Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
    Partners: University of Cambridge, Bar Ilan University, Compedia, University of Genoa, Karolinska Institutet, Autism Europe, TUM, Koc University, Spectrum ASC-Med
    Autism Spectrum Conditions (ASC, frequently defined as ASD – Autism Spectrum Disorders) are neurodevelopmental conditions, characterized by social communication difficulties and restricted and repetitive behaviour patterns. Current studies suggest 1% of the population might fit an ASC diagnosis. Alongside their difficulties, individuals with ASC tend to have intact and sometimes superior abilities to comprehend and manipulate closed, rule-based, predictable systems, such as computerized environments. Their affinity for the computerized environment has led to several attempts to teach emotion recognition and expression, and social problem solving to individuals with ASC, using computer-based training.
    In the last decade, web applications have been increasingly used for social interaction, forming online communities and social networks. Anecdotal reports of the emergence of online autistic communities, and the use of forums and virtual-worlds, show the great promise the internet holds for better inclusion and social skills training for users/people with ASC. Since intervention into ASC has been shown to be more effective when provided early in life, using the internet as a platform for the support of younger individuals with ASC could significantly promote their social inclusion.
    The project aims to create and evaluate the effectiveness of such an internet-based platform, directed at children with ASC (and other groups like ADHD and socially-neglected children) and those interested in their inclusion. This platform will combine several state-of-the-art technologies in one comprehensive virtual world, including analysis of users’ gestures, facial and vocal expressions using a standard microphone and webcam, training through games, text communication with peers and smart agents, animation, video and audio clips. The user’s environment will be personalised according to individual profile and sensory requirements, as well as motivation. Carers will be offered their own supportive environment, including professional information, reports of the child’s progress and use of the system, and forums for parents and therapists.
  27. Semi-Supervised Learning in the Analysis of Continuous Speaker Emotion and Personality
    China Scholarship Council
    Runtime: 01.08.2011 – 31.07.2015
    Role: Supervisor
    Partners: TUM
  28. Highly Robust Interest and Emotion Recognition from Speech
    China Scholarship Council
    Runtime: 01.08.2011 – 30.09.2012
    Role: Supervisor
    Partners: TUM
  29. Kontextsensitive automatische Erkennung spontaner Sprache mit BLSTM-Netzwerken (#SCHU2508-4/1)
    (“Context-Sensitive Automatic Recognition of Spontaneous Speech by BLSTM Networks”)
    DFG (German Research Foundation) Project
    Runtime: 01.03.2011 – 28.02.2014
    Role: Principal Investigator, Author Proposal
    Partners: TUM
    Despite numerous advances in automatic speech recognition, the accuracy and robustness of today’s speech recognition systems are insufficient to serve as a basis for natural, spontaneous human-machine interaction. The goal of this research project is therefore to improve the accuracy of systems for recognizing natural, fluent speech by means of novel pattern recognition methods. Since the efficiency of human speech recognition rests above all on the intelligent exploitation of long-range context information, approaches that incorporate context at the feature level are pursued. Starting from so-called tandem speech recognizers, in which neural networks for phoneme prediction are combined with dynamic classifiers, bidirectional Long Short-Term Memory (BLSTM) networks are employed. In contrast to the phoneme estimators currently used in tandem systems, the BLSTM principle allows an optimal amount of context information to be taken into account during prediction. Since recent successes in context-sensitive phoneme recognition and keyword spotting underline the effectiveness of the BLSTM approach, a corresponding advancement of continuous speech recognition systems is highly promising.
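    The tandem front-end described above can be sketched as follows: a BLSTM maps acoustic feature frames to per-frame phoneme posteriors, which a dynamic classifier (e.g. an HMM decoder) would then consume. This is an illustrative PyTorch sketch, not project code; all dimensions and names are chosen for the example.

```python
# Illustrative BLSTM phoneme estimator for a tandem ASR front-end.
# Dimensions (39 MFCC-like features, 40 phoneme classes) are assumptions.
import torch
import torch.nn as nn

class BLSTMPhonemeEstimator(nn.Module):
    def __init__(self, n_features=39, n_hidden=64, n_phonemes=40):
        super().__init__()
        # Bidirectional LSTM: each frame sees both past and future context
        self.blstm = nn.LSTM(n_features, n_hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * n_hidden, n_phonemes)

    def forward(self, x):                    # x: (batch, frames, features)
        h, _ = self.blstm(x)                 # (batch, frames, 2 * hidden)
        return self.out(h).log_softmax(-1)   # per-frame phoneme log-posteriors

model = BLSTMPhonemeEstimator()
frames = torch.randn(1, 100, 39)             # 100 frames of acoustic features
posteriors = model(frames)
print(posteriors.shape)                      # torch.Size([1, 100, 40])
```

    The per-frame posteriors (or features derived from them) would then be fed to the dynamic classifier, which is what distinguishes the tandem approach from end-to-end decoding.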
  30. GLASS: Generic Live Audio Source Separation
    Industry Cooperation with HUAWEI TECHNOLOGIES within the HUAWEI Innovative Research Program (HIRP)
    Runtime: 01.01.2011 – 31.12.2013
    Role: Principal Investigator, Author Proposal
    Partners: TUM and HUAWEI
    GLASS explores new ways of separating audio sources, e.g., for crystal-clear speech communication, using machine intelligence and advanced separation algorithms.
  31. Novel Approaches for Large Vocabulary Continuous Speech Recognition
    China Scholarship Council
    Runtime: 01.08.2010 – 31.07.2014
    Role: Supervisor
    Partners: TUM
  32. Nichtnegative Matrix-Faktorisierung zur störrobusten Merkmalsextraktion in der Sprachverarbeitung (#SCHU2508-2/1)
    (“Non-Negative Matrix Factorization for Robust Feature Extraction in Speech Processing”)
    DFG (German Research Foundation) Project
    Runtime: 01.06.2010 – 31.05.2013
    Role: Principal Investigator, Author Proposal
    Partners: TUM
    The main goal of this research project is to make the recognition of speech and music signals more robust to noise. Its distinguishing feature is the integration of features based on Non-Negative Matrix Factorization (NMF). NMF, a technique for data reduction, has recently gained popularity in signal processing. Typically, a spectrogram is decomposed into two factors: the first contains a spectral ‘basis’ of the signal, the second the activations of the basis vectors over time. In this project, features are derived from the second factor that can complement existing architectures for speech and music processing. Initial experiments on NMF feature extraction for noise-robust recognition of spoken letter sequences in the car proved significantly superior to conventional methods and highly promising. Within the project, this method is to be improved by advancing the NMF itself and, in particular, prepared for use in real-time speech recognition systems, including for fluent speech. Finally, the described NMF features are to be applied in further fields such as emotion recognition, the detection of non-linguistic vocalizations such as laughter or coughing in speech, and chord recognition, with the goal of increasing current recognition accuracy and noise robustness.
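    The decomposition described above can be sketched in a few lines: a magnitude spectrogram V is factorized as V ≈ W · H, where W holds the spectral basis vectors and H their activations over time, and per-frame features are read off the activation matrix H. This minimal sketch uses scikit-learn on random toy data; all dimensions are illustrative, not from the project.

```python
# Sketch of NMF-based feature extraction from a spectrogram: V ≈ W · H.
# Features are taken from the activation matrix H, one vector per frame.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy non-negative "spectrogram": 64 frequency bins x 100 time frames
V = np.abs(rng.standard_normal((64, 100)))

model = NMF(n_components=8, init="random", random_state=0, max_iter=500)
W = model.fit_transform(V)   # 64 x 8: spectral basis vectors
H = model.components_        # 8 x 100: activations over time

# Per-frame NMF features: one 8-dimensional activation vector per frame,
# ready to complement e.g. MFCCs in a recognition front-end
features = H.T
print(features.shape)        # (100, 8)
```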
  33. TCVC: Talking Car and Virtual Companion
    Industry Cooperation with Continental Automotive GmbH
    Runtime: 01.06.2008 – 30.11.2008
    Role: Principal Investigator, Author Proposal
    Partners: TUM and Continental Automotive GmbH
    TCVC provides expertise on emotion in the car, covering a requirements analysis, potential and near-future use cases, a technology assessment, and a user acceptance study.
  34. ICRI: In-Car Real Internet
    Industry Cooperation with Continental Automotive GmbH
    Runtime: 01.06.2008 – 30.11.2008
    Role: Principal Investigator, Author Proposal
    Partners: TUM and Continental Automotive GmbH
    ICRI aims at benchmarking internet browsers on embedded platforms as well as developing an integrated multimodal demonstrator for internet in the car. The investigated modalities include handwriting, touch gestures, and natural speech alongside conventional GUI interaction. The focus lies on MMI development with an embedded realisation.
  35. PROPEREMO: Production and Perception of Emotions: An affective sciences approach (#230331)
    FP7 ERC Advanced Grant
    Runtime: 01.03.2008 – 28.02.2015
    Role: Participant
    Partners: University of Geneva (PI Klaus Scherer), TUM, Free University of Berlin
    Emotion is a prime example of the complexity of human mind and behaviour, a psychobiological mechanism shaped by language and culture, which has puzzled scholars in the humanities and social sciences over the centuries. In an effort to reconcile conflicting theoretical traditions, we advocate a componential approach which treats event appraisal, motivational shifts, physiological responses, motor expression, and subjective feeling as dynamically interrelated and integrated components during emotion episodes. Using a prediction-generating theoretical model, we will address both production (elicitation and reaction patterns) and perception (observer inference of emotion from expressive cues). Key issues are the cognitive architecture and mental chronometry of appraisal, neurophysiological structures of relevance and valence detection, the emergence of conscious feelings due to the synchronization of brain/body systems, the generating mechanism for motor expression, the dimensionality of affective space, and the role of embodiment and empathy in perceiving and interpreting emotional expressions. Using multiple paradigms in laboratory, game, simulation, virtual reality, and field settings, we will critically test theory-driven hypotheses by examining brain structures and circuits (via neuroimagery), behaviour (via monitoring decisions and actions), psychophysiological responses (via electrographic recording), facial, vocal, and bodily expressions (via micro-coding and image processing), and conscious feeling (via advanced self-report procedures). In this endeavour, we benefit from extensive research experience, access to outstanding infrastructure, advanced analysis and synthesis methods, validated experimental paradigms as well as, most importantly, from the joint competence of an interdisciplinary affective science group involving philosophers, linguists, psychologists, neuroscientists, behavioural economists, anthropologists, and computer scientists.
  36. SEMAINE: Sustained Emotionally coloured Machine-human Interaction using Nonverbal Expression (#211486)

    Runtime: 01.01.2008 – 31.12.2010
    Role: Principal Investigator, Coauthor Proposal (highest ranked in the call), Project Steering Board Member, Workpackage Leader
    Partners: DFKI, Queen’s University Belfast (QUB), Imperial College of Science, Technology and Medicine London, University of Twente, University Paris VIII, CNRS-ENST, TUM
    SEMAINE deals with real-time, robust, non-verbally competent conversations between a conversational agent and a human user.
  37. cUSER 2
    Industry cooperation with Toyota
    Runtime: 08.01.2007 – 31.07.2007
    Role: Principal Investigator, Coauthor Proposal
    Partners: TUM and Toyota
    The aim of the cUSER follow-up project is to establish a system that interprets human interest by combined speech and facial expression analysis based on multiple input analyses. Besides aiming at the highest possible accuracy through subject adaptation, class balancing strategies, and fully automatic segmentation by individual audio and video stream analysis, cUSER 2 focuses on real-time application through cost-sensitive feature space optimization, utilization of graphics processing power, and high-performance programming methods. Furthermore, feasibility and real recognition scenarios will be evaluated.
  38. cUSER
    Industry cooperation with Toyota
    Runtime: 01.08.2005 – 30.09.2006
    Role: Coauthor Proposal, Project Steering, Senior Researcher
    Partners: TUM and Toyota
    The aim of this project was an audiovisual approach to the recognition of spontaneous human interest. For a maximally robust estimate, information from four sources was combined for the first time (as reported in the literature) by synergistic, individually failure-tolerant early feature fusion: firstly, speech is analyzed with respect to acoustic properties based on a high-dimensional prosodic, articulatory, and voice quality feature space, plus linguistic analysis of the spoken content by ASR and bag-of-words vector space modeling. Secondly, visual analysis provides patterns of facial expression by Active Appearance Models and of movement activity by eye tracking. Experiments were carried out on a video database of 10.5 h of spontaneous human-to-human conversation collected throughout the project. It contains 20+ subjects, balanced by gender and age class. Recordings were made under diverse comfort and noise conditions. Multiple levels of interest were annotated within a rich transcription. Experiments aimed at person-independent, robust real-life usage and showed the high potential of such a multimodal approach. Benchmark results were further provided based on transcription versus fully automatic processing.
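    The idea of early feature fusion that tolerates the failure of individual streams can be sketched as follows: per-stream feature vectors are concatenated into one joint vector, and a failed stream is replaced by zeros plus a validity flag so a downstream classifier can learn to discount it. Stream names, dimensions, and the flag mechanism are illustrative assumptions, not the project’s actual implementation.

```python
# Minimal sketch of failure-tolerant early feature fusion over four
# streams (acoustic, linguistic, facial, eye activity). All names and
# dimensions are illustrative.
import numpy as np

def fuse_streams(streams: dict, dims: dict) -> np.ndarray:
    """Concatenate per-stream feature vectors; pad failed streams."""
    parts = []
    for name, dim in dims.items():
        vec = streams.get(name)
        if vec is None:                        # stream failed for this frame
            parts.append(np.zeros(dim + 1))    # zeros + validity flag 0
        else:
            parts.append(np.append(vec, 1.0))  # features + validity flag 1
    return np.concatenate(parts)

dims = {"acoustic": 4, "linguistic": 3, "face": 2, "eye": 2}
# Example frame in which the facial stream has dropped out:
sample = {"acoustic": np.ones(4), "linguistic": np.ones(3),
          "face": None, "eye": np.ones(2)}
fused = fuse_streams(sample, dims)
print(fused.shape)  # (15,): 5 + 4 + 3 + 3 fused dimensions
```

    The fused vector then feeds a single classifier, which is what distinguishes early fusion from late (decision-level) fusion of per-stream classifiers.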
  39. CEICES: Combining Efforts for Improving automatic Classification of Emotional user States
    Research Initiative within the EU FP6 Network of Excellence (NoE) HUMAINE
    Runtime: 2005 – 2008
    Role: Invited Research Expert
    Partners: Friedrich Alexander Universität Erlangen-Nürnberg (FAU), Fondazione Bruno Kessler (FBK, formerly ITC-IRST), Universität Karlsruhe (UKA), Universität Augsburg (UA), LIMSI-CNRS, Tel Aviv University (TAU), Tel Aviv Academic College of Engineering (AFEKA), TUM
    CEICES is a co-operation between several sites dealing with the classification of emotional user states conveyed via speech; this initiative was taken within the European Network of Excellence HUMAINE under the name CEICES (Combining Efforts for Improving automatic Classification of Emotional user States). The database used within CEICES is a German corpus with recordings of 51 ten- to thirteen-year-old children communicating with Sony’s AIBO pet robot. Conceptualization, design, and recordings were done at the originator site FAU. The approach followed within CEICES was this: the originator site provided the speech files, a phonetic lexicon, manually corrected word segmentation and F0 values, emotion labels, the definition of train and test samples, etc. The data was annotated at the word level. All partners committed themselves to share with all the other partners their extracted feature values together with the necessary information (which feature models which acoustic or linguistic phenomenon, the format of feature values, the classifier used, etc.). Thus, each site could assess the features provided by all other sites, together with their own features, aiming at a repertoire of optimal features.
  40. SOMMIA: Speech-oriented Man-Machine-Interface in the Automobile
    Industry Cooperation with SiemensVDO Automotive
    Runtime: 01.06.2000 – 30.09.2001
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and SiemensVDO Automotive
    The SOMMIA project focused on the design and evaluation of an ergonomic and generic operation concept for a speech-based MMI integrated in a car MP3 player or comparable automotive applications. In addition, the system was subject to several economic and geometrical boundary conditions: a two-line, 16-character display with a small set of LEDs and a speaker-independent full-word recognizer with 30 to 50 words of active vocabulary. Nevertheless, the interface had to meet high technical requirements: its handling should be easy to learn, comfortable, and, above all, intuitive and interactively explorable.
  41. FERMUS: Fehlerrobuste Multimodale Sprachdialoge
    Industry cooperation with BMW, DaimlerChrysler, Siemens, VDO Automotive
    Runtime: 01.03.2000 – 30.06.2003
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and BMW, DaimlerChrysler, Siemens, VDO
    The primary intention of the FERMUS project was to identify and evaluate various strategies for a dedicated analysis of potential error patterns during human-machine interaction with information and communication systems in upper-class cars. To reach this goal, we employed a broad set of advanced, mainly recognition-based input modalities, such as interfaces for natural speech and dynamic gesture input. In particular, emotional patterns of the driver were integrated for generating context-adequate dialog structures.
  42. ADVIA: Adaptive Dialogverfahren im Automobil
    Industry cooperation with BMW
    Runtime: 01.07.1998 – 31.12.2000
    Role: Coauthor Proposal, Project Steering, Researcher
    Partners: TUM and BMW
    In modern luxury-class vehicles, and due to steadily falling prices already in mid-class vehicles, numerous electronic devices coexist to increase comfort. Some examples: air conditioning, navigation and telematics systems, audio components such as CD changers and radio, car phone, on-board computer, and much more. Since most of these components are computer-based in some way, many functions can be conceived and implemented. If each of these functions were given its own mechanical switch, the number of controls would be on the order of a large concert organ or a studio mixing console, but on the small surface of the vehicle cockpit. If, moreover, the devices are supplied by different manufacturers, each looks different and follows a vendor-specific operating logic. The bottom line: despite all the functionality, the flood of technology becomes unusable. Of course, vehicle manufacturers reacted long ago and developed multifunctional interfaces with a display and few controls, with the actual devices distributed throughout the vehicle and communicating with the control unit via a bus system. However, this raises the next problem: how should such a control unit be designed, how can speech, acoustics, and gestures be integrated, and how can the driver’s handling of all this technology be eased? This is where we come in. For our research, a specially equipped driving simulator is available that allows usability experiments with vehicle controls. Here, an MMI (man-machine interface) is simulated by a computer. Through Wizard-of-Oz experiments, we can try out new techniques in a kind of rapid prototyping: a “wizard” observes the test subjects and operates the experiment computer in the background.
The subjects thus get the impression that a gesture or a spoken sentence actually triggers actions. If certain operating methods prove advantageous, they can then actually be implemented, for example on the chair’s network. This procedure may seem odd at first, but it has proven absolutely effective.

