Past Projects

Noise Embeddings with a Hearing-Aid-Tailored Deep Learning Noise Suppression Framework
Industry Cooperation with Sivantos GmbH
Runtime: 01.07.2023 – 30.06.2024
Role: Principal Investigator, Co-author Proposal
Partner: Sivantos GmbH, University of Augsburg
The overall goal of this project is to develop a noise suppression framework for hearing aids that can be extended by so-called “embeddings”, allowing the noise reduction behavior to be modified without re-training the overall system. In doing so, typical hearing aid requirements, such as the preservation of the desired speech, the overall delay of the system, and certain aspects of flexible parameterization (e.g., with respect to the amount of noise reduction), should be considered.
Machine Learning for Camera Data with Incomplete Annotation
(“Machine Learning für Kameradaten mit unvollständiger Annotation”)
Industry Cooperation with BMW AG
Runtime: 01.01.2022 – 31.12.2023
Role: Principal Investigator
Partners: University of Augsburg, BMW AG
The project aims at self-supervised and reinforcement learning for the analysis of camera data with incomplete annotation.
EASIER: Intelligent Automatic Sign Language Translation (#101016982)
EU Horizon 2020 Research & Innovation Action (RIA)
Runtime: 01.01.2021 – 31.12.2023
Role: Principal Investigator
Partners: Martel GmbH, Athena Research & Innovation Center in Information Communication & Knowledge Technologies, Universität Hamburg, Radboud University, University of Surrey, University of Zurich, CNRS, DFKI, audEERING GmbH, nuromedia GmbH, Swiss TXT AG, European Union of the Deaf iVZW, SCOP Interpretis, University College London
EASIER aims to create a framework for barrier-free communication between deaf and hearing citizens across the EU by enabling users of European sign languages (SLs) to interact with hearing individuals in their preferred language, through the incorporation of state-of-the-art neural machine translation (NMT) technology capable of handling a wide range of languages and communication scenarios. To this end, it exploits a robust data-driven SL (video) recognition engine and uses a signing avatar engine that not only produces signing that is easy for the deaf community to comprehend but also integrates information on affective expressions and coherent prosody. The envisaged ecosystem will incorporate a robust translation service surrounded by numerous tools and services that support the equal participation of deaf individuals in the whole range of everyday-life activities within an inclusive community. It will also accelerate the incorporation of less-resourced SLs into SL technologies while leveraging the SL content creation industry. The deaf community is heavily involved in all project processes, and deaf researchers are among the staff members of all SL expert partners.
MARVEL: Multimodal Extreme Scale Data Analytics for Smart Cities Environments (#957337)
EU Horizon 2020 Research & Innovation Action (RIA)
Runtime: 01.01.2021 – 31.12.2023
Role: Principal Investigator
Partners: Idryma Technologies, Infineon, Aarhus University, Atos Spain, Consiglio Nazionale delle Ricerche, Intrasoft, FBK, audEERING GmbH, Tampereen Korkeakoulusaatio, Privanova, Sphynx Technology Solutions, Comune di Trento, Univerzitet u Novom Sadu Fakultet Tehnickih Nauka, Information Technology for Market Leadership, Greenroads Limited, Zelus Ike, Instytut Chemii Bioorganicznej Polskiej Akademii Nauk
The “Smart City” paradigm aims to support new forms of monitoring and managing resources and to provide situational awareness in decision-making, with the objective of serving citizens while meeting the needs of present and future generations with respect to economic, social, and environmental aspects. A city can be considered a complex and dynamic system of interconnected spatial, social, economic, and physical processes that change over time and are continually modified by human actions. Big Data, fog, and edge computing technologies have significant potential in various scenarios, given each city's individual tactical strategy. However, one critical aspect is to encapsulate the complexity of a city and to support accurate, cross-scale, and timely predictions based on ubiquitous spatio-temporal data of high volume, high velocity, and high variety. To address this challenge, MARVEL delivers a disruptive Edge-to-Fog-to-Cloud (E2F2C) ubiquitous computing framework that enables multimodal perception and intelligence for audio-visual scene recognition and event detection in a smart city environment. MARVEL aims to collect, analyse, and mine multimodal audio-visual data streams of a Smart City and to help decision makers improve the quality of life and services for citizens without violating ethical and privacy limits, in an AI-responsible manner. This is achieved by (i) fusing large-scale distributed multimodal audio-visual data in real time; (ii) achieving fast time-to-insights; (iii) supporting automated decision-making at all levels of the E2F2C stack; and (iv) delivering a personalized federated learning approach, in which joint multimodal representations and models are co-designed and continuously improved through privacy-aware sharing of the personalized fog and edge models of all interested parties.
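MAIKI: Mobile Everyday Therapy Assistant with Interaction-Focused Artificial Intelligence for Depression
(“Mobiler Alltagstherapieassistent mit interaktionsfokussierter künstlicher Intelligenz bei Depression”)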
BMBF
Runtime: 01.10.2021 – 31.12.2021
Role: Principal Investigator, Co-author Proposal
Partners: FlyingHealth Incubator GmbH, GET.ON Institut für Online Gesundheitstrainings GmbH, University of Augsburg
This project aims at a mobile digital assistant that supports patients interactively, intelligently, and individually in implementing their therapy more effectively in everyday life. To this end, artificial intelligence methods (interaction analysis with voice analysis and natural language processing, artificial empathy, machine learning) are researched and developed with the aim of optimizing patient-assistant interaction, tailoring therapeutic interventions to the individual case, and supporting their implementation interactively in an intelligent and, at the same time, personalized way. This digital mobile therapy companion goes beyond the current state of the art because it a) maintains continuous, treatment-relevant communication with the affected person (something face-to-face psychotherapy has so far been unable to do) and b) continuously adapts its recommendations to the person's current experience, behavior, and therapy progress to date. Based on this digital individualization of therapy, the recovery process is to be accelerated and relapse rates reduced.
Improving asthma care through personalised risk assessment and support from a conversational agent (#EP/W002477/1)
EPSRC UK Research and Innovation fEC Grants
Runtime: 01.09.2021 – 31.08.2023
Role: Principal Investigator, Co-author Proposal
Partners: Imperial College London
Over 5.4 million people have asthma in the UK, and despite £1 billion a year in NHS spending on asthma treatment, the national asthma mortality rate is the highest in Europe. One reason for this statistic is that many people with asthma dramatically underestimate their risk. This leads to neglect of early care, poor control, and, eventually, hospitalisation. Therefore, improving accurate risk assessment and reducing risk through relevant behaviour change among people with asthma could save lives and dramatically reduce health care costs. We aim to address this early-care gap by investigating a new type of low-cost, scalable, personalised risk assessment, combined with automated follow-up support for risk reduction. The technology will leverage artificial intelligence to calculate a personalised asthma risk score based on voice features and self-reported data. It will then provide personalised advice on actions that can be taken to lower risk, followed by customised conversational guidance to support the process of healthy change. We envision that our work will ultimately lead to a safe and engaging system in which patients can see their current risk of an asthma attack after answering a series of questions, akin to clinical history taking, and recording their voice. They then receive ongoing customised support from an automated coach on how to reduce that risk. Any progress they make will visibly lower their risk (presented, for example, as “Strengthening their shield”), making their state of asthma control more tangible and motivating.
The technology will be developed collaboratively with direct involvement from people with asthma and clinicians through co-design methods and regular feedback in order to ensure that risk assessment, feedback, and guidance are clinically sound and delivered in a way that is autonomy-supportive, clear, useful, and engaging to patients.
Leader Humor: A Multimodal Approach to Humor Recognition and an Analysis of the Influence of Leader Humor on Team Performance in Major European Soccer Leagues (#SCHU2508/12-1)
(“Ein multimodaler Ansatz zur Erkennung und Messung von Humor und eine Analyse des Einflusses des Humors von Führungskräften auf die Teamleistung in europäischen Profifußball-Ligen”)
DFG (German Research Foundation) Project
Runtime: 01.09.2021 – 31.08.2023
Role: Principal Investigator, Co-author Proposal
Partners: University of Passau, University of Augsburg
In this project, scholars active in the fields of management and computerized psychometry take the unique opportunity to combine their respective perspectives and complementary capabilities to address the overarching question: “How, why, and under which circumstances does leader humor affect team processes and team performance, and how can (leader) humor be measured on a large scale by applying automatic multimodal recognition approaches?” Trait humor, one of the most fundamental and complex phenomena in social psychology, has garnered increasing attention in management research. However, scholarly understanding of humor in organizations is still substantially limited, largely because research in this domain has primarily been qualitative, survey-based, and small-scale. Notably, recent advances in computerized psychometry promise unique tools that deliver unobtrusive, multifaceted, ad hoc measures of humor free from the substantial limitations associated with traditional humor measures. Computerized psychometry scholars have long noted that a computerized understanding of humor is essential for the humanization of artificial intelligence. Yet they have struggled to automatically identify, categorize, and reproduce humor. In particular, computerized approaches have suffered not only from a lack of theoretical foundations but also from a lack of complex, annotated, real-life data sets and of multimodal measures that consider the multifaceted, contextual nature of humor. We combine our areas of expertise to address these research gaps and the complementary needs of our fields. Specifically, we substantially advance computerized measures of humor and provide a unique view into the contextualized implications of leader humor, drawing on the empirical context of professional soccer.
Despite initial attempts to join computerized psychometry and management research, these two fields have not yet been successfully combined to address our overall research question. We aspire to fill this void as equal partners, united by our keen interest in humor, computerized psychometry, leader rhetoric, social evaluations, and team performance.
Affect.AI: Voice analysis for Randomised Controlled Trials
MedTech Superconnector (MTSC) Accelerator Programme Pilot Project
Runtime: 06.01.2020 – 30.04.2021
Role: Principal Investigator
Partners: Imperial College London
The project deals with voice analysis based on vocal digital biomarkers of depression for use in randomised controlled trials on depression.
Improving the specificity of affective computing via multimodal analysis
ARC Discovery Project (22% Acceptance Rate in 2nd Round of Call)
Runtime: 01.01.2020 – 31.12.2023
Role: Principal Investigator, Co-author Proposal
Partners: University of Canberra, University of Pittsburgh, CMU, Imperial College London
Computational models and approaches that can sense and understand a person's emotion or mood are a core component of affective computing. While much research over the last two decades has addressed the question of sensitivity, i.e., the correct recognition of affect classes, the equally important issue of specificity, i.e., the correct recognition of true negatives, has been neglected. This highly interdisciplinary project aims to address this issue and to solve the fundamental affective computing problem of developing robust, non-invasive, multimodal approaches for accurately sensing a person's affective state. Of course, neither sensitivity nor specificity should be seen in isolation. The underlying issue is one of conceptualising affective states as areas within a continuous space, determining affect intensity on a continuous scale, and being able to analyse very subtle expressions of affect.
ParaStiChaD: Paralinguistic Speech Characteristics in Major Depressive Disorder (#SCHU2508/8-1)
(“Paralinguistische Stimmmerkmale in Major Depression”)
DFG (German Research Foundation) Project
Runtime: 01.01.2020 – 31.12.2022
Role: Principal Investigator, Co-author Proposal
Partners: FAU Erlangen-Nuremberg, University of Augsburg, Rheinische Fachhochschule Köln
More needs to be done to improve the validity of current methods to detect depression, to improve the validity of ways to predict the future course of depression and to enhance the efficacy and availability of evidence-based treatments for depression. The work proposed in Paralinguistic Speech Characteristics In Major Depressive Disorder (ParaSpeChaD) aims to address these needs by clarifying the extent to which Paralinguistic Speech Characteristics (PSCs; i.e. the vocal phenomena that occur alongside the linguistic information in speech) can be used to detect depression and predict its future course and how recent progress in mobile sensor technology can be used to improve the detection, prediction and potentially even the treatment of depression.
HUAWEI Joint Lab: Human-centered Empathetic Interaction
HUAWEI Joint Lab
Runtime: 01.01.2020 – 31.12.2022
Role: Lab Leader
Partners: HUAWEI, University of Augsburg
The Huawei-University of Augsburg Joint Lab aims to bring together Affective Computing and Human-Centered Intelligence for human-centered empathic interaction.
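KIrun: Using Artificial Intelligence in Running Analytics with Audio Analysis and Evaluation for Motivation, Performance Enhancement, and Injury Prevention (FKZ: 16KN069402)
(“Einsatz Künstlicher Intelligenz in der Laufsportanalytik mit Audioanalyse/ -auswertung zur Motivation, Leistungssteigerung und Verletzungsprävention”)
BMWi Central Innovation Programme for SMEs (Zentrales Innovationsprogramm Mittelstand, ZIM) Project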
Runtime: 01.12.2019 – 31.08.2022
Role: Principal Investigator, Co-author Proposal
Partners: Universitätsklinikum Tübingen, HB Technologies AG (HBT), University of Augsburg
The cooperative project KIRun pursues the development of a measurement system and a self-learning algorithm that autonomously determines well-being and exertion on the basis of auditory, biomechanical, and physiological measurement data. The innovative core lies in determining well-being and exertion from objective measurement data: in this system, audio signals (e.g., breathing sounds) are not used for voice control but are captured continuously in order to draw conclusions about well-being autonomously. To date, there is no autonomous measurement procedure that captures well-being and exertion objectively and in sync with running and integrates them into training control. This is a unique selling point of the technology and a considerable improvement over the state of the art in running analysis. Via an app, the runner is to be guided specifically towards well-being, so that the runner's motivation for training can be increased as much as possible. The target of running training is maximum well-being rather than, as has been usual so far, maximum speed or the greatest distance covered. Many beginners and occasional runners become demotivated early on because of poorly designed training, or drop out injured. KIRun, by contrast, puts a positive training experience for the runner at the center. Increasing well-being with the help of the “KIRun” technology is thus the effective driving force for athletes to engage in regular physical activity.
EMBOA: Affective loop in Socially Assistive Robotics as an intervention tool for children with autism
ERASMUS+ project
Runtime: 01.09.2019 – 31.08.2022
Role: Principal Investigator, Co-author Proposal
Partners: Politechnika Gdanska, University of Hertfordshire, Istanbul Teknik Universitesi, Yeditepe University Vakif, Macedonian Association for Applied Psychology, University of Augsburg
The EMBOA project (Affective loop in Socially Assistive Robotics as an intervention tool for children with autism) aims to develop guidelines for, and practically evaluate, the application of emotion recognition technologies in robot-supported interventions for children with autism. Children with autism spectrum disorder (ASD) suffer from multiple deficits; among those that influence their ability to engage in interaction and communication are limited social and emotional skills. Limited communication occurs in human-human interaction and affects relations with family members, peers, and therapists. There are promising results in the use of robots to support the social and emotional development of children with autism. We do not know why children with autism are eager to interact with human-like robots but not with humans. Regardless of the reason, social robots have proved to be a way to get past a child's social obstacles and involve him or her in the interaction. Once the interaction happens, we have a unique opportunity to engage the child in gradually building and practicing social and emotional skills. In the project, we combine social robots that are already used in therapy for children with autism with algorithms for automatic emotion recognition. The EMBOA project's goal is to confirm the feasibility of this combination (feasibility study); in particular, we aim to identify best practices and obstacles in using the combined technologies. What we hope to obtain is a novel approach for creating an affective loop in child-robot interaction that would enhance interventions aimed at building emotional intelligence in children with autism.
The lessons learned, summarized as guidelines, might be used in higher education in robotics, computer science, and special pedagogy in all involved countries. The results will be disseminated through training sessions and multiplier events, and to the general public through scientific papers and published reports. The project consortium is multidisciplinary and combines partners with competence in autism interventions, robotics, and automatic emotion recognition from Poland, the UK, Germany, North Macedonia, and Turkey. The methodological approach includes systematic literature reviews and meta-analysis, data analysis based on statistical and machine learning approaches, as well as observational studies. We have planned a double loop of observational studies. The first round analyzes the application of emotion recognition methods in robot-based interaction in autism and, in particular, compares diverse channels for observing emotion symptoms. The lessons learned will be formulated as guidelines. The guidelines will be evaluated with the AGREE (Appraisal of Guidelines for Research and Evaluation) instrument and confirmed in the second round of observational studies. The objectives of our project match the Social Inclusion horizontal priority with regard to supporting actions that improve the learning performance of disadvantaged learners (testing a novel approach for improving the learning performance of children with autism).
AUDEO: Audio-Based Country-of-Origin Recognition of Migrants
(“Audio-basierte Herkunftsland-Erkennung von Migranten”)
BMBF IKT2020-Grant (research programme Civil Security – User-Innovative: Research for Civil Security)
Runtime: 01.06.2019 – 31.05.2021
Role: Beneficiary
Partners: Bundespolizeipräsidium, Hochschule für Medien, Kommunikation und Wirtschaft GmbH, audEERING GmbH
The aim of the project is to develop legally robust, accurate voice analysis software for the simplified, objective, real-time determination of the 10 most relevant countries of origin of persons in a migration context.
ForDigitHealth: Bavarian Research Association on the Healthy Use of Digital Technologies and Media
(“Bayerischer Forschungsverbund zum gesunden Umgang mit digitalen Technologien und Medien”)
BayFOR (Bavarian State Ministry of Science and the Arts) Project
Runtime: 2019 – 31.05.2023 (48 months)
Role: Principal Investigator, Co-author Proposal
Partners: University of Augsburg, Otto-Friedrich-University Bamberg, FAU Erlangen-Nuremberg, LMU Munich, JMU Würzburg
Digitalization is bringing about fundamental changes to our society and our individual lives. This entails both opportunities and risks for our health. In part, our use of digital technologies and media leads to negative stress (distress), burnout, depression, and other health impairments. On the other hand, stress can also have a positive, stimulating effect (eustress), which should be fostered. Technology design is far advanced, so that digital technologies and media, thanks to increasing artificial intelligence, adaptivity, and interactivity, can preserve and promote the health of their human users. The aim of the ForDigitHealth research association is to gain a thorough scientific understanding of the health effects of the increasing presence and intensified use of digital technologies and media in all their diversity, particularly with regard to the emergence of digital distress and eustress and their consequences, and to develop and evaluate prevention and intervention options. In this way, the research association aims to contribute to an appropriate, conscious, and health-promoting individual and collective use of digital technologies and media.
sustAGE: Smart environments for person-centered sustainable work and well-being (#826506)
EU Horizon 2020 Research & Innovation Action (RIA)

Runtime: 01.01.2019 – 30.06.2022
Role: Principal Investigator, Scientific and Technical Manager (STM), Workpackage Leader, Co-Author Proposal
Partners: Foundation for Research and Technology Hellas, Centro Ricerche Fiat SCPA, Software AG, Imaginary SRL, Forschungsgesellschaft für Arbeitsphysiologie und Arbeitsschutz e.V., Heraklion Port Authority S.A., Aegis IT Research UG, University of Augsburg, Aristotelio Panepistimio Thessalonikis, Universidad Nacional de Educacion a Distancia
sustAGE aims to develop a person-centered solution for promoting the concept of “sustainable work” for EU industries. The project provides a paradigm shift in human-machine interaction, building upon seven strategic technology trends (IoT, machine learning, micro-moments, temporal reasoning, recommender systems, data analytics, and gamification) to deliver a composite system integrated with daily activities at and outside work, supporting employers and ageing employees in jointly increasing well-being, wellness at work, and productivity. The manifold contribution focuses on supporting the employment and later retirement of older adults from work and on optimizing workforce management. The sustAGE platform guides workers on work-related tasks, recommends personalized cognitive and physical training activities with an emphasis on game and social aspects, delivers warnings regarding occupational risks, and ensures their proper positioning in work tasks so as to maximize team performance. By combining a broad range of innovation chain activities, namely technology R&D, demonstration, prototyping, pilots, and extensive validation, the project aims to explore how health and safety at work, continuous training, and proper workforce management can prolong older workers' competitiveness at work. The deployment of the proposed technologies in two critical industrial sectors and their extensive evaluation will lead to a ground-breaking contribution that will improve the performance and quality of life at work and beyond for many ageing adult workers.
WorkingAge: Smart Working environments for all Ages (#210487208)
EU Horizon 2020 Research & Innovation Action (RIA)

Runtime: 01.01.2019 – 31.12.2021
Role: Principal Investigator, Co-Author Proposal
Partners: Instituto Tecnológico de Castilla y Leon, Exodus Anonymos Etaireia Pliroforikis, University of Cambridge, Politecnico di Milano, Green Communications SAS, Brainsigns SRL, RWTH Aachen, Telespazio France SAS, audEERING GmbH, European Emergency Number Association ASBL, Fundacion Intras, Telematic Medical Applications MEPE
WorkingAge will use innovative HCI methods (augmented reality, virtual reality, gesture/voice recognition, and eye tracking) to measure users' emotional, cognitive, and health states and to create communication paths. At the same time, IoT sensors will be used to detect environmental conditions. The purpose is to promote healthy habits among users in their working environment and daily living activities in order to improve their working and living conditions. By studying the profile of workers over 50 years of age and the workplace requirements in three different working environments (office, driving, and manufacturing), both profiles (user and environment) will be considered. The information obtained will be used to create interventions that lead to healthy aging inside and outside the working environment. WorkingAge will test and validate an integrated solution that learns the user's behaviour, health data, and preferences and, through continuous data collection and analysis, interacts naturally with the user. This innovative system will assist workers in their everyday routine in the form of reminders, risk avoidance, and recommendations. In this way, the WorkingAge project will create a sustainable and scalable product that empowers its users and eases their lives by attenuating the impact of aging on their autonomy, working conditions, health, and well-being.
ERIK: Development of a Robot Platform to Support New Interaction Strategies in Children with Limited Socio-Emotional Abilities
(“Entwicklung einer Roboterplattform zur Unterstützung neuer Interaktionsstrategien bei Kindern mit eingeschränkten sozioemotionalen Fähigkeiten”)
BMBF IKT2020-Grant (research programme Robots for Assistance Functions: Interaction Strategies)
Runtime: 01.11.2018 – 31.10.2021
Role: Beneficiary, Scientific and Technical Manager (STM)
Partners: Fraunhofer IIS, ASTRUM IT GmbH, Humboldt-Universität zu Berlin, Friedrich-Alexander-Universität Erlangen-Nürnberg, audEERING GmbH
Understanding and expressing socio-emotional signals, such as facial expressions and voice modulation, is impaired in children with autism. While human interaction partners are difficult for them to read, these children perceive robots as more predictable and less complex. They are also often interested in and open to technology. To develop the socio-emotional communication skills of autistic children, the ERIK project develops and tests a new form of therapy using a robotic system. During the interaction, the robot “Pepper” captures the child's facial expressions and speech. By playing with the robot ball “Leka”, the child's pulse can additionally be measured via electrodes. “Pepper” interprets these signals and infers emotions in real time. Combined with the therapy app “Zirkus Empatico”, emotional and social skills relevant to everyday life can be trained. By recognizing the child's interest, frustration, and boredom, the therapy scenarios can be adapted individually. Using gestures and eye movements, “Pepper” can interact with children in a lifelike way, which can reduce anxieties about dealing with people. The innovative therapy approach allows therapists to observe and evaluate interactions more closely, since they are no longer part of the interaction themselves. In addition, for the first time, the emotion-sensitive robotics can also interact with groups of children.
HOL-DEEP-SENSE: Holistic Deep Modelling for User Recognition and Affective Social Behaviour Sensing
(#797323)
EU Horizon 2020 Marie Skłodowska-Curie action Individual Fellowship (MSCA-IF 2017)
Runtime: 01.10.2018 – 31.03.2021
Role: Co-author Proposal, Coordinator, Beneficiary, Supervisor
Partners: University of Augsburg, Massachusetts Institute of Technology, Technische Universität München
The “Holistic Deep Modelling for User Recognition and Affective Social Behaviour Sensing” (HOL-DEEP-SENSE) project aims at augmenting affective machines such as virtual assistants and social robots with human-like acumen based on holistic perception and understanding abilities. Social competencies comprising context awareness, salience detection, and affective sensitivity are a central aspect of human communication and are thus indispensable for enabling natural and spontaneous human-machine interaction. Therefore, with the aim of advancing affective computing and social signal processing, we envision a “Social Intelligent Multi-modal Ontological Net” (SIMON) that builds on technologies at the leading edge of deep learning for pattern recognition. In particular, our approach is driven by multimodal information fusion using end-to-end deep neural networks trained on large datasets, allowing SIMON to exploit combined auditory, visual, and physiological analysis. In contrast to standard machine learning systems, SIMON exploits task relatedness to adapt its topology within a novel construct of subdivided neural networks. Through deep affective feature transformation, SIMON is able to perform associative domain adaptation via transfer and multi-task learning and can thus infer user characteristics and social cues in a holistic context. This new unified sensing architecture will enable affective computers to assimilate ontological human phenomena, leading to a step change in machine perception. It will offer a wide range of applications for health and wellbeing in future IoT-inspired environments connected to dedicated sensors and consumer electronics.
By verifying the gains through holistic sensing, the project will show the true potential of the much sought-after emotionally and socially intelligent AI, and herald a new generation of machines with hitherto unseen skills to interact with humans via universal communication channels.
Sentiment Analysis
Industry Cooperation with BMW AG
Runtime: 01.05.2018 – 30.04.2021
Role: Principal Investigator
Partners: University of Augsburg, BMW AG
The project aims at real-time internet-scale sentiment analysis in unstructured multimodal data in the wild.
ECoWeB: Assessing and Enhancing Emotional Competence for Well-Being (ECoWeB) in the Young: A principled, evidence-based, mobile-health approach to prevent mental disorders and promote mental wellbeing (#754657)
EU Horizon 2020 Research & Innovation Action (RIA)

Runtime: 01.01.2018 – 31.12.2021
Role: Principal Investigator, Innovation Manager, Innovation Management Board Chair, Workpackage Leader, Co-Author Proposal
Partners: University of Exeter, audEERING GmbH, Vysoke Uceni Technicke v Brne, Institute of Communication and Computing Systems, Universitat Jaume i de Castellon, Fraunhofer Gesellschaft, University of Oxford, University of Geneva, LMU Munich, University of Gent, Monsenso ApS, University of Copenhagen, Deutsches Jugendinstitut eV
Although there are effective mental well-being promotion and mental disorder prevention interventions for young people, there is a need for more robust evidence on resilience factors, for more effective interventions, and for approaches that can be scalable and accessible at a population level. To tackle these challenges and move beyond the state-of-the-art, ECoWeB uniquely integrates three multidisciplinary approaches: (a) For the first time to our knowledge, we will systematically use an established theoretical model of normal emotional functioning (Emotional Competence Process) to guide the identification and targeting of mechanisms robustly implicated in well-being and psychopathology in young people; (b) A personalized medicine approach: systematic assessment of personal Emotional Competence (EC) profiles is used to select targeted interventions to promote well-being; (c) Mobile application delivery to target scalability, accessibility and acceptability in young people. Our aim is to improve mental health promotion by developing, evaluating, and disseminating a comprehensive mobile app to assess deficits in three major components of EC (production, regulation, knowledge) and to selectively augment pertinent EC abilities in adolescents and young adults. It is hypothesized that the targeted interventions, based on state-of-the-art assessment, will efficiently increase resilience toward adversity, promote mental well-being, and act as primary prevention for mental disorders. The EC intervention will be tested in cohort multiple randomized trials with young people from many European countries against a usual care control and an established, non-personalized socio-emotional learning digital intervention. Building directly from a fundamental understanding of emotion in combination with a personalized approach and leading edge digital technology is a novel and innovative approach, with potential to deliver a breakthrough in effective prevention of mental disorder.
An Embedded Soundscape System for Personalised Wellness via Multimodal Bio-Signal and Speech Monitoring – 7% acceptance rate in the call
ZD.B Fellowship
Runtime: 01.01.2018 – 31.12.2020
Role: Supervisor, Co-Author Proposal
Partners: University of Augsburg
The main research aim is to explore how diverse multimodal data can inform the production of personalised embedded soundscapes, and how such digitally produced soundscapes can improve human wellness. As highlighted by ZD.B Digital Health / Medicine, digitisation in health care shows great potential. The proposed system could be effective in a variety of scenarios, including nervousness. Imagine the hours before an important presentation, as the presenter’s nerves are building. The presenter could use a smart-device application to provide a speech sample (while their pulse is monitored). The application returns a (user-dependent) soundscape which clinically reduces the negative feeling. To explore this, the project will be divided into 3 phases (detailed in section 5), each a fundamental part of developing such wellness systems. Questions pertaining to both human auditory and speech perception will arise, along with observations of current ‘norms’ in data science, contributing to the ethics involved in artificial intelligence.
TAPAS: Training network on Automatic Processing of PAthological Speech (#766287)
EU Horizon 2020 Marie Sklodowska-Curie Innovative Training Networks European Training Networks (MSCA-ITN-ETN:ENG)
Runtime: 01.11.2017 – 31.10.2021
Role: Principal Investigator, Co-Author Proposal
Partners: IDIAP, Université Paul Sabatier Toulouse III, Universitair Ziekenhuis Antwerpen, FAU Erlangen-Nürnberg, Stichting Katholieke Universiteit, INESC ID, LMU Munich, Interuniversitair Micro-Electronica Centrum IMEC, Stichting het Nederlands Kanker Instituut-Antoni van Leeuwenhoek Ziekenhuis, University of Augsburg, University of Sheffield, audEERING GmbH
An increasing number of people across Europe have debilitating speech pathologies (e.g., due to stroke, Parkinson’s, etc.). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but is not robust to atypical speech. TAPAS proposes a programme of pathological speech research that aims to transform the well-being of these people. The TAPAS work programme targets three key research problems: (a) Detection: We will develop speech processing techniques for early detection of conditions that impact on speech production. The outcomes will be cheap and non-invasive diagnostic tools that provide early warning of the onset of progressive conditions such as Alzheimer’s and Parkinson’s. (b) Therapy: We will use newly-emerging speech processing techniques to produce automated speech therapy tools. These tools will make therapy more accessible and more individually targeted. Better therapy can increase the chances of recovering intelligible speech after traumatic events such as a stroke or oral surgery. (c) Assisted Living: We will re-design current speech technology so that it works well for people with speech impairments and also helps in making informed clinical choices. People with speech impairments often have other co-occurring conditions making them reliant on carers. Speech-driven tools for assisted living are a way to allow such people to live more independently. TAPAS adopts an inter-disciplinary and multi-sectoral approach. The consortium includes clinical practitioners, academic researchers and industrial partners, with expertise spanning speech engineering, linguistics and clinical science. All members have expertise in some element of pathological speech. This rich network will train a new generation of 15 researchers, equipping them with the skills and resources necessary for lasting success.
OPTAPEB: Optimierung der Psychotherapie durch Agentengeleitete Patientenzentrierte Emotionsbewältigung (#V5IKM010)
BMBF IKT2020-Grant (Forschungsprogramm zur Mensch-Technik-Interaktion: Technik zum Menschen bringen – Interaktive körpernahe Medizintechnik)
Runtime: 01.11.2017 – 31.10.2020
Role: Beneficiary
Partners: Universität Regensburg, Fraunhofer IIS, VTplus GmbH, Ambiotex GmbH, NTT GmbH, eHealthLabs, audEERING GmbH
OPTAPEB aims to develop an immersive and interactive virtual reality system that assists users in overcoming phobias. The system will allow users to experience phobia-inducing situations and will record this emotional experience and the user’s behaviour. Various levels of emotional reactions will be monitored continuously and in real time by the system, which uses sensors based on innovative e-wear technology, speech signals, and other pervasive technologies (e.g. accelerometers). A further goal of the project is the development of a game-like algorithm to control the user’s experience of anxiety through exposure therapy and to automatically adapt the course of the therapy to the user’s needs and the current situation.
Sating Curiosity While Avoiding Risk in Reinforcement Learning (#2021037)
HiPEDS EPSRC / Presidential Scholarship, Imperial College, Industry integrated Centre for Doctoral Training (CDT)
Runtime: 01.10.2017 – 30.09.2021
Role: Supervisor, Co-author Proposal
Partners: Imperial College London
In reinforcement learning, an agent tries to maximise some notion of expected future cumulative reward by sequentially selecting actions that affect its environment. Its applications are many and range from embodied robotics, through biomedical engineering, to human-robot interaction via speech and emotion recognition. However, in many such applications the large amounts of real data required to effectively train the agent may be difficult or expensive to obtain, and an inadequate agent policy can be catastrophic. Here I propose the combination of a) environment models that make more efficient use of the agent’s past experience and b) directed exploration that encourages the exploration of more informative locations in the environment – both by means of Bayesian deep neural networks. Recent advances in the latter show that they combine the ability of deep neural networks to learn from complex, high-dimensional data with the resistance to overfitting and the ability to learn under uncertainty of Bayesian methods. With this approach, I expect to be able to design reinforcement learning solutions that exhibit improved data efficiency both at the beginning and at later stages of the agent’s exploration.
ACLEW: Analyzing Child Language Experiences Around the World (HJ-253479) – 14 winning projects in total
T-AP (Trans-Atlantic Platform for the Social Sciences and Humanities along with Argentina (MINCyT), Canada (SSHRC, NSERC), Finland (AKA), France (ANR), United Kingdom (ESRC/AHRC), United States (NEH)) Digging into Data Challenge 4th round
Runtime: 01.06.2017 – 31.05.2020
Role: Principal Investigator, Co-Author Proposal
Partners: Duke University, École Normale Supérieure, Aalto University, CONICET, Imperial College London, University of Manitoba, Carnegie Mellon University, University of Toronto
An international collaboration among linguists and speech experts to study child language development across nations and cultures to gain a better understanding of how an infant’s environment affects subsequent language ability.
Evolutionary Computing: The Changing Mind
HiPEDS EPSRC, Imperial College, Industry integrated Centre for Doctoral Training (CDT)
Runtime: 01.04.2017 – 31.03.2021
Role: Supervisor
Partners: Imperial College London
This project aims to (1) innovate upon NeuroEvolution of Augmenting Topologies (NEAT), (2) permit function extraction for Transfer Learning, (3) find ways to merge evolutionary computation with broader systems and (4) deploy methods using the latest processing technology – “NeuroMorphic” computing chips.
ZAM: Zero-resource keyword recognition for Audio Mass-Data (Zero-Resource Schlagworterkennung bei Audio-Massendaten)
Runtime: 01.12.2016 – 31.08.2017
Role: Coauthor Proposal, Beneficiary, Principal Investigator
Partners: University of Passau and others
To process mass audio data captured by a range of diverse sensors, technical solutions within the field of keyword recognition shall be investigated. It shall be shown which approaches simplify, accelerate, and optimise audio analysis as well as manual work processes. The major aim is to significantly reduce human workload through maximal automation, with the following focus: 1) limited to no resources (“zero resource”) for training and 2) the question of how low the audio quality can be while still allowing reasonable, highly automated processing.
Deep Learning Speech Enhancement
Industry Cooperation with HUAWEI TECHNOLOGIES
Runtime: 12.11.2016 – 11.11.2018
Role: Principal Investigator, Author Proposal
Partners: University of Passau, University of Augsburg, HUAWEI TECHNOLOGIES
The research target of this project is to develop state-of-the-art methods for speech enhancement based on deep learning. The aim is to overcome limitations in challenging scenarios that are posed by non-stationary noise and distant speech with a potentially moving device and potentially limited power and memory on the device. It will be studied how deep learning speech enhancement can successfully be applied to multi-channel input signals. Furthermore, an important aspect is robustness and adaptation to unseen conditions, such as different noise types.
Deep Learning Methods for End-to-End Modeling of Multimodal Phenomena (#1806264)
HiPEDS EPSRC / Imperial College, Industry integrated Centre for Doctoral Training (CDT)
Runtime: 01.10.2016 – 03.02.2021
Role: Supervisor, Co-author Proposal
Partners: Imperial College London
Automatic affect recognition in real-world environments is an important step towards complete interaction between humans and machines. The main challenges towards that goal are the uncontrolled conditions in such environments and the various modalities through which emotions can be expressed. Over the last 10 years, several advances have been made in determining emotional states with Deep Neural Networks (DNNs). To this end, this project investigates methods and algorithms utilizing DNNs for the classification of audio-visual phenomena, including audio-visual speech recognition, audio-visual behavior understanding and subject characterization.
EngageME: Automated Measurement of Engagement Level of Children with Autism Spectrum Conditions during Human-robot Interaction (#701236) – 14.4% acceptance rate in the call
EU Horizon 2020 Marie Skłodowska-Curie action Individual Fellowship (MSCA-IF 2015)
Runtime: 01.09.2016 – 31.08.2019
Role: Coauthor Proposal, Coordinator, Beneficiary, Supervisor
Partners: University of Augsburg, University of Passau, Massachusetts Institute of Technology
Engaging children with ASC (Autism Spectrum Conditions) in communication-centred activities during educational therapy is one of the cardinal challenges posed by ASC and contributes to its poor outcome. To this end, therapists recently started using humanoid robots (e.g., NAO) as assistive tools. However, this technology lacks the ability to autonomously engage with children, which is key to improving the therapy and, thus, learning opportunities. Existing approaches typically use machine learning algorithms to estimate the engagement of children with ASC from their head pose or eye gaze inferred from face videos. These approaches are rather limited for modeling atypical behavioral displays of engagement of children with ASC, which can vary considerably across children. The first objective of EngageME is to develop novel machine learning models that can, for the first time, effectively leverage multi-modal behavioural cues, including facial expressions, head pose, vocal and physiological cues, to realize fully automated context-sensitive estimation of engagement levels of children with ASC. These models build upon dynamic graph models for multi-modal ordinal data, based on state-of-the-art machine learning approaches to sequence classification and domain adaptation, which can adapt to each child while still being able to generalize across children and cultures. To realize this, the second objective of EngageME is to provide the candidate with cutting-edge training from leading experts in these fields, aimed at expanding his current expertise in visual processing with expertise in wearable/physiological and audio technologies. EngageME is expected to bring novel technology/models for endowing assistive robots with the ability to accurately ‘sense’ engagement levels of children with ASC during robot-assisted therapy, while providing the candidate with the set of skills needed to become one of the leaders in the emerging field of affect-sensitive assistive technology.
RADAR CNS: Remote Assessment of Disease and Relapse in Central Nervous System Disorders (#115902) – 15.8% acceptance rate in the call
EU Horizon 2020 / EFPIA Innovative Medicines Initiative (IMI) 2 Call 3
Runtime: 01.04.2016 – 31.03.2022
Role: Coauthor Proposal, Beneficiary, Principal Investigator, Workpackage Leader
Partners: King’s College London, Provincia Lombardo-Veneta – Ordine Ospedaliero di San Giovanni di Dio – Fatebenefratelli, Lygature, Università Vita-Salute San Raffaele, Fundacio Hospital Universitari Vall D’Hebron, University of Nottingham, Centro de Investigacion Biomedica en Red, Software AG, Region Hovedstaden, Stichting VU-Vumc, University Hospital Freiburg, Stichting IMEC Nederland, Katholieke Universiteit Leuven, Northwestern University, Stockholm Universitet, University of Augsburg, University of Passau, Università degli Studi di Bergamo, Charité – Universitätsmedizin Berlin, Intel Corporation (UK) Ltd, GABO:mi, Janssen Pharmaceutica NV, H. Lundbeck A/S, UCB Biopharma SPRL, MSD IT Global Innovation Center
The general aim is to develop and test a transformative platform of remote measurement technology (RMT) for monitoring disease state in three CNS diseases: epilepsy, multiple sclerosis and depression. Other aims are: (i) to build an infrastructure to identify clinically useful RMT-measured biosignatures to assist in the early identification of relapse or deterioration; (ii) to develop a platform to identify these biosignatures; (iii) to anticipate potential barriers to translation by initiating a dialogue with key stakeholders (patients, clinicians, regulators and healthcare providers).
DE-ENIGMA: Multi-Modal Human-Robot Interaction for Teaching and Expanding Social Imagination in Autistic Children (#688835) – 6.9% acceptance rate in the call
EU Horizon 2020 Research & Innovation Action (RIA)
Runtime: 01.02.2016 – 31.07.2019
Role: Coauthor Proposal, Beneficiary, Principal Investigator, WP Leader
Partners: University of Twente, Savez udruzenja Srbije za pomoc osobama sa autizmom, Autism-Europe, IDMIND, University College London, University of Augsburg, University of Passau, Institute of Mathematics “Simion Stoilow” of the Romanian Academy, Imperial College London
Autism Spectrum Conditions (ASC, frequently defined as ASD – Autism Spectrum Disorders) are neurodevelopmental conditions, characterized by social communication difficulties and restricted and repetitive behaviour patterns. There are over 5 million people with autism in Europe – around 1 in every 100 people – affecting the lives of over 20 million people each day. Alongside their difficulties, individuals with ASC tend to have intact and sometimes superior abilities to comprehend and manipulate closed, rule-based, predictable systems, such as robot-based technology. Over the last couple of years, this has led to several attempts to teach emotion recognition and expression to individuals with ASC using humanoid robots. This has been shown to be very effective as an integral part of the psychoeducational therapy for children with ASC. The main reason for this is that humanoid robots are perceived by children with autism as being more predictable, less complicated, less threatening, and more comfortable to communicate with than humans, with all their complex and frightening subtleties and nuances. The proposed project aims to create and evaluate the effectiveness of such a robot-based technology directed at children with ASC. This technology will enable robust, context-sensitive (such as user- and culture-specific), multimodal (including facial, bodily, vocal and verbal cues) and naturalistic human-robot interaction (HRI) aimed at enhancing the social imagination skills of children with autism. The project will include the design of effective and user-adaptable robot behaviours for the target user group, leading to more personalised and effective therapies than previously realised. Carers will be offered their own supportive environment, including professional information, reports of the child’s progress and use of the system, and forums for parents and therapists.
U-STAR: Universal Speech Translation Advanced Research
Academic Cooperation
Runtime: 01.01.2016 – 30.09.2017
Role: Consortial Partner
Partners: University of Passau and 36 further partners – cf. homepage
The Universal Speech Translation Advanced Research Consortium (U-STAR) is an international research collaboration entity formed to develop a network-based speech-to-speech translation (S2ST) system with the aim of breaking language barriers around the world and implementing vocal communication between different languages.
Promoting Early Diagnosis of Rett Syndrome through Speech-Language Pathology
(Akustische Parameter als diagnostische Marker zur Früherkennung von Rett-Syndrom) (#16430)
Österreichische Nationalbank (OeNB) Jubiläumsfonds
Runtime: 01.11.2015 – 31.10.2019
Role: Main Cooperation Partner
Partners: Medical University of Graz, Karolinska Institutet, Boston Children’s Hospital and Harvard Medical School, University of Passau, Imperial College London, Victoria University of Wellington
VocEmoApI: Voice Emotion detection by Appraisal Inference (#230331)
EU Horizon 2020 ERC Proof of Concept Grant (PoC 2015) – 46% acceptance rate in the call
Runtime: 01.11.2015 – 30.04.2017
Role: Coauthor Proposal, Beneficiary, LEAR
Partners: audEERING GmbH
The automated sensing of human emotions has gained a lot of commercial attention lately. For facial and physiological sensing, many companies already offer first professional products. Recently, voice analytics has become a hot topic, too, with the first companies emerging for the telecom, entertainment, and robot markets (e.g. Sympalog, audEERING, Aldebaran, etc.). Current vocal emotion detection approaches rely on machine learning, where emotions are identified based on a reference set of expression clips. The drawback of this method is the need to rely on a small set of basic, highly prototypical emotions. Real-life application fields of emotion detection, such as clinical diagnosis, marketing research, media impact analysis, and forensics and security, require subtle differentiations of feeling states. VocEmoApI will develop a proof-of-concept software for vocal emotion detection based on a fundamentally different approach: focusing on vocal nonverbal behavior and sophisticated acoustic voice analysis, it exploits the building blocks of emotional processes – a person’s appraisal of relevant events and situations, which trigger the action tendencies and expressions that constitute an emotional episode. Evidence for emotion-antecedent appraisals will be continuously tracked in running speech. The approach can infer not only basic emotion categories but also much finer distinctions such as subcategories of emotion families and subtle emotions. The development of VocEmoApI draws extensively from the results of the applicant’s Advanced Grant, providing a solid theoretical basis. Market analysis through marketing research partners will be conducted and the prototype software will be utilized to promote the technology and estimate a product value based on feedback from industry contacts. A massive impact of VocEmoApI on large markets such as household robotics, public security, clinical diagnosis and therapy, call analytics, and marketing research is to be expected.
EmotAsS: Emotionsensitive Assistance System (#16SV7213)
BMBF IKT2020-Grant (Sozial- und emotionssensitive Systeme für eine optimierte Mensch-Technik-Interaktion)
Runtime: 01.06.2015 – 31.05.2018
Role: Coauthor Proposal, Beneficiary, Principal Investigator
Partners: University of Bremen, University of Augsburg, University of Passau, vacances Mobiler Sozial- und Pflegedienst GmbH, Martinshof (Werkstatt Bremen), Meier und Schütte GmbH und Co. KG.
Emotions and their recognition in spoken language are important for successful human-technology interaction, especially for people with illnesses or disabilities. The aim of the project is to develop and investigate emotion recognition and its use in interaction processes in sheltered workshops for people with disabilities. It is therefore intended to develop a system that reliably recognizes the emotions of people with disabilities in their speech during their everyday work routine and responds to them appropriately and supportively. The findings are to be transferred to a further field of application and tested in particular in communication with dementia patients.
MixedEmotions: Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets (#644632)
EU Horizon 2020 Innovation Action (IA) – 12.5% acceptance rate in the call
Runtime: 01.04.2015 – 31.03.2017
Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
Partners: NUI Galway, Univ. Polit. Madrid, University of Passau, Expert Systems, Paradigma Tecnológico, TU Brno, Sindice Ltd., Deutsche Welle, Phonexia SRO, Adoreboard, Millward Brown
MixedEmotions will develop innovative multilingual multi-modal Big Data analytics applications that will analyze user behavior to build a more complete emotional profile, using data from mixed input channels: multilingual text data sources, A/V signal input (multilingual speech, audio, video), social media (social network, comments), and structured data. Commercial applications (implemented as pilot projects) will be in Social TV, Brand Reputation Management and Call Centre Operations. Making sense of accumulated user interaction from different data sources, modalities and languages is challenging and has not yet been fully explored in an industrial context. Commercial solutions exist, but they do not address the multilingual aspect in a robust and large-scale setting, do not scale up to the huge data volumes that need to be processed, and do not integrate emotion analysis observations across data sources and/or modalities on a meaningful level. MixedEmotions will implement an integrated Big Linked Data platform for emotion analysis across heterogeneous data sources, different languages and modalities, building on existing state-of-the-art tools, services and approaches, that will enable the tracking of emotional aspects of user interaction and feedback on an entity level. The MixedEmotions platform will provide an integrated solution for: large-scale emotion analysis and fusion on heterogeneous, multilingual, text, speech, video and social media data streams, leveraging open access and proprietary data sources, and exploiting social context by leveraging social network graphs; and semantic-level emotion information aggregation and integration through robust extraction of social semantic knowledge graphs for emotion analysis along multidimensional clusters.
SEWA: Automatic Sentiment Estimation in the Wild (#645094)
EU Horizon 2020 Innovation Action (IA) – 9.3% acceptance rate in the call
Runtime: 01.02.2015 – 31.07.2018
Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
Partners: Imperial College London, University of Augsburg, University of Passau, PlayGen Ltd, RealEyes
The main aim of SEWA is to deploy and capitalise on existing state-of-the-art methodologies, models and algorithms for machine analysis of facial, vocal and verbal behaviour, and then adjust and combine them to realise naturalistic human-centric human-computer interaction (HCI) and computer-mediated face-to-face interaction (FF-HCI). This will involve development of computer vision, speech processing and machine learning tools for automated understanding of human interactive behaviour in naturalistic contexts. The envisioned technology will be based on findings in cognitive sciences and it will represent a set of audio and visual spatiotemporal methods for automatic analysis of human spontaneous (as opposed to posed and exaggerated) patterns of behavioural cues including continuous and discrete analysis of sentiment, liking and empathy.
ARIA-VALUSPA: Artificial Retrieval of Information Assistants – Virtual Agents with Linguistic Understanding, Social skills, and Personalised Aspects (#645378)
EU Horizon 2020 Research & Innovation Action (RIA) – 9.3% acceptance rate in the call
Runtime: 01.01.2015 – 31.12.2017
Role: Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
Partners: University of Nottingham, Imperial College London, CNRS, University of Augsburg, University of Twente, Cereproc Ltd, La Cantoche Production
The ARIA-VALUSPA project will create a ground-breaking new framework that will allow easy creation of Artificial Retrieval of Information Assistants (ARIAs) that are capable of holding multi-modal social interactions in challenging and unexpected situations. The system can generate search queries and return the information requested by interacting with humans through virtual characters. These virtual humans will be able to sustain an interaction with a user for some time, and react appropriately to the user’s verbal and non-verbal behaviour when presenting the requested information and refining search results. Using audio and video signals as input, both verbal and non-verbal components of human communication are captured. Together with a rich and realistic emotive personality model, a sophisticated dialogue management system decides how to respond to a user’s input, be it a spoken sentence, a head nod, or a smile. The ARIA uses special speech synthesisers to create emotionally coloured speech and a fully expressive 3D face to create the chosen response. Back-channelling, indicating that the ARIA understood what the user meant, or returning a smile are but a few of the many ways in which it can employ emotionally coloured social signals to improve communication. As part of the project, the consortium will develop two specific implementations of ARIAs for two different industrial applications. A ‘speaking book’ application will create an ARIA with a rich personality capturing the essence of a novel, whom users can ask novel-related questions. An ‘artificial travel agent’ web-based ARIA will be developed to help users find their perfect holiday – something that is difficult to do with existing web interfaces such as those created by booking.com or tripadvisor.
Automatic General Audio Signal Classification
China Scholarship Council
Runtime: 01.09.2014 – 31.08.2018
Role: Supervisor
Partners: TUM
Speech Emotion Recognition using Nonlinear Dimensionality Reduction Methods
China Scholarship Council
Runtime: 01.08.2014 – 31.07.2016
Role: Supervisor
Partners: TUM
In-car music recommendation system based on driver’s emotion
TUM University Foundation Fellowship
Runtime: 01.05.2014 – 30.04.2015
Role: Supervisor
Partners: TUM
iHEARu: Intelligent systems’ Holistic Evolving Analysis of Real-life Universal speaker characteristics (#338164)
FP7 ERC Starting Grant (StG) – 8.6% acceptance rate in the call (7% in Computer Science)
Runtime: 01.01.2014 – 31.12.2018
Role: Author Proposal, Principal Investigator and Grant Holder
Partners: University of Augsburg, University of Passau, TUM
Recently, automatic speech and speaker recognition has matured to the degree that it entered the daily lives of thousands of Europe’s citizens, e.g., on their smart phones or in call services. During the next years, speech processing technology will move to a new level of social awareness to make interaction more intuitive, speech retrieval more efficient, and lend additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and further sectors. To reach this goal, rich speaker traits and states such as age, height, personality and physical and mental state as carried by the tone of the voice and the spoken words must be reliably identified by machines. In the iHEARu project, ground-breaking methodology including novel techniques for multi-task and semi-supervised learning will deliver for the first time intelligent holistic and evolving analysis in real-life conditions of universal speaker characteristics, which have been considered only in isolation so far. Today’s sparseness of annotated realistic speech data will be overcome by large-scale speech and meta-data mining from public sources such as social media, crowd-sourcing for labelling and quality control, and shared semi-automatic annotation. All stages from pre-processing and feature extraction, to the statistical modelling will evolve in “life-long learning” according to new data, by utilising feedback, deep, and evolutionary learning methods. Human-in-the-loop system validation and novel perception studies will analyse the self-organising systems and the relation of automatic signal processing to human interpretation in a previously unseen variety of speaker classification tasks. The project’s work plan gives the unique opportunity to transfer current world-leading expertise in this field into a new de-facto standard of speaker characterisation methods and open-source tools ready for tomorrow’s challenge of socially aware speech analysis.
U-STAR: Universal Speech Translation Advanced Research
Academic Cooperation
Runtime: 01.06.2012 – 31.03.2013
Role: Consortial Partner
Partners: TUM and others.
The Universal Speech Translation Advanced Research Consortium (U-STAR) is an international research collaboration formed to develop network-based speech-to-speech translation (S2ST), with the aim of breaking language barriers around the world and enabling spoken communication between speakers of different languages.
ASC-INCLUSION: Integrated Internet-Based Environment for Social Inclusion of Children with Autism Spectrum Conditions (#289021)
EU FP7 Specific Targeted Research Project (STREP)
Runtime: 01.11.2011 – 31.12.2014
Role: Coordinator, Principal Investigator, Coauthor Proposal, Project Steering Board Member, Workpackage Leader
Partners: University of Cambridge, Bar Ilan University, Compedia, University of Genoa, Karolinska Institutet, Autism Europe, TUM, Koc University, Spectrum ASC-Med
Autism Spectrum Conditions (ASC, frequently referred to as ASD, Autism Spectrum Disorders) are neurodevelopmental conditions characterized by social communication difficulties and restricted and repetitive behaviour patterns. Current studies suggest that 1% of the population might fit an ASC diagnosis. Alongside their difficulties, individuals with ASC tend to have intact and sometimes superior abilities to comprehend and manipulate closed, rule-based, predictable systems, such as computerized environments. Their affinity for computerized environments has led to several attempts to teach emotion recognition and expression, and social problem solving, to individuals with ASC using computer-based training.
In the last decade, web applications have increasingly been used for social interaction, forming online communities and social networks. Anecdotal reports of the emergence of online autistic communities, and of the use of forums and virtual worlds, show the great promise the internet holds for better inclusion and social skills training for people with ASC. Since intervention in ASC has been shown to be more effective when provided early in life, using the internet as a platform to support younger individuals with ASC could significantly promote their social inclusion.
The project aims to create such an internet-based platform and evaluate its effectiveness. The platform is directed at children with ASC (and other groups, such as children with ADHD and socially neglected children) and at those interested in their inclusion. It will combine several state-of-the-art technologies in one comprehensive virtual world, including analysis of users’ gestures and facial and vocal expressions using a standard microphone and webcam, training through games, text communication with peers and smart agents, animation, and video and audio clips. The user’s environment will be personalized according to the individual profile and to sensory and motivational requirements. Carers will be offered their own supportive environment, including professional information, reports on the child’s progress and use of the system, and forums for parents and therapists.
Semi-Supervised Learning in the Analysis of Continuous Speaker Emotion and Personality
China Scholarship Council
Runtime: 01.08.2011 – 31.07.2015
Role: Supervisor
Partners: TUM
Highly Robust Interest and Emotion Recognition from Speech
China Scholarship Council
Runtime: 01.08.2011 – 30.09.2012
Role: Supervisor
Partners: TUM
Kontextsensitive automatische Erkennung spontaner Sprache mit BLSTM-Netzwerken (#SCHU2508-4/1)
(“Context-Sensitive Automatic Recognition of Spontaneous Speech by BLSTM Networks”)
DFG (German Research Foundation) Project
Runtime: 01.03.2011 – 28.02.2014
Role: Principal Investigator, Author Proposal
Partners: TUM
Despite numerous advances in automatic speech recognition, the recognition performance and robustness of today’s speech recognition systems are not sufficient to serve as a basis for natural, spontaneous-speech human-machine interaction. The aim of this research project is therefore to improve the accuracy of systems for recognizing natural, fluent speech by means of novel pattern recognition methods. Since the efficiency of human speech recognition rests above all on the intelligent exploitation of long-term context information, approaches that take context into account at the feature level will be pursued. Starting from so-called tandem speech recognizers, in which neural networks for phoneme prediction are combined with dynamic classifiers, bidirectional Long Short-Term Memory (BLSTM) networks will be employed for this purpose. In contrast to the phoneme estimators currently used in tandem systems, the BLSTM principle makes it possible to include an optimal amount of context information in the prediction. Since recent successes in context-sensitive phoneme recognition and keyword detection underline the effectiveness of the BLSTM approach, a corresponding further development of continuous speech recognition systems is highly promising.
GLASS: Generic Live Audio Source Separation
Industry Cooperation with HUAWEI TECHNOLOGIES within the HUAWEI Innovative Research Program (HIRP)
Runtime: 01.01.2011 – 31.12.2013
Role: Principal Investigator, Author Proposal
Partners: TUM and HUAWEI
GLASS develops new ways of separating audio sources by means of machine intelligence and advanced separation algorithms, e.g., for crystal-clear speech communication.
Novel Approaches for Large Vocabulary Continuous Speech Recognition
China Scholarship Council
Runtime: 01.08.2010 – 31.07.2014
Role: Supervisor
Partners: TUM
Nichtnegative Matrix-Faktorisierung zur störrobusten Merkmalsextraktion in der Sprachverarbeitung (#SCHU2508-2/1)
(“Non-Negative Matrix Factorization for Robust Feature Extraction in Speech Processing”)
DFG (German Research Foundation) Project
Runtime: 01.06.2010 – 31.05.2013
Role: Principal Investigator, Author Proposal
Partners: TUM
The main goal of the research project is to make the recognition of speech and music signals more robust to noise. Its distinguishing feature is the integration of features based on Non-Negative Matrix Factorization (NMF). NMF, a data reduction technique, has recently gained increasing popularity in signal processing. Typically, a spectrogram is decomposed into two factors: the first contains a spectral ‘basis’ of the signal, the second the activity of the basis vectors over time. In this research project, features are derived from the second factor that can complement existing architectures for speech and music processing. Initial experiments on NMF feature extraction for the noise-robust recognition of spoken letter sequences in the car have shown these features to be significantly superior to conventional methods and highly promising. Within the project, the method used will be improved by further developing NMF and, in particular, prepared for use in real-time speech recognition systems, including for fluent speech. Finally, the NMF features described will be applied in further fields such as emotion recognition, recognition of non-linguistic vocalizations such as laughter or coughing in speech, and chord recognition, with the aim of increasing current recognition performance and noise robustness.
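The decomposition of a spectrogram into a spectral basis and its activity over time can be illustrated with a minimal sketch. It is not the project’s implementation: it assumes the standard Lee-Seung multiplicative updates for the Euclidean cost, a toy magnitude spectrogram given as a list of lists (frequency bins by frames), and a hypothetical per-frame normalisation of the activation matrix H as the feature vector. The names `nmf`, `activation_features`, and `reconstruction_error` are illustrative only.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, rank: int, iterations: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Approximate a non-negative spectrogram V (freq x frames) as W @ H.

    W (freq x rank): spectral basis vectors; H (rank x frames): their activity
    over time. Uses Lee-Seung multiplicative updates for the Euclidean cost,
    which keep W and H non-negative when started from positive values.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][t] * num[r][t] / (den[r][t] + eps) for t in range(n_frames)]
             for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][r] * num[f][r] / (den[f][r] + eps) for r in range(rank)]
             for f in range(n_freq)]
    return w, h


def reconstruction_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of V - W H."""
    approx = matmul(w, h)
    return math.sqrt(sum((v[i][j] - approx[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def activation_features(h: Matrix) -> Matrix:
    """One feature vector per frame: the column of H normalised to sum to one
    (a hypothetical choice; the project's exact feature definition is not given)."""
    return [[x / (sum(col) or 1.0) for x in col] for col in transpose(h)]
```

Because the updates only multiply by non-negative ratios, both factors stay non-negative without explicit clipping; the features passed on to a recognizer come solely from H, mirroring the idea of using the second factor.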
TCVC: Talking Car and Virtual Companion
Industry Cooperation with Continental Automotive GmbH
Runtime: 01.06.2008 – 30.11.2008
Role: Principal Investigator, Author Proposal
Partners: TUM and Continental Automotive GmbH
TCVC provides an expert report on emotion in the car, covering requirements analysis, potential and near-future use cases, technology assessment, and a user acceptance study.
ICRI: In-Car Real Internet
Industry Cooperation with Continental Automotive GmbH
Runtime: 01.06.2008 – 30.11.2008
Role: Principal Investigator, Author Proposal
Partners: TUM and Continental Automotive GmbH
ICRI aims at benchmarking internet browsers on embedded platforms as well as at developing an integrated multimodal demonstrator for internet use in the car. Apart from conventional GUI interaction, the investigated modalities include handwriting, touch gestures, and natural speech. The focus lies on MMI development with an embedded realisation.
PROPEREMO: Production and Perception of Emotions: An affective sciences approach (#230331)
FP7 ERC Advanced Grant
Runtime: 01.03.2008 – 28.02.2015
Role: Participant
Partners: University of Geneva (PI Klaus Scherer), TUM, Free University of Berlin
Emotion is a prime example of the complexity of the human mind and behaviour, a psychobiological mechanism shaped by language and culture, which has puzzled scholars in the humanities and social sciences over the centuries. In an effort to reconcile conflicting theoretical traditions, we advocate a componential approach which treats event appraisal, motivational shifts, physiological responses, motor expression, and subjective feeling as dynamically interrelated and integrated components during emotion episodes. Using a prediction-generating theoretical model, we will address both production (elicitation and reaction patterns) and perception (observer inference of emotion from expressive cues). Key issues are the cognitive architecture and mental chronometry of appraisal, neurophysiological structures of relevance and valence detection, the emergence of conscious feelings due to the synchronization of brain/body systems, the generating mechanism for motor expression, the dimensionality of affective space, and the role of embodiment and empathy in perceiving and interpreting emotional expressions. Using multiple paradigms in laboratory, game, simulation, virtual reality, and field settings, we will critically test theory-driven hypotheses by examining brain structures and circuits (via neuroimaging), behaviour (via monitoring decisions and actions), psychophysiological responses (via electrographic recording), facial, vocal, and bodily expressions (via micro-coding and image processing), and conscious feeling (via advanced self-report procedures). In this endeavour, we benefit from extensive research experience, access to outstanding infrastructure, advanced analysis and synthesis methods, validated experimental paradigms and, most importantly, the joint competence of an interdisciplinary affective science group involving philosophers, linguists, psychologists, neuroscientists, behavioural economists, anthropologists, and computer scientists.
SEMAINE: Sustained Emotionally coloured Machine-human Interaction using Nonverbal Expression (#211486)
EU FP7 STREP
Runtime: 01.01.2008 – 31.12.2010
Role: Principal Investigator, Coauthor Proposal (highest ranked in the call), Project Steering Board Member, Workpackage Leader
Partners: DFKI, Queen’s University Belfast (QUB), Imperial College of Science, Technology and Medicine London, University of Twente, University Paris VIII, CNRS-ENST, TUM
SEMAINE deals with real-time, robust, non-verbally competent conversations between a conversational agent and a human user.
cUSER 2
Industry cooperation with Toyota
Runtime: 08.01.2007 – 31.07.2007
Role: Principal Investigator, Coauthor Proposal
Partners: TUM and Toyota
The aim of the cUSER follow-up project is to establish a system that interprets human interest through combined speech and facial expression analysis based on multiple input analyses. Besides aiming at the highest possible accuracy through subject adaptation, class-balancing strategies, and fully automatic segmentation by individual audio and video stream analysis, cUSER 2 focuses on real-time application through cost-sensitive feature space optimization, use of graphics processing power, and high-performance programming methods. Furthermore, feasibility and real-world recognition scenarios will be evaluated.
cUSER
Industry cooperation with Toyota
Runtime: 01.08.2005 – 30.09.2006
Role: Coauthor Proposal, Project Steering, Senior Researcher
Partners: TUM and Toyota
The aim of this project was an audiovisual approach to the recognition of spontaneous human interest. For the most robust estimate possible, information from four sources was combined, for the first time as reported in the literature, by a synergistic early feature fusion that tolerates the failure of individual sources. First, speech was analyzed with respect to its acoustic properties, based on a high-dimensional prosodic, articulatory, and voice quality feature space, plus a linguistic analysis of the spoken content by ASR and bag-of-words vector space modeling. Second, visual analysis provided patterns of facial expression via Active Appearance Models and of movement activity via eye tracking. Experiments were carried out on a video database of 10.5 h of spontaneous human-to-human conversation collected throughout the project. It contains 20+ subjects, balanced in gender and age class. Recordings were made under diverse comfort and noise conditions. Multiple levels of interest were annotated within a rich transcription. The experiments aimed at person-independent and robust real-life usage and showed the high potential of such a multimodal approach. Benchmark results were further provided for processing based on the manual transcription versus fully automatic processing.
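Feature-level (early) fusion of the four sources can be sketched as plain concatenation into one vector before a single classifier. This is an illustration under stated assumptions, not the cUSER implementation: the source names, the dimensions in `DIMS`, and the zero-padding plus availability-flag scheme used here to tolerate a failed source are all hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical feature dimensions for the four sources named in the text.
DIMS = {"acoustic": 3, "linguistic": 2, "facial": 2, "eye": 1}


def early_fusion(streams: dict[str, Optional[Sequence[float]]],
                 dims: dict[str, int]) -> list[float]:
    """Feature-level (early) fusion: concatenate all sources into one vector.

    Sources are visited in sorted-name order so the layout is fixed. A source
    that failed (None or absent) contributes zeros, and every block is followed
    by a 0/1 availability flag, so the fused vector always has the same length
    (hypothetical failure-tolerance scheme).
    """
    fused: list[float] = []
    for name in sorted(dims):
        feats = streams.get(name)
        if feats is None:
            fused.extend([0.0] * dims[name])
            fused.append(0.0)  # source unavailable
            continue
        if len(feats) != dims[name]:
            raise ValueError(f"{name}: expected {dims[name]} values, got {len(feats)}")
        fused.extend(float(x) for x in feats)
        fused.append(1.0)  # source available
    return fused
```

With a fixed layout, a downstream classifier sees the same input dimensionality whether or not, say, eye tracking dropped out for a segment, and can in principle learn from the flag to discount the missing block.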
CEICES: Combining Efforts for Improving automatic Classification of Emotional user States
Research Initiative within the EU FP6 Network of Excellence (NoE) HUMAINE
Runtime: 2005 – 2008
Role: Invited Research Expert
Partners: Friedrich Alexander Universität Erlangen-Nürnberg (FAU), Fondazione Bruno Kessler (FBK, formerly ITC-IRST), Universität Karlsruhe (UKA), Universität Augsburg (UA), LIMSI-CNRS, Tel Aviv University (TAU), Tel Aviv Academic College of Engineering (AFEKA), TUM
CEICES is a co-operation between several sites dealing with the classification of emotional user states conveyed via speech; the initiative was launched within the European Network of Excellence HUMAINE. The database used within CEICES is a German corpus with recordings of 51 ten- to thirteen-year-old children communicating with Sony’s AIBO pet robot. Conceptualization, design, and recordings were done at the originator site, FAU. The approach followed within CEICES was as follows: the originator site provided speech files, a phonetic lexicon, manually corrected word segmentation, manually corrected F0 values, emotional labels, the definition of training and test samples, etc. The data were annotated at the word level. All partners committed themselves to sharing their extracted feature values with all other partners, together with the necessary information (which feature models which acoustic or linguistic phenomenon, the format of the feature values, the classifier used, etc.). Thus, each site could assess the features provided by all other sites together with its own, aiming at a repertoire of optimal features.
SOMMIA: Speech-oriented Man-Machine-Interface in the Automobile
Industry Cooperation with SiemensVDO Automotive
Runtime: 01.06.2000 – 30.09.2001
Role: Coauthor Proposal, Project Steering, Researcher
Partners: TUM and SiemensVDO Automotive
The SOMMIA project focused on the design and evaluation of an ergonomic and generic operating concept for a speech-based MMI integrated in a car MP3 player or in comparable automotive applications. In addition, the system was subject to several economic and geometric constraints: a two-line, 16-character display with a small set of LEDs, and a speaker-independent full-word recognizer with an active vocabulary of 30 to 50 words. Nevertheless, the interface had to meet high technical requirements: its handling should be easy to learn, comfortable and, above all, intuitive and interactively explorable.
FERMUS: Fehlerrobuste Multimodale Sprachdialoge
(“Error-Robust Multimodal Speech Dialogs”)
Industry cooperation with BMW, DaimlerChrysler, Siemens, VDO Automotive
Runtime: 01.03.2000 – 30.06.2003
Role: Coauthor Proposal, Project Steering, Researcher
Partners: TUM and BMW, DaimlerChrysler, Siemens, VDO
The primary aim of the FERMUS project was to identify and evaluate various strategies for a dedicated analysis of potential error patterns during human-machine interaction with information and communication systems in upper-class cars. To reach this goal, we employed a broad range of advanced, mainly recognition-based input modalities, such as interfaces for natural speech and dynamic gestural input. In particular, emotional patterns of the driver were integrated to generate context-adequate dialog structures.
ADVIA: Adaptive Dialogverfahren im Automobil
(“Adaptive Dialog Methods in the Automobile”)
Industry cooperation with BMW
Runtime: 01.07.1998 – 31.12.2000
Role: Coauthor Proposal, Project Steering, Researcher
Partners: TUM and BMW
In modern luxury-class vehicles, and, owing to steadily falling prices, already in mid-range vehicles, numerous electronic devices coexist to increase comfort. Some examples are air conditioning, navigation and telematics systems, audio components such as CD changers and radio, mobile phones, on-board computers, and much more. Since most of these components are computer-based in some way, one can devise and implement many functions. If each of these functions were given its own mechanical switch, the number of controls would be comparable to that of a large concert organ or studio mixing console, only squeezed into the small area of the vehicle cockpit. If the devices are supplied by different manufacturers, each looks different and has its own manufacturer-specific operating logic. The upshot: despite all its functionality, the flood of technology becomes unusable. Of course, vehicle manufacturers reacted long ago and developed multifunctional interfaces with a display and few controls, with the actual devices distributed throughout the vehicle and communicating with the control unit via a bus system. However, this raises the next problem: how should such a control unit be designed, how can speech, acoustics, and gestures be integrated, and how can handling all this technology be made easier for the driver? This is where we come in. For our research, we have a specially equipped driving simulator at our disposal that allows us to conduct usability experiments with vehicle controls. Here, an MMI (Man-Machine Interface) is simulated by a computer. Using Wizard-of-Oz experiments, we can try out new techniques in a kind of rapid prototyping: a “wizard” observes the test subjects and operates the experiment computer in the background.
This gives the test subject the impression of actually triggering actions with a gesture or a sentence. If certain operating methods prove advantageous, they can then actually be implemented, for example on the chair’s computer network. This approach may seem strange at first, but it has proven highly effective.

Current Calls