Current Projects

LLM Cooperation through Mutual Feedback and Reward Shaping

TUM Global Incentive Fund

Runtime: 30.09.2025 – 30.09.2026
Role: Principal Investigator, Co-author Proposal
Partners: TUM, Imperial College London

Our project introduces a novel paradigm for training large language models (LLMs) that moves beyond static fine-tuning or isolated inference. We enable LLMs to instruct and provide feedback to one another within a reinforcement learning (RL) framework. Previous work uses an LLM only as a judge, not for mutual learning. Traditional RL depends heavily on manually designed reward functions or costly human feedback, whereas our approach utilises the pretrained knowledge of LLMs to generate more context-aware guidance signals. We will provide a strategy to integrate human preferences not as static labels but as part of a dynamic learning loop that enables continual refinement. We hope this research will reduce reliance on human supervision while enabling AI systems to evolve collaboratively.

Development and Deployment of AI models in Predictive Maintenance Scenarios
Industry Cooperation with MAN Truck & Bus SE

Runtime: 36 months
Role: Principal Investigator, Co-author Proposal
Partners: TUM, MAN Truck & Bus SE

Warning systems that detect signs of future breakdowns ahead of time are a promising way to mitigate the corresponding risks. Concretely, these systems can be based on remotely sent measurements, which encode patterns indicative of the state of the components involved. This includes, for instance, the continuously captured battery voltage curve during a motor start cycle, which is expected to change in character over time due to wear and tear. While these changes are hard to model mathematically a priori and might be strongly affected by external variables, such as ambient or engine temperature, machine learning, and deep learning techniques in particular, are promising candidates for effectively learning signal patterns indicative of breakdown risks. The full potential of these algorithms is likely to be realised if they are integrated into a deployed software framework that provides large amounts of data for training and testing. This makes it possible to measure real-world impact directly and, in turn, to improve the applied concepts.

AI-supported failure prevention for commercial vehicles
Industry Cooperation with MAN Truck & Bus SE

Runtime: 36 months
Role: Principal Investigator, Co-author Proposal
Partners: TUM, MAN Truck & Bus SE

A common problem in the everyday use of trucks in logistical chains is the risk of a breakdown en route, leading to financial losses. While the frequency of such breakdowns can be reduced to some degree through regular maintenance, estimating the exact time at which breakdowns occur due to worn-down or broken components remains close to guesswork. Warning systems that detect signs of future breakdowns ahead of time are thus a promising way to further mitigate the corresponding risks. Concretely, these systems can be based on onboard measurements which are transferred over the air (OTA) to a database with large computing capacity for post-processing. There, patterns indicating damage or the approaching end of a component's lifetime can be calculated. These predictive maintenance systems need to be as reliable and accurate as possible, given the large economic impact on the customer side and ultimately on the manufacturer side. The following use case scenarios underline how an accurate and timely prediction of the decay of certain truck components can effectively lead to (economic) benefits in the larger transportation and maintenance framework.

Emotion Computing in Speech – Perception with LLM
Industry Cooperation with HUAWEI TECHNOLOGIES

Runtime: 15.05.2025 – 14.09.2026
Role: Principal Investigator, Co-author Proposal
Partners: TUM, HUAWEI TECHNOLOGIES

This project is a research initiative focused on developing a comprehensive model capable of predicting a wide array of speaker states, traits, and acoustic/prosodic descriptions from a given speech utterance. The proposed model will leverage advanced deep learning techniques to generate clear, concise, and contextually accurate descriptions, exemplified by statements like “this is an utterance of a happy man, characterized by a rising pitch contour and a vibrant tone.” Through this initiative, we aim to enhance the accuracy and applicability of speech emotion recognition (SER) systems across diverse fields.

MENOSTIK: KI-gestützte Diagnostik der Wechseljahre durch Wearables und digitale Biomarker (“AI-supported diagnostics of menopause through wearables and digital biomarkers”)
BMBF “Start-interaktiv”

Runtime: 01.01.2026 – 31.12.2028
Role: Principal Investigator, Co-author Proposal
Partners: TUM

Menostik revolutionises the approach to menopause detection by developing a diagnostic platform supported by artificial intelligence (AI). Analysing data from off-the-shelf wearable sensors (wearables) together with voice recordings enables precise, non-invasive early detection of perimenopause.

TE(A)CHADOPT: Teaching students how children with neurodevelopmental disorders adopt and interact with technologies (#)
EU Horizon 2020 ERASMUS+

Runtime: 3 years
Role: Principal Investigator, Co-author Proposal
Partners: Medical University of Graz, Politechnika Gdanska, Yeditepe University, Istanbul Technical University, Beit Issie Shapiro – Amutat Avi, Alliance for applied psychology, TUM

We aim to advance technology adaptation to the needs of children with neurodevelopmental disorders. We search for concrete methods to evaluate how they interact with technologies and test the strategies in observational studies.
Our findings shall result in guidelines that support people involved in technology development in adapting their products to this user group. The guidelines will be disseminated to students, technology providers, therapists, researchers, families, and others. We will perform systematic literature reviews on technology adoption models and on methods to evaluate how children with neurodevelopmental disorders interact with technologies. In 4 countries, we will perform observational studies with at least 25 children in total. We will develop guidelines on how to evaluate child-technology interaction, and then evaluate, revise, and translate them. We will publish our findings and organise a student workshop as well as promotion events in the partner countries. TE(A)CHADOPT shall help to shift the focus of technology providers from producing pure learning applications to joyful and custom-tailored games for children with neurodevelopmental disorders. This will advance the inclusion of children with neurodevelopmental disorders and increase their quality of life. Our guidelines will help to identify challenges that children with neurodevelopmental disorders have when using technologies and provide support on how to adapt the products accordingly.

Silent Speech: Enabling Quiet Communication through EMG (#15 [2024-1])
Bavaria California Technology Center (BaCaTeC)

Runtime: 07/2024 – 12/2025
Role: Principal Investigator, Co-author Proposal
Partners: TUM, University of Southern California

Silent Computational Paralinguistics (SCP) focuses on recognizing speaker states as well as traits during non-audible speech from sources such as facial electromyography (EMG) signals. SCP can help users interact privately with next-generation socio-emotionally competent speech technology, and can support people who are unable to speak. The cooperation aims to significantly advance the field of SCP by collecting a larger EMG-speech corpus and developing improved machine learning models. The project will advance research into SCP in the following directions: 1) Collecting a larger, more diverse, more expressive EMG-Silent Speech dataset, with sessions recorded from a more diverse speaker set consisting of project participants from both partner institutions, and with the participants themselves performing more varied communication expressions. 2) Establishing relevant baseline metrics for modeling the collected dataset by applying more traditional machine learning approaches. 3) Investigating advanced deep learning approaches for SCP modeling, methods for transfer learning from the speech modality to the EMG modality, and representation learning, with EMG-to-speech synthesis being of high priority for investigation.

VoCS: Voice Communication Sciences (#101168998)
EU Horizon 2020 Marie Sklodowska-Curie Innovative Training Networks European Training Networks (MSCA-2023-DN-01-01)

Runtime: 4 years
Role: Principal Investigator, Co-author Proposal
Partners: Université d’Aix Marseille, Friedrich Schiller University Jena, University of Maastricht, University of Oslo, University Jean Monnet Saint-Etienne, Eotvos Lorand Tudomanyegyetem, Universidad Pompeu Fabra, Univerzita Karlova, Ita-Suomen Yliopisto, University of Twente, Queen Mary University of London, Audeering GmbH, Oticon A/S, University of Zurich, University of Augsburg, TUM, Oxford Wave Research Ltd, National Institute of Informatics, National Bureau of Investigation, Odia, Oticon Medical

With AI-driven advances, the rapidly developing field of voice technology (VT) has transformed European life through voice assistants, text-to-speech systems, and cochlear implants. However, severe challenges remain in processing paralinguistic information such as identity, emotional state or health in voices. The innovative aspects of the Voice Communication Sciences (VoCS) project lie in its comprehensive approach to voice processing, bridging disciplines from neuroscience to engineering. The VoCS research program is structured around three scientific objectives: (1) advancing basic knowledge of natural voice processing, exploring paralinguistic information in voices; (2) building on these insights to design more natural and flexible synthetic voices; (3) transferring this knowledge into user-oriented applications in health and forensics, including the improvement of voice perception for hearing-impaired individuals, advancements in forensic speaker comparison methods, and the development of tools to combat deepfake speech. VoCS aims to contribute not only to scientific knowledge but also to the exponential growth of the VT industry by creating a network of skilled experts shaping the future of VT in Europe.

INDUX-R: Transforming European INDUstrial Ecosystems through eXtended Reality enhanced by human-centric AI and secure, 5G-enabled IoT (#101135556)
EU Horizon 2020 Research & Innovation Action (RIA)

Runtime: 36 months
Role: Principal Investigator, Co-author Proposal
Partners: CERTH, FORTH, CWI, University of Augsburg, TUM, University of Barcelona, Fundacio Eurecat, FINT, NOVA, ORAMA, INOVA, RINA-CSM, IDECO, Crealsa, Inventics, University of Geneva, EKTACOM, University of Jena

INDUX-R will create an XR ecosystem with concrete technological advances over existing offerings, validated in scenarios across the Industry 5.0 spectrum. Starting from the virtualization of the real world, INDUX-R will enable users to seamlessly create ad-hoc, realistic digital representations of their surroundings using commodity hardware, providing an immersive background for INDUX-R applications, by further researching Neural Radiance Fields (NeRF), 3D scanning and audio-reconstruction methodologies. This work will be enriched with an XR toolkit for (i) the synthesis of speech-driven, lifelike face animations utilising Transformers and Generative Adversarial Networks, and (ii) the generation of photo-realistic human avatars driven by 3D human pose estimation and local radiance fields for accurately replicating human motion, modelling deformation phenomena and reproducing natural texture. INDUX-R will research real-time, egocentric perception algorithms, integrated in XR wearables, to provide contextual analysis of the users’ surroundings and enable new ways of XR interaction using visual, auditory and haptic cues. Egocentric perception will be combined with virtual elastic objects that the user can manipulate and deform in XR according to material properties, receiving multi-sensorial feedback in real time. By exploiting this closed feedback loop, INDUX-R will develop a dynamic and pervasive user interface environment that can adapt to the user’s profile, abilities and task at hand. This adaptation process will be controlled by Reinforcement Learning algorithms that will adjust the presented XR content in an online, human-centric manner that improves accessibility. Through these interfaces, human-in-the-loop pipelines based on Active Learning will be implemented, in which user feedback will be utilised to improve the quality of the services and applications offered.

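Wiss-KKI: Wissenschaftskommunikation über und mit kommunikativer künstlicher Intelligenz: Emotionen, Engagement, Effekte (“Science communication about and with communicative artificial intelligence: emotions, engagement, effects”)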
BMBF (Förderrichtlinie Wissenschaftskommunikationsforschung, 7.9% Acceptance Rate in the Call)

Runtime: 01.01.2024 – 31.12.2026
Role: Principal Investigator, Co-author Proposal
Partners: University of Augsburg, TUM, TU Braunschweig

This project addresses the role of communicative artificial intelligence (German: kommunikative künstliche Intelligenz, KKI) in science communication. This technology performs tasks in communication processes that were formerly perceived as genuinely human activities (e.g. ChatGPT). KKI has a dual role: as a mediator/communicator on socio-scientific topics, and as a subject of science communication, for instance in media coverage.
The project aims to systematically investigate the potential of KKI for science communication in this dual role within an interdisciplinary consortium of communication science and computer science. In a conceptual phase, target criteria for science communication about and with KKI will first be defined. In a subsequent empirical phase, (1) the discourse in traditional and social media will be analysed by combining manual and automated methods, (2) the effect of the media discourse on emotions and evaluations of the technology will be examined in experimental designs, and (3) engagement (the extent and quality of users' interaction with KKI tools for science communication) will be explored using a combination of qualitative and quantitative methods. It is assumed that discourse, practices, and effects have an emotional component that is significant for perception and use. Finally, in a technical part, a requirements profile for a KKI tool for science communication will be drawn up, and a KKI-based tool for direct communication between science and the public will be developed. This tool enables scientists to create easily understandable, target-group-specific press releases and social media posts from publications. At the same time, the tool is intended to be usable by laypeople to engage with scientific topics. The technical development will be accompanied by a formative evaluation.

COHYPERA: Computed hyperspectral perfusion assessment
Seed Funding UAU Project

Runtime: 24 months
Role: Principal Investigator, Co-author Proposal
Partners: University of Augsburg

Over the last years, imaging photoplethysmography (iPPG) has been attracting immense interest. iPPG assesses cutaneous perfusion by exploiting subtle color variations in videos. Common procedures use RGB cameras and employ the green channel or rely on a linear combination of the RGB channels to extract physiological information. iPPG can capture multiple parameters such as heart rate (HR), heart rate variability (HRV), oxygen saturation, blood pressure, venous pulsation and strength as well as the spatial distribution of cutaneous perfusion. Its highly convenient usage and wide range of possible applications, e.g. patient monitoring, using skin perfusion as an early risk score, and assessment of lesions, make iPPG a diagnostic means with immense potential. Under real-world conditions, however, iPPG is prone to errors. Particularly regarding analyses beyond HR, the number of published works is limited, proposed algorithms are immature, basic mechanisms are not completely understood, and iPPG’s potential is far from being exploited. We hypothesize that hyperspectral (HS) reconstruction by artificial intelligence (AI) methods can fundamentally improve iPPG and extend its applicability. HS reconstruction refers to the estimation of HS images from RGB images. The technique has recently gained much attention but is not common in iPPG. COHYPERA aims to prove the potential of HS reconstruction as a universal processing step for iPPG. The pursued approach takes advantage of the fact that HS reconstruction can incorporate knowledge and training data to yield a high-dimensional data representation, which enables various analyses.

Silent Paralinguistics (#SCHU2508/15-1)
DFG (German Research Foundation) Project

Runtime: 01.09.2023 – 31.08.2026
Role: Principal Investigator, Co-author Proposal
Partners: TUM, University of Bremen

We propose to combine Silent Speech Interfaces with Computational Paralinguistics to form Silent Paralinguistics (SP). To reach the envisioned project goal of inferring paralinguistic information from silently produced speech for natural spoken communication, we will investigate three major questions: (1) How well can speaker states and traits be predicted from EMG signals of silently produced speech, using the direct and indirect silent paralinguistics approach? (2) How can the paralinguistic predictions be integrated into the Silent Speech Interface to generate appropriate acoustic speech from EMG signals (EMG-to-speech)? (3) Does the resulting paralinguistically enriched acoustic speech signal improve the usability of spoken communication with regard to naturalness and user acceptance?

HearTheSpecies: Using computer audition to understand the drivers of soundscape composition, and to predict parasitation rates based on vocalisations of bird species (#SCHU2508/14-1) (“Einsatz von Computer-Audition zur Erforschung der Auswirkungen von Landnutzung auf Klanglandschaften, sowie der Parasitierung anhand von Vogelstimmen“)
DFG (German Research Foundation) Project, Schwerpunktprogramm „Biodiversitäts-Exploratorien“

Runtime: 01.03.2023 – 29.02.2026
Role: Principal Investigator, Co-author Proposal
Partners: University of Augsburg, TUM, University of Freiburg

The ongoing biodiversity crisis has endangered thousands of species around the world and its urgency is being increasingly acknowledged by several institutions – as signified, for example, by the upcoming UN Biodiversity Conference. Recently, biodiversity monitoring has also attracted the attention of the computer science community due to the potential of disciplines like machine learning (ML) to revolutionise biodiversity research by providing monitoring capabilities of unprecedented scale and detail. To that end, HearTheSpecies aims to exploit the potential of a heretofore underexplored data stream: audio. As land use is one of the main drivers of current biodiversity loss, understanding and monitoring the impact of land use on biodiversity is crucial to mitigate and halt the ongoing trend. This project aspires to bridge the gap between existing data and infrastructure in the Exploratories framework and state-of-the-art computer audition algorithms. The developed tools for coarse and fine scale sound source separation and species identification can be used to analyse the interaction among environmental variables, local and regional land-use, vegetation cover and the different soundscape components: biophony (biotic sounds), geophony (abiotic sounds) and anthropophony (human-related sounds).

SHIFT: MetamorphoSis of cultural Heritage Into augmented hypermedia assets For enhanced accessibiliTy and inclusion (#101060660)
EU Horizon 2020 Research & Innovation Action (RIA)

Runtime: 01.10.2022 – 30.09.2025
Role: Principal Investigator, Workpackage Leader, Co-author Proposal
Partners: Software Imagination & Vision, Foundation for Research and Technology, Massive Dynamic, Audeering, University of Augsburg, TUM, Queen Mary University of London, Magyar Nemzeti Múzeum – Semmelweis Orvostörténeti Múzeum, The National Association of Public Librarians and Libraries in Romania, Staatliche Museen zu Berlin – Preußischer Kulturbesitz, The Balkan Museum Network, Initiative For Heritage Conservation, Eticas Research and Consulting, German Federation of the Blind and Partially Sighted

The SHIFT project is strategically conceived to deliver a set of loosely coupled technological tools that offers cultural heritage institutions the necessary impetus to stimulate growth and to embrace the latest innovations in artificial intelligence, machine learning, multi-modal data processing, digital content transformation methodologies, semantic representation, linguistic analysis of historical records, and the use of haptic interfaces to effectively and efficiently communicate new experiences to all citizens (including people with disabilities).

causAI: AI Interaktionsoptimierung bei Videoanrufen im Vertrieb (“AI interaction optimisation for video calls in sales”) (#03EGSBY853)
BMWi (Federal Ministry for Economic Affairs and Energy) EXIST Business Start-up Grant

Runtime: tba
Role: Mentor
Partners: University of Augsburg

causAI uses artificial intelligence to analyse the speech, gestures, and facial expressions in sales video calls in order to improve digital sales competence. The goal is to establish causAI as an innovative software product for supporting and training sales conversations.

AUDI0NOMOUS: Agentenbasierte, Interaktive, Tiefe 0-shot-learning-Netzwerke zur Optimierung von Ontologischem Klangverständnis in Maschinen
(Agent-based Unsupervised Deep Interactive 0-shot-learning Networks Optimising Machines’ Ontological Understanding of Sound) (# 442218748)
DFG (German Research Foundation) Reinhart Koselleck-Projekt

Runtime: 01.01.2021 – 30.06.2026
Role: Principal Investigator, Co-author Proposal
Partners: University of Augsburg, TUM

Soundscapes are a component of our everyday acoustic environment; we are always surrounded by sounds, we react to them, and we create them. While computer audition, the understanding of audio by machines, has primarily been driven through the analysis of speech, the understanding of soundscapes has received comparatively little attention. AUDI0NOMOUS, a long-term project based on artificially intelligent systems, aims to achieve major breakthroughs in the analysis, categorisation, and understanding of real-life soundscapes. A novel approach, based around the development of four highly cooperative and interactive intelligent agents, is proposed herein to achieve this highly ambitious goal. Each agent will autonomously infer a deep and holistic comprehension of sound. A Curious Agent will collect unique data from web sources and social media; an Audio Decomposition Agent will decompose overlapped sounds; a Learning Agent will recognise an unlimited number of unlabelled sounds; and an Ontology Agent will translate the soundscapes into verbal ontologies. AUDI0NOMOUS will open up an entirely new dimension of comprehensive audio understanding; such knowledge will have a high and broad impact in disciplines of both the sciences and humanities, promoting advancements in health care, robotics, and smart devices and cities, amongst many others.

Current Calls