Odyssey 2026

Call for Papers

Odyssey 2026 invites you to submit your papers for the upcoming workshop in Lisbon! We encourage paper submissions on speaker and language characterization aligned with the 2026 workshop’s theme “Speech beyond words: Trustworthy Identity, Health, Emotion and more”. We aim to move beyond traditional speaker and language recognition by addressing disparities that affect individuals with diverse accents, linguistic backgrounds, or speech patterns, and by also considering emotional states, physical traits, speech-affecting conditions, and more, including privacy and security concerns.

The research topics include, but are not limited to:

  1. Core Speaker and Language Technologies
    Fundamental research areas that have been at the heart of Odyssey
    • Speaker and Language Recognition: Identification, verification, and characterization of speakers, languages, dialects, and accents.
    • Speech Signal Representation: Deep learning-based embeddings, representation learning, and feature extraction for speaker and language tasks.
    • Spoken Language Dynamics: Multi-speaker segmentation, detection, and diarization; analysis of overlapping and conversational speech.
    • Speech Separation and Source Localization: Methods for isolating individual speech streams from a mixture and locating sound sources in space.
    • Adversarial Attacks and Countermeasures: Spoofing and presentation attacks, deepfake detection, robustness to adversarial attacks, and voice biometrics security.
    • System Design and Evaluation: Confidence estimation, system calibration, fusion techniques, and the development of new corpora and evaluation methodologies.
  2. Speech for Health, Emotion, and Interaction
    Connection of speech processing to applied fields such as healthcare, psychology, and human-computer interaction
    • Speech as a Biomarker: Pathological speech characterization, analysis of speech for mental state assessment, and acoustic-based diagnostics for health conditions.
    • Emotion and Paralinguistic Recognition: Emotion, sentiment, and intent recognition from speech; analysis of non-verbal vocal cues.
    • Multimodal and Cross-domain Applications: Integration of speech with other modalities (e.g., text, video, physiological data) for multi-modal emotion analysis, human-robot interaction, and context-aware systems.
  3. Robustness and Generalization
    Practical challenges of deploying speech systems in real-world scenarios
    • Domain Adaptation and Generalization: Techniques for adapting models to new channels, acoustic environments, and low-resource scenarios.
    • Speech Privacy, Anonymization, and Data Protection: Methods for anonymizing speaker identity and other sensitive attributes from speech to ensure privacy.
    • Uncertainty and Fairness: Addressing equity and fairness issues, bias detection, and creating robust systems for diverse populations and languages.
    • Low-Resource and Unsupervised Learning: Methods for training effective models with limited or lightly supervised data, including self-supervised learning techniques.
  4. Generative Models and Speech Synthesis
    Cutting-edge intersection of speech processing with generative AI
    • Speech and Voice Generation: Text-to-speech (TTS), controllable speech synthesis (prosody, emotion, style), and voice cloning.
    • Speaker Transformation: Voice conversion, cross-lingual voice transfer, and speaker anonymization techniques for privacy protection.
    • Generative AI in the Wild: The ethical implications, detection, and countermeasures for audio deepfakes and manipulated speech.
  5. Foundational and Interdisciplinary Topics
    Foundational research that pushes the boundaries of the field by drawing on insights from computer science, linguistics, neuroscience, and other related fields
    • Large-scale Speech Models: Application of self-supervised, pre-trained, and foundation models for speaker and language tasks.
    • Interpretability and Explainable AI (XAI): Understanding and visualizing model decisions in speech processing systems to ensure transparency and trust.
    • Speech in the Wild: Forensic and investigative speaker analysis, speech processing in multimedia content, and audio event detection.
    • Human-in-the-Loop Systems: Human and human-assisted recognition, active learning, and systems that leverage both human expertise and machine intelligence.