Odyssey 2026

Special Sessions

General information for authors

Odyssey’2026 will feature four special sessions on topics closely related to the main theme of the workshop. Potential contributors must take the following into account:

  • Contributions to Special Sessions follow the same format and go through the same peer-review process as regular contributions.
  • For a Special Session to take place as such, a minimum number of accepted papers is required. If only a small number of submissions is accepted, those papers will be incorporated into the main technical programme as regular contributions.
  • The final format of each Special Session will depend on the number of accepted contributions and on the time slots and rooms available at the venue.
  • In the CMT submission system, each Special Session is associated with a specific Subject Area. Authors wishing to submit to a Special Session must select that Subject Area at submission time.

Accepted Odyssey’2026 Special Sessions

6.01 – Applications of Speech and Language Technologies in Healthcare

This special session at Odyssey 2026 aims to bring together interdisciplinary research on speech and language technologies for healthcare and assistive communication. The session focuses on clinically grounded and deployable solutions, including assistive speech technologies, speech-based diagnosis and monitoring, and emerging communication paradigms such as silent speech interfaces, with an emphasis on inclusivity, personalisation, and real-world impact.

6.02 – Model fairness meets source tracing: Toward trustworthy AI for manipulated speech attribution

Source tracing is the digital forensic process of attributing synthetic or manipulated audio to its generative origin. It seeks to answer a critical question: “Which specific source (TTS/VC) system created this audio deepfake?” By identifying the source, whether it be Vendor A, Vendor B, or an open-source model or platform, providers and authorities can take decisive action, such as closing malicious accounts, tracking the spread of coordinated disinformation campaigns, and improving the accountability of generative AI.
However, many current models rely on “shortcuts”: spurious correlations such as speaker identity, language, or recording conditions. While these models may perform perfectly in laboratory environments, they often fail in real-world scenarios and do not generalize to new conditions or unseen generative models. This special session explicitly addresses these challenges. We aim to foster the development of fair and robust attribution methods that capture genuine system-specific fingerprints and demonstrate real-world generalization.
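
As a minimal toy illustration of this shortcut problem (our own sketch, not a session baseline: the two-system setup, the synthetic features, and the correlation levels are all assumptions made purely for demonstration), the following Python snippet trains a source-attribution classifier on data where a speaker-identity cue happens to correlate with the generating system, then evaluates it once that correlation is removed:

    # Toy demonstration of shortcut learning in source tracing.
    # All data here is synthetic; "System A/B" are hypothetical sources.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_split(n, shortcut_corr):
        # Label y: which of two hypothetical systems produced the audio (A=0, B=1).
        y = rng.integers(0, 2, n)
        # Dimension 0: a weak but genuine system fingerprint.
        fingerprint = y + rng.normal(0.0, 2.0, n)
        # Dimension 1: a strong speaker-identity cue that agrees with the label
        # with probability `shortcut_corr` (the spurious correlation).
        agree = rng.random(n) < shortcut_corr
        speaker = np.where(agree, y, 1 - y) + rng.normal(0.0, 0.1, n)
        return np.column_stack([fingerprint, speaker]), y

    X_tr, y_tr = make_split(5000, shortcut_corr=0.95)     # speakers confounded with systems
    X_lab, y_lab = make_split(2000, shortcut_corr=0.95)   # matched "laboratory" test set
    X_wild, y_wild = make_split(2000, shortcut_corr=0.5)  # speakers decorrelated at test time

    clf = LogisticRegression().fit(X_tr, y_tr)
    print("matched-condition accuracy:", clf.score(X_lab, y_lab))   # high: the shortcut pays off
    print("shifted-condition accuracy:", clf.score(X_wild, y_wild)) # drops: the shortcut breaks

Under this setup the matched-condition score looks excellent while the shifted-condition score collapses toward what the weak genuine fingerprint alone can support, which is exactly the laboratory-to-real-world gap described above.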

6.03 – NIST SRE24 Deeper Analysis

The US National Institute of Standards and Technology (NIST) has been hosting evaluations of speaker recognition technology since 1996. The most recent NIST Speaker Recognition Evaluation (SRE) was held in 2024, with an evaluation workshop in December 2024. As is often the case, only preliminary results were available at the time of the workshop, and many of the subsequent publications have been summary in nature. This special session is intended to encourage papers that look more deeply at the evaluation, including the data and the results. We believe that the SRE24 data is rich enough to fuel deeper research, and we encourage participants to go beyond tweaking parameters and adding yet another system to their fusion. New analysis is preferable, but not required. Topics of interest include, but are not limited to: cross-lingual speaker recognition, performance effects of mismatched data sources (e.g., telephony vs. audio from video), and performance effects of enrollment duration.

6.04 – TidyLR Challenge: Speaker-Controlled Language Recognition

This Special Session and community challenge target speaker-controlled and zero-shot language recognition, i.e., language recognition in the realistic regime where each speaker contributes speech in multiple languages and models are tested on unseen languages. The central scientific question is: how can we develop language recognition systems that disentangle speaker identity from linguistic structure to ensure robust generalization across individuals and languages?