Workshop Schedule
Tuesday 23rd
- 08:00 – 09:00 Registration
- 09:00 – 09:30 Opening
- 09:30 – 10:30 Keynote 1: Rigorous Forensic Automatic Speaker Recognition: Bayesian Decision Theory, Probabilistic Calibration and Case-Specific Validation – Daniel Ramos
- 10:30 – 11:00 Coffee Break
-
11:00 – 12:20 Oral Presentations 1.1 – Deepfake and Spoofing Detection
Session Chair: TBD
-
11:00
An Intervention-Based Framework for Shortcut Diagnosis in Spoofing Countermeasures
Santiago Rubio (Universidad de Zaragoza)*; Pilar Bello (BTS, Business Telecommunications Services); Dayana Ribas (BTS, Business Telecommunications Services); Antonio Miguel (Universidad de Zaragoza); Eduardo Lleida (Universidad de Zaragoza); Alfonso Ortega (Universidad de Zaragoza)
-
11:20
Domain Adaptation for Deepfake Audio Detection under Degraded Channel Conditions
Ayuto Tsutsumi (Tokyo Metropolitan University)*; Akira Gotoh (NEC Corporation); Yuko Saito (NEC Corporation); Hiroki Matsuura (NEC Corporation); Sayaka Shiota (Tokyo Metropolitan University)
-
11:40
Speaker-Invariant Representation Learning for Spoofing Detection via Gradient Reversal and Information Bottleneck
Anh-Tuan DAO (LIA)*; Driss Matrouf (LIA); Mickael Rouvier (LIA); Nicholas Evans (EURECOM)
-
12:00
Large-Kernel 1D CNN for Raw Waveform Spoofing Countermeasures
Guy Perets (Ben Gurion University of the Negev)*; Yehuda Ben-Shimol (Ben Gurion University of the Negev); Itshak Lapidot (Afeka, Tel-Aviv College of Engineering)
- 12:20 – 13:45 Lunch
-
13:45 – 15:30 Early-Stage Researcher Symposium (ESRS) – Poster Session
Session Chair: TBD
-
Why Do You Say It Like That? A Phoneme-Level Framework for Explainable Speech Deepfake Detection
Anna Taylor (EURECOM)*; Michele Panariello (EURECOM); Massimiliano Todisco (EURECOM); Chiara Galdi (EURECOM); Nicholas Evans (EURECOM)
-
Interpreting SSL Representations for Spoof Detection: a WavLM Study
Mallat Mohamed (EURECOM)*; Michele Panariello (EURECOM); Massimiliano Todisco (EURECOM); Nicholas Evans (EURECOM); Anthony Larcher (LIUM)
-
Identity Disambiguation in Common Voice: Enabling Fairness Evaluation Across Demographic Subgroups
Chenyi Lin (Aalto University)*; Dāvis Šterns (Aalto University); Tom Bäckström (Aalto University); Nicholas Evans (EURECOM)
-
Machine-Learning Benchmarking of Voice-Based Biomarkers for Parkinson’s Disease
Xiaowen Luo (Maastricht University)*; Ryszard Auksztulewicz (Maastricht University); Sonja Kotz (Maastricht University)
-
The Role of Voice Source and Filter in Speech Emotion Recognition
Yuhan Huang (University College London)*; Josef Schlittenlacher (University College London); Chris Carignan (University College London)
-
Transparent Exchange of Speaker Attributes
Jiusi Zheng (Radboud University)*; Martha Larson (Radboud University); Tom Bäckström (Aalto University)
-
Limitations of WER for Intelligibility Evaluation in Speech Anonymization
Victor Ménestrel (Technische Universität Berlin)*
-
Controllable Voice Anonymization for Privacy-Preserving Disease Detection from Speech
Ben Luks (INESC-ID/Instituto Superior Técnico, University of Lisbon, Technical University of Berlin)*; Francisco Teixeira (INESC-ID); Alberto Abad (INESC-ID/Instituto Superior Técnico, University of Lisbon); Isabel Trancoso (INESC-ID)
-
Challenges in Multi-Speaker Privacy
Anastasiia Korenevskaia (Radboud University)*
-
Studying Voice Privacy Risks with Side Information through Partially Synthetic Data
Eulalie Thiombiano (Radboud University)*; Martha Larson (Radboud University); Vincent Colotte (Université de Lorraine); Emmanuel Vincent (Université de Lorraine)
-
Challenges in Protection against Deepfakes in Speech
Priyanshi Pal (Aalto University)*; Lauri Juvela (Aalto University); Isabel Trancoso (INESC-ID, IST); Alberto Abad (INESC-ID, IST)
-
Linkage-Based Adversarial Framework for Voice Privacy Evaluation
Dāvis Šterns (Aalto University)*; Tom Bäckström (Aalto University); Catuscia Palamidessi (INRIA); Natasha Fernandes (Macquarie University); Konstantinos Drosos (Nokia)
-
Why Voice Privacy Researchers Should Worry About Attribute Inference?
Mehtab Rahman (Radboud University)*; Eulalie Thiombiano (Radboud University); Martha Larson (Radboud University)
-
How Bilingual Are SSL Speech Models? Cross-Lingual Probing of Articulatory Encoding with Finnish and Russian EMA
Ailín Pollio (University of Eastern Finland)*; Tomi Kinnunen (University of Eastern Finland); Alexandre Nikolaev (University of Eastern Finland); Ruchi Pandey (University of Eastern Finland)
- 15:30 – 15:50 Coffee Break
-
15:50 – 16:50 Oral Presentations 1.2 – Privacy-Aware Speech Processing and Watermarking
Session Chair: TBD
-
15:50
Analysis of embedding-based emotional preservation metrics for voice conversion models
Théo Nguyen (Aalto University)*; Tom Bäckström (Aalto University); Rainer Martin (Ruhr-Universität Bochum)
-
16:10
Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces
Emma Coletta (EURECOM)*; Massimiliano Todisco (EURECOM); Michele Panariello (EURECOM); Antonio Faonio (EURECOM); Nicholas Evans (EURECOM)
-
16:30
Sensitive Speaker Attribute Leakage in Speech–LLM Pipelines
Siavosh Sepanta (Fondazione Bruno Kessler)*; Alessio Brutti (Fondazione Bruno Kessler)
-
16:50 – 17:50 Oral Presentations 1.3 – Tools and Methods for Speaker Verification
Session Chair: TBD
-
16:50
Kiwano: A Cutting-Edge Open-Source Toolkit for Speaker Verification
Mickael Rouvier (LIA – Avignon University)*; Pierre Michel Bousquet (LIA – Avignon University)
-
17:10
Beyond CosFace: Analysing Sparsity-Inducing Losses in Speaker Verification
Ladislav Mosner (Brno University of Technology)*; Dimitrios Koutsianos (Athens University of Economics and Business); Themos Stafylakis (Omilia)
-
17:30
FM-SEE: Flow Matching-based Generative Model For Speaker Embedding Enhancement
Sergey Novoselov (ITMO University)*; Vladimir Volokhov (STC-Innovations, ITMO University); Nikita Khmelev (STC-Innovations, ITMO University); Anikin Alexandr (STC-Innovations, ITMO University); Anastasia Zorkina (STC-Innovations, ITMO University); Anastasia Korenevskaya (ITMO University)
- 18:30 – 20:00 Welcome Reception
Wednesday 24th
-
08:30 – 09:30 Oral Presentations 2.1 – Diarization
Session Chair: TBD
-
08:30
Scaling self-supervised pretraining for speaker diarization
Antoine Laurent (pyannoteAI)*; Joonas Kalda (pyannoteAI); Hervé Bredin (pyannoteAI)
-
08:50
Adapting Speaker Diarization to Code-Switched Medical Conversations: AUDIAS-UAM at the DISPLACE-M Challenge
Sara Barahona (AUDIAS Research Group, Universidad Autónoma de Madrid)*; Laura Herrera-Alarcón (AUDIAS Research Group, Universidad Autónoma de Madrid); Juan-Ignacio Alvarez-Trejos (AUDIAS Research Group, Universidad Autónoma de Madrid); Alicia Lozano-Diez (AUDIAS Research Group, Universidad Autónoma de Madrid)
-
09:10
Augmented State Space Speaker Clustering: Reformulating HMM Based Clustering To Improve Speaker Diarization
Anurag Chowdhury (Solventum)*; Abhinav Misra (Solventum); Yinong Wang (Solventum); Bongjun Kim (Solventum); Mark Fuhs (Solventum); Monika Woszczyna (Solventum)
- 09:30 – 10:30 Keynote 2: Genetic information in human voice: how much do we know today and how much more will technology uncover? – Rita Singh
- 10:30 – 11:00 Coffee Break
-
11:00 – 12:20 Oral Presentations 2.2 – Speech Privacy and Anonymization
Session Chair: TBD
-
11:00
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for Audio
Tu Duyen Nguyen (Callyope); Adrien Lesage (Callyope); Clotilde Cantini (Callyope); Rachid Riad (Callyope)*
-
11:20
Privacy in Spoken Interaction: An Overview of Inferable Attributes
Eline Bijmold (Radboud University); Anastasiia Korenevskaia (Radboud University)*; Martha Larson (Radboud University)
-
11:40
Joint Timbral and Non-Timbral Speaker Anonymisation
Rayane Bakari (Orange)*; Olivier Le Blouch (Orange); Nicolas Gengembre (Orange); Nicholas Evans (EURECOM)
-
12:00
Evaluating voice anonymisation using similarity rank disclosure
Shilpa Chandra (EURECOM)*; Matteo Petteno (EURECOM); Michele Panariello (EURECOM); Nicholas Evans (EURECOM); Massimiliano Todisco (EURECOM); Tom Bäckström (Aalto University); Dorothea Kolossa (Technische Universität Berlin); Rainer Martin (Ruhr-Universität Bochum); Themos Stafylakis (Omilia); Nicolas Gengembre (Orange)
- 12:20 – 13:45 Lunch
-
13:45 – 15:30 Special Sessions – Oral Overviews (10 min. each) and parallel Poster Session
-
13:45 [SS1]
Special Session on Speech and Language Technologies in Healthcare – Oral Overview
J.A. Gonzalez-Lopez (Univ. of Granada)
-
13:55 [SS2]
Special Session on Model Fairness Meets Source Tracing: Toward Trustworthy AI for Manipulated Speech Attribution – Oral Overview
Nicolas Müller (Fraunhofer AISEC / Resemble AI)
-
14:05 [SS3]
Special Session on NIST SRE24 Deeper Analysis – Oral Overview
Craig Greenberg (NIST)
-
14:15 [SS4]
Special Session on TidyLang Challenge: Speaker-Controlled Language Recognition – Oral Overview
Aref Farhadipour (University of Zurich)
-
[SS1] MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
Xabier de Zuazo (HiTZ Center, University of the Basque Country – UPV/EHU)*; Ibon Saratxaga (HiTZ Center, University of the Basque Country – UPV/EHU); Eva Navas (HiTZ Center, University of the Basque Country – UPV/EHU)
-
[SS1] Adaptive Phone-Wise Weighted Loss for Silent Speech Restoration in Continuous Spanish
Eder del Blanco Sierra (University of the Basque Country (UPV/EHU))*; David Gimeno-Gómez (Universitat Politècnica de València); Ibon Saratxaga (University of the Basque Country (UPV/EHU)); Eva Navas (University of the Basque Country (UPV/EHU)); Inma Hernáez (University of the Basque Country (UPV/EHU))
-
[SS1] Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
Jacob Webber (SpeakUnique); Oliver Watts (SpeakUnique); Lovisa Wihlborg (SpeakUnique); Johnny Tam (Anne Rowling Regenerative Neurology Clinic, University of Edinburgh); Christine Weaver (Anne Rowling Regenerative Neurology Clinic, University of Edinburgh); Suvankar Pal (Anne Rowling Regenerative Neurology Clinic, University of Edinburgh); Siddharthan Chandran (Anne Rowling Regenerative Neurology Clinic, University of Edinburgh); Cassia Valentini (SpeakUnique)*
-
[SS1] Rapid Calibration for Cross-Subject Imagined Speech Decoding Toward Restoring Communication
Sanae Belfrouh (National School of Applied Sciences, University of Chouaib Doukkali)*; Rahhal Errattahi (National School of Applied Sciences, University of Chouaib Doukkali); Fatima zahra Salmam (National School of Applied Sciences, University of Chouaib Doukkali)
-
[SS1] Deep learning based analysis of spontaneous speech for diagnostic classification and biomarker prediction in Alzheimer’s disease and primary progressive aphasia
Roger Esteve (Universitat Politecnica de Catalunya)*; Pilar Armas (Universitat Politècnica de Catalunya and Sant Pau Memory Unit, IR SANT PAU, Hospital de la Santa Creu i Sant Pau); Marc Casals-Salvador (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya); Miguel A Santos-Santos (Sant Pau Memory Unit, IR SANT PAU, Hospital de la Santa Creu i Sant Pau); Alexandre Bejanin (Sant Pau Memory Unit, IR SANT PAU, Hospital de la Santa Creu i Sant Pau); Javier Hernando (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya)
-
[SS1] Vocal markers of Turner syndrome: a preliminary analysis of sustained vowel recordings
Marc Freixes (HER Human Environment Research Group, La Salle – URL)*; Jordi Sanz (HER Human Environment Research Group, La Salle – URL); Joan Claudi Socoró (HER Human Environment Research Group, La Salle – URL); Jordi Margalef (HER Human Environment Research Group, La Salle – URL); Isabella Monlleó (Universidade Federal de Alagoas); Debora Michelatto (Universidade Federal de Alagoas); Francesc Alías-Pujol (HER Human Environment Research Group, La Salle – URL); Neus Martínez-Abadías (Universitat de Barcelona); Xavier Sevillano (HER Human Environment Research Group, La Salle – URL)
-
[SS2] The Effect of Telephony Transmission on Source Tracing of Audio Deepfakes
Nicholas Klein (Pindrop Security)*; Hemlata Tak (Pindrop Security); Nikolay Gaubitch (Pindrop Security); David Looney (Pindrop Security); Tianxiang Chen (Pindrop Security); Elie Khoury (Pindrop Security)
-
[SS2] Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
Manasi Chhibber (University of Eastern Finland); Jagabandhu Mishra (University of Eastern Finland)*; Tomi Kinnunen (University of Eastern Finland)
-
[SS3] I4U’s Official and Streamlined Audio Systems for NIST SRE24
Daniele Colibro (Microsoft)*; Claudio Vair (Microsoft); Youzhi Tu (The Hong Kong Polytechnic University); Junjie Li (The Hong Kong Polytechnic University); Zilong Huang (The Hong Kong Polytechnic University); Yijia Chen (The Hong Kong Polytechnic University); Kong Aik Lee (The Hong Kong Polytechnic University); Man-Wai Mak (The Hong Kong Polytechnic University); Jagabandhu Mishra (School of Computing, University of Eastern Finland); Vishwanath Singh (School of Computing, University of Eastern Finland); Xi Xuan (School of Computing, University of Eastern Finland); Manasi Chhibber (School of Computing, University of Eastern Finland); Oguzhan Kurnaz (School of Computing, University of Eastern Finland); Tomi Kinnunen (School of Computing, University of Eastern Finland); Suyeon Lee (Korea Advanced Institute of Science and Technology); Chaeyoung Jung (Korea Advanced Institute of Science and Technology); Kihyun Nam (Korea Advanced Institute of Science and Technology); Joon Son Chung (Korea Advanced Institute of Science and Technology); Shuai Wang (Nanjing University)
-
[SS3] Analysis of the NIST 2024 Speaker Recognition Evaluation
Elliot Singer (MIT Lincoln Lab)*; Craig Greenberg (NIST); Lukas Diduch (NIST); Trang Nguyen (MIT Lincoln Lab); Lisa Mason (US Government); Beth Matys (US Government); Bob Dunn (MIT Lincoln Lab); Audrey Tong (NIST)
-
[SS4] Spoken Language Identification with Pre-trained Models and Margin Loss
Zhihua Fang (Xinjiang University)*; Liang He (Tsinghua University); Weiwu Jiang (AgiBot)
-
[SS4] Disentangled Speech Encoder: A Robust Encoder with Dynamic Adapter for Language Identification
Barathi Ganesh HB (Kitami Institute of Technology); Jairam R (Amrita Vishwa Vidyapeetham)*; Ptaszynski Michal (Kitami Institute of Technology); Reshma Unnikrishnan (Resilience Business Grids); Jyothish Lal G (Amrita Vishwa Vidyapeetham); Premjith B (Amrita Vishwa Vidyapeetham)
-
[SS4] LLM-Based Language Verification and Multimodal Ensemble for Spoken Language Recognition
Aivo Olev (Tallinn University of Technology)*; Tanel Alumäe (Tallinn University of Technology)
-
[SS4] Speaker-Aware Language Verification Based on Attentive Pooling, Mixture of Experts and Neural PLDA
Mikel Penagarikano (University of the Basque Country); Luis Javier Rodriguez-Fuentes (University of the Basque Country)*; Amparo Varona (University of the Basque Country); Germán Bordel (University of the Basque Country)
- 15:30 – 15:50 Coffee Break
-
15:50 – 16:50 Oral Presentations 2.3 – Optimization and Efficiency in Speaker Recognition
Session Chair: TBD
-
15:50
Harmonizing data augmentation and loss function for speaker recognition: examples with speed perturbation, mixup and mixout
Pierre-Michel Bousquet (Avignon University)*; Mickaël Rouvier (Avignon University)
-
16:10
Assessing the Energy and Carbon Emissions of Neural Speaker Verification Model in Training and Inference
Hugo Leguillier (LIA – Avignon University); Driss Matrouf (LIA – Avignon University); Guillaume Lechien (Aday); Mickael Rouvier (LIA – Avignon University)*
-
16:30
On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation
Hugo Leguillier (LIA (Laboratoire informatique d’Avignon))*; Driss Matrouf (LIA (Laboratoire informatique d’Avignon)); Guillaume Lechien (ADAY); Mickael Rouvier (LIA (Laboratoire informatique d’Avignon))
-
16:50 – 17:50 Oral Presentations 2.4 – Biomarkers
Session Chair: TBD
-
16:50
SLAP: Learning Speaker and Health-Related Representations from Natural Language Supervision
Angelika Andò (Callyope); Auguste Crabeil (Callyope); Quentin Spinat (Callyope.com); Adrien Lesage (Callyope); Rachid Riad (Callyope)*
-
17:10
Speech Quality Embeddings for Improved Detection and Classification of Degradations in Speech Signals
Michael Kuhlmann (Paderborn University)*; Tobias Cord-Landwehr (Paderborn University); Reinhold Haeb-Umbach (Paderborn University)
-
17:30
Dysarthria Severity Classification on the HeyJay! Dataset: A Parameter-Efficient Approach Using Self-Supervised Speech Representations
Davide Lillini (Department of Information Engineering, Università Politecnica delle Marche)*; Thomas Thebaud (Department of Electrical and Computer Engineering, Johns Hopkins University); Lucia Migliorelli (Department of Political Science, Università degli Studi di Teramo); Najim Dehak (Department of Electrical and Computer Engineering, Johns Hopkins University); Stefano Squartini (Department of Information Engineering, Università Politecnica delle Marche); Laureano Moro Velazquez (Department of Electrical and Computer Engineering, Johns Hopkins University)
- 19:30 – 23:30 Gala Dinner
Thursday 25th
-
08:30 – 09:30 Oral Presentations 3.1 – Spoofing
Session Chair: TBD
-
08:30
A comparison of SSL-Based Feature Extractors and Back-End Classifiers for Spoofing Detection: A Multi-Corpus Training and Cross-Linguistic Analysis
Anh-Tuan DAO (LIA)*; Driss Matrouf (LIA); Mickael Rouvier (LIA); Nicholas Evans (Eurecom)
-
08:50
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Hugo Daumain (LIA – Avignon University, Airbus Defence & Space)*; Driss Matrouf (LIA – Avignon University); Khaled Khelif (Airbus Defence & Space); Mickael Rouvier (LIA – Avignon University)
-
09:10
Can SSL Frontend Generalize to All-Type Audio Spoofing?
Arnab Das (Deutsches Forschungszentrum für Künstliche Intelligenz)*; Yassine El Kheir (Deutsches Forschungszentrum für Künstliche Intelligenz); Fabian Ritter Guttierez (Nanyang Technological University); Tim Polzehl (Deutsches Forschungszentrum für Künstliche Intelligenz); Sebastian Möller (TU Berlin)
- 09:30 – 10:30 Keynote 3: Every breath you take: From Vocal Chords to Health Scores – Björn Schuller
- 10:30 – 11:00 Coffee Break
-
11:00 – 12:20 Oral Presentations 3.2 – Backend and Generalization in Speaker Verification
Session Chair: TBD
-
11:00
Spherical-Gaussian TPSDA: combining PLDA, T-PSDA and duration models for speaker verification
Sandro Cumani (Politecnico di Torino)*
-
11:20
Condition-Aware System Fusion for Speaker Verification
Jonas Borgstrom (MIT Lincoln Laboratory)*
-
11:40
Towards Language-Agnostic Speaker Verification: A Cross-Lingual Transfer Study of Architectures
Pol Buitrago (Universitat Politècnica de Catalunya – Barcelona Supercomputing Center)*; Javier Hernando (Universitat Politècnica de Catalunya – Barcelona Supercomputing Center)
-
12:00
Subtract to Clean, Add to Enrich: Dual-Path Disentanglement for Speaker and Language Recognition
Aref Farhadipour (University of Zurich)*
- 12:20 – 13:45 Lunch
-
14:00 – 19:30 Tour to Cascais and Sintra
Friday 26th
-
08:30 – 09:30 Oral Presentations 4.1 – Representation Learning in Speaker and Language
Session Chair: TBD
-
08:30
Functionally-grounded evaluation of dimensional interpretability in sparse speaker representations
Félix Saget (LIUM)*; Nicolas Dugué (LIUM); Marie Tahon (LIUM); Anthony Larcher (LIUM)
-
08:50
Multi-Axis Speech Similarity via Factor-Partitioned Embeddings
Jim O’Regan (KTH Royal Institute of Technology)*; Jens Edlund (KTH Royal Institute of Technology)
-
09:10
Flow-Enhanced Language Embeddings for Robust Language Recognition
Tianyu Cao (Johns Hopkins University)*; Laureano Moro-Velazquez (Johns Hopkins University); Jesús Villalba (Johns Hopkins University); Thomas Thebaud (Johns Hopkins University); Najim Dehak (Johns Hopkins University)
- 09:30 – 10:30 Keynote 4: From Single-Channel Foundations to Multi-Speaker and Multi-Modal Understanding – Lukáš Burget
- 10:30 – 11:00 Coffee Break
-
11:00 – 12:20 Oral Presentations 4.2 – Spoofing Detection and Robust ASV
Session Chair: TBD
-
11:00
Sparse deepfake detection promotes better disentanglement
Marie Tahon (LIUM)*; Antoine Tessier (LIUM); Nicolas Dugué (LIUM); Aghilas Sini (LIUM)
-
11:20
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
Lelia Erscoi (University of Eastern Finland)*; Tomi Kinnunen (University of Eastern Finland)
-
11:40
PLDA Scoring for Spoofing-Robust Automatic Speaker Verification
Shani Budilovsky (Ben Gurion University of the Negev)*; Yehuda Ben-Shimol (Ben Gurion University of the Negev); Itshak Lapidot (Afeka the Academic College of Engineering in Tel Aviv)
-
12:00
J-SPAW2: A Japanese Corpus for Speaker Verification and Anti-Spoofing with Challenging Replay and Speech Synthesis Attacks
Sayaka Shiota (Tokyo Metropolitan University)*; Suzuka Horie (Tokyo Metropolitan University); Sawato Furubayashi (Tokyo Metropolitan University); Shinnosuke Takamichi (Tokyo Metropolitan University)
- 12:20 – 13:00 Closing Ceremony