The keynote speakers at IberSPEECH 2022 are the following:
Day: November 14th
Massimiliano Todisco, EURECOM Digital Security Department, France
Title of the talk: Secure and explainable voice biometrics
Abstract: Anti-spoofing for voice biometrics is now an established area of research, thanks to the four competitive ASVspoof challenges (the fifth is currently underway) that have taken place over the past decade. Growing research effort has been invested, firstly, in the development of front-end representations that capture more reliably the tell-tale artefacts indicative of utterances generated with text-to-speech and voice conversion algorithms and, secondly, in the development of deep and end-to-end solutions. Despite enormous efforts and positive achievements, little is still known about the artefacts these recognisers use to identify spoofed utterances or to distinguish between bona fide and spoofed speech. Although many unanswered questions remain, this talk aims to provide insights and inspiration, through examples, into the behaviour of voice anti-spoofing systems. Particular attention will be given to data augmentation and boosting methods that have been shown to be instrumental to reliability. The ultimate goal is to better understand these artefacts from a physical and perceptual point of view and how they are actually seen by automatic processes, which puts us in a better position to design more reliable countermeasures.
Bio: Massimiliano Todisco is a professor of audio and speech technologies at the EURECOM Digital Security Department in France. He received his PhD in Sensorial and Learning Systems Engineering from the University of Rome Tor Vergata in 2012. From 2012 to 2015 he was a postdoctoral researcher at Fondazione Ugo Bordoni and Tor Vergata University in Rome. From 2015 to 2020, he was a senior research fellow at EURECOM. Massimiliano is best known for his contributions to fake audio detection. He is the inventor of the constant Q cepstral coefficients (CQCCs), the most widely used features in the field and a source of inspiration for many researchers in speech processing, speaker recognition and anti-spoofing. For this work, he was honoured with the ISCA 2020 award for the best article published in the journal "Computer Speech and Language" during the five-year period 2015-2019. He co-organises the ASVspoof challenge series, a community-led initiative that promotes the development of countermeasures to protect automatic speaker verification (ASV) from the threat of spoofing. He is currently principal investigator and coordinator of TReSPAsS-ETN, an EU Marie Skłodowska-Curie Innovative Training Network (ITN) project, and of RESPECT, a project funded by the national research agencies of France and Germany. His current interests include explainable DNN architectures for speech processing and speaker recognition, fake audio detection and anti-spoofing, and privacy-preserving algorithms for speech signals based on encryption solutions that support computation on signals, templates and models in the encrypted domain.
Day: November 15th
Isabel Trancoso, INESC-ID / IST / University of Lisbon, Portugal
Title of the talk: Disease biomarkers in speech
Abstract: Speech encodes information about a plethora of diseases, which go beyond the so-called speech and language disorders and include neurodegenerative diseases such as Parkinson's, Alzheimer's and Huntington's disease, mood and anxiety-related disorders such as depression and bipolar disorder, and diseases of the respiratory organs such as the common cold or obstructive sleep apnea. This talk addresses the potential of speech as a health biomarker that offers a non-invasive route to early diagnosis and monitoring of a range of conditions related to human physiology and cognition. The talk will also address the many challenges that lie ahead, namely in the context of an ageing population with frequent multimorbidity, and the need to build robust models that provide explanations compatible with clinical reasoning. That would be a major step towards a future where collecting speech samples for health screening may become as common as a blood test is today. Speech can indeed encode health information on a par with the many other characteristics that make it regarded as personally identifiable information. The last part of this talk will briefly discuss the privacy issues that this enormous potential may entail.
Bio: Isabel Trancoso is a full professor at Instituto Superior Técnico (IST, Univ. Lisbon), the university from which she received her PhD in 1987. She founded the Human Language Technology Lab and is a former President of the Scientific Council of INESC-ID Lisbon. She chaired the ECE Department of IST, was Editor-in-Chief of the IEEE Transactions on Speech and Audio Processing, and has held many leadership roles in the IEEE Signal Processing Society (SPS) and the International Speech Communication Association (ISCA), including President of ISCA and Chair of the Fellow Evaluation Committees of both SPS and ISCA. She was elevated to IEEE Fellow in 2011 and to ISCA Fellow in 2014.
Day: November 16th
Simon Wiesler, Applied Science Manager at Amazon, Germany
Title of the talk: Utilizing context information in speech recognition for voice assistants
Abstract: Utilizing contextual information plays a key role in achieving accurate speech recognition in the voice assistant domain. Context can come from a number of sources, such as the acoustic environment, the conversational context, or personalized information about the user. Other sources of context include content that is trending at the time the user is speaking and the speaker's location. While there are established methods for exploiting some of this information in traditional statistical speech recognition systems, contextualizing all-neural speech recognition systems is an active area of research. In my talk, I will present ongoing research at Amazon Alexa on this problem and discuss some of the challenges.
Bio: Simon Wiesler is a Science Manager in Alexa ASR at Amazon. He received his PhD from RWTH Aachen University, Germany, in 2016. Prior to his PhD, he completed a Diploma degree in Mathematics at the University of Marburg, Germany. At Amazon, he manages a team of scientists that develops current and future technology for the Alexa cloud speech recognition system. His current research interests include machine learning for far-field speech recognition and the use of contextual information in speech recognition systems.