Sunday, 19 June 2022

An Overview of ASVspoof Challenge - A Community Led Effort towards Voice Spoofing Detection Research

Dr. Bhusan Chettri earned his PhD in AI and Speech Technology from Queen Mary University of London. His research focussed on the analysis and design of voice spoofing detection systems using machine learning and AI. In this article, Bhusan Chettri gives an insight into ASVspoof, drawing on his experience as a participant in two consecutive ASVspoof challenges, the 2017 and 2019 editions. See [1] for related publications and [3] for his PhD research work.

It is well acknowledged that today’s Automatic Speaker Verification (ASV) systems, even though they are trained on vast amounts of speech data using complex deep learning algorithms, are vulnerable to spoofing attacks. To address the issue, the ASV community set out to promote research in spoofing detection by providing common evaluation protocols (which enable fair comparison of research findings) and free spoofing datasets, and by organising a biennial research challenge known as the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) challenge. See the official ASVspoof website [4] for further details.

ASVspoof is an ASV community-driven effort promoting research into anti-spoofing algorithms for secure voice biometrics. A number of independent studies had confirmed the vulnerability of voice biometrics to spoofing attacks before the ASVspoof series began in 2015. However, these studies were mostly performed on small in-house datasets comprising a limited number of speakers and spoofing attack conditions. As a result, research results were hard to reproduce, and it was difficult to understand how well the reported anti-spoofing solutions generalised to unseen attack conditions. The main motivation of the ASVspoof series was to overcome these issues by organising open spoofing challenge evaluations, promoting awareness of the problem, making publicly available spoofing corpora that cover sufficiently varied attack conditions with standard evaluation protocols, and ensuring transparent research that leads to reproducible results.

The first ASVspoof challenge, held in 2015, focused on the detection of artificial speech generated using either speech synthesis (TTS) or voice conversion (VC) algorithms in a text-independent setting. Clean speech recorded using high-quality microphones was used as bonafide speech, and seven VC and three TTS algorithms were used to produce spoofed speech. The second edition, held in 2017, focussed on text-dependent replay spoofing attack detection. The 2019 edition, ASVspoof 2019, covered TTS, VC and replay attacks together, using state-of-the-art spoofing algorithms and methods to generate spoofed speech samples. The most recent edition, ASVspoof 2021, again covered both logical access (LA: TTS and VC) and physical access (PA: replay) attacks, and added a new track on speech deepfake (DF) detection. In the 2021 edition, no new training or development data were provided to the challenge participants; they were required to use the ASVspoof 2019 training and development datasets to train and tune their anti-spoofing systems. Only a new evaluation set was released to evaluate the models.

One key observation from the first three ASVspoof challenges is the paradigm shift in modelling approaches for spoofing detection. Gaussian mixture models (GMMs), which are generative models, were popular during the first ASVspoof challenge in 2015, as is evident from the winning system of that challenge, which was GMM-based. The 2017 and 2019 challenges, however, were mostly dominated by data-driven, discriminatively trained deep models. The main task in all three editions was nevertheless the same: to build a standalone countermeasure model (anti-spoofing algorithm) that determines whether a given speech recording is bonafide or spoofed. For performance evaluation, the equal error rate (EER) was the primary metric in the 2015 and 2017 editions. In the 2019 edition, the then recently introduced tandem detection cost function (t-DCF) metric [Kinnunen et al., 2018] was the primary metric, with EER as the secondary metric.

Thanks to the ASV community, we don't have to worry about spending our own money on spoofing datasets. They are made public and can be downloaded at no cost. The ASVspoof 2017 replay dataset, described below, has been widely used by researchers around the globe.

Datasets and metrics: Speaking about replay attack anti-spoofing datasets, Bhusan Chettri explains that the ASVspoof 2017 dataset is the first publicly available replay spoofing dataset, created by playing back bonafide audio utterances and re-recording them in real ‘wild’ acoustic conditions. It has been used extensively in research since its release for the 2017 edition of the ASVspoof series. The bonafide utterances were taken from a subset of the RedDots dataset, a speaker verification dataset collected under varied, uncontrolled acoustic conditions.

The ASVspoof 2017 dataset has two versions: 1.0 and 2.0. Version 1.0 was used during the official challenge evaluation in 2017. After the evaluation, data anomalies were identified that biased model decisions, which eventually led to the release of the corrected version 2.0. The 2019 edition covered both replay spoofing attacks (physical access, PA) and text-to-speech and voice conversion attacks (logical access, LA), and released a separate dataset for each condition. After the 2019 challenge evaluation, a small subset of real replayed speech utterances was also made publicly available for research on replay spoofing attacks. During the latest edition, ASVspoof 2021, no training data were released to the challenge participants. They were required to use the ASVspoof 2019 training and development datasets to train and tune their anti-spoofing models; only a fresh evaluation set was released. For more details, see the ASVspoof website [4].

The equal error rate (EER) was the primary, and only, metric used to evaluate anti-spoofing performance during the 2015 and 2017 ASVspoof evaluations. In the 2019 edition, EER became the secondary metric, and a new primary metric, the tandem detection cost function (t-DCF), which jointly assesses the performance of the ASV and anti-spoofing systems, was used to evaluate the models submitted by the challenge participants.
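To make the EER concrete, here is a minimal sketch in plain Python. It is not the official ASVspoof scoring tool; the function name `compute_eer` and the convention that a higher score means "more likely bonafide" are assumptions made for illustration. The EER is the error rate at the decision threshold where the false acceptance rate (spoofed speech accepted as bonafide) equals the false rejection rate (bonafide speech rejected). With a finite set of scores the two rates rarely match exactly, so the sketch picks the threshold where they are closest and reports their average.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate from two lists of detection scores.

    Assumption (illustrative, not from the article): higher score means
    "more likely bonafide", and a trial is accepted when score >= threshold.
    Returns (eer, threshold).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above all
    # of them so that "reject everything" is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None  # (gap between the two rates, eer estimate, threshold)
    for t in thresholds:
        # False rejection rate: bonafide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Toy scores for illustration only (not real challenge data).
eer, threshold = compute_eer([0.1, 0.6, 0.8, 0.9], [0.2, 0.3, 0.4, 0.7])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means a better countermeasure. Note that the EER looks at the anti-spoofing system in isolation, which is the limitation the t-DCF was introduced to address by also accounting for the ASV system.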

In the next article, Bhusan Chettri will discuss the different corpora and evaluation metrics used in voice anti-spoofing research in more detail.

References

[1] Bhusan Chettri scholar  

[2] M. Sahidullah et al. Introduction to Voice Presentation Attack Detection and Recent Advances, 2019.

[3] Bhusan Chettri. Voice biometric system security: Design and analysis of countermeasures for replay attacks. PhD thesis, Queen Mary University of London, August 2020.

[4] ASVspoof: The automatic speaker verification spoofing and countermeasures challenge website.

Bhusan Chettri London  | Bhusan Chettri Queen Mary University of London  | Dr. Bhusan Chettri | Bhusan Chettri social | Bhusan Chettri Research

Wednesday, 8 June 2022

Bhusan Chettri Visits a Child Orphanage Home

 

Bhusan Chettri is a professional who has made India proud by contributing substantially to research in sound and speech technology. He comes from a small town in the South Sikkim district, and his journey was not easy. He overcame more than one challenge and did not allow them to impede his journey from a small, deprived hill town in India to London, UK, where he completed his PhD at one of the most prestigious research universities in the world. This was made possible by his determination and indomitable spirit. His kind personality, positive attitude and professionalism have made him successful. He currently works as a researcher in AI and machine learning for audio and speech.

Having earned a reputation in his profession, he believes it is time to give back to society by sharing happiness and love with people in need. This is why he has started devoting some of his time to the welfare of society through volunteering. For this visit, he chose a child orphanage home.

Bhusan Chettri visited a child orphanage home in Assam. The home accommodates children of different ages who have either lost their families in natural calamities or have no family at all. He deeply understands the emotions and needs of children who have no parents, and he knows how hard it is to live without loved ones. These feelings convinced him to visit the place.

He felt very lucky and blessed to spend time among such lovely children. He donated essential supplies and food items for more than 25 children. The shelter home accommodates both boys and girls, ranging in age from 3 to 18 years. He was glad to see them living together like a family; they have created their own little world within the boundaries of the home. After receiving gifts from Bhusan Chettri, they all had big smiles on their faces and thanked him for this gesture of love and care. They also played with him and shared some precious moments. His one-day visit to the child orphanage home was much appreciated.

Bhusan Chettri also had a thorough discussion with the management of the home to understand how it works and how it is organised. He asked about the immediate and future needs of the children and assured the management of every possible help from his end towards the joyful living, healthy growth and development of the children. He looks forward to doing more for people in need, including children, youngsters and senior citizens. He plans to take part in many other social activities in the future and to contribute to the development of society and, as a result, to help the country grow in every sense.


