Research

Recent conference papers and publications

Highlights from Our Research Team

For a number of years, VBG has maintained an active relationship with a Tier 1 research university. Through VBG's sponsorship of PhD students, we work together to explore topics in artificial intelligence, deep learning, machine learning, natural language processing, and digital signal processing, among others. Not all of this research is commercially focused, but many projects have commercial implications. Here we highlight some of the research team's recent work that VBG is adapting for its products.

Abstract

Human voices can be used to authenticate the identity of the speaker, but automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech, and voice conversion. Recently, researchers have developed anti-spoofing techniques to improve the reliability of ASV systems against spoofing attacks. However, most methods have difficulty detecting unknown attacks in practical use, because these attacks often have different statistical distributions from known attacks. In this work, we propose an anti-spoofing system to detect unknown logical access attacks (i.e., synthetic speech) using one-class learning. The key idea is to compact the genuine speech representation and inject an angular margin to separate the spoofing attacks in the embedding space. Our system achieves an equal error rate of 2.19% on the evaluation set of the ASVspoof 2019 Challenge, outperforming all existing single systems.
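
The key idea in the abstract above (a compact genuine class plus an angular margin against spoofed speech) can be illustrated with a small sketch. This is not the authors' implementation: the function name `one_class_margin_loss`, the margin values `m_genuine` and `m_spoof`, and the scale `alpha` are illustrative assumptions, and the embeddings here are hand-written toy vectors rather than network outputs.

```python
import numpy as np

def one_class_margin_loss(embeddings, labels, w, m_genuine=0.9, m_spoof=0.2, alpha=20.0):
    """Toy one-class loss with two angular margins (all defaults are assumptions).

    embeddings: (N, D) array; labels: (N,) array with 0 = genuine, 1 = spoof;
    w: (D,) direction that represents the genuine class.
    """
    # Work with cosine similarity, so only the angle to w matters.
    x_hat = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w_hat = w / np.linalg.norm(w)
    cos = x_hat @ w_hat

    # Genuine speech is pushed above m_genuine (a tight cone around w);
    # spoofed speech is pushed below m_spoof (away from that cone).
    margins = np.where(labels == 0, m_genuine, m_spoof)
    signs = np.where(labels == 0, 1.0, -1.0)
    z = alpha * (margins - cos) * signs

    # log(1 + exp(z)), computed in a numerically stable way.
    return float(np.mean(np.logaddexp(0.0, z)))

w = np.array([1.0, 0.0])
embeddings = np.array([[2.0, 0.0], [-1.0, 0.0]])
well_separated = one_class_margin_loss(embeddings, np.array([0, 1]), w)
mislabeled = one_class_margin_loss(embeddings, np.array([1, 0]), w)
print(well_separated < mislabeled)
```

Because only the genuine class is compacted, an unseen attack does not need to resemble any known attack to be rejected; it only needs to fall outside the genuine cone.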

Link:

https://arxiv.org/abs/2010.13995

Primary Researcher:

Abstract

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies of speech utterances as input features. With the ability of deep neural networks to learn representations from raw data, recent studies have attempted to extract speaker embeddings directly from raw waveforms and have shown competitive results. In this paper, we propose a new speaker embedding called raw-x-vector for speaker verification in the time domain, combining a multi-scale waveform encoder and an x-vector network architecture. We show that the proposed approach outperforms existing raw-waveform-based speaker verification systems by a large margin. We also show that the proposed multi-scale encoder improves over single-scale encoders for both the proposed system and another state-of-the-art raw-waveform-based speaker verification system. A further analysis of the learned filters shows that the multi-scale encoder focuses on different frequency bands at its different scales while producing a flatter overall frequency response than any of the single-scale counterparts.
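
As a rough illustration of what a multi-scale waveform encoder does, the sketch below applies several filter banks with different kernel lengths to the same raw waveform, using a shared hop size so the outputs line up frame by frame, and then concatenates them. The kernel sizes, the number of filters, the hop size, and the random (untrained) filters are all assumptions made for the example; in the paper the filters are learned and feed an x-vector network, which is not shown here.

```python
import numpy as np

def conv1d_frames(wave, filters, stride):
    """Apply a bank of 1-D filters to a waveform with a fixed hop size.

    wave: shape (T,); filters: shape (C, K). Returns shape (ceil(T / stride), C).
    """
    n_filters, k = filters.shape
    n_frames = -(-len(wave) // stride)        # ceil(T / stride)
    needed = (n_frames - 1) * stride + k      # samples reached by the last window
    pad_left = k // 2                         # roughly center each window
    pad_right = max(0, needed - len(wave) - pad_left)
    padded = np.pad(wave, (pad_left, pad_right))
    starts = np.arange(n_frames) * stride
    windows = padded[starts[:, None] + np.arange(k)[None, :]]  # (n_frames, K)
    return windows @ filters.T

def multi_scale_encode(wave, filter_banks, stride):
    """Encode one waveform at several time scales and stack the channels."""
    outputs = [np.maximum(conv1d_frames(wave, fb, stride), 0.0) for fb in filter_banks]
    return np.concatenate(outputs, axis=1)

rng = np.random.default_rng(0)
kernel_sizes = (20, 80, 320)                  # hypothetical short, medium, long scales
banks = [rng.standard_normal((4, k)) / np.sqrt(k) for k in kernel_sizes]
wave = rng.standard_normal(1600)              # stand-in for 0.1 s of 16 kHz audio
features = multi_scale_encode(wave, banks, stride=10)
print(features.shape)  # (160, 12)
```

Short kernels resolve fine temporal detail while long kernels resolve narrow frequency bands, which is one intuition for why the scales end up covering different parts of the spectrum.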

Link:

https://arxiv.org/abs/2010.12951

Primary Researcher:

Abstract

Commercial speaker verification systems are an important component in security services for various domains, such as law enforcement, government, and finance. These systems are sensitive to noise present in the input signal, which leads to inaccurate verification results and hence security breaches. Traditional speech enhancement (SE) methods have been employed to improve the performance of speaker verification systems. However, to the best of our knowledge, the impact of state-of-the-art speech enhancement techniques has not been analyzed for text-independent automatic speaker verification (ASV) systems using real-world utterances. In this work, our contribution is twofold. First, we propose two deep neural network (DNN) architectures for SE, and we compare the performance of the proposed networks with existing work. We evaluate the resulting SE networks using the objective measures of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). Second, we analyze the performance of ASV systems when SE methods are used as front-end processing to remove non-stationary background noise. We compare the resulting equal error rate (EER) using our DNN-based SE approaches, as well as existing SE approaches, with real customer data and the freely available RedDots dataset. Our results show that our DNN-based SE approaches provide benefits for speaker verification performance.
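
The equal error rate used in these comparisons is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below shows one simple way to estimate it from verification scores; the function name `equal_error_rate` and the toy scores are assumptions for illustration, and evaluation toolkits typically interpolate between thresholds rather than picking the nearest one.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target_scores: scores for genuine (same-speaker) trials.
    nontarget_scores: scores for impostor trials. Higher means more likely genuine.
    Returns (eer, threshold).
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # A trial is accepted when its score is at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

eer_clean, _ = equal_error_rate([0.8, 0.9, 1.0], [0.1, 0.2, 0.3])
eer_noisy, _ = equal_error_rate([0.6, 0.7, 0.2, 0.9], [0.1, 0.3, 0.4, 0.8])
print(eer_clean, eer_noisy)
```

Running the same scoring on trials with and without an enhancement front end, and comparing the two EER values, is the kind of comparison the abstract describes.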
