Considering Voice Biometrics?

Start here for an overview of key considerations.

Is Voice Biometrics Right For You?

What is voice biometrics? The term "voice biometrics" typically refers to a sophisticated set of software programs that process speech samples for two main purposes: (1) to extract the unique vocal characteristics from an individual's speech sample(s) in order to create a voiceprint, or (2) to compare a speech sample to one or more stored voiceprints in order to obtain a match probability. Once you have a voiceprint for someone, you can perform two useful functions:

  • If someone is claiming a specific identity and you have a stored voiceprint for them, you can confirm their identity by comparing the characteristics of a speech sample they provide to their stored voiceprint. This is called verification, also known as voice verification, speaker verification, or voice authentication. Conceptually, this is like using your voice as a password. Verification is most often used to protect access to whatever comes after the verification step.
  • If there is no specific identity claim, you can determine who the individual is by comparing a speech sample against all available voiceprints. This is called identification, also known as speaker identification or voice identification. Identification is most often used for fraud detection, such as checking whether a person claiming to be a new customer is in fact a repeat customer or a person who previously committed fraud.
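To make the 1:1 versus 1:N distinction concrete, here is a minimal sketch. It assumes, purely for illustration, that a voiceprint is a fixed-length feature vector and that cosine similarity serves as the match score; a real engine derives voiceprints from audio with its own models and scoring, and the names `verify`, `identify`, and the sample vectors are hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical stand-in: a real engine derives voiceprints from audio; here a
# "voiceprint" is just a fixed-length feature vector.
Voiceprint = list[float]


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Return a similarity in [-1, 1]; higher means the vectors are more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(sample: Voiceprint, claimed_id: str,
           voiceprints: dict[str, Voiceprint], threshold: float) -> bool:
    """1:1 verification: score the sample against the claimed identity only."""
    return cosine_similarity(sample, voiceprints[claimed_id]) >= threshold


def identify(sample: Voiceprint, voiceprints: dict[str, Voiceprint],
             threshold: float) -> str | None:
    """1:N identification: score against every voiceprint and return the
    best-matching ID, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for speaker_id, voiceprint in voiceprints.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


db = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.9, 0.2]}
probe = [0.85, 0.15, 0.05]
print(verify(probe, "alice", db, threshold=0.8))  # True
print(verify(probe, "bob", db, threshold=0.8))    # False
print(identify(probe, db, threshold=0.8))         # alice
```

Verification touches one stored voiceprint regardless of database size, while identification loops over all of them, which is why identification cost grows with the number of enrolled speakers.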

Assuming one or both of these functions makes conceptual sense for your business, next consider the basic, practical implications of using speech within your applications.

Easy access to speech samples? Voice biometric technology is extremely useful, but only if it is easy for you to get speech samples from your end users. If your end users will typically be in a sports stadium, operating construction equipment, or in any other environment that is very loud and/or noisy, then voice biometric technology may not be the best fit. Also, consider how natural it is for your end users to speak. Will they have ready access to a phone or other microphone device? Finally, consider whether other technologies would serve your end users better in their intended usage environment. For example, a fingerprint sensor makes much more sense than voice biometrics for logging into a laptop.

The good news, however, is that with everyone carrying mobile phones these days, there are many good and valid scenarios for using voice biometrics.

Will Voice Biometrics Be Accurate Enough?

To begin with, it is important to recognize that no biometric is 100% accurate. For example, a 2014 study on iris recognition determined that system accuracy could be anywhere between 90% and 99%, a broad range. Voice biometric accuracy also falls in this range, for reasons discussed below. However, even with its imperfections, voice biometrics is an extremely valuable tool.

The most widely used measure of a voice biometric system's accuracy is the "Equal Error Rate", or "EER". EER is the point at which the "False Acceptance Rate" or "FAR" (i.e., letting an impostor through) equals the "False Rejection Rate" or "FRR" (i.e., denying access to a valid user). Voice biometric scoring systems are based on statistical probabilities, so there are trade-offs between these errors to consider. For instance, if you set your confidence levels "high" to keep out impostors, you may end up blocking more valid users, which causes annoyance. Setting a confidence level "lower" will provide more convenience for your valid users, but you may end up letting in more impostors. VBG will work with you to find an optimal balance between security and convenience. Also note that EER is measured for a single attempt. By allowing retries in your application, you can increase the probability that a valid user gets through on a second or third attempt, even if you originally set confidence levels high to reject impostors.
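The threshold trade-off and the effect of retries can be worked through numerically. This is a minimal sketch over made-up scores: it sweeps each observed score as a candidate threshold and keeps the one where FAR and FRR are closest (a simple approximation; published EER figures are often interpolated). The retry formulas assume attempts are independent, which real-world attempts may not be; the function names are illustrative only.

```python
def far_frr(genuine: list[float], impostor: list[float],
            threshold: float) -> tuple[float, float]:
    """FAR: share of impostor attempts accepted; FRR: share of genuine attempts rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: list[float],
                     impostor: list[float]) -> tuple[float, float]:
    """Approximate EER by trying each observed score as the threshold and
    keeping the one where FAR and FRR are closest. Returns (eer, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    _, threshold, eer = best
    return eer, threshold


def genuine_success(frr: float, attempts: int) -> float:
    """Chance a valid user is accepted within `attempts` tries (independence assumed)."""
    return 1 - frr ** attempts


def impostor_success(far: float, attempts: int) -> float:
    """Chance an impostor is accepted within `attempts` tries (same assumption)."""
    return 1 - (1 - far) ** attempts


# Made-up match scores, for illustration only.
genuine = [0.91, 0.85, 0.78, 0.88, 0.60]
impostor = [0.30, 0.45, 0.62, 0.20, 0.55]
print(equal_error_rate(genuine, impostor))       # (0.2, 0.62)
print(round(genuine_success(0.2, 3), 3))         # 0.992
print(round(impostor_success(0.2, 3), 3))        # 0.488
```

The same arithmetic shows why retry limits matter: each extra attempt raises a valid user's chance of getting through, but it also gives an impostor another try.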

Note that EER, FAR, and FRR results are derived from the particular set of audio samples used to measure them. Beware of extremely low EER values advertised by some vendors, as laboratory-derived EER results can easily be manipulated by removing samples that negatively impact results. EER results are only as good as the data sampling used to calculate them. Unexpected real-world results can (and will) occur if your sampling is not truly representative of the end user population, specific languages and dialects, types of devices being used, environment where speech is being collected, etc. Therefore, we recommend that you compare voice biometric systems based on real-world users, including running trials within your intended production environment, rather than relying on published EER results with no insight into the data used for the test.

Another significant factor impacting real-world EER is the content of the speech sample and the presence of noise. If a speech sample is too noisy, or if the wrong information is spoken to the voice biometric system, then the voice biometric engine will have trouble using the speech sample to make an accurate determination. The type of device used can also influence results. For example, mobile phone networks use different compression techniques than landline phones do, and this affects how the voice biometric engine extracts the unique vocal characteristics from speech samples.

What Kind of ROI Will I Get?

Assuming that voice biometrics is relevant to your business problem, and is sufficiently accurate for you, the next question to consider is whether implementing a voice biometric project will give you sufficient return on investment (ROI). This is an important question for Voice Biometrics Group, too. If we cannot provide sufficient accuracy and performance at a cost that enables a successful business case for you, we won’t have a business.

The business case for voice biometrics typically involves one or more of three business drivers:

  • Reducing exposure to fraud
  • Reducing costs for authentication
  • Improving the customer experience

Fraud reduction is always specific to your particular situation. Sometimes employees commit fraud, while other times fraudsters target call center agents with social engineering. VBG cannot share specific fraud reduction numbers from our clients; however, the cost savings are large -- and tend to be well in excess of the cost to implement a solution.

A second driver, reducing costs for authentication, most often arises in the context of an IVR system and/or call center. Voice biometrics can automate the authentication process on a large percentage of calls that would otherwise require live agent "handle time". For example, the cost to authenticate a call with an agent equals the number of seconds the agent spends on authentication multiplied by the agent's cost per second. If an IVR that includes voice biometrics authenticates the caller before transferring to an agent, or if a voice biometrics system authenticates the caller at the beginning of the call with the agent, then that saves agent time. Savings in agent time can be as little as 20 seconds per call, or 60 seconds or more. When you run the numbers with our ROI Calculator, you should find overall savings for your business, even if not every call is automated, and even after setup and startup costs.
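The agent-time arithmetic above can be sketched in a few lines. This is not VBG's ROI Calculator; it is a hypothetical back-of-the-envelope model, and every input (call volume, automation rate, seconds saved, hourly agent cost, setup and annual system costs) is an assumed example value, not VBG pricing or customer data.

```python
def authentication_savings(calls_per_year: int, automation_rate: float,
                           seconds_saved_per_call: float,
                           agent_cost_per_hour: float) -> float:
    """Agent cost avoided per year: automated calls x seconds saved x cost per second."""
    cost_per_second = agent_cost_per_hour / 3600
    return calls_per_year * automation_rate * seconds_saved_per_call * cost_per_second


def first_year_net(savings: float, setup_cost: float,
                   annual_system_cost: float) -> float:
    """Savings remaining after one-time setup and one year of system costs."""
    return savings - setup_cost - annual_system_cost


# Hypothetical inputs: 1M calls/year, 60% authenticated by voice,
# 30 s of agent time saved per automated call, agents cost $36/hour.
gross = authentication_savings(1_000_000, 0.60, 30, 36.0)
net = first_year_net(gross, setup_cost=40_000, annual_system_cost=60_000)
print(f"{gross:,.0f}")  # 180,000
print(f"{net:,.0f}")    # 80,000
```

Because savings scale linearly with call volume and seconds saved, changing the seconds saved from 20 to 60 per call triples the gross figure, which is why the range quoted above matters when building a business case.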

The third driver for voice biometrics is improving the customer experience (making it easier, faster, less obtrusive, etc.). In the call center scenario, voice biometrics saves time not only for the agent but also for the caller. And many callers are sophisticated enough to recognize that you are using advanced technology to protect them and provide them with greater security (as compared to asking questions like your mother's maiden name).

Measuring the return on improved customer experience and its impact on revenue and customer retention is difficult. Possible measures include the Return On Effort metric, Customer Satisfaction (CSAT) scores, and Net Promoter Scores (NPS). However, most companies have multiple competing ideas for improving customer experience, as well as many other projects that provide a “hard” ROI. Therefore, the first two drivers, reducing fraud exposure and reducing costs for authentication, are what typically push companies to adopt voice biometric technology.

Voice biometric systems will provide savings, even when factoring in the costs for the biometric system and the resources needed to integrate these systems into your business process(es) and maintain them over time. And while companies typically face a wide range of constraints when considering voice biometric systems, we offer flexible pricing and are able to adapt it to accommodate special needs. Use our ROI Calculator to assess the business impact of applying voice biometrics in a typical contact center.

How Do I Apply Voice Biometrics?

There are many variables to consider when integrating voice biometric technology into your business process(es). Two of the most critical considerations are: what will be the source(s) for your speech samples, and what will be contained in them? Each is discussed in greater detail below.

Speech Sample Sources

Voice biometric systems process speech samples; therefore, the first step in implementation is knowing how you will get speech samples from your end users. You need a good way to capture speech from a specific user, without capturing speech from other users, or unwanted sounds and noises.

The most common device to capture speech samples is the telephone. Whether a mobile phone, landline phone, or VoIP phone, people speaking on telephones can have their speech recorded using a range of techniques made for this purpose, possibly without them ever knowing. If you have an IVR system, the IVR will likely be able to record audio that is then passed to a voice biometric system.

A second way to capture speech is through a web browser on a computer or tablet, using the embedded microphone. Because nearly every modern laptop and tablet has a microphone, capturing audio through the computer is practical for many applications. Likewise, voice capture is also easily done on a mobile phone.

More generally, VBG technology can work with almost any form of recorded speech, provided that it is in a supported format. For example, one of our customers records people speaking during a language test and uses that audio to create voiceprints. Other call center customers send us speech samples after the call has ended. Regardless of how you get the speech samples to us, make sure you capture only the audio of the desired (individual) speaker.

Identify Desired Use Case

Relative to voice biometrics, the term "use case" refers to how speech samples are collected. Use cases also determine the content of the speech samples being processed by the voice biometric system. Voice Biometrics Group supports all possible speech capture methods; the following sections explain the options.

Active Methods

An "active" use case means the person speaking is repeating something they are required to say. They are active and knowing participants in the process. One minor downside of requiring participation is that end users have to follow instructions and behave in an expected manner. In some cases, such as monitoring a parolee, you may not have a willing or compliant participant, which can impact system accuracy.

On the other hand, active approaches have several advantages. First, because the speech content is known in advance, less time is needed to create a voiceprint. Also, active approaches tend to use short phrases, so they are easy for end users to repeat and require very little storage space. Finally, fraudsters know when an active system is in place, so the mere threat of voice biometrics reduces the rate of fraud. Some people will of course try to defeat the voice biometric process, but our customers tell us that overall attempts at cheating decline significantly.

Active Method 1 - Static Text Passphrase

In this active variant, the user speaks the same phrase 2-3 times to create a voiceprint, then once to verify. For example, the phrase may be "At VBG, my voice is my unique identity." Note that the passphrase can be the same for everyone or unique to each person, although unique phrases create operational complexity.
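One common way to turn 2-3 repetitions into a single voiceprint is to average their feature vectors and reject the enrollment if any repetition disagrees too much with the rest (for example, the user said a different phrase). This is a minimal sketch of that general technique, not VBG's enrollment algorithm; the vector representation, the `min_consistency` value of 0.9, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity in [-1, 1] between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def enroll(repetitions: list[list[float]], min_consistency: float = 0.9) -> list[float]:
    """Average the repetitions into one voiceprint, checking each one against it."""
    if len(repetitions) < 2:
        raise ValueError("static passphrase enrollment expects at least 2 repetitions")
    dims = len(repetitions[0])
    voiceprint = [sum(rep[i] for rep in repetitions) / len(repetitions)
                  for i in range(dims)]
    for rep in repetitions:
        # A repetition far from the average suggests a wrong phrase or bad audio.
        if cosine_similarity(rep, voiceprint) < min_consistency:
            raise ValueError("inconsistent repetition; ask the user to repeat the phrase")
    return voiceprint


voiceprint = enroll([[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]])
print([round(x, 3) for x in voiceprint])  # [0.9, 0.1]
```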

Most voice biometric vendors provide static text passphrases as their primary offering due to their simplicity. However, a weakness of this approach is the ease with which a third party could record the person speaking and then use the recording to authenticate (what is called a "recorded playback attack"). This is a genuine concern given the quality of today's recording devices and the persistence of a determined fraudster. And although there are techniques to detect "liveness", VBG generally recommends using RandomPIN™ (see below) for applications that are at higher risk of playback attacks.

Active Method 2 - Static Numeric Passphrase

This is identical to the Static Text Passphrase, except that the static phrase is a number. One very common approach for this method is to have end users repeat their mobile phone number three times to create their voiceprint. The number is easy to remember and changes infrequently. Like text-based passphrases, this method is also susceptible to recorded playback attacks. If static passphrases must be used, VBG recommends combining them with an outbound call to the user's mobile phone to help strengthen confidence (i.e., to add a second factor: possession of the mobile phone).

Active Method 3 - RandomPIN™

With the RandomPIN™ use case, we require the end user to speak a set of diverse digit strings in order to enroll. To verify, we then ask the user to repeat a random number, typically 4 or 5 digits in length. A new number is generated for every verify attempt, for every user, for every call. The VBG system can provide the random string (of any length) or you can create your own. More than 60% of our customers choose this approach because of its robustness against fraud, speed of verification, and simplicity for the person speaking. There is nothing for the end user to remember, and numbers are easy to speak for young and old alike, across cultures.
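If you choose to generate the random digit string yourself, the key properties are that it is fresh for every attempt and unpredictable, so a fraudster cannot pre-record the answer. A minimal sketch using Python's standard `secrets` module (a cryptographically strong source, unlike `random`) is below; the function names and the prompt wording are hypothetical, and whether VBG's own generator applies extra rules (such as avoiding repeated digits) is not stated here.

```python
from __future__ import annotations

import secrets


def random_pin(length: int = 5, previous: str | None = None) -> str:
    """Return a fresh digit string for one verify attempt, never repeating `previous`."""
    while True:
        pin = "".join(secrets.choice("0123456789") for _ in range(length))
        if pin != previous:
            return pin


def prompt_text(pin: str) -> str:
    """Space the digits so a text-to-speech prompt reads them one at a time."""
    return "Please say: " + " ".join(pin)


pin = random_pin(4)
print(prompt_text(pin))  # e.g. "Please say: 4 0 7 1" (different on every run)
```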

Passive Methods

A "passive" use case means that the person speaking does not have to speak anything in particular. In fact, there is no prompting for speech at all, and the end user may often be unaware that they are involved in a voice biometric process. For example, you can capture the speech of a caller during their conversation with a call center agent. The recording of the caller's leg of the conversation can then be used to either create a voiceprint, or be compared to one or more voiceprints for verification or identification. Sometimes passive voice biometrics is called "conversational biometrics".

The most common source of passive speech samples is call center recordings. Before considering this approach, please note that many call center systems record both agent and caller into the same (mono) channel. In these situations, you will have additional work to do to capture the audio before it gets to the recording system. VBG has a number of partners who specialize in recording IVR and call center speech, so we may be able to assist you should you not have the ability to record individual callers into a separate channel.

The primary benefit of passive approaches is that you do not need to train end users or ask them to say anything specifically. Many business managers consider this the ideal, "frictionless" user experience for voice biometrics. The downside of passive approaches is that you are relying on whatever the speaker happens to be saying as input for the voice biometric system. In order to create a robust voiceprint using passive approaches, we need to capture all the phonemes of the language the user is speaking, preferably twice. This means more time is needed (compared to active approaches).

As a rule of thumb, end users will typically speak all phonemes of their language roughly twice in a 2-3 minute conversation. The VBG system removes silence and non-speech signals from passive recordings to arrive at a measure called “seconds of usable speech” (or SUS). Enrollment samples with SUS values of 60-90 seconds or more will typically provide very good results. We can create voiceprints with only a few seconds of speech, but they will lack phonetic diversity and will not lead to robust or accurate performance. So, consider 30 seconds of usable speech to be the minimum for a successful passive enrollment (assuming this speech has phonetic diversity). And remember, saying "yes", "no", "umm", and other short, single-syllable words multiple times will not provide good speech diversity.
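The idea behind SUS can be illustrated with a simple energy-based voice activity detector: split the audio into short frames, keep only frames loud enough to be speech, and convert the kept frames back to seconds. This is a minimal sketch, not VBG's method; the 20 ms frame size, the RMS threshold of 0.01 (for samples scaled to [-1, 1]), and the function names are assumptions. Note that it counts any loud audio, so it cannot tell phonetically rich speech from repeated "umm"s.

```python
import math


def seconds_of_usable_speech(samples: list[float], sample_rate: int,
                             frame_ms: int = 20, energy_threshold: float = 0.01) -> float:
    """Count frames whose RMS energy suggests speech and convert them to seconds."""
    frame_len = sample_rate * frame_ms // 1000
    speech_frames = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= energy_threshold:
            speech_frames += 1
    return speech_frames * frame_len / sample_rate


def passive_enrollment_rating(sus: float) -> str:
    """Bucket SUS using the rule of thumb above (30 s minimum, 60-90 s or more very good)."""
    if sus < 30:
        return "insufficient"
    if sus < 60:
        return "meets minimum"
    return "very good"


sr = 8000
silence = [0.0] * sr  # 1 s of silence
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(2 * sr)]  # 2 s stand-in for speech
sus = seconds_of_usable_speech(silence + tone, sr)
print(sus, passive_enrollment_rating(sus))  # 2.0 insufficient
```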

Passive Method - NaturalSpeech™

VBG has created the NaturalSpeech™ method to address all forms of passive speech collection. This use case can accept any speech recording for enrollment or verification, provided that it contains only one speaker and is in a format we accept. Once you have a robust NaturalSpeech™ voiceprint, it is possible to verify with only a few seconds of speech -- a short sentence, phrase, digits, almost anything. And, NaturalSpeech™ allows you to create multiple separate fraudster (i.e., "blacklist") voiceprint databases to determine if fraud is occurring.

Verify or Identify or Both?

By far the most common problem companies solve with voice biometrics is verification (user authentication). Voice verification allows you to take a speech sample from someone claiming a specific identity and compare it to the stored voiceprint for that identity. This is a 1:1 process that is extremely fast. Verification needs only a couple of seconds of speech, and the voice biometric scoring process is sub-second. VBG systems process many millions of verification requests a year.

However, some situations, particularly fraud detection and prevention, require that a speech sample be compared to multiple voiceprints in order to determine the best match. Many scenarios exist for voice-based fraud, such as rampant cheating on language tests, where an individual pays a professional test-taker to take the exam for them. VBG technology can detect the presence of a professional test-taker by analyzing past tests and comparing them with current speech samples.

Far more prevalent is the fraud that occurs in call centers and IVR systems. Some utilities and other organizations with field-based sales personnel have issues where friends or co-workers of a salesperson pretend to be a customer agreeing to purchase services, so that the salesperson can dishonestly earn a commission. VBG systems detect this type of fraud. Within a call center conducting retail sales, the speech of each customer making a purchase can be captured and compared against stored voiceprints, including "blacklists". Is this person claiming to be a new customer actually an existing customer? Or is this person someone who previously defrauded you? VBG systems also detect these types of fraud.

It is important to note that depending on the size of the blacklist, comparing a speech sample to all voiceprints in the database may take significant processing time -- potentially hours. If you suspect that you will require large blacklists and many comparisons per day, VBG can arrange for a customized system to be deployed in our data center -- or can even arrange to put a system in your own data center.

Using Both. Many customers use both verification and identification because they have separate business processes that benefit from each. Some customers even use verify and identify together within a single scenario. Consider the use of "continuous verification" by a prison system using VBG technology. A prisoner is verified throughout the call, every 15-20 seconds, to confirm that he or she has not handed off the phone to another prisoner. If a score change is detected during a verify event, the same speech sample is also identified against the entire database to determine who the phone was given to. There are other similar scenarios; VBG technology is extremely flexible and can be adapted to almost any conceivable situation.
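The continuous-verification flow described above combines both operations: verify each window of speech against the claimed identity, and only when a window fails, run identification to find out who is speaking. This minimal sketch treats "a score change" as a verify score dropping below a fixed threshold, which is one simple interpretation; the feature-vector voiceprints, the threshold, the IDs, and `monitor_call` itself are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity in [-1, 1] between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def monitor_call(windows: list[list[float]], claimed_id: str,
                 voiceprints: dict[str, list[float]],
                 threshold: float) -> list[tuple[int, str]]:
    """Verify each speech window (e.g. one per 15-20 s) against the claimed
    identity; when a window fails, identify it against all voiceprints.
    Returns (window_index, best_matching_id) for every failed window."""
    alerts = []
    for index, window in enumerate(windows):
        if cosine_similarity(window, voiceprints[claimed_id]) >= threshold:
            continue  # still the claimed speaker
        best_id = max(voiceprints,
                      key=lambda sid: cosine_similarity(window, voiceprints[sid]))
        alerts.append((index, best_id))
    return alerts


db = {"inmate_1001": [1.0, 0.0], "inmate_2002": [0.0, 1.0]}
windows = [[0.9, 0.1], [0.95, 0.05], [0.1, 0.9]]  # third window: phone handed off
print(monitor_call(windows, "inmate_1001", db, threshold=0.8))  # [(2, 'inmate_2002')]
```

Running identification only on failed windows keeps the expensive 1:N search off the common path, since most windows pass the cheap 1:1 check.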

What About Languages?

Since voice biometric technology deals exclusively with speech samples, language becomes an important topic. Fortunately, our technology is both text-independent and language-independent. That is, VBG technology can be deployed for any language and for any use case. Our core technology works by measuring how individuals uniquely say things, not by what language they are speaking (or what they are saying). To date, VBG has deployed production applications for customers in many countries, speaking 40 different languages/dialects.

And while we are able to score any language and use case with our core technology, we do perform separate speech collection projects and create separate models for each language and use case pairing. This is done to improve accuracy and to customize engine settings to your particular situation. For example, if you have one application where you are using our RandomPIN™ use case for U.S. English speakers, and another application where you are using a Static Text Passphrase for Latin American Spanish users, you should not expect to have robust performance (low EER) using one model. Separate language-use case models allow us to account for the different phonemes and expected content while creating voiceprints or performing verify or identify comparisons. Not every voice biometrics vendor takes this approach.

In the event VBG does not have your language and use case pre-modeled, we will work with you to capture a large enough sample of your language to set up your language on our voice biometric engine. For each new data collection effort, we like to gather voice samples from 75 males and 75 females. We can of course use more, but this is typically sufficient to help us build accurate models. Other voice biometric companies require substantially more data collection, because their underlying algorithms require more training data, often from a few thousand people.

What About Implementation?

At this point we've explained the situations where voice biometrics makes sense, addressed accuracy expectations, discussed financial considerations and provided a calculator, and provided background on possible use cases. If you are ready to get started, VBG strongly recommends that you participate in either a free trial with our VBG Enterprise™ system, or try our self-serve approach with VBG Pro™. In both cases, we'll give you access to the VBG Hosting System, API access credentials, and access to online development resources. And our team will work with yours to get you up and running quickly!

VBG does not rely on salespeople or slick marketing materials to sell our software. We instead prefer a try-before-you-buy approach where you can prove to yourself that VBG technology performs as advertised, that VBG personnel are easy to work with, and that VBG has a fair and reasonable approach to subscriptions and licensing. We won't ask you for money or to sign a contract until you are 100% convinced that VBG is right for you. In preparing to test VBG technology, there are three general approaches available to you:

  • Send Speech Samples From Your System. If you are already working with IVR and/or call center applications, you can send your speech samples directly to us using our RESTful API. This option is appropriate for companies who wish to embed voice biometrics into their own product or want to include voice biometrics in a business process. In general, we have found that customers with existing speech application developers can have a solid prototype up in hours or days, not weeks or months! Companies using mobile applications for tablets and smartphones can also use the RESTful API.
  • Use Our IVR System. VBG has integrated IVR capabilities, with pre-built IVR dialogs to enroll and verify users. During your trial, you can use inbound and outbound telephone calls in North America. Outside of North America, only inbound telephone calls are possible. Our IVR system is fully integrated into both our RESTful API and our VoiceXML API, so you can choose whatever tools are most appropriate for your developers. And our generic dialogs can be customized for a very reasonable fee should you wish to deploy a test system to outside users.
  • Turn-key Solutions. Finally, if you have little or no development resources available to you, VBG can help pair you with an appropriate development and IT integration services partner. VBG has a growing list of such partners globally and is happy to recommend one. See our Partners Page for more information. Regardless of the partner chosen, VBG will coordinate activities with our partner to design and implement a solution appropriate for you.

There are of course many other considerations when implementing a voice biometrics solution. However, the best way to get started is to prototype with some of the tools described above. There is no cost to you, other than the time to design a prototype and coordinate your development resources. And, you'll have the assurance of knowing exactly what you are getting -- before making any long-term financial commitment.
