“Passwords are inconvenient and insecure, and their end is in sight”
Emilio Martínez, CEO of Agnitio, the company specializing in voice biometric systems, highlights the benefits of "voiceprints" for personal authentication.
How did Agnitio start?
AGNITIO started in 2004 as a spin-off from the Madrid Polytechnic University. A group from the signal processing department had worked for years with the Guardia Civil police force to develop a voice biometric system that would enable them to identify criminals and terrorists from their voices. This system became known to other European police forces and sparked considerable interest. The founding members negotiated a contract with the university for the transfer of the technology, and that's how AGNITIO came about.
What's the profile of the employees?
92% of the employees are engineers or graduates in areas of advanced technology. This is a highly technological company with a very strong R&D and innovation component.
How many countries do you operate in and what sector interests you most or is most interested in your services?
Agnitio has facilities in over 40 countries around the world. The geographical areas where we have the greatest presence are eastern Europe, Latin America and the United States. In the US we have a subsidiary that sells to and supports our customers in North America. With regard to market segments, our presence is above all in the government security sector (police and intelligence corps and defense organizations), followed by the financial sector.
What are the features of biometric voice verification?
When someone speaks, in addition to the information they transmit in the form of words, the unique features of their vocal tract –the larynx, the nasal cavity, the palate and so on– make an imprint on the sound waves. This voiceprint is independent of the language the person is speaking, the phrases they're saying, or the individual's state of mind. Our technology is capable of extracting this information and storing it in numerical form as a “voiceprint”. When an unknown person is speaking on the phone, for example, that voice can be compared with the stored voiceprint, which allows the speaker to be identified with an extremely high degree of accuracy.
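As a rough illustration of the comparison step described above, the sketch below treats a voiceprint as a fixed-length list of numbers and scores two voiceprints with cosine similarity. The vector values, the function names `cosine_similarity` and `verify_speaker`, and the 0.8 threshold are all assumptions made for illustration; the interview does not say how Agnitio's software represents or scores voiceprints.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.8):
    """Accept the claimed identity if the score reaches the (hypothetical) threshold."""
    return cosine_similarity(enrolled, sample) >= threshold


# Made-up voiceprints; real ones would be extracted from audio by signal-processing software.
enrolled_print = [0.9, 0.1, 0.4, 0.7]
same_speaker = [0.85, 0.15, 0.35, 0.75]
other_speaker = [0.1, 0.9, 0.8, 0.05]

print(verify_speaker(enrolled_print, same_speaker))   # similar vector, accepted
print(verify_speaker(enrolled_print, other_speaker))  # dissimilar vector, rejected
```

In a sketch like this the threshold sets the trade-off between false accepts and false rejects, so a deployment would presumably tune it to the risk of the transaction being protected.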
Do they use complex and expensive tools?
The system is purely software. In other words, the voice can be recorded with any microphone and there's no need for any special IT processing system. Normal servers and computers are used. What's complex is the software for processing the signal –highly complex mathematical algorithms efficiently programmed to respond in fractions of a second with today's processors. The cost depends on the size of the project, measured in number of voiceprints that are to be used and/or the number of parallel processes that our technology is going to use. These systems are not accessible to individual consumers because of their cost, but are acquired and implemented by government organizations and financial institutions.
How has the technology advanced in this field?
The study of voice biometrics began in the 1990s, but it wasn't until the beginning of the 21st century that we saw the first systems that could be used efficiently. The UPM delivered the prototype of our forensic identification system to the Guardia Civil in 2000. Since then several pioneering research groups –of which we are proud to be one– have gradually improved and modified the algorithms to obtain better results. Our products are now in the fourth generation of the technology.
Along with the improvements in accuracy, the calculations can now be done much faster and with fewer resources. Our earliest call-monitoring systems with voice biometrics required fifty times more processors than they do today.
What advantages does it have compared to other identification methods (fingerprints, iris…)?
The various biometric models are not in competition, but complement each other in a multi-factor authentication environment. When we come near a person we use all our “sensors” to identify them: their faces, their way of walking, their voice and so on. In the same way, in the future people will be automatically identified using all the possibilities available on the devices. Sometimes some will be more important than others. Voice is very important in situations of remote authentication when you can't see or "touch" the person –in telephone communications for example.
Are they more efficient?
The sensors needed for voice authentication are microphones, and today we all carry these around in our pocket (in our cellphones) or have them on our computer. In terms of cost this makes it much more efficient than fingerprint or iris recognition, which need a special sensor requiring special care. And it's also the most natural system. Everyone's used to talking, to speaking on the phone. It's not something that's considered intrusive, unlike taking a photo of your eye or taking your fingerprint.
What's more, we're increasingly using our voices to interact with our surroundings. In the future we'll communicate with a whole range of devices in our cars and homes and at work, by using our voices to give commands and obtain information. Using that same voice as a way of authenticating ourselves will become more natural than using any other biometric model. We'll identify ourselves with the same phrases we use when we talk with virtual assistants.
What can it provide for a bank?
Voice biometric applications in the world of finance are only just beginning. Perhaps the application with the best return on investment today is fraud detection in call centers. “Blacklists” of voices of people who have committed fraud can be used to filter incoming calls. This is having a massive effect on preventing the proliferation of organized gangs who used the call center to commit fraud or obtain information with which to commit fraud through other channels.
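The blacklist filtering mentioned above can be pictured as a one-to-many comparison: the caller's voiceprint is scored against every stored fraudster voiceprint and the call is flagged if any score is high enough. The following is only a sketch under assumed details; the case IDs, vectors, cosine scoring and 0.9 threshold are hypothetical and not taken from any real product.

```python
import math


def score(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def screen_call(caller_print, blacklist, threshold=0.9):
    """Return the ID of the best-matching blacklisted voiceprint,
    or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for fraudster_id, fraud_print in blacklist.items():
        s = score(caller_print, fraud_print)
        if s >= best_score:
            best_id, best_score = fraudster_id, s
    return best_id


# Hypothetical blacklist of voiceprints from earlier fraud cases.
blacklist = {
    "case-001": [0.2, 0.9, 0.1],
    "case-002": [0.7, 0.1, 0.7],
}

suspicious_caller = [0.68, 0.12, 0.72]  # close to case-002
ordinary_caller = [0.1, 0.2, 0.95]      # not close to any entry

print(screen_call(suspicious_caller, blacklist))  # flagged with a case ID
print(screen_call(ordinary_caller, blacklist))    # None: call proceeds normally
```

A flagged call would most likely be routed to a fraud team for review rather than blocked outright, since any threshold leaves some chance of a false match.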
Voice authentication also reduces fraud and makes the user experience on the telephone or mobile channel much faster and more user-friendly.
Multichannel systems can be deployed that combine authentication across the bank's different channels: telephone, mobile, Internet, video-chat, social networks and so on. Voice can also be used to sign documents. Customers can use their voice to accept a contract, and this voice converted into a voiceprint can be used as the “signature” for the document. This way a customer can open a bank account using only the Internet and/or telephone channel, without having to go to the branch office.
Of course many of these advances must pass through regulatory filters and be accompanied by other security and control systems.
When the customer is physically present, a handwritten signature is normally used as authentication when making a transaction. In many cases this includes comparing an identity document with a photograph. These procedures also have their weak points when the comparison is done by people who are not expert in handwriting or in document recognition. As far as I know there's no comparison of false positives between physical presence systems and biometric models, but I'm sure there would be no surprises, particularly in the case of very well planned attacks.
Is it very difficult to copy a voice?
Voices can be recorded and reproduced later in an attempt to imitate the legitimate user. This is why the measures to prevent these recorded attacks (anti-spoofing in biometric jargon) are very important in any implementation that does not involve an agent speaking to the person on a direct line. Agnitio has several patented systems that guard against these attacks using very sophisticated technology that detects the recording/reproduction cycle and compares it to the direct use of the voice.
It is impossible today to generate a voice which so closely resembles someone's voice that it can deceive the system. Both professional imitators and voice synthesis systems can easily be detected by our software.
Do you believe that man-in-the-middle attacks are more dangerous for voice systems, and is it easier to obtain a voice sample than one from another part of the body?
MITM attacks are normally related to encrypted communication between two systems. In voice biometrics this occurs when the communication takes place via the IP network and not the telephone channel. In these cases, protecting the voiceprint by correctly encrypting it is an additional problem to be resolved by systems integrators, to avoid the possibility of these voiceprints (and fingerprint or facial biometrics) being intercepted and changed. This is why voice biometrics –in particular encrypted voiceprints (not the voice itself)– is neither more nor less likely to suffer a MITM attack.
However, the voice is often used as authentication via a different channel (out-of-band authentication). Making a brief telephone call to a previously agreed phone number and asking the user to say a certain phrase is the most convenient and secure way of performing out-of-band authentication.
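The out-of-band flow described above can be sketched as two checks made during the call: did the user say the requested phrase, and did the voice match the enrolled voiceprint? Everything in the example is an assumption for illustration: the word list, the function names, the 0.8 threshold, and the idea that a speech recognizer supplies `spoken_text` while a voice biometric engine supplies `voice_score`.

```python
import secrets

# Hypothetical vocabulary for building challenge phrases.
WORDS = ["amber", "river", "seven", "garden", "copper", "window", "maple", "orbit"]


def make_challenge(num_words=4):
    """Build a random phrase for the user to say during the call.
    A fresh phrase per call makes a stored recording less useful to an attacker."""
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))


def approve_transaction(challenge, spoken_text, voice_score, threshold=0.8):
    """Approve only if the expected phrase was spoken AND the voice matched."""
    said_phrase = spoken_text.strip().lower() == challenge.lower()
    return said_phrase and voice_score >= threshold


challenge = make_challenge()
print("Please say:", challenge)

# Simulated results from the (hypothetical) recognizer and biometric engine.
print(approve_transaction(challenge, challenge, voice_score=0.93))      # phrase and voice OK
print(approve_transaction(challenge, "wrong words", voice_score=0.93))  # wrong phrase
print(approve_transaction(challenge, challenge, voice_score=0.40))      # voice mismatch
```

Requiring both conditions means that neither knowing the phrase nor sounding similar is enough on its own.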
Will biometric systems do away with passwords?
I'm convinced that the end of passwords is in sight, at least as a basic authentication tool. Companies, banks and online retailers can't go on torturing their customers by making them remember secure passwords that need to be frequently changed. Today a user can have as many as 60 passwords, with 20 being the average in a European country. Some studies indicate that ten of them are used on a daily basis. And of course to keep them safe they have to be long, contain uppercase and lowercase letters and symbols, and be frequently changed! That's impossible to maintain.
There's no doubt that biometrics is one of the ways in which this problem is going to be resolved. It will be assisted by other security technologies to ensure that the entire process is secure and that the level of security is proportionate to the information it's protecting. I'm convinced that sooner or later we'll have just a few very secure passwords that we won't need to use very often, and which will only be used as backup systems in extreme cases.
What are the advantages of biometric verification by voice compared to passwords?
Passwords are impossible to remember when they're even remotely secure. And they're also easy to steal or find out. They have the worst of both worlds: they're inconvenient and not very secure. Multimodal biometric verification is both convenient for the user and extremely secure. It can be flexibly adapted to the environment.
Which country is the most advanced?
The use of voice biometrics in governments is more advanced in Europe than in the United States for several historical reasons. However, the implementation in the financial sector is more advanced in the United States.
What are the next challenges for Agnitio?
The most important challenge for any company at the forefront of a technology sector is to stay ahead –like the Red Queen in Through the Looking-Glass, who had to keep running to stay in the same place. In other words, the challenge is to continue researching new algorithms in order to improve the accuracy, speed and size of the systems. Agnitio is now in the fourth generation of voice identification technology. We're working on the fifth generation, which will be much more robust against environmental noise, offer spectacular improvements in accuracy on short speech, and be able to learn to recognize users very quickly over time. These are some of the requirements for authentication in the cloud services that we'll be seeing on a massive scale in the coming years.