Speaker recognition in a conference
Abstract:
A system and method for identifying a participant during a conference call include the capability to receive a packet containing data that represents audible sounds spoken by one of a plurality of participants in a conference call and to determine a speaker of the audible sounds using voice profile information of the participants. The system and method further include the capability to provide identification information of the speaker to the other participants in the conference call contemporaneously with providing audible sounds based on the data to those participants.
Shmuel Shaffer and Michael E. Knappe, US patent 6,853,716.
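The identification step in the abstract (receive a packet, determine the speaker from voice profile information, then deliver the speaker's identity alongside the audio) can be illustrated with a minimal sketch. This is not the patent's method: the feature vectors, the cosine-similarity matching, the 0.8 threshold, and every name below (VoiceProfile, identify_speaker, annotate_packet) are assumptions chosen only to make the flow concrete.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceProfile:
    """Hypothetical enrolled profile: a participant name plus a feature vector."""
    name: str
    features: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(packet_features: List[float],
                     profiles: List[VoiceProfile],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching participant, or None if no profile reaches the threshold.

    The threshold value is an illustrative assumption, not a recommended setting.
    """
    best_name, best_score = None, threshold
    for profile in profiles:
        score = cosine_similarity(packet_features, profile.features)
        if score >= best_score:
            best_name, best_score = profile.name, score
    return best_name


def annotate_packet(audio_payload: bytes,
                    packet_features: List[float],
                    profiles: List[VoiceProfile]) -> dict:
    """Attach the speaker's identity to the audio so both reach listeners together."""
    speaker = identify_speaker(packet_features, profiles)
    return {"audio": audio_payload, "speaker": speaker or "unknown"}


if __name__ == "__main__":
    participants = [
        VoiceProfile("alice", [1.0, 0.0, 0.0]),
        VoiceProfile("bob", [0.0, 1.0, 0.0]),
    ]
    print(annotate_packet(b"\x00\x01", [0.9, 0.1, 0.0], participants)["speaker"])
```

A real system would extract the features from the decoded audio with a trained speaker model. Here they are passed in directly to keep the sketch self-contained.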
Slide Notes
Shmuel Shaffer and Michael E. Knappe, “System and method for identifying a participant during a conference call”, Assignee: Cisco Technology, Inc. (San Jose, CA), United States Patent 6,853,716, February 8, 2005, Filed: April 16, 2001.
Transcript
[slide451] I mentioned the system for speaker recognition, and this is the information on the patent; you can read about it if you're interested. But speaker recognition is also powerful in a number of other settings.

I had a thesis some time ago that looked at speaker recognition in a mobile device in order to do user authentication. Why is this an interesting thing? Well, how many people typically use a given handset? A fairly small number of people, right? One or two people, maybe a slightly bigger number in a family. So what's the advantage of using speaker recognition to recognize the user? It means that as soon as they pick the device up and start talking, we can authenticate the user. And if they put it down and someone else picks it up, will it behave as their device? No, because it will say, no, you're not my user.

Mark Smith, who's responsible for the chip that's used in lots and lots of these optical mice, said some time ago, hey, why not use the camera in a phone, so that you recognize the user when they pick it up, and you can say, no glasses, no beard, no service. Why is this so powerful? Because it means the user doesn't have to remember a PIN code and put it in. It makes the user experience much better, and it increases the security.

So why do we want to have speaker recognition in a conference? Because quite often we're in conferences where we don't know all of the other people, so we don't recognize the voices. In fact, in many cases, and I know this is true in my case, in a foreign language it may be very, very hard to tell people apart, because I'm not used to the differences in the accents of the particular speakers, so I can't discriminate between them. And so the result is, OK, now, who said that? As you get to know people better and better, you'll recognize them by their distinctive speech. But initially, this is incredibly helpful.