Voice over IP (VoIP) - high level view

Voice over IP (VoIP)

VoIP is an end-to-end architecture [RFC 1958] that exploits processing in the end points, unlike the traditional Public Switched Telephone Network (PSTN), where processing is done inside the network.

IP cloud connecting a Cellular IP Terminal and a Fixed IP Terminal with a VoIP Server attached to the cloud
VoIP architecture: IP end-to-end

 

Cellular IP terminal:

  • CODEC
  • IP stack
  • radio

VoIP server:

  • call/session
  • routing
  • transcoding

Fixed IP terminal:

  • CODEC
  • IP stack
  • Ethernet
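
The division of labour listed above (CODECs in the terminals, transcoding in the server) can be sketched as a small decision function. This is only an illustration of the idea, not how any particular VoIP stack does it: the function name `plan_media_path`, its return format, and the codec lists are assumptions made for this example. In deployed systems the negotiation is typically carried out by a signalling protocol rather than a direct function call.

```python
def plan_media_path(caller_codecs, callee_codecs, server_codecs=()):
    """Decide how two terminals can exchange media.

    caller_codecs / callee_codecs: codec names in each terminal's preference order.
    server_codecs: codecs an optional transcoding server understands (hypothetical).
    Returns a tuple (mode, caller_side_codec, callee_side_codec).
    """
    callee = set(callee_codecs)
    # End-to-end case: the first codec the caller prefers that the callee also handles.
    for codec in caller_codecs:
        if codec in callee:
            return ("direct", codec, codec)

    # No common codec: see whether a server can translate between the two formats.
    server = set(server_codecs)
    caller_side = next((c for c in caller_codecs if c in server), None)
    callee_side = next((c for c in callee_codecs if c in server), None)
    if caller_side is not None and callee_side is not None:
        return ("transcode", caller_side, callee_side)

    return ("fail", None, None)


# Illustrative codec lists (assumptions, not taken from the slide).
print(plan_media_path(["AMR", "PCMU"], ["G.722", "PCMU"]))
print(plan_media_path(["AMR"], ["G.722"], server_codecs=["AMR", "G.722", "PCMU"]))
print(plan_media_path(["AMR"], ["G.722"]))
```

The first call finds a codec both terminals share, so no server is involved. The second has no common codec and falls back to the transcoding server. The third has neither, so the call cannot carry media.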

 

Network Convergence:

In the past, there were many different networks, each optimized for a specific use: POTS, data networks (such as X.25), broadcast radio and television, … (and each of these in turn often had specific national, regional, or proprietary implementations). Now we think about a converged network, which is a global network.

Slide Notes

B. Carpenter, ‘Architectural Principles of the Internet’, Internet Request for Comments, RFC 1958 (Informational), June 1996. Available at http://www.rfc-editor.org/rfc/rfc1958.txt


Transcript

[slide21] But fundamentally, here's the picture of what's going on, and why it works. It's an end-to-end architecture. Now, why is that an important change? Well, in traditional telephony systems, it's a centralized architecture with a single clock. All control happens in the switches. All the timing is done off that single clock. The result is, it takes a very, very long time to change anything about traditional telephony systems, because you have to change the whole infrastructure. In this, the advantage is, we have the smarts at the ends. So each of the devices, the terminals, chooses its own CODEC. And as long as the communicating parties choose CODECs that each of them is able to handle, it doesn't matter to anyone else what CODECs they use. They, of course, send IP packets, they can use whatever media that they want, and deliver them to the other party.
That doesn't mean we can't add extra infrastructure. So we can add servers to do things like transcoding. And today there are systems that will translate in real time between, for instance, English and Spanish, English and Japanese, and bunches of other languages. And Microsoft has a new system that they've just recently released, doing translations in something like 60 different languages. So, very, very powerful utility. And, also transcoding. And transcoding means we can take the output of one CODEC, and we can transcode the media into the format wanted by the destination CODEC.
Now why is this so powerful? That the CODECs are at the ends, and the end devices are doing all the work. So, the first young lady here in the first row. Why is this so powerful? Okay. Why? No, why have the CODECs in the endpoints, not in the switches? [student answers] That's right. In particular, we can make it so individualized, as to have a personal CODEC. And I've been, for years, trying to get someone to implement this.
But, the idea is that, of course, traditionally, if we think about information theory, what's the relation between the information that can be transferred from one party to another, versus the channel capacity? What's the relation that ties those things together? Anyone know? Shannon, right, has told us about the information capacity of a channel. He has given us an equation that you can use. So, if you know the channel has a certain capacity, you can figure out how many bits per unit time can be transported across that channel. But that assumes that the parties don't have any pre-shared information. The simplest example that I can give is, you see this cute guy, girl, whatever, across the room, and you wink at them. One bit of information. But think how powerful that is in what you've communicated. So the idea of a personal CODEC is, the CODEC recognizes, for instance, my voice, encodes it, and now sends it to someone else who I've given my personal decoder to. And so now, when it receives the coded voice, what does it do? It synthesizes my voice using the whole set of information that it has in its synthesizer. So that means we can have 64 kilobits or 192 kilobits per second of 16-bit linear encoded audio that gets coded down to perhaps a few hundred bits per second and gets delivered at the other end with very high fidelity. It doesn't break Shannon's Law. Why not? Next gentleman. Why doesn't it break Shannon's Law? [student answers] That's right, because we only have to say which thing in the dictionary I already gave you to play out. That's all we actually have to send across the channel. That's limited by Shannon's Law. The fact that we gave them a quarter of a gigabyte worth of data previously in the dictionary, well, that just gives them a better base to work on. But the essential thing is these are both computers at the ends.
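
The argument above can be checked with a little arithmetic. The equation referred to is the Shannon-Hartley capacity, C = B · log2(1 + S/N), where B is the channel bandwidth in Hz and S/N is the linear signal-to-noise ratio. The sketch below compares that limit with the bit rate needed to send only dictionary indices. The 3100 Hz bandwidth, 30 dB signal-to-noise ratio, 65,536-entry dictionary, and 15 speech units per second are all assumptions chosen for illustration and do not come from the lecture.

```python
import math


def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley channel capacity in bit/s: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)


def index_bitrate(dictionary_entries, units_per_second):
    """Bit/s needed to name one entry of a pre-shared dictionary per speech unit."""
    bits_per_index = math.ceil(math.log2(dictionary_entries))
    return bits_per_index * units_per_second


# Assumed channel: 3100 Hz of bandwidth at 30 dB S/N (a linear ratio of 1000).
capacity = shannon_capacity(3100, 10 ** (30 / 10))

# Assumed personal CODEC: 65,536 dictionary entries, 15 speech units per second.
personal = index_bitrate(65536, 15)

raw_audio = 64_000  # the 64 kbit/s figure quoted in the transcript

print(f"channel capacity  : {capacity:,.0f} bit/s")
print(f"dictionary indices: {personal} bit/s")
print(f"reduction factor  : {raw_audio / personal:.0f}x")
```

Only the indices cross the channel, so the transmitted rate stays far below the capacity bound. The dictionary itself was delivered beforehand and does not count against the channel during the call, which is why the scheme is consistent with Shannon's result.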
Because we actually have active processing at the ends, we can do much more clever things than we can do with a traditional analog telephony handset, which has no intelligence, no processing in it whatsoever. And that processing on the edges suddenly boosts the set of things we can do. Now, in a traditional network, all the processing happens inside the network, inside the switches. So it would take years to roll out a new feature in the switches. How long does it take to roll out a new feature in a voice over IP system? The time it takes you to program it and give it to your friends, right? So the result is it can spread at enormous speed. It very fundamentally changes the kinds of services we can have. It's led to network convergence. In the past, what happened is we built a different network for every different service we wanted. We had a telegraphy network, we had a voice network, we had a broadcast radio network, a broadcast TV network, etc. And what's happening? It's all merging today into a converged network, which is actually a global network. So very, very big change.