Real-time multimedia data is expensive to transport and must be received in a timely manner, with utmost importance placed on preserving the linear progression of sequential timed data at the user's end.
Two major obstacles stand in the way of high-fidelity, real-time multimedia data transport over an internet: limited available network bandwidth and dynamically varying transmission delays. Understandably, the larger problem of the two is the high bandwidth requirement of real-time audio and video. By way of example, a full-motion video and audio stream with television resolution (512x512 pixels) at 256 colors per pixel (8 bits per pixel) and 30 frames per second, and with 8-bit, 22.05 kHz sampled audio, has the following bandwidth requirement for the video signal (EQ 4.1):

512 pixels x 512 pixels x 8 bits/pixel x 30 frames/s = 62,914,560 bits/s
and for the audio signal (EQ 4.2):

8 bits/sample x 22,050 samples/s = 176,400 bits/s
These yield a total far greater than anything we would want to transfer over the Internet: 63,090,960 bits/second, more than six conventional (thinnet) Ethernets running in parallel. A more realistic data stream is that of a conventional teleconferencing system: 160x120 video at 6-bit gray scale and 10 frames per second, with better-than-telephone-quality 8-bit, 11.025 kHz audio, for a total required bandwidth of just over 1 megabit/s. Steep, but not unreasonable on a LAN. If transmission over the actual Internet is required, some amount of compression will be necessary to bring the data stream down to a low enough rate.
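The arithmetic behind these figures is simple enough to sketch directly; the helper names below are purely illustrative:

```python
def video_bps(width, height, bits_per_pixel, frames_per_sec):
    """Uncompressed video bandwidth in bits per second."""
    return width * height * bits_per_pixel * frames_per_sec

def audio_bps(bits_per_sample, sample_rate_hz):
    """Uncompressed (mono) audio bandwidth in bits per second."""
    return bits_per_sample * sample_rate_hz

# Television-resolution stream from the example above:
tv = video_bps(512, 512, 8, 30) + audio_bps(8, 22050)
print(tv)    # 63090960 bits/s

# Conventional teleconferencing stream:
conf = video_bps(160, 120, 6, 10) + audio_bps(8, 11025)
print(conf)  # 1240200 bits/s -- just over 1 megabit/s
```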
Once the difficulties associated with bandwidth limitations have been overcome, any good transmission system must deal with the finite, and possibly dynamic, delay between sender and receiver. In the case of the most familiar multimedia data stream, the television signal, engineers needed only to worry about a small, fixed delay due to the distance between transmitter and television set, across which the signal propagates as an electromagnetic wave at the speed of light (as in Figure 4.1).
For a modern packet-switching internetwork, the delay is not fixed; it varies dynamically with the bandwidth utilization the network is currently experiencing (as well as with other variables). When networks become congested, extreme delays many orders of magnitude greater than the expected average can result. Over a conventional network connection, the ideal, timely delivered multimedia data stream (depicted in Figure 4.2 as a set of images comprising a movie of a bouncing ball) can undergo delays that distort the presentation of timed sequential data beyond what is acceptable (Figure 4.3).
The motion depicted by the images in Figure 4.3 defies our common sense about what happens when a ball bounces, and is different from Figure 4.4, in which we close our eyes for the duration of one frame. In the second case, the world, with its linear time, keeps moving while we ignore it; in the first, time stops, then picks up again where it left off!
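The ``close our eyes'' behavior can be sketched as a playout policy that drops late frames rather than stalling, keeping presentation locked to linear time. The function and test data here are hypothetical, not part of any system described in this work:

```python
def frames_to_show(due_times, arrival_times):
    """Given each frame's intended presentation time and actual arrival
    time, return what a skip-ahead player presents: a frame arriving
    after its due time is dropped (eyes closed for one frame) rather
    than shown late (time stopping and resuming where it left off)."""
    shown = []
    for due, arrived in zip(due_times, arrival_times):
        if arrived <= due:   # frame is on time: present it
            shown.append(due)
        # else: drop it and stay locked to linear time
    return shown

# Frames due every 100 ms; the third frame arrives 150 ms late.
due     = [0.0, 0.1, 0.2, 0.3, 0.4]
arrived = [0.0, 0.1, 0.35, 0.3, 0.4]
print(frames_to_show(due, arrived))  # [0.0, 0.1, 0.3, 0.4]
```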
For this reason, multimedia streaming was developed to overcome, or at least temporarily stave off, the effects of varying transmission delays.
Multimedia streaming works by buffering a good amount of the data before presenting it to the user, much like a typical household water heater, which heats a cache of water before anybody actually needs it rather than heating it on demand. Figure 4.5 illustrates the concept of data flow in a multimedia stream through this ``water heater'' analogy. The rate of data output is independent of the input rate, as long as there is enough data (or water) in the cache to source the required amount of output. If the input rate begins to lag behind the output rate, eventually there will not be enough data in the cache to support the high output rate, and our stream runs dry (and we don't have any hot water!).
Depending on the speed of the network that will be used to transport multimedia data, streaming applications may buffer anywhere from a few seconds to a few minutes of data. The stream is a reservoir, providing an illusion of continuous delivery rather than the actual delivery rate the network can provide. Designers of streaming applications hope that the reservoir is never emptied because the output rate exceeded the input rate; they wish the reservoir to run dry only when the end of the multimedia stream has been reached.
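The reservoir behavior described above can be sketched as a small discrete-time simulation; the function name, tick model, and rates are illustrative assumptions, not measurements from any real system:

```python
def simulate_stream(input_rates, output_rate, prebuffer):
    """Simulate a playout buffer ('reservoir') over discrete ticks.
    input_rates: units of data arriving from the network each tick.
    output_rate: units consumed for presentation each tick.
    prebuffer:   units buffered before playback begins.
    Returns the tick at which the reservoir runs dry, or None if
    playback survives to the end of the stream."""
    level = prebuffer
    for tick, arrived in enumerate(input_rates):
        level += arrived
        if level < output_rate:   # not enough data to source the output
            return tick           # the stream runs dry here
        level -= output_rate
    return None

# The network delivers 10 units/tick but stalls for three ticks
# mid-stream; playback consumes a steady 10 units/tick.
arrivals = [10, 10, 0, 0, 0, 10, 10, 10]
print(simulate_stream(arrivals, output_rate=10, prebuffer=20))  # 4
print(simulate_stream(arrivals, output_rate=10, prebuffer=30))  # None
```

With 20 units prebuffered the three-tick stall exhausts the reservoir; 30 units ride it out, which is exactly the trade-off behind buffering seconds versus minutes of data.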
Although this method of multimedia data presentation offers high-fidelity output at the user's end, it tends to utilize quite a bit more of the available network bandwidth than is acceptable for a ``nicely behaved'' application. If the streaming protocol is built on top of a fully reliable protocol such as TCP, segments of data lost to congestion will be retransmitted until they have been properly acknowledged, adding to the congestion of the network. What's worse, if difficulty is encountered in transmitting a particular segment, the protocol will not attempt to bypass it, and the transmission will become stuck on that segment. Often these difficulties deplete the multimedia reservoir, forcing the incoming network data to source the presented data stream entirely.
Reliable data transmission protocols such as TCP were not designed to handle the special requirements of real-time applications. They were designed for low bandwidth interactive applications such as telnet and potentially high bandwidth non-interactive applications such as electronic mail handling and ftp. The best-effort transport services inherent to UDP are more suited to delivering multimedia data payloads.
The User Datagram Protocol is becoming an important player in the realm of multimedia protocols. Because it is essentially an interface to the lower-level Internet Protocol, and because it offers a speedy checksum and I/O multiplexing through Berkeley sockets, it is an ideal choice for applications that do not wish to be constrained by TCP's flow control mechanism. However, operating without any flow control in place will quickly fill the local socket-level buffers, and UDP datagrams will be discarded before they even reach the physical network.
Work performed by graduate student Brian Hazzard and myself at WPI, under contract from Lockheed Martin, led to the creation of a network testing system with a quickly discovered ability to overload local buffers. We found that, when operating with no flow control save for an adjustable time delay between datagrams, exceeding an experimentally determinable maximum transmission rate degraded network throughput and loaded the CPU tremendously; in effect, the application could create a much worse performance situation by sending out data ``as fast as it could'' than by limiting itself to a maximum transmission rate. An excerpt from Hazzard's Evaluation of Optimal Socket Buffer Sizes for the Internet Protocol Suite (draft) illustrates this experimentally derived fact:
... applications  that generate datagrams faster than the kernel can handle the data, result in poor utilization of CPU time and degradations in performance will be observed, resulting in larger Round Trip Times (RTTs) [in the case of our RTT measuring network testing system] and slower network throughputs.
In effect, protocols such as TCP, with their highly refined flow control mechanisms, attempt to home in dynamically on that optimum transmission rate through the feedback loop formed by data transmission and subsequent data acknowledgment. Given this observation, operation without flow control is out of the question for high-bandwidth multimedia applications.
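The adjustable-delay scheme mentioned above, a fixed sleep between datagrams to cap the transmission rate, can be sketched with Berkeley sockets as follows. The destination address, datagram size, and rate ceiling are hypothetical examples, not parameters from the testing system itself:

```python
import socket
import time

def paced_send(data, dest, datagram_size=1024, max_bps=1_000_000):
    """Send `data` as UDP datagrams, sleeping between sends so the
    transmission rate never exceeds `max_bps`. Without this pacing, an
    application sending 'as fast as it can' overruns the local
    socket-level buffers and datagrams are dropped before they ever
    reach the physical network."""
    delay = (datagram_size * 8) / max_bps   # seconds per datagram
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for off in range(0, len(data), datagram_size):
            sock.sendto(data[off:off + datagram_size], dest)
            time.sleep(delay)               # crude open-loop flow control
    finally:
        sock.close()

# Hypothetical destination; 1 Mbit/s ceiling.
paced_send(b"x" * 8192, ("127.0.0.1", 9999), max_bps=1_000_000)
```

Unlike TCP's feedback loop, this open-loop pacing cannot adapt to changing network conditions; the rate must be found experimentally, which is precisely the limitation the observation above points at.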
A last important consideration for the transport of multimedia data remains, one that is unique to this project. With earlier protocol efforts, all data was assumed to be important and therefore was delivered reliably; there would be very little chance that a transferred file would be acceptable if corrupt or incomplete in any way. With real-time multimedia data, the primary concern is preserving the linear presentation of timed sequential data. Other aspects of the end product's quality will be allowed to suffer, as long as the remaining data is delivered so as not to disturb the linear time of the presentation.