OpenTNL - Torque Network Library Design Fundamentals

Torque Network Library Design Fundamentals

The Torque Network Library was designed to overcome, as much as possible, the three fundamental limitations of network programming - high packet latency, limited bandwidth and packet loss. The following strategies were identified for each of the limitations:

Bandwidth conservation

Bandwidth conservation in multiuser simulations is of premium importance, not only for clients that may be connected over very low bandwidth transports, but also for servers whose operators have to pay for bandwidth usage. Realizing this, the following techniques are used to conserve bandwidth as much as possible:

Send static data once, or not at all: Often in a networked application, a server will transmit some amount of static data to a client. This data may be initialization parameters for a 3D world, a client's name, or some object state that is immutable for objects of a given type. TNL provides direct facilities for caching string data (client names, mission descriptions, etc), sending a given string only once and an integer representation thereafter. The Torque Game Engine also shows how simulations can cache common object instance data in DataBlock objects, which are transmitted only when a client connects.

Compress data to the minimum space necessary: When conserving bandwidth, every bit counts. TNL uses a utility class call BitStream to write standard data types into a packet compressed to the minimum number of bits necessary for that data. Boolean values are written as a single bit, integer writes can specify the bit field width, and floating point values can be specified as 0 to 1 compressed to a specified bit count. The BitStream also implements Huffman compression of string data and compression of 3D positions and surface normals.

Only send information that is relevant to the client: In a client/server simulation, the server often has information that is not relevant at all to some or all of the clients. For example, in a 3D world, if an object is outside the visible distance of a given client, there is no reason to consume valuable packet space describing that object to the client. The TNL allows the application level code to easily specify which objects are relevant, or "in scope" for each client.

See also:: NetObject

Prioritize object updates: Because bandwidth is limited and there is generally a much greater amount of data a server could send to a given client than it has capacity for, the TNL implements a very fine-grained prioritization scheme for updating objects. User code can determine the policy by which objects are judged to be more or less "important" to each client in the simulation, and objects with more importance are updated with a greater frequency.

See also:: NetObject,
GhostConnection

Only update the parts of an object that have changed: Often in a networked simulation, not all object state is updated at the same time - for exampe, a player in a 3D action game may have state that includes the player's current position, velocity, health and ammunition. If the player moves, only the position and velocity states will change, so sending the full state, including the health and ammunition values would waste space unnecessarily. The TNL allows objects individual objects to have state that are updated independently of one another.

See also:: NetObject

Coping with Packet Loss

In any networked simulation, packet loss will be a consideration - whether because of network congestion, hardware failure or software defects, some packets will inevitably be lost. One solution to the packet loss problem is to use a guaranteed messaging protocol like TCP/IP. Unfortunately, TCP has some behavioral characteristics that make it a poor choice for real-time networked simulations.

TCP guarantees that all data sent over the network will arrive, and will arrive in order. This means that if a data packet sent using TCP is dropped in transit, the sending host must retransmit that data before any additional data, that may have already arrived at the remote host, can be processed. In practice this can mean a complete stall of ANY communications for several seconds. Also, TCP may be retransmitting data that is not important from the point of view of the simulation - holding up data that is.

The other possible protocol choice would be to use UDP for time critical, but unguaranteed data, and use TCP only for data that will not hold up the real-time aspects of the simulation. This solution ends up being non-optimal for several reasons. First, maintaining two communications channels increases the complexity of the networking component. If an object is sent to a client using the guaranteed channel, unguaranteed messages sent to that object at a later time may arrive for processing before the original guaranteed send. Also, because the unguaranteed channel may lose information, the server will have to send more redundant data, with greater frequency.

To solve these problems, TNL implements a new network protocol that fits somewhere between UDP and TCP in its feature set. This protocol, dubbed the "Notify" protocol, does not attempt to hide the underlying unreliability of the network as TCP does, but at the same time it provides more information to the application than straight UDP. The notify protocol is a connection-oriented unreliable communication protocol with packet delivery notification. When a datagram packet is sent from one process, that process will eventually be notified as to whether that datagram was received or not. Each data packet is sent with a packet header that includes acknowledgement information about packets the sending process has received from the remote process. This negates the need for seperate acknowledgement packets, thereby conserving additional bandwidth.

See also:: NetConnection

The notify protocol foundation allows the TNL to provide a rich set of data transmission policies. Rather than simply grouping data as either guaranteed or unguaranteed, the TNL allows at least five different categorizations of data:

Guaranteed Ordered data: Guaranteed Ordered data are data that would be sent using a guaranteed delivery protocol like TCP. Messages indicating clients joining a simulation, text messages between clients, and many other types of information would fall into this category. In TNL, Guaranteed Ordered data are sent using Event objects and RPC method calls. When the notify protocol determines that a packet containing guaranteed ordered data was lost, it requeues the data to be sent in a future packet.

See also:: NetEvent

Guaranteed data: TNL processes Guaranteed data is in a way similar to Guaranteed Ordered data, with the only difference being that a client can process Guaranteed data as soon as it arrives, rather than waiting for any ordered data to arrive that was dropped.

See also:: NetEvent

Unguaranteed data: Unguaranteed data is sent, and if the packet it is sent in arrives, is processed. If a packet containing unguaranteed data events is dropped, that data is not resent. The unguaranteed data sending policy could be used for information like real-time voice communication, where a retransmitted voice fragment would be useless.

See also:: NetEvent

Current State data: For many objects in a simulation, the client isn't concerned with "intervening" states of an object on the server, but only its current state. For example, in a 3D action game, if an enemy player moves from point A to B to C, another client in the simulation is only interested in the final position, C of the object. With the notify protocol, TNL is able to resend object state from a dropped packet only if that state was not updated in a subsequent packet.

See also:: NetObject

Quickest Delivery data: Some information sent in a simulation is of such importance that it must be delivered as quickly as possible. In this situation, data can be sent with every packet until the remote host acknowledges any of the packets known to contain the data. Client movement information is an example of data that might be transmitted using this policy.

By implementing various data delivery policies, the TNL is able to optimize packet space utilization in high packet loss environments.

Strategies for Dealing With Latency

Latency is a function of the time-based limitations of physical data networks. The time it takes information to travel from one host to another is dependent on many factors, and can definitely affect the user's perceptions of what is happening in the simulation.

For example, in a client-server 3D simulation, suppose the round-trip time between one client and server is 250 milliseconds. If the client is observing an object that is moving. If the server is sending position updates of the object to the client, those positions will be "out of date" by 125 milliseconds by the time they arrive on the client. Also, suppose that the server is sending packets to the client at a rate of 10 packets per second. When the next update for the object arrives at the client, it may have moved a large distance relative to the perceptions of the client.

Also, if the server is considered to be authoritative over the client's own position in the world, the client would have to wait at least a quarter of a second before its inputs were validated and reflected in its view of the world. This gives the appearance of very perceptible input "lag" on the client.

In the worst case, the client would always see an out of date version of the server's world, as objects moved they would "pop" from one position to another, and each keypress wouldn't be reflected until a quarter of a second later. For most real-time simulations, this behavior is not optimal.

Because TNL is predominantly a data transport and connection management API, it doesn't provide facilities for solving these problems directly. TNL provides a simple mechanism for computing the average round-trip time of a connection from which the following solutions to connection latency issues can be implemented:

Interpolation: Interpolation is used to smoothly move an object from where the client thinks it is to where the server declares it to be over some short period of time. Parameters like position and rotation can be interpolated using linear or cubic interpolation to present a consistent, no "pop" view of the world to the client. The downside of interpolation when used alone is that it actually exacerbates the time difference between the client and the server, because the client is spending even more time than the one-way message time to move the object from its current position to the known server position.

Extrapolation: To solve the problem of out-of-date state information, extrapolation can be employed. Extrapolation is a best guess of the current state of an object, given a known past state. For example, suppose a player object has a last known position and velocity. Rather than placing the player at the server's stated position, the player object can be placed at the position extrapolated forward by velocity times the time difference.

In the Torque Game Engine, player objects controlled by other clients are simulated using both interpolation and extrapolation. When a player update is received from the server, the client extrapolates that position forward using the player's velocity and the sum of the time it will use to interpolate and the one-way message time from the server - essentially, the player interpolates to an extrapolated position. Once it has reached the extrapolated end point, the player will continue to extrapolate new positions until another update of the obect is received from the server.

By using interpolation and extrapolation, the client view can be made to reasonably, smoothly approximate the world of the server, but neither approach is sufficient for real-time objects that are directly controlled by player input. To solve this third, more difficult problem, client-side prediction is employed.

Client-side prediction is similar to extrapolation, in that the client is attempting to guess the server state of an object the server has authority over. In the case of simple extrapolation, however, the client doesn't have the benefit of the actual input data. Client-side prediction uses the inputs of the user to make a better guess about where the client-controlled object will be. Basically, the client performs the exact same object simulation as the server will eventually perform on that client's input data. As long as the client-controlled object is not acted upon by forces on the server that don't exist on the client or vice versa, the client and server state information for the object should perfectly agree. When they don't agree, interpolation can be employed to smoothly move the client object to the known server position and client-side prediction can be continued.