Introduction to Network Programming Concepts
Computer networking technology allows computers that share a network to send messages to one another. Computer networks vary greatly in complexity, from two machines connected by a link cable to the globe-spanning internet, which connects millions of machines over fiber optics, satellite links, phone lines and other media.

Fundamental Limitations of Computer Networks

Computer networks of any size share some common limitations, to varying degrees, that must be accounted for in network simulations, with the internet being the most limited in all three regards. These three fundamental problems in network simulation are:

Limited Bandwidth - There is a limit to the rate at which hosts on the network can send data to one another. If a computer is connected to the network with a 56 kbps modem, this might be 5 Kbytes per second, while computers on a local area network might be able to communicate at 128 megabytes per second. For service providers, additional bandwidth capacity can be costly, so even if there is no physical bandwidth limitation, bandwidth conservation is important for many projects.

Packet Loss - Computer networks are inherently unreliable. Information transmitted over a network may become corrupted in transit, or may be dropped at a router where traffic has become congested. Even when (especially when) using a guaranteed message delivery protocol such as TCP, the unreliable nature of the underlying network must still be taken into account by network applications.

Latency - Messages sent from one host to another on the network take time to arrive at the destination. The time can be influenced by many factors, including the medium over which the messages travel, how many intermediate hosts must route the message, and the level of traffic congestion at each of those network nodes.
Latency becomes particularly problematic in network simulations that attempt to present a real-time interface to the client, where the latency of the connection may be perceptible to the user.

See Torque Network Library Design Fundamentals for information on how TNL deals with the fundamental limitations of computer networks.

Standard Network Protocols

When computers communicate over networks, they send and receive data using specific network protocols. These protocols ensure that the computers are using the same specifications to address, forward and process data on the network. The internet, certainly the most widely used computer network today, uses a stack of three primary protocols that facilitate communication over the network. They are:

IP - Internet Protocol: The Internet Protocol is the basic building block for internet communications. IP is a routing protocol, which means that it is used to route information packets from a source host to a destination host, specified by an IP address. IP packets are not guaranteed to arrive at the destination specified by the sender, and those packets that do arrive are not guaranteed to arrive in the order they were sent. IP packet payloads may also be corrupted when they are delivered. IP is not useful as an application protocol - it is used mainly as a foundation for the higher level TCP and UDP protocols.

UDP - User Datagram Protocol: The User Datagram Protocol supplies a thin layer on top of IP that performs error detection and application level routing on a single host. UDP packets are addressed using both an IP address, to specify the physical host, and a port number, to specify which process on the machine the packet should be delivered to. UDP packets also contain a checksum, so that corrupted packets can be discarded. UDP packets that are corrupted or dropped by an intermediate host are not retransmitted by the sender, because the sender is never notified whether a given packet was delivered or not.
TCP - Transmission Control Protocol: TCP was designed to make internet programming easier by building a reliable, connection-based protocol on top of the unreliable IP. TCP does this by sending acknowledgements when data packets arrive, and resending data that was dropped. TCP is a stream protocol, so the network connection can be treated like any other file stream in the system. TCP is not suitable for simulation data, because any dropped packets will stall the data pipeline until the dropped data can be retransmitted.

Network Protocols and TNL

Some network systems use both TCP and UDP: TCP for messages that must arrive but are not time sensitive, and UDP for time-sensitive simulation updates. Torque Network Library Design Fundamentals contains a discussion of why this is not optimal for bandwidth conservation, and an explanation of the protocol solution implemented in TNL.

Berkeley (BSD) Sockets - the standard network API

The BSD Sockets API describes a set of C language interface routines for communicating using the TCP/IP protocol suite, including TCP and UDP. The sockets API allows processes to open communication "sockets" that can then be bound to a particular integer port number on the host.

Sockets created to use the TCP stream protocol can either be set to "listen()" for connections from remote hosts, or can be set to "connect()" to a remote host that is currently listening for incoming connections. Once a connection is established between two stream sockets, either side can send and receive guaranteed data to the other.

Sockets can also be created in datagram mode, in which case they will use the underlying UDP protocol for transmission of datagram packets. Since the datagram socket mode is connectionless, the destination IP address and port must be specified with each data packet sent.

The socket API contains other utility routines for performing operations such as host name resolution and socket options configuration.
The Microsoft Windows platform supplies a Windows socket API called Winsock that implements the BSD socket API.
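
The two socket modes described above can be sketched with Python's standard `socket` module, which wraps the BSD sockets calls. This is a minimal loopback illustration only: the function names `udp_roundtrip` and `tcp_roundtrip` are hypothetical, and the address 127.0.0.1, the OS-chosen ports and the two-second timeouts are assumptions made for the example.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what it received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        receiver.settimeout(2.0)         # datagrams may be lost, so never wait forever
        # Datagram mode is connectionless: the destination address and port
        # are supplied with every packet sent.
        sender.sendto(payload, receiver.getsockname())
        data, _source = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(payload: bytes) -> bytes:
    """Open a local TCP connection, send payload, and return the bytes read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)               # wait for incoming connections
        listener.settimeout(2.0)
        client.settimeout(2.0)
        client.connect(listener.getsockname())
        conn, _peer = listener.accept()  # server side of the established stream
        with conn:
            conn.settimeout(2.0)
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:            # empty read means the peer finished sending
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    finally:
        client.close()
        listener.close()
```

Note that the stream version has to loop on `recv()` because TCP delivers a byte stream with no message boundaries, while the datagram version receives the whole packet in one call or not at all.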
Application Network Topologies

Networked applications can be designed to communicate with each other using different topological organization strategies. Some common communication organization paradigms are discussed below.

Peer-to-Peer: In a peer-to-peer network application, the client processes involved in the network communicate directly with one another. Though there may be one or more hosts with authoritative control over the network of peers, peer-to-peer applications largely distribute the responsibility for the simulation or application amongst the peers.

Client-Server: In the client-server model, one host on the network, the server, acts as a central communications hub for all the other hosts (the clients). Typically the server is authoritative, and is responsible for routing communication between the several clients.

Client-Tiered Server Cluster: In network applications where more clients want to subscribe to a service than can be accommodated by one server, the server's role will be handled by a cluster of servers peered together. The servers in the network communicate using the peer-to-peer model, and communicate with the clients using the client-server model. The servers may also communicate with an authoritative "super-server" to handle logins or resolve conflicts.

TNL imposes no specific network topology on applications using it.

Security and Encryption

When a message is sent between two hosts on a network, that message may pass through any number of intermediate wires, hosts and routers, or may even be sent wirelessly to a satellite or a WiFi base station. This opens up the possibility that a third party with access to any one of the intermediate forwarding points may eavesdrop on or change the contents of the message. Often networked applications send sensitive user data, like credit card numbers, bank account information or private correspondence.
These applications rely on encryption algorithms to protect messages sent over an insecure network.

Symmetric Encryption

Symmetric encryption algorithms operate under the assumption that the two parties sending messages to each other share a common, secret key. This key is used to encode the data in such a way that only users with the secret key can decode it. In the ideal case, the ciphertext (the encoded version of the message) looks like a random string of bits to any eavesdropper.

There are many symmetric encryption algorithms of varying levels of security, but all of them, when used alone, have several drawbacks. First, both parties to the communication must have a copy of the shared secret key. If a client is attempting to communicate with a service it has never contacted before, and with which it doesn't share a key, it won't be able to communicate securely with it. Also, although an intermediate eavesdropper cannot read the data sent between the hosts, it can alter the messages, potentially causing one or more of the systems to fail.
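
The last drawback, that encrypted data can be altered without being read, can be illustrated with a deliberately simplified cipher. The sketch below is a toy built for this example only (a SHA-256 based keystream XORed with the data); it is not a real cipher, must not be used to protect data, and the names `keystream` and `xor_cipher`, the key and the message are all hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256 digests of key + block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream (same operation)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = b"shared secret key"        # assumed to be known to both parties only
plaintext = b"PAY 100"
ciphertext = xor_cipher(key, plaintext)

# The legitimate receiver recovers the message with the shared key.
recovered = xor_cipher(key, ciphertext)

# An attacker who cannot read the message, but guesses its layout, flips
# bits in byte 4 so that the digit '1' decrypts as '9'. No key is needed.
tampered = bytearray(ciphertext)
tampered[4] ^= ord("1") ^ ord("9")
forged = xor_cipher(key, bytes(tampered))

print(recovered)  # b'PAY 100'
print(forged)     # b'PAY 900'
```

The receiver decrypts the forged message without any error, which is why encryption alone is not enough to guarantee that a message arrived unmodified.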
Message Authentication

In order to detect when a message has been altered by a third party, a secure messaging system will send a message authentication code (MAC) along with the data. The MAC is generated using a cryptographic hashing function, such as MD5 or SHA (MD5 is no longer considered collision resistant, so newer systems favor the SHA-2 family). Secure hash algorithms take an array of bytes and compute a message digest of some fixed number of bytes - 16 in the case of MD5, 32 for SHA-256 and so on.

When a message is sent using a symmetric cipher, the hash or some portion of it is encrypted and sent along as well. Any change in the message will cause the hash algorithm to compute a different hash than the one encrypted and included with the message, notifying the receiving party that the message has been tampered with.

Public Key Cryptography and Key Exchange

Public key cryptography algorithms were invented to solve the symmetric key distribution problem. In public key algorithms, each participant in the communication has a key pair composed of a public key and a private key. A message encrypted with the public key can only be decrypted with the private key, and vice versa.

Key exchange algorithms (Diffie-Hellman, ECDH) have certain properties such that two users, A and B, can share their public keys with each other in plain text and then, using each other's public key and their own private key, generate the same shared secret key. Eavesdroppers can see the public keys but, because the private keys are never transmitted, cannot know the shared secret. Because public key algorithms are computationally much more expensive than symmetric cryptography, network applications generally use public key cryptography to share a secret key that is then used as a symmetric cipher key.
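
The key exchange and message authentication steps described above can be sketched together using only Python's standard library. The parameters here are assumptions chosen for readability: the modulus `P` (the Mersenne prime 2^127 - 1) and generator `G` are far too small for real use, the helper names are hypothetical, and the MAC uses HMAC-SHA-256, a standard keyed-hash construction, in place of the encrypt-the-hash scheme described in the text.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters, for illustration only.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(my_private: int, their_public: int) -> bytes:
    """Derive a 32-byte session key from the Diffie-Hellman shared secret."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """Verify a tag using a constant-time comparison."""
    return hmac.compare_digest(make_mac(key, message), mac)


# A and B exchange only their public values in the clear.
a_private, a_public = make_keypair()
b_private, b_public = make_keypair()

key_a = shared_key(a_private, b_public)
key_b = shared_key(b_private, a_public)
# Both sides arrive at the same key without ever transmitting it.

message = b"move player to (10, 4)"
mac = make_mac(key_a, message)
print(check_mac(key_b, message, mac))                     # True
print(check_mac(key_b, b"move player to (99, 4)", mac))   # False
```

Both sides compute G^(a*b) mod P, which is why the derived keys match; a modified message fails verification because the attacker cannot recompute the tag without the shared key.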
Digital Signatures/Certificate Authorization

Public key algorithms still have one vulnerability, known as the Man-in-the-Middle attack. An attacker positioned between A and B can intercept the public keys in transit and substitute its own public key, thereby establishing a separate secure connection with each of A and B, decrypting incoming data and re-encrypting it with the key it shares with the opposite party.

To combat this attack, the concept of certificates was introduced. In this model, the public key of one or both of the participants in the communication is digitally signed with the private key of some known, trusted Certificate Authority (CA). Then, using the Certificate Authority's public key, the parties to the communication can validate the public key of the opposite party.
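
The certificate check described above can be sketched with a toy, textbook RSA signature. Everything in this example is an assumption made for illustration: the CA key is built from two small well-known Mersenne primes, there is no padding, the function names are hypothetical, and the scheme is nowhere near secure enough for real use. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy CA key pair built from two small Mersenne primes (illustration only).
P_PRIME = 2**61 - 1
Q_PRIME = 2**89 - 1
CA_N = P_PRIME * Q_PRIME                  # public modulus
CA_E = 65537                              # public exponent
CA_D = pow(CA_E, -1, (P_PRIME - 1) * (Q_PRIME - 1))  # private exponent


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(public_key_bytes: bytes) -> int:
    """CA side: sign a participant's public key with the CA private key."""
    return pow(digest_as_int(public_key_bytes), CA_D, CA_N)


def verify_certificate(public_key_bytes: bytes, signature: int) -> bool:
    """Client side: check the signature using only the CA public key."""
    return pow(signature, CA_E, CA_N) == digest_as_int(public_key_bytes)


# The CA signs B's public key ahead of time.
b_public_key = b"public key bytes belonging to B"
certificate = ca_sign(b_public_key)

# A receives a key plus certificate and validates it before trusting it.
print(verify_certificate(b_public_key, certificate))               # True

# A man in the middle substitutes its own key but cannot forge the signature.
print(verify_certificate(b"attacker public key", certificate))     # False
```

The substitution is detected because producing a valid signature for the attacker's key would require the CA's private exponent, which never leaves the CA.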
Security and the TNL

The TNL uses the publicly available libtomcrypt (http://libtomcrypt.org) as its encryption foundation. The TNL features key exchange, secret session keys, message authentication and certificate verification.

More Information on Cryptography

The preceding sections were only a very high level overview of some cryptographic algorithm theories. For more information, see the USENET cryptography FAQ at http://www.faqs.org/faqs/cryptography-faq/

Additional Challenges Facing Network Programmers

Malicious Attackers and Denial of Service

A problem facing developers of internet applications and web sites is the Denial-of-Service (DoS) attack. Malicious users employing custom tools often attempt to shut down publicly available internet servers. There are a variety of well-known categories of DoS attacks.

Traffic flooding: This attack sends a barrage of data to the public address of the server, in an attempt to overwhelm that server's connection to the internet. This attack often employs many machines, hijacked by the attacker and acting in concert. These attacks are often the most difficult to mount, since they require a large number of available machines with high-speed internet access in order to attack a remote host effectively.

Connection depletion: The connection depletion attack works by exploiting a weakness of some connection-based communications protocols. The attacker initiates a constant flow of spurious connection attempts. When a legitimate user attempts to connect to the server, all available pending connection slots are already taken up by the spoofed connection attempts, thereby denying service to valid clients.
The TNL uses a two-phase connection protocol to protect against connection depletion attacks, and implements a client-puzzle algorithm to prevent CPU depletion attacks. Also, TNL is built so that bogus packets are discarded as quickly as possible, preventing flooding attacks from impacting the server CPU.
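
The general idea behind a client puzzle can be sketched as follows. This is a generic hash-based puzzle written for illustration, not TNL's actual algorithm: the function names, the 8-byte solution encoding and the difficulty values are assumptions. The server hands out a random nonce, the client must find a value whose hash together with the nonce has a required number of leading zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def puzzle_ok(nonce: bytes, solution: int, difficulty: int) -> bool:
    """Server side: one cheap hash verifies the client's work."""
    digest = hashlib.sha256(nonce + solution.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve_puzzle(nonce: bytes, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    for candidate in itertools.count():
        if puzzle_ok(nonce, candidate, difficulty):
            return candidate


# The server issues a fresh nonce with the connection challenge.
nonce = os.urandom(16)
difficulty = 12                      # small value so the example runs quickly

solution = solve_puzzle(nonce, difficulty)
print(puzzle_ok(nonce, solution, difficulty))   # True
```

The asymmetry is the point of the design: the client pays for thousands of hashes per connection attempt while the server pays for one, and the server can raise the difficulty when it is under load.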
Firewalls and Network Address Translation (NAT) routers

Firewalls are software or hardware devices designed to protect local networks from malicious external attackers. Most firewalls filter all incoming, unsolicited network traffic. NAT routers exist to allow potentially many machines to share a single IP address, but for practical purposes they share similar characteristics with firewalls, since they often filter out unsolicited network traffic.

In client/server applications, firewalls and NATs aren't generally a problem. When a client behind a firewall or NAT makes a request to the server, the response is allowed to pass through the firewall or NAT because the network traffic was solicited first by the client.

In peer-to-peer applications, NATs and firewalls pose a greater challenge. If both peers are behind different firewalls, then getting them to communicate with each other requires that they both initiate the connection, so that each one's data is able to flow through the other's firewall or NAT. This can be facilitated by a third party server or client that both are able to communicate with directly. The TNL provides functionality for arranging a direct connection between two firewalled hosts via a third party master server.