Saturday, August 15, 2009

Balancing Traffic Across Data Centres Using LVS

The LVS (Linux Virtual Server) project was launched in 1998 and is meant to eliminate single points of failure (SPOFs). According to the linuxvirtualserver.org website: “LVS is a highly scalable and available server built on a cluster of real servers, with the load balancer running on Linux. The architecture of the server cluster is fully transparent to the end user, and the users interact as if it were a single high-performance virtual server. The real servers and the load balancers may be interconnected by either a high speed LAN or by a geographically dispersed WAN.”

The load balancer is the single entry point into the cluster. The client connects to a single known IP address, and inside the virtual server the load balancer redirects incoming connections to the real servers that actually do the work, according to the chosen scheduling algorithm. The nodes of the cluster (real servers) can be added or removed transparently, providing a high level of scalability. LVS detects node failures on the fly and reconfigures the system automatically, thus providing high availability. In theory, the load balancer can use either IPVS or KTCPVS for load balancing, but because IPVS is very stable, it is used in almost all the implementations I have seen. IPVS provides Layer 4 load balancing and KTCPVS provides Layer 7 load balancing; see the sidebar titled “IPVS v/s KTCPVS” for a brief note on the differences between the two.
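As a rough illustration of how a scheduling algorithm and transparent node removal fit together, here is a minimal Python sketch of a round-robin pool. The class name `RoundRobinPool` and the server addresses are hypothetical, and this is a toy model of the idea rather than how IPVS itself is implemented.

```python
class RoundRobinPool:
    """Toy pool of real servers with round-robin selection (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # running counter used to rotate through the pool

    def add(self, server):
        # A new node can join without clients noticing.
        self.servers.append(server)

    def remove(self, server):
        # Would be called when a health check decides a node has failed.
        self.servers.remove(server)

    def pick(self):
        if not self.servers:
            raise RuntimeError("no real servers available")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


# Hypothetical real-server addresses.
pool = RoundRobinPool(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
first_three = [pool.pick() for _ in range(3)]

pool.remove("10.0.0.12")  # simulate a detected node failure
after_failure = [pool.pick() for _ in range(4)]
```

After the simulated failure, new connections are spread only over the two surviving nodes, while the client keeps talking to the same single address.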

There are three load balancing techniques used in IPVS:
LVS/NAT – Virtual Server via NAT
LVS/TUN – Virtual Server via Tunnelling
LVS/DR – Virtual Server via Direct Routing



IPVS v/s KTCPVS
IPVS or IP Virtual Server is an implementation of Layer 4 load balancing inside the Linux kernel. Layer 4 load balancing works on OSI Layer 4 (Transport Layer) and distributes requests to the servers at the transport layer without looking at the content of the packets.

KTCPVS or Kernel TCP Virtual Server is an implementation of Layer 7 load balancing in the Linux kernel. Layer 7 load balancing is also known as application-level load balancing. The load balancer parses requests in the application layer and distributes requests to servers based on the content. The scalability of Layer 7 load balancing is not high because of the overhead of parsing the content.
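The difference can be sketched in a few lines of Python. The function names, the hash choice and the addresses below are assumptions made for illustration; neither IPVS nor KTCPVS is written this way. The Layer 4 picker looks only at addressing information, while the Layer 7 picker has to parse the request before it can decide.

```python
import zlib

SERVERS = ["10.0.0.11", "10.0.0.12"]  # hypothetical real servers


def layer4_pick(client_ip, client_port, servers):
    """Choose a server from transport-level information only."""
    key = f"{client_ip}:{client_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def layer7_pick(raw_request, routes, default):
    """Choose a server by parsing the HTTP request line (content-based)."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    _method, path, _version = request_line.split(" ")
    for prefix, server in routes.items():
        if path.startswith(prefix):
            return server
    return default


l4_choice = layer4_pick("198.51.100.7", 40000, SERVERS)
l7_choice = layer7_pick(
    b"GET /images/logo.png HTTP/1.1\r\nHost: example.test\r\n\r\n",
    {"/images/": "10.0.0.21"},  # hypothetical content-based route
    default="10.0.0.11",
)
```

The extra parsing step in `layer7_pick` is the overhead the sidebar refers to.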



IPVS Load Balancing Techniques
LVS/NAT: This technique is one of the simplest to set up, but it can place extra load on the load balancer, because the load balancer needs to rewrite both the request and the response packets. The load balancer also needs to act as the default gateway for all the real servers, so the real servers cannot sit in a geographically different network. The packet flow in this technique is as follows:

• The load balancer examines the destination address and port number on all incoming packets from the client(s) and verifies if they match any of the virtual services being served.

• A real server is selected from the available ones according to the scheduling algorithm, and the new connection is added to the hash table that records connections.

• The destination address and port number of the packet are rewritten to those of the real server, and the packet is forwarded to the real server.

• After processing the request, the real server passes the reply packets back to the load balancer, which rewrites their source address and port to those of the virtual service and sends them back to the client.
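The NAT flow above can be modelled as two rewrite functions around a connection table. This is a simplified Python sketch, not kernel code: the `Packet` type, the addresses, the ports and the round-robin choice are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


VIP, VPORT = "203.0.113.10", 80  # hypothetical virtual service
REAL_SERVERS = [("10.0.0.11", 8080), ("10.0.0.12", 8080)]  # hypothetical

connections = {}  # (client ip, client port) -> (real ip, real port)
_rr = 0           # round-robin counter


def inbound(pkt):
    """Client -> load balancer: rewrite the destination to a real server."""
    global _rr
    if (pkt.dst_ip, pkt.dst_port) != (VIP, VPORT):
        return None  # not one of our virtual services
    key = (pkt.src_ip, pkt.src_port)
    if key not in connections:
        connections[key] = REAL_SERVERS[_rr % len(REAL_SERVERS)]
        _rr += 1
    real_ip, real_port = connections[key]
    return replace(pkt, dst_ip=real_ip, dst_port=real_port)


def outbound(pkt):
    """Real server -> load balancer: rewrite the source to the virtual service."""
    key = (pkt.dst_ip, pkt.dst_port)
    if connections.get(key) != (pkt.src_ip, pkt.src_port):
        return pkt  # not part of a tracked connection; leave untouched
    return replace(pkt, src_ip=VIP, src_port=VPORT)


request = Packet("198.51.100.7", 40000, VIP, VPORT)
forwarded = inbound(request)
reply = Packet(forwarded.dst_ip, forwarded.dst_port, request.src_ip, request.src_port)
to_client = outbound(reply)
```

Because both directions pass through `inbound` and `outbound`, the client only ever sees the virtual address, which is also why the load balancer carries all of the reply traffic in this mode.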

LVS/DR: DR stands for Direct Routing. This technique relies on rewriting MAC addresses and requires that the load balancer and each real server have at least one NIC in the same IP network segment as well as the same physical segment. In this technique, the virtual IP address is shared by the load balancer and all the real servers. Each real server has a loopback alias interface configured with the virtual IP address. This loopback alias interface must be NOARP so that it does not respond to any ARP requests for the virtual IP. The port number of incoming packets cannot be remapped, so if the virtual server is configured to listen on port 80, the real servers also need to serve on port 80. The packet flow in this technique is as follows:

• The load balancer receives the packet from the client, changes the destination MAC address of the data frame to that of the selected real server, and retransmits it on the LAN.

• When the real server receives the packet, it realises that this packet is meant for the address on one of its loopback aliased interfaces.

• The real server processes the request and responds directly to the client.
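A small Python model makes the key property of direct routing visible: only the frame's destination MAC changes, while the IP addresses and port stay as the client sent them. The `Frame` type, MAC addresses and IP addresses are hypothetical, and the sketch ignores the source MAC for simplicity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str
    dst_port: int


VIP = "203.0.113.10"                      # hypothetical virtual IP
LB_MAC = "02:00:00:00:00:01"              # hypothetical load balancer MAC
REAL_SERVER_MACS = ["02:00:00:00:00:11", "02:00:00:00:00:12"]


def direct_route(frame, index):
    """Retransmit the frame to the chosen real server by rewriting only the MAC."""
    return replace(frame, dst_mac=REAL_SERVER_MACS[index])


def real_server_accepts(frame, my_mac, loopback_aliases):
    """A real server accepts the frame if the VIP is on a loopback alias."""
    return frame.dst_mac == my_mac and frame.dst_ip in loopback_aliases


incoming = Frame(dst_mac=LB_MAC, src_ip="198.51.100.7", dst_ip=VIP, dst_port=80)
retransmitted = direct_route(incoming, index=1)
accepted = real_server_accepts(retransmitted, REAL_SERVER_MACS[1], {VIP})
```

Since the destination port is never touched, the model also shows why the real servers must listen on the same port as the virtual service.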

LVS/TUN: This is the most scalable technique. It allows the real servers to be present in different LANs or WANs because the communication happens with the help of the IP tunnelling protocol. IP tunnelling allows an IP datagram to be encapsulated inside another IP datagram, so that datagrams destined for one IP address can be wrapped and redirected to a different IP address. Each real server must support the IP tunnelling protocol and have one of its tunnel devices configured with the virtual IP. If the real servers are in a different network from the load balancer, the routers in their network need to be configured to allow outgoing packets whose source address is the virtual IP. This reconfiguration is needed because routers are typically set up to drop such packets as an anti-spoofing measure. As with the LVS/DR method, the port number of incoming packets cannot be remapped. The packet flow in this technique is as follows:

• The load balancer receives the packet from the client and encapsulates the packet within an IP datagram, and forwards it to a dynamically selected real server.

• The real server receives the packet, decapsulates it and finds the inner packet, whose destination IP matches the virtual IP configured on one of its tunnel devices.

• The real server processes the request and returns the result directly to the user.
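The tunnelling flow can be sketched in Python as wrapping one packet inside another. The `IPPacket` type, the function names and all addresses are assumptions made for illustration; a real setup uses the kernel's IP-in-IP support, not application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    payload: object  # request bytes, or an inner IPPacket when tunnelled


VIP = "203.0.113.10"    # hypothetical virtual IP
LB_IP = "203.0.113.1"   # hypothetical load balancer address


def encapsulate(inner, real_server_ip):
    """Load balancer: wrap the client's packet in a datagram for the real server."""
    return IPPacket(src_ip=LB_IP, dst_ip=real_server_ip, payload=inner)


def decapsulate(outer, tunnel_device_ips):
    """Real server: unwrap and accept only if the VIP is on a tunnel device."""
    inner = outer.payload
    if isinstance(inner, IPPacket) and inner.dst_ip in tunnel_device_ips:
        return inner
    return None


def reply(inner, body):
    """Real server: answer the client directly, using the VIP as the source."""
    return IPPacket(src_ip=inner.dst_ip, dst_ip=inner.src_ip, payload=body)


client_packet = IPPacket("198.51.100.7", VIP, b"GET / HTTP/1.1\r\n\r\n")
tunnelled = encapsulate(client_packet, "192.0.2.50")  # real server in another network
unwrapped = decapsulate(tunnelled, {VIP})
response = reply(unwrapped, b"HTTP/1.1 200 OK\r\n\r\n")
```

The response leaves the real server's network with the virtual IP as its source address, which is exactly the kind of packet that anti-spoofing rules on the routers would otherwise drop.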

Source of Information : Linux For You May 2009
