Sunday, May 18, 2008

Windows Server 2008 Distributed File System

The Distributed File System (DFS) is a technology that allows several distinct filesystems, potentially on multiple servers, to be mounted from one place and appear in one logical representation. The different shared folders, which likely reside on different drives on different server machines, can all be accessed from one folder, known as the namespace. Folder targets link the folders in the namespace to the underlying shared folders, mimicking a directory tree structure that can be rearranged and altered according to a particular implementation's needs. DFS also allows clients to know only the name of the share point and not the name of the server on which it resides, a big boon when you field help-desk calls asking, "What server is my last budget proposal located on?"
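To make the idea concrete, here is a minimal, hypothetical sketch (not how DFS itself is implemented) of a namespace as a simple mapping from logical folder paths to the UNC paths of their folder targets; every server and folder name below is invented for illustration.

```python
# Illustrative only: a namespace maps logical folders to one or more
# folder targets (UNC paths on whichever servers actually hold the data).
namespace = {
    r"\\corp.example.com\public\Budgets":   [r"\\NYFS01\Budgets", r"\\LONFS02\Budgets"],
    r"\\corp.example.com\public\Templates": [r"\\NYFS01\Templates"],
    r"\\corp.example.com\public\Software":  [r"\\LONFS02\Software"],
}

def resolve(logical_path: str) -> list[str]:
    """Return the folder targets behind a logical DFS path.

    The client only ever needs the namespace path; which physical
    server answers is an implementation detail.
    """
    return namespace.get(logical_path, [])

print(resolve(r"\\corp.example.com\public\Budgets"))
# ['\\\\NYFS01\\Budgets', '\\\\LONFS02\\Budgets']
```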

DFS namespaces come in two basic flavors: standalone namespaces, which store the folder topology information locally, and domain-based namespaces, which store the topology structure in Active Directory and thereby replicate that information to other domain controllers. If you have multiple namespaces, you might have multiple paths to the same data; they just appear under different shared folders. You can even set up two different share points to the same data on two different physical servers, because DFS is intelligent enough to select the copy that is geographically closest to the requesting client, saving network traffic and packet travel time.

DFS in Windows Server 2008 essentially consists of two components:

DFS namespaces

These allow you to group shared folders stored on different servers and present them to users in one coherent tree, making the actual location of the files and folders irrelevant to the end user.


DFS replication

This is a multimaster replication engine that supports scheduling, bandwidth throttling, and compression. Most notably, DFS Replication now uses an algorithm known as Remote Differential Compression (RDC), which efficiently updates files over a limited-bandwidth network by looking at insertions, removals, and rearrangements of data in files, and then replicating only the changed file blocks. The savings from this method can be substantial.
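As a rough illustration of the idea behind RDC, here is a minimal sketch in Python: both sides hash their file blocks, compare the signatures, and send only the blocks that differ. This is a deliberate simplification with fixed-size blocks and invented numbers; the real RDC algorithm uses content-defined chunk boundaries and recursive signatures so it copes well with insertions and removals, not just in-place edits.

```python
import hashlib

BLOCK = 64 * 1024  # 64 KB blocks, an arbitrary size chosen for illustration

def signatures(data: bytes) -> list[str]:
    """Hash each block so the two replication partners can compare cheaply."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_send(old: bytes, new: bytes) -> list[int]:
    """Indexes of blocks on the source that the target does not already have."""
    old_sigs, new_sigs = signatures(old), signatures(new)
    return [i for i, sig in enumerate(new_sigs)
            if i >= len(old_sigs) or old_sigs[i] != sig]

# A small edit in a large file dirties only a handful of blocks, so only
# those blocks (plus the signature exchange) have to cross the wire.
old = bytes(2 * 1024 * 1024)                         # pretend 2 MB file, version n
new = bytearray(old); new[100:110] = b"0123456789"   # version n+1, a tiny edit
changed = blocks_to_send(old, bytes(new))
print(f"{len(changed)} of {len(signatures(bytes(new)))} blocks changed")  # 1 of 32
```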

Let's walk through this. When an end user wants to open a folder that is included within a DFS namespace, the client sends a message to the namespace server (which is simply a machine running the Windows Server 2008 or Windows Server 2003 R2 version of DFS). That machine then refers the client to a list of servers that host copies of those shared folders (these copies are called folder targets). The client machine stores a copy of that referral in its cache and then goes down the referral list in order; the list is automatically sorted by proximity so that a client always uses servers within its own Active Directory site before traversing to machines located outside its current location.
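That referral process can be pictured roughly as in the hypothetical sketch below. The server names, sites, and link costs are invented, and real referrals carry more detail (cache TTLs, target priority classes, and so on); the point is simply that the list comes back ordered by proximity and the client works down it from its cache.

```python
from dataclasses import dataclass

@dataclass
class FolderTarget:
    unc: str
    site: str

# Invented example data: AD site-link cost from the client's site to each site.
SITE_COST_FROM_CLIENT = {"NewYork": 0, "London": 100}

def referral(targets: list[FolderTarget]) -> list[FolderTarget]:
    """Order folder targets so same-site (cheapest) servers come first,
    as the namespace server does when it hands out a referral."""
    return sorted(targets, key=lambda t: SITE_COST_FROM_CLIENT.get(t.site, float("inf")))

referral_cache: dict[str, list[FolderTarget]] = {}

def open_folder(path: str, targets: list[FolderTarget]) -> FolderTarget:
    """The client caches the referral and works down the list in order."""
    if path not in referral_cache:
        referral_cache[path] = referral(targets)
    for target in referral_cache[path]:
        if server_is_up(target.unc):        # hypothetical reachability check
            return target
    raise OSError("no folder target is responding")

def server_is_up(unc: str) -> bool:         # placeholder for a real availability test
    return True
```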

But let's crack the nut a little further and see where DFS replication comes into play. In a very basic scenario, you can store a folder on a server in New York and the same folder on a server in London, and replication will take care of keeping the copies of the folders synchronized. Users, of course, have no idea that these folders are kept in geographically disparate locations. However, the replication mechanism is highly optimized: it determines what has changed between two versions of a file and then, using remote differential compression, sends only the differences over the wire. Over slow WAN links and other bandwidth-metered lines, you'll see a real cost savings. You really see the benefits when relatively minor changes are made to large files. According to Microsoft, a change to a 2 MB PowerPoint presentation can result in only 60 KB being transmitted across the wire, which equates to roughly a 97 percent savings in the amount of data sent. Delving a bit further, the product team offers this explanation: they "ran a test on a mix of 780 Office files (.doc, .ppt, and .xls) replicating from a source server to a target server using DFS Replication with RDC. The target server had version n of the files and the source server had version n+1, and the two versions differed with significant edits. The percent savings in bytes transferred was on average 50 percent and significantly better for large files."

Of course, with DFS you also get the fault-tolerance benefit of "failing over" to a functional server if a target on another server isn't responding. In earlier releases of Windows Server, there wasn't a simple way to instruct clients, after a failure, to return to their local DFS servers once those machines came back online. Now you can specify that clients should fail back to a closer, less costly server when service is restored.
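Failback can be pictured as the client periodically re-checking whether a cheaper (usually local) target has come back and, when failback is enabled on the namespace or folder, switching back to it. The sketch below is a hypothetical model of that decision, with invented server names and costs; it is not how the DFS client is actually implemented.

```python
def pick_target(targets_by_cost: list[tuple[int, str]],
                is_up,                        # callable: UNC path -> bool
                current: str | None,
                failback_enabled: bool) -> str | None:
    """Choose which folder target a client should use.

    targets_by_cost is the cached referral, sorted cheapest-first.
    Without failback, the client keeps using whatever target it failed
    over to; with failback, it returns to the cheapest responsive target.
    """
    available = [unc for _, unc in targets_by_cost if is_up(unc)]
    if not available:
        return None
    if current in available and not failback_enabled:
        return current                        # stick with the failover target
    return available[0]                       # fail back to the closest server

# Example: the local New York target has come back online after an outage.
referral = [(0, r"\\NYFS01\Budgets"), (100, r"\\LONFS02\Budgets")]
print(pick_target(referral, is_up=lambda unc: True,
                  current=r"\\LONFS02\Budgets", failback_enabled=True))
# \\NYFS01\Budgets
```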

Although Windows Server 2008's DFS components are two separate technologies, when they're used in tandem, they solve some real problems companies face. Take branch office backup, for instance. Instead of tasking your administrators in these offices with tape drive maintenance, backing up, storing data off site, and everything else associated with disaster avoidance, simply configure DFS to replicate data from the servers in the branch office back to a hub server in the home office or another data center, and then run the backup from that central location (a sketch of this hub-and-spoke arrangement follows the list below). You are more efficient in three ways:

  • You save on tape hardware costs.

  • You save time through the efficiencies in DFS replication.

  • You save manpower costs because your IT workers at the branch offices can move on to other problems and not spend their time babysitting a backup process.
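Conceptually, this scenario is just a hub-and-spoke replication topology. The sketch below is a hypothetical model of it in Python (the member names, replicated folder, schedule, and bandwidth figure are all invented), not a real DFS Replication configuration; it only illustrates why backups then need to run against a single hub server.

```python
from dataclasses import dataclass

@dataclass
class ReplicationGroup:
    """A toy model of a hub-and-spoke DFS Replication topology."""
    hub: str
    branches: list[str]
    replicated_folder: str
    schedule: str = "off-hours"       # when replication is allowed to run
    bandwidth_kbps: int = 512         # throttle to protect slow branch WAN links

    def connections(self) -> list[tuple[str, str]]:
        """Each branch replicates to the hub, so backup jobs (and tape
        hardware) only have to exist at the hub site."""
        return [(branch, self.hub) for branch in self.branches]

group = ReplicationGroup(
    hub="HQ-FS01",
    branches=["BRANCH-NY", "BRANCH-LON", "BRANCH-TOK"],
    replicated_folder=r"D:\BranchData",
)
for source, target in group.connections():
    print(f"{source} -> {target} ({group.schedule}, {group.bandwidth_kbps} kbps)")
```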

What about software distribution? DFS really excels at publishing documents, applications, and other files to users who are geographically dispersed. By using namespaces in conjunction with replication, you can store multiple copies of data and software at multiple locations throughout the world, better ensuring constant availability and good transfer performance while keeping it transparent to users where their files are coming from. DFS replication and namespaces automatically consult your AD site structure to determine the lowest-cost link to use in the event that a local namespace server isn't available to respond to requests.

The UI for managing DFS has also improved over the clunkier, less polished MMC snap-in in Windows 2000 and the original version of Windows Server 2003. The new snap-in lets you configure namespaces and perform other tasks that previously were available only through the command-line interface.


*.* Source of Information: O'Reilly Windows Server 2008: The Definitive Guide
