A Distributed Computing Lecture by Steven Choy
NFS (Network File System)
NFS Concepts
- A typical client/server application
- The client side imports file systems from remote machines
- The server side exports file systems to remote machines
- Each machine can be either client or server, or can be both client and server
- Based on RPC (remote procedure call)
- NFS can run over either a datagram protocol (UDP) or a stream protocol (TCP)
- Many RPC requests in NFS protocol are idempotent
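Why idempotency matters can be shown with a small sketch. This is a toy in-memory server, not the NFS wire protocol; the class `ToyFileServer`, the handle `"fh1"`, and the `append` operation are hypothetical names chosen for illustration. A request that names an absolute offset can be repeated safely after a lost reply, while an append-style request cannot.

```python
class ToyFileServer:
    """In-memory stand-in for a file server (illustration only)."""

    def __init__(self):
        self.files = {"fh1": bytearray(b"hello world")}

    def read(self, handle, offset, count):
        # Idempotent: the request carries an absolute offset,
        # so repeating it returns the same bytes.
        return bytes(self.files[handle][offset:offset + count])

    def write(self, handle, offset, data):
        # Idempotent: writing the same bytes at the same offset twice
        # leaves the file in the same state.
        buf = self.files[handle]
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        return len(data)

    def append(self, handle, data):
        # NOT idempotent: a retransmitted request adds the data again.
        self.files[handle].extend(data)
        return len(self.files[handle])


server = ToyFileServer()
server.write("fh1", 6, b"there")
server.write("fh1", 6, b"there")    # duplicate request, e.g. after a lost reply
after_write = bytes(server.files["fh1"])

server.append("fh1", b"!")
server.append("fh1", b"!")          # the same duplicate now changes the result
after_append = bytes(server.files["fh1"])
```

Because the duplicated `write` leaves the file unchanged, a client can simply resend a request when no reply arrives, which is what makes an unreliable transport such as UDP workable.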
NFS Architecture
NFS Protocol
- The NFS protocol is designed to be stateless
- The server does not need to keep track of which client is working with which file.
- To do its work, the server needs only the information carried in each RPC request.
- NFS was designed to support UNIX file system semantics, but the protocol design can be adapted to support other file system semantics
- Security and access checks are based on the UNIX UID and GID mechanism.
- The NFS protocol design does not depend on the transport protocol. It uses UDP by default, but it can also be used over TCP.
- NFS Commands: Some Examples
NFS Implementation
- Each file on the server is identified by a file handle, which clients use to access it
- The FreeBSD NFS implementation builds file handles from the inode number, the file system ID, and a generation number.
- The aim of this combination is to make each file handle globally unique.
- NFS VFS (Virtual File System)
- VFS added to UNIX kernel
- Access-transparent file access
- Distinguishes between local and remote access
- At client Side:
- Processes file system calls to determine whether access is local (passes it to UNIX FS) or remote (passes it to NFS client).
- At server side:
- NFS server receives request and passes it to local FS through VFS.
- If the access is local, the file handle is translated to an internal file ID (a UNIX i-node).
- If the file is local, the reference is to the file's i-node.
- If the file is remote, the reference is to the file handle.
- File handle: uniquely identifies a file.
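The two ideas in this section, a handle built from several fields and a VFS layer that routes by local or remote reference, can be sketched together. The field widths in `HANDLE_FORMAT`, the `VNode` class, and the function names are assumptions made for illustration; they are not the FreeBSD layout or kernel API.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout: 32-bit file system ID, 64-bit inode number,
# 32-bit generation number, in network byte order.
HANDLE_FORMAT = "!IQI"


def make_handle(fsid, inode, generation):
    """Pack the three fields into an opaque 16-byte handle."""
    return struct.pack(HANDLE_FORMAT, fsid, inode, generation)


@dataclass
class VNode:
    inode: Optional[int] = None          # set when the file is local
    file_handle: Optional[bytes] = None  # set when the file is remote


def vfs_read(vnode):
    """Route a read to the local file system or to the NFS client."""
    if vnode.inode is not None:
        return f"local read of i-node {vnode.inode}"
    return f"NFS read RPC for handle {vnode.file_handle.hex()}"


old = make_handle(fsid=7, inode=1234, generation=1)
new = make_handle(fsid=7, inode=1234, generation=2)  # same inode, reused for a new file
```

The generation number is what keeps `old` and `new` distinct: if an inode is freed and reused, a client still holding the old handle does not silently reach the new file.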
Starting up NFS
- Three daemons must be running on Linux for NFS to work:
/usr/sbin/rpc.portmap
/usr/sbin/rpc.mountd
/usr/sbin/rpc.nfsd
- To check that all three are registered with the portmapper, run:
rpcinfo -p localhost
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100005 1 udp 679 mountd
100005 1 tcp 681 mountd
100003 2 udp 2049 nfs
100003 2 tcp 2049 nfs
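A listing like the one above can also be checked mechanically. The following is a minimal sketch that parses the sample text as shown; the names `SAMPLE`, `registered_services`, and `REQUIRED` are hypothetical, and the column layout is assumed to match this listing (other rpcinfo versions may print a different header).

```python
SAMPLE = """\
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100005 1 udp 679 mountd
100005 1 tcp 681 mountd
100003 2 udp 2049 nfs
100003 2 tcp 2049 nfs
"""


def registered_services(rpcinfo_output):
    """Return the set of service names found in the listing."""
    services = set()
    for line in rpcinfo_output.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) == 5:
            services.add(fields[4])                # last column is the name
    return services


REQUIRED = {"portmapper", "mountd", "nfs"}
missing = REQUIRED - registered_services(SAMPLE)   # empty when all three are up
```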
Exporting File System
- To make parts of your file system accessible over the network to other systems, the /etc/exports file must be set up to define which local directories are available to remote users and how each may be used.
- A sample /etc/exports file:
/home/yourname 192.168.12.1(rw)
/ master(rw) trusty(rw,no_root_squash)
/projects proj*.local.domain(rw)
/usr *.local.domain(ro) @trusted(rw)
/home/joe pc001(rw,all_squash,anonuid=150,anongid=100)
/pub (ro,insecure,all_squash)
/pub/private (noaccess)
- To stop and restart the server
# /etc/rc.d/init.d/nfs stop
# /etc/rc.d/init.d/nfs start
- Local and remote file systems accessible on an NFS client
mount -t nfs Server1:/export/people /usr/students
mount -t nfs Server2:/nfs/users /usr/staff
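The /etc/exports entries shown earlier in this section have a simple shape: an exported path followed by one or more client specifications, each with optional options in parentheses. The sketch below splits one such line; `parse_exports_line` is a hypothetical helper, and it deliberately ignores comments, line continuations, and quoted paths that a real exports file may contain.

```python
def parse_exports_line(line):
    """Split one exports line into (path, [(client, [options]), ...])."""
    path, *specs = line.split()
    clients = []
    for spec in specs:
        client, _, rest = spec.partition("(")
        options = rest.rstrip(")").split(",") if rest else []
        clients.append((client, options))   # an empty client means "any host"
    return path, clients


joe = parse_exports_line("/home/joe pc001(rw,all_squash,anonuid=150,anongid=100)")
usr = parse_exports_line("/usr *.local.domain(ro) @trusted(rw)")
pub = parse_exports_line("/pub (ro,insecure,all_squash)")
```

In the `/pub` entry the client part is empty, which is how the sample expresses an export open to every host.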
NFS Transport protocol
- Originally used UDP.
- Better performance in LANs.
- NFS and RPC do their own reliability checks.
- Most current implementations support both UDP and TCP.
- TCP is preferred over WANs because it provides congestion control.
- TCP officially integrated in NFS v.3.
Demonstration
NFS Server
apt-get install nfs-kernel-server nfs-common
rpcinfo -p
- Edit /etc/exports and add the share
/root/shared 202.40.219.247(rw,sync,no_subtree_check)
exportfs -ra
/etc/init.d/portmap restart
/etc/init.d/nfs-kernel-server restart
NFS Client
apt-get install portmap nfs-common
mount 202.40.219.240:/root/shared /root/import
AFS (Andrew File System)
AFS Overview
- AFS tries to solve complex issues such as
- uniform name space,
- location-independent file sharing,
- client-side caching (with cache consistency),
- secure authentication (via Kerberos)
- Also includes server-side caching (via replicas), high availability
- Can span 5,000 workstations
- Clients have a partitioned space of file names:
- a local name space and a shared name space
- Dedicated servers, called Vice, present the shared name space to the clients as a homogeneous, identical, and location-transparent file hierarchy
- Workstations run the Virtue protocol to communicate with Vice.
- Servers collectively are responsible for the storage and management of the shared name space
- Clients and servers are structured in clusters interconnected by a backbone LAN
- A cluster consists of a collection of workstations and a cluster server and is connected to the backbone by a router
- A key mechanism selected for remote file operations is whole file caching
- Opening a file causes it to be cached, in its entirety, on the local disk
AFS Shared Name Space
- The server file space is divided into volumes. A volume typically holds the files of a single user, and the volume is the unit of granularity attached to a client.
- A Vice file is accessed using a fid = <volume number, vnode number>. The fid does not depend on machine location. To find the server that currently holds a volume, a client queries the volume-location database.
- Volumes can migrate between servers to balance space and utilization. The old server keeps "forwarding" instructions and handles client updates during migration.
- Read-only volumes (system files, etc.) can be replicated. The volume-location database knows how to find the replicas.
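Location independence of the fid can be illustrated with a small lookup table. This is a sketch under stated assumptions: the `Fid` tuple, the dictionary standing in for the volume-location database, and the server names are all hypothetical.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    volume: int   # volume number
    vnode: int    # index of the file within the volume


# Hypothetical volume-location database: volume number -> server holding it.
volume_location_db = {17: "vice-a", 42: "vice-b"}


def locate(fid):
    """Find the server for a fid by looking up its volume only."""
    return volume_location_db[fid.volume]


def migrate(volume, new_server):
    # Only the database entry changes; fids held by clients stay valid.
    volume_location_db[volume] = new_server


fid = Fid(volume=42, vnode=1001)
before = locate(fid)
migrate(42, "vice-c")
after = locate(fid)
```

The fid itself never changes during the migration, which is why users do not need to be told when a volume moves.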
Operations and Consistency
- AFS clients cache entire files from servers
- A client workstation interacts with Vice servers only when files are opened and closed
- Venus, the client-side cache manager, caches files from Vice when they are opened and stores modified copies back when they are closed
- Reads and writes of a file's bytes are performed by the kernel directly on the cached copy, without Venus intervention
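The open/close behavior described above can be sketched as follows. The classes `ViceServer` and `Workstation` and the path `/afs/notes.txt` are hypothetical, and a dictionary stands in for the local disk cache; the point is only that the server is contacted on open and close, never on read or write.

```python
class ViceServer:
    def __init__(self):
        self.files = {"/afs/notes.txt": b"draft"}
        self.fetches = 0
        self.stores = 0

    def fetch(self, path):
        self.fetches += 1
        return self.files[path]

    def store(self, path, data):
        self.stores += 1
        self.files[path] = data


class Workstation:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> bytearray, stands in for the local disk cache
        self.dirty = set()

    def open(self, path):
        # Venus's part: fetch the whole file unless it is already cached.
        if path not in self.cache:
            self.cache[path] = bytearray(self.server.fetch(path))

    def read(self, path, offset, count):
        # Kernel's part: served from the cached copy, no server contact.
        return bytes(self.cache[path][offset:offset + count])

    def write(self, path, offset, data):
        # Kernel's part: modifies only the cached copy.
        self.cache[path][offset:offset + len(data)] = data
        self.dirty.add(path)

    def close(self, path):
        # Venus's part: ship the modified copy back to Vice.
        if path in self.dirty:
            self.server.store(path, bytes(self.cache[path]))
            self.dirty.discard(path)


server = ViceServer()
ws = Workstation(server)
ws.open("/afs/notes.txt")
ws.write("/afs/notes.txt", 0, b"final")
seen_locally = ws.read("/afs/notes.txt", 0, 5)
on_server_before_close = server.files["/afs/notes.txt"]
ws.close("/afs/notes.txt")
on_server_after_close = server.files["/afs/notes.txt"]
```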
AFS Features - Brief Summary
Reference: Andrew File System : from Wikipedia
- AFS uses Kerberos for authentication (requiring the user to obtain the ticket for a given cell), and implements access control lists on directories for users and groups.
- Each client caches files on the local filesystem for increased speed on subsequent requests for the same file.
- Read and write operations on an open file are directed only to the locally cached copy. When a modified file is closed, the changed portions are copied back to the file server.
- Cache consistency is maintained by a mechanism called callback. When a file is cached the server makes a note of this and promises to inform the client if the file is updated by someone else.
- A significant feature of AFS is the volume (a tree of files, sub-directories and AFS mountpoints). Volumes are created by administrators and linked at a specific named path in an AFS cell. Once created, users of the filesystem may create directories and files as usual without concern for the physical location of the volume. A volume may have a quota assigned to it in order to limit the amount of space consumed. As needed, AFS administrators can move that volume to another server without the need to notify users.
- AFS volumes can be replicated to read-only cloned copies. When accessing files in a read-only volume, a client system will retrieve data from a particular read-only copy. If at some point that copy becomes unavailable, clients will look for any of the remaining copies. Again, users of that data are unaware of the location of the read-only copy; administrators can create and relocate such copies as needed. The AFS command suite guarantees that all read-only volumes contain exact copies of the original read-write volume at the time the read-only copy was created.
- The file name space on an Andrew workstation is partitioned into a shared and local name space. The shared name space (usually mounted as /afs on the Unix filesystem) is identical on all workstations. The local name space is unique to each workstation. It only contains temporary files needed for workstation initialization and symbolic links to files in the shared name space.
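The callback mechanism mentioned in this summary can be sketched in a few lines. This is an illustrative model, not the AFS implementation: the classes `Server` and `Client`, the method names, and the path `/afs/f` are hypothetical.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.callbacks = {}   # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Note who cached the file, so they can be told about later updates.
        self.callbacks.setdefault(path, set()).add(client)
        return self.data[path]

    def store(self, writer, path, content):
        self.data[path] = content
        # Break the promise for every other client that cached the file.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.valid = set()    # paths whose cached copy is still covered by a callback

    def open(self, server, path):
        if path not in self.valid:
            self.cache[path] = server.fetch(self, path)
            self.valid.add(path)
        return self.cache[path]

    def save(self, server, path, content):
        self.cache[path] = content
        self.valid.add(path)
        server.store(self, path, content)

    def break_callback(self, path):
        self.valid.discard(path)


server = Server()
server.data["/afs/f"] = "v1"
a, b = Client("a"), Client("b")
first = a.open(server, "/afs/f")
b.open(server, "/afs/f")
b.save(server, "/afs/f", "v2")
a_valid_after_update = "/afs/f" in a.valid
second = a.open(server, "/afs/f")   # the callback was broken, so this refetches
```

As long as no callback is broken, a client can reopen a cached file without contacting the server at all.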
Scalability of AFS
Why is whole-file serving and caching more suitable for large distributed file systems?
- Most files on a client computer fall into two kinds. The first kind is shared files that are infrequently updated, typically UNIX commands and libraries. The second kind is files that are normally accessed by a single user, usually those in the user's home directory and its subdirectories. Both kinds are handled well by locally cached copies: files of the first kind are seldom updated, so cached copies rarely go stale, and files of the second kind are updated by only one user, so that user's client cache is sufficient for working on them.
- A modern workstation can allocate a large portion of its disk to the local (client) cache, for example 100 MB. That is enough for a single user to keep all of his or her files in the local cache, so cache space on the client is not a concern. The design can therefore assume that files are not evicted for lack of cache space and that whole-file caching causes no space problems.
- Some researchers (Satyanarayanan, Ousterhout et al., and Floyd) gave very important observations about the usage of UNIX file systems:
- Files are small; most of them are less than 10 KB.
- Read operations on files are much more common than write operations (the ratio is around 6, i.e. R/W = 6:1).
- Sequential access is common, and random access is rare.
- Most files are read and written by one user. If a file is shared, one user makes most of the modifications.
- Files are referenced in bursts. That is, if a file has been referenced recently, it is very likely to be referenced again in the near future.
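Based on these observations, whole-file serving and caching suits large distributed file systems: small files are cheap to transfer whole, read-dominated and mostly single-user access keeps cached copies valid for long periods, and bursty references mean a cached file is likely to be used again soon.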
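The figures quoted above can be combined into a rough back-of-the-envelope check. This is only arithmetic on the numbers in these notes (a 100 MB cache, files under 10 KB, a 6:1 read/write ratio); the variable names are hypothetical, and the read fraction assumes every access is either a read or a write.

```python
cache_bytes = 100 * 1024 * 1024   # the 100 MB local cache mentioned above
file_bytes = 10 * 1024            # upper bound for "most files" (10 KB)

# Number of 10 KB files that fit in the cache at once.
files_that_fit = cache_bytes // file_bytes

reads_per_write = 6
# Share of operations that are reads when R/W = 6:1, i.e. 6 out of every 7.
read_fraction = reads_per_write / (reads_per_write + 1)
```

With these assumptions the cache holds about ten thousand typical files, and roughly six in seven operations are reads that a valid cached copy can serve locally.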
More on Network File System and Andrew File System
Network File System (NFS) is a network file system protocol originally developed by Sun Microsystems in 1984. It allows a user on a client computer to access files over a network as easily as if they were on its local disks. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The protocol is specified in RFC 1094 (version 2), RFC 1813 (version 3), and RFC 3530 (version 4).
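NFS is short for Network FileSystem and was originally developed by Sun. Its purpose is to let different machines and different operating systems share files with one another. It is currently a very good choice for a file server on Unix-like systems. When one Unix-like host connects to another to share files, NFS is much faster and more convenient than a SAMBA server. NFS is also simple to configure: as long as you remember to start Remote Procedure Call (RPC, that is, the portmap package), the setup almost always works. For communication between Windows and Linux, however, SAMBA is still the easier option. In any case, NFS can serve as a file-sharing server for the Unix-like machines inside a small company or school.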
AFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for federated file sharing and replicated read-only content distribution, providing location independence, scalability, security, and transparent migration capabilities. AFS is available for a broad range of heterogeneous systems including UNIX, Linux, MacOS X, and Microsoft Windows.
This article describes how to install OpenAFS with MIT Kerberos 5 on Ubuntu 8.04 for UC Berkeley assuming berkeley.edu as the default AFS cell.
This chapter introduces basic AFS concepts and terms. It assumes that you are already familiar with standard UNIX commands, file protection, and pathname conventions.
Thanks for Reading
If you find any problems in this lecture note, please feel free to reach Steven at steven@findaway.hk