| author | Jan Lübbe <jluebbe@debian.org> | 2009-09-10 13:06:15 +0000 |
|---|---|---|
| committer | Jan Lübbe <jluebbe@debian.org> | 2009-09-18 08:04:44 +0000 |
| commit | b60e3a3e546091edaa9a85cada6b044c7ae89368 (patch) | |
| tree | 1694e764c252bcc7fa3b4ad8f77216214871e069 | |
| parent | 7c54aba8056c12678e6225418f889348c3e06b72 (diff) | |
linux_2.6.30: add and enable aufs2
| -rw-r--r-- | recipes/linux/linux-2.6.30/aufs2-30.patch | 26318 |
| -rw-r--r-- | recipes/linux/linux-2.6.30/i686/defconfig | 14 |
| -rw-r--r-- | recipes/linux/linux_2.6.30.bb | 3 |
3 files changed, 26333 insertions, 2 deletions
diff --git a/recipes/linux/linux-2.6.30/aufs2-30.patch b/recipes/linux/linux-2.6.30/aufs2-30.patch new file mode 100644 index 0000000000..2ab5e8eb49 --- /dev/null +++ b/recipes/linux/linux-2.6.30/aufs2-30.patch @@ -0,0 +1,26318 @@ +diff --git a/Documentation/ABI/testing/debugfs-aufs b/Documentation/ABI/testing/debugfs-aufs +new file mode 100644 +index 0000000..4110b94 +--- /dev/null ++++ b/Documentation/ABI/testing/debugfs-aufs +@@ -0,0 +1,40 @@ ++What: /debug/aufs/si_<id>/ ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ Under /debug/aufs, a directory named si_<id> is created ++ per aufs mount, where <id> is a unique id generated ++ internally. ++ ++What: /debug/aufs/si_<id>/xib ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ It shows the consumed blocks by xib (External Inode Number ++ Bitmap), its block size and file size. ++ When the aufs mount option 'noxino' is specified, it ++ will be empty. About XINO files, see ++ Documentation/filesystems/aufs/aufs.5 in detail. ++ ++What: /debug/aufs/si_<id>/xino0, xino1 ... xinoN ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ It shows the consumed blocks by xino (External Inode Number ++ Translation Table), its link count, block size and file ++ size. ++ When the aufs mount option 'noxino' is specified, it ++ will be empty. About XINO files, see ++ Documentation/filesystems/aufs/aufs.5 in detail. ++ ++What: /debug/aufs/si_<id>/xigen ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ It shows the consumed blocks by xigen (External Inode ++ Generation Table), its block size and file size. ++ If CONFIG_AUFS_EXPORT is disabled, this entry will not ++ be created. ++ When the aufs mount option 'noxino' is specified, it ++ will be empty. About XINO files, see ++ Documentation/filesystems/aufs/aufs.5 in detail. 
+diff --git a/Documentation/ABI/testing/sysfs-aufs b/Documentation/ABI/testing/sysfs-aufs +new file mode 100644 +index 0000000..ca49330 +--- /dev/null ++++ b/Documentation/ABI/testing/sysfs-aufs +@@ -0,0 +1,25 @@ ++What: /sys/fs/aufs/si_<id>/ ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ Under /sys/fs/aufs, a directory named si_<id> is created ++ per aufs mount, where <id> is a unique id generated ++ internally. ++ ++What: /sys/fs/aufs/si_<id>/br0, br1 ... brN ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ It shows the abolute path of a member directory (which ++ is called branch) in aufs, and its permission. ++ ++What: /sys/fs/aufs/si_<id>/xi_path ++Date: March 2009 ++Contact: J. R. Okajima <hooanon05@yahoo.co.jp> ++Description: ++ It shows the abolute path of XINO (External Inode Number ++ Bitmap, Translation Table and Generation Table) file ++ even if it is the default path. ++ When the aufs mount option 'noxino' is specified, it ++ will be empty. About XINO files, see ++ Documentation/filesystems/aufs/aufs.5 in detail. +diff --git a/Documentation/filesystems/aufs/README b/Documentation/filesystems/aufs/README +new file mode 100644 +index 0000000..83089b7 +--- /dev/null ++++ b/Documentation/filesystems/aufs/README +@@ -0,0 +1,342 @@ ++ ++Aufs2 -- advanced multi layered unification filesystem version 2 ++http://aufs.sf.net ++Junjiro R. Okajima ++ ++ ++0. Introduction ++---------------------------------------- ++In the early days, aufs was entirely re-designed and re-implemented ++Unionfs Version 1.x series. After many original ideas, approaches, ++improvements and implementations, it becomes totally different from ++Unionfs while keeping the basic features. ++Recently, Unionfs Version 2.x series begin taking some of the same ++approaches to aufs1's. ++Unionfs is being developed by Professor Erez Zadok at Stony Brook ++University and his team. 
++ ++This version of AUFS, aufs2 has several purposes. ++- to be reviewed easily and widely. ++- to make the source files simpler and smaller by dropping several ++ original features. ++ ++Through this work, I found some bad things in aufs1 source code and ++fixed them. Some of the dropped features will be reverted in the future, ++but not all I'm afraid. ++Aufs2 supports linux-2.6.27 and later. If you want older kernel version ++support, try aufs1 from CVS on SourceForge. ++ ++Note: it becomes clear that "Aufs was rejected. Let's give it up." ++According to Christoph Hellwig, linux rejects all union-type filesystems ++but UnionMount. ++<http://marc.info/?l=linux-kernel&m=123938533724484&w=2> ++ ++ ++1. Features ++---------------------------------------- ++- unite several directories into a single virtual filesystem. The member ++ directory is called as a branch. ++- you can specify the permission flags to the branch, which are 'readonly', ++ 'readwrite' and 'whiteout-able.' ++- by upper writable branch, internal copyup and whiteout, files/dirs on ++ readonly branch are modifiable logically. ++- dynamic branch manipulation, add, del. ++- etc... ++ ++Also there are many enhancements in aufs1, such as: ++- readdir(3) in userspace. ++- keep inode number by external inode number table ++- keep the timestamps of file/dir in internal copyup operation ++- seekable directory, supporting NFS readdir. ++- support mmap(2) including /proc/PID/exe symlink, without page-copy ++- whiteout is hardlinked in order to reduce the consumption of inodes ++ on branch ++- do not copyup, nor create a whiteout when it is unnecessary ++- revert a single systemcall when an error occurs in aufs ++- remount interface instead of ioctl ++- maintain /etc/mtab by an external command, /sbin/mount.aufs. 
++- loopback mounted filesystem as a branch ++- kernel thread for removing the dir who has a plenty of whiteouts ++- support copyup sparse file (a file which has a 'hole' in it) ++- default permission flags for branches ++- selectable permission flags for ro branch, whether whiteout can ++ exist or not ++- export via NFS. ++- support <sysfs>/fs/aufs and <debugfs>/aufs. ++- support multiple writable branches, some policies to select one ++ among multiple writable branches. ++- a new semantics for link(2) and rename(2) to support multiple ++ writable branches. ++- no glibc changes are required. ++- pseudo hardlink (hardlink over branches) ++- allow a direct access manually to a file on branch, e.g. bypassing aufs. ++ including NFS or remote filesystem branch. ++- and more... ++ ++Currently these features are dropped temporary from this version, aufs2. ++See design/08plan.txt in detail. ++- test only the highest one for the directory permission (dirperm1) ++- show whiteout mode (shwh) ++- copyup on open (coo=) ++- nested mount, i.e. aufs as readonly no-whiteout branch of another aufs ++ (robr) ++- statistics of aufs thread (/sys/fs/aufs/stat) ++- delegation mode (dlgt) ++ a delegation of the internal branch access to support task I/O ++ accounting, which also supports Linux Security Modules (LSM) mainly ++ for Suse AppArmor. ++- intent.open/create (file open in a single lookup) ++ ++Features or just an idea in the future (see also design/*.txt), ++- reorder the branch index without del/re-add. ++- permanent xino files for NFSD ++- an option for refreshing the opened files after add/del branches ++- 'move' policy for copy-up between two writable branches, after ++ checking free space. ++- O_DIRECT ++- light version, without branch manipulation. (unnecessary?) ++- copyup in userspace ++- inotify in userspace ++- readv/writev ++- xattr, acl ++ ++ ++2. 
Download ++---------------------------------------- ++Kindly one of aufs user, the Center for Scientific Computing and Free ++Software (C3SL), Federal University of Parana offered me a public GIT ++tree space. ++ ++There are three GIT trees, aufs2-2.6, aufs2-standalone and aufs2-util. ++While the aufs2-util is always necessary, you need either of aufs2-2.6 ++or aufs2-standalone. ++ ++The aufs2-2.6 tree includes the whole linux-2.6 GIT tree, ++git://git.kernel.org/.../torvalds/linux-2.6.git. ++And you cannot select CONFIG_AUFS_FS=m for this version, eg. you cannot ++build aufs2 as an externel kernel module. ++If you already have linux-2.6 GIT tree, you may want to pull and merge ++the "aufs2" branch from this tree. ++ ++On the other hand, the aufs2-standalone tree has only aufs2 source files ++and a necessary patch, and you can select CONFIG_AUFS_FS=m. In other ++words, the aufs2-standalone tree is generated from aufs2-2.6 tree by, ++- extract new files and modifications. ++- generate some patch files from modifications. ++- generate a ChangeLog file from git-log. ++- commit the files newly and no log messages. this is not git-pull. ++ ++Both of aufs2-2.6 and aufs2-standalone trees have a branch whose name is ++in form of "aufs2-xx" where "xx" represents the linux kernel version, ++"linux-2.6.xx". ++ ++o aufs2-2.6 tree ++$ git clone --reference /your/linux-2.6/git/tree \ ++ http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-2.6.git \ ++ aufs2-2.6.git ++- if you don't have linux-2.6 GIT tree, then remove "--reference ..." ++$ cd aufs2-2.6.git ++$ git checkout origin/aufs2-xx # for instance, aufs2-27 for linux-2.6.27 ++ # aufs2 (no -xx) for the latest -rc version. ++ ++o aufs2-standalone tree ++$ git clone http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-standalone.git \ ++ aufs2-standalone.git ++$ cd aufs2-standalone.git ++$ git checkout origin/aufs2-xx # for instance, aufs2-27 for linux-2.6.27 ++ # aufs2 (no -xx) for the latest -rc version. 
++ ++o aufs2-util tree ++$ git clone http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-util.git \ ++ aufs2-util.git ++$ cd aufs2-util.git ++- no particular tag/branch currently. ++ ++o for advanced users ++$ git clone git://git.kernel.org/.../torvalds/linux-2.6.git linux-2.6.git ++ It will take very long time. ++ ++$ cd linux-2.6.git ++$ git remote add aufs2 http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-2.6.git ++$ git checkout -b aufs2-27 v2.6.27 ++$ git pull aufs2 aufs2-27 ++ It may take long time again. ++ Once pulling completes, you've got linux-2.6.27 and aufs2 for it in a ++ branch named aufs2-27, and you can configure and build it. ++ ++Or ++ ++$ git checkout -t -b aufs2 master ++$ git pull aufs2 aufs2 ++ then you've got the latest linux kernel and the latest aufs2 in a ++ branch named aufs2, and you can configure and build it. ++ But aufs is released once a week, so you may meet a compilation error ++ due to mismatching between the mainline and aufs2. ++ ++Or you may want build linux-2.6.xx.yy instead of linux-2.6.xx, then here ++is an approach using linux-2.6-stable GIT tree. ++ ++$ cd linux-2.6.git/.. ++$ git clone -q --reference ./linux-2.6.git git://git.kernel.org/.../linux-2.6-stable.git \ ++ linux-2.6-stable.git ++ It will take very long time. ++ ++$ cd linux-2.6-stable.git ++$ git remote add aufs2 http://git.c3sl.ufpr.br/pub/scm/aufs/aufs2-2.6.git ++$ git checkout -b aufs2-27.1 v2.6.27.1 ++$ git pull aufs2 aufs2-27 ++ then you've got linux-2.6.27.1 and aufs2 for 2.6.27 in a branch named ++ aufs2-27.1, and you can configure and build it. ++ But the changes made by v2.6.xx.yy may conflict with aufs2-xx, since ++ aufs2-xx is for v2.6.xx only. In this case, you may find some patchces ++ for v2.6.xx.yy in aufs2-standalone.git#aufs2-xx branch if someone else ++ have ever requested me to support v2.6.xx.yy and I did it. ++ ++You can also check what was changed by pulling aufs2. 
++$ git diff v2.6.27.1..aufs2-27.1 ++ ++If you want to check the changed files other than fs/aufs, then try this. ++$ git diff v2.6.27.1..aufs2-27.1 | ++> awk ' ++> /^diff / {new=1} ++> /^diff.*aufs/ {new=0} ++> new {print} ++> ' ++ ++ ++3. Configuration and Compilation ++---------------------------------------- ++For aufs2-2.6 tree, ++- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS. ++- set other aufs configurations if necessary. ++ ++For aufs2-standalone tree, ++There are several ways to build. ++ ++You may feel why aufs2-standalone.patch needs to export so many kernel ++symbols. Because you selected aufs2-standalone tree instead of aufs2-2.6 ++tree. The number of necessary symbols to export essentially is zero. ++All other symbols are for the external module. ++If you don't like aufs2-standalone.patch, then try aufs2-2.6 tree. ++ ++1. ++- apply ./aufs2-kbuild.patch to your kernel source files. ++- apply ./aufs2-base.patch too. ++- apply ./aufs2-standalone.patch too, if you have a plan to set ++ CONFIG_AUFS_FS=m. otherwise you don't need ./aufs2-standalone.patch. ++- copy ./{Documentation,fs,include} files to your kernel source tree. ++- enable CONFIG_EXPERIMENTAL and CONFIG_AUFS_FS, you can select either ++ =m or =y. ++- and build your kernel as usual. ++- install it and reboot your system. ++ ++2. ++- module only (CONFIG_AUFS_FS=m). ++- apply ./aufs2-base.patch to your kernel source files. ++- apply ./aufs2-standalone.patch too. ++- build your kernel and reboot. ++- edit ./config.mk and set other aufs configurations if necessary. ++ Note: You should read ./fs/aufs/Kconfig carefully which describes ++ every aufs configurations. ++- build the module by simple "make". ++- you can specify ${KDIR} make variable which points to your kernel ++ source tree. ++- copy the build ./aufs.ko to /lib/modules/..., and run depmod -a (or ++ reboot simply). ++- no need to apply aufs2-kbuild.patch, nor copying source files to your ++ kernel source tree. 
++ ++And then, ++- read README in aufs2-util, build and install it ++- if you want to use readdir(3) in userspace, then run ++ "make install_ulib" too. And refer to the aufs manual in detail. ++ ++ ++4. Usage ++---------------------------------------- ++At first, make sure aufs2-util are installed, and please read the aufs ++manual, aufs.5 in aufs2-util.git tree. ++$ man -l aufs.5 ++ ++And then, ++$ mkdir /tmp/rw /tmp/aufs ++# mount -t aufs -o br=/tmp/rw:${HOME} none /tmp/aufs ++ ++Here is another example. The result is equivalent. ++# mount -t aufs -o br=/tmp/rw=rw:${HOME}=ro none /tmp/aufs ++ Or ++# mount -t aufs -o br:/tmp/rw none /tmp/aufs ++# mount -o remount,append:${HOME} /tmp/aufs ++ ++Then, you can see whole tree of your home dir through /tmp/aufs. If ++you modify a file under /tmp/aufs, the one on your home directory is ++not affected, instead the same named file will be newly created under ++/tmp/rw. And all of your modification to a file will be applied to ++the one under /tmp/rw. This is called the file based Copy on Write ++(COW) method. ++Aufs mount options are described in aufs.5. ++ ++Additionally, there are some sample usages of aufs which are a ++diskless system with network booting, and LiveCD over NFS. ++See sample dir in CVS tree on SourceForge. ++ ++ ++5. Contact ++---------------------------------------- ++When you have any problems or strange behaviour in aufs, please let me ++know with: ++- /proc/mounts (instead of the output of mount(8)) ++- /sys/module/aufs/* ++- /sys/fs/aufs/* (if you have them) ++- /debug/aufs/* (if you have them) ++- linux kernel version ++ if your kernel is not plain, for example modified by distributor, ++ the url where i can download its source is necessary too. ++- aufs version which was printed at loading the module or booting the ++ system, instead of the date you downloaded. 
++- configuration (define/undefine CONFIG_AUFS_xxx) ++- kernel configuration or /proc/config.gz (if you have it) ++- behaviour which you think to be incorrect ++- actual operation, reproducible one is better ++- mailto: aufs-users at lists.sourceforge.net ++ ++Usually, I don't watch the Public Areas(Bugs, Support Requests, Patches, ++and Feature Requests) on SourceForge. Please join and write to ++aufs-users ML. ++ ++ ++6. Acknowledgements ++---------------------------------------- ++Thanks to everyone who have tried and are using aufs, whoever ++have reported a bug or any feedback. ++ ++Especially donors: ++Tomas Matejicek(slax.org) made a donation (much more than once). ++Dai Itasaka made a donation (2007/8). ++Chuck Smith made a donation (2008/4, 10 and 12). ++Henk Schoneveld made a donation (2008/9). ++Chih-Wei Huang, ASUS, CTC donated Eee PC 4G (2008/10). ++Francois Dupoux made a donation (2008/11). ++Bruno Cesar Ribas and Luis Carlos Erpen de Bona, C3SL serves public ++aufs2 GIT tree (2009/2). ++William Grant made a donation (2009/3). ++Patrick Lane made a donation (2009/4). ++The Mail Archive (mail-archive.com) made donations (2009/5). ++Nippy Networks (Ed Wildgoose) a donation (2009/7). ++ ++Thank you very much. ++Donations are always, including future donations, very important and ++helpful for me to keep on developing aufs. ++ ++ ++7. ++---------------------------------------- ++If you are an experienced user, no explanation is needed. Aufs is ++just a linux filesystem. ++ ++ ++Enjoy! ++ ++# Local variables: ; ++# mode: text; ++# End: ; +diff --git a/Documentation/filesystems/aufs/design/01intro.txt b/Documentation/filesystems/aufs/design/01intro.txt +new file mode 100644 +index 0000000..ac678c0 +--- /dev/null ++++ b/Documentation/filesystems/aufs/design/01intro.txt +@@ -0,0 +1,137 @@ ++ ++# Copyright (C) 2005-2009 Junjiro R. 
Okajima ++# ++# This program is free software; you can redistribute it and/or modify ++# it under the terms of the GNU General Public License as published by ++# the Free Software Foundation; either version 2 of the License, or ++# (at your option) any later version. ++# ++# This program is distributed in the hope that it will be useful, ++# but WITHOUT ANY WARRANTY; without even the implied warranty of ++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ++# GNU General Public License for more details. ++# ++# You should have received a copy of the GNU General Public License ++# along with this program; if not, write to the Free Software ++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA ++ ++Introduction ++---------------------------------------- ++ ++aufs [ei ju: ef es] | [a u f s] ++1. abbrev. for "advanced multi-layered unification filesystem". ++2. abbrev. for "another unionfs". ++3. abbrev. for "auf das" in German which means "on the" in English. ++ Ex. "Butter aufs Brot"(G) means "butter onto bread"(E). ++ But "Filesystem aufs Filesystem" is hard to understand. ++ ++AUFS is a filesystem with features: ++- multi layered stackable unification filesystem, the member directory ++ is called as a branch. ++- branch permission and attribute, 'readonly', 'real-readonly', ++ 'readwrite', 'whiteout-able', 'link-able whiteout' and their ++ combination. ++- internal "file copy-on-write". ++- logical deletion, whiteout. ++- dynamic branch manipulation, adding, deleting and changing permission. ++- allow bypassing aufs, user's direct branch access. ++- external inode number translation table and bitmap which maintains the ++ persistent aufs inode number. ++- seekable directory, including NFS readdir. ++- file mapping, mmap and sharing pages. ++- pseudo-link, hardlink over branches. ++- loopback mounted filesystem as a branch. ++- several policies to select one among multiple writable branches. 
++- revert a single systemcall when an error occurs in aufs. ++- and more... ++ ++ ++Multi Layered Stackable Unification Filesystem ++---------------------------------------------------------------------- ++Most people already knows what it is. ++It is a filesystem which unifies several directories and provides a ++merged single directory. When users access a file, the access will be ++passed/re-directed/converted (sorry, I am not sure which English word is ++correct) to the real file on the member filesystem. The member ++filesystem is called 'lower filesystem' or 'branch' and has a mode ++'readonly' and 'readwrite.' And the deletion for a file on the lower ++readonly branch is handled by creating 'whiteout' on the upper writable ++branch. ++ ++On LKML, there have been discussions about UnionMount (Jan Blunck and ++Bharata B Rao) and Unionfs (Erez Zadok). They took different approaches ++to implement the merged-view. ++The former tries putting it into VFS, and the latter implements as a ++separate filesystem. ++(If I misunderstand about these implementations, please let me know and ++I shall correct it. Because it is a long time ago when I read their ++source files last time). ++UnionMount's approach will be able to small, but may be hard to share ++branches between several UnionMount since the whiteout in it is ++implemented in the inode on branch filesystem and always ++shared. According to Bharata's post, readdir does not seems to be ++finished yet. ++Unionfs has a longer history. When I started implementing a stacking filesystem ++(Aug 2005), it already existed. It has virtual super_block, inode, ++dentry and file objects and they have an array pointing lower same kind ++objects. After contributing many patches for Unionfs, I re-started my ++project AUFS (Jun 2006). ++ ++In AUFS, the structure of filesystem resembles to Unionfs, but I ++implemented my own ideas, approaches and enhancements and it became ++totally different one. 
++ ++ ++Several characters/aspects of aufs ++---------------------------------------------------------------------- ++ ++Aufs has several characters or aspects. ++1. a filesystem, callee of VFS helper ++2. sub-VFS, caller of VFS helper for branches ++3. a virtual filesystem which maintains persistent inode number ++4. reader/writer of files on branches such like an application ++ ++1. Caller of VFS Helper ++As an ordinary linux filesystem, aufs is a callee of VFS. For instance, ++unlink(2) from an application reaches sys_unlink() kernel function and ++then vfs_unlink() is called. vfs_unlink() is one of VFS helper and it ++calls filesystem specific unlink operation. Actually aufs implements the ++unlink operation but it behaves like a redirector. ++ ++2. Caller of VFS Helper for Branches ++aufs_unlink() passes the unlink request to the branch filesystem as if ++it were called from VFS. So the called unlink operation of the branch ++filesystem acts as usual. As a caller of VFS helper, aufs should handle ++every necessary pre/post operation for the branch filesystem. ++- acquire the lock for the parent dir on a branch ++- lookup in a branch ++- revalidate dentry on a branch ++- mnt_want_write() for a branch ++- vfs_unlink() for a branch ++- mnt_drop_write() for a branch ++- release the lock on a branch ++ ++3. Persistent Inode Number ++One of the most important issue for a filesystem is to maintain inode ++numbers. This is particularly important to support exporting a ++filesystem via NFS. Aufs is a virtual filesystem which doesn't have a ++backend block device for its own. But some storage is necessary to ++maintain inode number. It may be a large space and may not suit to keep ++in memory. Aufs rents some space from its first writable branch ++filesystem (by default) and creates file(s) on it. These files are ++created by aufs internally and removed soon (currently) keeping opened. 
++Note: Because these files are removed, they are totally gone after ++ unmounting aufs. It means the inode numbers are not persistent ++ across unmount or reboot. I have a plan to make them really ++ persistent which will be important for aufs on NFS server. ++ ++4. Read/Write Files Internally (copy-on-write) ++Because a branch can be readonly, when you write a file on it, aufs will ++"copy-up" it to the upper writable branch internally. And then write the ++originally requested thing to the file. Generally kernel doesn't ++open/read/write file actively. In aufs, even a single write may cause a ++internal "file copy". This behaviour is very similar to cp(1) command. ++ ++Some people may think it is better to pass such work to user space ++helper, instead of doing in kernel space. Actually I am still thinking ++about it. But currently I have implemented it in kernel space. +diff --git a/Documentation/filesystems/aufs/design/02struct.txt b/Documentation/filesystems/aufs/design/02struct.txt +new file mode 100644 +index 0000000..11cee07 +--- /dev/null ++++ b/Documentation/filesystems/aufs/design/02struct.txt +@@ -0,0 +1,218 @@ ++ ++# Copyright (C) 2005-2009 Junjiro R. Okajima ++# ++# This program is free software; you can redistribute it and/or modify ++# it under the terms of the GNU General Public License as published by ++# the Free Software Foundation; either version 2 of the License, or ++# (at your option) any later version. ++# ++# This program is distributed in the hope that it will be useful, ++# but WITHOUT ANY WARRANTY; without even the implied warranty of ++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ++# GNU General Public License for more details. 
++# ++# You should have received a copy of the GNU General Public License ++# along with this program; if not, write to the Free Software ++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA ++ ++Basic Aufs Internal Structure ++ ++Superblock/Inode/Dentry/File Objects ++---------------------------------------------------------------------- ++As like an ordinary filesystem, aufs has its own ++superblock/inode/dentry/file objects. All these objects have a ++dynamically allocated array and store the same kind of pointers to the ++lower filesystem, branch. ++For example, when you build a union with one readwrite branch and one ++readonly, mounted /au, /rw and /ro respectively. ++- /au = /rw + /ro ++- /ro/fileA exists but /rw/fileA ++ ++Aufs lookup operation finds /ro/fileA and gets dentry for that. These ++pointers are stored in a aufs dentry. The array in aufs dentry will be, ++- [0] = NULL ++- [1] = /ro/fileA ++ ++This style of an array is essentially same to the aufs ++superblock/inode/dentry/file objects. ++ ++Because aufs supports manipulating branches, ie. add/delete/change ++dynamically, these objects has its own generation. When branches are ++changed, the generation in aufs superblock is incremented. And a ++generation in other object are compared when it is accessed. ++When a generation in other objects are obsoleted, aufs refreshes the ++internal array. ++ ++ ++Superblock ++---------------------------------------------------------------------- ++Additionally aufs superblock has some data for policies to select one ++among multiple writable branches, XIB files, pseudo-links and kobject. ++See below in detail. ++About the policies which supports copy-down a directory, see policy.txt ++too. ++ ++ ++Branch and XINO(External Inode Number Translation Table) ++---------------------------------------------------------------------- ++Every branch has its own xino (external inode number translation table) ++file. 
The xino file is created and unlinked by aufs internally. When two ++members of a union exist on the same filesystem, they share the single ++xino file. ++The struct of a xino file is simple, just a sequence of aufs inode ++numbers which is indexed by the lower inode number. ++In the above sample, assume the inode number of /ro/fileA is i111 and ++aufs assigns the inode number i999 for fileA. Then aufs writes 999 as ++4(8) bytes at 111 * 4(8) bytes offset in the xino file. ++ ++When the inode numbers are not contiguous, the xino file will be sparse ++which has a hole in it and doesn't consume as much disk space as it ++might appear. If your branch filesystem consumes disk space for such ++holes, then you should specify 'xino=' option at mounting aufs. ++ ++Also a writable branch has three kinds of "whiteout bases". All these ++are existed when the branch is joined to aufs and the names are ++whiteout-ed doubly, so that users will never see their names in aufs ++hierarchy. ++1. a regular file which will be linked to all whiteouts. ++2. a directory to store a pseudo-link. ++3. a directory to store an "orphan-ed" file temporary. ++ ++1. Whiteout Base ++ When you remove a file on a readonly branch, aufs handles it as a ++ logical deletion and creates a whiteout on the upper writable branch ++ as a hardlink of this file in order not to consume inode on the ++ writable branch. ++2. Pseudo-link Dir ++ See below, Pseudo-link. ++3. Step-Parent Dir ++ When "fileC" exists on the lower readonly branch only and it is ++ opened and removed with its parent dir, and then user writes ++ something into it, then aufs copies-up fileC to this ++ directory. Because there is no other dir to store fileC. After ++ creating a file under this dir, the file is unlinked. ++ ++Because aufs supports manipulating branches, ie. add/delete/change ++dynamically, a branch has its own id. When the branch order changes, aufs ++finds the new index by searching the branch id. 
++ ++ ++Pseudo-link ++---------------------------------------------------------------------- ++Assume "fileA" exists on the lower readonly branch only and it is ++hardlinked to "fileB" on the branch. When you write something to fileA, ++aufs copies-up it to the upper writable branch. Additionally aufs ++creates a hardlink under the Pseudo-link Directory of the writable ++branch. The inode of a pseudo-link is kept in aufs super_block as a ++simple list. If fileB is read after unlinking fileA, aufs returns ++filedata from the pseudo-link instead of the lower readonly ++branch. Because the pseudo-link is based upon the inode, to keep the ++inode number by xino (see above) is important. ++ ++All the hardlinks under the Pseudo-link Directory of the writable branch ++should be restored in a proper location later. Aufs provides a utility ++to do this. The userspace helpers executed at remounting and unmounting ++aufs by default. ++ ++ ++XIB(external inode number bitmap) ++---------------------------------------------------------------------- ++Addition to the xino file per a branch, aufs has an external inode number ++bitmap in a superblock object. It is also a file such like a xino file. ++It is a simple bitmap to mark whether the aufs inode number is in-use or ++not. ++To reduce the file I/O, aufs prepares a single memory page to cache xib. ++ ++Aufs implements a feature to truncate/refresh both of xino and xib to ++reduce the number of consumed disk blocks for these files. ++ ++ ++Virtual or Vertical Dir ++---------------------------------------------------------------------- ++In order to support multiple layers (branches), aufs readdir operation ++constructs a virtual dir block on memory. For readdir, aufs calls ++vfs_readdir() internally for each dir on branches, merges their entries ++with eliminating the whiteout-ed ones, and sets it to file (dir) ++object. So the file object has its entry list until it is closed. 
The ++entry list will be updated when the file position is zero and becomes ++old. This decision is made in aufs automatically. ++ ++The dynamically allocated memory block for the name of entries has a ++unit of 512 bytes (by default) and stores the names contiguously (no ++padding). Another block for each entry is handled by kmem_cache too. ++During building dir blocks, aufs creates hash list and judging whether ++the entry is whiteouted by its upper branch or already listed. ++ ++Some people may call it can be a security hole or invite DoS attack ++since the opened and once readdir-ed dir (file object) holds its entry ++list and becomes a pressure for system memory. But I'd say it is similar ++to files under /proc or /sys. The virtual files in them also holds a ++memory page (generally) while they are opened. When an idea to reduce ++memory for them is introduced, it will be applied to aufs too. ++For those who really hate this situation, I've developed readdir(3) ++library which operates this merging in userspace. You just need to set ++LD_PRELOAD environment variable, and aufs will not consume no memory in ++kernel space for readdir(3). ++ ++ ++Workqueue ++---------------------------------------------------------------------- ++Aufs sometimes requires privilege access to a branch. For instance, ++in copy-up/down operation. When a user process is going to make changes ++to a file which exists in the lower readonly branch only, and the mode ++of one of ancestor directories may not be writable by a user ++process. Here aufs copy-up the file with its ancestors and they may ++require privilege to set its owner/group/mode/etc. ++This is a typical case of a application character of aufs (see ++Introduction). ++ ++Aufs uses workqueue synchronously for this case. It creates its own ++workqueue. The workqueue is a kernel thread and has privilege. Aufs ++passes the request to call mkdir or write (for example), and wait for ++its completion. 
This approach solves a problem of a signal handler ++simply. ++If aufs did not adopt the workqueue and instead changed the privilege of the ++process, and if the mkdir/write call raised SIGXFSZ or another signal, ++then the user process might gain a privilege or the generated core file ++might be owned by a superuser. But I have a plan to switch to a new ++credential approach which will be introduced in linux-2.6.29. ++ ++Aufs also uses the system global workqueue ("events" kernel thread) ++for asynchronous tasks, such as handling inotify, re-creating a ++whiteout base, etc. This is unrelated to a privilege. ++Most aufs operations try to acquire a rw_semaphore for the aufs ++superblock at the beginning, and at the same time wait for the completion ++of all queued asynchronous tasks. ++ ++ ++Whiteout ++---------------------------------------------------------------------- ++The whiteout in aufs is very similar to Unionfs's. It is represented ++by its filename. UnionMount takes an approach of a file mode, but I am ++afraid several utilities (find(1) or something) will have to support it. ++ ++Basically the whiteout represents "logical deletion" which stops aufs from ++looking up further, but it also represents "dir is opaque" which also stops ++lookup. ++ ++In aufs, rmdir(2) and rename(2) for dir use whiteout alternatively. ++In order to make several functions in a single systemcall ++revertible, aufs adopts an approach to rename a directory to a temporary ++unique whiteouted name. ++For example, in rename(2) dir where the target dir already existed, aufs ++renames the target dir to a temporary unique whiteouted name before the ++actual rename on a branch and then handles other actions (make it opaque, ++update the attributes, etc). If an error happens in these actions, aufs ++simply renames the whiteouted name back and returns an error. 
If all ++succeed, aufs registers a function to remove the whiteouted unique ++temporary name completely and asynchronously with the system global ++workqueue. ++ ++ ++Copy-up ++---------------------------------------------------------------------- ++It is a well-known feature or concept. ++When a user modifies a file on a readonly branch, aufs operates "copy-up" ++internally and makes the change to the new file on the upper writable branch. ++When the trigger systemcall does not update the timestamps of the parent ++dir, aufs reverts them after copy-up. +diff --git a/Documentation/filesystems/aufs/design/03lookup.txt b/Documentation/filesystems/aufs/design/03lookup.txt +new file mode 100644 +index 0000000..7510fdb +--- /dev/null ++++ b/Documentation/filesystems/aufs/design/03lookup.txt +@@ -0,0 +1,104 @@ ++ ++# Copyright (C) 2005-2009 Junjiro R. Okajima ++# ++# This program is free software; you can redistribute it and/or modify ++# it under the terms of the GNU General Public License as published by ++# the Free Software Foundation; either version 2 of the License, or ++# (at your option) any later version. ++# ++# This program is distributed in the hope that it will be useful, ++# but WITHOUT ANY WARRANTY; without even the implied warranty of ++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ++# GNU General Public License for more details. ++# ++# You should have received a copy of the GNU General Public License ++# along with this program; if not, write to the Free Software ++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA ++ ++Lookup in a Branch ++---------------------------------------------------------------------- ++Since aufs has a character of sub-VFS (see Introduction), it operates ++lookup for branches as VFS does. This may be heavy work. Generally ++speaking struct nameidata is a bigger structure and includes much ++information. But almost all lookup operations in aufs are the simplest ++case, i.e. 
lookup only an entry directly connected to its parent. Digging ++down the directory hierarchy is unnecessary. ++ ++VFS has a function lookup_one_len() for that use, but it is not usable ++for a branch filesystem which requires struct nameidata. So aufs ++implements a simple lookup wrapper function. When a branch filesystem ++allows NULL as nameidata, it calls lookup_one_len(). Otherwise it builds ++the simplest nameidata and calls lookup_hash(). ++Here aufs applies "a principle in NFSD", i.e. if the filesystem supports ++NFS-export, then it has to support NULL as a nameidata parameter for ++->create(), ->lookup() and ->d_revalidate(). So the lookup wrapper in ++aufs tests if ->s_export_op in the branch is NULL or not. ++ ++When a branch is a remote filesystem, aufs trusts its ->d_revalidate(). ++For d_revalidate, aufs implements three levels of revalidate tests. See ++"Revalidate Dentry and UDBA" for details. ++ ++ ++Loopback Mount ++---------------------------------------------------------------------- ++Basically aufs supports any type of filesystem and block device for a ++branch (actually there are some exceptions). But it is prohibited to add ++a loopback mounted one whose backend file exists in a filesystem which is ++already added to aufs. The reason is to protect aufs from a recursive ++lookup. If it were allowed, the aufs lookup operation might re-enter a ++lookup for the loopback mounted branch in the same context, and ++cause a deadlock. ++ ++ ++Revalidate Dentry and UDBA (User's Direct Branch Access) ++---------------------------------------------------------------------- ++Generally VFS helpers re-validate a dentry as a part of lookup. ++0. dig down the directory hierarchy. ++1. lock the parent dir by its i_mutex. ++2. lookup the final (child) entry. ++3. revalidate it. ++4. call the actual operation (create, unlink, etc.) ++5. unlock the parent dir ++ ++If the filesystem implements its ->d_revalidate() (step 3), then it is ++called. 
Actually aufs implements it and checks whether the dentry on a branch is ++still valid. ++But it is not enough, because aufs has to release the lock for the ++parent dir on a branch at the end of ->lookup() (step 2) and ++->d_revalidate() (step 3) while the i_mutex of the aufs dir is still ++held by VFS. ++If the file on a branch is changed directly, e.g. bypassing aufs, after ++aufs has released the lock, then the subsequent operation may cause ++some unpleasant result. ++ ++This situation is a result of VFS architecture, ->lookup() and |
