Reflinks for XFS are available in Fedora 27, so you no longer need to pull and compile xfsprogs
from git.
To use reflinks in XFS, you need to create the file system with the reflink=1 flag.
[root@starscream mnt]# mkfs.xfs -m reflink=1 filesystem
In my example, I just created an image file that I could mount through a loop device:
[root@starscream mnt]# mkfs.xfs -m reflink=1 test.img
Then I’ll mount it:
[root@starscream mnt]# mount -o loop test.img /mnt
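The image file itself can be created in several ways. Here is a minimal sketch, assuming a 1 GiB sparse file; the size is a guess based on the 1014M that df reports for the file system, and the temporary directory is only there to keep the example self-contained:

```shell
# Assumption: a 1 GiB sparse image; the size is inferred from the df output, not stated in the post.
img_dir=$(mktemp -d)
img="$img_dir/test.img"

# truncate sets the file length without writing data, so no blocks are allocated yet.
truncate -s 1G "$img"

# Show the apparent size next to the blocks actually allocated on disk.
stat -c '%s bytes apparent, %b blocks allocated' "$img"
size=$(stat -c %s "$img")
```

Once the file system is mounted, running xfs_info /mnt should list reflink=1 among the metadata settings if the flag took effect.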
We can now create a file with some random information to copy.
[root@starscream mnt]# dd if=/dev/urandom of=test bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.594451 s, 176 MB/s
Running df shows 140M used:
/dev/loop0 1014M 140M 875M 14% /mnt
So let’s copy the file with reflinks enabled:
[root@starscream mnt]# cp -v --reflink=always test testfile
'test' -> 'testfile'
[root@starscream mnt]# ls -lsh
total 200M
100M -rw-r--r--. 1 root root 100M Mar 4 18:40 test
100M -rw-r--r--. 1 root root 100M Mar 4 18:43 testfile
So we can see both copies of the file are 100M, but df shows the same amount of space used as before the copy:
/dev/loop0 1014M 140M 875M 14% /mnt
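Sharing blocks does not tie the two files together: a reflink copy is copy-on-write, so changing one file leaves the other untouched. A small sketch of that behaviour, assuming GNU coreutils; it uses --reflink=auto rather than --reflink=always so that it falls back to a normal copy on file systems without reflink support, and the 4 MiB size and temporary directory are arbitrary choices for illustration:

```shell
workdir=$(mktemp -d)

# Create a small file of random data and record its checksum.
dd if=/dev/urandom of="$workdir/test" bs=1M count=4 status=none
before=$(sha256sum < "$workdir/test")

# Reflink where supported; otherwise cp silently performs a regular copy.
cp --reflink=auto "$workdir/test" "$workdir/testfile"

# Modify only the copy.
echo "extra data" >> "$workdir/testfile"

# The original's checksum should be unchanged.
after=$(sha256sum < "$workdir/test")
[ "$before" = "$after" ] && echo "original unchanged"

rm -rf "$workdir"
```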
So this is helpful for copying data, but what about existing data? For existing data we can use a tool like duperemove. You can find it here.
With duperemove we can do out-of-band deduplication. I’ll make two more normal copies of our test file:
[root@starscream mnt]# cp test test2
[root@starscream mnt]# cp test test3
[root@starscream mnt]# ls -lsh
total 300M
100M -rw-r--r--. 1 root root 100M Mar 4 18:40 test
100M -rw-r--r--. 1 root root 100M Mar 4 18:47 test2
100M -rw-r--r--. 1 root root 100M Mar 4 18:47 test3
Now df shows 340M used, 200M more than before:
/dev/loop0 1014M 340M 675M 34% /mnt
So let’s run duperemove against the directory:
[root@starscream mnt]# duperemove -hdr --hashfile=/tmp/test.hash /mnt
Kernel processed data (excludes target files): 400.0M
Comparison of extent info shows a net change in shared extents of: 300.0M
[root@starscream mnt]# ls -lsh /mnt
total 300M
100M -rw-r--r--. 1 root root 100M Mar 4 18:40 test
100M -rw-r--r--. 1 root root 100M Mar 4 18:47 test2
100M -rw-r--r--. 1 root root 100M Mar 4 18:47 test3
And here’s our df output:
/dev/loop0 1014M 140M 875M 14% /mnt
We’re back to the 140M we started with.
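The gap between ls and df comes from how each one counts: per-file block counts include shared extents once for every file that references them, while the file system's own usage counts them once. A rough sketch for comparing the two on any directory, assuming GNU coreutils (for df --output); the variable names are only illustrative and the directory defaults to the current one:

```shell
dir=${1:-.}

# Sum of per-file allocated blocks in KiB; shared extents are counted once per file.
files_kb=$(du -sk "$dir" | cut -f1)

# Blocks in use on the whole file system in KiB; shared extents are counted once.
used_kb=$(df -k --output=used "$dir" | tail -n 1 | tr -d ' ')

echo "du: ${files_kb} KiB, df: ${used_kb} KiB"
```

On a file system where reflinks or deduplication are in play, the du figure can exceed the df figure, as with the 100M files above against 140M used.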