Gionn.net

scorp@sony-stark:~$

Zram on Debian/Ubuntu for Memory Overcommitment

Recent Linux releases include a small module called zram, which lets us create RAM-based block devices (named /dev/zramX) whose contents are kept in memory in compressed form. These RAM-based block devices allow very fast I/O, and the compression provides a reasonable amount of memory savings.

We can use it as a drop-in replacement for the well-known tmpfs (used for speeding up compilation tasks or for /tmp) or, better, as a primary swap device. The latter virtually increases memory capacity, at the expense of slightly higher CPU usage to compress and decompress the swapped data.

Nowadays RAM is very cheap, so why bother with compression? Because there are some situations where you can’t upgrade memory (netbooks) or you want to over-commit real resources (virtualization hosts).

For Ubuntu Precise and later:

Starting with Ubuntu Precise, the main repository includes an official upstart script by Adam Conrad that configures zram:

sudo apt-get install zram-config

For other distributions or older Ubuntu:

Googling around for a nice way to configure zram devices as swap, I found a very nice init script that creates a set of zram devices, one per available CPU core, with a total size equal to the available memory: https://raw.github.com/gionn/etc/master/init.d/zram

Copy the script to the init.d folder, mark it as executable and enable autostart on boot:

sudo wget https://raw.github.com/gionn/etc/master/init.d/zram -O /etc/init.d/zram
sudo chmod +x /etc/init.d/zram
sudo update-rc.d zram defaults

Run it manually the first time with:

sudo /etc/init.d/zram start

Depending on the kernel version you are running, you may need to change the module parameter name on line 26 of the script to num_devices:

modprobe zram num_devices=$num_cpus

or keep it as is for newer kernels:

modprobe zram zram_num_devices=$num_cpus
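If you are not sure which of the two names your kernel expects, one option is to ask the module itself instead of editing the script by trial and error. The sketch below assumes that modinfo -p zram lists one parameter per line in the form name:description. The helper pick_zram_param is a made-up name for illustration and is not part of the linked init script, and the modprobe command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the linked init script): read a parameter
# listing on stdin, one "name:description" entry per line, and print the
# device-count parameter name if one of the two known spellings is present.
pick_zram_param() {
    awk -F: '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == "num_devices" || name == "zram_num_devices" { print name; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Assumption: one zram device per online CPU, as the init script does.
num_cpus=$(getconf _NPROCESSORS_ONLN)

if param=$(modinfo -p zram 2>/dev/null | pick_zram_param); then
    echo "would run: modprobe zram ${param}=${num_cpus}"
else
    echo "no known device-count parameter found (is the zram module available?)"
fi
```

Whichever name is printed is the one to use on line 26 of the script.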

Checking if it’s working

If everything went smoothly, you will find a few notices in dmesg:

zram: module is from the staging directory, the quality is unknown, you have been warned.
zram: Creating 4 devices ...
Adding 1497864k swap on /dev/zram0.  Priority:100 extents:1 across:1497864k SS
Adding 1497864k swap on /dev/zram1.  Priority:100 extents:1 across:1497864k SS
Adding 1497864k swap on /dev/zram2.  Priority:100 extents:1 across:1497864k SS
Adding 1497864k swap on /dev/zram3.  Priority:100 extents:1 across:1497864k SS

This means that the zram devices have been created and enabled as swap devices with the highest priority.
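Those notices correspond to a handful of commands per device. The following is a rough sketch of that sequence rather than the linked script itself: it only prints the commands, and it assumes the size is set through /sys/block/zramN/disksize (in bytes) and that swap priority 100 is requested, as the dmesg output suggests. The function names zram_device_kb and print_zram_swap_cmds are made up for illustration.

```shell
#!/bin/sh
# Per-device size in kB when total_kb of memory is split evenly across devices.
zram_device_kb() {
    total_kb=$1
    devices=$2
    echo $(( total_kb / devices ))
}

# Print (do not run) the commands that would turn each zram device into swap.
print_zram_swap_cmds() {
    total_kb=$1
    devices=$2
    size_bytes=$(( $(zram_device_kb "$total_kb" "$devices") * 1024 ))
    i=0
    while [ "$i" -lt "$devices" ]; do
        echo "echo $size_bytes > /sys/block/zram$i/disksize"
        echo "mkswap /dev/zram$i"
        echo "swapon -p 100 /dev/zram$i"
        i=$(( i + 1 ))
    done
}

# Read the real values on Linux; fall back to sample numbers elsewhere.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
print_zram_swap_cmds "${mem_kb:-4000000}" "${cpus:-2}"
```

Printing first makes it easy to review the sizes before running anything as root, and the commands only make sense once modprobe has created the devices.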

You can check the increased swap space with free -m:

             total       used       free     shared    buffers     cached
Mem:          5851       5696        154          0         85       4310
-/+ buffers/cache:       1300       4550
Swap:         5851          0       5850
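To see how much of that swap actually comes from zram, the per-device table in /proc/swaps can be summed. A small sketch, assuming the usual column order (Filename, Type, Size, Used, Priority) with sizes in kB. The name zram_swap_kb is invented here.

```shell
#!/bin/sh
# Sum the Size column (kB) of every /dev/zram* entry in a /proc/swaps-style
# table read from stdin. The header line is skipped because it does not match.
zram_swap_kb() {
    awk '$1 ~ /^\/dev\/zram[0-9]+$/ { total += $3 } END { print total + 0 }'
}

if [ -r /proc/swaps ]; then
    echo "zram swap: $(zram_swap_kb < /proc/swaps) kB"
fi
```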

Happy zramming!


Open@BNCF - LinuxDay 2011 a Pisa

This year I joined the LinuxDay in Pisa with a talk about my recent work at LiberSoft for the BNCF.

These include:

  • Desktop migration to Ubuntu Linux (10.04 LTS) with centralized login management (OpenLDAP) and a shared /home on MooseFS;
  • An OpenNebula KVM-based cloud with MooseFS as backend storage (and the release of the related transfer manager: https://github.com/libersoft/opennebula-tm-moosefs);
  • A GlusterFS-based infrastructure for the national Italian legal deposit of books (article in Italian: http://www.bncf.firenze.sbn.it/pagina.php?id=212&rigamenu=Magazzini%20Digitali).

ZFS + GlusterFS on Linux

The time is almost ripe to start using the native ZFS port for Linux (http://zfsonlinux.org/) to improve the performance, reliability and space efficiency of our affordable distributed open source storage solution.

Installing ZFS on Debian/Ubuntu is straightforward: you first need to build the SPL (Solaris Porting Layer), and then ZFS itself.

Download the latest SPL package, unpack it and build:

sudo apt-get install build-essential gawk alien fakeroot linux-headers-$(uname -r)
./configure
make deb
sudo dpkg -i *.deb

Download the latest ZFS package, unpack it and build:

sudo apt-get install zlib1g-dev uuid-dev libblkid-dev libselinux-dev parted lsscsi
./configure
make deb
sudo dpkg -i *.deb

ZFS is now ready to be used.

Let’s create our first pool with:

zpool create tank raidz [devices]

Devices can be partitions, UUIDs or entire disks. It is often good practice to use entire disks, referenced by the disk IDs found in /dev/disk/by-id/* (this makes it less likely to confuse a free drive with one that already belongs to an existing volume).
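To pick whole-disk names, it helps to filter the partition entries out of /dev/disk/by-id. The sketch below assumes partitions carry a -partN suffix there, as udev names them on Debian/Ubuntu. The helper whole_disk_ids is hypothetical, and nothing is created: the script only lists candidates for you to review.

```shell
#!/bin/sh
# Keep only whole-disk entries from a list of by-id names on stdin by
# dropping every name that ends in -partN.
whole_disk_ids() {
    grep -v -- '-part[0-9][0-9]*$'
}

# List candidates only. The same disk usually shows up under several
# aliases, and non-disk devices may be listed too, so pick by hand.
echo "whole-disk candidates (one disk may appear under several aliases):"
ls /dev/disk/by-id/ 2>/dev/null | whole_disk_ids | sed 's|^|  /dev/disk/by-id/|'
```

With the names chosen, the pool creation becomes zpool create tank raidz followed by the selected /dev/disk/by-id/ paths.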

sudo zpool status tank
df -h

These two commands check the pool status and the mounted file system. :)

Now, the interesting features:

zfs set compression=on tank
zfs set dedup=on tank

Et voilà, your space usage is now highly optimized by compressing and deduplicating data.

Now it's GlusterFS's turn. Download the latest version and install it:

dpkg -i glusterfs_3.2.2-1_amd64.deb
update-rc.d glusterd defaults
/etc/init.d/glusterd start

On one node only, probe every other peer:

gluster peer probe ip.add.re.ss

Only on one node, create and start the volume:

gluster volume create gtank replica 2 transport tcp ip.add.re.ss1:/tank ip.add.re.ss2:/tank
gluster volume start gtank

On every node, mount it with:

mount -t glusterfs localhost:/gtank /gtank

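Now a live example (the output comes from a system with an Italian locale, so headers, dates and decimal commas are localized):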

root@debz-1:/gtank# df -h
File system           Dim. Usati Disp. Uso% Montato su
[..]
tank                   10G  1,8G  8,2G  18% /tank
localhost:/gtank       10G  1,8G  8,2G  18% /gtank

GlusterFS replica 2 on 2 servers: nothing unexpected here.

root@debz-1:/gtank# ls -lh
totale 1,8G
-rw-r--r-- 1 scorp scorp  253K  1 gen  1980 Fattura Garanzia redcoon dns-323.pdf
-rw-r--r-- 1 scorp scorp   68K  1 gen  1980 fattura scontrino eeepc 900a.pdf
-rw-r--r-- 1 scorp scorp   55K  1 gen  1980 Fattura Scontrino WD HD Caviar Green Videofantasy.pdf
-rw-r--r-- 1 scorp scorp  681M 23 ago 19.29 lubuntu-11.04-desktop-i386.iso
-rw-r--r-- 1 scorp scorp 1021K  1 gen  1980 Scontrino xbox 360.pdf
-rw-r--r-- 1 root  root    98M 27 ago 12.00 test2_random.dat
-rw-r--r-- 1 root  root    98M 27 ago 11.59 test_random_copy.dat
-rw-r--r-- 1 root  root    98M 27 ago 11.55 test_random.dat
-rw-r--r-- 1 root  root   293M 27 ago 12.06 test_random-with-zero-hole.dat
-rw-r--r-- 1 root  root    98M 27 ago 11.53 test_zero.dat
-rw-r--r-- 1 root  root   674M 18 ago 16.43 ubuntu-11.04-server-amd64.iso

Here I've copied some PDFs, 2 ISOs, and some dd-generated files:

  • test_zero.dat is the output of dd if=/dev/zero
  • test_random.dat and test2_random.dat are 2 different runs of dd if=/dev/urandom
  • test_random_copy.dat is a cp of test_random.dat
  • test_random-with-zero-hole.dat is the result of cat test_random.dat + test_zero.dat + test2_random.dat
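The same kind of test set can be reproduced with a few commands. This is a scaled-down sketch: the 1 MiB file size, the temporary directory and the bs/count values are assumptions (the original dd invocations are not shown), and the concatenated file places the zero-filled file between the two random ones, as its name suggests.

```shell
#!/bin/sh
# Work in a throwaway directory; for a real test you would cd into the pool.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Highly compressible: a file made only of zero bytes.
dd if=/dev/zero of=test_zero.dat bs=1024 count=1024 2>/dev/null
# Incompressible: two independent runs of random data.
dd if=/dev/urandom of=test_random.dat bs=1024 count=1024 2>/dev/null
dd if=/dev/urandom of=test2_random.dat bs=1024 count=1024 2>/dev/null
# Exact duplicate, a candidate for deduplication.
cp test_random.dat test_random_copy.dat
# Random data with a zero-filled stretch in the middle.
cat test_random.dat test_zero.dat test2_random.dat > test_random-with-zero-hole.dat

ls -l
```

Comparing ls -lh with du -sh on these files then shows which of them compression can actually shrink.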

The following is the real disk usage:

root@debz-1:/gtank# du -sh *
259K  Fattura Garanzia redcoon dns-323.pdf
69K   fattura scontrino eeepc 900a.pdf
55K   Fattura Scontrino WD HD Caviar Green Videofantasy.pdf
675M  lubuntu-11.04-desktop-i386.iso
1,1M  Scontrino xbox 360.pdf
98M   test2_random.dat
98M   test_random_copy.dat
98M   test_random.dat
196M  test_random-with-zero-hole.dat
512   test_zero.dat
667M  ubuntu-11.04-server-amd64.iso

As you can see, compression does its job on the ISOs and test_zero.dat, but isn't effective on PDFs and random data (remember that zipping an already zipped file can even increase its total size).

And what about dedup? You can check it with:

root@debz-1:/gtank# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  9,94G  1,60G  8,34G    16%  1.20x  ONLINE  -

So roughly 200 MB of space has been reclaimed through deduplication.

Other “batteries included” functions with ZFS are:

  • Reduce the disk I/O bottleneck by using fast SSDs as caches with L2ARC
  • Copy-on-write transactions: no need for fsck after hard reboot, data is always consistent on disk.
  • Online repair: ZFS stores a checksum for every data block and can detect data alteration on every access or during a scheduled online scrub operation

Let me know if you have some good usage tips to submit!

Damned Coincidence

SMART error (ErrorCount) detected on host: fs Device: /dev/sdb, ATA error count increased from 0 to 74

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST3500320AS
[..]
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 26286

Oh wait: 26286 / 24 / 365 = 3.00068493 years

This disk broke after exactly 3 years. WTF?
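The back-of-the-envelope conversion can be scripted. A small sketch, assuming the attribute line has the same layout as the excerpt above, with the raw value in the last column, and counting a year as 365 days. The name power_on_years is made up for this example.

```shell
#!/bin/sh
# Read SMART attribute lines on stdin, take the raw value (last column) of
# Power_On_Hours and convert it to years of 365 days.
power_on_years() {
    LC_ALL=C awk '$2 == "Power_On_Hours" { printf "%.8f\n", $NF / 24 / 365 }'
}

echo "9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 26286" | power_on_years
# prints 3.00068493
```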