Roy's notes: Difference between revisions

From MalinWiKi
Jump to navigation Jump to search
Line 73: Line 73:


After the LV has grown, run ''xfs_growfs'' (xfs) or ''resize2fs'' (ext4) to make use of the new data.
After the LV has grown, run ''xfs_growfs'' (xfs) or ''resize2fs'' (ext4) to make use of the new data.
=== Migration tools ===
At times, storage regimes change, new storage is added and sometimes it's not easy to migrate with the current hardware or its support systems. Once I had to migrate a 45TiB fileserver from one storage system to another, preferably without downtime. The server originally had three 15TiB PVs of which two were full and the third half full. I resorted to using [https://linux.die.net/man/8/pvmove pvmove] to just move the data on the PVs in use to a new PV. We started out with creating a new 50TiB PV, sde, and then to pvmove.
<nowiki>
# vgextend data /dev/sde
# pvmove /dev/sdb /dev/sde
</nowiki>
This took a while (as in a week or so) - the pvmove command uses old code and logic (or at least did that when I migrated this in November 2016), but it affected performance on the server very little, so users didn't notice. After the first PV was migrated, I continued with the other two, one after the other, and after a month or so, it was migrated. We used this on a number of large fileservers, and it worked flawlessly. Before doing the production servers I also tried interrupting pvmove in various ways, including hard resets, and it just kept on after the reboot.
'''''NOTE:''''' ''Even though it worked well for us, '''always''' keep a good backup before doing this. Things may go wrong, and without a good backup, you may be looking for a hard-to-find new job the next morning''.


=== Thin provisioned LVs ===
=== Thin provisioned LVs ===

Revision as of 18:43, 16 October 2017

Low level disk handling

"Unplug" drive from system

# echo 1 > /sys/bus/scsi/devices/H:B:T:L/delete

Fail injection with debugfs

https://lxadm.com/Using_fault_injection

LVM

LVM is Linux' Logical volume manager. It is designed to be an abstraction layer on top of physical drives or RAID, typically mdraid or fakeraid. Keep in mind that fakeraid should be avoided unless you really need it, like in conjunction with dual-booting linux and windows on fakeraid. LVM broadly consists of three elements, the "physical" devices (PV), the volume group (VG) and the logical volume (LV). There can be multiple PVs, VGs and LVs, depending on requirement. More about this below. All commands are given as examples, and all of them can be fine-tuned using extra flags in need of so. In my experience, the defaults work well for most usecases.

I'm mentioning filesystem below too. Where I write ext4, the samme applies to ext2 and ext3.

Create a PV

A PV is the "physical" parts. This does not need to be a physical disk, but can also be another RAID, be it an mdraid, fakeraid, hardware raid, a virtual disk on a SAN or otherwisee or a partition on one of those.

# pvcreate /dev/sdb # pvcreate /dev/md1 /dev/md2

These add three PVs, one on a drive (or hardware raid) and another two on mdraids. For more information about pvcreate, see the manual.

Create a VG

The volume group consists of one or more PVs grouped together on which LVs can be placed. If several PVs are grouped in a VG, it's generally a good idea to make sure these PVs have some sort of redundancy, as in mdraid/fakeraid or hwraid. Otherwise it will be like using a RAID-0 with a single point of failure on each of the independant drives. LVM has RAID code in it as well, so you can use that. I haven't done so myself, as I generally stick to mraid. The reason is mdraid is, in my opinion older and more stable and has more users (meaning bugs are reported and fixed faster whenever they are found). That said, I beleive the actual RAID code used in LVM RAID are the same function calls as for mdraid, so it may not be much of a difference. I still stick with mdraid. To create a VG, run

# vgcreate my_vg /dev/md1

Note that if vgcreate is run with a PV (as /dev/md1 above) that is not defined as a PV (like above), this is done implicitly, so if you don't need any special flags to pvcreate, you can simply skip it and let vgcreate do that for you.

Create an LV

LVs can be compared to partitions, somehow, since they are bounderies of a fraction or all of that of a VG. The difference between them and a partition, however, is that they can be grown or shrunk easily and also moved around between PVs without downtime. This flexibility makes them superiour to partitions as your system can be changed without users noticing it. By default, an LV is alloated "thickly", meaning all the data given to it, is allocated from the VG and thus the PV. The following makes a 100GB LV named "thicklv". When making an LV, I usually allocate what's needed plus some more, but not everything, just to make sure it's space available for growth on any of the LVs on the VG, or new LVs.

# lvcreate -n thicklv -L 100G my_vg

After the LV is created, a filesystem can be placed on it unless it is meant to be used directly. The application for direct use include swap space, VM storage and certain database systems. Most of these will, however, work on filesystems too, although my testing has shown that on swap space, there is a significant performance gain for using dedicated storage without a filesystem. As for filesystems, most Linux users use either ext4 or xfs. Personally, I generallly use XFS these days. The only thing left that can't be done on XFS is shrinking a filesystem.

# mkfs -t xfs /dev/my_vg/thicklv

Then just edit /etc/fstab with correct data and run mount -a, and you should be all set.

Growing devices

LVM objects can be grown and shrunk. If a PV resides on a RAID where a new drive has been added or otherwise grown, or on a partition or virtual disk that has been extended, the PV must be updated to reflect these changes. The following command will grow the PV to the maximum available on the underlying storage.

# pvresize /dev/md1

If a new PV is added, the VG can be grown to add the space on that in addition to what's there already.

# vgextend my_vg /dev/md2

With more space available in the VG, the LV can now be extended. Let's add another 50GB to it.

# lvresize -L +50G my_vg/thicklv

After the LV has grown, run xfs_growfs (xfs) or resize2fs (ext4) to make use of the new data.

Migration tools

At times, storage regimes change, new storage is added and sometimes it's not easy to migrate with the current hardware or its support systems. Once I had to migrate a 45TiB fileserver from one storage system to another, preferably without downtime. The server originally had three 15TiB PVs of which two were full and the third half full. I resorted to using pvmove to just move the data on the PVs in use to a new PV. We started out with creating a new 50TiB PV, sde, and then to pvmove.

# vgextend data /dev/sde # pvmove /dev/sdb /dev/sde

This took a while (as in a week or so) - the pvmove command uses old code and logic (or at least did that when I migrated this in November 2016), but it affected performance on the server very little, so users didn't notice. After the first PV was migrated, I continued with the other two, one after the other, and after a month or so, it was migrated. We used this on a number of large fileservers, and it worked flawlessly. Before doing the production servers I also tried interrupting pvmove in various ways, including hard resets, and it just kept on after the reboot.

NOTE: Even though it worked well for us, always keep a good backup before doing this. Things may go wrong, and without a good backup, you may be looking for a hard-to-find new job the next morning.

Thin provisioned LVs

Thin provisioning on LVM is a method used for not allocating all the space given to an LV. For instance, if you have a 1TB VG and want to give an LV 500GB, although it currently only uses a fraction of that, a thin lv can be a good alternative. This will allow for adding a limit to how much it can use, but also to let it grow dynamically without manual work by the sysadmin. Thin provisioning adds another layer of abstaction by creating a special LV as a thin pool from which data is allocated to the thin volume(s)

Create a thin pool

Allocate 1TB to the thin pool to be used for thinly provisioned LVs. Keep in mind that the space allocated to the thin pool is in fact thick provisioned. Only the volumes put on thinpool are thin provisioned. This will create two hidden LVs, one for data and one for metadata. Normally the defaults will do, but check the manual if the data doesn't match the usual patterns (such as billions of files, resulting in huge amounts of metadata). I beleive the metadata part should be resizable at a later time if needed, but I have not tested it.

# lvcreate --size 1T --type thin-pool --thinpool thinpool my_vg

Create a thin volume

You have the thinpool, now put a thin volume on it. It will allocate some space for metadata, but probably not much (a few megs, perhaps).

# lvcreate -V 500G -T mh_vg/thin_pool --name thinvol

The new volume's device name will be /dev/thinvol. Now, create a filesystem on it, add to fstab and mount it. The df command will return available space according to the virtual size (-V), while lvs will show how much data is actually used on each of the thinly provisioned volumes.