New large hard drive and 8-year old server anniversary
It's the "installation birthday" of my home server on February 22nd:
/etc/cron.daily/installation-birthday:
                0   0
                |   |
            ____|___|____
         0  |~ ~ ~ ~ ~ ~|  0
         |  |           |  |
      ___|__|___________|__|___
      |/\/\/\/\/\/\/\/\/\/\/\/|
  0   |       H a p p y       |   0
  |   |/\/\/\/\/\/\/\/\/\/\/\/|   |
 _|___|_______________________|___|__
|/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
|                                   |
|       B i r t h d a y! ! !       |
| ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ |
|___________________________________|
Congratulations, your Debian system "marcos" was installed
8 year(s) ago today!
Best wishes,
Your local system administrator
I can't believe this machine I built 8 years ago has been running continuously all that time. That is far, far beyond the usual 3- to 5-year depreciation period set in most organizations, and it goes to show that some hardware can be reliable over the long term.
I bought yet another new drive to deal with my ever-increasing disk use. I got a Seagate IronWolf 8TB ST8000VN0022 at Canada Computers (CC) for 290$CAD. I bought a new enclosure as well, a transparent Orico enclosure which is kind of neat. I had previously bought this thing instead, but it was really hard to fit the hard drive in because the bottom was misaligned: you had to lift the drive slightly to fit it into the SATA connector. Even the salesman at CC couldn't figure it out. The new enclosure is a bit better, but it also doesn't quite close correctly when a hard drive is present.
Compatibility and reliability
The first 8TB drive I got last week was DOA (no, not that DOA): it was "clicking" and wasn't detected by the kernel. CC took it back without questions, after they were able to plug it into something. I'm not sure that's a good sign for the reliability of that drive, but I have another running in a backup server and it has worked well so far.
I was happily surprised to see the new drive works with my old Asus P5G410-M motherboard. My previous attempt at connecting this huge drive into older equipment failed in a strange way: when connected in a Thermaltake USB-SATA dock, it would only be recognized as 4TB. I don't remember if I tried to connect it inside the server, but I do remember connecting it to curie instead which was kind of a mess. So I'm quite happy to see the drive works even on an old SATA controller, a testament to the backwards-compatibility requirements of the standard.
Setup
Of course, I used a GUID Partition Table (GPT), because MBR (Master Boot Record) partition tables are limited to 2TiB. I have also learned about parted --align optimal to silence the warnings when creating the partitions:
parted /dev/sdc mklabel gpt
parted -a optimal /dev/sdc mkpart primary 0% 8MB
parted -a optimal /dev/sdc mkpart primary 8MB 100%
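Incidentally, that 2TiB MBR ceiling is easy to derive: MBR stores sector addresses as 32-bit numbers, and with the traditional 512-byte sector size that maxes out at exactly 2TiB. A quick sanity check of the arithmetic:

```shell
# MBR uses 32-bit LBA sector addresses; with 512-byte sectors,
# the largest addressable offset is:
max_bytes=$((2 ** 32 * 512))
echo "$((max_bytes / 1024 ** 4)) TiB"   # 2 TiB, hence GPT for an 8TB drive
```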
I have come to like calling parted directly like this, without going into its interactive shell: it's clean and easy to copy-paste. It also makes me wonder why the Debian installer bothers with that complicated partition editor after all...
I encrypted the drive using Debian stretch's LUKS defaults, but I gave special attention to the filesystem settings, given how big the drive is. Here's the commandline I ended up using:
mkfs -t ext4 -j -T largefile -i 65536 -m 1 /dev/mapper/8tb_crypt
Here are the details of each bit:

- ext4 - I still don't trust BTRFS enough, and I don't need the extra features
- -j - journaling, probably the default, but just in case
- -T largefile - this is where things get interesting: the mkfs manpage says that -b -1 is supposed to tweak the block size according to the filesystem size, but mkfs refuses to parse this, so I had to use the -T setting. It turns out that didn't change the block size anyway, which is still at the eternal 4KiB
- -i 65536 ("64KiB per inode" ratio) - the default mkfs setting would have allowed for around five hundred million (488,281,250) inodes on this disk. Given that I have less than a million files to store on there so far, that seemed like total overkill, so I bumped the ratio up
- -m 1 - don't reserve as much space for root, as the default (5%) would have reserved 400GB. 1% is still too big (80GB), but I can reclaim the space later with tune2fs -m 0.001 /dev/mapper/8tb_crypt. It gives me a good "heads up" before it's time to change the drive again. Besides, it's strangely not possible to pass lower, non-zero values to mkfs
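The numbers behind those last two flags can be double-checked with a bit of shell arithmetic (using the marketing "8TB" figure of 8×10¹² bytes, which is only an approximation of the real usable size):

```shell
size=$((8 * 1000 ** 4))   # "8TB" as sold, in bytes (approximation)
echo "inodes at the default 16KiB/inode: $((size / 16384))"   # 488281250
echo "inodes at -i 65536:                $((size / 65536))"   # 122070312
echo "5% root reserve: $((size * 5 / 100 / 1000 ** 3))GB"     # 400GB
echo "1% root reserve: $((size * 1 / 100 / 1000 ** 3))GB"     # 80GB
```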
See also backup for another disk configuration procedure.
Benchmarks
I performed a few benchmarks. It looks like the disk can easily saturate the SATA bus, which is limited to 150MB/s (a 1.5Gbit/s line rate, minus 8b/10b encoding overhead):
root@marcos:~# dd bs=1M count=512 conv=fdatasync if=/dev/zero of=/mnt/testfile
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 3.4296 s, 157 MB/s
root@marcos:~# dd bs=1M count=512 if=/mnt/testfile of=/dev/null
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.367484 s, 1.5 GB/s
root@marcos:~# hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 2514 MB in 2.00 seconds = 1257.62 MB/sec
Timing buffered disk reads: 660 MB in 3.00 seconds = 219.98 MB/sec
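That 150MB/s ceiling follows from the SATA I link itself: the line runs at 1.5Gbit/s, but 8b/10b encoding spends 10 bits on the wire for every 8 bits of payload. A back-of-envelope check:

```shell
line_rate=1500000000                # SATA I line rate, in bits per second
payload=$((line_rate * 8 / 10))     # strip the 8b/10b encoding overhead
echo "$((payload / 8)) bytes/s"     # 150000000, i.e. 150MB/s
```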
A SMART test succeeded after 20 hours. Transferring the files over from the older disk took even longer: at 3.5TiB used, it's quite a lot of data, and the older disk does not yield the same performance as the new one. rsync seems to show numbers between 40 and 50MB/s (or MiB/s?), which means the entire transfer takes more than a day to complete.
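That "more than a day" estimate checks out at the lower end of the observed range (sidestepping the MB/MiB ambiguity by assuming SI megabytes):

```shell
data=$((7 * 1024 ** 4 / 2))            # 3.5TiB used, in bytes
rate=$((40 * 1000 ** 2))               # 40MB/s, the low end observed
echo "$((data / rate / 3600)) hours"   # 26 hours at 40MB/s
```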
I considered setting up the new drive as a degraded RAID-1 array to facilitate those transfers, but it doesn't seem to be worth the trouble: it would yield warnings in a few places, add some overhead (scrubbing, for example) and might make me freak out for nothing in the future. This is a single drive, and it will probably stay that way for the foreseeable future.
The sync is therefore made with good old rsync:
rsync -aAvP /srv/ /mnt/
Some more elaborate tests performed with fio also show that random read/write performance is somewhat poor (<1MB/s):
root@marcos:/srv# fio --name=stressant --group_reporting --directory=test --size=100M --readwrite=randrw --direct=1 --numjobs=4
stressant: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.16
Starting 4 processes
stressant: Laying out IO file(s) (1 file(s) / 100MB)
stressant: Laying out IO file(s) (1 file(s) / 100MB)
stressant: Laying out IO file(s) (1 file(s) / 100MB)
stressant: Laying out IO file(s) (1 file(s) / 100MB)
Jobs: 2 (f=2): [_(2),m(2)] [99.4% done] [1097KB/1305KB/0KB /s] [274/326/0 iops] [eta 00m:02s]
stressant: (groupid=0, jobs=4): err= 0: pid=10161: Mon Feb 25 12:51:21 2019
read : io=205352KB, bw=586756B/s, iops=143, runt=358378msec
clat (usec): min=145, max=367185, avg=23237.22, stdev=24300.33
lat (usec): min=145, max=367186, avg=23238.42, stdev=24300.31
clat percentiles (usec):
| 1.00th=[ 450], 5.00th=[ 3792], 10.00th=[ 6816], 20.00th=[ 9408],
| 30.00th=[12608], 40.00th=[14912], 50.00th=[17280], 60.00th=[19328],
| 70.00th=[22656], 80.00th=[27264], 90.00th=[46848], 95.00th=[69120],
| 99.00th=[123392], 99.50th=[148480], 99.90th=[238592], 99.95th=[272384],
| 99.99th=[329728]
write: io=204248KB, bw=583601B/s, iops=142, runt=358378msec
clat (usec): min=164, max=322970, avg=4646.01, stdev=10840.13
lat (usec): min=165, max=322971, avg=4647.36, stdev=10840.16
clat percentiles (usec):
| 1.00th=[ 195], 5.00th=[ 227], 10.00th=[ 251], 20.00th=[ 310],
| 30.00th=[ 378], 40.00th=[ 494], 50.00th=[ 596], 60.00th=[ 2832],
| 70.00th=[ 6176], 80.00th=[ 8896], 90.00th=[12480], 95.00th=[15552],
| 99.00th=[22400], 99.50th=[33024], 99.90th=[199680], 99.95th=[234496],
| 99.99th=[272384]
lat (usec) : 250=4.86%, 500=16.18%, 750=7.01%, 1000=1.45%
lat (msec) : 2=0.91%, 4=3.69%, 10=19.06%, 20=27.09%, 50=15.04%
lat (msec) : 100=3.51%, 250=1.14%, 500=0.05%
cpu : usr=0.11%, sys=0.27%, ctx=103127, majf=0, minf=31
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=51338/w=51062/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: io=205352KB, aggrb=573KB/s, minb=573KB/s, maxb=573KB/s, mint=358378msec, maxt=358378msec
WRITE: io=204248KB, aggrb=569KB/s, minb=569KB/s, maxb=569KB/s, mint=358378msec, maxt=358378msec
Disk stats (read/write):
dm-6: ios=51862/51241, merge=0/0, ticks=1203452/250196, in_queue=1453720, util=100.00%, aggrios=51736/51295, aggrmerge=168/61, aggrticks=1196604/246444, aggrin_queue=1442968, aggrutil=100.00%
sdb: ios=51736/51295, merge=168/61, ticks=1196604/246444, in_queue=1442968, util=100.00%
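Those sub-1MB/s bandwidth figures are consistent with the IOPS numbers: fio was issuing 4KiB requests, so the ~143 read IOPS reported above should translate into roughly the aggrb=573KB/s it measured:

```shell
iops=143   # read IOPS from the fio run above
bs=4096    # fio's 4KiB random read/write block size
echo "$((iops * bs / 1024))KiB/s"   # ~572KiB/s, matching aggrb=573KB/s
```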
I am still, overall, quite happy with those results.