Simple Integrated Multiplatform Backup & Archive
or
Simply Accessible Backup & Archive


Performance:

    workstation: 2.4 GHz P4, 512 MB RAM, SATA disk
    over 100 Mbps Ethernet to
    server: 2.4 GHz P4, 2 GB RAM, IDE/SCSI disk array

    ~800000 files / 992387 links (lots of hard links)
    ~20 GB / 28330052 kB (hard-linked files counted once per link)
    time: 4h33m
    ~60 files / sec
    ~1700 kB / sec

CPU usage was negligible.
Bandwidth was limited by the network for large files.
Peak files/sec was much higher than the average (> 300).
Conclusion: the bottleneck was seek time for DA. (Hardly optimizable, except maybe by processing files in inode-number order.)

Same test the next day: 3171 files transferred in 1h20min, ~200 files / sec (files checked, not only those transferred).

Check filenames with non-ASCII characters.
Seems to work, except for non-UTF-8 filenames on a UTF-8 filesystem (but that can't really work).

Check gid bits.

Equality checking doesn't work if the user is unknown on the backup server:

    -r--r--r-- 1 4294967294 users 1449 2004-12-01 15:44 2006-11-27T23.22.42/yoyo.hjp.at/home/camel/wrk/perl-5.8.8/util.h
    -r--r--r-- 1 4294967294 users 1449 2004-12-01 15:44 2006-11-28T10.18.30/yoyo.hjp.at/home/camel/wrk/perl-5.8.8/util.h

These should be one file with two links, not two files.

Tape performance:
DDS4 (Vendor: HP Model: C5683A): about 5-6 MB/s for /dev/nst0 at 64 kB block size (larger block sizes make no difference). The test file was about 26 MB and 75% compressible with gzip.

Exit if the disk is full.

On my 800 MHz PIII, the CPU usage is rather high. Some profiling seems to be necessary (or I should get a faster backup server :-)).

mkdir_p doesn't report the real reason for a failure:

    mkdir_p('/backup/2008-06-20T08.10.56/zeno.hjp.at/.', 777)
    mkdir_p('/backup/2008-06-20T08.10.56/zeno.hjp.at', 777)
    mkdir_p('/backup/2008-06-20T08.10.56', 777)
    failed: Read-only file system
    cannot mkdir /backup/2008-06-20T08.10.56/zeno.hjp.at/.: No such file or directory at /usr/local/share/perl/5.8.8/Simba/CA.pm line 180, line 1.

The real reason is "Read-only file system", but after mkdir_p returns, $!
is "No such file or directory". (And anyway, Simba::CA::backup2disk shouldn't just die; it should write a message to the log file first. But that's a different problem.)

Ideas:
    * Check if File::Path behaves better.
    * Die on error and let the caller catch the error.

MySQL after crash:

    -rw-rw---- 1 mysql mysql 10034184192 2010-06-07 09:14 instances.MYD
    drwxr-xr-x 8 mysql mysql        4096 2010-06-07 10:20 ../
    -rw-rw---- 1 mysql mysql   297649152 2010-06-07 10:20 files.MYI
    -rw-rw---- 1 mysql mysql   619416576 2010-06-07 21:03 versions2.MYI
    -rw-rw---- 1 mysql mysql        6144 2010-06-15 21:00 sessions.MYI
    -rw-rw---- 1 mysql mysql       42952 2010-06-15 21:00 sessions.MYD
    -rw-rw---- 1 mysql mysql 20630449152 2010-06-16 10:21 instances.MYI
    mri:/var/lib/mysql/simba 10:21 :-) 108# df .
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/mapper/mri.wsr.ac.at-mysql
                          41284928  32332032   6856220  83% /var/lib/mysql
    mri:/var/lib/mysql/simba 10:21 :-) 109# psg backup
    root     19827  0.0  0.0  10100  1052 ?      Ss   Jun07   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     19407  0.0  0.0  10100  2136 ?      Ss   Jun08   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root      3993  0.0  0.0  10100  2136 ?      Ss   Jun09   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     26324  0.0  0.0  10100  2552 ?      Ss   Jun10   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     16156  0.0  0.0  10100  2728 ?      Ss   Jun11   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root       462  0.0  0.0  10100  2136 ?      Ss   Jun12   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     17584  0.0  0.0  10100  2748 ?      Ss   Jun13   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root      1607  0.0  0.0  10100  2716 ?      Ss   Jun14   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     21566  0.0  0.0  10100  2692 ?      Ss   Jun15   0:00  | \_ /usr/bin/perl /usr/local/bin/backup
    root     18685  0.0  0.0   4352  1188 pts/5  S+   10:22   0:00    \_ /bin/sh /usr/bin/psg backup

It looks like MySQL has been rebuilding that index (instances.MYI, still being written at 10:21) for the last 9 days. One more backup process has been left hanging each day since Jun07. That's clearly unacceptable.

Bug: Duplicate detection doesn't seem to work sometimes.
I have a lot of versions with checksum=NULL and a single instance, although they really are hard-linked to older instances. Example (on mri):

    +---------+-----------+-----------+------------+------------+------------+---------------------+----------------+-----------+------------------------------------------+-----------------+
    | id      | file_type | file_size | file_mtime | file_owner | file_group | file_acl            | file_unix_bits | file_rdev | checksum                                 | file_linktarget |
    +---------+-----------+-----------+------------+------------+------------+---------------------+----------------+-----------+------------------------------------------+-----------------+
    | 1220147 | f         | 65559     | 1260229629 | hjp        | betreuer   | u::rw-,g::r--,o:r-- |                | NULL      | 41f445efd34cc11f3ec6eb924a5884a7fee0cf15 | NULL            |
    | 2492389 | f         | 65559     | 1260229629 | hjp        | betreuer   | u::rw-,g::r--,o:r-- |                | NULL      | NULL                                     | NULL            |
    | 2492394 | f         | 65559     | 1260229629 | hjp        | betreuer   | u::rw-,g::r--,o:r-- |                | NULL      | NULL                                     | NULL            |
    | 2801787 | f         | 65559     | 1260229629 | hjp        | betreuer   | u::rw-,g::r--,o:r-- |                | NULL      | NULL                                     | NULL            |
    | 3225686 | f         | 65559     | 1260229629 | hjp        | betreuer   | u::rw-,g::r--,o:r-- |                | NULL      | NULL                                     | NULL            |
    +---------+-----------+-----------+------------+------------+------------+---------------------+----------------+-----------+------------------------------------------+-----------------+

Find files whose checksum is null:

    select versions2.id, prefix, path
    from versions2, instances, files, sessions
    where file_type = 'f' and checksum is null
      and versions2.id=instances.version
      and instances.file=files.id
      and instances.session=sessions.id
    limit 100000;

remove_session:

    select v.id
    from instances i right outer join versions2 v on i.version=v.id
    where i.id is null

is very slow. Do two independent queries and compute the difference via Judy? In any case, all the cleanup stuff needs to be outside of the loop.
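Here is a minimal sketch of the "two independent queries and difference" idea for remove_session. It makes two assumptions. First, it is written in Python rather than Perl. Second, an in-memory SQLite database stands in for MySQL. The table and column names come from the query above. Instead of a Judy array, it merges two id lists that the database already returns sorted, so only one id from each side is held at a time:

```python
import sqlite3
from typing import Iterable, Iterator


def sorted_difference(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """Yield the ids in `left` that do not occur in `right`.

    Both inputs must be sorted ascending (e.g. via ORDER BY). Duplicates in
    `right` are fine; only one value from each side is held at a time.
    """
    sentinel = object()
    right_iter = iter(right)
    r = next(right_iter, sentinel)
    for value in left:
        # Advance the right-hand side past everything smaller than `value`.
        while r is not sentinel and r < value:
            r = next(right_iter, sentinel)
        if r is sentinel or r != value:
            yield value


def orphaned_versions(conn: sqlite3.Connection) -> list[int]:
    """Ids in versions2 that no row in instances refers to."""
    # Two independent, index-friendly queries instead of one outer join.
    all_ids = (row[0] for row in conn.execute(
        "select id from versions2 order by id"))
    used_ids = (row[0] for row in conn.execute(
        "select distinct version from instances"
        " where version is not null order by version"))
    return list(sorted_difference(all_ids, used_ids))


def demo_connection() -> sqlite3.Connection:
    """Tiny stand-in for the real schema (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table versions2 (id integer primary key);
        create table instances (id integer primary key, version integer);
        create index instances_version on instances (version);
        insert into versions2 (id) values (1), (2), (3), (4), (5);
        insert into instances (version) values (2), (2), (4);
    """)
    return conn


if __name__ == "__main__":
    print(orphaned_versions(demo_connection()))  # [1, 3, 5]
```

This only pays off if both ORDER BY queries can be served by index scans. An index on instances.version is assumed. The merge then replaces the database's anti-join with one linear pass.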
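The performance notes conclude that seek time was the bottleneck and might be reduced by sorting by inode number. Below is a minimal sketch of that idea in Python. It is not Simba's actual traversal, and the function names are made up for illustration. Within each directory, entries are lstat()ed in inode order instead of readdir() order. Whether inode order matches on-disk order depends on the filesystem, so any gain is an assumption that would have to be measured:

```python
import os
import stat
from typing import Iterator


def entries_by_inode(directory: str) -> list[os.DirEntry]:
    """Return the entries of `directory` sorted by inode number."""
    with os.scandir(directory) as it:
        entries = list(it)
    # On Unix, DirEntry.inode() comes from readdir() and needs no extra syscall.
    entries.sort(key=lambda e: e.inode())
    return entries


def walk_by_inode(top: str) -> Iterator[tuple[str, os.stat_result]]:
    """Yield (path, lstat result) for everything below `top`.

    Directories are processed depth-first; within each directory the
    entries are lstat()ed in inode order instead of readdir() order.
    """
    stack = [top]
    while stack:
        directory = stack.pop()
        for entry in entries_by_inode(directory):
            st = entry.stat(follow_symlinks=False)
            yield entry.path, st
            if stat.S_ISDIR(st.st_mode):
                stack.append(entry.path)
```

For example, `for path, st in walk_by_inode("/home"): ...` visits every file once. Sorting happens only per directory. A global sort across the whole tree would mean collecting all inode numbers before touching any file.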
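For the mkdir_p problem, here is a minimal sketch of the second idea ("die on error and let the caller catch the error"). It is written in Python rather than Perl, and `mkdir_p` and `prepare_target` are hypothetical stand-ins, not Simba's code. The exception from the mkdir() call that actually failed propagates unchanged, so the caller sees the original errno (EROFS in the log above) and can write it to the log before giving up. Python's own os.makedirs already behaves this way; the explicit recursion here just mirrors the structure of the log. In Perl, the equivalent is to die with $! immediately after the failing mkdir, before any other system call can overwrite it.

```python
import logging
import os


def mkdir_p(path: str, mode: int = 0o777) -> None:
    """Create `path` and any missing parent directories.

    Errors are not swallowed: the OSError raised by the mkdir() that
    actually failed propagates unchanged, with its errno and filename.
    """
    path = os.path.normpath(path)  # also drops a trailing "/."
    if os.path.isdir(path):
        return
    parent = os.path.dirname(path)
    if parent and parent != path and not os.path.isdir(parent):
        mkdir_p(parent, mode)  # raises the parent's real error, if any
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # Fine if someone else just created the directory; otherwise a
        # non-directory is in the way, and that is the real error.
        if not os.path.isdir(path):
            raise


def prepare_target(target: str) -> bool:
    """Hypothetical caller: log the real reason first, then report failure."""
    try:
        mkdir_p(target)
    except OSError as e:
        logging.error("cannot mkdir %s: %s", e.filename or target, e.strerror)
        return False
    return True
```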