TWiki> Cenkes Web>IncrementalBackups (19 Nov 2007, Main.AndrewPantyukhin)

Incremental backups

I need to devise and implement a good backup strategy on a FreeBSD hosting server.

MySQL

Backward-incremental

Probably the best thing to do in my situation is backward-incremental backups: keep the latest dump in full and store only the reverse diffs that lead back to earlier days.
  • day1
    • mysqldump --opt -l -A > dump1
  • day2
    • mysqldump --opt -l -A > dump2
    • diff -e dump2 dump1 > dump2-1
    • bzip2 -9 dump2-1
    • rm dump1
  • day3
    • mysqldump --opt -l -A > dump3
    • diff -e dump3 dump2 > dump3-2
    • bzip2 -9 dump3-2
    • rm dump2
  • <...>
  • restore to day1 (with dump3 as the latest full dump)
    • bzip2 -d dump3-2.bz2 dump2-1.bz2
    • patch -o dump2 dump3 dump3-2
    • patch -o dump1 dump2 dump2-1
    • mysql < dump1
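
The rotation above can be sketched as two small shell functions. This is only a sketch: the names make_reverse_diff and step_back are made up for illustration, the bzip2 step is left out (a compressed diff has to be decompressed before it is applied), and the diff is applied with ed(1) directly rather than through patch. diff -e does not emit a final "w" command, so the sketch appends one.

```shell
#!/bin/sh
# make_reverse_diff NEW OLD PATCH (hypothetical helper)
# Write an ed script that turns NEW back into OLD, then drop OLD.
make_reverse_diff() {
    new=$1 old=$2 patch=$3
    diff -e "$new" "$old" > "$patch"
    # diff exits 0 (same) or 1 (different); anything above that is an error.
    [ $? -le 1 ] || return 1
    rm "$old"
}

# step_back NEWER PATCH OLDER (hypothetical helper)
# Rebuild OLDER from NEWER plus the reverse diff, leaving NEWER untouched.
step_back() {
    newer=$1 patch=$2 older=$3
    cp "$newer" "$older" &&
    # Append "w" so that ed writes the edited buffer back to OLDER.
    { cat "$patch"; echo w; } | ed -s "$older"
}
```

On day 2 this would be called as make_reverse_diff dump2 dump1 dump2-1, and a restore walks back with step_back dump3 dump3-2 dump2 followed by step_back dump2 dump2-1 dump1.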

Problems

  • patch does not fully respect TMPDIR when applying ed-style patches
    • try ed(1) itself?
  • we have to find a way to move back in time for a subset of the dump (i.e. one db, one table)
    • we can keep a separate chain of dumps for each DB
    • maybe some smarter way
  • ed-style patches can probably be concatenated with cat(1)
    • maybe the result can even be optimized
  • we use both InnoDB and MyISAM
    • InnoDB should be backed up using mysqldump -l
    • MyISAM should be backed up using UFS2 snapshots
    • I guess the way we do it is safe enough, though not completely
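
The per-DB chain idea could start from one dump file per database. A minimal sketch, assuming database names arrive one per line on stdin; the function name dump_each_db, the DUMPER override and the NAME.dumpDAY file naming are all made up for illustration (DUMPER exists only so the loop can be tried without a server).

```shell
#!/bin/sh
# dump_each_db OUTDIR DAY (hypothetical helper)
# Read database names on stdin and write OUTDIR/NAME.dumpDAY for each one.
dump_each_db() {
    outdir=$1 day=$2
    mkdir -p "$outdir" || return 1
    while IFS= read -r db; do
        [ -n "$db" ] || continue
        # DUMPER is left unquoted on purpose so its options split into words.
        # stdin is redirected so the dumper cannot eat the list of names.
        ${DUMPER:-mysqldump --opt -l} "$db" < /dev/null \
            > "$outdir/$db.dump$day" || return 1
    done
}
```

With a running server the list of names could come from mysql -N -e 'SHOW DATABASES' | dump_each_db /some/dir 2, and each resulting file can then get its own chain of reverse diffs.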

File system

  • dump(1) is not flexible enough
  • we use both ACLs and flags
  • we probably have to figure out something with tar
    • preferably something backward-incremental
    • probably using UFS2 snapshots

Backward-incremental

  • ideally
    • text-file changes stored as ed-style diffs
    • bin-file changes stored as binary diffs or as whole files
  • but storing everything as whole files should also do
    • we don't expect to have many huge files that change often

First stab at tardiff

  • tar doesn't save flags/acls by default!
    • OTOH, in my situation flags and acls are very uniform, I can just set them after restore
    • correction: flags are in fact saved
      • specify -p to restore on extract
    • pax?
  • compare complete file listings of 1.tar and 2.tar through comm(1)
    • note what disappeared in 2.tar
      • record using a simple format
    • note what appeared and changed
      • selectively extract from second tar and add to diff.tar
    • we need to compare metadata, too (except for atime)
      • if metadata is the only thing that changed, it would be cool not to store the diff
  • maybe keeping per-file sha's is a good idea
  • ed seems to handle binary files well
  • why archive at all, maybe just copy?
    • incremental operations are very cheap then
    • diff is still usable
    • increments can be archived
  • rsync does not support ACL+Flags at the same time
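
The comm(1) comparison of two tar listings might look roughly like this. It is a sketch only: tar_names_diff and the "gone"/"new" output format are invented here, and because it compares names alone it cannot see files whose content or metadata changed.

```shell
#!/bin/sh
# tar_names_diff OLD.tar NEW.tar (hypothetical helper)
# Print "gone NAME" for members only in OLD and "new NAME" for members only in NEW.
tar_names_diff() {
    old=$1 new=$2
    tmp=$(mktemp -d) || return 1
    # comm needs both inputs sorted in the same collation order.
    tar -tf "$old" | LC_ALL=C sort > "$tmp/old"
    tar -tf "$new" | LC_ALL=C sort > "$tmp/new"
    LC_ALL=C comm -23 "$tmp/old" "$tmp/new" | sed 's/^/gone /'
    LC_ALL=C comm -13 "$tmp/old" "$tmp/new" | sed 's/^/new /'
    rm -rf "$tmp"
}
```

Detecting changed members would need something beyond names, for example the per-file sha's mentioned above.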

Solution based on cp -Rp

  • day 1
    • create and mount snapshot
    • find /usr stuff > list-head
      • cp -Rp % usr-head-day1
    • unmount and remove snapshot
  • day 2
    • create and mount snapshot
    • find /usr stuff > list-new
    • compare list-new and list-head
      • if exists in both
        • if same (sha, mtime?)
          • sha is better, but perhaps too slow
            • maybe differentiate depending on size
          • skip
        • else
          • store old in usr-diff-day1
            • either diff -e or cp -Rp
          • mark as "dif" (stored as an ed diff) or "chg" (stored as a whole copy) in usr-diff-day1.jnl
          • copy new into head
      • if exists in list-new only
        • mark as "del" in usr-diff-day1.jnl
          • called "del" because the file will be deleted on restore
          • "jnl" means journal
        • copy new into head
      • if exists in list-head only
        • move to usr-diff-day1
    • find usr-head-day1 > list-verify
      • assert list-verify matches list-new
    • mv list-verify list-head
    • mv usr-head-day1 usr-head-day2
    • tar and compress usr-diff-day1
    • unmount and remove snapshot
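
A rough sketch of the day 2 comparison follows. Several things in it are assumptions rather than part of the plan above: the name backup_step, the use of cmp(1) as a stand-in for the sha/mtime check, always storing whole files ("chg", never "dif"), and journalling files that vanished as "chg" so that a restore copies them back. Snapshot handling, the list-verify step and archiving are left out, and file names containing newlines will break it.

```shell
#!/bin/sh
# backup_step SRC HEAD DIFFDIR JOURNAL (hypothetical helper)
# Bring HEAD up to date with SRC, saving replaced or removed files in DIFFDIR
# and recording in JOURNAL how to get back to the previous state.
backup_step() {
    src=$1 head=$2 diffdir=$3 jnl=$4
    tmp=$(mktemp -d) || return 1
    (cd "$src" && find . -type f | LC_ALL=C sort) > "$tmp/list-new"
    (cd "$head" && find . -type f | LC_ALL=C sort) > "$tmp/list-head"
    mkdir -p "$diffdir"
    : > "$jnl"

    # In both lists: keep the old version only if the content differs.
    LC_ALL=C comm -12 "$tmp/list-head" "$tmp/list-new" |
    while IFS= read -r f; do
        cmp -s "$head/$f" "$src/$f" && continue
        mkdir -p "$diffdir/$(dirname "$f")"
        mv "$head/$f" "$diffdir/$f"
        cp -p "$src/$f" "$head/$f"
        printf 'chg %s\n' "$f" >> "$jnl"
    done

    # Only in list-new: the file must be deleted again on restore.
    LC_ALL=C comm -13 "$tmp/list-head" "$tmp/list-new" |
    while IFS= read -r f; do
        mkdir -p "$head/$(dirname "$f")"
        cp -p "$src/$f" "$head/$f"
        printf 'del %s\n' "$f" >> "$jnl"
    done

    # Only in list-head: the file vanished, so move the old copy to the diff.
    LC_ALL=C comm -23 "$tmp/list-head" "$tmp/list-new" |
    while IFS= read -r f; do
        mkdir -p "$diffdir/$(dirname "$f")"
        mv "$head/$f" "$diffdir/$f"
        printf 'chg %s\n' "$f" >> "$jnl"
    done
    rm -rf "$tmp"
}
```

A call would look like backup_step /mnt/snapshot usr-head-day1 usr-diff-day1 usr-diff-day1.jnl, where /mnt/snapshot is a placeholder for wherever the snapshot is mounted.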

Problems

  • file names
    • spaces
    • newlines
    • illegal utf-8 (e.g. cp1251)

Restore

  • usr-head-day23 contains latest backup
  • to restore to earlier ones:
    • delete whatever is marked as "del" in usr-diff-day22.jnl
    • extract "dif" from usr-diff-day22 into a temp place
      • apply them with ed(1)
    • extract "chg" from usr-diff-day22 into target
  • repeat for each day desired
  • set flags and ACLs
  • repair quotas
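
One day of the restore could be replayed from the journal along these lines. This is a sketch under assumptions: restore_step is a made-up name, the journal is taken to hold one "tag path" pair per line, a "dif" entry is assumed to be an ed script stored under the same relative path in the extracted diff directory, and flags, ACLs and quotas are not handled.

```shell
#!/bin/sh
# restore_step TARGET DIFFDIR JOURNAL (hypothetical helper)
# Move TARGET back by one day using the extracted DIFFDIR and its journal.
restore_step() {
    target=$1 diffdir=$2 jnl=$3
    while IFS=' ' read -r tag f; do
        case $tag in
            del)  # the file did not exist on the earlier day
                rm -f "$target/$f" ;;
            chg)  # the whole old file was saved
                mkdir -p "$target/$(dirname "$f")"
                cp -p "$diffdir/$f" "$target/$f" ;;
            dif)  # an ed script was saved; "w" makes ed write the result
                { cat "$diffdir/$f"; echo w; } | ed -s "$target/$f" ;;
            *)
                echo "unknown journal tag: $tag" >&2
                return 1 ;;
        esac
    done < "$jnl"
}
```

Going back several days means calling it once per day, newest first, for example restore_step target usr-diff-day22 usr-diff-day22.jnl and then the same with day21.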

Crash proof

  • needs a lot more design

Logging

  • needs a lot more design
Topic revision: r8 - 19 Nov 2007 - 16:11:50 - Main.AndrewPantyukhin
 
