Incremental backups
I need to devise and implement a good backup strategy on a FreeBSD hosting server.
MySQL
Backward-incremental
Probably the best approach in my situation is backward-incremental backups: keep the latest dump in full and store each earlier day as a reverse diff against the next one.
- day1
- mysqldump --opt -l -A > dump1
- day2
- mysqldump --opt -l -A > dump2
- diff -e dump2 dump1 > dump2-1
- bzip2 -9 dump2-1
- rm dump1
- day3
- mysqldump --opt -l -A > dump3
- diff -e dump3 dump2 > dump3-2
- bzip2 -9 dump3-2
- rm dump2
- <...>
- restore to day1
- bunzip2 dump3-2.bz2 dump2-1.bz2
- patch -o dump2 dump3 dump3-2
- patch -o dump1 dump2 dump2-1
- mysql < dump1
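The rotation and the walk back can be wrapped in two small functions. This is only a sketch: the names rotate_dump and step_back are made up here, and it assumes diff, patch (with ed available for ed-style scripts) and bzip2 are on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers for the backward-incremental scheme in these notes.

# rotate_dump NEW OLD DIFFNAME
# Store OLD as an ed-style reverse diff against NEW, compress it, drop OLD.
rotate_dump() {
    new=$1 old=$2 out=$3
    # diff exits with 1 when the files differ, which is the normal case;
    # only a status above 1 means something went wrong.
    diff -e "$new" "$old" > "$out" || [ $? -eq 1 ] || return 1
    bzip2 -9 "$out" && rm "$old"
}

# step_back NEWER DIFF_BZ2 OLDER
# Rebuild OLDER from NEWER plus one compressed reverse diff.
step_back() {
    script=$(mktemp) || return 1
    bzip2 -dc "$2" > "$script" &&
        patch -e -o "$3" "$1" "$script"   # -e: read the patch as an ed script
    rc=$?
    rm -f "$script"
    return $rc
}
```

Day 2 then becomes "rotate_dump dump2 dump1 dump2-1", and going from day 3 back to day 1 is "step_back dump3 dump3-2.bz2 dump2" followed by "step_back dump2 dump2-1.bz2 dump1".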
Problems
- patch does not fully respect TMPDIR when applying ed-style patches
- we have to find a way to move back in time for a subset of the dump (i.e. one db, one table)
- we can keep chains of dumps one per each DB
- maybe some smarter way
- ed-style patches can probably be concatenated with cat(1) into a single multi-day patch
- we use both InnoDB and MyISAM
- InnoDB should be backed up using mysqldump -l
- MyISAM should be backed up using UFS2 snapshots
- I guess the way we do it is safe enough, though not completely
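Keeping one chain of dumps per database could start from something like the sketch below. The function name dump_each_db and the one-file-per-database layout are assumptions, not something these notes settle on; the mysql and mysqldump options used are the standard ones (-N, -e, --opt, -l, --databases).

```shell
#!/bin/sh
# Hypothetical helper: one dump file per database, so that each database
# can get its own backward-incremental chain.

# dump_each_db DIR
dump_each_db() {
    dir=$1
    # -N drops the column header, -e runs a single statement and exits.
    mysql -N -e 'SHOW DATABASES' | while IFS= read -r db; do
        case $db in
            # virtual schemas: nothing to dump, and -l cannot lock them
            information_schema|performance_schema) continue ;;
        esac
        mysqldump --opt -l --databases "$db" > "$dir/$db.sql" || exit 1
    done
}
```

Each resulting file can then be diffed against the previous day's file for the same database, exactly as with the full dump.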
File system
- dump(1) is not flexible enough
- we use both ACLs and flags
- we probably have to figure out something with tar
- preferably something backward-incremental
- probably using UFS2 snapshots
Backward-incremental
- ideally
- text-file changes stored as ed-style diffs
- bin-file changes stored as binary diffs or as whole files
- but storing everything as whole files should also do
- we don't expect to have many huge files that change often
First stab at tardiff
- tar doesn't save flags/acls by default!
- OTOH, in my situation flags and acls are very uniform, I can just set them after restore
- flags are saved
- specify -p to restore on extract
- pax?
- compare complete file listings of 1.tar and 2.tar through comm(1)
- note what disappeared in 2.tar
- record using a simple format
- note what appeared and changed
- selectively extract from second tar and add to diff.tar
- we need to compare metadata, too (except for atime)
- if metadata is the only thing that changed, it would be cool not to store the diff
- maybe keeping per-file SHA checksums is a good idea
- ed seems to handle binary files well
- why archive at all, maybe just copy?
- incremental operations are very cheap then
- diff is still usable
- increments can be archived
- rsync does not support ACL+Flags at the same time
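The comm(1) comparison of two tar listings can be sketched as follows. The function name tar_compare and the "- " / "+ " output prefixes are made up for illustration. Only member names are compared, so changed files are not detected here; that would need the metadata comparison mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: report what disappeared from and what appeared in
# the second archive, judging by member names only.

# tar_compare OLD.tar NEW.tar
tar_compare() {
    a=$(mktemp) b=$(mktemp) || return 1
    # comm(1) needs both inputs sorted in the same collation order
    tar -tf "$1" | LC_ALL=C sort > "$a"
    tar -tf "$2" | LC_ALL=C sort > "$b"
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/- /'   # only in OLD: disappeared
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/+ /'   # only in NEW: appeared
    rm -f "$a" "$b"
}
```

The "+" names are the ones to extract selectively from the second tar into diff.tar.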
Solution based on cp -Rp
- day 1
- create and mount snapshot
- find /usr stuff > list-head
- unmount and remove snapshot
- day 2
- create and mount snapshot
- find /usr stuff > list-new
- compare list-new and list-head
- if exists in both
- if same (sha, mtime?)
- sha is better, but perhaps too slow
- maybe differentiate depending on size
- skip
- else
- store old in usr-diff-day1
- mark as "dif" or "chg" in usr-diff-day1.jnl
- copy new into head
- if exists in list-new only
- mark as "del" in usr-diff-day1.jnl
- called "del" because the file will be deleted on restore
- "jnl" means journal
- copy new into head
- if exists in list-head only
- find usr-head-day1 > list-verify
- assert list-verify matches list-new
- mv list-verify list-head
- mv usr-head-day1 usr-head-day2
- tar and compress usr-diff-day1
- unmount and remove snapshot
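The list comparison in the day 2 procedure could look roughly like this. Everything here is an assumption made for the sketch: the function names manifest and journal, the tab-separated "checksum, path" list format, and cksum(1) standing in for a SHA tool because it is POSIX. Files that exist only in list-head are not handled, since the notes leave that case open, and names containing tabs or newlines will break it.

```shell
#!/bin/sh
# Hypothetical helpers for comparing list-head and list-new.

# manifest DIR: print "checksum<TAB>relative path" for every regular file.
manifest() {
    ( cd "$1" || exit 1
      find . -type f | LC_ALL=C sort | while IFS= read -r f; do
          # cksum on stdin prints "crc size" without a file name
          printf '%s\t%s\n' "$(cksum < "$f")" "$f"
      done )
}

# journal LIST_HEAD LIST_NEW: print "chg<TAB>path" or "del<TAB>path" lines.
journal() {
    awk -F '\t' '
        FILENAME == ARGV[1] { old[$2] = $1; next }   # remember list-head
        !($2 in old)        { print "del\t" $2; next } # new file: delete on restore
        old[$2] != $1       { print "chg\t" $2 }       # content differs
    ' "$1" "$2"
}
```

Unchanged files produce no output, which corresponds to the "skip" branch.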
Problems
- file names
- spaces
- newlines
- illegal utf-8 (e.g. cp1251)
Restore
- usr-head-day23 contains latest backup
- to restore to earlier ones:
- delete whatever is marked as "del" in usr-diff-day22.jnl
- extract "dif" from usr-diff-day22 into a temp place
- extract "chg" from usr-diff-day22 into target
- repeat for each day desired
- set flags and ACLs
- repair quotas
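One step of the restore could be sketched as below. The function name restore_step, its arguments, and the journal format (one "mark<TAB>relative path" entry per line) are assumptions; the notes only fix the three marks. "dif" entries are merely copied to the temp place, since what happens to them next is not spelled out. Flags, ACLs and quotas are left to the later steps.

```shell
#!/bin/sh
# Hypothetical helper: take TARGET one day back using one extracted diff
# directory and its journal.

# restore_step TARGET DIFFDIR JOURNAL TMPDIR
restore_step() {
    target=$1 diffdir=$2 tmp=$4
    while IFS="$(printf '\t')" read -r mark path; do
        case $mark in
            del) rm -f "$target/$path" ;;          # did not exist that day
            chg) mkdir -p "$(dirname "$target/$path")" &&
                 cp -p "$diffdir/$path" "$target/$path" || return 1 ;;
            dif) mkdir -p "$(dirname "$tmp/$path")" &&
                 cp -p "$diffdir/$path" "$tmp/$path" || return 1 ;;
        esac
    done < "$3"
}
```

Repeating the call with usr-diff-day22, then usr-diff-day21 and so on walks the target back one day at a time.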
Crash proof
Logging
Topic revision: r8 - 19 Nov 2007 - 16:11:50 - Main.AndrewPantyukhin