Understanding the algorithm for retaining yearly, monthly, and daily backups

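To protect backups from a corrupted current backup, the backup contents are periodically copied to another directory, isolated from the directory the live backup writes to. Because storage is limited, a pruning policy then deletes some of those copies and keeps only a fixed number of daily, weekly, monthly (and so on) backups.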

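Suppose the backup software produces a copy every 5 seconds, and we want to keep the 3 newest per-second backups, the 2 newest per-minute backups, 2 hourly backups, 2 daily backups, 2 monthly backups and 2 yearly backups. Written as an array from the finest to the coarsest granularity, that is 3 2 2 2 0 2 2; the fifth field is the weekly count, set to 0 here so that the weekly level is skipped. Note that the -Pattern parameter of Find-BackupFilesToDelete below lists the fields in the opposite order (year month week day hour minute second), so the same policy would be passed as '2 2 0 2 2 2 3'. The test data follows.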

$ddd = @"
2018-03-03 21:21:00
2018-03-03 21:21:05
2018-03-03 21:21:10
2018-03-03 21:21:15
2018-03-03 21:21:20
2018-03-03 21:21:25

2018-03-03 21:11:20
2018-03-03 21:11:25
2018-03-03 21:11:30

2018-03-03 21:05:20
2018-03-03 21:05:25
2018-03-03 21:05:30

2018-03-03 20:11:20
2018-03-03 20:11:25
2018-03-03 20:11:30

2018-03-03 19:21:20
2018-03-03 19:21:25
2018-03-03 19:21:30

2018-03-02 20:11:20
2018-03-02 20:11:25
2018-03-02 20:11:30

2018-03-01 21:21:20
2018-03-01 21:21:25
2018-03-01 21:21:30

2018-02-02 20:11:20
2018-02-02 20:11:25
2018-02-02 20:11:30

2018-01-03 21:21:20
2018-01-03 21:21:25
2018-01-03 21:21:30

2017-02-02 20:11:20
2017-02-02 20:11:25
2017-02-02 20:11:30

2016-03-03 21:21:20
2016-03-03 21:21:25
2016-03-03 21:21:30
"@

Unit tests (Pester):

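# Build the fixture: one object per timestamp, oldest first.
# [pscustomobject] makes CreationTime a real property (as on file objects),
# which Sort-Object -Property needs; plain hashtable keys are not sorted on.
function getfixture {
    $ddd -split "[\r\n]+" |
        Where-Object { $_ } |
        ForEach-Object { [pscustomobject]@{ CreationTime = (Get-Date $_) } } |
        Sort-Object -Property CreationTime
}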

Describe "find backup files to delete" {
    it "should find yearly" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 0 0 0 0 0 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 5 # all 3 of 2016, and 2 out of 3 in 2017.
    }
    it "should find monthly" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 2 0 0 0 0 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 10 # 5 + 5
    }

    it "should find weekly" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 0 2 0 0 0 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 10 # 5 + 5
    }

    it "should find daily" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 2 0 2 0 0 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 15 # 5 + 5 + 5
    }

    it "should find hourly" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 2 0 2 2 0 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 20 # 5 + 5 + 5 + 5
    }

    it "should find minutely" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 2 0 2 2 2 0'
        "todelete $($toDelete.Count)" | Out-Host
        $toDelete.Count | Should -Be 25 # 5 + 5 + 5 + 5 + 5
    }

    it "should find secondly" {
        $v = getfixture
        "total $($v.Count)" | Out-Host
        $toDelete = Find-BackupFilesToDelete -FileOrFolders $v -Pattern '2 2 0 2 2 2 2'
        "todelete $($toDelete.Count)" | Out-Host
        # Only the items in the last minute reach this pass. Grouping by second
        # puts one item in each group, which still works: the 4 oldest groups are
        # deleted and the 2 newest are kept.
        # 2018-03-03 21:21:00
        # 2018-03-03 21:21:05
        # 2018-03-03 21:21:10
        # 2018-03-03 21:21:15
        # 2018-03-03 21:21:20
        # 2018-03-03 21:21:25
        $toDelete.Count | Should -Be 29 # 5 + 5 + 5 + 5 + 5 + 4
    }
}

# The function under test. Define or dot-source it before the Describe block runs.
function Find-BackupFilesToDelete {
    param (
        [Parameter(Mandatory = $true, Position = 0)][array]$FileOrFolders,
        [Parameter(Mandatory = $false, Position = 1)][string]$Pattern
    )
    [array]$pts = $Pattern.Trim() -split '\s+'
    $pts = $pts | ForEach-Object {[int]$_}

    if ($pts.Count -ne 7) {
        throw 'wrong prune pattern, must have 7 fields.'
    }
    if (($pts[1] -gt 0) -and ($pts[2] -gt 0)) {
        throw 'one of week and month field must be 0.'
    }
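    # One grouping-key format per pattern field, coarsest first.
    $ga = @(
        '{0:yyyy}',           # year
        '{0:yyyyMM}',         # month
        '{0:yyyy}',           # week (a week-of-year number is appended below)
        '{0:yyyyMMdd}',       # day
        '{0:yyyyMMddHH}',     # hour
        '{0:yyyyMMddHHmm}',   # minute
        '{0:yyyyMMddHHmmss}'  # second
    )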

    $ToIterator = $FileOrFolders
    for ($i = 0; $i -lt $pts.Count; $i++) {
        $pt = $pts[$i]
        $mftstr = $ga[$i]
        if ($pt -gt 0) {
            if ($i -ne 2) {
                $grps = $ToIterator |
                    Sort-Object -Property CreationTime | 
                    Group-Object -Property {$mftstr -f $_.CreationTime} |
                    Sort-Object -Property Name
            }
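            else {
                # Week level: 7-day buckets counted from 1 January (not ISO weeks).
                # Floor instead of an [int] cast (which rounds), and zero-padding so
                # that sorting by Name keeps bucket 10 after bucket 05.
                $grps = $ToIterator |
                    Sort-Object -Property CreationTime |
                    Group-Object -Property {($mftstr -f $_.CreationTime) + ('{0:D2}' -f [int][math]::Floor(($_.CreationTime.DayOfYear - 1) / 7.0))} |
                    Sort-Object -Property Name
            }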
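            # Groups older than the newest $pt groups are deleted in full.
            $toDeleteGrps = $grps | Select-Object -SkipLast $pt
            # Retained groups other than the newest are thinned to their latest item.
            $remainGrpsButLast = $grps | Select-Object -Last $pt | Select-Object -SkipLast 1
            # The newest group is left untouched and passed to the next, finer level.
            $lastGrp = $grps | Select-Object -Last 1

            # Emit every item of the groups being deleted.
            $toDeleteGrps | ForEach-Object {
                $PSItem.Group | ForEach-Object {$_}
            }
            # Emit all but the newest item of each thinned group.
            $remainGrpsButLast | ForEach-Object {
                $PSItem.Group | Select-Object -SkipLast 1
            }
            $ToIterator = $lastGrp.Group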
        }
    }
}
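
The weekly level groups by a week number derived from DayOfYear. Two details matter for any key of that kind: an [int] cast in PowerShell rounds to the nearest integer rather than truncating, and group names are compared as strings, so an unpadded "10" sorts before "5". Below is a minimal sketch of a key that avoids both, assuming simple 7-day buckets counted from 1 January (not ISO 8601 weeks). Get-WeekBucketKey is a hypothetical helper name used only for illustration; it is not part of the original function.

```powershell
# Hypothetical helper (an assumption for illustration, not from the original code).
function Get-WeekBucketKey {
    param([Parameter(Mandatory = $true)][datetime]$Date)

    # Days 1-7 of the year fall in bucket 0, days 8-14 in bucket 1, and so on.
    $week = [int][math]::Floor(($Date.DayOfYear - 1) / 7.0)

    # Zero-pad the bucket so that string order equals chronological order.
    '{0:D4}{1:D2}' -f $Date.Year, $week
}

Get-WeekBucketKey ([datetime]'2018-02-02')   # day 33 of the year -> 201804
Get-WeekBucketKey ([datetime]'2018-03-12')   # day 71 of the year -> 201810
```

For comparison, appending [int](DayOfYear / 7) without padding to the year would give 20185 and 201810 for these two dates, and a string sort would place the March bucket before the February one.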

How the code works:

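First the items are grouped by year; the test data yields three groups: 2016, 2017 and 2018. With a retention count of 2, every group older than the newest two is deleted in full, so all of 2016 goes. Of the retained groups, the newest one (2018) is left untouched, while each of the others is thinned to its single most recent copy (only the newest 2017 copy survives). The contents of the untouched group are then passed on to the monthly level, which applies the same logic, and so on down through the finer levels.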
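
The single pass described above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the author's function: Split-RetentionLevel is a hypothetical name, it handles one granularity only, it assumes Keep is at least 1, and it returns the items to delete together with the group that the next level would receive.

```powershell
# Hypothetical helper: one retention pass at a single granularity.
function Split-RetentionLevel {
    param(
        [Parameter(Mandatory = $true)][object[]]$Items,  # objects with a CreationTime property
        [Parameter(Mandatory = $true)][string]$Format,   # .NET date format used as the group key
        [Parameter(Mandatory = $true)][int]$Keep         # newest groups to retain (assumed >= 1)
    )
    $groups = @($Items |
        Sort-Object -Property CreationTime |
        Group-Object -Property { $_.CreationTime.ToString($Format) } |
        Sort-Object -Property Name)

    $delete = @()
    $dropCount = [math]::Max(0, $groups.Count - $Keep)
    for ($g = 0; $g -lt $groups.Count; $g++) {
        $members = @($groups[$g].Group)
        if ($g -lt $dropCount) {
            # Group is older than the newest $Keep groups: delete everything in it.
            $delete += $members
        }
        elseif ($g -lt $groups.Count - 1) {
            # Retained but not the newest group: keep only its latest item.
            if ($members.Count -gt 1) { $delete += $members[0..($members.Count - 2)] }
        }
        # The newest group is left untouched.
    }
    [pscustomobject]@{
        Delete = $delete
        Next   = @($groups[-1].Group)
    }
}

$items = '2016-03-03 21:21:20', '2016-03-03 21:21:30',
         '2017-02-02 20:11:20', '2017-02-02 20:11:30',
         '2018-01-03 21:21:20', '2018-03-03 21:21:25' |
    ForEach-Object {
        [pscustomobject]@{
            CreationTime = [datetime]::ParseExact($_, 'yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
        }
    }

$pass = Split-RetentionLevel -Items $items -Format 'yyyy' -Keep 2
$pass.Delete.CreationTime   # both 2016 items and the older 2017 item
$pass.Next.CreationTime     # the two 2018 items, which a monthly pass would receive
```

Calling the helper again on $pass.Next with the format 'yyyyMM' would correspond to the monthly level.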
