I subscribe to a loose interpretation of the 3-2-1 backup strategy for my essential files, mainly my photos. I have:

  1. The primary/working copy on disk in my computer
  2. Backblaze continuous backup of my computer (to protect against local emergencies like fires), with their 1-year file history option
  3. A periodic sync to Google Drive since I’m already paying Google for my email
  4. A manually performed copy of my files on a portable drive left in a safety deposit box

During my university days, my backup process was different. I didn’t have the Google Drive or portable drive copy. Instead, I uploaded my photos to AWS Glacier to ensure that I wouldn’t lose my most important files if my computer and backup drive got damaged during my frequent moves. After settling down, I decided to stop paying AWS ~$4/month and started using Google Drive and the portable drive as backup instead.

With Backblaze hiking their fees, Google starting to enforce their storage limits on Google Workspace, and the bank closing the branch where I had my safety deposit box, I decided to revisit my backup decisions.

Why not local?

Having a second local copy does help when something happens to my primary copy, but it doesn’t help when an earthquake or similar hits - for that, I rely on Backblaze.

That said, I’ve had bad experiences with media that sits idle for a while, and I don’t trust it. I’ve had drives fail to spin up, other drives develop bad sectors when I tried to read from them, and backup DVDs decay to the point of being unreadable.

Manual periodic validation has been a pain; having someone else manage and watch over my data seems like a good plan. I don’t have unlimited money, but I’m not a college student anymore; I can pay for services - especially those I’ll treat as the last line of defense.

Cold Clouds

I ended up evaluating a few providers that offer multiple storage tiers - I’m looking at their respective archive tiers for my archive use case and calculating how much I would be charged to store & retrieve files. This meant looking for operations and bytes-based fees on top of the storage charge.

I’ve included links to the pricing pages I used at the bottom of this post.

The providers I’m considering are AWS, GCP, Oracle, and OVH. I’m using Backblaze B2 as a baseline for comparison ($0.006/GB-month, with free egress up to 3x the average monthly stored bytes).

Oracle is surprisingly not more competitive than AWS or GCP on pure storage - but Oracle is competitive on transfer out bandwidth charges at least. Unfortunately, Backblaze is cheaper than Oracle Infrequent Access ($0.01/GB), so Oracle is quickly dropped from further consideration.

I was surprised by OVH. Their Archive Storage is priced well - $0.0024/GB, only let down by a $0.0121/GB fee for both ingress and egress. The ingress fee is unique to OVH, but as far as I can tell, there are no operation fees, which is impressive and simple to calculate.

GCP Archive has the second lowest storage charge - $0.0012/GB-month. But there’s a stunning $0.05/GB retrieval charge on top of the $0.12/GB egress fee and a $0.05/1000 objects operations fee. Unlike AWS, GCP has no slower retrieval tier you can opt into to pay less. GCP also has the worst minimum storage duration at 12 months.

AWS has at least two services - the older “S3 Glacier API” I used in 2013 and the newer “S3 Glacier Deep Archive”. The Glacier API is straightforward: $0.0036/GB-month, $0.09/GB transfer out, free bulk retrievals, and $0.033/1000 objects uploaded.
In comparison, the Deep Archive has the lowest storage price at $0.00099/GB-month. Still, it has many operation charges and an annoying-to-calculate metadata storage charge per object. The egress charge is the same, $0.09/GB, but bulk retrievals cost $0.0025/GB and $0.025/1000 requests, on top of $0.05/1000 objects uploaded.

Structuring to save money

I have about 300000 photos that I want to archive, totaling about 900GB. I can zip my photos together to reduce per-object charges, e.g., into 1GB batches. 1GB is arbitrary; it balances being able to start from a partial recovery against the diminishing returns of growing the archive size. Having ~900 archives instead of 300000 photos cuts the per-object operation charges to 0.3% of the original.
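
The batching idea can be sketched in a few lines of Python. This is only an illustration, not what I actually run: `batch_by_size` and `GIB` are names I made up, and I'm assuming "1GB" means 1024**3 bytes and that files are grouped greedily in the order they're listed.

```python
from typing import Iterable, List, Tuple

GIB = 1024 ** 3  # assumption: "1GB" treated as 2**30 bytes


def batch_by_size(files: Iterable[Tuple[str, int]], limit: int = GIB) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs into batches of at most `limit` bytes.

    A single file larger than `limit` ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def operation_ratio(archives: int, originals: int) -> float:
    """Fraction of per-object operations left after batching."""
    return archives / originals
```

Each batch would then be zipped and uploaded as one object, so `operation_ratio(900, 300000)` is the 0.3% figure above.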

The GCP egress ($0.12/GB) is even worse than AWS ($0.09/GB). But you can get $0.04/GB egress using the CDN Interconnect. To save about two-thirds on egress charges, I’d write something that downloads files through a partner CDN (hi Cloudflare) instead of using the GCP API directly.

AWS has a free tier for data transfer of 100GB, with 1TB for CloudFront. I could probably create an equivalent of the GCP CDN Interconnect solution for CloudFront as well. My expected archive size fits under the 1TB limit, so egress might be free for me, which would be a big win since egress fees are legendarily nasty.

There might be an opportunity to use Intelligent Tiering in AWS to automatically transition between the Deep Archive tier and the standard tier to avoid the retrieval fee associated with the Deep Archive tier and some of the other operational charges. It’s unclear from the documentation, but I think you get charged different rates while the data gets moved between storage classes until it gets to the Deep Archive class (6 months?) on top of the $0.0025/1,000 objects/month charge for monitoring. Considering those fees are higher and uncertain, I won’t try to calculate it.

Calculations

I’m treating restoring from the archive as a worst-case situation, meaning I lost both my local copy and Backblaze backup. I assume this situation will happen at least once in 10 years (120 months), and I price in one full restore.

There are 3 costs that I’ll consider separately - uploading the files, storage for 120 months, and retrieval. The baseline will be Backblaze’s $0.006/GB-month and free egress.

Uploading

Backblaze: $0.00/upload = $0

OVH: $0.0121/GB ingress: 900 * $0.0121 = $10.89

GCP: $0.05/1000 operations: 300000 * 0.05/1000 = $15

  • Reduced files: 900 * 0.05/1000 = $0.045

AWS Glacier API: $0.033/1000 operations: 300000 * 0.033/1000 = $9.90

  • Reduced files: 900 * 0.033/1000 = $0.0297

AWS Deep Archive: $0.05/1000 operations: 300000 * 0.05/1000 = $15

  • Reduced files: 900 * 0.05/1000 = $0.045

OVH is the loser here despite being much more straightforward than the other providers, but it shows the strength of structuring uploads. After structuring, everyone other than OVH is under $0.05.
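
For anyone who wants to reproduce the upload numbers, here is a small sketch. The rates are the ones quoted in this post; the function and dictionary names are just ones I picked for illustration.

```python
# Per-request upload fees in dollars per 1000 requests, as quoted in this post.
UPLOAD_FEE_PER_1000 = {
    "GCP": 0.05,
    "AWS Glacier API": 0.033,
    "AWS Deep Archive": 0.05,
}
OVH_INGRESS_PER_GB = 0.0121  # OVH charges per GB instead of per request


def upload_cost(provider: str, objects: int) -> float:
    """Request fees for uploading `objects` objects to `provider`."""
    return objects * UPLOAD_FEE_PER_1000[provider] / 1000


def ovh_upload_cost(gb: float) -> float:
    """OVH ingress fee for uploading `gb` gigabytes."""
    return gb * OVH_INGRESS_PER_GB


if __name__ == "__main__":
    for name in UPLOAD_FEE_PER_1000:
        raw = upload_cost(name, 300_000)   # one object per photo
        batched = upload_cost(name, 900)   # one object per ~1GB archive
        print(f"{name}: ${raw:.2f} raw, ${batched:.4f} batched")
    print(f"OVH: ${ovh_upload_cost(900):.2f}")
```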

Storing

Backblaze: $0.006/GB-month * 900GB * 120 months = $648

OVH: $0.0024/GB-month * 900GB * 120 months = $259.20

GCP: $0.0012/GB-month * 900GB * 120 months = $129.60

AWS Glacier API: $0.0036/GB-month * 900GB * 120 months = $388.80

AWS Deep Archive: $0.00099/GB-month * 900GB * 120 months = $106.92

  • Metadata charge:
    • 300000 * 32KB * 1MB/1024KB * 1GB/1024MB * $0.00099/GB-month * 120 months = $1.09
      • Reduced files: 900 * 32KB * 1MB/1024KB * 1GB/1024MB * $0.00099/GB-month * 120 months = $0.003
    • 300000 * 8KB * 1MB/1024KB * 1GB/1024MB * $0.023/GB-month * 120 months = $6.32
      • Reduced files: 900 * 8KB * 1MB/1024KB * 1GB/1024MB * $0.023/GB-month * 120 months = $0.02

It is no surprise that Deep Archive wins since it’s got the cheapest storage charges, though GCP puts up a decent showing. I was worried that the Deep Archive metadata rates would be much worse, but because it’s a per-object charge, structuring data before upload dramatically helps.
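
The storage and metadata arithmetic is easy to get wrong by hand, so here is a rough Python version of it. It assumes the same 1024-based KB-to-GB conversion I used above, and the function names are my own. The unrounded metadata total can differ from the sum of my rounded figures by about a cent.

```python
KB_PER_GB = 1024 * 1024  # the 1024-based conversion used in this post


def storage_cost(rate_gb_month: float, gb: float = 900, months: int = 120) -> float:
    """Plain storage charge: rate * size * duration."""
    return rate_gb_month * gb * months


def deep_archive_metadata_cost(objects: int, months: int = 120) -> float:
    """Per-object overhead: 32KB at the Deep Archive rate plus 8KB at the S3 Standard rate."""
    archive_part = objects * 32 / KB_PER_GB * 0.00099
    standard_part = objects * 8 / KB_PER_GB * 0.023
    return (archive_part + standard_part) * months


if __name__ == "__main__":
    print(f"Deep Archive storage: ${storage_cost(0.00099):.2f}")
    print(f"Metadata, 300000 objects: ${deep_archive_metadata_cost(300_000):.2f}")
    print(f"Metadata, 900 objects: ${deep_archive_metadata_cost(900):.4f}")
```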

Retrieval

Backblaze:

  • $0.004/10000 operations: 300000 * $0.004/10000 = $0.12
  • $0 egress up to 3x average stored bytes: $0

OVH:

  • $0.0121/GB egress: 900 * $0.0121 = $10.89

GCP:

  • $0.05/1000 operations: 300000 * 0.05/1000 = $15
    • Reduced files: 900 * 0.05/1000 = $0.045
  • $0.05/GB restore: 900 * $0.05 = $45
  • $0.12/GB egress: 900 * $0.12 = $108
    • CDN Interconnect: 900 * $0.04 = $36

AWS Glacier API:

  • $0.00 retrieval: $0
  • $0.09/GB egress: 900 * $0.09 = $81
    • CloudFront: 900 * $0 = $0

AWS Deep Archive:

  • $0.025/1000 restore requests: 300000 * 0.025/1000 = $7.50
    • Reduced files: 900 * 0.025/1000 = $0.0225
  • $0.0025/GB bulk restore fee: 900 * 0.0025 = $2.25
  • $0.09/GB egress: 900 * $0.09 = $81
    • CloudFront: 900 * $0 = $0
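
All of the retrieval figures above follow the same shape: a per-request fee, a per-GB restore fee, and a per-GB egress fee. A minimal sketch, with a function name and parameter names I invented for this example:

```python
def retrieval_cost(
    objects: int,
    gb: float,
    request_per_1000: float = 0.0,  # $ per 1000 retrieval requests
    restore_per_gb: float = 0.0,    # $ per GB restored from the archive tier
    egress_per_gb: float = 0.0,     # $ per GB transferred out
) -> float:
    """One full restore of `gb` gigabytes stored as `objects` objects."""
    return objects * request_per_1000 / 1000 + gb * (restore_per_gb + egress_per_gb)


if __name__ == "__main__":
    # Backblaze charges $0.004 per 10000 operations, i.e. $0.0004 per 1000.
    print(f"Backblaze: ${retrieval_cost(300_000, 900, request_per_1000=0.0004):.2f}")
    print(f"GCP, raw: ${retrieval_cost(300_000, 900, 0.05, 0.05, 0.12):.2f}")
    print(f"Deep Archive, batched: ${retrieval_cost(900, 900, 0.025, 0.0025, 0.09):.2f}")
```

Setting `egress_per_gb=0.0` models the CloudFront case, and `egress_per_gb=0.04` the GCP CDN Interconnect case.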

Totals

Backblaze: $0 + $648 + $0.12 = $648.12

OVH: $10.89 + $259.20 + $10.89 = $280.98

GCP:

  • Standard: $15 + $129.60 + $15 + $45 + $108 = $312.60
  • Reduced files: $0.045 + $129.60 + $0.045 + $45 + $108 = $282.69
  • Reduced + CDN Interconnect: $0.045 + $129.60 + $0.045 + $45 + $36 = $210.69

AWS Glacier API:

  • Standard: $9.90 + $388.80 + $81 = $479.70
  • Reduced files: $0.0297 + $388.80 + $81 = $469.83
  • Reduced + CloudFront: $0.0297 + $388.80 = $388.83

AWS Deep Archive:

  • Standard: $15 + $106.92 + $1.09 + $6.32 + $7.50 + $2.25 + $81 = $220.08
  • Reduced files: $0.045 + $106.92 + $0.003 + $0.02 + $0.0225 + $2.25 + $81 = $190.26
  • Reduced + CloudFront: $0.045 + $106.92 + $0.003 + $0.02 + $0.0225 + $2.25 = $109.26
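
If you'd rather script the totals than use a spreadsheet, the whole model fits in a small dataclass. This is a sketch under my assumptions (900GB, 120 months, one full restore, 1024-based KB-to-GB conversion); the `Tier` class and its field names are hypothetical, not anything from a provider SDK. Because it sums unrounded components, a total can differ from my hand-added figures by about a cent.

```python
from dataclasses import dataclass, replace

KB_PER_GB = 1024 * 1024


@dataclass(frozen=True)
class Tier:
    storage: float                   # $/GB-month
    upload_op: float = 0.0           # $ per upload request
    ingress: float = 0.0             # $/GB uploaded
    restore_op: float = 0.0          # $ per retrieval request
    restore: float = 0.0             # $/GB restored
    egress: float = 0.0              # $/GB transferred out
    meta_archive_kb: float = 0.0     # per-object KB billed at `storage`
    meta_standard_kb: float = 0.0    # per-object KB billed at `meta_standard_rate`
    meta_standard_rate: float = 0.0  # $/GB-month


def total(t: Tier, objects: int, gb: float = 900, months: int = 120) -> float:
    """Upload once, store for `months`, restore everything once."""
    upload = objects * t.upload_op + gb * t.ingress
    meta_monthly = objects / KB_PER_GB * (
        t.meta_archive_kb * t.storage + t.meta_standard_kb * t.meta_standard_rate
    )
    storing = (gb * t.storage + meta_monthly) * months
    retrieval = objects * t.restore_op + gb * (t.restore + t.egress)
    return upload + storing + retrieval


BACKBLAZE = Tier(storage=0.006, restore_op=0.004 / 10_000)
OVH = Tier(storage=0.0024, ingress=0.0121, egress=0.0121)
GCP = Tier(storage=0.0012, upload_op=0.05 / 1000, restore_op=0.05 / 1000,
           restore=0.05, egress=0.12)
GLACIER_API = Tier(storage=0.0036, upload_op=0.033 / 1000, egress=0.09)
DEEP_ARCHIVE = Tier(storage=0.00099, upload_op=0.05 / 1000, restore_op=0.025 / 1000,
                    restore=0.0025, egress=0.09,
                    meta_archive_kb=32, meta_standard_kb=8, meta_standard_rate=0.023)

if __name__ == "__main__":
    print(f"Deep Archive, raw: ${total(DEEP_ARCHIVE, 300_000):.2f}")
    print(f"Deep Archive, batched: ${total(DEEP_ARCHIVE, 900):.2f}")
    # Assumed CloudFront free tier: egress set to zero.
    print(f"Deep Archive, batched + CloudFront: ${total(replace(DEEP_ARCHIVE, egress=0.0), 900):.2f}")
```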

Results

I put all of these numbers into a spreadsheet with my calculations (publicly viewable link). (If you want to mess around with numbers yourself, here’s a template.)

All the archive solutions beat the Backblaze benchmark, and everything except Glacier API costs less than half as much. That’s partly by construction, since I filtered out the options that cost more than Backblaze. Backblaze’s strength is its lack of operations charges and ingress/egress fees, not its storage price - and storage costs are what dominate over my 10-year timespan.

Glacier API being the worst archive performer isn’t surprising since it has the highest archival storage fee. It’s marginally lower than GCP in the initial few months, but Deep Archive is cheaper, so why not just go for that? I’ve previously posted about my suspected soft deprecation of Glacier API, and this price difference is more evidence.

There’s one case when Glacier API is cheaper than Deep Archive: when the average file size is tiny - think 128KB. The fixed costs are much less because Glacier API has free retrievals and cheaper upload operation costs. I suspect the price improvement would disappear if I compared Glacier API to Glacier Flexible Retrieval.
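
To see the tiny-file effect, here is a sketch that compares the two AWS options for the same 900GB at different average object sizes. It uses the rates from this post with no CloudFront discount; the function names are my own and the 1024-based size conversion is an assumption.

```python
KB_PER_GB = 1024 * 1024


def objects_for(avg_kb: float, gb: float = 900) -> int:
    """Number of objects if `gb` gigabytes are stored in objects of `avg_kb` each."""
    return round(gb * KB_PER_GB / avg_kb)


def glacier_api_total(objects: int, gb: float = 900, months: int = 120) -> float:
    """Upload requests + storage + egress; bulk retrieval is free."""
    return objects * 0.033 / 1000 + gb * 0.0036 * months + gb * 0.09


def deep_archive_total(objects: int, gb: float = 900, months: int = 120) -> float:
    """Upload and bulk restore requests + per-object metadata + storage, restore, egress."""
    requests = objects * (0.05 + 0.025) / 1000
    metadata = objects / KB_PER_GB * (32 * 0.00099 + 8 * 0.023) * months
    return requests + metadata + gb * (0.00099 * months + 0.0025 + 0.09)


if __name__ == "__main__":
    for avg_kb in (128, 1024, 1024 * 1024):
        n = objects_for(avg_kb)
        print(f"{avg_kb}KB objects: Glacier API ${glacier_api_total(n):.2f}, "
              f"Deep Archive ${deep_archive_total(n):.2f}")
```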

GCP’s performance was disappointing. In part because of their higher egress fee, they can’t match AWS. Even when using GCP CDN Interconnect to reduce the egress fee, I can’t identify a scenario when it’s cheaper than AWS Deep Archive. On the per-GB cost of getting data back out, the two are nearly even: GCP has $0.05 retrieval + $0.04 egress charges ($0.09/GB), while AWS has $0.0025 retrieval + $0.09 egress charges ($0.0925/GB). The slightly lower storage charge for AWS means that Deep Archive is cheaper over the long term, while GCP’s longer minimum storage duration means it never has a chance to be cheaper in the short term either.

OVH is an oddball. Because its egress fees are dramatically cheaper than AWS and GCP, it does well in my comparison. Without the CloudFront loophole, OVH is the most inexpensive overall solution for about 4 years, after which the higher storage fee makes it less competitive.
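
The "about 4 years" figure is a break-even between OVH's cheap fixed costs and Deep Archive's cheap monthly cost. A minimal sketch of that calculation, assuming 900 batched archives, no CloudFront discount, and ignoring the Deep Archive metadata charge (fractions of a cent per month at this object count); the helper name is mine:

```python
def break_even_months(fixed_a: float, monthly_a: float,
                      fixed_b: float, monthly_b: float) -> float:
    """Months after which option A (low fixed, high monthly) stops being cheaper than B."""
    return (fixed_b - fixed_a) / (monthly_a - monthly_b)


GB = 900
ARCHIVES = 900

# OVH: ingress plus one full egress, then $0.0024/GB-month.
ovh_fixed = GB * 0.0121 * 2
ovh_monthly = GB * 0.0024

# Deep Archive: upload and bulk restore requests, restore fee, egress, then $0.00099/GB-month.
da_fixed = ARCHIVES * (0.05 + 0.025) / 1000 + GB * (0.0025 + 0.09)
da_monthly = GB * 0.00099

months = break_even_months(ovh_fixed, ovh_monthly, da_fixed, da_monthly)

if __name__ == "__main__":
    print(f"OVH is cheaper for the first {months:.1f} months")
```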

Finally, Deep Archive. With my suspected CloudFront loophole, the total cost is extremely low - $109 to store 900GB over 10 years, including downloading the archives once. That cost goes up to $190 without the loophole, only $20 less than the GCP CDNi solution - but still 10% less.

Conclusion

Deep Archive is the winner by numbers. It’s a shame that it suffers from the standard AWS complexity - it’s the most complicated solution to try and calculate for, so people will be surprised by unexpected charges.

If OVH had cheaper storage or I had smaller files, I’d go with OVH just for the simplicity. For now, I’ll be sticking with Deep Archive… once I figure out how to perform the uploads as cheaply as possible. That experiment derailed my previous attempted adoption.

The most surprising thing for me is that grouping files together saves money. My earliest photos are tiny (2-megapixel JPGs are under 1MB) compared to more recent pictures with my DSLR. Archiving those directly would cost more in the per-file operations than the actual storage.

The sweet spot seems to be when the average file size is around 500MB-1GB - experimenting with numbers, the graph shows diminishing returns after that as the total per-file operations cost drops, and improvements start to be in the order of cents instead of dollars.
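
The diminishing returns are easy to see if you isolate the per-object charges for Deep Archive (upload requests, bulk restore requests, and metadata over 120 months) and vary the archive size. This is a rough sketch with names I made up, assuming 900GB total and 1024MB per GB:

```python
KB_PER_GB = 1024 * 1024
MB_PER_GB = 1024


def per_object_overhead(objects: float, months: int = 120) -> float:
    """Deep Archive charges that scale with object count rather than bytes."""
    requests = objects * (0.05 + 0.025) / 1000  # PUT + bulk restore requests
    metadata = objects / KB_PER_GB * (32 * 0.00099 + 8 * 0.023) * months
    return requests + metadata


def overhead_for_archive_size(archive_mb: float, total_gb: float = 900) -> float:
    """Per-object overhead if the data is split into archives of `archive_mb` each."""
    objects = total_gb * MB_PER_GB / archive_mb
    return per_object_overhead(objects)


if __name__ == "__main__":
    for size_mb in (3, 10, 100, 500, 1024, 4096):
        print(f"{size_mb:>5}MB archives: ${overhead_for_archive_size(size_mb):.2f}")
```

Past a few hundred MB per archive the overhead is already well under a dollar, which is why bigger archives stop mattering.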

I also want to investigate whether I’m correct about the 1TB free CloudFront egress applying to Deep Archive restores. It would save $81, which is worth it for my small scale. The loophole could work when storing more than 1TB of data - just download a portion each month to keep the free egress.
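
If the loophole does work, spreading a larger restore across months is simple arithmetic. A sketch, assuming (unverified) that the free allowance is 1000GB and resets every month; both function names are hypothetical:

```python
import math


def months_to_restore(total_gb: float, free_gb_per_month: float = 1000) -> int:
    """Months needed to download everything while staying inside the free allowance."""
    return math.ceil(total_gb / free_gb_per_month)


def egress_saved(total_gb: float, egress_per_gb: float = 0.09) -> float:
    """Egress fee avoided if the whole restore fits in the free allowance."""
    return total_gb * egress_per_gb


if __name__ == "__main__":
    print(months_to_restore(900), "month(s) for 900GB")
    print(months_to_restore(2500), "month(s) for 2500GB")
    print(f"${egress_saved(900):.2f} saved on 900GB")
```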

Prices

The list of providers I looked at was based on a selection of rclone providers & my experience.

S3 Glacier Class Deep Archive

  • https://aws.amazon.com/s3/pricing/
  • $0.00099/GB-month
  • $0.09/GB transfer out
  • Gotchas:
    • $0.10/1000 (standard) or $0.025/1000 (Bulk) restore requests - incentivizes packing files into an archive
    • $0.02/GB (standard) or $0.0025/GB (Bulk) restore ($0.003 in Ireland/Stockholm)
    • Minimum storage duration of 180 days
    • $0.05/1000 PUT requests
    • Each object stored has 40KB of metadata: 8KB billed at S3 Standard rates, 32KB at Deep Archive rates

S3 Glacier API (old one)

  • https://aws.amazon.com/s3/glacier/pricing/
  • $0.0036/GB-month
  • $0.09/GB transfer out
  • Gotchas:
    • retrievals are $0.00/GB in Bulk, otherwise $0.01/GB
    • retrieval requests are $0.033/1000 requests in standard, $0.00/1000 in bulk
    • minimum storage duration of 90 days
    • $0.033/1000 UPLOAD requests

GCP Archive Storage

OVH Cloud Archive

Oracle Cloud Object Infrequent Access Storage

Disclaimers

I own stock in Amazon and Backblaze.