Box to AWS S3

Box Eliminates Unlimited Storage for Education

In late 2019, Box announced changes to its pricing model for all educational institutions: it would no longer provide unlimited file storage at an institution's current annual spend. (See: https://it.wisc.edu/news/new-storage-quotas-for-box-after-unexpected-contract-changes/, https://bconnected.berkeley.edu/projects/box-service-changes)

Many universities stored petabytes of data in Box based on their previous contracts for unlimited storage. The contract changes are leaving many schools scrambling to find other solutions.

AWS to the Rescue

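It turns out AWS has an inexpensive, highly available storage service that is addressable over the web. As of July 2020, the S3 family of storage classes ranged from $0.023 per GB per month down to $0.00099 per GB per month. For comparison, 1 PB (1,048,576 GB) of data in S3 Standard, which is billed at $0.023 per GB per month for the first 50 TB and at lower rates for higher volumes, costs $22,583.30 per month, while the same data in S3 Glacier Deep Archive at $0.00099 per GB per month costs $1,038.09 per month.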
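The S3 Standard figure comes from its volume tiers, not a flat $0.023 rate. The minimal Python sketch below reproduces both monthly totals. It assumes 1 PB = 1,048,576 GB and the US East (N. Virginia) S3 Standard tiers published at the time (first 50 TB at $0.023, next 450 TB at $0.022, everything over 500 TB at $0.021 per GB-month). The constants are illustrative inputs to re-check against current AWS pricing.

```python
"""Sketch: monthly S3 storage cost for 1 PB, using July 2020 list prices
(assumed US East tiers; verify against current AWS pricing)."""

GB_PER_TB = 1024
GB_PER_PB = 1024 * GB_PER_TB  # 1,048,576 GB

# S3 Standard volume tiers: (tier size in GB, USD per GB-month).
# A size of None means "all remaining data".
S3_STANDARD_TIERS = [
    (50 * GB_PER_TB, 0.023),   # first 50 TB
    (450 * GB_PER_TB, 0.022),  # next 450 TB
    (None, 0.021),             # over 500 TB
]
DEEP_ARCHIVE_PER_GB = 0.00099  # flat rate per GB-month


def tiered_monthly_cost(gb: float, tiers) -> float:
    """Bill `gb` gigabytes across consecutive pricing tiers."""
    remaining = gb
    total = 0.0
    for size, price in tiers:
        billed = remaining if size is None else min(remaining, size)
        total += billed * price
        remaining -= billed
        if remaining <= 0:
            break
    return total


def flat_monthly_cost(gb: float, price_per_gb: float) -> float:
    """Bill `gb` gigabytes at a single flat rate."""
    return gb * price_per_gb


if __name__ == "__main__":
    standard = tiered_monthly_cost(GB_PER_PB, S3_STANDARD_TIERS)
    archive = flat_monthly_cost(GB_PER_PB, DEEP_ARCHIVE_PER_GB)
    print(f"S3 Standard, 1 PB:          ${standard:,.2f} per month")
    print(f"S3 Glacier Deep Archive:    ${archive:,.2f} per month")
```

Run as a script, it prints $22,583.30 and $1,038.09, matching the figures above. Deep Archive used a single flat rate, so a plain multiplication is enough there.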

If a university is storing petabytes of data in Box, that data is probably not all users' personal documents; it likely includes research data sets and other backups. This use case lends itself to long-term storage with infrequent access, such as S3 Glacier Deep Archive, at the lowest end of the cost spectrum.
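As a sketch of putting archival data straight into Deep Archive, the Python below assumes a boto3 S3 client is passed in, so it can be tested without AWS. The function names and the bucket, prefix, and file names in the usage note are hypothetical. `upload_file` with `ExtraArgs={"StorageClass": "DEEP_ARCHIVE"}` skips S3 Standard entirely, and `restore_object` requests a temporary readable copy when the data is needed again.

```python
"""Sketch: write research data directly into S3 Glacier Deep Archive and
request a restore later. `s3_client` is assumed to be a boto3 S3 client;
function names are hypothetical."""
from pathlib import Path


def archive_file(s3_client, local_path: str, bucket: str, prefix: str) -> str:
    """Upload local_path to s3://bucket/<prefix>/<file name> in DEEP_ARCHIVE."""
    key = f"{prefix.strip('/')}/{Path(local_path).name}"
    # StorageClass stores the object in Deep Archive from the start,
    # instead of S3 Standard followed by a later transition.
    s3_client.upload_file(
        local_path, bucket, key, ExtraArgs={"StorageClass": "DEEP_ARCHIVE"}
    )
    return key


def request_restore(s3_client, bucket: str, key: str, days: int = 7) -> None:
    """Ask S3 to stage a temporary readable copy of an archived object."""
    # The "Standard" retrieval tier for Deep Archive typically completes
    # within 12 hours; "Bulk" is cheaper but slower.
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": "Standard"}},
    )
```

For example, `archive_file(boto3.client("s3"), "dataset-2019.tar", "example-research-archive", "team-a")` would write the object to `team-a/dataset-2019.tar`. Archived objects cannot be downloaded directly; a restore has to complete first.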

One Solution using AWS S3

Here is a brief outline of a solution for migrating from Box to AWS S3 for a small group of users who access their own files over FTP, where each user's files are inaccessible to other users. It was designed with a Box to AWS S3 migration in mind but could be used for any AWS S3 storage project.

  • Amazon S3 – stores the documents, using either an individual bucket per user or team or a prefix within a single bucket (e.g. bucketname.s3.amazonaws.com/team-a).
  • IAM Role – one for each user, team, or bucket that needs to be separated from other users. Only users who can assume the role have access to the bucket or bucket prefix.
  • AWS Single Sign-On – provides an easy-to-use way to grant users permission to assume an IAM role. It can be used standalone or integrated with Active Directory or a SAML identity provider for authentication.
  • User access to files:
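One way to enforce the per-team separation in the outline is an IAM permissions policy on each team's role that only allows access under that team's prefix. The minimal Python sketch below builds such a policy document. The bucket name and the `team-a` prefix are the examples from the list; the function name and the exact set of allowed actions are assumptions to adjust per team.

```python
"""Sketch: build an IAM permissions policy that confines a role to one
prefix of a shared bucket. Names are illustrative."""
import json


def prefix_access_policy(bucket: str, prefix: str) -> dict:
    """Return a policy allowing list/read/write/delete only under `prefix`/."""
    clean = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so restrict it with the
                # s3:prefix condition key instead of the resource ARN.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{clean}/*"]}},
            },
            {
                # Object-level actions are limited by the object ARN pattern.
                "Sid": "ReadWriteOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{clean}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(prefix_access_policy("bucketname", "team-a"), indent=2))
```

The resulting JSON could be attached to the team's IAM role, or used as the inline policy of an AWS SSO permission set, so that anyone who assumes the role only sees objects under `team-a/`.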

Final Thoughts

I don’t know of any universities that are willing to go from spending nothing extra on storage (unlimited Box storage) to over $22,000 per PB per month. Careful use of S3 storage classes is the key to reducing the price. While close to $1,000 per PB per month is possible, the most likely scenario is a mix of storage classes matched to different use cases. Ideally, only a small subset of the data is frequently accessed and kept in S3 Standard. Infrequently accessed or archival data that users can wait up to 12 hours to retrieve can be moved to the lowest-cost storage class, S3 Glacier Deep Archive, with other storage classes covering the cases in between.
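Mixing storage classes does not have to be manual: an S3 lifecycle configuration can move objects to cheaper classes as they age. The Python sketch below builds one rule per prefix and applies it with a caller-supplied boto3 client. The 30/90/180-day thresholds, the rule ID format, and the function names are hypothetical and should follow each group's actual access patterns.

```python
"""Sketch: an S3 lifecycle rule that steps objects under a prefix down
through cheaper storage classes as they age. Day thresholds, rule IDs,
and function names are hypothetical examples, not recommendations."""


def tiering_rule(prefix: str, ia_days: int = 30,
                 glacier_days: int = 90, deep_archive_days: int = 180) -> dict:
    """Build one lifecycle rule for objects under `prefix`."""
    # Basic checks only: lifecycle transitions to STANDARD_IA require objects
    # to be at least 30 days old, and later transitions must come later.
    # S3 enforces further minimum-duration rules that this sketch does not model.
    if ia_days < 30:
        raise ValueError("STANDARD_IA transitions need ia_days >= 30")
    if not (ia_days < glacier_days < deep_archive_days):
        raise ValueError("transition days must be strictly increasing")
    clean = prefix.strip("/")
    return {
        "ID": f"tier-down-{clean}",
        "Filter": {"Prefix": f"{clean}/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
            {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


def apply_lifecycle(s3_client, bucket: str, rules: list) -> None:
    """Install `rules` as the bucket's lifecycle configuration."""
    # This call replaces any lifecycle configuration already on the bucket.
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```

Because `put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, rules for several team prefixes should be collected into one list, e.g. `apply_lifecycle(s3, "bucketname", [tiering_rule("team-a"), tiering_rule("team-b")])`.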

Using S3 for storage has other benefits, especially for researchers such as:

This is not meant to be an implementation guide, but rather a high-level solution outline. If you are interested in exploring such a solution, please reach out to us for more information.

Michael McCarthy

Michael is a veteran software engineer and cloud computing aficionado. After starting his career as a Java software engineer, he evolved into a consultant, focusing first on enterprise content management and later on AWS.
