Backup Files And Folders To Amazon S3 From Linux Terminal

Amazon S3 (Simple Storage Service) offers a flexible option to store files/folders and website backups in the cloud. It is one of the many web services Amazon offers, like EC2 and CloudFront; these are collectively known as AWS (Amazon Web Services).

Creating an AWS account is free. New users get 5 GB of Amazon S3 standard storage, 20,000 GET requests, 2,000 PUT requests, and 15 GB of data transfer out free each month for the first year.

The way S3 works is to first create a “bucket” and then store/upload data in that bucket.

While uploading files directly through the S3 web console is simple, accessing S3 directly from the Linux terminal takes a bit more setup.

Let’s take a look at how to back up an entire website (consisting of lots of files and folders) to Amazon S3 through the Linux terminal. (This is also applicable to backing up specific files/folders from local Linux systems.)

This example assumes an existing AWS account and SSH access to a Linux server (root access isn’t needed; this will work even on shared hosting for backing up websites to Amazon S3, as long as SSH is enabled).

Prerequisites:

  • Access to the Linux server from where website content needs to be backed up. This system should have Python version 2.6.3 or greater installed (which most web hosts already have).

This can be checked by the following Linux command:

python --version

  • An Amazon S3 bucket should be created. This can be done by signing into the AWS console, choosing Services > S3 and clicking “Create Bucket”.

Once the S3 bucket is created, right-click on the bucket name and choose “Properties”.

From the right-hand AWS pane, note down the information displayed, especially “Region”. Also, from the “Permissions” tab, make sure that your username has all the permissions for the created S3 bucket.

  • Access to the AWS Access Key ID and Secret Access Key. These can be obtained by clicking on your username in the AWS console and choosing “Security Credentials”.

Then click “Create New Access Key”.

Note down both keys and download the key file if needed. Keep this information somewhere secure.

So to sum up, the following components need to be available before trying to back up content from the Linux terminal to Amazon S3:

  • Python version 2.6.3 or greater installed on the host system.
  • A created Amazon S3 bucket.
  • Details of the S3 bucket, like its region and the user ID that has access to it.
  • Access Key ID and Secret Access Key details.

Now, to access the S3 bucket from the Linux terminal, the AWS Command Line Interface (AWS CLI) needs to be installed and configured. This is a tool for managing AWS products directly using commands.

Setting up the environment and installing AWS CLI:

Install AWS CLI using the following:

curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"

unzip awscli-bundle.zip

./awscli-bundle/install -b ~/bin/aws

This does not need root access, and the AWS CLI will be installed for the current user.
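
The installer places the actual files under the user’s home directory and creates a symlink at ~/bin/aws. A quick way to confirm the symlink exists (the exact target path may vary by install location):

ls -l ~/bin/aws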

Once installed, verify that ~/bin (where the aws symlink was created) is in the PATH environment variable with the following command:

echo $PATH | grep ~/bin

If it is not present, add it using the export command:

export PATH=~/bin:$PATH
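
Note that export only affects the current shell session. To make the change persist across logins, the same line can be appended to the shell profile (assuming bash here; adjust the file for other shells):

echo 'export PATH=~/bin:$PATH' >> ~/.bashrc

source ~/.bashrc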

AWS CLI should now be all set for use.

To verify if it is working, use the following command:

aws help

This will bring up the AWS CLI help page.
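
Alternatively, a quick sanity check is printing the installed version (exact version numbers will differ):

aws --version

This prints a single line showing the CLI, Python, and OS versions.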

Configuring AWS CLI to “see” S3 buckets:

The command for this is:

aws configure

Here is where the prior information (Access Key ID, Secret Access Key, and S3 bucket region) is entered. Hit “Enter” after typing each of these details.

Note that the region name in this example is “us-west-2”, as that is the corresponding region for “Oregon”, which is the S3 bucket region here. The default output format can be left as none/blank.
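
A typical configure session looks something like this (the key values below are placeholders, not real credentials):

AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-west-2
Default output format [None]: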

An official list of all S3 regions can be found here.

Once this S3 configuration is done, it is time to access it.

This is done with the following command:

aws s3 ls

If correctly configured, the same S3 bucket that was created from the AWS S3 console should now be listed here.
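
The output is a simple listing of creation timestamp and bucket name, roughly like this (the timestamp is illustrative):

2016-01-15 10:30:45 s3bkp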

Finally, the fun part: actually backing up stuff to S3. 🙂

To test if files can be uploaded to this bucket, simply create a test file if needed and use the cp command like so:

aws s3 cp pathtolocalfile s3://bucketname

In this example, a test file named test.txt is copied to the S3 bucket named “s3bkp” through the command:

aws s3 cp /home/a3514/test.txt s3://s3bkp

This file should now show up in the AWS S3 console for the bucket.

Now, to back up the entire directory structure from the host Linux server to the S3 bucket, the command to use is “sync”:

aws s3 sync directorypathtobackup s3://bucketname

In this example, the entire “public_html” folder needs to be backed up to the S3 bucket “s3bkp”. So to do this:

aws s3 sync /home/a3514/public_html s3://s3bkp

This will start the sync process and all the content will be uploaded to S3. The same content should now be visible through the S3 console for the bucket.
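
To preview what would be transferred without uploading anything, sync accepts a --dryrun flag, and the bucket contents can be listed recursively from the terminal instead of the console:

aws s3 sync /home/a3514/public_html s3://s3bkp --dryrun

aws s3 ls s3://s3bkp --recursive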

All done!

Restoring from S3:

This is exactly the reverse of backing up to S3. Simply switch the source and destination paths:

aws s3 sync s3://bucketname/path /localsystempath

So to restore a folder named “logs” from the S3 bucket named “s3bkp” to a local folder named “restore”, the command will be:

aws s3 sync s3://s3bkp/logs /home/a3514/restore

[For copying an individual file, simply use the “cp” command with the S3 path as the source and the local path as the destination.]
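
For example, to pull back the test.txt uploaded earlier into the restore folder:

aws s3 cp s3://s3bkp/test.txt /home/a3514/restore/test.txt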

So to sum it up, backing up files/folders to Amazon S3 through the Linux terminal consists of:

  • AWS CLI installed (Python version 2.6.3 or greater should be installed on the host system for this).
  • Having the required S3 information at hand (S3 bucket region, Access Key ID and Secret Access Key) for configuring and accessing S3 from the Linux terminal.
  • Copying files and folders to S3.

Important: Amazon S3 is a storage service which bills only for as much as you use. Use the S3 pricing calculator to get an idea of costs, as it calculates this based on the number of GET and PUT requests. The pricing details can be found here.

So, to avoid a large number of individual GET and PUT requests for lots of files, it can be more economical to back up a compressed archive and restore it when needed. This is simple to do using the tar command in Linux.
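
A minimal sketch of this approach, reusing the paths from the earlier examples (the archive name is arbitrary):

# create a compressed archive of the site content
tar czf public_html_backup.tar.gz /home/a3514/public_html

# upload the single archive instead of many individual files
aws s3 cp public_html_backup.tar.gz s3://s3bkp

# later, download and extract to restore
aws s3 cp s3://s3bkp/public_html_backup.tar.gz .
tar xzf public_html_backup.tar.gz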

Official resources for further reference:

Installation and usage of AWS CLI

S3 bucket documentation

Update: There is a server backup monitoring solution, Backup Bird, that can do this for S3 as well as for FTP and Dropbox. Check out the article on how to set it up here.
