Migrate Files from a Local File System to Amazon S3 with a Python Application | AWS S3 | Python

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. S3 fits many use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely tuned access controls to meet your specific business, organizational, and compliance requirements.

In this article we will learn how to migrate a set of files from local storage to an AWS S3 bucket using the Python API (Boto3). The solution design is straightforward: a Python application reads the files from a local folder and uploads them to the S3 bucket.

Important note: before continuing with this project, you need to set up the AWS credentials file on your local system. If you have the AWS CLI configured, the file was already created as part of the AWS CLI configuration steps. If you don’t have the AWS CLI, you need to create the credentials file manually under ~/.aws.
For details, check the articles on how to set up the AWS CLI and how to create the credentials file using your AWS access keys.
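
Before running anything against AWS, it can help to confirm that the credentials file is in place. The following is a minimal sketch, assuming the file lives at ~/.aws/credentials and uses a profile named default; the function name missing_credential_keys is hypothetical and not part of Boto3.

```python
import configparser
import os

# Keys that a profile in the credentials file is expected to define.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path, profile="default"):
    """Return the required keys that `profile` in `path` does not define.

    A missing file or a missing profile reports every required key.
    """
    # interpolation=None so that special characters in a key are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # files that do not exist are skipped silently
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


# Assumed default location of the credentials file.
credentials_path = os.path.expanduser("~/.aws/credentials")
missing = missing_credential_keys(credentials_path)
if missing:
    print("Missing from credentials file:", ", ".join(missing))
else:
    print("Credentials file looks complete")
```

An empty result only means the keys are present in the file; it does not prove that AWS accepts them.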

What you will need for this project:
– Python installed on your machine
– Python IDE such as Spyder, PyCharm, or Visual Studio Code
– AWS Free Tier account

We have a set of 6 files that we need to transfer to an S3 bucket. To make the project practical and closer to production use, we will also learn how to delete the objects in a bucket and then delete the bucket itself if it exists, on the assumption that the files may already exist from a previous run and need to be replaced.

First we need to install the Boto3 library. To do that we will use pip as follows:

pip install boto3

Now open your code editor and let’s start writing our application.

import boto3
import json
import os
from botocore.exceptions import ClientError
import logging

directory='C:\\appdata'

s3 = boto3.resource('s3')
client = boto3.client('s3')

In the first lines we import the boto3 library we just installed, the json library, which we will use later to pretty-print the responses coming from AWS, and the os library, which we will use to read files from our local file system.
We import ClientError from botocore.exceptions to capture any errors that happen when we call actions on a resource.
We import logging to log any errors that happen while we call actions on a resource.
The directory variable points to the folder that holds the files we need to migrate.
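
A ClientError carries a response dictionary whose Error entry holds the error code and message returned by AWS. The sketch below shows one way to turn that into a single log line; the helper name describe_client_error and the logger name are hypothetical, and the function only assumes the exception has a response attribute shaped like the one botocore provides.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("s3_migration")  # hypothetical logger name


def describe_client_error(error):
    """Build a one-line summary from a ClientError-like exception."""
    # botocore's ClientError exposes the parsed service reply as `response`.
    details = getattr(error, "response", {}).get("Error", {})
    code = details.get("Code", "Unknown")
    message = details.get("Message", str(error))
    return f"{code}: {message}"


# Typical use inside the application (not executed here):
# try:
#     client.delete_bucket(Bucket="s3appdvversion2")
# except ClientError as e:
#     logger.error(describe_client_error(e))
```

Logging the code separately makes it easier to tell apart cases such as a missing bucket and a permissions problem.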

Boto3 offers two ways to work with a service: a resource, which is a higher-level, object-oriented interface, and a client, which is a low-level interface that maps closely to the service API and gives you access to all the actions available on the service. We use both in this application, so we created an s3 resource and a client for S3.

In the coming blocks of code we will delete all objects inside the bucket, then delete the bucket itself, and then create a new bucket. These three steps are optional; they are included here only to show how to perform these actions on S3 from a Python application. If you don’t need them, remove them from the application.

# Delete the content of the bucket (optional, you can remove this from the application)
try:
    my_bucket = s3.Bucket('s3appdvversion2')
    my_bucket.objects.all().delete()
    print("All Bucket Objects deleted successfully")
except ClientError as e:
    # logging.error() returns None, so log the error directly instead of printing it
    logging.error(e)

In this code we use the s3 resource we created and define another variable called my_bucket that points to the bucket we want to empty. Through its objects collection we select all the objects in the bucket with all() and remove them with delete().
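
The same clean-up can also be written with the low-level client. This is a minimal sketch rather than the article's method: it assumes a client created with boto3.client('s3'), the function name empty_bucket is hypothetical, and it only removes current objects, so older versions in a versioned bucket would remain.

```python
def empty_bucket(client, bucket_name):
    """Request deletion of every current object in `bucket_name`.

    Returns the number of keys submitted for deletion.
    """
    submitted = 0
    paginator = client.get_paginator("list_objects_v2")
    # Each page holds at most 1,000 keys, which is also the per-request
    # limit of delete_objects, so one page maps to one delete call.
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue  # an empty bucket yields a page without "Contents"
        client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": keys, "Quiet": True},
        )
        submitted += len(keys)
    return submitted


# Typical use (not executed here):
# client = boto3.client("s3")
# print(empty_bucket(client, "s3appdvversion2"))
```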

# Delete the bucket after removing all objects in the bucket (optional)
try:
    response = client.delete_bucket(Bucket='s3appdvversion2')
    print("Bucket Deleted Successfully")
    print(json.dumps(response, indent=2))
except ClientError as e:
    # log the error directly; printing the return value of logging.error() prints None
    logging.error(e)

The output so far is the two success messages followed by the JSON response returned by delete_bucket.

In this block we store the result of calling the delete_bucket method, through the client we created, in a variable called response. The delete_bucket method takes Bucket as a parameter, which is the name of the bucket we want to delete.
If the action is successful we print the response. The response is a Python dictionary, so we use the json library to pretty-print it as JSON.
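
Besides printing the whole response, you can read individual fields from it, for example the HTTP status code stored under ResponseMetadata. The sketch below uses a hand-written sample dictionary instead of a real AWS reply, and the helper name summarize_response is hypothetical.

```python
import json


def summarize_response(response):
    """Return (status_code, pretty_json) for a Boto3 client response dict."""
    metadata = response.get("ResponseMetadata", {})
    status = metadata.get("HTTPStatusCode")
    # default=str keeps json.dumps from failing on values such as datetimes,
    # which some S3 responses contain.
    return status, json.dumps(response, indent=2, default=str)


# Hand-written sample shaped like a client response (values are illustrative).
sample = {
    "ResponseMetadata": {
        "RequestId": "EXAMPLE",
        "HTTPStatusCode": 204,
        "RetryAttempts": 0,
    }
}
status, pretty = summarize_response(sample)
print(status)
print(pretty)
```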

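# Create a new bucket (optional)
try:
    response = client.create_bucket(
        # 'private' is used here because new buckets block public ACLs by default,
        # so a public ACL such as 'public-read-write' is likely to be rejected
        ACL='private',
        Bucket='s3appdvversion2',
        CreateBucketConfiguration={
            'LocationConstraint': 'eu-west-2',
        },
    )
    print(json.dumps(response, indent=2))
except ClientError as e:
    logging.error(e)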

In this block we create a new bucket using the create_bucket method. This method accepts a set of parameters; here we are using:
ACL: sets the access rights on the newly created bucket. The user whose access keys are in the credentials file will be the owner of the bucket. The available options are ‘private’, ‘public-read’, ‘public-read-write’, and ‘authenticated-read’. The public options let anyone read or write the bucket, and new buckets block public ACLs by default, so prefer ‘private’ unless you really need public access.
Bucket: the name of the bucket to create. Bucket names must follow the AWS naming conventions described at this link:
https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html#bucketnamingrules
CreateBucketConfiguration: the configuration information of the bucket.
LocationConstraint: specifies the Region where the bucket will be created. If you don’t specify a Region, the bucket is created in the US East (N. Virginia) Region (us-east-1).
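
The parameters above can be assembled and checked before calling AWS. The following is a rough sketch under stated assumptions: is_valid_bucket_name covers only a subset of the naming rules in the linked page, and both function names are hypothetical helpers, not part of Boto3.

```python
import re

# Lowercase letters, digits, dots and hyphens; 3 to 63 characters;
# first and last character must be a letter or a digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name):
    """Check a subset of the S3 bucket naming rules."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name or _IP_RE.fullmatch(name):
        return False
    return not (name.startswith("xn--") or name.endswith("-s3alias"))


def build_create_bucket_kwargs(bucket_name, region="us-east-1", acl="private"):
    """Assemble keyword arguments for client.create_bucket."""
    if not is_valid_bucket_name(bucket_name):
        raise ValueError(f"invalid bucket name: {bucket_name!r}")
    kwargs = {"Bucket": bucket_name, "ACL": acl}
    # us-east-1 is the default Region and is selected by leaving the
    # LocationConstraint out, so it is only added for other Regions.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(build_create_bucket_kwargs("s3appdvversion2", region="eu-west-2"))

# Typical use (not executed here):
# client.create_bucket(**build_create_bucket_kwargs("s3appdvversion2", "eu-west-2"))
```

Validating the name locally gives a clearer error message than waiting for the service to reject the request.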

The output of creating the bucket is the JSON response, which includes the Location of the new bucket.

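# Upload files to our bucket
for file in os.listdir(directory):
    file_path = os.path.join(directory, file)
    # upload only regular files; sub-folders cannot be passed to upload_file
    if os.path.isfile(file_path):
        client.upload_file(file_path, 's3appdvversion2', file)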

In this block we loop over the files in the directory we defined earlier. The upload_file method takes three inputs (file to be uploaded, bucket name, key):
– The path of the file to be uploaded (directory and file name)
– The bucket name
– The key, which is the name of the object in the S3 bucket; in our example we upload each file under the same name it has on our local file system
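
The three inputs above can be prepared ahead of the upload so that you can inspect which key each file will get. This is a minimal sketch: plan_uploads and upload_directory are hypothetical helper names, and the optional prefix argument is an addition for illustration, not something the application above uses.

```python
from pathlib import Path


def plan_uploads(directory, prefix=""):
    """Return sorted (local_path, key) pairs for the regular files in `directory`."""
    pairs = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # skip sub-folders, which upload_file cannot send
            pairs.append((str(path), f"{prefix}{path.name}"))
    return pairs


def upload_directory(client, directory, bucket_name, prefix=""):
    """Upload each planned file and return the list of keys written."""
    keys = []
    for local_path, key in plan_uploads(directory, prefix):
        # Same positional order as in the text: file, bucket name, key.
        client.upload_file(local_path, bucket_name, key)
        keys.append(key)
    return keys


# Typical use (not executed here):
# client = boto3.client("s3")
# upload_directory(client, "C:\\appdata", "s3appdvversion2")
```

Keeping the planning step separate from the upload makes it possible to print or review the key names before any data is sent.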

Now if we check the S3 console in the AWS Management Console, we can see that the 6 files have been uploaded successfully.

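# List the content of the bucket
my_bucket = s3.Bucket('s3appdvversion2')  # defined again in case the optional blocks were removed
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)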

This last block of code lists the files that now exist in our bucket.
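
The listing can also be used to verify the migration by comparing the keys in the bucket with the local file names. The sketch below assumes the low-level client and keys equal to the local file names; list_keys and missing_from_bucket are hypothetical helper names.

```python
def list_keys(client, bucket_name):
    """Collect every object key in `bucket_name`, following pagination."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages of an empty bucket carry no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def missing_from_bucket(local_names, bucket_keys):
    """Return the local file names that have no matching key in the bucket."""
    return sorted(set(local_names) - set(bucket_keys))


# Typical use (not executed here):
# client = boto3.client("s3")
# local = [f for f in os.listdir(directory)
#          if os.path.isfile(os.path.join(directory, f))]
# print(missing_from_bucket(local, list_keys(client, "s3appdvversion2")))
```

An empty list means every local file name was found in the bucket; it does not compare file contents.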

We have now walked through the basic actions on an S3 bucket using the Python SDK. You can reuse the basic blocks of this application in your own applications to perform more actions and apply the business logic you need on S3 buckets.

Ahmed Ibrahem

Ahmed Ibrahem works as a Data Engineering Team Lead. He has wide experience in data management projects and technical implementation using different technologies, and in delivering end-to-end projects, from business analysis through Data Warehouse modeling and implementation to BI design, for customers in different industries.
