Trigger data transfer between AWS S3 and Google Cloud Storage (in python)

Beranger Natanelic
5 min read · Aug 18, 2022


Receive live data from a remote Amazon S3 bucket

Aloha!

You have a Google Cloud Platform environment up and running, but a new requirement pops up:

The files are sent to an Amazon S3 Bucket.

All you want to do is retrieve that data, send it to your GCP project and process it at home.

That’s what we do here.

Architecture

Architecture detailed in this tutorial

Plan

  • S3 Setup
  • GCP (Cloud Storage + IAM) Setup
  • Lambda Setup

S3 Setup

Go to the S3 bucket console: https://s3.console.aws.amazon.com/s3/buckets

Click “Create Bucket”

Choose a name and a region (eu-west-2 for me)

Select “ACLs enabled” and uncheck “Block all public access” ⇒ these settings depend on the way you receive objects on the bucket

Then, keep the default config as is and create the bucket.
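
If you prefer scripting this step, here is a minimal sketch using boto3; the bucket name is a placeholder and the settings roughly mirror the console choices above:

import boto3

# Placeholder name and region: adapt them to your case
BUCKET_NAME = "my-s3-source-bucket"
REGION = "eu-west-2"

s3 = boto3.client("s3", region_name=REGION)

s3.create_bucket(
    Bucket=BUCKET_NAME,
    # Outside us-east-1, the region must be passed as a location constraint
    CreateBucketConfiguration={"LocationConstraint": REGION},
    # Roughly equivalent to “ACLs enabled” in the console
    ObjectOwnership="BucketOwnerPreferred",
)

# Roughly equivalent to unchecking “Block all public access”
s3.put_public_access_block(
    Bucket=BUCKET_NAME,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)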

GCP Setup

IAM

To connect to the Google Cloud Storage bucket, we need a service account from Google Cloud Platform.

To create one, go to the IAM console, click “Create Service Account”, give it a name like lambda_gcs_service_account and grant it access to Cloud Storage ⇒ Storage Admin. Validate.

Find the new service account in the service account list, click the 3 dots on the right and click “Manage Keys”, then “Add Key” and choose the JSON format. The download starts.

Rename the file lambda_gcs_service_account.json and keep it warm for later.

Storage Bucket

Easy peasy, go to https://console.cloud.google.com/storage/browser, click “Create”, invent a name, choose a region and keep the default parameters (or not, up to you).
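
To check that the service account key and the new bucket play well together, a quick sanity check from your laptop could look like this (the bucket name is a placeholder; pip install google-cloud-storage first):

from google.cloud import storage

# Placeholder: the name of the Cloud Storage bucket you just created
GCS_BUCKET_NAME = "my-gcs-destination-bucket"

# Authenticate with the key downloaded in the IAM step
client = storage.Client.from_service_account_json("lambda_gcs_service_account.json")

# A brand new bucket simply prints an empty list
print([blob.name for blob in client.list_blobs(GCS_BUCKET_NAME, max_results=5)])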

Lambda Setup

tl;dr

  • Create the function with a trigger from the S3 bucket
  • Use an existing Lambda blueprint
  • Add a Layer with the Google Cloud Platform Python packages

Create the function with a trigger from an S3 bucket

Go to https://us-west-1.console.aws.amazon.com/lambda/home

At the top right corner, select the same region as the bucket you just created (eu-west-2 for me)

Click “Create Function”

Select “Use a blueprint”

In the search bar, type get-object and select the get-object-python blueprint (the only result)

Give your function a name and a role name, and select the bucket you just created to set up the trigger

In “Event type”, keep “All object create events”

Keep the suggested code as is, we will change it later

Click “Create function”

Test the trigger

That’s a good start, but before going further we need to check that the trigger works correctly.

Upload a file to your bucket (inside the bucket window, click “Upload”, “Add files” and then “Upload”) and see the event in the Lambda logs (inside the function window, click the “Monitor” tab and the “Logs” subtab).

You will see the last invocations; you can click the LogStream link on the right to see more details in CloudWatch.
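
For reference, the event received by the function looks roughly like this, trimmed to the fields we will use later (bucket name and object key are placeholders):

# Rough shape of the S3 “ObjectCreated” event passed to the Lambda handler
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-s3-source-bucket"},
                "object": {"key": "my-file.csv"},
            },
        }
    ]
}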

Function code (python)

Our function should:

  • Get the new object from the S3 bucket
  • Connect to the Google Cloud Storage bucket
  • Send the new object to that bucket

First things first, remember our service account key created in the GCP setup part?

Add the file (named lambda_gcs_service_account.json) to your Lambda function folder. The structure should look like this:
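
Roughly, the key sits right next to the handler file (lambda_function.py is the file created by the blueprint):

lambda_function.py
lambda_gcs_service_account.json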

Code!

Paste this code into the code tab
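
The snippet below is a minimal sketch of what the handler could look like; GCS_BUCKET_NAME is a placeholder for the Cloud Storage bucket created earlier, and the key file is assumed to sit next to the handler as shown above:

import json
import urllib.parse

import boto3
from google.cloud import storage

# Placeholder: replace with the name of your Cloud Storage bucket
GCS_BUCKET_NAME = "my-gcs-destination-bucket"

s3_client = boto3.client("s3")
# Authenticate to Google Cloud with the bundled service account key
gcs_client = storage.Client.from_service_account_json("lambda_gcs_service_account.json")
gcs_bucket = gcs_client.bucket(GCS_BUCKET_NAME)


def lambda_handler(event, context):
    # 1. Get the new object from the S3 event
    record = event["Records"][0]
    s3_bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"], encoding="utf-8")

    # Download the object to the Lambda temporary storage
    local_path = "/tmp/" + key.split("/")[-1]
    s3_client.download_file(s3_bucket, key, local_path)

    # 2. & 3. Connect to the Cloud Storage bucket and send the object
    blob = gcs_bucket.blob(key)
    blob.upload_from_filename(local_path)

    print("UPLOAD SUCCESSFUL")
    return {"statusCode": 200, "body": json.dumps(key + " sent to " + GCS_BUCKET_NAME)}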

I guess the code is straightforward

It’s time to test it

Upload a file into the Amazon S3 bucket

Check the execution logs ⇒ as seen before, click the LogStream link to see more details in CloudWatch.

An import error must appear: the module google doesn’t exist inside the function. And this is absolutely normal.

Unlike Google Cloud Functions, modules are not automatically installed when a Lambda function is deployed.

That’s why we need one last step:

Create a google Layer

A Layer is a bundle of Python packages that can be attached to a Lambda function.

To create a Layer, you need to follow a very precise process so the packages can be found at a specific path.

More details for other programming languages in the official documentation here.

For Python, the packages must be located at this path: python/lib/python3.9/site-packages

Yeah, pain.

But follow these steps and it’s all easy.

First, we create a Python virtual env, then install the package, then, from a specific location, we create the zip archive.

> python3 -m venv python
> source python/bin/activate
> pip install google-cloud-storage

IMPORTANT STEP: before deactivating the virtual env, check the Python version of your venv. You can change directory to python/lib/ and ls, you will see the Python version. My path is the following one:

/Users/beranger/Documents/medium/python/lib/python3.9/site-packages

It means (and this is the IMPORTANT PART) that my Lambda function must be deployed with the python3.9 runtime. Adapt to your case (you can change the runtime in the Code tab of the Lambda).

You can then deactivate the venv.

Zip the venv (cd .. to be outside the python directory; remember, the path inside the archive must start with python/…).

> zip -r9 lambda.zip .

If you followed that correctly, our main folder looks like this:
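
Roughly something like this (the parent folder name is just my example):

medium/
├── lambda.zip
└── python/
    └── lib/
        └── python3.9/
            └── site-packages/
                └── google/ …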

Congrats!

Add the google layer

Go to your Lambda function code, scroll down to “Layers”, click “Add a layer” then “Create a new layer”, upload the brand new .zip file, choose the correct runtime and create it.

Go back to your Lambda function code, scroll down to “Layers”, click “Add a layer”, check “Custom layers” and select the layer you just created.

Final test

Simply upload a file into the bucket.

Check the logs in CloudWatch.

You should see “UPLOAD SUCCESSFUL”.

And see your object in the Google Cloud Storage bucket.

If not, leave a comment to help me improve this article.

Hope this helps!

Adios hippos!
