Google Cloud Serverless technology never sleeps

Antoine Castex
Google Cloud - Community
4 min read · May 16, 2022

Looking for the most efficient way to deploy your application is a good approach.

Your company will be happy to have a cost-efficient platform, but the story is not over!

Here is my story, which started 4 years ago:

The goal was initially to measure and report BigQuery usage in the company, written every day to a table with the number of rows per table, the size of each table, and so on.

After looking at different ways to achieve this, I discovered an open source solution:

Thanks to the DoIt team, the code was pretty easy to deploy on App Engine Flexible.

Initially we had hundreds of tables, which meant a quick run every day at a very low cost and with zero maintenance.

But because adoption of the platform has been good, here we are now with millions of tables and datasets.

That results in the following situation:

  • The scan now takes 23 hours per day
  • Scan and export now cost close to $3,000 per month

After troubleshooting the problem, we did not find any issue with the solution itself.

Now it’s time to find another way to do that, without reinventing the wheel.

Our idea is to leverage the new generation of Serverless tools that were not available 4 years ago: Cloud Run!

The best in class for Scalability, Efficiency and Agility.

We took the code from the previous solution and deployed this:

Cloud Tasks manages the calls to Cloud Run for each project, each dataset and each table, so that as many Cloud Run instances as possible work in parallel.
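The fan-out described above can be sketched in plain Python. This is a minimal sketch, not the deployed code: the worker URL, the shape of `inventory` and the `build_table_tasks` helper are hypothetical, and no Google client library is called. It also flattens the hierarchy into one task per table, which is a simplification. The point is only that each table becomes an independent HTTP task that a queue such as Cloud Tasks could deliver to Cloud Run.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical Cloud Run endpoint; replace with the URL of your own service.
WORKER_URL = "https://bq-scanner-example.a.run.app/scan-table"


def build_table_tasks(inventory: Dict[str, Dict[str, List[str]]]) -> Iterator[dict]:
    """Yield one HTTP task description per table.

    `inventory` maps project -> dataset -> list of table names (assumed shape).
    """
    for project, datasets in inventory.items():
        for dataset, tables in datasets.items():
            for table in tables:
                payload = {"project": project, "dataset": dataset, "table": table}
                yield {
                    "http_request": {
                        "http_method": "POST",
                        "url": WORKER_URL,
                        "headers": {"Content-Type": "application/json"},
                        "body": json.dumps(payload).encode("utf-8"),
                    }
                }


# Illustrative data only, not real project names.
example_inventory = {
    "project-a": {"sales": ["orders", "customers"], "logs": ["requests"]},
    "project-b": {"finance": ["invoices"]},
}

tasks = list(build_table_tasks(example_inventory))
print(f"{len(tasks)} tasks to enqueue")
```

Because every task carries everything the worker needs, the queue is free to dispatch them in any order and to as many containers as the service is allowed to start.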

The result:

  • 2 hours per run
  • $800 of monthly cost (1 run costs $26) + $260 for BigQuery inserts and Cloud Logging
  • 713 Cloud Run containers used during the run

Wow! Serverless 2.0 is here, and this wave is incredibly powerful. The lessons are clear:

  • Don’t fall asleep after deploying a solution just because you are convinced that the problem is solved for good
  • Spending a little time refactoring an app now and then is a very good approach

Keep in mind that Serverless v1 was running on GAE Flex containers (up and running 100% of the month, scaling from a few containers to more but never down to 0), while Serverless v2 uses 7xx containers per daily run, launched on demand and shut down automatically after each run.

Because we wanted to measure the efficiency of this solution, we ran one last test:

Copy the Python code used in Cloud Run and deploy it on a classic VM, calling the API table by table like in the old days.

The VM used is an n2-standard-16, and of course this setup offers no scalability and no parallelization of API calls.

Why not the cheapest VM? Because we were afraid that, if the job took several hours, the data held in memory would literally explode.
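To give a feel for the gap between calling an API table by table and spreading the same calls over many workers, here is a toy simulation. It is a sketch under assumptions: `fetch_table_metadata` is a hypothetical stand-in that only sleeps, the row counts are fake, and threads stand in for Cloud Run containers. It does not reproduce the real job or its timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_table_metadata(table: str) -> dict:
    """Stand-in for one API call; the sleep simulates network latency."""
    time.sleep(0.05)
    return {"table": table, "rows": len(table)}  # fake row count


def scan_sequential(tables: List[str]) -> List[dict]:
    """Call the API one table at a time."""
    return [fetch_table_metadata(t) for t in tables]


def scan_parallel(tables: List[str], workers: int = 10) -> List[dict]:
    """Spread the same calls over several workers; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_table_metadata, tables))


tables = [f"table_{i}" for i in range(20)]

start = time.perf_counter()
sequential = scan_sequential(tables)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
parallel = scan_parallel(tables)
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both functions return the same data; only the waiting time changes, and the sequential time grows linearly with the number of tables.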

The result is crazy:

1 full run takes 7 days!

Now imagine how many VMs are needed to do that in less than 24 hours:

  • 7 VM workers to do the job in less than 24 hours, and more soon if the number of tables to scan keeps growing
  • $4,000 monthly cost for 7 VMs + $420 for BigQuery streaming inserts and Cloud Logging = $4,420
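The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch using the rounded numbers quoted in this article. The 30-day month and the assumption that the work splits evenly across VMs are mine, not measured values.

```python
import math

# Figures quoted in this article (USD, rounded).
monthly_cost = {
    "App Engine Flexible (v1)": 3000,        # "close to" 3,000 per month
    "Cloud Run (v2)": 800 + 260,             # runs + BigQuery inserts and logging
    "VMs (7 x n2-standard-16)": 4000 + 420,  # VMs + BigQuery inserts and logging
}

# A single VM needs 7 days for one full run, so a daily run needs this many
# VMs, assuming the work splits evenly between them.
full_run_hours = 7 * 24
vms_needed = math.ceil(full_run_hours / 24)

# Cross-check of the Cloud Run figure: about 26 USD per daily run.
cloud_run_runs_cost = 26 * 30  # assumes a 30-day month

for scenario, cost in monthly_cost.items():
    print(f"{scenario}: {cost} USD/month")
print(f"VMs needed for a daily run: {vms_needed}")
print(f"Cloud Run runs at 26 USD/day: {cloud_run_runs_cost} USD/month")
```

At 26 USD per run, a 30-day month comes to 780 USD, which is consistent with the roughly 800 USD quoted for Cloud Run.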

Now we can see how powerful and efficient Serverless 2.0 is.

Lastly, here is the carbon footprint of the 3 scenarios, to compare alongside the money spent:

The numbers speak for themselves. Now you know what you have to do.

To summarize

I spent a couple of days switching from Serverless v1 to v2, and obtained:

A 59% reduction in monthly cost

A 25% reduction in monthly carbon footprint.
