It’s time for our final chapter on data warehousing.

We have already discussed:

#1 ETL vs ELT & BigQuery choice

#2 Secure the Data access

Today we’ll discuss data governance and deployment!

Governance

A successful project starts with strong executive sponsorship in line with the organization’s business strategy.

Developing a data-driven culture is not easy, so we clearly defined new ways of working with the business teams, the product owners, and so on.

We decided to let business teams build their use cases on what we call the consumption layer, which sits on top of the transformed and prepared data, called the shared domain layer.

The most important element is the responsibility model: a Domain Product Owner is in charge of the complete life cycle, definition, access and evolution of their domain’s data.

Deployment

As for how we deploy this new platform with all its capabilities and guidance, here are the principles that the data platform team is adopting:

Below is the set of tools we chose, with industrialization at the heart of these choices. The best example is probably Terraform: Infrastructure as Code is a standard to adopt without hesitation, to ensure that everyone does the same work the same way everywhere.

Additions to this template should be proposed on GitHub so that the entire company’s technical team can reuse (and contribute to) the code.

In the end, from a purely technical point of view, we have something like this:

Again, don’t forget to use the ELT model: load the data first, then transform and aggregate it as much as possible in BigQuery using only SQL (again, a standard).
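
To make the "load first, then transform in SQL" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: it uses Python’s built-in sqlite3 as a local stand-in for BigQuery (whose SQL dialect differs in places), and every table, view and column name (raw_sales, shared_sales_by_product, consumption_sales_by_country) is hypothetical, not taken from our platform.

```python
import sqlite3

# Stand-in for the warehouse: an in-memory SQLite database (assumption; the
# real platform uses BigQuery, whose SQL dialect differs in places).
conn = sqlite3.connect(":memory:")

# 1. Load: raw rows land untouched (hypothetical table and columns).
conn.execute("CREATE TABLE raw_sales (country TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        ("FR", "shampoo", 10.0),
        ("FR", "shampoo", 5.0),
        ("FR", "lipstick", 20.0),
        ("US", "shampoo", 7.5),
    ],
)

# 2. Transform: plain SQL inside the warehouse builds a shared domain table.
conn.execute(
    """
    CREATE TABLE shared_sales_by_product AS
    SELECT country, product, SUM(amount) AS total_amount
    FROM raw_sales
    GROUP BY country, product
    """
)

# 3. Consume: a business team builds its use case as a view on the shared layer.
conn.execute(
    """
    CREATE VIEW consumption_sales_by_country AS
    SELECT country, SUM(total_amount) AS total_amount
    FROM shared_sales_by_product
    GROUP BY country
    """
)

rows = conn.execute(
    "SELECT country, total_amount FROM consumption_sales_by_country ORDER BY country"
).fetchall()
print(rows)
```

The point of the sketch is the ordering: nothing is reshaped before loading, and each layer is just SQL on top of the previous one, so the consumption view never touches the raw table directly.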

All the technical elements are shared here:

https://cloud.google.com/blog/products/serverless/loreal-combines-google-cloud-serverless-and-data-offerings

Any ideas or comments? Contact me :)
