Gustavo Almeida

Cloud Data Engineer

São Paulo, State of São Paulo, Brazil

8+ Years of Experience

Summary

Gustavo Almeida is a dedicated Cloud Data Engineer with more than eight years of experience designing, developing, and implementing data solutions at large enterprises such as Roche and Santander Bank. At ANBIMA, Gustavo transformed the data analytics area by bringing together disparate data sets and applying transformations and data validation to produce usable data. He has architected and built complex data pipelines and conducted data modeling, performance, and integration testing. He applies clean code and modern cloud-native deployment techniques to design and integrate cloud computing and virtualization systems. Gustavo has built data cubes, data marts, and queries, and has maintained all aspects of storage and translation. He has the skills and best-practice knowledge to assist stakeholders with their most challenging data needs.

Technical Skills

SQL
ETL
Python
Power BI
PySpark
Scrum
AWS
Airflow
GCP
CI/CD
S3
Data Modeling
GraphQL
REST API
Microsoft Excel
Pentaho

Work Experience

Data Engineer

Hyqoo (formerly ClikSource)

Full Time | Jul 2022 - Present

Remote | United States

  • Created a Data Lakehouse using S3 and the Glue Data Catalog.
  • Integrated data ingestion from several providers across sources such as REST APIs, GraphQL, CDC, DMS, PostgreSQL, and MongoDB.
  • Designed pipeline templates for AWS Glue using AWS CloudFormation.
  • Migrated pipelines from SQL to PySpark.
  • Migrated IaC from CloudFormation to Terraform.
  • Ran full pipelines in AWS Glue, with Workflows, Triggers, Crawlers, and Jobs pulling data from sources such as MongoDB, MySQL, and PostgreSQL and populating a Glue Data Catalog queried through Redshift Spectrum and Athena or directly in Redshift storage.
  • Built streaming data ingestion pipelines with AWS Kinesis and Spark Structured Streaming.
  • Built near-real-time pipelines with AWS MSK and Spark Structured Streaming (see the sketch after this list).
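
A minimal sketch of what such a near-real-time pipeline can look like: the PySpark job below reads an MSK (Kafka) topic with Spark Structured Streaming and lands the records in S3 as Parquet. The broker address, topic, and S3 paths are hypothetical placeholders rather than project details, and the job assumes the spark-sql-kafka connector is on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Hypothetical broker, topic, and bucket names, for illustration only.
    BOOTSTRAP = "b-1.example-msk.amazonaws.com:9092"
    TOPIC = "orders"
    OUTPUT = "s3://example-bucket/lake/orders/"
    CHECKPOINT = "s3://example-bucket/checkpoints/orders/"

    spark = SparkSession.builder.appName("msk-to-s3").getOrCreate()

    # Read the Kafka topic as an unbounded streaming DataFrame.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", BOOTSTRAP)
        .option("subscribe", TOPIC)
        .option("startingOffsets", "latest")
        .load()
        .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
    )

    # Micro-batch the stream into Parquet files on S3; the checkpoint
    # location lets the job resume from the last committed offsets.
    query = (
        events.writeStream.format("parquet")
        .option("path", OUTPUT)
        .option("checkpointLocation", CHECKPOINT)
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()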

Data Engineer

number8

Full Time | Jan 2022 - Feb 2023

Brazil

  • Created a Data Lakehouse using S3 and the AWS Glue Data Catalog.
  • Integrated data ingestion from several providers across sources such as REST APIs, GraphQL, CDC, MongoDB, and AWS DMS.
  • Designed pipeline templates for AWS Glue, provisioning resources with AWS CloudFormation (a sketch of such a Glue job follows this list).
  • Migrated pipelines from SQL to PySpark.
  • Orchestrated pipelines with AWS Glue Jobs, Triggers, and Crawlers in a Workflow.
  • Made curated data available in AWS Redshift.
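
As a minimal sketch of the kind of Glue job those templates would provision, the script below reads a table registered in the Glue Data Catalog and writes it to S3 as Parquet. The database, table, and bucket names are hypothetical, not taken from the projects above.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Job parameters are injected by the provisioning template.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Hypothetical catalog database/table, e.g. registered by a crawler.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_zone", table_name="customers"
    )

    # Write the frame to the curated zone as Parquet.
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/customers/"},
        format="parquet",
    )
    job.commit()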

Data Engineer

ANBIMA

Full Time | Apr 2020 - Jan 2022

São Paulo, BR

  • Created a Data Lakehouse using S3 and the Glue Data Catalog.
  • Worked on an Oracle Database 12c hosted on an on-premises server, with an OLTP model backing the company's training and certification application and an OLAP model providing insights and KPIs.
  • Migrated both models to AWS using open-source and AWS tools: the OLTP model moved to DynamoDB, since the application did not require a relational database and the change made it easier for developers to build new features, while the OLAP model was migrated to a single-node Redshift cluster.
  • Migrated pipelines from on-premises servers and legacy graphical-interface tools to AWS Glue with Python and PySpark jobs, orchestrated with Triggers and Crawlers through a Workflow.
  • Modeled data to provide a self-service schema for business analysts, integrated with Power BI.
  • Improved a critical daily pipeline from 8 hours to 20 minutes while handling around 60 GB of data (see the sketch after this list).
  • Conducted data-journey workshops for business analysts and created pipeline templates for easy maintenance.
  • Designed a CI/CD pipeline with AWS CodePipeline and CloudFormation.
  • Integrated data ingestion from several providers across sources such as REST APIs, GraphQL, and CDC.
  • Mined internal and external sources and joined disparate, non-normalized data sets.
  • Integrated information from multiple data sources, solved common transformation problems, and resolved data cleansing and quality issues.
  • Utilized clean code and modern cloud-native deployment techniques to design, plan, and integrate cloud computing and virtualization systems.
  • Understood client needs and objectives through proactive customer and data analysis; researched, designed, and implemented scalable applications for data extraction, analysis, retrieval, and indexing.
  • Conducted data modeling, performance, and integration testing; compiled, cleaned, and manipulated data for proper handling.
  • Built pipelines using native cloud products (PaaS and SaaS).
  • Architected and built complex data pipelines using leading-edge technologies.
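
The résumé does not spell out how the 8-hour run became a 20-minute run; a common driver of that kind of gain is replacing full scans of row-oriented extracts with partitioned, columnar storage. The PySpark sketch below illustrates the pattern under that assumption; the paths and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily-pipeline").getOrCreate()

    # One-time backfill: convert the raw extract to Parquet, partitioned
    # by ingestion date so each daily run touches a single partition.
    raw = spark.read.csv(
        "s3://example-bucket/raw/transactions/", header=True, inferSchema=True
    )
    (
        raw.repartition("ingestion_date")
        .write.mode("overwrite")
        .partitionBy("ingestion_date")
        .parquet("s3://example-bucket/lake/transactions/")
    )

    # Daily run: the partition filter is pushed down, so Spark reads only
    # the current day's files instead of scanning the full history.
    daily = spark.read.parquet("s3://example-bucket/lake/transactions/").where(
        "ingestion_date = current_date()"
    )
    daily.groupBy("product_id").count().write.mode("overwrite").parquet(
        "s3://example-bucket/marts/daily_product_counts/"
    )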

Data Engineer

Febrafar

Full Time | Mar 2018 - Mar 2020

São Paulo, BR

  • Implemented a data-driven culture, migrating all Excel reports to Python and PySpark.
  • Migrated Pentaho pipelines to PySpark, increasing performance by 80%.
  • Created a Data Lake using Google Cloud Storage and Google BigQuery.
  • Modeled data to provide a self-service schema for business analysts, integrated with Power BI.
  • Created a pipeline to deliver 10k+ personalized reports to customers.
  • Orchestrated pipelines with Airflow (Google Cloud Composer); a DAG sketch follows this list.
  • Generated reports, maintained dimensional and relational data structures, and managed the operational data store and data warehouse.
  • Developed applications and designed processes for transformation and data management across company-wide databases.
  • Built data cubes, data marts, and queries, maintaining every aspect of storage and translation.
  • Created data models and mapped content storage pathways to facilitate easy access.
  • Selected methods and criteria for warehouse data evaluation procedures.
  • Mapped data between source systems and warehouses and validated warehouse data structure and accuracy.
  • Mined internal and external sources and joined disparate, non-normalized data sets.
  • Integrated information from multiple data sources, solved common transformation problems, and resolved data cleansing and quality issues.
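
A minimal Cloud Composer (Airflow) sketch of that orchestration: an extract task feeding a report task. The DAG id, schedule, and task bodies are hypothetical placeholders, not details from the project.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical callables standing in for the real extract/report steps.
    def extract_sales():
        pass  # e.g. load source extracts into BigQuery staging tables

    def build_reports():
        pass  # e.g. render and deliver the personalized reports

    with DAG(
        dag_id="daily_reports",
        start_date=datetime(2019, 1, 1),
        schedule_interval="0 6 * * *",  # every day at 06:00
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract_sales", python_callable=extract_sales)
        report = PythonOperator(task_id="build_reports", python_callable=build_reports)
        extract >> report  # reports build only after extraction succeeds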

Business Analytics Engineer

Roche

Full Time | Jul 2017 - Mar 2018

São Paulo, BR

  • Created a data warehouse in SQL Server and PySpark to pull data from Salesforce, enabling customer-journey KPIs.
  • Developed KPIs and dashboards in Power BI, giving the organization a full view of the forecast process and strategic focus.
  • Migrated VBA and Excel reports to Python.
  • Designed and developed analytical data structures.
  • Built databases and table structures following the star-schema architecture (see the sketch after this list).
  • Explained data results clearly and discussed how they could be utilized to support project objectives.
  • Selected methods and criteria for warehouse data evaluation procedures.
  • Mapped data between source systems and warehouses.
  • Validated warehouse data structure and accuracy.
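
A minimal PySpark sketch of the star-schema pattern named above: the fact table holds foreign keys and additive measures, and the small dimension tables are joined in at query time. All table, path, and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("star-schema").getOrCreate()

    # Dimension tables: one row per customer/product, keyed by surrogate keys.
    dim_customer = spark.read.parquet("/data/dw/dim_customer/")
    dim_product = spark.read.parquet("/data/dw/dim_product/")

    # Fact table: foreign keys plus additive measures only.
    fact_sales = spark.read.parquet("/data/dw/fact_sales/")

    # A typical KPI query: revenue by customer segment and product line.
    kpi = (
        fact_sales.join(dim_customer, "customer_key")
        .join(dim_product, "product_key")
        .groupBy("customer_segment", "product_line")
        .sum("revenue")
    )
    kpi.show()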

Business Intelligence Analyst

Santander

Full Time | Aug 2014 - Jun 2017

São Paulo, BR

  • Automated Excel reports, reducing errors and inconsistencies.
  • Improved the flow of information by developing a centralized pipeline using SQL Server triggers and stored procedures.
  • Developed KPIs and dashboards for strategic focus.
  • Optimized data sources and processing rules to enhance data quality through the design and development phases.
  • Determined data storage and optimization policies, shaping the organization's efforts to enhance performance.
  • Compiled, cleaned, and manipulated data for proper handling.
  • Explained data results clearly and discussed how they could be utilized to support project objectives.

Education

MBA: Data Science

Universidade de São Paulo

Bachelor's Degree, Information Systems

FIAP

Certifications

Microsoft Certified: Data Analyst Associate
