Gustavo Almeida
Cloud Data Engineer
São Paulo, State of São Paulo, Brazil
8+ Years of Experience
Work Experience
Data Engineer
Hyqoo (formerly ClikSource)
Full Time | Jul 2022 - Present
Remote | United States
- Created a data lakehouse using Amazon S3 and the AWS Glue Data Catalog.
- Integrated data ingestion from several providers across source types, including REST APIs, GraphQL, CDC (via AWS DMS), PostgreSQL, and MongoDB.
- Designed pipeline templates for AWS Glue using AWS CloudFormation.
- Migrated pipelines from SQL to PySpark.
- Migrated IaC from CloudFormation to Terraform.
- Built full pipelines in AWS Glue (Workflows, Triggers, Crawlers, and Jobs) that pull data from sources such as MongoDB, MySQL, and PostgreSQL and populate a Glue Data Catalog for querying through Redshift Spectrum and Athena, or load directly into Redshift storage.
- Built streaming data ingestion pipelines with Amazon Kinesis and Spark Structured Streaming.
- Built near-real-time pipelines with Amazon MSK and Spark Structured Streaming (see the sketch after this list).
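A minimal sketch of this kind of near-real-time ingestion: a Spark Structured Streaming job reading from an Amazon MSK (Kafka) topic and landing micro-batches on S3 as Parquet. The broker address, topic, and S3 paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("msk-ingestion").getOrCreate()

# Read the topic as an unbounded stream; Kafka exposes key/value as binary.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "b-1.example-msk.amazonaws.com:9092")  # hypothetical broker
    .option("subscribe", "orders")                                            # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

# Land micro-batches on S3; the checkpoint makes the stream restartable.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-lake/bronze/orders/")                   # hypothetical bucket
    .option("checkpointLocation", "s3://example-lake/checkpoints/orders/")
    .trigger(processingTime="1 minute")                                   # micro-batch cadence
    .start()
)
query.awaitTermination()
```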
Data Engineer
number8
Full Time | Jan 2022 - Feb 2023
Brazil
- Created a data lakehouse using Amazon S3 and the AWS Glue Data Catalog.
- Integrated data ingestion from several providers across source types, including REST APIs, GraphQL, and CDC.
- Designed pipeline templates for AWS Glue using AWS CloudFormation.
- Migrated pipelines from SQL to PySpark.
- Developed the lakehouse architecture with the AWS Glue Data Catalog.
- Orchestrated pipelines with AWS Glue Jobs, Triggers, and Crawlers through Workflows (see the sketch after this list).
- Ingested data from REST API, GraphQL, MongoDB, and AWS DMS sources.
- Provisioned resources with AWS CloudFormation.
- Made data available in Amazon Redshift.
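A minimal sketch of driving such a Glue Workflow from Python with boto3; the workflow name is a hypothetical placeholder.

```python
import time

import boto3

glue = boto3.client("glue")

# Kick off the workflow; Glue then fires its triggers, crawlers, and jobs in order.
run_id = glue.start_workflow_run(Name="lakehouse-ingestion")["RunId"]  # hypothetical name

# Poll until the run reaches a terminal state.
while True:
    run = glue.get_workflow_run(Name="lakehouse-ingestion", RunId=run_id)["Run"]
    if run["Status"] in ("COMPLETED", "STOPPED", "ERROR"):
        break
    time.sleep(30)

print(f"Workflow run {run_id} finished with status {run['Status']}")
```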
Data Engineer
Anbima
Full Time | Apr 2020 - Jan 2022
São Paulo, BR
- Created a data lakehouse using Amazon S3 and the AWS Glue Data Catalog.
- Worked on an on-premises Oracle Database 12c that held an OLTP model backing the company's training and certification application and an OLAP model providing insights and KPIs.
- Migrated both models to AWS using open-source and native AWS tools: the OLTP model moved to DynamoDB, since the application did not need a relational database and the change made it easier for developers to build new features, while the OLAP model moved to a single-node Redshift cluster.
- Migrated pipelines from on-premises servers and legacy GUI-based tools to AWS Glue Python and PySpark jobs, orchestrated with Triggers and Crawlers through a Workflow (see the sketch after this list).
- Modeled data to provide a self-service schema for business analysts, integrated with Power BI.
- Improved a critical daily pipeline from 8 hours to 20 minutes, handling around 60 GB of data.
- Conducted data journey workshops for business analysts and created pipeline templates for easier maintenance.
- Designed a CI/CD pipeline with AWS CodePipeline and CloudFormation.
- Integrated data ingestion from several providers across source types, including REST APIs, GraphQL, and CDC.
- Mined internal and external sources and joined disparate, non-normalized data sets.
- Integrated information from multiple data sources, solved common transformation problems and resolved data cleansing and quality issues.
- Utilized code and modern cloud-native deployment techniques to design, plan and integrate cloud computing and virtualization systems.
- Understood client needs and objectives through proactive customer and data analysis; researched, designed, and implemented scalable applications for data extraction, analysis, retrieval, and indexing.
- Conducted data modeling plus performance and integration testing; compiled, cleaned, and manipulated data for proper handling.
- Built pipelines using native cloud products (PaaS and SaaS).
- Architected and built complex data pipelines using leading-edge technologies.
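A minimal sketch of a Glue PySpark job in the shape this migration produced: read a table registered in the Glue Data Catalog, transform it with Spark, and write curated Parquet to S3 for Redshift Spectrum or Athena. Database, table, and bucket names are hypothetical; the awsglue module is only available inside the Glue runtime.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table as registered by a crawler in the Glue Data Catalog.
certifications = glue_context.create_dynamic_frame.from_catalog(
    database="training_db", table_name="certifications"  # hypothetical names
)

# Transformations run as plain Spark DataFrame operations.
active = certifications.toDF().filter("status = 'ACTIVE'")

# Curated layer on S3, partitioned for Redshift Spectrum / Athena queries.
active.write.mode("overwrite").partitionBy("year").parquet(
    "s3://example-lake/curated/certifications/"  # hypothetical bucket
)

job.commit()
```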
Data Engineer
Febrafar
Full Time | Mar 2018 - Mar 2020
São Paulo, BR
- Implemented a data-driven culture, migrating all Excel reports to Python and PySpark.
- Migrated Pentaho pipelines to PySpark, increasing performance by 80%.
- Created a data lake using Google Cloud Storage and Google BigQuery.
- Modeled data to provide a self-service schema for business analysts, integrated with Power BI.
- Created a pipeline to deliver 10k+ personalized reports to customers.
- Orchestrated pipelines with Airflow on Google Cloud Composer (see the sketch after this list).
- Generated reports, maintained dimensional and relational data structures, and managed the operational data store and data warehouse.
- Developed applications and designed processes for transformation and data management from company-wide databases.
- Built data cubes, data marts and queries, maintaining every aspect of storage and translation.
- Created data models and mapped content storage pathways to facilitate easy access.
- Selected methods and criteria for warehouse data evaluation procedures.
- Mapped data between source systems and warehouses and validated warehouse data structure and accuracy.
- Mined internal and external sources and joined disparate, non-normalized data sets.
- Integrated information from multiple data sources, solved common transformation problems and resolved data cleansing and quality issues.
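A minimal sketch of the kind of Cloud Composer (Airflow) orchestration described above: a daily DAG that runs the report transform and loads the result from Cloud Storage into BigQuery. Project, bucket, and dataset names are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator


def build_reports(**_):
    # Placeholder for the PySpark transform that replaced the old Excel reports.
    pass


with DAG(
    dag_id="daily_reports",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="build_reports", python_callable=build_reports)

    load = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="example-lake",  # hypothetical bucket
        source_objects=["reports/dt={{ ds }}/*.parquet"],
        destination_project_dataset_table="example-project.analytics.reports",  # hypothetical
        source_format="PARQUET",
        write_disposition="WRITE_TRUNCATE",
    )

    transform >> load
```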
Business Analytics Engineer
Roche
Full Time | Jul 2017 - Mar 2018
São Paulo, BR
- Created a data warehouse in SQL Server, using PySpark to pull data from Salesforce and enabling customer-journey KPIs.
- Developed KPIs and dashboards in Power BI, providing a full view of the organization's forecasting process and supporting strategic focus.
- Migrated VBA and Excel reports to Python.
- Designed and developed analytical data structures.
- Built databases and table structures following star schema methodology (see the sketch after this list).
- Explained data results clearly and discussed how they could be used to support project objectives.
- Selected methods and criteria for warehouse data evaluation procedures.
- Mapped data between source systems and warehouses.
- Validated warehouse data structure and accuracy.
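A minimal sketch of the star schema pattern named above, in PySpark: derive a customer dimension with surrogate keys from the Salesforce extract and re-key the fact table against it. Column, table, and path names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import monotonically_increasing_id

spark = SparkSession.builder.appName("star-schema").getOrCreate()

opportunities = spark.read.parquet("/data/salesforce/opportunities/")  # hypothetical path

# Dimension: one row per customer, each assigned a surrogate key.
dim_customer = (
    opportunities.select("customer_id", "customer_name", "segment")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_sk", monotonically_increasing_id())
)

# Fact: measures re-keyed on the surrogate key instead of the natural key.
fact_sales = (
    opportunities.join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
    .select("customer_sk", "opportunity_id", "stage", "amount", "close_date")
)

dim_customer.write.mode("overwrite").parquet("/warehouse/dim_customer/")
fact_sales.write.mode("overwrite").parquet("/warehouse/fact_sales/")
```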
Business Intelligence Analyst
Santander
Full Time | Aug 2014 - Jun 2017
São Paulo, BR
- Automated Excel reports, reducing errors and inconsistencies (see the sketch after this list).
- Improved the flow of information by developing a centralized pipeline using SQL Server triggers and stored procedures.
- Developed KPIs and dashboards for strategic focus.
- Optimized data sources and processing rules to enhance data quality through design and development phases.
- Determined data storage and optimization policies, shaping organization efforts to enhance performance.
- Compiled, cleaned and manipulated data for proper handling.
- Explained data results clearly and discussed how they could be used to support project objectives.
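A minimal sketch of the kind of Excel-report automation described above: pull the source data with one SQL query, aggregate it with pandas, and write the same workbook layout on every run. The connection string, query, and file names are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical SQL Server connection.
engine = create_engine(
    "mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+17+for+SQL+Server"
)

# One query replaces the manual export step that caused inconsistencies.
detail = pd.read_sql("SELECT branch, product, amount, booked_at FROM loans", engine)

summary = detail.groupby(["branch", "product"], as_index=False)["amount"].sum()

# Every run produces an identical workbook layout, eliminating copy/paste errors.
with pd.ExcelWriter("daily_report.xlsx") as writer:
    summary.to_excel(writer, sheet_name="Summary", index=False)
    detail.to_excel(writer, sheet_name="Detail", index=False)
```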
Education
MBA: Data Science
Universidade de São Paulo
Bachelor's Degree, Information Systems
FIAP
Certifications

Microsoft Certified: Data Analyst Associate