Felipe Dias de Souza

Jan 30, 2025

Azure Data Lakehouse Pipeline

Description:

Objectives:

Technologies Used:

The image illustrates the end-to-end data pipeline, starting from data ingestion using Azure Data Factory to a structured lakehouse architecture in Azure Data Lake Gen2. The transformation follows the Bronze, Silver, and Gold layer approach, optimizing data quality and analytics.

This approach ensures that raw data is collected in the Bronze layer, processed into structured tables in the Silver layer, and transformed into analytical models (Star Schema) in the Gold layer, stored in Delta Lake for efficient querying.
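The layer-by-layer flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the record fields (`order_id`, `customer`, `amount`) and the helpers `to_silver` and `to_gold` are hypothetical, and a simple per-customer total stands in for the Gold-layer analytical model. The actual pipeline runs on Azure services and Delta Lake, which are not shown here.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the Bronze layer (all strings).
bronze = [
    {"order_id": "1", "customer": " Ana ", "amount": "120.50"},
    {"order_id": "2", "customer": "Bruno", "amount": "not-a-number"},
    {"order_id": "3", "customer": "ana", "amount": "79.50"},
    {"order_id": "1", "customer": " Ana ", "amount": "120.50"},  # duplicate
]


def to_silver(rows):
    """Clean and type raw rows: skip duplicates and unparseable amounts."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # malformed rows remain available in Bronze only
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate cleaned rows into an analytics-ready total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the raw rows untouched in `bronze` means a cleaning rule can be changed later and the Silver and Gold outputs rebuilt from the original data.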

Repository

Jan 05, 2025

Data Engineering Pipeline

Description:

Objectives:

Key Steps:

Ingestion: Data is ingested from on-premise sources into Azure Data Lake via Azure Data Factory.
Transformation: Data flows through the Bronze (raw), Silver (cleaned), and Gold (aggregated) layers using Azure Databricks for processing.
Analytics: Transformed data is loaded into Azure Synapse Analytics and visualized in Power BI dashboards.
Security and Governance: The architecture ensures compliance and security using Azure Active Directory and Azure Key Vault for identity management and sensitive data protection.

This architecture highlights the core principles of modern data engineering—data lakehouse integration, scalable ETL/ELT workflows, and secure access control.

Repository

Aug 03, 2022

Marketing Campaign

Description:

Objective:

Implications:

This plot shows the distribution of the indicators, such as Amount Collected, Unit Sold, Monthly Target, and Client Type, helping identify which indicators have the strongest correlation with CPI-U.

This bar plot visualizes the return on investment (ROI) by variable and account type. Each bar represents a specific variable, and the height of the bar indicates the average return on investment in dollars ($) associated with that variable. The bars are grouped by account type, with different colors representing different types of accounts.
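The bar heights in a chart like this are group averages. A minimal sketch of that aggregation follows, assuming made-up records and a hypothetical helper `mean_roi_by_group`; the account type labels and ROI values are invented for illustration and are not taken from the project data.

```python
from collections import defaultdict

# Hypothetical records: (variable, account_type, roi_in_dollars).
records = [
    ("Amount Collected", "Small", 100.0),
    ("Amount Collected", "Small", 300.0),
    ("Amount Collected", "Large", 500.0),
    ("Unit Sold", "Small", 50.0),
    ("Unit Sold", "Large", 150.0),
    ("Unit Sold", "Large", 250.0),
]


def mean_roi_by_group(rows):
    """Return the average ROI for each (variable, account_type) pair."""
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per group
    for variable, account_type, roi in rows:
        acc = sums[(variable, account_type)]
        acc[0] += roi
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# One entry per bar: the variable picks the bar position, the account type its color.
bar_heights = mean_roi_by_group(records)
```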

Repository

Aug 03, 2022

Economic Analysis Consumer Price Index - USA

Description:

Objective:

Implications:

The heatmap above shows the correlation matrix between CPI-U and the economic indicators, helping identify which indicators correlate most strongly with CPI-U.
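Each cell of such a correlation matrix is a pairwise correlation coefficient. The sketch below assumes Pearson correlation (the source does not say which coefficient was used) and uses invented series; `indicator_a` and `indicator_b` are placeholder names, not the real indicators.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only.
data = {
    "cpi_u": [1.0, 2.0, 3.0, 4.0],
    "indicator_a": [2.0, 4.0, 6.0, 8.0],  # moves with cpi_u
    "indicator_b": [4.0, 3.0, 2.0, 1.0],  # moves against cpi_u
}
matrix = correlation_matrix(data)
```

Reading the `cpi_u` row of `matrix` gives the same information as the CPI-U row of the heatmap: values near 1 or -1 mark the indicators most strongly related to it.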

The plot above shows the historical trend of CPI-U alongside each economic indicator (Unemployment Rate, Labor Force Participation Rate, Treasury and Agency Securities, All Commercial Banks Data) over time.

The feature importance plot above ranks the economic indicators by how much they contribute to predicting CPI-U.

Repository

April 25, 2021

Steel Plate Defect

Problem Statement:

Context:

Objective:

Analysis:

The plot above visualizes the ROC AUC scores of the classification models.
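For reference, ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way for the binary case, with made-up labels and scores; the project itself may be multiclass and may have used a library implementation.

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the models can be compared on a single axis.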

The plot above shows the classification report, with the main classification metrics: precision, recall, and F1-score.
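As a reminder of what these three metrics measure, here is a minimal sketch for a single positive class. The labels are invented and `classification_metrics` is a hypothetical helper, not the function used to produce the report.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = defect, 0 = no defect.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision penalizes false alarms, recall penalizes missed defects, and F1 is their harmonic mean, so it is high only when both are high.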

April 10, 2021

Weather Analysis

Problem Statement:

Context:

Objective:

Analysis:

The plot above visualizes air temperature trends throughout 2023. The blue line represents the daily temperature variations, while the red dashed line indicates the mean temperature for the year. The plot makes temperature fluctuations over time easy to compare against the yearly average.

The plot above compares the performance of four regression models: Linear Regression, Ridge Regression, Lasso Regression, and Random Forest Regression. Each boxplot represents the distribution of Root Mean Squared Error (RMSE) scores obtained through cross-validation for a specific model. The lower the RMSE, the better the model's predictive performance. This comparison supports selecting the regression model with the best predictive accuracy for the dataset.
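The scores behind each boxplot come from fitting a model on part of the data and measuring RMSE on the held-out part, once per fold. The sketch below shows that procedure under simplifying assumptions: synthetic data, a hand-rolled interleaved fold split, and two stand-in models (a one-feature least-squares line and a mean-only baseline) instead of the four models named above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_mean(xs, ys):
    """Baseline that always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_val_rmse(fit, xs, ys, k=3):
    """Return one RMSE score per fold, using every k-th sample as the test set."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [model(xs[i]) for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], preds))
    return scores


# Synthetic data following y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
line_scores = cross_val_rmse(fit_line, xs, ys)
mean_scores = cross_val_rmse(fit_mean, xs, ys)
```

Each list of scores corresponds to one boxplot; here the line recovers the synthetic trend almost exactly, while the mean-only baseline does not.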

The plot above compares the target temperatures with the model temperatures over time. Each point represents a specific date, with the target temperatures indicated by one set of points and the model temperatures by another. The plot shows how closely the predicted temperatures track the observed ones, which helps assess the model's accuracy across different dates.

Repository

March 15, 2021

Segment Shopping Customer

Problem Statement:

Context:

Objective:

Analysis:

This plot visualizes the bivariate clustering of annual income and spending score. The black stars represent the cluster centers obtained from the clustering algorithm. Each point on the scatter plot represents a data point, with the x-axis indicating annual income and the y-axis indicating spending score. The plot provides insights into the relationships and patterns present in the data regarding spending behavior and income levels.
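Cluster centers like the ones marked on this plot can be obtained by alternating between assigning points to their nearest center and moving each center to the mean of its points. The sketch below assumes k-means, which the text does not confirm, and uses invented (annual income, spending score) pairs with fixed starting centers so the result is reproducible.

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to a center, then recompute the centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]  # keep an empty cluster's center in place
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical (annual income, spending score) pairs.
points = [(15, 80), (17, 76), (16, 78), (85, 20), (88, 15), (82, 25)]
centers, labels = kmeans(points, centers=[points[0], points[3]])
```

The returned `centers` play the role of the black stars, and `labels` determines the color of each point.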

Repository
