dogeraldi05/README.md

Hi, I'm Douglas Geraldi 👋

🎯 Applied Data Scientist / Data Analyst (Remote)
⚡ Energy & Power Systems domain background | 🐍 Python | 🔥 PySpark | 📊 Data Analysis | 🧱 ETL mindset

I'm building a portfolio focused on data-driven problem solving in the energy sector, combining analytics + machine learning + scalable processing with PySpark.
I’m currently interested in remote opportunities (USA & Canada) in Data Science / Data Analytics.


🔥 Featured Projects

⚡ US Power Outages Analysis with PySpark (2023)

A PySpark + ML project using a public outage dataset (77k+ records) to:

  • explore outage patterns
  • classify outage event types with Random Forest
  • predict outage duration with Linear Regression
  • visualize results in a notebook workflow

➡️ Repo: OutagesUSA_1
🔗 https://github.com/dogeraldi05/OutagesUSA_1


🧰 ETL Toolkit (experiments)

A small collection of practical studies and utilities around:

  • decorators
  • retry strategies
  • time measurement
  • ETL-related patterns

➡️ Repo: etl-toolkit-test01
🔗 https://github.com/dogeraldi05/etl-toolkit-test01
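
To illustrate the kind of pattern this toolkit explores, here is a minimal sketch of a timing decorator combined with a retry decorator that uses exponential backoff. It is not taken from the repository: the names `timed`, `retry`, `extract_rows`, and the `last_elapsed` attribute are hypothetical and chosen only for this example.

```python
import functools
import time


def timed(func):
    """Store the duration of the most recent call in `wrapper.last_elapsed` (seconds)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even when the call raises.
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


def retry(max_attempts=3, base_delay=0.0, exceptions=(Exception,)):
    """Retry the wrapped function, doubling the delay after each failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


# Hypothetical flaky extract step: fails twice, then succeeds.
calls = {"count": 0}


@timed
@retry(max_attempts=3, base_delay=0.0, exceptions=(ConnectionError,))
def extract_rows():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return [{"id": 1}, {"id": 2}]


rows = extract_rows()
```

Stacking `timed` outside `retry` means the measured time covers all attempts, including backoff sleeps, which is usually what matters when tracking the total cost of an ETL step.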


📚 Learning / Practice Repos

  • penguins_1: first explorations using the Penguins dataset (EDA practice).
  • repo_1_crops: notebooks and early experiments (Jupyter).
  • read: learning repository based on the Read the Docs tutorial template.
  • github-slideshow: GitHub Learning Lab training repository.

🧩 Core Skills (growing focus)

  • Python, PySpark, notebooks (Colab/Jupyter)
  • Data analysis & visualization (EDA, metrics, charts)
  • Basic ML workflows (classification/regression with Spark MLlib)
  • ETL patterns and reliability practices (retry, timing, modular code)

🎯 What I’m looking for

✅ Remote roles as a Data Analyst / Applied Data Scientist
✅ Strong interest in energy / utilities / climate / industrial analytics
✅ Teams that value domain knowledge + practical analytics


📫 Let’s connect
