Applied Data Scientist / Data Analyst (Remote)
Energy & Power Systems domain background | Python | PySpark | Data Analysis | ETL mindset
I'm building a portfolio focused on data-driven problem solving in the energy sector, combining analytics + machine learning + scalable processing with PySpark.
I'm currently interested in remote opportunities (USA & Canada) in Data Science / Data Analytics.
A PySpark + ML project using a public outage dataset (77k+ records) to:
- explore outage patterns
- classify outage event types with Random Forest
- predict outage duration with Linear Regression
- visualize results in a notebook workflow
Repo: OutagesUSA_1
https://github.com/dogeraldi05/OutagesUSA_1
A small collection of practical studies and utilities around:
- decorators
- retry strategies
- time measurement
- ETL-related patterns
Repo: etl-toolkit-test01
https://github.com/dogeraldi05/etl-toolkit-test01
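
As a rough illustration of the kind of patterns listed above (decorators, retry strategies, time measurement), here is a minimal sketch. It is not taken from the repository: the names `timed`, `retry`, `load_batch`, and the `last_elapsed` attribute are hypothetical, and the failing step is simulated.

```python
import functools
import time


def timed(func):
    """Store the wall-clock duration of the latest call on the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


def retry(max_attempts=3, delay=0.0, exceptions=(Exception,)):
    """Re-run the function up to max_attempts times on the listed exceptions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@timed
@retry(max_attempts=3, delay=0.0, exceptions=(ConnectionError,))
def load_batch():
    # Hypothetical ETL step that fails twice before succeeding
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "loaded"


result = load_batch()
print(result, calls["count"])
```

Stacking `timed` outside `retry` means the measured time covers all attempts, including any delays between them; swapping the order would time each attempt separately.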
- penguins_1: first explorations using the Penguins dataset (EDA practice).
- repo_1_crops: notebooks and early experiments (Jupyter).
- read: learning repository based on the Read the Docs tutorial template.
- github-slideshow: GitHub Learning Lab training repository.
- Python, PySpark, notebooks (Colab/Jupyter)
- Data analysis & visualization (EDA, metrics, charts)
- Basic ML workflows (classification/regression with Spark MLlib)
- ETL patterns and reliability practices (retry, timing, modular code)
- Remote roles in Data Analyst / Applied Data Scientist
- Strong interest in energy / utilities / climate / industrial analytics
- Teams that value domain knowledge + practical analytics
- LinkedIn: https://www.linkedin.com/in/douglas-geraldi/
- GitHub: https://github.com/dogeraldi05