Data Engineer · ML / AI

I turn messy operational data
into systems people trust.

I build production-grade data pipelines — from raw operational documents to tested, queryable warehouses and the dashboards that make them useful. Strong on data engineering, with applied machine learning and AI.

Selected work

Projects

Each one is self-contained and reproducible — clone, run one command, see it work.

Interactive production & pipeline dashboard for the manufacturing ELT project

Manufacturing PDF → Warehouse

Medallion ELT · Dagster + dbt + DuckDB · Kimball star schema

A production-grade ELT pipeline that turns a plant’s daily PDF production reports into a tested, queryable analytics warehouse, organized as a bronze / silver / gold medallion with a Kimball star schema and surfaced in a self-contained, interactive dashboard. It includes append-only lineage, a quarantine for bad rows, source-freshness and relationship tests, and architecture “fitness functions.” All data is synthetic and generated from scratch.

Python · dbt · Dagster · DuckDB · Kimball / dimensional modeling · Data quality & testing · Power BI
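To make the "quarantine for bad rows" idea concrete, here is a minimal sketch of the pattern in plain Python. It is an illustration under stated assumptions, not code from the project: the column names `line_id` and `units_produced`, the two validation rules, and the `split_rows` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SplitResult:
    """Rows that passed validation and rows set aside for review."""
    clean: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def validate_row(row: dict) -> list:
    """Return a list of rule violations; an empty list means the row is clean.

    The rules below are assumed examples, not the project's real checks.
    """
    errors = []
    if not row.get("line_id"):
        errors.append("missing line_id")
    units = row.get("units_produced")
    if not isinstance(units, (int, float)) or units < 0:
        errors.append("units_produced must be a non-negative number")
    return errors


def split_rows(rows: list) -> SplitResult:
    """Route each row to the clean set or the quarantine, never dropping any."""
    result = SplitResult()
    for row in rows:
        errors = validate_row(row)
        if errors:
            # Keep the original values and record why the row was rejected.
            result.quarantined.append({**row, "_errors": errors})
        else:
            result.clean.append(row)
    return result
```

The point of the design is that bad rows are kept alongside the reason they failed instead of being silently dropped, so they can be inspected and replayed later.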

MIT Applied Data Science — Portfolio

31 projects · ML · Deep Learning · Recsys · Network analysis

A curated set of projects from the MIT Applied Data Science Program (MIT IDSS / Great Learning): EDA & statistical inference, supervised & unsupervised ML, deep learning (CNNs), recommender systems, and network analysis — each with a written walk-through, a methodology diagram, and the resulting visualizations.

Python · scikit-learn · Deep learning (CNN) · Recommender systems · Clustering / PCA · Network analysis

Live Flight Telemetry → Warehouse

OpenSky API · DuckDB · dbt Kimball star · real ADS-B

A pipeline that ingests real, live aircraft telemetry from the OpenSky Network API into a tested bronze / silver / gold medallion with a Kimball star schema, and maps live air traffic. It demonstrates production-grade API ingestion (incremental polling, rate-limit and retry handling, immutable landing with lineage) on a real, moving source.

Python · API ingestion · dbt · DuckDB · Kimball / star schema · Geospatial / time-series
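The rate-limit and retry handling described above can be sketched as a small wrapper with exponential backoff. This is a generic illustration, not the project's implementation and not the OpenSky client API: the `RateLimited` exception, the `fetch_with_retry` helper, and its parameters are hypothetical names, and the fetch function is passed in so the sketch makes no assumptions about the real endpoint.

```python
import time
from typing import Any, Callable, Optional


class RateLimited(Exception):
    """Hypothetical error a fetch function raises when the API says 'slow down'."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        # Seconds the server asked us to wait, if it told us.
        self.retry_after = retry_after


def fetch_with_retry(
    fetch: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Prefer the server's hint; otherwise wait 1x, 2x, 4x, ... base_delay.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

Passing `sleep` as a parameter keeps the wrapper easy to test, since a test can record the requested waits instead of actually pausing.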
More on the way

Cloud data-engineering projects (BigQuery, AWS): publishing soon.

What I work with

Skills

Tools I’ve used to ship the work above.

Data Engineering

  • dbt · Dagster
  • DuckDB · SQL Server
  • Medallion & Kimball modeling
  • ELT & orchestration
  • Data quality / testing
  • Power BI

Languages & Tools

  • Python
  • SQL
  • pandas
  • Git & GitHub
  • pdfplumber · reportlab
  • Plotly

ML / AI

  • Applied machine learning
  • Supervised & unsupervised
  • Feature engineering
  • Model evaluation
  • scikit-learn
  • MIT Applied Data Science Program

Background

About

I’m a data engineer who likes the unglamorous parts: getting data out of awkward formats, making transformations testable, and building the warehouse layer that analytics and ML actually depend on. My recent work pairs solid engineering practice — medallion architecture, dimensional modeling, data-quality testing — with applied machine learning from the MIT Applied Data Science Program. I care about systems that are reproducible, well-tested, and clear enough that the next person can trust and extend them.

Get in touch

Contact

Open to data engineering and ML/AI roles. The fastest way to reach me: