Data Engineer · ML / AI

I turn messy operational data
into systems people trust.

I build production-grade data pipelines — from raw operational documents to tested, queryable warehouses and the dashboards that make them useful. Strong on data engineering, with applied machine learning and AI.

View projects Résumé ↗ GitHub ↗ LinkedIn ↗ Email

Selected work

Projects

Each one is self-contained and reproducible — clone, run one command, see it work.

Interactive production & pipeline dashboard for the manufacturing ELT project

Manufacturing PDF → Warehouse

Medallion ELT · Dagster + dbt + DuckDB · Kimball star schema

A production-grade ELT pipeline that turns a plant’s daily PDF production reports into a tested, queryable analytics warehouse, built as a bronze / silver / gold medallion with a Kimball star schema and surfaced in a self-contained, interactive dashboard. It features append-only lineage, a quarantine for bad rows, source-freshness & relationship tests, and architecture “fitness functions.” All data is synthetic and generated from scratch.

Python · dbt · Dagster · DuckDB · Kimball / dimensional modeling · Data quality & testing · Power BI

Live dashboard ↗ View code ↗ Case study ↗
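
For readers unfamiliar with the "quarantine for bad rows" idea mentioned in the description above, here is a minimal sketch of the pattern. It is not the project's actual code: the field names `line_id` and `units_produced`, the validation rules, and the helper names are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuarantinedRow:
    """A rejected row kept alongside the reason it failed (hypothetical shape)."""
    row: dict
    reason: str


def validate_row(row):
    """Return None when the row passes, otherwise a short reason string.

    The rules below are illustrative assumptions, not the project's real checks.
    """
    if not row.get("line_id"):
        return "missing line_id"
    units = row.get("units_produced")
    if isinstance(units, bool) or not isinstance(units, (int, float)):
        return "units_produced is not numeric"
    if units < 0:
        return "units_produced is negative"
    return None


def split_rows(rows):
    """Route each row to the clean set or to quarantine; no row is dropped."""
    clean, quarantined = [], []
    for row in rows:
        reason = validate_row(row)
        if reason is None:
            clean.append(row)
        else:
            quarantined.append(QuarantinedRow(row, reason))
    return clean, quarantined
```

The point of the pattern is that bad rows are set aside with a recorded reason rather than silently discarded, so they can be inspected and reprocessed later.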

MIT Applied Data Science — Portfolio

31 projects · ML · Deep Learning · Recsys · Network analysis

A curated set of projects from the MIT Applied Data Science Program (MIT IDSS / Great Learning): EDA & statistical inference, supervised & unsupervised ML, deep learning (CNNs), recommender systems, and network analysis — each with a written walk-through, a methodology diagram, and the resulting visualizations.

Python · scikit-learn · Deep learning (CNN) · Recommender systems · Clustering / PCA · Network analysis

Live site ↗ View code ↗

Live Flight Telemetry → Warehouse

OpenSky API · DuckDB · dbt Kimball star · real ADS-B

A pipeline that ingests real, live aircraft telemetry from the OpenSky Network API into a tested bronze / silver / gold medallion with a Kimball star schema, and maps live air traffic. Demonstrates production-grade API ingestion — incremental polling, rate-limit & retry handling, immutable landing with lineage — on a real, moving source.

Python · API ingestion · dbt · DuckDB · Kimball / star schema · Geospatial / time-series

Code & dashboard ↗
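
The "rate-limit & retry handling" mentioned in the description above can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the project's implementation: `RateLimited`, `fetch_with_retry`, and the exponential backoff schedule are hypothetical, and the actual network call is passed in as a plain function so the sketch makes no claims about the OpenSky API itself.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the source asked the client to slow down."""


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff when it is rate limited.

    fetch is any zero-argument callable that returns data or raises RateLimited.
    sleep is injectable so the backoff schedule can be checked without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            # Out of attempts: surface the error instead of hiding it.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

Passing the fetch function and the sleep function in as arguments keeps the retry logic separate from the HTTP client, which makes it straightforward to unit-test.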

More on the way
Cloud data-engineering projects (BigQuery, AWS) — publishing soon.

What I work with

Skills

Tools I’ve used to ship the work above.

Data Engineering

dbt · Dagster
DuckDB · SQL Server
Medallion & Kimball modeling
ELT & orchestration
Data quality / testing
Power BI

Languages & Tools

Python
SQL
pandas
Git & GitHub
pdfplumber · reportlab
Plotly

ML / AI

Applied machine learning
Supervised & unsupervised
Feature engineering
Model evaluation
scikit-learn
MIT Applied Data Science Program

Background

About

I’m a data engineer who likes the unglamorous parts: getting data out of awkward formats, making transformations testable, and building the warehouse layer that analytics and ML actually depend on. My recent work pairs solid engineering practice — medallion architecture, dimensional modeling, data-quality testing — with applied machine learning from the MIT Applied Data Science Program. I care about systems that are reproducible, well-tested, and clear enough that the next person can trust and extend them.

Get in touch

Contact

Open to data engineering and ML/AI roles. The fastest way to reach me:

stevemarn27@gmail.com LinkedIn ↗ github.com/Steve27M ↗ Résumé (PDF) ↗
