Data Science Portfolio

Stephen Marn · MIT Applied Data Science Program (MIT IDSS / Great Learning)

eda

CardioGood Fitness: Treadmill Customer Profiling

Descriptive analytics to build a buyer profile for each of three treadmill product lines

eda

FIFA World Cup Analysis

Mining 80+ years of World Cup history to guide a new football club's strategy

eda

Uber NYC Trip Demand Analysis

Exploring six months of NYC Uber pickups to understand when and where ride demand peaks

network

CAVIAR Criminal Network Analysis

Tracking how a Montreal drug-trafficking network reorganized under repeated police seizures

unsupervised

Clustering Countries by Socio-Economic Profile

Grouping 167 nations by health, trade, and income indicators to guide aid and development decisions

network

Enron Email Network Analysis

Using social network analysis to surface the key actors in Enron's senior leadership

unsupervised

Genomic Data Clustering: Decoding the Genetic Code

Using unsupervised learning to rediscover that DNA reads in three-letter codons

unsupervised

Unsupervised Pattern Discovery with PCA and t-SNE

Compressing high-dimensional education and air-pollution data to reveal hidden structure

regression

BigMart Sales Prediction

Predicting item-level outlet sales with interpretable linear regression

classification

Employee Attrition Prediction

Understanding why employees leave and predicting who is at risk

regression

Predicting Hospital Length of Stay

Forecasting patient length of stay at admission to help HealthPlus plan beds, staff, and resources

regression

SuperKart Retail Sales Forecasting

Predicting per-product store sales for the upcoming quarter with linear regression

regression

Bitcoin Price Prediction

Forecasting monthly Bitcoin closing prices with classical time-series models

classification

Celestial Object Detection

Classifying stars, galaxies, and quasars from Sloan Digital Sky Survey photometry

classification

Predicting Employee Attrition at McCurr Health Consultancy

An end-to-end classification pipeline to flag at-risk employees before they leave

regression

Predicting Hospital Length of Stay for HealthPlus

A deployable regression model that forecasts patient length of stay at admission to plan beds, staff, and resources

classification

Predicting Hotel Booking Cancellations for INN Hotels

Using tree-based classifiers to flag at-risk bookings before they cancel

cnn

Audio MNIST: Spoken-Digit Recognition with a Neural Network

Classifying spoken digits 0-9 from raw .wav audio using MFCC features and a Keras ANN

cnn

CIFAR-10 Image Classification with CNNs

Classifying 32x32 color images into 10 object classes with convolutional neural networks and transfer learning

gnn

Citation Network Classification with Graph Neural Networks

Predicting a paper's research topic from how it cites other papers, using a GCN on the Cora dataset

cnn

COVID-19 Chest X-Ray Classification

A CNN decision aid that triages chest X-rays into COVID, Normal, and Viral Pneumonia

classification

Predicting Employee Attrition with Deep Learning

An artificial neural network that flags which data scientists are likely to switch jobs

cnn

Food Image Classification with CNNs

Teaching a convolutional neural network to tell Bread, Soup, and Vegetable-Fruit apart

regression

Predicting Graduate Admission Chances

A neural network that flags which applicants are likely to be admitted to UCLA

gnn

Movie Recommendation with Graph Neural Networks

Learning movie embeddings from co-viewing patterns on MovieLens to suggest the next film to watch

cnn

Rice Type Classification with CNNs

Sorting five rice varieties from magnified grain images using deep learning

recommender

Book Recommendation System

Comparing rank-based, collaborative filtering, and matrix factorization approaches to recommend books

recommender

MovieLens Movie Recommendation System

Recommending relevant movies from user rating history with popularity, collaborative filtering, and SVD

recommender

Building a Product Recommender at Scale

Comparing rank-based and collaborative-filtering recommenders on millions of user ratings

recommender

Yelp Restaurant Recommendation System

Recommending restaurants from Yelp reviews using both collaborative filtering and content-based NLP

regression

Used Cars Price Prediction (Capstone)

Pricing 7,253 used cars for Cars4U with regression on log price