Used Cars Price Prediction (Capstone)

Pricing 7,253 used cars for Cars4U with regression on log price

Overview

We built a tool that estimates a fair resale price for any used car so the company can buy and sell with confidence.

Methodology

flowchart LR
  A[Raw Data] --> B[Clean & Encode]
  B --> C[EDA]
  C --> D[Train/Test Split]
  D --> E["Random Forest / Linear Regression / Ridge / Lasso"]
  E --> F["Tune (Cross-Validation)"]
  F --> G["Evaluate: R2 / RMSE"]
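
The workflow in the chart can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's code: the feature names, coefficients, hyperparameters, and split ratio are all assumptions, and cross-validation is used here only to compare the four model families (a hyperparameter search such as scikit-learn's GridSearchCV would slot in at the same step).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the cleaned data (assumed features, not the real set).
rng = np.random.default_rng(42)
n = 600
age = rng.uniform(0, 15, n)        # years
km = rng.uniform(5, 200, n)        # thousands of km driven
power = rng.uniform(50, 300, n)    # engine power
X = np.column_stack([age, km, power])
log_price = 2.0 - 0.10 * age - 0.003 * km + 0.006 * power + rng.normal(0, 0.2, n)

# Train/test split: the test set is held back until the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.3, random_state=1
)

# Candidate models named in the flowchart (hyperparameters are placeholders).
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=1),
}

# 5-fold cross-validated R2 on the training set only.
cv_r2 = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

# Refit the best-scoring model on all training data, then evaluate once on the test set.
best_name = max(cv_r2, key=cv_r2.get)
best_model = models[best_name].fit(X_train, y_train)
pred = best_model.predict(X_test)
test_r2 = r2_score(y_test, pred)
test_rmse = float(np.sqrt(mean_squared_error(y_test, pred)))

print(best_name, round(test_r2, 3), round(test_rmse, 3))
```

Keeping the test set out of the model comparison is the design choice that matters here: the reported R2 and RMSE then reflect cars the chosen model has never seen.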

The Data

We started with 7,253 used-car records and cleaned them into a reliable set of roughly 6,000 cars ready for modeling.

Exploratory Analysis

Prices and mileage were heavily skewed, so we used a log transform to make the patterns clearer and easier to model.
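
To illustrate the effect of a log transform, here is a small sketch on synthetic, right-skewed values. The data, the distribution parameters, and the `skewness` helper are assumptions for illustration and do not come from the project.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the third standardized moment (0 for a symmetric sample)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Synthetic stand-in for a lopsided price column (not the Cars4U data).
rng = np.random.default_rng(0)
price = rng.lognormal(mean=1.8, sigma=0.8, size=5000)

# Prices are strictly positive here, so the natural log is defined everywhere.
log_price = np.log(price)

skew_raw = skewness(price)
skew_log = skewness(log_price)
print(f"skewness raw: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

For a column that can contain zeros, `np.log1p` is a common alternative because it computes log(1 + x).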

Key Drivers of Price

A car's age, how far it's been driven, its power and engine size, brand, and fuel type were the biggest factors in its price.

Modeling & Results

We tested several prediction methods; the simplest regression on log price was the most accurate and reliable.
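
As a hedged sketch of what regression on log price looks like in practice, the example below fits an ordinary linear regression to the log of a synthetic price and converts predictions back to the original scale. The two features, their coefficients, and the noise level are assumptions, and the score is computed in-sample purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic cars: price falls with age and rises with power (assumed relationship).
rng = np.random.default_rng(7)
n = 500
age = rng.uniform(0, 15, n)
power = rng.uniform(50, 300, n)
X = np.column_stack([age, power])
price = np.exp(2.0 - 0.10 * age + 0.006 * power + rng.normal(0, 0.2, n))

# Fit on log(price) so that effects are multiplicative on the price scale.
model = LinearRegression().fit(X, np.log(price))

# Predict in log space, then exponentiate to get back to a price.
pred_log = model.predict(X)
pred_price = np.exp(pred_log)

r2_log = r2_score(np.log(price), pred_log)

# A log-scale coefficient b means one extra unit multiplies the price by exp(b).
age_effect = float(np.exp(model.coef_[0]) - 1.0)
print(f"R2 (log scale): {r2_log:.3f}, price change per extra year: {age_effect:.1%}")
```

One caveat worth keeping in mind: exponentiating a log-scale prediction gives something closer to a typical (median) price than to the average price, which is usually acceptable for listing guidance.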

Business Recommendations

Adopt the regression pricing model to set listings instantly and flag mispriced cars, with the tree model as a backup.
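
One way the flagging step could work is to compare each listed price with the model's estimate and flag listings that fall outside a tolerance band. The function name `flag_mispriced`, the 20% tolerance, and the example prices are hypothetical and would need to be set by the business.

```python
import numpy as np

def flag_mispriced(listed_price, predicted_price, tolerance=0.20):
    """Label each listing as 'overpriced', 'underpriced', or 'ok'.

    The comparison is done in log space, so the band is symmetric as a ratio:
    with tolerance=0.20 a listing is flagged when it is more than 1.2x the
    predicted price or less than the predicted price divided by 1.2.
    """
    listed = np.asarray(listed_price, dtype=float)
    predicted = np.asarray(predicted_price, dtype=float)
    gap = np.log(listed) - np.log(predicted)
    band = np.log1p(tolerance)
    return np.where(
        gap > band, "overpriced", np.where(gap < -band, "underpriced", "ok")
    )

# Made-up listed prices and model estimates, in the same currency unit.
listed = [5.0, 9.0, 3.0]
predicted = [5.2, 6.0, 4.5]
flags = flag_mispriced(listed, predicted)
print(flags.tolist())
```

Working with the log ratio matches the log-price model: a fixed band treats a cheap hatchback and an expensive SUV the same in percentage terms.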

Attribution

This project was completed as part of the MIT Applied Data Science Program (MIT IDSS / Great Learning). The program provided the case-study scaffolding; the analysis, code, and results are my own. Published with permission, for portfolio use only.