Food Image Classification with CNNs

Overview

We taught a computer to look at a photo of food and decide which of three food types it is.

Goal: classify food photos into three categories - Bread, Soup, and Vegetable-Fruit - directly from raw pixels.
Image classification is a core deep learning task enabled by larger datasets and modern compute.
Approach: build and compare convolutional neural networks (CNNs) that learn visual features end to end.
Success means high test accuracy with few confusions between visually similar food classes.

Methodology

flowchart LR
  A[Image Dataset] --> B[Resize / Normalize / Augment]
  B --> C["CNN: Conv + Pooling layers"]
  C --> D[Dense + Softmax]
  D --> E[Train w/ Early Stopping]
  E --> F["Evaluate: Accuracy and Confusion Matrix"]

The Data

We started with folders of food photos already sorted into training and testing piles.

Three balanced classes: Bread, Soup, and Vegetable-Fruit, organized into Training and Testing folders.
Each image is read into a NumPy array and paired with its class index as a labeled tuple.
Labels are one-hot encoded so each class becomes a vector, e.g. Bread = [1, 0, 0].
Class distribution was checked to confirm the three categories are reasonably balanced.
Images are resized to a common 150x150x3 RGB shape for the network input.

Sample Images & Preprocessing

We looked at example pictures from each food type and cleaned them up before training.

Bread items are mostly round, oval, or elliptical and often show grilled or charred portions.
Soup images feature liquid in a container or utensil, with a distinct glare from light reflection.
Vegetable-Fruit images stand out with vibrant colors and repeating shapes across the frame.
Pixel values are normalized to speed up training and reduce the risk of getting stuck in local optima.
Utensils appearing in both Bread and Soup photos were flagged as a likely source of confusion.

CNN Architecture

We built two layered networks, the second one designed to avoid memorizing the training photos.

Model 1: three convolutional blocks, each a Conv2D + MaxPooling2D layer, starting at 64 filters with 3x3 kernels.
Model 1 learns about 1,185,107 parameters with relu activations and 'same' padding - large and prone to overfitting.
Model 2: four convolutional blocks adding a Dropout layer after each, starting at 256 filters with 5x5 kernels.
Model 2 trains about 1,099,171 parameters; Dropout randomly disables neurons to curb overfitting.
Note: this run used a reduced 2-epoch schedule for speed, so accuracy figures are indicative, not final.

Results & Accuracy

The second network did a noticeably better job and confused fewer photos than the first.

Model 1 overfit: training accuracy climbed steadily while validation accuracy lagged behind.
In the full run Model 1 reached roughly 70% test accuracy, often mistaking Bread for Soup.
Adding Dropout in Model 2 closed the train-validation gap and reduced misclassifications.
Bread vs Soup stayed the hardest pair due to shared dishes and utensils, but errors dropped clearly.
With the reduced 2-epoch re-run, curves and confusion matrices show the same qualitative trend, just lower numbers.

Key Takeaways

A well-regularized CNN can reliably sort food photos, and there is still room to push accuracy higher.

Model 2 was the better classifier, correctly predicting a randomly chosen test image.
Dropout and a deeper architecture were the key levers that improved generalization over Model 1.
Visually similar classes that share objects (Bread/Soup utensils) remain the main error source.
Next steps: more epochs, alternative architectures, and different optimizers to lift test accuracy further.
Built with: TensorFlow, Keras, NumPy, Matplotlib, scikit-learn

Tech Stack

pandas — data wrangling and tabular manipulation
numpy — fast numerical arrays
scikit-learn — modeling, pipelines, and evaluation
seaborn — statistical visualization
matplotlib — plotting
tensorflow — deep-learning framework
keras — high-level neural-network API

Attribution

This project was completed as part of the MIT Applied Data Science Program (MIT IDSS / Great Learning). The program provided the case-study scaffolding; the analysis, code, and results are my own. Published with permission, for portfolio use only.