Citation Network Classification with Graph Neural Networks

Predicting a paper's research topic from how it cites other papers, using a graph convolutional network (GCN) on the Cora dataset

Overview

We want to predict a research paper's topic from the words it uses and the papers it is linked to by citations.

Methodology

flowchart LR
  A["Graph: Nodes + Edges"] --> B[Node Features]
  B --> C["Graph Neural Network layers"]
  C --> D[Train]
  D --> E["Evaluate: Node-Classification Accuracy"]

The Data (Citation Graph)

The data is the Cora citation graph: 2,708 papers linked by citations. Each paper is described by a binary bag-of-words vector over a 1,433-word vocabulary and belongs to one of 7 topics.
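The sketch below shows how a paper's words can become a feature vector. It assumes a binary bag-of-words encoding like Cora's. The tiny vocabulary and the papers are hypothetical; the real dataset uses a 1,433-word vocabulary.

```python
def bag_of_words(words, vocabulary):
    """Return a binary vector: 1 if a vocabulary word appears in the paper, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical toy vocabulary and papers (Cora's real vocabulary has 1,433 words).
vocabulary = ["neural", "network", "genetic", "bayesian", "reinforcement"]
papers = {
    "p0": ["neural", "network", "training"],   # "training" is out of vocabulary, so it is dropped
    "p1": ["genetic", "algorithm", "mutation"],
    "p2": ["bayesian", "network", "inference"],
}

features = {pid: bag_of_words(words, vocabulary) for pid, words in papers.items()}
for pid, vec in features.items():
    print(pid, vec)
```

Word order and word counts are discarded. Only presence or absence over the fixed vocabulary survives, which keeps every paper's feature vector the same length.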

Graph Construction

We convert the papers and citations into a graph data structure, with one node per paper and one edge per citation. We then plot a small subgraph to inspect its structure.
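To feed citations to a GCN, they are typically encoded as an adjacency matrix. The pure-Python sketch below makes two assumptions. First, it treats citations as undirected edges. Second, it applies the symmetric normalization with self-loops from Kipf and Welling's GCN, Â = D^(-1/2) (A + I) D^(-1/2), where D is the degree matrix of A + I. The 4-paper edge list is hypothetical.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Build D^-1/2 (A + I) D^-1/2 from an edge list, treating citations as undirected."""
    # Start with self-loops (the "+ I" term) so each node keeps its own features.
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1.0
        adj[dst][src] = 1.0  # symmetrize: a citation links both papers
    # Degree of each node in A + I (row sums).
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(num_nodes)]
            for i in range(num_nodes)]

# Hypothetical 4-paper graph: 0 cites 1, 0 cites 2, 2 cites 3.
a_hat = normalized_adjacency(4, [(0, 1), (0, 2), (2, 3)])
for row in a_hat:
    print([round(v, 3) for v in row])
```

A dense matrix is fine for a toy graph. For all 2,708 Cora papers, a sparse representation is the usual choice, since most paper pairs have no citation between them.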

GNN Model

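The model stacks two graph convolution layers. Each layer mixes a paper's representation with those of its citation neighbors, so after two layers each prediction draws on papers up to two citation hops away. A softmax over the topics then picks the most likely one.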
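The two passes can be written as Z = softmax(Â · ReLU(Â X W0) · W1), the two-layer GCN of Kipf and Welling. In this formula:

- Â is the self-loop-normalized adjacency matrix.
- X holds the word features.
- W0 and W1 are learned weight matrices.

Below is a pure-Python forward pass on toy sizes. It is a sketch under assumptions: the weights are random placeholders rather than trained values, and the hidden size of 8 is arbitrary.

```python
import math
import random

def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(a_hat, x, w0, w1):
    """Two GCN layers: each multiplies by a_hat to mix in neighbors' representations."""
    hidden = relu(matmul(a_hat, matmul(x, w0)))   # pass 1: own words + 1-hop neighbors
    logits = matmul(a_hat, matmul(hidden, w1))    # pass 2: reaches 2-hop neighbors
    return softmax_rows(logits)                   # per-paper topic probabilities

# Toy path graph 0 - 1 - 2 with self-loops; degrees in A + I are 2, 3, 2.
s = 1 / math.sqrt(6)
a_hat = [[0.5, s, 0.0],
         [s, 1 / 3, s],
         [0.0, s, 0.5]]
x = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]   # hypothetical 4-word features
random.seed(0)
w0 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]   # 4 words -> 8 hidden
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(8)]   # 8 hidden -> 3 topics

probs = gcn_forward(a_hat, x, w0, w1)
predicted_topic = [row.index(max(row)) for row in probs]
print(predicted_topic)
```

Libraries such as PyTorch Geometric provide this layer (for example `GCNConv`) with sparse operations. In those libraries, W0 and W1 are learned by gradient descent instead of being drawn at random.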

Results

Trained on just 140 labeled papers, the model correctly classifies the topic of about 4 in 5 unseen papers (roughly 80% accuracy).
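Training on so few labels works because two things happen at once. The loss is computed on the labeled nodes alone. Meanwhile, the graph layers still propagate information from every paper.

The sketch below shows a masked cross-entropy and a held-out accuracy. It assumes per-paper topic probabilities like a GCN's softmax output. The probabilities, labels, and index lists are hypothetical toy values, not this project's results.

```python
import math

def masked_cross_entropy(probs, labels, mask):
    """Average negative log-likelihood over the labeled (masked) nodes only."""
    return -sum(math.log(probs[i][labels[i]]) for i in mask) / len(mask)

def masked_accuracy(probs, labels, mask):
    """Fraction of masked nodes whose highest-probability topic matches the label."""
    correct = sum(1 for i in mask if probs[i].index(max(probs[i])) == labels[i])
    return correct / len(mask)

# Hypothetical softmax outputs for 6 papers over 3 topics.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
]
labels = [0, 1, 2, 0, 0, 2]
train_idx = [0, 1, 2]   # labeled nodes: the only ones that contribute to the loss
test_idx = [3, 4, 5]    # unseen nodes: used only for evaluation

print(round(masked_cross_entropy(probs, labels, train_idx), 4))
print(masked_accuracy(probs, labels, test_idx))
```

The count of 140 labels matches the widely used Planetoid split of Cora, which uses 20 labeled papers for each of the 7 topics.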

Key Takeaways

Citation links carry strong signal about a paper's topic. Graph neural networks exploit that signal to classify papers accurately, even when only a small fraction of them are labeled.

Tech Stack

Attribution

This project was completed as part of the MIT Applied Data Science Program (MIT IDSS / Great Learning). The program provided the case-study scaffolding; the analysis, code, and results are my own. Published with permission, for portfolio use only.