
Patent Category Classification from Abstracts using Ensemble Methods

May 9, 2026 aniket 1 min read

The Complexity of Legal Text

Assigning European Patent Office (EPO) patent categories from abstracts alone requires handling dense, domain-specific technical and legal terminology. Traditional keyword matching often falls short on vocabulary this dense.

AI-Driven NLP Models

During the Data Science Student Championship 2024, I designed an AI-based NLP pipeline. The best-performing method was an ensemble of two models: a Bagging Classifier that uses a LightGBM Classifier as its base estimator, and a standalone LightGBM model. The ensemble operated on highly optimized text embeddings.
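One common way to combine two such models is soft voting, where the per-class probabilities from each model are averaged and the highest-scoring class wins. The post does not say how the bagged and standalone models were actually combined, so the sketch below is only an assumption about that step. The function names `soft_vote` and `predict_labels` and the sample probabilities are hypothetical; in practice the inputs would be the `predict_proba` outputs of the two fitted classifiers.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]  # rows = abstracts, columns = patent categories


def soft_vote(prob_sets: Sequence[Matrix],
              weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Weighted average of per-class probabilities across models, row by row."""
    if not prob_sets:
        raise ValueError("need probabilities from at least one model")
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weighting is an assumption
    if len(weights) != len(prob_sets):
        raise ValueError("one weight per model is required")

    total = sum(weights)
    blended = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        blended.append(row)
    return blended


def predict_labels(blended: Matrix) -> List[int]:
    """Pick the index of the highest blended probability for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]


# Made-up probabilities for two abstracts and three categories.
bagged_lgbm = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]      # hypothetical bagged model output
standalone_lgbm = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]  # hypothetical standalone model output

blended = soft_vote([bagged_lgbm, standalone_lgbm])
print(predict_labels(blended))  # [0, 2]
```

In the second row the two models disagree (the bagged model favors class 1, the standalone model class 2), and the averaged probabilities settle on class 2. Unequal `weights` would let one model count for more, which is a tuning choice rather than anything stated in the post.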

Achievements

This approach ranked 15th globally among 1,029 participants, suggesting that optimized tree-based models on vectorized text can sometimes rival large language models for specific classification tasks.