Patent Category Classification from Abstracts using Ensemble Methods
May 9, 2026
aniket
The Complexity of Legal Text
Classifying European Patent Office (EPO) patents into categories from their abstracts requires handling domain-specific technical and legal terminology. Traditional keyword matching often falls short because the vocabulary is dense and highly specialized.
AI-Driven NLP Models
During the Data Science Student Championship 2024, I designed an AI-based NLP pipeline. The best-performing method was an ensemble of two models operating on highly optimized text embeddings: a Bagging Classifier that uses a LightGBM classifier as its base estimator, and a standalone LightGBM model.
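The shape of such an ensemble can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the competition code: `CentroidClassifier` is a toy stand-in for LightGBM, the 2-D points stand in for text embeddings, the labels "A" and "B" are made up, and combining the two models by averaging their class probabilities (soft voting) is an assumption, since the post does not say how the outputs were merged. In a real pipeline one would presumably use `lightgbm.LGBMClassifier` inside `sklearn.ensemble.BaggingClassifier` instead of the hand-written classes below.

```python
import math
import random


class CentroidClassifier:
    """Toy stand-in for LightGBM (assumption): scores classes by centroid distance."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_proba(self, row):
        # Turn distances into normalized scores: closer centroid, higher probability.
        scores = {c: math.exp(-math.dist(row, m)) for c, m in self.centroids.items()}
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}


def average_proba(probas):
    """Average class-probability dicts; a class missing from a dict counts as 0."""
    classes = set().union(*probas)
    return {c: sum(p.get(c, 0.0) for p in probas) / len(probas) for c in classes}


class Bagging:
    """Fits copies of a base model on bootstrap resamples and averages them."""

    def __init__(self, make_model, n_estimators=10, seed=0):
        self.make_model = make_model
        self.n_estimators = n_estimators
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.models = []
        for _ in range(self.n_estimators):
            # Bootstrap: draw n training rows with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            model = self.make_model().fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(model)
        return self

    def predict_proba(self, row):
        return average_proba([m.predict_proba(row) for m in self.models])


def ensemble_predict(bagged, standalone, row):
    """Soft voting (assumption): average both models' probabilities, take the argmax."""
    proba = average_proba([bagged.predict_proba(row), standalone.predict_proba(row)])
    return max(proba, key=proba.get)


# Hypothetical data: two well-separated clusters standing in for embedded abstracts.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X += [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(30)]
y = ["A"] * 30 + ["B"] * 30

bagged = Bagging(CentroidClassifier, n_estimators=10).fit(X, y)
standalone = CentroidClassifier().fit(X, y)
print(ensemble_predict(bagged, standalone, [0.5, -0.2]))
```

The bagged model reduces variance by averaging learners trained on different resamples, while the standalone model sees the full training set once; averaging the two is one simple way to combine them.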
Achievements
This approach ranked 15th globally among 1,029 participants, suggesting that optimized tree-based models on vectorized text can sometimes rival large language models for specific classification tasks.