Phishing Email Classification using Logistic Regression
Identifying Malicious Communications
Phishing remains one of the most common cybersecurity threats. A reliable machine learning classifier that separates phishing emails from legitimate ones is therefore a core component of modern email filters.
Modeling Strategy
In a recent Hiring Hackathon by MachineHack, I evaluated several classification models on this problem. Although it is tempting to reach for complex NLP models, benchmarking showed that Logistic Regression, paired with tuned hyperparameters and TF-IDF feature extraction, gave the best balance of speed and accuracy.
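The following is a minimal sketch of a TF-IDF plus Logistic Regression pipeline, not the exact competition code. It assumes scikit-learn, which this write-up does not name. The toy emails, the labels, and the hyperparameter values (`ngram_range`, `sublinear_tf`, `C`) are illustrative placeholders, not the tuned settings from the hackathon.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the hackathon dataset is not reproduced here.
emails = [
    "Verify your account now or it will be suspended",
    "Urgent: confirm your password at this link",
    "You won a prize, click here to claim your reward",
    "Your bank account is locked, login to verify your identity",
    "Meeting moved to 3pm, see updated agenda attached",
    "Here are the notes from yesterday's project review",
    "Lunch on Friday? Let me know what time works",
    "Quarterly report draft is ready for your comments",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = phishing, 0 = legitimate

# TF-IDF turns each email into a sparse weighted term vector;
# Logistic Regression then learns one weight per term.
# All parameter values below are assumptions chosen for illustration.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(C=10.0, max_iter=1000),
)
model.fit(emails, labels)

new_emails = [
    "Click here to verify your account password",
    "Agenda for the project meeting attached",
]
predictions = model.predict(new_emails)
phishing_probability = model.predict_proba(new_emails)[:, 1]

for text, label, prob in zip(new_emails, predictions, phishing_probability):
    print(f"{label} ({prob:.2f}) {text}")
```

On a real dataset, the hyperparameters would normally be chosen by cross-validated search (for example over `C` and `ngram_range`) rather than fixed by hand as they are here.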
Performance
The model achieved an F1 score of 0.99988 on the public leaderboard and 0.99984 on the private leaderboard, ranking 34th out of 109 participants. This shows how well fundamental machine learning algorithms can perform when feature engineering is done carefully.
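For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes the binary form on made-up labels. Whether the leaderboard used binary, macro, or weighted averaging is not stated here, so the binary form is an assumption, and `f1_score_binary` is a hypothetical helper name.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# 3 true positives, 1 false positive, 1 false negative.
score = f1_score_binary(y_true, y_pred)
print(score)
```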