5. NLP Contractor Classification
I’ve made it a habit to look back at my previous jobs and ask how what I’m learning now could have helped me do the work better or automate some of it. Recently, I’ve been learning about Natural Language Processing (NLP), which includes transforming text into a numerical format that a machine learning algorithm can work with. After learning some of the basics, I was eager to try them out on data I had collected myself during my time as a construction manager.
The data described the extra construction work that was needed to complete a project. My goal was to classify contractors based on the descriptions of the work they performed.
Project Overview
- Cleaned the raw data and checked for null values
- Explored and visualized the data
- Prepared the ‘work_description’ column by:
  - Removing common words (stop words) and punctuation
  - Tokenizing the descriptions (converting each description into a list of words)
  - Vectorizing the tokens with scikit-learn’s bag-of-words model
- Trained and compared four classifiers: Multiple Linear Regression, Random Forest, K-NN, and Naive Bayes
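The cleaning and preparation steps above can be sketched in plain Python. This is a minimal illustration, not the project’s code: it assumes each record is a dict with a `work_description` key, the stop-word list and example records are hypothetical, and the hand-written `bag_of_words` function only mimics the kind of output scikit-learn’s bag-of-words model produces (a vocabulary plus one word-count vector per description).

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "on", "at", "with"}


def clean_descriptions(rows):
    """Null-value check: drop records whose work description is missing or blank."""
    return [r for r in rows if r.get("work_description") and r["work_description"].strip()]


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]


def bag_of_words(token_lists):
    """Build a sorted vocabulary and one word-count vector per description."""
    vocab = sorted({w for tokens in token_lists for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        vec = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    # Made-up records for illustration only (hypothetical contractor labels).
    rows = [
        {"work_description": "Install new drywall in the lobby.", "contractor": "drywall"},
        {"work_description": None, "contractor": "electrical"},
        {"work_description": "Rewire lobby lighting, install outlets", "contractor": "electrical"},
    ]
    rows = clean_descriptions(rows)
    tokens = [tokenize(r["work_description"]) for r in rows]
    vocab, X = bag_of_words(tokens)
    print(vocab)
    print(X)
```

Each column of `X` counts one vocabulary word, so word order is discarded; that is the trade-off a bag-of-words model makes in exchange for a fixed-length input every classifier can use.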
Results
| Classifier | Accuracy |
|---|---|
| Multiple Linear Regression | 0.90 |
| Random Forest | 0.88 |
| K-NN | 0.75 |
| Naive Bayes | 0.89 |
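To show what one of these models does and what the accuracy column measures, here is a hedged from-scratch sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing over bag-of-words count vectors, plus accuracy computed as the fraction of predictions that match the true labels. It is not the project’s code: the function names are hypothetical and the toy count vectors are invented, so it does not reproduce the numbers in the table.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(X, y):
    """Fit log class priors and Laplace-smoothed per-word log-likelihoods from count vectors."""
    n_features = len(X[0])
    class_counts = Counter(y)
    word_totals = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(X, y):
        totals = word_totals[label]
        for j, count in enumerate(vec):
            totals[j] += count
    model = {}
    for label, n_docs in class_counts.items():
        totals = word_totals[label]
        denom = sum(totals) + n_features  # add-one smoothing over the whole vocabulary
        model[label] = (
            math.log(n_docs / len(y)),
            [math.log((c + 1) / denom) for c in totals],
        )
    return model


def predict(model, vec):
    """Return the label with the highest log-posterior for one count vector."""
    def score(label):
        log_prior, log_likelihoods = model[label]
        return log_prior + sum(c * ll for c, ll in zip(vec, log_likelihoods))
    return max(model, key=score)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Invented count vectors over a 7-word vocabulary, for illustration only.
    X_train = [[1, 1, 0, 1, 1, 0, 0], [0, 1, 1, 1, 0, 1, 1],
               [2, 0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 1, 1]]
    y_train = ["drywall", "electrical", "drywall", "electrical"]
    model = train_multinomial_nb(X_train, y_train)
    X_test = [[1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 1, 0]]
    y_test = ["drywall", "electrical"]
    preds = [predict(model, v) for v in X_test]
    print(preds, accuracy(y_test, preds))
```

Working in log space avoids underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word never seen for a class from zeroing out that class entirely.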