PriorityHR: NLP-Powered Ticket Classification for HR

This project predicts the priority of support tickets from their text content using machine learning. The pipeline combines data loading, preprocessing, dimension reduction, and model training.

Overview:

The project involves the following key steps:

Data Load & Validation: The initial step involves loading the raw ticket data from a CSV file and performing basic validation to ensure data integrity.
Data Preparation: The ticket text is preprocessed and converted to numeric form using TF-IDF vectorization. The result is a matrix of TF-IDF features, where each value reflects how important a term is to a given ticket relative to the rest of the corpus.
Dimension Reduction: Because the TF-IDF matrix is high-dimensional, Principal Component Analysis (PCA) is applied to reduce the number of features while preserving as much of the variance in the data as possible.
Machine Learning Models: Two machine learning algorithms, Naive Bayes and Logistic Regression, are trained on the preprocessed data to predict the priority of support tickets.
Model Evaluation: The trained models are evaluated on a separate test set using precision, recall, F1-score, and a confusion matrix.
Model Deployment: The best-performing model, Logistic Regression, is selected for deployment. It is serialized and saved for future use in predicting the priority of new support tickets.
Prediction for New Cases: A function is implemented to predict the priority of new support tickets based on the deployed model. This function takes the text of the tickets as input and returns the predicted priority.
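
The steps above, from loading through evaluation, can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the in-memory tickets, the column names `text` and `priority`, the two priority labels, and `n_components=5` are all assumptions made for the example. The source names only "Naive Bayes", so `GaussianNB` is also an assumption; it is used here because PCA output can contain negative values, which `MultinomialNB` rejects. The TF-IDF matrix is converted to a dense array before PCA because PCA in many scikit-learn versions does not accept sparse input.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up tickets standing in for the CSV file (hypothetical columns and labels).
tickets = pd.DataFrame({
    "text": [
        "Payroll did not deposit my salary this month",
        "Urgent harassment complaint against my manager",
        "Locked out of the benefits portal before the enrollment deadline",
        "Workplace injury needs to be reported immediately",
        "My health insurance was cancelled without notice",
        "Final paycheck missing after termination",
        "Urgent visa sponsorship paperwork expires tomorrow",
        "Salary payment is wrong and rent is due",
        "How do I update my profile photo",
        "Question about the holiday calendar for next year",
        "Please send me the dress code policy",
        "Where can I find the office map",
        "Request for a new name badge",
        "Can I change my lunch preference",
        "General question about the training catalog",
        "How many vacation days carry over",
    ],
    "priority": ["high"] * 8 + ["low"] * 8,
})

# Basic validation: required columns exist and rows with missing values are dropped.
assert {"text", "priority"} <= set(tickets.columns)
tickets = tickets.dropna(subset=["text", "priority"])

X_train, X_test, y_train, y_test = train_test_split(
    tickets["text"], tickets["priority"],
    test_size=0.25, stratify=tickets["priority"], random_state=0,
)


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array for PCA."""
    return matrix.toarray()


def build_pipeline(classifier):
    """TF-IDF -> dense -> PCA -> classifier, fitted as one unit."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("dense", FunctionTransformer(to_dense, accept_sparse=True)),
        ("pca", PCA(n_components=5, random_state=0)),
        ("clf", classifier),
    ])


candidates = [
    ("naive_bayes", GaussianNB()),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
]

results = {}
for name, classifier in candidates:
    model = build_pipeline(classifier).fit(X_train, y_train)
    predicted = model.predict(X_test)
    results[name] = {
        "report": classification_report(y_test, predicted, zero_division=0),
        "confusion": confusion_matrix(y_test, predicted, labels=["high", "low"]),
    }
    print(name)
    print(results[name]["report"])
    print(results[name]["confusion"])
```

Wrapping every step in a single `Pipeline` means the vectorizer and PCA are fitted on the training split only, so nothing from the test tickets leaks into the features used for evaluation.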

Technologies Used:

Python: The project is implemented in Python.
Libraries: Key libraries include pandas and numpy for data manipulation, scikit-learn for machine learning, and joblib for model serialization.
Machine Learning Models: Naive Bayes and Logistic Regression algorithms are employed for ticket priority prediction.
Dimension Reduction: Principal Component Analysis (PCA) is used for reducing the dimensionality of the TF-IDF vectorized data.
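
As a rough sketch of how scikit-learn and joblib might fit together for the deployment and prediction steps, the example below saves a fitted pipeline and reloads it inside a prediction function. The training texts, the labels, the file name `priority_model.joblib`, and the function name `predict_priority` are all hypothetical. The PCA step is left out to keep the example short; in a fuller version the fitted PCA would sit in the same pipeline.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data (hypothetical texts and labels).
texts = [
    "Payroll did not deposit my salary this month",
    "Urgent harassment complaint against my manager",
    "My health insurance was cancelled without notice",
    "Final paycheck missing after termination",
    "How do I update my profile photo",
    "Please send me the dress code policy",
    "Where can I find the office map",
    "Request for a new name badge",
]
labels = ["high", "high", "high", "high", "low", "low", "low", "low"]

# Fit the vectorizer and classifier together so both are saved in one file.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Serialize the fitted pipeline to a temporary location (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "priority_model.joblib")
joblib.dump(model, model_path)


def predict_priority(ticket_texts, path=model_path):
    """Load the saved pipeline and return one predicted label per ticket."""
    loaded = joblib.load(path)
    return loaded.predict(list(ticket_texts)).tolist()


predictions = predict_priority(["Cannot access the payroll portal before payday"])
print(predictions)
```

Saving the whole pipeline rather than the classifier alone matters because new ticket text has to pass through the same fitted vectorizer that was used during training.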

Conclusion:

By classifying support tickets from their content, this project helps prioritize them efficiently so that critical issues can be resolved promptly.