Comparison of the efficacy of Natural Language Processing Algorithms at classifying Cyberbullying Tweets

Authors

  • Eric Cui, The Episcopal Academy
  • Christopher Brown, Scholar Launch

DOI:

https://doi.org/10.47611/jsrhs.v11i4.3432

Keywords:

machine learning, natural language processing, cyberbullying tweets, logistic regression, long short-term memory, XGBoost

Abstract

Machine learning is frequently used to predict and classify data. Natural Language Processing (NLP) uses machine learning to classify strings of words. Many different machine learning models can be used for NLP; three main categories are regression, decision tree, and neural network models. Each has its own advantages and drawbacks. Logistic Regression, XGBoost, and Long Short-Term Memory (LSTM) were trained and tested on a set of tweets concerning cyberbullying and compared on several metrics, including accuracy, recall, precision, and F1-score. The metrics were then considered together with model runtime and complexity to determine which model was most appropriate for the given dataset and for similar datasets. Logistic Regression was found to lack sufficient complexity to properly classify the data. LSTM had worse metrics than XGBoost and significantly higher complexity and runtime. XGBoost performed best, with the highest metrics and a relatively short runtime.
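As a minimal illustration of the four metrics named in the abstract, the sketch below computes accuracy, precision, recall, and F1-score from a list of true and predicted labels. The function name `classification_metrics`, the label strings, and the example data are hypothetical and are not taken from the paper; the paper's own evaluation code and results are not reproduced here.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall and F1, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells for the chosen positive class.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for six tweets (illustrative only, not data from the paper).
y_true = ["bully", "bully", "none", "none", "bully", "none"]
y_pred = ["bully", "none", "none", "bully", "bully", "none"]
metrics = classification_metrics(y_true, y_pred, positive="bully")
```

The same function could be applied to the predictions of each model in turn, so that the models are compared on identical test labels.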

Author Biography

Christopher Brown, Scholar Launch

Advisor

References or Bibliography

Srivastava, A., Saini, S., & Gupta, D. (2019). Comparison of various machine learning techniques and its uses in different fields. In 2019 3rd International conference on electronics, communication and aerospace technology (ICECA) (pp. 81–86). Coimbatore, India.

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785

Falessi, D., Cantone, G., & Canfora, G. (2010). A Comprehensive Characterization of NLP Techniques for Identifying Equivalent Requirements. Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. Presented at Bolzano-Bozen, Italy. doi:10.1145/1852786.1852810

Bakliwal, A., Arora, P., Patil, A., & Varma, V. (2011, November). Towards Enhanced Opinion Classification using NLP Techniques. In Proceedings of the workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011) (pp. 101-107).

Searle, T., Ibrahim, Z., & Dobson, R. (2020). Comparing natural language processing techniques for Alzheimer's dementia prediction in spontaneous speech. arXiv preprint arXiv:2006.07358.

Wang, J., Fu, K., & Lu, C. T. (2020). SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection. In Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), December 10-13, 2020.

Published

11-30-2022

How to Cite

Cui, E., & Brown, C. (2022). Comparison of the efficacy of Natural Language Processing Algorithms at classifying Cyberbullying Tweets. Journal of Student Research, 11(4). https://doi.org/10.47611/jsrhs.v11i4.3432

Issue

Section

HS Research Articles