Using Machine Learning Algorithms to Detect Fake News

Authors

  • Cody Lawrenceville School
  • Nicole Lantz The Lawrenceville School

DOI:

https://doi.org/10.47611/jsrhs.v11i4.3446

Keywords:

Fake News, Yellow Journalism, Artificial Intelligence, Machine Learning, Support Vector Machine, Latent Dirichlet Allocation, Gradient Boosting, Boosting, Naive Bayes, Classification

Abstract

Fake news is a growing threat in the modern world. A major reason it is so dangerous and effective is that it is difficult to distinguish from accurate news; if fake news could be detected accurately, its negative impact could be significantly reduced. Previous studies have found that fake news differs substantially from real news in word choice and text structure, which suggests that the two can be told apart automatically. One possible detection method is machine learning, which uses artificial intelligence to detect patterns in the text of fake and real news articles. In this paper, we test the ability of machine learning algorithms to detect fake news using four models: a support vector machine (SVM), multinomial Naive Bayes (Multinomial NB), Gradient Boosting, and Gradient Boosting with Latent Dirichlet Allocation (LDA). We find that all four models achieve a success rate of over 90%, with the LDA + Gradient Boosting model performing best and Multinomial NB performing worst. We also attempt to determine the topics that fake news tends to cover and find that fake news is often about politics. While the models proved successful, we recommend that future testing be done on other datasets with a greater variety of news sources.
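To make the word-based classification idea concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier, one of the four model types named in the abstract. It is not the authors' implementation: the class name `MultinomialNB`, the whitespace tokenizer, the smoothing value `alpha=1.0`, and the four toy headlines are all assumptions made for illustration, and the toy data is invented rather than taken from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes text classifier with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value, not from the paper)

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        # Start from the log prior, then add one log likelihood per known token.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / total)
        return score

    def predict(self, text):
        tokens = text.lower().split()
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "shocking secret the government hides from you",
    "you will not believe this shocking scandal",
    "senate passes budget bill after committee vote",
    "central bank holds interest rates steady",
]
train_labels = ["fake", "fake", "real", "real"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("shocking scandal the senate hides"))
```

The classifier picks the label whose training vocabulary best explains the words in a new text, which is the sense in which differences in word choice make fake and real news separable. A real experiment would need a proper tokenizer, a held-out test set, and far more data than this sketch uses.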

Published

11-30-2022

How to Cite

Gao, Q., & Lantz, N. (2022). Using Machine Learning Algorithms to Detect Fake News. Journal of Student Research, 11(4). https://doi.org/10.47611/jsrhs.v11i4.3446

Section

HS Research Projects