Detecting Fake News Using Machine Learning

Authors

  • Elsa Norman, Yorktown High School

DOI:

https://doi.org/10.47611/jsrhs.v12i1.3940

Keywords:

Fake News, Artificial Intelligence, Machine Learning, Linear SVC, Natural Language Processing

Abstract

Fake news has had a significant effect on society and politics. To help combat the spread of misinformation, we developed a machine learning model that detects fake news from textual data. We vectorized the text with a count vectorizer and fed the result into Logistic Regression, Support Vector Machine (SVM), and Linear Support Vector Classifier (SVC) models. The highest accuracy achieved was 99.97%, with the Linear SVC. We discovered, however, a significant difference in how the real and fake news datasets were constructed that would not carry over to real-world use: the real news articles contained quotation marks, apostrophes, and dashes, while these characters were absent from the fake news articles. We therefore also developed a more broadly applicable Logistic Regression model, trained with these characters removed from the dataset altogether, which achieved an accuracy of 98.4%.
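The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: it assumes scikit-learn, and the helper names (`strip_leaky_chars`, `build_model`), the exact set of characters removed, and the four toy headlines are all hypothetical. The abstract does not specify tokenization settings or hyperparameters, so library defaults are used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Assumption: quotation marks, apostrophes, and dashes, in both their
# straight and typographic forms. The paper's exact character set may differ.
LEAKY_CHARS = "\"'\u201c\u201d\u2018\u2019-\u2013\u2014"
STRIP_TABLE = str.maketrans("", "", LEAKY_CHARS)


def strip_leaky_chars(text: str) -> str:
    """Lowercase the text and delete the characters that leak the label."""
    return text.lower().translate(STRIP_TABLE)


def build_model(classifier, remove_leaky_chars: bool = False):
    """Bag-of-words token counts followed by a linear classifier."""
    # A custom preprocessor replaces CountVectorizer's built-in lowercasing,
    # which is why strip_leaky_chars lowercases the text itself.
    preprocessor = strip_leaky_chars if remove_leaky_chars else None
    return make_pipeline(CountVectorizer(preprocessor=preprocessor), classifier)


# Hypothetical toy data, for illustration only (1 = real, 0 = fake).
texts = [
    "Senate passes budget bill after long debate",
    "Officials confirm the new trade agreement",
    "Shocking secret cure that doctors hide from you",
    "You will not believe this miracle weight trick",
]
labels = [1, 1, 0, 0]

# Variant with all characters kept, using the Linear SVC.
svc_model = build_model(LinearSVC())
svc_model.fit(texts, labels)

# Variant with the leaky characters removed, using Logistic Regression.
clean_model = build_model(LogisticRegression(max_iter=1000), remove_leaky_chars=True)
clean_model.fit(texts, labels)
prediction = clean_model.predict(["Senate officials confirm budget agreement"])
```

Stripping the characters inside the vectorizer's preprocessor, rather than editing the dataset beforehand, means the same cleaning is applied automatically to any text passed to `predict`. Accuracy on a toy set like this says nothing about the reported figures; reproducing them would require the original dataset and a held-out test split.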

Published

02-28-2023

How to Cite

Norman, E. (2023). Detecting Fake News Using Machine Learning. Journal of Student Research, 12(1). https://doi.org/10.47611/jsrhs.v12i1.3940

Issue

Vol. 12 No. 1 (2023)

Section

HS Research Projects