Fighting misinformation with Machine Learning

March 20, 2019

This essay was originally written as part of an application to TUM’s Data Analytics Master’s program.

Machine learning is a double-edged sword when it comes to fighting the spread of misinformation. On the one hand, it enables scalable, accurate detection of deceptive content and can adapt automatically to new threats and challenges [6]. On the other hand, machine learning methods facilitate the generation of fake texts, images and videos with a quality and cost-effectiveness that vastly surpasses previous manual methods [5]. Artificial intelligence can stop an ill-motivated conspiracy theory from spreading on social media within minutes of its being posted [3][4]. It can also create seemingly real video or audio of a person of public interest, making them say whatever is in a bad actor’s best interest [0][1][2]. Combating misinformation therefore requires academics researching machine learning to do two things: develop effective methods of detecting deceptive content, while showing foresight when it comes to malign applications of the technology they create.

Social media has become a prime source of news for many adults, with research showing that 62% of Americans get news from such platforms [7]. While websites like Facebook allow easy and fast access to news, they also enable the rapid spread of deceptive content. Social media platforms are therefore a worthy place to apply machine learning against misinformation. Intervening early is crucial: it has been shown that once people form a false political belief, correcting the misconception becomes futile or even counterproductive [8]. Relying exclusively on human oversight results in limited effectiveness and delays, given the sheer amount of content shared every minute. Machine learning systems can help here, working autonomously and reducing the need for human intervention. Detecting and deleting fake content at scale within minutes of posting is one possible use case. Machine learning could also make the task easier for human fact-checkers, for example by clustering posts into topics, using stance detection to mark them as opposing or supporting a claim, or analysing the text to predict its veracity. Combining these techniques would let a human get a quick overview and focus their attention on mitigating the damage from the most dangerous fake articles. I see much promise in such human-in-the-loop systems: while still leaving the final decision on which articles to treat as fake to humans, they speed up the process immensely. Similar approaches have proven to work well for handling credit-card fraud [9].
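To make the triage idea concrete, here is a minimal sketch of the topic-clustering step such a human-in-the-loop system might start with: reported posts are turned into TF-IDF vectors and grouped with k-means so a reviewer can work through one topic at a time. The sample posts and cluster count are invented for illustration; a real system would operate on far more data and features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical batch of posts reported by users, awaiting human review.
posts = [
    "Vaccine shown to cause severe side effects, doctors stay silent",
    "New study finds vaccine side effects are extremely rare",
    "Election ballots found dumped in river, officials deny report",
    "Officials confirm all ballots were counted without incident",
]

# Represent each post as a TF-IDF vector so similar wording ends up close together.
vectors = TfidfVectorizer(stop_words="english").fit_transform(posts)

# Group the posts into topics; a fact-checker then triages cluster by cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for post, label in zip(posts, labels):
    print(f"topic {label}: {post}")
```

In a full pipeline, each cluster could additionally be annotated with stance labels and a predicted veracity score before being surfaced to the reviewer.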

In general, detecting fake news on social media follows two approaches: content-based or context-based. While the former focuses on analysing the text or media itself, the latter uses the additional data generated on social media, such as user interactions and the reach of the content, to classify postings as fake or real [10]. Content-based deception detection can exploit the fact that writing styles often differ between objective, truthful reporting and content created to deceive and manipulate [11]. Misinformative text often uses shorter sentences and simpler vocabulary, and is in general easier to comprehend. Handcrafting these features and feeding them into an SVM classifier has proven to work well for separating fake from real [11]. More advanced approaches forego the feature extraction completely, using a black-box RNN for end-to-end scoring of the quality of writing of a given text [12]. Context-based methods, on the other hand, do not use the text itself but process the data generated by users’ interactions with a post. For example, consider a posting on Facebook that has been manually reported by users as misinformation. Looking at which other posts were ‘liked’ by users who ‘liked’ the fake post allows one to fit a regression model that identifies further deceptive posts with high accuracy [13]. In addition to strictly content- or context-based methods, hybrid models using both types of data have also found success [6].

The strength of machine learning methods for these applications lies in their flexibility and scalability. Upon learning of the methods used to find fake news, bad actors might adapt their strategies. Because the models are learned rather than fixed rules, they can be retrained to effectively combat even those new strategies, while enabling large increases in processing speed compared to purely human classification.
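The content-based approach can be sketched in a few lines: hand-crafted stylistic features (here, average sentence length and vocabulary richness) feed a linear SVM, in the spirit of [11]. The toy corpus and the two features are stand-ins chosen for illustration, not the actual feature set from the paper.

```python
import re
from sklearn.svm import LinearSVC

def stylistic_features(text):
    """Map a text to two hand-crafted style features: average sentence
    length in words, and vocabulary richness (type-token ratio)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    avg_sentence_len = len(words) / len(sentences)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_sentence_len, type_token_ratio]

# Toy corpus: deceptive texts tend toward short, simple sentences.
texts = [
    "They lied. They all lied. Wake up. Share this now.",
    "You won't believe it. It's true. They hid it. Tell everyone.",
    "The committee's report, released on Tuesday, details the long "
    "regulatory process that preceded the agency's final decision.",
    "According to the peer-reviewed study, the observed effect was "
    "small and disappeared entirely after controlling for income.",
]
labels = [1, 1, 0, 0]  # 1 = deceptive, 0 = truthful

# Train a linear SVM on the extracted style features.
clf = LinearSVC().fit([stylistic_features(t) for t in texts], labels)

# Score an unseen snippet by its writing style alone.
print(clf.predict([stylistic_features("Big news. Huge. Spread the word.")]))
```

A production classifier would of course use many more features and training examples; the point is only that the deception signal in [11] is carried by the style of the writing, not by its topic.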

The problem with using machine learning against misinformation lies in it being an effective tool for both sides: detection as well as generation. With the recent surge of interest in generative machine learning models, fake content creation has taken a leap forward. Recent models can create texts, audio, images and video [2][15][16][17] that are well on their way to becoming completely indistinguishable from the real thing. While algorithms tend to fare better than humans at distinguishing fake from real, David Gunning, manager of a DARPA program to build systems that can spot such algorithmically created fakes, paints a bleak picture: “Theoretically, if you gave a GAN all the techniques we know to detect it, it could pass all of those techniques. We don’t know if there’s a limit. It’s unclear.” [18]

Machine learning can be a powerful tool for the scalable detection and mitigation of misinformation. It can also be a great enabler for the cheap generation of high-quality fake content designed to deceive and manipulate. Dealing with this tension requires the research and engineering community to work on two tasks at the same time: building powerful systems to find fake content, while acting responsibly and showing foresight about the potential implications of the research it publishes.

[0] Thies, Justus, Michael Zollhofer, Marc Stamminger, Christian Theobalt, and Matthias Niessner. “Face2Face: Real-Time Face Capture and Reenactment of RGB Videos,” 2387–95, 2016.

[1] Gibiansky, Andrew, Sercan Arik, Gregory Diamos, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, and Yanqi Zhou. “Deep Voice 2: Multi-Speaker Neural Text-to-Speech.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 2962–2970. Curran Associates, Inc., 2017.

[2] “Faux Rogan.” Accessed May 20, 2019.

[3] Jin, Zhiwei, Juan Cao, Yongdong Zhang, and Jiebo Luo. “News Verification by Exploiting Conflicting Social Viewpoints in Microblogs.” In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[4] Farajtabar, Mehrdad, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le Song, and Hongyuan Zha. “Fake News Mitigation via Point Process Based Intervention.” In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1097–1106. ICML’17., 2017.

[5] Chesney, Robert, and Danielle Keats Citron. “Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, July 14, 2018.

[6] Ruchansky, Natali, Sungyong Seo, and Yan Liu. “CSI: A Hybrid Deep Model for Fake News Detection.” In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 797–806. CIKM ’17. New York, NY, USA: ACM, 2017.

[7] “News Use Across Social Media Platforms 2016, Pew Research Center,” May 26, 2016.

[8] Nyhan, Brendan, and Jason Reifler. “When Corrections Fail: The Persistence of Political Misperceptions.” Political Behavior 32, no. 2 (June 1, 2010): 303–30.

[9] “Fighting Fake News and Deep Fakes with Machine Learning w/ Delip Rao - #259.” This Week in Machine Learning & AI (blog), May 3, 2019.

[10] Shu, Kai, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. “Fake News Detection on Social Media: A Data Mining Perspective.” SIGKDD Explor. Newsl. 19, no. 1 (September 2017): 22–36.

[11] Afroz, S., M. Brennan, and R. Greenstadt. “Detecting Hoaxes, Frauds, and Deception in Writing Style Online.” In 2012 IEEE Symposium on Security and Privacy, 461–75, 2012.

[12] Taghipour, Kaveh, and Hwee Tou Ng. “A Neural Approach to Automated Essay Scoring.” In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1882–1891, 2016.

[13] Tacchini, Eugenio, Gabriele Ballarin, Marco L. Della Vedova, Stefano Moret, and Luca de Alfaro. “Some Like It Hoax: Automated Fake News Detection in Social Networks.” ArXiv:1704.07506 [Cs], April 24, 2017.

[14] Vicario, Michela Del, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. “The Spreading of Misinformation Online.” Proceedings of the National Academy of Sciences 113, no. 3 (January 19, 2016): 554–59.

[15] “Better Language Models and Their Implications.” OpenAI, February 14, 2019.

[16] “Which Face Is Real?” Accessed May 20, 2019.

[17] Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. “Synthesizing Obama: Learning Lip Sync from Audio.” ACM Transactions on Graphics 36, no. 4 (July 20, 2017): 1–13.

[18] Knight, Will. “The US Military Is Funding an Effort to Catch Deepfakes and Other AI Trickery.” MIT Technology Review. Accessed May 20, 2019.

Fighting misinformation with Machine Learning - March 20, 2019 - Simon Boehm