The Court of Social Media: Leveraging Computational Methods to Predict Public Response to Sexual Misconduct News Stories


More than a handful of public figures have been accused of (and admitted to) sexual misconduct. Given the nature of how sexual misconduct stories are released to the public, the details of each transgression are generally not made equally explicit. Hence, it should not be surprising that social media sentiment and judgment concerning each sexual misconduct news story vary widely. Humans are quite capable of understanding and discerning different sentiments and judgments from one another. Computers, in contrast, are known to struggle with these tasks. However, the nature of their precise and algorithmic processing of information enables them to detect latent patterns that the human eye may miss. This raises the question of how we may leverage computational techniques to analyze how differing language features in separate sexual misconduct news articles affect public sentiment. To address this question, this honors project will use a specific computational method known as machine learning to determine latent language features that influence public opinion (as captured by social media). Understanding how distinct language features influence the public’s judgments on figures accused of sexual misconduct has important implications for how dialogue should move forward in contemporary feminist movements such as #TimesUp and #MeToo.
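As a concrete illustration of what "quantifying language features" can mean in practice, the sketch below builds simple bag-of-words counts for two invented article excerpts and surfaces the words that distinguish them. The excerpts, word lists, and function names are hypothetical examples for illustration only, not the project's actual data or method; a real machine learning pipeline would feed such counts (or richer features) into a statistical model.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, tokenize on letter runs, and count term frequencies."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Two invented article excerpts (illustrative only, not real news stories).
article_a = "The actor admitted to the misconduct and issued a public apology."
article_b = "The executive denied the allegations and called the claims baseless."

counts_a = word_counts(article_a)
counts_b = word_counts(article_b)

# Words appearing in one excerpt but not the other are candidate
# distinguishing features that a classifier could learn to weight.
distinct_a = set(counts_a) - set(counts_b)
distinct_b = set(counts_b) - set(counts_a)
print(sorted(distinct_a))  # e.g. includes 'admitted', 'apology'
print(sorted(distinct_b))  # e.g. includes 'denied', 'allegations'
```

Even this toy comparison hints at how framing words like "admitted" versus "denied" could correlate with different public reactions, which is the kind of latent pattern a learning algorithm can pick up at scale.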


  1. Jamie Hershaw says:

    Hi Hayden,

    I’m very excited about your research and am so encouraged to see a new generation of capable young women using big data to answer important questions. I am a cognitive neuroscientist, so my questions for you are influenced by that framework. You state that the patterns detected by various algorithms may be latent to human observers. Have you considered the (philosophical?) idea that these algorithms are detecting patterns created by humans, but that, as you say, are not detectable by humans? I find interesting the idea that we are using computers to understand patterns that are the result of human cognition, but cannot be deciphered by human cognition. On a somewhat related note, have you considered that what people post online (which I assume is one of your measured outcome variables for which you are trying to develop a prediction model?) is possibly heavily influenced by random noise/error related to psychosocial factors such as response bias, impression management, etc.? Also, I’m curious what your measures are, exactly.

  2. How interesting. For someone from the humanities, what does it mean to “use a specific computational method”? Are you quantifying the words that are used in a bunch of articles? Does “machine learning” imply that computers are learning the words that will create the most clicks, and including those? I can’t wait to learn more about your project.