In my last post, I talked about a project in which a friend and I scraped popular Twitter hashtags related to sexual violence and ran LDA document clustering and topic modeling on the results to see if any interesting patterns emerged. To be honest, it was pretty cool to watch an algorithm separate the users and tweets that supported victims of sexual violence from those that were less than empathetic.
The project has led me to question the utility of data science as it relates to social justice and digital activism – data scraping is cool, machine learning algorithms are fascinating, and meaningful outputs are interesting to talk about, but how does one take the knowledge derived from doing all this data sciencey stuff and actually do something USEFUL with it?
And by useful, I don’t mean the kind of data science that lets me tell HR how many people to fire to improve the bottom line or the kind that my professor gives me an A for because I made my data visualization look pretty. I’m talking about the “using this knowledge to make the world a better place” kind of data science.
So that leads me to the actual question that’s been bothering me for a while:
Does massaging Twitter content into categories do anything beneficial for survivors?
If not in its present state, could it?
This raises some really important questions about how I approach my future projects.
Is it responsible for me to collect data and apply algorithms without a clear direction or intention in mind? What if that direction is a really vague, “I wonder what will happen if…?” Is that approach a responsible way of avoiding self-fulfilling prophecy bias? Is bias in data science necessarily a bad thing?
Data science as a tool does not live inside a bubble. It is inherently outward-facing. According to every book, article, and MOOC out there, the point of it is to answer questions with data.
But what if that question isn’t useful?