Data Science, Project Description, Social Justice, Techy Babble

LinkNYC – a project and also why aren’t laypeople given the chance to understand the Internet?

I’ve become keenly aware of New York City’s aggressive, aptly-named “LinkNYC” campaign to revamp old, pee-soaked telephone booths by making them into free public Wi-Fi hotspots featuring advertised speeds of up to 1 Gbps.

Prompted by his response to my audible “holy shit,” I explained the situation (in hindsight, using unnecessarily technical electrical engineering terms like data rate and fiber optics) to my boyfriend (civil engineer turned brilliant artist/designer/philosopher).

His response: “huh?” A quick text to my family confirmed that, no, this type of thing is not, in fact, common knowledge.

Then literally the next day, I realized that I’d been seeing gigantic banners on the subway every morning that spelled out “1 GIG, THE INTERNET SPEED YOUR MOTHER WARNED YOU ABOUT.”

Whoa now copywriters (are those still the people who do ads? this is the one thing I remember from Mad Men), WATERYEWDEWING. I know the implications of that internet speed, and I DIDN’T EVEN NOTICE THE AD. How on earth is a person who hasn’t experienced the pleasures of communication theory, digital signal processing, and wireless communications courses supposed to have any idea of what that means or why they should care?
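
For a rough sense of scale, here is some back-of-the-envelope arithmetic in Python. Only the 1 Gbps figure comes from the ads; the 5 GB file size and the 25 Mbps comparison speed are illustrative assumptions, and the math ignores overhead and congestion, so treat it as a best case and not a measurement.

```python
def download_seconds(file_gigabytes, speed_mbps):
    """Ideal transfer time, ignoring protocol overhead, congestion, and Wi-Fi losses."""
    megabits = file_gigabytes * 8 * 1000  # 1 byte = 8 bits, 1 GB = 1000 MB
    return megabits / speed_mbps


# Hypothetical 5 GB file at two connection speeds.
# 25 Mbps is an illustrative comparison figure, not a measured one.
for label, mbps in [("25 Mbps connection", 25), ("1 Gbps (advertised max)", 1000)]:
    print(f"{label}: {download_seconds(5, mbps):.0f} seconds")
```

Under those assumptions, the same file that takes about 27 minutes at 25 Mbps takes 40 seconds at 1 Gbps.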

Isn’t it a bit strange that most of us spend a good portion of our time online, yet have nearly no understanding about what goes on beyond the router?

enter new project!

So thanks to that little revelation, aforementioned boyfriend and I are collaborating on the creation of an infographic that will bring some perspective to the initiative for the averagely techy New Yorker.

Steps:

  1. Decide to communicate the meanings of Internet speeds
  2. Do this by researching the Internet speed requirements of comparative use cases (e.g., Netflix vs. web browsing)
  3. Decide to also communicate the impact of LinkNYC on the prevalence of public wifi hotspots
  4. Do this by locating* a map detailing where non-LinkNYC and LinkNYC public wifi hotspots are located
    • *Make a map that details where non-LinkNYC and LinkNYC public wifi hotspots are located using NYC open data
  5. Realize that the spread of LinkNYCs on a map looks eerily similar to a gentrification map you saw one time
  6. Decide to also communicate the potential for implications of LinkNYCs and other public wifi locations with respect to class and race disparity
  7. Do this by planning for lots of approximations and printing and drawing to determine the number of non-LinkNYC and LinkNYC public hotspots per neighborhood divided into categories based on average income bracket
  8. Drink some wine
  9. Decide to go home, write a blog post, and go to bed.
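
If step 7 ever moves from printing and drawing to code, the counting part might look something like this. It is only a sketch: the rows are made up, and the field names (`provider`, `income_bracket`) are hypothetical stand-ins for whatever the NYC open data and income figures actually provide.

```python
from collections import Counter

# Made-up sample rows. Real ones would come from NYC open data joined with
# neighborhood income figures; every field name here is hypothetical.
hotspots = [
    {"neighborhood": "Neighborhood A", "provider": "LinkNYC", "income_bracket": "high"},
    {"neighborhood": "Neighborhood A", "provider": "LinkNYC", "income_bracket": "high"},
    {"neighborhood": "Neighborhood B", "provider": "other", "income_bracket": "low"},
    {"neighborhood": "Neighborhood C", "provider": "LinkNYC", "income_bracket": "low"},
]

# Count hotspots for each (income bracket, provider) pair.
counts = Counter((h["income_bracket"], h["provider"]) for h in hotspots)

for (bracket, provider), n in sorted(counts.items()):
    print(f"{bracket:>4} income | {provider:<7} | {n}")
```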

Stay tuned for our infographic!

Brain Barf, Data Science

concentric circles and quantum physics

separating the tools from the skills
danger
data science the capitalist mindset repurposement
– it arose from cheap computing making it economically viable which basically means that human created
flashes of voltage inside a human created container electrical pulses from the brain to the blah which in turn
has the potential to widespread effect the electrical pulses in the heads of other humans and alter life
courses it is like an engine
DATA
packages of already existing information
morphing them into other domains
layers
commodification
how can
barriers in understanding and communication of knowledge
data science’s box flexing pushing reaching \\ framework —> flesh out the knowledge part
just because the person doing it understands the math behind it (but let’s be real do they actually every time does it matter it still has the potential to affect tons of people) stakes, what’s the point – to be right? to be closer to right? result-oriented society
process is hidden proprietary
rote memorization vs coming to conclusions by oneself
can’t tell people how to think well you can, but then it’s not about communication information its about force feeding
show vs tell —> art
sharing of experience to meld them
the best way to do this is to superposition oneself and experience everything with everyone but that’s not possible yet
what’s the next best thing
deductive reasoning tunnel vision vs exploratory curiosity inductive reasoning
take experience distill package specifically total hit or miss competence totally elastic vs ineffective collisions which makes sense if the goal is specific and oversimplified and sanitized but that’s machine-like overfit to a specific purpose
reach a bigger audience
not just those who understand algorithms
shared experience of and understanding of selves through augmented intelligence
because what do machine learning algorithms do if not mimic the brain and augment existing connections by bringing to light new connections and patterns
its up to the artist to interpret and create
its a tool an aide a new medium through which a wave travels/information is expressed
art should be revelatory, accessible, facilitate the creation of mutual connections among people
FUCK
medium
agitation
amplify a pain point
create a collective experience
stretch the rubber band
feel alive
embrace frustration
creation of a model of experience understandable by all in some sense
FEEL something
experience existence (live in the moment?)
form of attempting to understand anything
layer by layer
peeling back
superficiality
pain
feeling jolt electric
FRICTION
guide the observer through the layers
packaged answer/result – who knows where that exists in this space?
the key is to make people want to continue the process
guide without providing an end because there isn’t one
a rubber band longer than your stretched out body
forced attempted understanding of one another
doesn’t feel forced – relative – trickery – insidious
force isn’t bad, just implies a resistance
nothing is resistanceless
rubber band —> pulling as opposed to pushing
closer to one another’s shared equilibrium
fucking planets
create openness towards one another – remove friction/gravity?
no because then we would never stop to understand
Brain Barf, Cooper Union, Data Science

On data science and how I see its relationship to art, philosophy, and my experience as an engineering student at Cooper Union, Part 1

As I hurtle towards the inevitable – my college graduation in May 2017 – I’ve been thinking a lot about the purpose of my education, its place in the greater context of my life, and the way it intersects with my place in the world.

I came to Cooper Union essentially by luck, driven by the same things as most high-achieving students proficient in math and science my age – grades, class rank, the perception that those above me in the hierarchies of family and scholarship knew what was best, that the ultimate purpose was to be the Best, do the Best, whatever the hell that even meant. So I ended up applying to engineering school because it fit the persona I felt I was “supposed” to be (smart, useful, wealthy) and thought I wanted to be. Cooper was free, and falling in line with my tendency towards risk-aversion (failure means weakness, and weakness is unacceptable!), that too seemed like the Right thing to do. So I went to Cooper Union.

Like many of my classmates, I was intoxicated by the idea that STEM was a superior set of fields – we were “smart”, we got good grades (the ultimate validation), we enlightened beings understood that the only correct way to look at the world was through the eyes of logic, reason, and rationality. We were – ironically enough – objective zealots.

Except it isn’t in the slightest. Engineering in the way that it’s been presented to me by many of my professors and peers – an overwhelming series of theory-dense courses that reward rote memorization and the ability to perform well under arbitrary pressure – is anything but superior. Like the education of my early and formative years did, it shapes directionless students into 4.0-hungry followers, suppresses recalcitrance, and stifles original thought. Of course it does – most of them (us?) have been raised with similar value systems that we swallowed without question – most of them don’t seem to have thoughts outside their field of study or quest for some nebulous sense of ‘success.’

The only time I’ve found myself to be truly happy/thoughtful in my time at Cooper – and I’m not talking about the spikes of adrenaline that accompany the feeling of checking my semester grades – is when I struggle to make sense of something, only to come to the realization that my original perception of the concept or idea in question is missing something. Some examples I can pinpoint: figuring out how cell towers work or experiencing critiques in the class that utterly upended my life.

Last semester I took an art class that challenged the way I saw myself in the classroom setting and totally altered my perceptions of what it means to have “a successful education.” For the first time, I was surrounded by people (artists) who all seemed to have passions and practices that drove their educations, instead of vice versa. There was no ‘right’ answer to find in the solutions manual; the point was not to smile and speak up in class and do the assignments so that the professor would like me and give me an A or a glowing recommendation so that I could get a job and make lots of money and retire in a house with a garage and some dogs. As someone incredibly comforted by following the rules and the paths of other people to avoid discomfort and failure, this class was a shock to my system.

For the first time in my life, I was forced to think for myself. “Bullshitting” a project, as my engineering peers call the execution of an assignment with the minimal amount of work and receiving a stellar final grade, was not a badge of honor anymore. Because making art isn’t about the grade you get when you present a finished work at critique. It’s about how a thought process is explored and questioned and expressed and critiqued, but it’s also about the fact that nothing is ever finished or answered. I assert these things about art, but to be completely honest, my experiences and perceptions of it are constantly changing and will never reach a deterministic truth. It’s exhilarating.

To be continued…

Brain Barf, Cooper Union, Data Science, Project Description

2017 is the year of realizing things: some half-baked project ideas…

kylie_jenner = pandas.DataFrame(['KUWTK', 'Kylie Kosmetics', 'Tyga4ever', 'why is there a python in this picture?'])

Kylie Jenner said that 2016 was the year of realizing things, but I’d bet Cooper Union’s sticker price (TOO SOON) that she wasn’t referring to the illuminating experience of learning Python for fun over winter break. Yes, I realize that Kylie has lip kits and white Ferraris to focus on, but girl should check out pandas dataframes if she really wants to live.

In an attempt to get over myself and the resulting self-doubt and stubbornness that made me think I wasn’t capable of programming and therefore terrified of failing at it, I spent the last three weeks crash-coursing myself in Python and all of its very awesomely intuitive data science packages.

EdX is great – check out some of their ‘Python for Data Science’ courses if you’re trying to teach yourself to code and have some solid self-discipline to keep you going.

Now that I’m proficient in numpy, pandas, matplotlib, and scikit-learn, I’ve seen the light that is data manipulation/machine learning with Python and have all the regrets that I tried to do all my Statistical Learning assignments last semester in MATLAB. *shudders*

This is cool. Now I should make some cool things that attempt to answer some cool questions.

So if you read my aptly-titled ‘Brain Barf’ post, you know that I have all the feels about doing projects that fulfill my arbitrary standard of what is valuable and useful. Are those feels (and that post) just my thinly-veiled insecurities about never being good enough? Probably. Like I said, working on getting over myself.

I’ve come to terms with the fact that right now it is most valuable for me to practice my skills on projects that challenge the way that I think; doing significant things that change the world will come later when my skill level and mental elasticity get there.

So right now, I’m planning on doing projects related to some questions that I’ve jotted down in my notes recently:

What will happen if Congress defunds Planned Parenthood?

Yes I realize that this is a massive question, but I’m curious about the relationships between maternal death rates, infant and fetal mortality, and crime rates, among other things, and how they’ve changed since Planned Parenthood started offering abortion services in 1970. I also wonder if there’s a significant difference in the trends of graduation rates, suicide rates, and quality of life over that period of time between the biological sexes (namely the male and female sexes, as intersex data is largely unavailable).

The hardest part of this project will probably be the data collection. Some of the features that I’m interested in analyzing are readily available in nice clean datasets, but many (including some of the features I have yet to think of), are not.

What type of brown ale should we brew next?

For those of you following along at home, some important context:

  • I’m a senior studying electrical engineering at Cooper Union (More About Me!)
  • I helped start an interdisciplinary independent study in beer brewing last semester.
  • We brewed some delicious stouts (milk and imperial), an IPA (session), a blonde ale, and a brown ale.

This semester, we’re continuing to brew for fun even though there aren’t credits involved, and we’re trying to refine our process and clone our favorite beers.

BeerAdvocate, a noted beer review website that we use for reference, has reviews for 2677 different brown ales alone. As tempting as it may be, it’s not feasible for our class of 5 people to try 2677 different brown ales before deciding which one to clone.


Enter data science! My brewing professor found a gigabyte worth of scraped beer reviews (YASSSSS I don’t have to deal with scraping!!!) that I can do some text analysis on. The preliminary plan is to look for themes in the reviews and determine the ones that match with our class’s verbal description of the type of brown ale we’d like to brew.
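
One very simple way to start, sketched below, is to score each review by how many words it shares with the class’s description. This is plain bag-of-words cosine similarity, a deliberately simpler stand-in for real theme discovery, and the beer names, review text, and description are all made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical reviews and a hypothetical class description.
reviews = {
    "Beer X": "nutty caramel malt with a smooth toasty finish",
    "Beer Y": "thin and watery with harsh bitter hops",
}
wanted = bag_of_words("smooth nutty brown ale with caramel notes")

# Rank beers from most to least similar to the description.
ranked = sorted(
    reviews, key=lambda name: cosine(bag_of_words(reviews[name]), wanted), reverse=True
)
print(ranked)
```

With a gigabyte of real reviews, common words like “with” would dominate the scores, so some form of stop word removal or TF-IDF weighting would probably be the next step.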

Anyhoo, that’s all I got for now. Will keep y’all posted with project updates and intermittent existential crises!
Brain Barf, Data Science

#NowWhat? and other unfinished bits of brain barf…

In my last post, I talked about a project in which a friend and I scraped popular Twitter hashtags related to sexual violence and performed LDA document clustering and topic modeling on them to see if any interesting patterns emerged. To be honest, it was pretty cool to see an algorithm differentiate users/tweets that supported victims of sexual violence from those who were less than empathetic.

The project has led me to question the utility of data science as it relates to social justice and digital activism – data scraping is cool, machine learning algorithms are fascinating, and meaningful outputs are interesting to talk about, but how does one take the knowledge derived from doing all this data sciencey stuff and actually do something USEFUL with it?

And by useful, I don’t mean the kind of data science that lets me tell HR how many people to fire to improve the bottom line or the kind that my professor gives me an A for because I made my data visualization look pretty. I’m talking about the “using this knowledge to make the world a better place” kind of data science.

So that leads me to the actual question that’s been bothering me for a while:

Does massaging Twitter content into categories do anything beneficial for survivors?

If not in its present state, could it?

This raises some really important questions about how I approach my future projects.

Is it responsible for me to collect data and apply algorithms without a clear direction or intention in mind? What if that direction is a really vague, “I wonder what will happen if…?” Is that approach a responsible way of avoiding self-fulfilling prophecy bias? Is bias in data science necessarily a bad thing?

Data science as a tool does not live inside a bubble. It is inherently outward-facing. According to every book, article, and MOOC out there, the point of it is to answer data-driven questions.

But what if that question isn’t useful?

Cooper Union, Data Science, Project Description, Social Justice, Techy Babble

import #SexualViolence from MachineLearning.LDA

This blog post was originally the author’s writeup for a project she did with her classmate Jason Katz for ECE411: Statistical Learning during the Fall 2016 semester at Cooper Union.

Lots of surreal things happened in 2016…

Harambe, Prince, Muhammad Ali, David Bowie, Harper Lee, Leonard Cohen, Gene Wilder, Alan Rickman, and Zsa Zsa Gabor left us. Brexit happened. Donald Trump got elected president. Flint, Michigan STILL doesn’t have clean water. You get my point.

Fueled by Donald Trump’s misogynistic statements, the many women who have accused him of sexual assault, and his own admission of sexual assault that went viral during the 2016 presidential race, the topic of sexual violence has situated itself at the forefront of national conversation among politicians, pundits, and prominent feminist writers alike. This is just a small segment of the much larger discussion about affirmative consent, Title IX, the rape crisis on college campuses, victim blaming, and slut shaming that has been going on for years.

Ok, but what does Twitter have to do with this?

On one hand, it’s great – albeit REALLY overdue – that the news media finally recognizes sexual violence is an issue that warrants substantial airtime. But basically everything on TV that deals with rape culture or misogyny has been neatly packaged into an emotionally flat or intentionally misconstrued block of information to fit into the two second space allotted to it. Because profit.

Screen Shot 2016-12-20 at 3.13.05 PM.png

Case in point.

Twitter, on the other hand, is a gold mine of publicly available, unfiltered visceral reactions. For many survivors of sexual violence, it’s a great outlet for self-therapy in the form of all caps vent rants but also empowerment via an informal support network of people who also tweet about their experiences with similar trauma.

Coming at it from the perspective of a DSWFSAGJFVOSV (data scientist who feels strongly about getting justice for victims of sexual violence), I had a hunch that “survivor Twitter” might have a lot to tell us. 

the techy stuff

Inspired by the hashtags #RapedAtSpelman and #RapedAtMorehouse, which went viral in response to an anonymous account set up by a rape survivor at Spelman College, we used Twitter’s REST API to scrape trending hashtags related to sexual violence and promoted by activists who are sympathetic towards survivors and survivor justice, including:

#whywomendontreport

#notokay

#whyistayed

#rapecultureiswhen

#webelieveyou

#tilithappenstoyou

#askingforit

#thehuntingground

#rapeculture

#listentosurvivors

#thingslongerthanbrockturnersrapesentence

#endrapeculture

#stanfordrapist

#sexualviolence

#EmilyDoe

You get it?

Twitter does this annoying thing where it only lets you scrape a certain number of tweets from the past week, so we had to run our scraping script every couple of days to amass a sufficient collection of tweets.
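
Running the same search every couple of days means the batches overlap, so they have to be merged without double-counting. A minimal sketch of that merge, assuming each scraped tweet is a dict with a unique `id` field (the sample tweets and the function name are hypothetical, not our actual script):

```python
def merge_batches(batches):
    """Combine scraping runs, keeping the first copy of each tweet id."""
    seen = {}
    for batch in batches:
        for tweet in batch:
            seen.setdefault(tweet["id"], tweet)
    return list(seen.values())


# Two hypothetical runs a few days apart; tweet 2 shows up in both.
monday = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
thursday = [{"id": 2, "text": "second"}, {"id": 3, "text": "third"}]

all_tweets = merge_batches([monday, thursday])
print(len(all_tweets))
```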

My fabulous partner did a ton (read: 25 hours) of pre-processing to turn the tweets into parseable text, which we then fed into a latent Dirichlet allocation (LDA) code base.
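
The real pre-processing was far more involved than this, but to give a flavor of what “parseable text” means, here is a toy cleaner. The specific rules (drop links, drop the retweet marker, keep @handles, strip punctuation) are assumptions for illustration, not a record of what our pipeline did, and the sample tweet is invented.

```python
import re


def clean_tweet(text):
    """Turn a raw tweet into a list of lowercase tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"^rt\s+", "", text)          # drop a leading retweet marker
    text = re.sub(r"[^a-z0-9@_\s]", " ", text)  # keep letters, digits, and @handles
    return text.split()


# Invented example tweet.
print(clean_tweet("RT @someuser: We believe you. #WeBelieveYou https://example.com/x"))
```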

LDA for the Layperson (CUTE HAMSTERS AHEAD!)

LDA is a well-known method of topic modeling, which in machine learning (ML) and natural language processing (NLP) refers to a statistical model that discovers ‘topics’ that characterize a collection of documents, or in this case, tweets. According to LDA, a document is merely a mixture of topics, where each topic has some probability of generating each specific word.

Suppose you have a bunch of sentences (or tweets).

  • I am tired because I have a lot of finals this week.
  • Is my finals-related sleep deprivation apparent yet?
  • Sometimes I go to animal shelters and pet kittens to avoid my responsibilities.
  • Baby animal videos are best when they include kittens.
  • Look at this cute hamster munching on a piece of bok choy that I found while procrastinating my stat learning blog post assignment.

If we give our LDA these sentences and ask it to determine 2 topics, LDA might produce something like this:

  • Sentences 1 and 2: 100% Topic A
  • Sentences 3 and 4: 100% Topic B
  • Sentence 5: 60% Topic A, 40% Topic B
  • Topic A: 30% sleep, 15% tired, 10% finals, 10% deprivation, … (at which point you could call topic A finals week)
  • Topic B: 20% animal, 20% kittens, 20% cute, 15% hamster, … (at which point you could call topic B cute animals)

LDA essentially represents tweets as mixtures of topics that spit out words with certain probabilities. It assumes that the corpus of tweets is produced as follows:

  • decide on the number of words N the document will have
  • choose a topic mixture for the tweet depending on the number of topics the user (us!) asks for
  • generate each word in the tweet by
    • picking a topic
    • then using the topic to generate the word from the topic’s set of words
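
The recipe above can be acted out in a few lines of Python. This is a simplified sketch: the two topics and their word probabilities are made-up toy numbers, and the word count and topic mixture are fixed by hand here, whereas full LDA draws them from probability distributions.

```python
import random

# Toy topics: each maps words to probabilities that sum to 1 (made-up numbers).
topics = {
    "finals week": {"sleep": 0.4, "tired": 0.3, "finals": 0.3},
    "cute animals": {"kittens": 0.4, "hamster": 0.3, "cute": 0.3},
}


def generate_tweet(n_words, topic_mixture, rng):
    """Generate words by picking a topic, then a word from that topic."""
    words = []
    for _ in range(n_words):
        topic = rng.choices(list(topic_mixture), weights=list(topic_mixture.values()))[0]
        word_probs = topics[topic]
        words.append(rng.choices(list(word_probs), weights=list(word_probs.values()))[0])
    return words


rng = random.Random(0)  # seeded so the run is repeatable
tweet = generate_tweet(6, {"finals week": 0.6, "cute animals": 0.4}, rng)
print(tweet)
```

The output is word salad, which is the point: LDA does not care about word order, only about which words tend to show up together.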

From here, LDA does fancy math to determine a set of latent topics that is likely to have generated the corpus of tweets. Our code in particular outputs the top n words from each topic that have the highest probability of being “chosen.”
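
For the curious, one flavor of that fancy math is collapsed Gibbs sampling: repeatedly guess a topic for every word, using the guesses for all the other words as evidence. The sketch below is a bare-bones version written for illustration. It is not the code base we used, and the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                       # total words assigned to topic k
    assign = []
    # Start by giving every word a random topic.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        assign.append(z)
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this word's current assignment from the counts...
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample its topic given every other assignment.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n):
    """The n most frequently assigned words in each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Toy corpus loosely based on the finals and kittens sentences.
docs = [
    "tired finals sleep finals".split(),
    "sleep deprivation finals tired".split(),
    "kittens cute animal kittens".split(),
    "cute hamster animal kittens".split(),
]
topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word, 3))
```

On a corpus this tiny the topics may or may not separate cleanly; with thousands of tweets the counts have far more to work with.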

LDA is cool. Be like LDA.

Screen Shot 2016-12-20 at 3.14.48 PM.png

This is confusing so maybe this picture will help.

We played around with the number of topics and number of top words per topic. When our code resulted in useless words like ‘day’ and ‘introduced,’ we added them to our list of NLTK stop words. By stop words, we mean common words that are filtered out before NLP happens because they are meaningless in the context of topic modeling.
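
In code, stop word filtering is about as simple as it sounds. In this sketch a tiny hand-written set stands in for NLTK’s English stop word list (an assumption made to keep the example self-contained), and the custom set holds the two words mentioned above.

```python
# A tiny hand-written stand-in for NLTK's English stop word list.
BASE_STOP_WORDS = {"the", "a", "and", "is", "of", "to"}

# Words added after they showed up as useless "top words" in a topic.
CUSTOM_STOP_WORDS = {"day", "introduced"}


def remove_stop_words(tokens, stop_words=BASE_STOP_WORDS | CUSTOM_STOP_WORDS):
    """Drop any token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


print(remove_stop_words("the bill was introduced to congress".split()))
```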

So….what happened?

We got the best results (read: most distinct categories) when we fit LDA models with 3 topics and asked for the top 10 words in each topic:

Screen Shot 2016-12-20 at 3.16.01 PM.png

I’d like to direct your attention to a couple of interesting things that popped up.


Topic #1: Self-Proclaimed Deplorables

screen-shot-2016-12-20-at-3-00-35-pm

America, MAGA, norefugees, obama, nightmares, deport, and DREAMers all point to a sphere of Twitter inhabited by Donald Trump supporters who don’t like the current political landscape of the United States and dislike immigrants and refugees. @lrihendry, in particular, refers to popular alt-right Twitter user Lori Hendry (check out the Pepe the Frog emoji in her profile), and qgkmkkhrxu refers to the image name of a heavily retweeted xenophobic, anti-Muslim meme that she tagged with #rapeCulture.

screen-shot-2016-12-19-at-11-20-08-pmscreen-shot-2016-12-19-at-11-19-58-pm

Lori kind of sucks. But she did retweet this vine so at least she’s got that going for her.

It’s a bit disconcerting that our scraping picked up enough tweets from ironic coopters of the hashtags #rapeculture and #notokay for LDA to give them their own separate category of tweets.

Many of these tweets were retweets of @lrihendry’s meme, hence the top words in the category, but there were a few other tweets that referenced the “double standard of rape culture” and “toxic femininity,” a riff on the popular hashtag #toxicmasculinity.


Topic #2: Sexual Violence and Twiplomacy

screen-shot-2016-12-20-at-3-00-42-pm

Schools, repspeier, transfer, and safe refer to The Safe Transfer Act, proposed two weeks ago by California Representative Jackie Speier, which requires notation on the academic transcript of any student found by their college or university to have violated the school’s rules or policies with regard to sexual violence. Despite being a follower of many prominent feminist Twitter accounts, I had no knowledge of this piece of legislation nor of Representative Speier prior to this project.

screen-shot-2016-12-20-at-12-03-57-amscreen-shot-2016-12-20-at-12-03-43-am

Heartbeatbill, man, domestic, and violence come from Twitter user @femtheologian Dr. Gina Messina, a professor and Huffington Post blogger who expressed her anger with a proposed fetal heartbeat bill in Ohio. Domestic violence, in particular, relates to her tweet about Ohio State Senator Kris Jordan, who Messina claims has a history of domestic violence.

screen-shot-2016-12-19-at-11-54-51-pmscreen-shot-2016-12-19-at-11-54-17-pm

In these cases, LDA teased out tweets specifically related to publicly contentious pieces of legislation related to Title IX and sexual violence.


Topic #3: “@kellyoxford: tweet me your first assaults. they aren’t just stats.”

screen-shot-2016-12-20-at-3-00-50-pm

In this category, @kellyoxford, sexualassault, rape, consent, and survivors clearly relate to Twitter user Kelly Oxford, the New York Times bestselling author whose #notokay tweet encouraging women to share their experiences with sexual violence went viral.

screen-shot-2016-12-20-at-2-59-45-pm

Screen Shot 2016-12-20 at 3.04.53 PM.png

Screen Shot 2016-12-20 at 3.58.06 PM.png


@takedownmras refers to a Twitter account run by a group of men who actively disagree with men’s rights activists (MRAs) and work to support survivors of sexual violence. Similar to the accounts that popped up in the last topic, I had no prior knowledge of @takedownmras, but am very glad to have found them.

Screen Shot 2016-12-20 at 4.52.39 PM.png

This category affirms our initial claim that people who have experienced sexual assault use Twitter as a way to form a mutual support network with other survivors. The results are promising; machine learning algorithms may very well be a good way to facilitate more of these connections.

Now what?

This project was only a brief foray into the world of scraping and analyzing Twitter data.

We were under the impression that we understood “survivor Twitter.” Machine learning in the form of LDA helped us take viral hashtags we were familiar with and derive totally new knowledge from them, namely a parallel use of the viral hashtag #rapeCulture and our discovery of politicians for whom survivor justice is a priority.

Given our experiences (and interesting results!), we think there is definitely more to explore in the intersection of Twitter and sexual violence activism. One potential application of LDA is to use it to make it easier for survivors of sexual violence to find each other and the representatives who advocate for them, while avoiding people whose use of these hashtags might do more harm than good for survivors.
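
As a very rough sketch of what that could look like: once a model has assigned every tweet a topic mixture, keeping only the tweets dominated by the supportive topic is a one-line filter. The topic labels, handles, and percentages below are all hypothetical, and a real tool would need far more care than a single threshold-free filter.

```python
def dominant_topic(mixture):
    """Return the topic with the largest share of a tweet's mixture."""
    return max(mixture, key=mixture.get)


# Hypothetical per-tweet topic mixtures, as a fitted LDA model might output.
tweets = [
    {"user": "@user_a", "mixture": {"support": 0.7, "co-opting": 0.1, "legislation": 0.2}},
    {"user": "@user_b", "mixture": {"support": 0.1, "co-opting": 0.8, "legislation": 0.1}},
    {"user": "@user_c", "mixture": {"support": 0.3, "co-opting": 0.1, "legislation": 0.6}},
]

# Keep only the accounts whose tweets are mostly about supporting survivors.
supportive = [t["user"] for t in tweets if dominant_topic(t["mixture"]) == "support"]
print(supportive)
```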