Tag Archives: big data

Obama Data Mining Team

Data Mining the Electorate

The New York Times Magazine ran a cover piece on the Obama campaign’s data-mining team, which used modern data-mining techniques to target, more efficiently, the undecided voters it needed to bring over the fence to win the election. Check out the last line (emphasis mine) on their clever use of Facebook photo tags to further refine their targeting by working out who your real friends were. If any of your close friends turned up on their “undecided” list, the campaign would put them on a list of friends for you to ask to vote for Obama.

They started with a list that grew to a million people who had signed into the campaign Web site through Facebook. When people opted to do so, they were met with a prompt asking to grant the campaign permission to scan their Facebook friends lists, their photos and other personal information. In another prompt, the campaign asked for access to the users’ Facebook news feeds, which 25 percent declined, St. Clair said.

Once permission was granted, the campaign had access to millions of names and faces they could match against their lists of persuadable voters, potential donors, unregistered voters and so on. “It would take us 5 to 10 seconds to get a friends list and match it against the voter list,” St. Clair said. They found matches about 50 percent of the time, he said. But the campaign’s ultimate goal was to deputize the closest Obama-supporting friends of voters who were wavering in their affections for the president. “We would grab the top 50 you were most active with and then crawl their wall” to figure out who were most likely to be their real-life friends, not just casual Facebook acquaintances. St. Clair, a former high-school marching-band member who now wears a leather Diesel jacket, explained: “We asked to see photos but really we were looking for who were tagged in photos with you, which was a really great way to dredge up old college friends — and ex-girlfriends,” he said.

- Data You Can Believe In
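The matching St. Clair describes, comparing a friends list against a voter file and then ranking friends by how often they show up in photos with you, can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: the record fields (`id`, `name`, `city`, `status`), matching on name plus city, and all the function names are hypothetical. The article doesn’t say how the campaign actually matched records or scored closeness.

```python
from collections import Counter

def undecided_friend_ids(friends, voter_file):
    """Match a friends list against a voter file on a (name, city) key.

    Assumption: exact, case-insensitive name+city matching is a hypothetical
    stand-in; the campaign's real matching rules are not described.
    """
    undecided = {
        (v["name"].lower(), v["city"].lower())
        for v in voter_file
        if v["status"] == "undecided"
    }
    return {
        f["id"] for f in friends
        if (f["name"].lower(), f["city"].lower()) in undecided
    }

def closest_friends(photo_tags, user_id, top_n=50):
    """Rank people by how many photos they are tagged in together with user_id."""
    counts = Counter()
    for tagged in photo_tags:  # each photo is a set of tagged user ids
        if user_id in tagged:
            counts.update(t for t in tagged if t != user_id)
    return [pid for pid, _ in counts.most_common(top_n)]

def friends_to_ask(user_id, friends, voter_file, photo_tags, top_n=50):
    """Close friends (by photo co-tags) who are also undecided voters."""
    undecided = undecided_friend_ids(friends, voter_file)
    return [pid for pid in closest_friends(photo_tags, user_id, top_n) if pid in undecided]

if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    friends = [
        {"id": "alice", "name": "Alice Ng", "city": "Columbus"},
        {"id": "bob", "name": "Bob Ruiz", "city": "Dayton"},
        {"id": "dave", "name": "Dave Ito", "city": "Akron"},
    ]
    voter_file = [
        {"name": "alice ng", "city": "columbus", "status": "undecided"},
        {"name": "Bob Ruiz", "city": "Dayton", "status": "supporter"},
        {"name": "Dave Ito", "city": "Akron", "status": "undecided"},
    ]
    photo_tags = [{"me", "alice", "bob"}, {"me", "alice"}, {"bob", "carol"}, {"me", "dave"}]
    print(friends_to_ask("me", friends, voter_file, photo_tags))  # ['alice', 'dave']
```

Photo co-tags work as a closeness signal because a tag is a record of being physically in the same place, which is exactly what separates “real-life friends” from casual Facebook acquaintances.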

twitter-geo-tokyo

Digital Contrails

What do you do when you have access to the Twitter firehose and a top-notch geo-visualization artist? Make beautiful maps, of course! Gnip and Eric Fischer teamed up with MapBox to plot millions of tweets by location, language, and device, producing some fantastic interactive maps.

The map above shows Tokyo. Blue dots mark geotagged tweets from people whose tweet history identifies them as locals; red dots mark “tourists” who normally tweet from somewhere outside the region. The map tells you a couple of things.

  1. Most tourists are tweeting (photo-sharing?) from the major city centers. I can recognize Shibuya, Shinjuku, Marunouchi, Yokohama, Ueno, Ikebukuro, maybe the Rainbow Bridge?
  2. If you’re familiar with Tokyo, you can see that people tend to tweet while on the train.
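The post doesn’t spell out the exact rule Gnip and Fischer used to separate locals from tourists, so here is a hypothetical heuristic in Python that captures the idea: treat the region a user tweets from most often as their home, then label each tweet in the mapped region as local if it matches that home and tourist otherwise. The `(user, region)` input format and the function names are assumptions for illustration, not the maps’ actual method.

```python
from collections import Counter, defaultdict

def home_regions(tweets):
    """Map each user to the region they tweet from most often.

    tweets: list of hypothetical (user, region) pairs, already binned by region.
    """
    per_user = defaultdict(Counter)
    for user, region in tweets:
        per_user[user][region] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}

def label_tweets(tweets, region):
    """Label each tweet posted in `region` as 'local' (blue) or 'tourist' (red)."""
    home = home_regions(tweets)
    return [
        (user, "local" if home[user] == region else "tourist")
        for user, r in tweets
        if r == region
    ]

if __name__ == "__main__":
    # Made-up example: kenji mostly tweets from Tokyo, amy mostly from SF.
    tweets = [("kenji", "tokyo"), ("kenji", "tokyo"), ("kenji", "osaka"),
              ("amy", "sf"), ("amy", "sf"), ("amy", "tokyo")]
    print(label_tweets(tweets, "tokyo"))
    # [('kenji', 'local'), ('kenji', 'local'), ('amy', 'tourist')]
```

A single “most tweets” rule is crude (a heavy traveler could be misfiled), which is why real classifiers usually also look at how long someone keeps tweeting from a place.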

This second point reminds me of something I read in Wired a couple of years ago. In an experiment, researchers placed oat flakes in a pattern resembling the major city centers around Tokyo. They then placed a culture of slime mold in the middle and let it work out how best to harvest, or “move around,” the oat flakes across the pattern. The mold grew a network of tubes that closely matched the layout of the metropolitan rail system.

What works at a large scale also shows up as a pattern at a much smaller one.

slime mold

I’ve written about Location Traces as Art before. Even before the crazy NSA/Snowden tracking scandal broke, it was well known that the phone companies held a wealth of data about us. Aggregated en masse on platforms such as Twitter, this data can paint a pretty amazing picture of the world around us. Here are a couple more maps from the Gnip/Fischer/MapBox collaboration.

twitter-devices

It’s a little hard to see, but this map of the world shows which type of Twitter client each tweet was sent from. Red is iPhone, green is Android, and purple is BlackBerry. Spain looks big on Android (for Twitter, anyway), while Saudi Arabia, Mexico, and Southeast Asia are BlackBerry strongholds (where BBM is huge).
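The map itself colors each individual tweet by client, but a reading like “Spain is big on Android” is really a per-region tally. A minimal sketch of that tally in Python, assuming tweets arrive as hypothetical `(region, client)` pairs and reusing the legend colors described above (any other client falls back to gray, an arbitrary choice of mine):

```python
from collections import Counter, defaultdict

# Legend colors as described for the map; the input format is hypothetical.
COLORS = {"iPhone": "red", "Android": "green", "BlackBerry": "purple"}

def dominant_client(tweets):
    """Return {region: (client, color)} for the most-used client in each region."""
    by_region = defaultdict(Counter)
    for region, client in tweets:
        by_region[region][client] += 1
    result = {}
    for region, counts in by_region.items():
        client = counts.most_common(1)[0][0]
        result[region] = (client, COLORS.get(client, "gray"))
    return result

if __name__ == "__main__":
    # Made-up sample, not real Gnip data.
    sample = [("ES", "Android"), ("ES", "Android"), ("ES", "iPhone"),
              ("SA", "BlackBerry"), ("SA", "iPhone"), ("SA", "BlackBerry")]
    print(dominant_client(sample))
    # {'ES': ('Android', 'green'), 'SA': ('BlackBerry', 'purple')}
```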

bayareadevices

If we zoom in on my neighborhood, you can see that I mostly live in an iPhone town, except for Oakland/San Leandro, which leans more toward Android. I know what you’re thinking: The Atlantic already wrote about it. Lots of green usually signifies a less affluent area.

Fascinating stuff.

Nate Silver

Revenge of the Nerds

While everyone talked about New York Times blogger Nate Silver’s uncanny, almost witchlike ability to call the election last night, the real winner was big data and smart algorithms, which triumphed over gut feel and egos.

Those in tech who have been following Nate Silver’s FiveThirtyEight blog at the New York Times broke out in collective high-fives when FiveThirtyEight finished the evening having correctly called all 50 states (besting his 2008 call of 49 out of 50). A baseball statistics geek, Mr. Silver turned to politics, treating the aggregation of state and national polls as a playground of data ripe for his insights. Traditional polling agencies such as Gallup accuse Nate Silver of standing on their backs and taking all the glory (1 in 5 visits to nytimes.com included a stop at FiveThirtyEight). Their complaint is one we’ve heard before: without their original polling data, Nate would have nothing to aggregate.

Sounds like what the newspapers used to say about Google News.

But in reality it’s more than just aggregation. Nate Silver and others like him (Votamatic, the Princeton Election Consortium) rigorously analyzed what they pulled together and revealed patterns that let the data speak for itself. The accuracy of this approach is a huge wake-up call to any pundit who ignored a data-driven approach.
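To make “more than just aggregation” a little more concrete, here is a deliberately simple sketch in Python. It is not FiveThirtyEight’s model, which is considerably more elaborate and isn’t described here: it pools several polls weighted by sample size, then turns the pooled share into a win probability with a normal approximation. The input format and function names are hypothetical, and it ignores house effects, recency, and correlated errors across polls.

```python
import math

def aggregate_polls(polls):
    """Pool polls into one estimate of candidate A's two-party share.

    polls: list of hypothetical (share_for_a, sample_size) pairs, share in [0, 1].
    Weighting by sample size is equivalent to pooling all respondents.
    """
    total_n = sum(n for _, n in polls)
    share = sum(p * n for p, n in polls) / total_n
    # Standard error of a proportion over the pooled sample (sampling error only).
    se = math.sqrt(share * (1 - share) / total_n)
    return share, se

def win_probability(share, se):
    """P(true share > 0.5) under a normal approximation."""
    z = (share - 0.5) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

if __name__ == "__main__":
    # Three made-up state polls: 52% (n=1000), 50% (n=600), 51% (n=400).
    share, se = aggregate_polls([(0.52, 1000), (0.50, 600), (0.51, 400)])
    print(round(share, 3), round(win_probability(share, se), 2))
```

Even this toy version shows why aggregators beat single polls: three polls that individually look like coin flips combine into a pooled estimate with a smaller standard error, and a clearer lean.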

While Nate Silver has put the Science back into Political Science, the data-driven approach to politics is also transforming the sell side: the people who run the campaigns. Time magazine has a fascinating piece on the team that used modern data aggregation techniques, borrowed from online advertising exchanges and e-commerce funnel analysis, to segment and target potential supporters of the Obama campaign.

As one official put it, the time of “guys sitting in a back room smoking cigars, saying ‘We always buy 60 Minutes’” is over. In politics, the era of big data has arrived.

- Inside the Secret World of the Data Crunchers Who Helped Obama Win

With data being used to predict a winner and to run a campaign, it is only natural that news organizations also use data to make a point. Data visualization is one way to convey information, and it is becoming de rigueur for any self-respecting newsroom. The Guardian started the Data Blog and the New York Times launched beta620 to experiment with data. Some of the best coverage of the local and state elections (such as the image below) came from the Los Angeles Times’ Data Desk, which I think is a great idea for any media organization; I’m for anything that raises data literacy.

Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it. – Data Journalism Handbook