Reuters Tracer combs Twitter for news

Reuters’s internal research found that 10-20% of news stories broke first on Twitter.

Reuters, the news agency that first scooped its rivals using carrier pigeons, is seeing good results from an algorithm that sifts through Twitter (about 12 million tweets a day, roughly 2% of total volume) in search of signal in the noise. The system, Reuters Tracer, is summarized in MIT Technology Review’s article “How Reuters’s Revolutionary AI System Gathers Global News”:

The first step in the process is to siphon the Twitter data stream. Tracer examines about 12 million tweets a day, 2 percent of the total. Half of these are sampled at random; the other half come from a list of Twitter accounts curated by Reuters’s human journalists. They include the accounts of other news organizations, significant companies, influential individuals, and so on.
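The sampling rule described above (a random slice of the stream plus everything from a curated account list) can be sketched as a simple filter. This is a minimal illustration, not Tracer’s ingestion code: the `Tweet` type, `build_feed`, and its `sample_rate` parameter are hypothetical, and no Twitter API is involved; the input is assumed to be any iterable of already-collected tweets.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user: str
    text: str


def build_feed(stream: Iterable[Tweet], curated_accounts: Set[str],
               sample_rate: float, seed: int = 0) -> List[Tweet]:
    """Keep every tweet from a curated account plus a random sample of the rest.

    Hypothetical sketch: `curated_accounts` and `sample_rate` are illustrative,
    not Tracer's actual configuration.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    feed = []
    for tweet in stream:
        if tweet.user in curated_accounts:
            feed.append(tweet)  # curated sources are always kept
        elif rng.random() < sample_rate:
            feed.append(tweet)  # everything else is kept with probability sample_rate
    return feed
```

In this sketch the balance between the two halves depends entirely on `sample_rate`; to reproduce the roughly 50/50 split described above, it would have to be tuned against the volume the curated accounts produce.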

The next stage is to determine when a news event has occurred. Tracer does this by assuming that an event has occurred if several people start talking about it at once. So it uses a clustering algorithm to find these conversations.
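The idea that “several people talking at once” signals an event can be illustrated with a single-pass (online) clustering by word overlap. This is a swapped-in technique chosen for brevity, not the clustering algorithm Tracer actually uses; `cluster_tweets`, `candidate_events`, the Jaccard `threshold`, the time `window`, and `min_users` are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased word, hashtag, and mention tokens."""
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    terms: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)
    texts: List[str] = field(default_factory=list)
    last_seen: float = 0.0


def cluster_tweets(tweets: List[Tuple[str, str, float]],
                   threshold: float = 0.2, window: float = 600.0) -> List[Cluster]:
    """Single-pass clustering of (user, text, timestamp) tuples.

    Each tweet joins the most similar cluster that was active within `window`
    seconds, or starts a new cluster if no similarity reaches `threshold`.
    """
    clusters: List[Cluster] = []
    for user, text, ts in sorted(tweets, key=lambda t: t[2]):
        toks = tokens(text)
        best, best_sim = None, 0.0
        for c in clusters:
            if ts - c.last_seen > window:
                continue  # conversation has gone quiet; don't extend it
            sim = jaccard(toks, c.terms)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = Cluster()
            clusters.append(best)
        best.terms |= toks
        best.users.add(user)
        best.texts.append(text)
        best.last_seen = ts
    return clusters


def candidate_events(clusters: List[Cluster], min_users: int = 3) -> List[Cluster]:
    """'Several people talking at once': keep clusters with enough distinct users."""
    return [c for c in clusters if len(c.users) >= min_users]
```

For example, three different users tweeting about an explosion at a station within a minute of each other would land in one cluster with three distinct users and be flagged, while an unrelated tweet about lunch would start its own single-user cluster and be ignored.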

Of course, these clusters include spam, advertisements, ordinary chat, and so on. Only some of them refer to newsworthy events.

So the next stage is to classify and prioritize the events. Tracer uses a number of algorithms to do this. The first identifies the topic of the conversation. It then compares this with a database of topics that the Reuters team has gathered from tweets produced by 31 official news accounts, such as @CNN, @BBCBreaking, and @nytimes as well as news aggregators like @BreakingNews.
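Comparing a conversation against topics gathered from news accounts can be approximated with bag-of-words cosine similarity. This is a hypothetical stand-in for Tracer’s classifiers, not their implementation: `build_reference_topics`, `rank_clusters`, the topic labels, and the `min_score` cutoff are illustrative. Clusters that resemble no news topic (spam, ads, ordinary chat) fall below the cutoff and are dropped; the rest are ranked by score.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag(text: str) -> Counter:
    """Bag-of-words term counts for one tweet."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_reference_topics(news_tweets: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Aggregate tweets from news accounts into one term profile per topic label."""
    return {topic: sum((bag(t) for t in texts), Counter())
            for topic, texts in news_tweets.items()}


def rank_clusters(clusters: Dict[str, List[str]], reference: Dict[str, Counter],
                  min_score: float = 0.1) -> List[Tuple[float, str, str]]:
    """Return (score, cluster_id, best_topic) tuples, most news-like first.

    Clusters whose best topic match falls below `min_score` are dropped.
    """
    if not reference:
        return []
    ranked = []
    for cid, texts in clusters.items():
        profile = sum((bag(t) for t in texts), Counter())
        topic, score = max(((name, cosine(profile, ref)) for name, ref in reference.items()),
                           key=lambda pair: pair[1])
        if score >= min_score:
            ranked.append((score, cid, topic))
    ranked.sort(reverse=True)
    return ranked
```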

At this stage, the algorithm also determines the location of the event using a database of cities and location-based keywords.
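A lookup against a database of cities and location keywords might look like the following n-gram match with voting across a cluster’s tweets. The `GAZETTEER` entries and the `locate` function are hypothetical examples, not Reuters’s data.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical gazetteer: lowercase place name or keyword -> canonical location.
GAZETTEER: Dict[str, str] = {
    "new york": "New York, US",
    "nyc": "New York, US",
    "manhattan": "New York, US",
    "london": "London, UK",
    "paris": "Paris, FR",
}


def locate(texts: List[str], gazetteer: Dict[str, str] = GAZETTEER,
           max_ngram: int = 3) -> Optional[str]:
    """Return the location mentioned most often across a cluster's tweets, if any."""
    votes: Counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Check every 1- to max_ngram-word phrase so multi-word names like "new york" match.
        for n in range(1, max_ngram + 1):
            for i in range(len(words) - n + 1):
                place = gazetteer.get(" ".join(words[i:i + n]))
                if place:
                    votes[place] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Voting across the whole cluster smooths over individual tweets that mention unrelated places. An exact-match table like this will still misfire on phrases such as “Paris Hilton”, so a real system needs more context than keywords alone.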

Once a conversation or rumor is potentially identified as news, an important consideration is its veracity. To determine this, Tracer looks for the source by identifying the earliest tweet in the conversation that mentions the topic and any sites it points to. It then consults a database listing known producers of fake news, such as the National Report, or satirical news sites such as The Onion.
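The source check can be sketched as: find the earliest tweet in the cluster, extract the domains it links to, and compare them with lists of known fake-news and satire sites. The domain lists, the `Tweet` type, and `check_source` are illustrative assumptions, not Tracer’s actual database or code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple
from urllib.parse import urlparse

# Illustrative entries only, not Tracer's actual list.
FAKE_NEWS_DOMAINS = {"nationalreport.net"}
SATIRE_DOMAINS = {"theonion.com"}


@dataclass(frozen=True)
class Tweet:
    user: str
    timestamp: float
    urls: Tuple[str, ...] = ()


def domain(url: str) -> str:
    """Hostname of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def check_source(cluster: Sequence[Tweet]) -> Tuple[Tweet, str]:
    """Find the earliest tweet and flag it if it links to a listed site."""
    earliest = min(cluster, key=lambda t: t.timestamp)
    domains = {domain(u) for u in earliest.urls}
    if domains & FAKE_NEWS_DOMAINS:
        verdict = "known fake-news source"
    elif domains & SATIRE_DOMAINS:
        verdict = "satire"
    else:
        verdict = "no listed source"
    return earliest, verdict
```

A “no listed source” result only means the origin is not on either list; it says nothing about whether the story is true, which is why a check like this would be one signal among several.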

Finally, the system writes a headline and summary and distributes the news throughout the Reuters organization.
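The excerpt does not say how the headline and summary are produced. One simple possibility, shown here purely as a hypothetical extractive stand-in rather than Tracer’s method, is to choose the tweet that overlaps most with the rest of its cluster as the headline and the next most central tweets as the summary. `headline_and_summary` and `summary_size` are illustrative names.

```python
import re
from typing import List, Optional, Set, Tuple


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def centrality(texts: List[str]) -> List[float]:
    """Sum of Jaccard similarities between each tweet and every other tweet."""
    sets = [words(t) for t in texts]
    scores = []
    for i, s in enumerate(sets):
        scores.append(sum(len(s & o) / len(s | o)
                          for j, o in enumerate(sets) if j != i and (s | o)))
    return scores


def headline_and_summary(texts: List[str],
                         summary_size: int = 2) -> Tuple[Optional[str], List[str]]:
    """Pick the most central tweet as headline and the next ones as summary."""
    if not texts:
        return None, []
    scores = centrality(texts)
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return texts[order[0]], [texts[i] for i in order[1:1 + summary_size]]
```

Picking whole tweets avoids generating new text, at the cost of headlines that read like tweets rather than wire copy.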

Three recent events and the corresponding Tracer and Reuters alerts.

More details (and the attached screenshots) are sourced from the paper “Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data”.