Categories
Current Events

Spotify’s Mixtape Algorithm

With the launch of Apple Music’s “For You” feature, Spotify’s hand has been forced and it has unveiled its own personalization engine in response. Discover Weekly launched today via a series of well-timed pieces across the tech press. The PR push is on to explain it to everyone currently evaluating Apple Music on a 3-month trial.

Spotify describes Discover Weekly as “like having your best friend make you a personalised mixtape every single week.” More specifically, “Updated every Monday morning, Discover Weekly brings you two hours of custom-made music recommendations, tailored specifically to you and delivered as a unique Spotify playlist.”

Spotify, to date, has relied mostly on the social sharing of tracks and manually curated playlists (more than 2 billion!) to enhance the experience of the Spotify subscriber. The coverage today highlights the contribution of Echo Nest, a music intelligence and data platform acquired by Spotify in March of 2014. Reading a number of posts, we learn the following:

Spotify’s internal tool that they use to build playlists has the wonderful moniker, Truffle Pig.

Inside Spotify’s Hunt for the Perfect Playlist

and,

The Echo Nest’s job within Spotify is to endlessly categorize and organize tracks. The team applies a huge number of attributes to every single song: Is it happy or sad? Is it guitar-driven? Are the vocals spoken or sung? Is it mellow, aggressive, or dancy? On and on the list goes. Meanwhile, the software is also scanning blogs and social networks—ten million posts a day, Lucchese says—to see the words people use to talk about music. With all this data combined, The Echo Nest can start to figure out what a “crunk” song sounds like, or what we mean when we talk about “dirty south” music.

Smart.

maybe some of the songs are bad, or the lead-off song isn’t representative of the rest of the playlist—we’ll try to refine that and give it a shot.” Playlists are made by people, but they live and die by data.

This is another way of underlining the best practices of machine learning. An algorithm is really only as good as its training set.

To keep a burst of listens from drifting your taste profile toward a fleeting interest, something re/code’s Kafka calls “the Minions Problem”, Spotify separates isolated wandering from your core tastes.

Spotify says it solves the Minions Problem by identifying “taste clusters” and looking for outliers. So if you normally listen to 30-year-old indie rock but suddenly have a burst of Christmas music in your listening history, it won’t spend the next few weeks feeding you Frank Sinatra and Bing Crosby. The same goes for kids’ music, which is apparently why Spotify knows I didn’t really like “Happy” that much — it was just in the “Despicable Me 2” soundtrack.
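To make the idea concrete, here is a minimal sketch of separating a one-off burst from a listener's core taste. It is not Spotify's method: the coverage only mentions “taste clusters” and outliers, so this swaps real clustering for a much simpler persistence heuristic (a genre counts as core only if it shows up across several different weeks). The function name `core_genres`, the threshold, and the sample listening history are all made up for illustration.

```python
from collections import defaultdict


def core_genres(plays, min_active_weeks=3):
    """Return the genres heard in at least `min_active_weeks` distinct weeks.

    `plays` is an iterable of (week_number, genre) pairs. This is a
    hypothetical stand-in for real taste clustering: a genre that only
    appears in one or two weeks is treated as an outlier, no matter how
    many plays it racked up in that short window.
    """
    weeks_by_genre = defaultdict(set)
    for week, genre in plays:
        weeks_by_genre[genre].add(week)
    return {
        genre
        for genre, weeks in weeks_by_genre.items()
        if len(weeks) >= min_active_weeks
    }


# Invented history: steady indie rock for ten weeks, one big burst of
# Christmas music in week 8, and a couple of kids' songs in week 3.
plays = [(week, "indie rock") for week in range(1, 11) for _ in range(5)]
plays += [(8, "christmas")] * 40
plays += [(3, "kids"), (3, "kids")]

core = core_genres(plays)
```

With this history, the Christmas burst is nearly as large as the indie rock total in raw play counts, but it never becomes part of the core set because it is confined to a single week.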

Spotify has built its discovery algorithm on the listening behaviors of its 75 million users, while Apple has advertised a more top-heavy approach using designated curators that publish playlists for a mass audience. I have to wonder what happened to all the Genius data gathered from analyzing everyone’s iTunes collections, and whether we’ll see it used to balance out Apple’s approach.

I’ve heard that Spotify is working on a “family plan” that would let me break out the collective profile built up on the Spotify account that I share with my kids. That should yield more relevant personal recommendations, so I don’t get the hip-hop-heavy playlist that greeted me due to my son’s heavy rotation.

I think it’s still very early days, and consumers will ultimately benefit from the music recommendation race that has just begun.

Categories
Current Events

The importance of context

Surprisingly, the YouTube recommendation algorithm doesn’t draw inputs from far beyond the confines of YouTube itself. You might think that mining our Google search histories for clues about what videos we’d like would pay off. Nope, Goodrow says.

“The challenge is that web search history is very very broad.” Just because you Googled for help with your taxes doesn’t mean you want to watch YouTube videos about the ins and outs of U.S. tax law.

To Take on HBO and Netflix, YouTube had to Rewire Itself, Fast Company

Not surprising at all, actually. Just because everything on the internet can be connected doesn’t mean it has to be connected. When the internet is your world, zooming in on contexts and measuring behaviors in those contexts becomes paramount.

Categories
Office

Popping filter bubbles at SmartNews

It’s now just over a month since I joined SmartNews and I am digging into what’s under the hood and the mad science that drives the deceptively simple interface of the SmartNews product.


On the surface, SmartNews is a news aggregator. Our server pulls in URLs from a variety of feeds and custom crawls, but the magic happens when we try to make sense of what we index and refine the 10 million+ stories down to the several hundred most important stories of the day. That’s the technical challenge.

The BHAG is to address the increased polarization of society. The filter bubble that results from getting your news from social networks is caused by the echo chamber effect of a news feed optimized to show you more of what you engage with and less of what you do not. Personalization is excellent for increasing relevance in things like search, where you need to narrow results to find what you’re looking for, but it is dangerously limiting for a news product, where a narrowly personalized experience has what Filter Bubble author Eli Pariser called “negative implications for civic discourse.”

So how do you crawl 10 million URLs daily and figure out which stories are important enough for everyone to know? Enter Machine Learning.

I’m still a newbie to this but am beginning to appreciate the promise of applying machine learning to the problem above. New to machine learning too? Here’s a compelling example of what you can do, illustrated in a recent presentation by Samiur Rahman, an engineer at Mattermark, which uses machine learning to match news to its company profiles.

Samiur Rahman on Machine Learning

The word relationship map above was the result of a machine learning algorithm being set loose on a corpus of 100,000 documents overnight. By scanning every sentence in the documents and noting which words appeared together, how often, and how close to one another, the algo was able to learn that Japan : sushi as USA : pizza, and that Einstein : scientist as Picasso : painter.
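For the curious, here is a rough sketch of how analogies like these can fall out of arithmetic on word vectors. The presentation is not quoted here on which algorithm was used, so treat this as an assumption: it shows the common word-embedding trick of computing `b - a + c` and picking the nearest remaining word by cosine similarity. The three-dimensional vectors in `VECTORS` are hand-made for the example, not learned from any corpus, and the `analogy` helper is hypothetical.

```python
import math

# Hand-made toy vectors (NOT learned from data). The dimensions loosely
# mean: "Japan-ness", "USA-ness", and "food vs. not food".
VECTORS = {
    "japan": (1.0, 0.0, 0.0),
    "usa": (0.0, 1.0, 0.0),
    "sushi": (1.0, 0.0, 1.0),
    "pizza": (0.0, 1.0, 1.0),
    "baseball": (0.0, 1.0, -1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, vectors=VECTORS):
    """Answer "a is to b as c is to ?" using vector arithmetic.

    Builds the target vector b - a + c, then returns the word (other than
    the three inputs) whose vector points in the most similar direction.
    """
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


answer = analogy("japan", "sushi", "usa")
```

In a real system the vectors would have hundreds of dimensions and be learned from word co-occurrence in the corpus, which is why the quality of the answers depends so heavily on what the corpus contains.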

Those of you paying close attention will notice that some of the relationships are slightly off (France : tapas? Google : Yahoo?). This is the power of the human mind at work: we’re great at pattern matching. Machine learning algorithms are just that, algorithms, and they need continual tuning. Koizumi : Japan? Well, that shows you the limitations of working with a dated corpus of documents.

But take a step back and think about it. In 24 hours, a well-written algorithm can take a blob of text and parse it for meaning and use that to teach itself something about the world in which those documents were created.

Now jump over to SmartNews and understand that our algorithms are processing 10 million news stories each day and figuring out the most important news of the moment. Not only are we looking for what’s important, we’re also determining which section to feature the story in, how prominently, where to cut the headline, and how best to crop the thumbnail photo.

The algorithm is continually being trained and the questions that it kicks back are just as interesting as the choices it makes.

Discovery, diversity, and relevance push and pull against one another, and all of them are inputs into the ever-evolving algorithm. Today I learned about “exploration vs. exploitation”. How do we tell our users the most important stories of the day in a way that covers the bases but also teaches them something new?
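As I understand it so far, the textbook illustration of exploration vs. exploitation is the epsilon-greedy strategy: most of the time show the option that has performed best, and a small fraction of the time show something at random to learn about it. The sketch below is only that textbook version, not a description of how SmartNews ranks stories. The `pick_story` function, the click-rate numbers, and the story names are all hypothetical.

```python
import random


def pick_story(click_rates, epsilon=0.1, rng=random):
    """Pick a story with an epsilon-greedy rule.

    `click_rates` maps a story id to its estimated click rate so far.
    With probability `epsilon` we explore (pick any story uniformly at
    random); otherwise we exploit (pick the best estimate).
    """
    if rng.random() < epsilon:
        return rng.choice(list(click_rates))
    return max(click_rates, key=click_rates.get)


# Invented estimates for three stories.
rates = {"election-recap": 0.12, "local-science-fair": 0.03, "new-telescope": 0.07}

always_exploit = pick_story(rates, epsilon=0.0)
sometimes_explore = pick_story(rates, epsilon=0.2, rng=random.Random(42))
```

Setting `epsilon` to 0 always returns the current favorite, which is exactly the filter-bubble behavior; raising it trades some short-term relevance for a chance to surface stories the estimates have not yet had a fair look at.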

This is a developing story, stay tuned!