Pinterest is a Database of Human Intentions

If you pin something to a board, the name of that board is a string and that string by definition describes it. Someone else pins the same thing to another board. And on and on. One board says shirts, one says ikat, one says gifts for my wife, one says red things. And most pins are on thousands and thousands of boards. So there are thousands of human-generated strings that describe each of these objects. These are descriptions that are very meaningful to the people who created them. It’s not someone trying to make a machine smarter. And we think it will make a machine smarter because it will solve a human problem.

What is Pinterest? A Database of Intentions

And later, on NPR’s Fresh Air, Alexis Madrigal, the author of the post above, expands on what Pinterest can do with these “strings used to describe objects”:

By letting people copy and label images, Pinterest created this rich database of persons, places and things. And it is just beginning to use that data to help people find stuff. With a programming team that’s largely been hired away from Google, Pinterest has begun offering what it calls “guided search.”

Pinterest co-founder Evan Sharp told me that guided search helps you find things you didn’t know that you were looking for. If Google is great when you know exactly what you want, Pinterest can help you figure out what you want. As you search, Pinterest will suggest tags that you could add to help narrow your query. Search for hats on Pinterest and you might get “fedora” or “baseball” or “church lady” as suggestions.

Back in 2003, John Battelle was blogging about The Database of Intentions as “The aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result,” an artifact created by our interactions with search indexes that amounts to “a massive database of desires, needs, wants, and likes” of all humankind.

Back in 2003, the database was still made up of links manually added and clicks manually clicked. Today, screen-scrapers, bots, and click-farms automate much of what used to be a human activity. So much so that the human intent is lost.

So much so that Pinterest’s competitive advantage is to put the human back into discovery.

David Segal on Search

David Segal, the same New York Times journalist who filed the fascinating 5,500+ word piece in November about decormyeyes.com, is back again with The Dirty Little Secrets of Search, a great piece looking into an unwitting client (and now victim) of black hat SEO, JC Penney. His piece goes into quite a bit of detail (for a mainstream newspaper anyway) on the underground mechanics of the link sharing economy, but what I like best is the following description.

When you read the enormous list of sites with Penney links, the landscape of the Internet acquires a whole new topography. It starts to seem like a city with a few familiar, well-kept buildings, surrounded by millions of hovels kept upright for no purpose other than the ads that are painted on their walls.

Exploiting those hovels for links is a Google no-no. The company’s guidelines warn against using tricks to improve search engine rankings, including what it refers to as “link schemes.” The penalty for getting caught is a pair of virtual concrete shoes: the company sinks in Google’s results.

Sounds like a rough neighborhood.

Using Wordle to Visualize Keyword Traffic

In his excellent Ten Steps to Love & Success post on web data analysis, Avinash writes about using the keyword tag cloud visualization tool Wordle to visualize the search terms used to reach your site. Below is a visualization that I made representing the top 500 phrases which were used to discover this site over the past three years.

Click for full size image

It took me only a few minutes to make this image. All you need is Google Analytics running on your site, Excel or other spreadsheet software, a text editor, and access to wordle.net. Here’s what you do:

  1. In Google Analytics, take a look at the Traffic Sources > Keywords report. By default it will show you the top 10 search terms from the past month. Change this to the top 500 and extend the date range to the maximum history of your blog.
  2. Click Export up top and export the data as a CSV file. If you have Excel, use it to strip out the extra columns and rows. You only need each keyword and the number of times it was used. Strip away everything else. Save what’s left as a CSV file that you can open in your text editor (a *.txt extension works too).
  3. Open the file you just saved in your text editor and replace the comma “,” on each row with a colon “:”.
  4. Open your browser to the advanced tab on wordle.net.
  5. Copy the keyword rows from your text editor and paste them into the weighted words or phrases box on the wordle.net site (the first box).
  6. Click Go and visualize.
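If you would rather script steps 2 and 3 than do them by hand, here is a minimal Python sketch. It assumes the export has the keyword in the first column and the visit count in the second; the function name `to_wordle_lines` and the sample rows are made up for illustration, so adjust the column positions to match your own file.

```python
import csv
import io


def to_wordle_lines(csv_text, keyword_col=0, count_col=1):
    """Turn exported keyword rows into 'phrase:weight' lines for Wordle."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank rows and rows too short to hold both columns.
        if len(row) <= max(keyword_col, count_col):
            continue
        keyword = row[keyword_col].strip()
        count = row[count_col].strip().replace(",", "")
        # Skip header or summary rows whose count is not a whole number.
        if not keyword or not count.isdigit():
            continue
        # The colon is the separator, so drop any colon inside the phrase.
        lines.append(f"{keyword.replace(':', ' ')}:{count}")
    return lines


# Hypothetical sample data standing in for a real export.
sample = 'Keyword,Visits\n"tokyo, japan blog",1200\nwordle keywords,45\n'
print("\n".join(to_wordle_lines(sample)))
```

Unlike a plain find-and-replace, the csv module only treats the column separator as a separator, so a search phrase that itself contains a comma stays in one piece.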

You can fiddle around with the fonts, color and layout until you get an image you like. Share links to your own creations in the comments.

Fun with Google Social Search

Google Social Search is now available in Google Labs. Danny Sullivan has the most in-depth coverage out there but it’s worth turning on yourself because you’re going to be the best judge of how well this feature performs for you.

Be sure to dig deeper than the default two results Google throws timidly down at the bottom of your search results. Click Show Options and click on the Social category in your sidebar. Check out the results you get for People and you’ll get a feel for the power of this feature.

Google Social Search

You have the full index of Google quickly filtered by names that should be familiar to you. In a stroke of genius, Google has re-labeled this extended network your “social circle,” which is a much better label than “friends” (which didn’t sound right) or “social graph,” which no one really understood.

The screenshot above is my aha moment – I had no idea that Caterina was ever in Finland and certainly didn’t know that she took the time to write up suggestions of things to do in Helsinki. This is a post from 2003 which would most certainly have been lost to the sands of time if it were not for this feature, which surfaced this gem in just a few clicks.

And now for a feature request. As you can see, this is a great feature for retrospective searches, one that makes the foggy past plainly visible. What about adding an RSS subscribe option so that you could apply these searches to the prospective future?

It would be great to know when any of my extended social circle also wrote about my adopted hometown. I suppose it would be easy enough to script something but why not build it in?


Real-time Search, Art or Science?

Erick Schonfeld at TechCrunch posted a thought-provoking piece on real-time search. Twitter and Facebook are falling over each other in the media spotlight, fighting to be the place to go to find out the now.

There is something about human nature which makes us want to prioritize information by how recent it is, and that is the fundamental appeal of real time search. The difference between real time search and regular search didn’t really crystallize for me until I had a conversation with Edo Segal, who sold his real time search company Relegence to AOL a few years ago and holds three patents on the subject. “Real time taps into consciousness,” says Segal, “search taps into memory. That is why it so potent. You experience the world in real time.”

Like a moth to the flame, we are drawn in by the seductiveness of the freshest information but when we get too much of it, we’re overwhelmed with the banality of everything by everybody at the same time. It’s like listening to 1,000 CDs at once – perfect resolution saturates the senses.

What we really want is a way to zero in on the patterns that matter. Like the proverbial early bird, when you get fresh information that’s relevant and actionable, you can go further and faster on less. Back in start-up land I would use all my alerting tools to jump on new technologies that I knew would get buzz so that we could piggy-back on their eventual coverage. Just the other day, I noticed that Marshall Kirkpatrick was trolling for ideas and I jumped on that and was rewarded with a mention in his story about Augmented Reality.

How do you monitor the hourly hum and look for emerging patterns before they become obvious? That problem is as old as the weather forecaster looking for hurricanes or the stock broker parsing the wires for tips.

When does gossip become news? In the financial industry, you need to strike the right balance when moving on privileged information. If only a few people know something, it’s not going to move a stock – it has to be common knowledge in order for the masses to place their bets.

I used to sell financial news wire subscriptions in Tokyo to equity traders, and there was one broker who wouldn’t replace her Reuters subscription with one from Dow Jones even after I showed her several examples where DJ beat Reuters on market-moving news. In Tokyo DJ was an underdog so it was a tough sell. She wouldn’t budge, insisting that she needed to see the same news as others in the market. She didn’t want to have to think and weigh each piece of news; she looked at her quote screen and would use jumps in price to draw her to the news feed. It was a cleaner signal, albeit a commodity.

Finding relevant emerging patterns before they become trends, that is the challenge. I’ve said it before: better filters will be the next great algorithm war. We’ve all become enamored with Twitter search, which we all know is imperfect and self-selecting but compelling nonetheless. I think we’ll also find that getting real-time alerting right on a broader scale is going to be more art and less science.

YouTube as a Search Engine

My son was featured in yesterday’s Sunday New York Times in an article (At First, Funny Videos. Now, a Reference Tool) about the unforeseen use of YouTube as a research tool. We all associate videos with entertainment, but Tyler has taught me that with the addition of meta-data and micro-chunked content, it’s possible to use YouTube as a rich source of reference material.

Tyler's New York Times article

I was contacted by the reporter, who had seen a post on ReadWriteWeb about Tyler’s use of YouTube and wanted to bring the story to the New York Times’ readers.

My father commented, “It is the inclination of succeeding generations to simplify.” Tyler is on to something. For certain things (contact juggling, macarena, or bugatti vs. fighter jet), YouTube is going to explain things to you better and quicker than plain old text search results. You can sort not only by Relevance and Date Added but also by meta-data from community actions such as Ratings and View Count. Finally, using the example from the article, if you search on platypus, embedded in the results is a pre-defined playlist of over 40 video clips all about the animal.

Tyler was pleased to see that the article was in the “Bright Ideas” section. His comment about his pose in the photo was that after over 200 photos his head was feeling a little heavy. Strangely, the local newsstand didn’t carry the Sunday Times so we had to go to a Starbucks to get a copy for the photo above and as a keepsake.


Techmeme is hiring

Techmeme is hiring someone to tweak their algorithms. It’s a new kind of role but one which I think we’ll be seeing more of in the future, both in newsrooms and in corporate PR departments. When it’s so easy to aggregate, the next great war will be over the filter algorithm.

From the posting on craigslist (which I discovered via Mathew Ingram):

We’re not sure what to call this position. News Technician? News Analyst? Configuring Editor? The role involves interacting with an automated news-picking computer algorithm, configuring it and prodding it to ensure balanced and comprehensive coverage of important news topic areas. It’s the kind of job that possibly has never existed until 2008 but will become increasingly important in the years ahead.

Sounds fascinating.


Wagging your Long Tail with Just for You

What if you could ask each reader who came to your blog what they were interested in and show them a list of posts from your archives that matched those interests? I’ve been blogging for over five years, and as posts roll off the front page they fade into the archives to be mostly forgotten.

Today MyBlogLog published a WordPress plug-in that grew out of a concept that I’ve been playing around with for the past year. Forget contextual matching for relevance and targeting, what if you could match against someone’s stated interests? Blow past trying to parse out meaning from the other text floating around on the current page and reach through the glass and query against the tags that people attach to their MyBlogLog profile. Target the Reader, not the Page. It’s a vision of programming that says, “OK, now that you’re here on this site, did you know there was a series of articles this author wrote about your passion for Harley Davidson motorcycles last year?”

Marshall Kirkpatrick of ReadWriteWeb writes,

There are countless companies that have raised millions in venture capital to offer publishers recommendation systems for their readers – commercial publishers pay big money for this functionality. Now bloggers can have the same type of thing for free

The Just for You plug-in works with hosted WordPress and, once installed, looks at each visitor to your site to see if they are a MyBlogLog user. If they are, the plug-in looks up the tags on that user’s profile, searches through your blog’s archives, and presents a list of headlines pointing to posts that match those tags in a widget that runs in your blog’s sidebar. For more details and sample screenshots, see my post on the MyBlogLog Blog.
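To make the matching idea concrete, here is a rough Python sketch of "target the reader, not the page." It is not the plug-in's actual code: the function name `just_for_you`, the post structure, and the assumption that each archived post carries its own list of tags are all mine, chosen only to illustrate the lookup.

```python
def just_for_you(visitor_tags, posts, limit=5):
    """Return up to `limit` headlines whose post tags overlap the visitor's tags."""
    wanted = {tag.lower() for tag in visitor_tags}
    scored = []
    for position, post in enumerate(posts):
        overlap = wanted & {tag.lower() for tag in post["tags"]}
        if overlap:
            # More shared tags rank higher; archive order breaks ties.
            scored.append((-len(overlap), position, post["title"]))
    return [title for _, _, title in sorted(scored)[:limit]]


# Hypothetical archive and visitor profile, for illustration only.
archive = [
    {"title": "Riding the coast", "tags": ["motorcycles", "travel"]},
    {"title": "Filter wars", "tags": ["search", "algorithms"]},
    {"title": "Harley weekend", "tags": ["Motorcycles", "Harley Davidson"]},
]
print(just_for_you(["harley davidson", "motorcycles"], archive))
```

A visitor with no matching tags gets an empty list back, which is where a fallback such as the tags of other recent visitors would come in.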

If you look to the right, the Just for You widget is right there, five headlines fresh from my Archives for your reading pleasure. If you’re a MyBlogLog user, let me know in the comments if they match the tags on your profile. If you’re not a MyBlogLog member, what you see is a collection of headlines based on the tags of the most recent MyBlogLog visitors to the site, so hopefully there’s some connection to why you’re here as well. Either way, I’m interested in your thoughts.

BrowseRank – Microsoft’s Answer to PageRank

Microsoft announced today that they’ve discovered a better way to rank web pages. While Google’s PageRank sorts roughly on the number of incoming links that point to a page, a vote of confidence by bloggers and website editors, Microsoft’s BrowseRank looks at browsing behavior to see which links get more clicks.

Sounds good on the surface. More democratic because it looks at the entire browsing population, right?

The more visits of the page made by the users and the longer time periods spent by the users on the page, the more likely the page is important. We can leverage hundreds of millions of users’ implicit voting on page importance

Not so fast. Andy Beal points out the obvious shortcomings:

“More visits?” – sure, spammers will have no idea how to inflate that metric.

“Longer time periods?” – couldn’t that also mean that your web site usability and navigation just sucks?

I would add a third. For this to work, Microsoft needs to know each and every link that you visit. I don’t know about you, but there has to be a pretty good personal benefit for me to let Microsoft peer over my shoulder and take notes on every site I visit. Maybe they’ll just pay people. But as with Live Search cashback, that’s just going to attract the wrong audience and skew the results.
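To see why the “more visits” objection bites, here is a toy Python sketch. It is emphatically not Microsoft’s algorithm; it only encodes the intuition from the quote above (more visits and longer stays mean a more important page) by summing the seconds spent on each page. The page names and numbers are invented.

```python
from collections import defaultdict


def browse_scores(visits):
    """Sum the seconds spent on each page across all logged visits."""
    totals = defaultdict(float)
    for page, seconds in visits:
        totals[page] += seconds
    return dict(totals)


def rank(visits):
    """Order pages from highest to lowest total time spent."""
    scores = browse_scores(visits)
    return sorted(scores, key=lambda page: scores[page], reverse=True)


# Hypothetical browsing log: (page, seconds on page).
log = [("news.example", 90), ("news.example", 120), ("spam.example", 5)]
print(rank(log))

# Fifty scripted 10-second visits are enough to flip the order.
inflated = log + [("spam.example", 10)] * 50
print(rank(inflated))
```

Any score built only from raw visits and time on page inherits this weakness unless something else filters out the scripted traffic.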