
everwas

a blog by Ian Kennedy


Tag: search

YouTube as a Search Engine

My son was featured in yesterday’s Sunday New York Times in an article (At First, Funny Videos. Now, a Reference Tool) about the unforeseen use of YouTube as a research tool. We all associate videos with entertainment but Tyler has taught me that with the addition of meta-data and micro-chunked content, it’s possible to use YouTube as a rich source of reference material.

I was contacted by the reporter, who had seen a post on ReadWriteWeb about Tyler’s use of YouTube and wanted to bring the story to the New York Times’ readers.

My father commented, “It is the inclination of succeeding generations to simplify.” Tyler is on to something. For certain things (contact juggling, macarena, or bugatti vs. fighter jet), YouTube is going to explain things to you better and quicker than plain old text search results. You can sort not only by Relevance and Date Added but also by meta-data from community actions such as Ratings and View Count. Finally, using the example from the article, if you search on platypus, embedded in the results is a pre-defined playlist of over 40 video clips all about the animal.

Tyler was pleased to see that the article was in the “Bright Ideas” section. His comment about his pose in the photo was that after over 200 photos his head was feeling a little heavy. Strangely, the local newsstand didn’t carry the Sunday Times so we had to go to a Starbucks to get a copy for the photo above and as a keepsake.

January 19, 2009
Techmeme is hiring

Techmeme is hiring someone to tweak their algorithms. It’s a new kind of role but one which I think we’ll be seeing more of in the future; in newsrooms and in corporate PR departments. When it’s so easy to aggregate, the next great war will be over the filter algorithm.

From the posting on craigslist (which I discovered via Matthew Ingram)

We’re not sure what to call this position. News Technician? News Analyst? Configuring Editor? The role involves interacting with an automated news-picking computer algorithm, configuring it and prodding it to ensure balanced and comprehensive coverage of important news topic areas. It’s the kind of job that possibly has never existed until 2008 but will become increasingly important in the years ahead.

Sounds fascinating.

November 18, 2008
Wagging your Long Tail with Just for You

What if you could ask each reader that came to your blog what they were interested in and show them a list of posts from your archives that matched those interests? I’ve been blogging for over five years and as posts roll off the front page they fade into the archives to be mostly forgotten.

Today MyBlogLog published a WordPress plug-in that grew out of a concept that I’ve been playing around with for the past year. Forget contextual matching for relevance and targeting, what if you could match against someone’s stated interests? Blow past trying to parse out meaning from the other text floating around on the current page and reach through the glass and query against the tags that people attach to their MyBlogLog profile. Target the Reader, not the Page. It’s a vision of programming that says, “OK, now that you’re here on this site, did you know there was a series of articles this author wrote about your passion for Harley Davidson motorcycles last year?”

Marshall Kirkpatrick of ReadWriteWeb writes,

There are countless companies that have raised millions in venture capital to offer publishers recommendation systems for their readers – commercial publishers pay big money for this functionality. Now bloggers can have the same type of thing for free

The Just for You plug-in works with hosted WordPress and, once installed, looks at each visitor to your site to see if they are a MyBlogLog user. If they are, the plug-in looks up the tags on that user’s profile and searches through your blog’s archives and presents a list of headlines pointing to posts that match those tags in a widget that runs in your blog’s sidebar. For more details and sample screenshots, see my post on the MyBlogLog Blog.
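
The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the plug-in's actual code: the `Post` class, the `just_for_you` function, and the sample archive are all hypothetical, and it assumes the visitor's profile tags have already been looked up.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical stand-in for an archived blog post.
    title: str
    tags: set[str] = field(default_factory=set)


def just_for_you(visitor_tags, archive, limit=5):
    """Return up to `limit` headlines, most tag overlap first."""
    wanted = {tag.lower() for tag in visitor_tags}
    scored = []
    for post in archive:
        overlap = len(wanted & {tag.lower() for tag in post.tags})
        if overlap:
            scored.append((overlap, post.title))
    # list.sort is stable, so posts with equal overlap keep archive order.
    scored.sort(key=lambda pair: -pair[0])
    return [title for _, title in scored[:limit]]


archive = [
    Post("Riding the coast on a Harley", {"harley davidson", "motorcycles"}),
    Post("BrowseRank vs PageRank", {"search", "microsoft"}),
    Post("Garage notes", {"motorcycles"}),
]
print(just_for_you(["Motorcycles", "Harley Davidson"], archive))
```

The default of five headlines mirrors the size of the sidebar widget; ranking by the number of shared tags is an assumption made for the example.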

If you look to the right, the Just for You widget is right there, five headlines fresh from my Archives for your reading pleasure. If you’re a MyBlogLog user, let me know in the comments if they match the tags on your profile. If you’re not a MyBlogLog member, what you see is a collection of headlines based on the tags of the most recent MyBlogLog visitors to the site so hopefully there’s some connection to why you’re here as well. Either way, I’m interested in your thoughts.

August 7, 2008
BrowseRank – Microsoft’s Answer to PageRank

Microsoft announced today that they’ve discovered a better way to rank web pages. While Google’s PageRank sorts roughly on the number of incoming links that point to a page, a vote of confidence by bloggers and website editors, Microsoft’s BrowseRank looks at browsing behavior to see which links get more clicks.

Sounds good on the surface. More democratic because it looks at the entire browsing population, right?

The more visits of the page made by the users and the longer time periods spent by the users on the page, the more likely the page is important. We can leverage hundreds of millions of users’ implicit voting on page importance

Not so fast. Andy Beal points out the obvious shortcomings:

“More visits?” – sure, spammers will have no idea how to inflate that metric.

“Longer time periods?” – couldn’t that also mean that your web site usability and navigation just sucks?

I would add a third. For this to work it requires that Microsoft know each and every link that you visit. I don’t know about you but there has to be a pretty good personal benefit for me to let Microsoft peer over my shoulder and take notes on every site I visit. Maybe they’ll just pay people. But as with Live Search cashback, that’s just going to attract the wrong audience and skew your biases.
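
To make the two signals in the quote concrete, here is a toy score in Python. It is emphatically not Microsoft's formula, which the announcement quoted here does not spell out; `toy_browse_score` and the numbers fed into it are invented purely to show why the objections above bite.

```python
def toy_browse_score(visits, avg_seconds_on_page):
    """Toy importance score: more visits and longer stays both raise it.

    An assumption for illustration only, not the real BrowseRank model.
    """
    return visits * avg_seconds_on_page


honest = toy_browse_score(visits=1_000, avg_seconds_on_page=45)
# Scripted traffic inflates the visit count.
spammed = toy_browse_score(visits=50_000, avg_seconds_on_page=45)
# Visitors who cannot find what they need also linger longer.
confusing = toy_browse_score(visits=1_000, avg_seconds_on_page=300)

for name, score in [("honest", honest), ("spammed", spammed), ("confusing", confusing)]:
    print(f"{name:>9}: {score}")
```

Under any score that rewards raw visits and dwell time, both the spammed page and the confusing page outrank the honest one.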

July 25, 2008
Google’s Flash-Eating Spider

This announcement is definitely cool and will open up whole new areas of the web to search. But truthfully I just wanted to post this because it lends itself to a great headline.

From the FAQ posted on the Google Webmaster Blog:

Q: What content can Google better index from these Flash files?
All of the text that users can see as they interact with your Flash file. If your website contains Flash, the textual content in your Flash files can be used when Google generates a snippet for your website. Also, the words that appear in your Flash files can be used to match query terms in Google searches.

In addition to finding and indexing the textual content in Flash files, we’re also discovering URLs that appear in Flash files, and feeding them into our crawling pipeline—just like we do with URLs that appear in non-Flash webpages. For example, if your Flash application contains links to pages inside your website, Google may now be better able to discover and crawl more of your website.

July 1, 2008
The Lifestream Filter Will be the Next Great Algorithm War
I’m paraphrasing the title of this post from David Recordon who threw this line out following a chat I had with him a couple of weeks back. It’s a very insightful observation that predicts opportunities in the real-time world in which lifestream services operate.

It’s now easier than ever to pull together an aggregated feed of content from across the web. Facebook and FriendFeed organize this content around your friends and contacts. MyBlogLog also presents a New in My Neighborhood view which shows a mixed feed of all your contacts’ lifestream content. Yet, once you get more than a handful of friends on these systems, the number of updates (especially if any of them are using Twitter) quickly spins out beyond what you can handle.

Twitter is often used to announce new blog posts and the new broadcast service from Six Apart, Blog It, only exacerbates the problem by spawning multiple posts from a single Facebook entry. We live in a world where finding out what your friends are doing is not a problem. The difficulty is in filtering through the hundreds of updates that stream by each day to those events that are most relevant without losing the sense of serendipitous discovery that we experience today.

So here we are today. It’s like we’re all discovering search engines all over again. In a matter of weeks we’ve gone from “Wow! I can find everything here!” to, “Crap! Over 600,000 results for the phrase Serendipitous Discovery? How can I find the one reference I’m looking for?”

The huge opportunity ahead is a filter to bubble up the things you need to know without missing anything you want to know.

A couple of posts point to this being a trend
- Web 3.0 Will Be About Reducing the Noise – TechCrunch
- Lifestreaming Services Need Better Filtering – Lifestream blog
We’re trying a few things out at MyBlogLog that vector results based on how you have tagged yourself on your profile. Right now, in a user’s New in My World feed, it’s a straight, chronological feed based on items that match your tags. Also, because it’s based on meta-data, this means we can only present you with items that are tagged, which leaves out plain-text updates such as Twitter posts, but we’re just getting started.
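
A rough Python sketch of that kind of tag filter follows. The item fields (`title`, `posted`, `tags`), the function name, and the newest-first ordering are assumptions for illustration, not MyBlogLog's implementation.

```python
from datetime import datetime


def new_in_my_world(items, profile_tags):
    """Keep items sharing a tag with the profile, newest first."""
    wanted = {tag.lower() for tag in profile_tags}
    matches = [
        item for item in items
        if wanted & {tag.lower() for tag in item.get("tags", [])}
    ]
    return sorted(matches, key=lambda item: item["posted"], reverse=True)


items = [
    {"title": "Flickr photo: trail ride", "posted": datetime(2008, 4, 20, 9, 0), "tags": ["cycling"]},
    # An untagged update: it can never match, whatever it says.
    {"title": "Tweet: out riding today", "posted": datetime(2008, 4, 21, 8, 0)},
    {"title": "Blog post: new bike build", "posted": datetime(2008, 4, 21, 7, 0), "tags": ["Cycling", "diy"]},
]
feed = new_in_my_world(items, ["cycling"])
print([item["title"] for item in feed])
```

The tweet is on topic but carries no tags, so it drops out of the feed, which is exactly the limitation of filtering on meta-data alone.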

As David’s quote indicates, this is a huge opportunity and something I’m eager to work on. I look forward to a robust debate on different approaches in the coming weeks!

April 21, 2008
Go on, cheat a little

Yahoo has joined up with the folks at the New York Times crosswords to promote the new Search Assist feature with a contest. The idea is that you fill the puzzle out successfully and you too can be entered into a drawing for one of five trips to Hawaii. Thing is, this thing is a gimme. Next to each clue is a link to a “Hint” which runs a search in the pane below against Yahoo’s Search Assist which will serve things up for you right there and then. It’s a great way to show off the new Search Assist and may give you a new reason to work on your crosswords with the browser handy.

I found out about this via a new group on Facebook. Join Yahoo! Pilot if you want to find out about the latest stuff going on at Yahoo! I can’t believe I found something not written up by the folks over at Yahoo! Cool thing of the Day, my usual source for tweaks and trivia about Yahoo – must have caught them asleep at the switch!

October 14, 2007
Yes, but ours go to “11”

If you haven’t checked out the new Yahoo Search Assist, by all means do. Someone’s finally got the clustered search and suggestive results thing right. Type something into search.yahoo.com and hesitate just a bit and the pane will come rushing out with suggestions.

On a lighter note, Ryan Grove, one of the engineers who worked on the enhancements, points out that our search results now go to “11”.

October 11, 2007
Mining the NYT Archives
Dave Winer looks to the recently released New York Times archives as a rich loam of fertile content upon which many applications can be built. In another life, as a product manager for factiva.com, I came to appreciate the meta-data the Times would attach to their content as something Factiva would leverage for its clients. Factiva provided investment banks and corporate libraries with content feeds from major news outlets and used meta-data on their sources (often adding additional meta-data of its own) so their clients would get precisely the content they were interested in and avoid having to wade through irrelevant results that were often the result of blunt keyword searches.

If the global PR officers of Ford or Sharp were looking for breaking news stories, keyword searches on the internet would be nearly useless as they would pull in stories of used Ford cars for sale or someone’s “sharp” looking suit. These clients would pay for the meta-data and Factiva’s taxonomy consultants would offer numerous tips & tricks to hone down their filters to find exactly what was required.

With this in mind, I took a quick look at the source on the New York Times stories and found that they contain much of the meta-data that I remember.

Today’s story on Iranian President Ahmadinejad’s speech at the UN contains the following meta tags:
- byl= Warren Hoge
- des= International Relations;Embargoes and Economic Sanctions;Atomic Weapons
- per=Ahmadinejad, Mahmoud
- org=United Nations;Security Counci
- geo= Iran
A business article on the arrival of the Microsoft game Halo 3 has the following:
- byl=Seth Schiesel
- des=Computer and Video Games;Computers and the Internet
- per=Gates, Bill
- org=Microsoft Corp;Sony Corp;Nintendo Company Limited
- ticker=Microsoft Corp|MSFT|NASDAQ;Best Buy Company Incorporated|BBY|NYSE;Sony Corp|SNE|NYSE;Nintendo Company Limited|NTDOY|other-OTC;GameStop Corporation|GME|NYSE;Circuit City Stores Inc|CC|NYSE
From this we can see elements of the nytimes.com taxonomy poke through.
- byl – is the byline of the author of the story
- des – the description and how this story is classified by the New York Times
- per – nodes for individuals
- org – company or organizational nodes
- ticker – public company stock symbols and their listing exchange
I’ve only just started playing around with this but using text from the meta-data fields and your favorite search engine you can already start to sort results in interesting ways.
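
As a starting point, the field values listed above can be split into structured data with a small Python helper. This is a sketch under assumptions: `parse_meta` is a made-up name, the values are taken as already extracted from the page source, and the ticker list is shortened from the Halo 3 example.

```python
def parse_meta(fields):
    """Split 'a;b;c' values into lists; split ticker entries on '|'."""
    parsed = {}
    for name, raw in fields.items():
        values = [value.strip() for value in raw.split(";") if value.strip()]
        if name == "ticker":
            # Each ticker entry looks like 'Company|SYMBOL|EXCHANGE'.
            values = [tuple(part.strip() for part in value.split("|")) for value in values]
        parsed[name] = values
    return parsed


halo = parse_meta({
    "byl": "Seth Schiesel",
    "des": "Computer and Video Games;Computers and the Internet",
    "per": "Gates, Bill",
    "org": "Microsoft Corp;Sony Corp;Nintendo Company Limited",
    "ticker": "Microsoft Corp|MSFT|NASDAQ;Sony Corp|SNE|NYSE",  # shortened
})
symbols = [symbol for _company, symbol, _exchange in halo["ticker"]]
print(halo["org"])
print(symbols)
```

Once the fields are lists, filtering on `org` rather than on body text is what keeps a search for Sharp the company from matching a “sharp” looking suit.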
It’s still early days: the search engines do not appear to have crawled the archives completely, and a quick check shows that older articles lack most of this meta-data. It will be interesting to see what insights skillful use of the meta-data fields will yield over the next few weeks and what applications can be built on top of them.
September 26, 2007
