One of the great things about working at Yahoo is that in any given week there’s a brown bag lunch with someone interesting or provocative. Posters around campus promote the bigger draws, and I subscribe to an internal mailing list that lets me know about the others. I try to make it to as many of these talks as I can or, if I’m remote, at least tune in via the internal streaming server they have set up.
For David Weinberger’s chat with Bradley Horowitz last week I made a special trip down to Sunnyvale to see him and I’m glad I did – the talk was fantastic.
He’s on a speaking tour for his new book, Everything is Miscellaneous, so if you get the opportunity to catch him speak, do try to make it. I’ve read The Cluetrain Manifesto and was deeply influenced by Small Pieces Loosely Joined, which I picked up after seeing David speak at an early BloggerCon in Cambridge. I confess that I have not yet picked up a copy of Everything is Miscellaneous but am looking forward to digging into it soon.
David of course talked about his book, the central argument being that as we move information from the physical world (books on a shelf) to the digital realm (bits striped onto a RAID drive), the very nature of how we store information is strained. The Dewey Decimal system worked fine when there was only one place to put a book, but when you need to classify data and can attach multiple tags that point to it, traditional hierarchical taxonomies break down. He pointed to shifts such as the move from organizing a CD collection on a shelf to creating playlists of digitized music on the fly, and to the recent hand-wringing over Pluto and the qualities that make up a planet, as signs that our core understanding of knowledge is being shaken.
David gave the example of the Library of Congress, the bastion of this old world order, as perfectly tuned for the world of print. 7,000 books arrive each day and are cataloged and assigned their place in the great category tree that is our modern library system. When anomalies occur, they are quickly resolved at weekly meetings.
Yet this method quickly breaks down when we try to apply it to the internet. First, there’s just the scale of it all. Over 100,000 blogs are created each day and Technorati’s latest stats show over 15 posts uploaded each second. Not only is it impossible to categorize something that is growing at this rate, it’s also a lost cause to try to filter it for quality. Instead of catching things on the way in as the Library of Congress does, in this day of cheap storage and bandwidth it’s better just to chuck it all into the digital equivalent of a shoebox and let the algorithms sort it out later.
As the grand index of everything we know grows larger, it’s going to be vital that we build better tools around this data to help us find what we need. In the world of photos and music we already know the importance of good metadata. Geo-tagging, date-stamps, EXIF, and BPM data are useful in helping us make sense of what we have. Social interactions with data also add valuable insight. Tagging adds a layer of intelligence that a simple algorithm cannot.
David also believes that your social network will add an important filter on a generic dataset to help you locate something relevant or interesting. The news this week that Facebook is adding classifieds is important because something for sale by your Facebook friend is an order of magnitude more compelling than a generic Craigslist listing by someone you don’t know.
Our schools are not very well equipped to deal with how we need to work in this new world. Testing in schools is still a “face forward, solipsistic experience” which doesn’t take into account how we learn things today. A more appropriate test of your child’s understanding of Roman history would be a collaborative project. David suggested that the teacher work with the class on creating a wiki on their topic. The process of hunting, gathering, verifying, and collating information from across the web would prepare them much better than any multiple choice test could today.
What was most interesting in David’s talk was that he tempered what he said with warnings not to jump too far in one direction. Sometimes, especially in Silicon Valley, we get all wrapped up in the new and shiny and too quickly leave behind the tried and true. There is value in a top-down taxonomy on which to hang your folksonomic tags. Amazon suggesting books on “adoption” to those searching for “abortion” and Google’s more recent autosuggest snafu, which rejected “she invented” and suggested “he invented” as more appropriate, are examples of what happens when you let the ants design the castle. A blended approach is more sensible in the long run.
He also touched on the value of mediated experience. There is a great filtering process that takes place when a book is published. Thoughts are collected, sentences are composed. The investment in putting words to print is an important quality filter. As we move to the digital world where it becomes possible to record every waking moment of your life, it’s important to hang onto that filtering process. Despite its banality, Twitter is still compelling because there is ultimately someone at the other end; it is “mediated by human meaning.” JustinTV, on the other hand, is just a pure stream without edits, a capture device at best and one that requires 1:1 time and attention to extract meaning from what is captured.
I obviously need to read his book. After I read Small Pieces, I was sufficiently inspired to quit my job at Dow Jones to join the blogging revolution at Six Apart. Lord knows what will happen after I read Miscellaneous.
Further Reading:
- Great review by Ethan Zuckerman
- The entire talk is available on video on the YUI blog (where I snagged the book cover image, thanks Eric!)