Tag: business model

  • Publishing in an Agentic AI World

    I was fortunate to be invited last week to the kickoff meeting for an IAB Tech Lab task force dedicated to establishing a framework for Monetizing the Open Web in the age of AI.

    The accelerated use of tools such as OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s AI Overviews has precipitated a rethinking of how publishers are compensated for their work. While the conversation is only just beginning, this group outlined concrete suggestions for meeting the challenges ahead.

    Publishers must work together to stem the flow of unlicensed content. 

    As long as information is readily available and free, there is no scarcity to drive demand. Unless you limit supply, you cannot derive value. Content delivery networks (CDNs) such as Cloudflare and Fastly are on the front lines, as they see the majority of requests and responses across their networks. They have noted the sharp increase in AI-bot traffic and have the expertise in the blocking and tackling of thousands of bots and spiders.

    TollBit is keeping track of the AI crawlers with its quarterly State of the Bot reports.

    Publishers in the room are anticipating a world where the value of a search result on Google is less than the value of licensed access from an AI Agent. In such a world, it would be prudent to default to denying access to all crawlers in favor of direct access by verified, registered readers or licensed partners.

    Old methods such as robots.txt and user-agent/IP-address blocking filters are readily ignored or spoofed by long-tail startups scrambling to serve their users. More secure methods are necessary. “Get a better lock on your door,” says Jon Roberts, Chief Innovation Officer at Dotdash Meredith.
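
    For illustration, the voluntary opt-out most publishers rely on today is just a few lines of robots.txt (GPTBot and CCBot are real, published crawler names). Nothing in the protocol enforces compliance, which is exactly the weakness Roberts is pointing at:

      # robots.txt: a polite request, not a lock on the door.
      # Compliant crawlers honor it; anyone else can ignore it or
      # spoof a browser user-agent.
      User-agent: GPTBot
      Disallow: /

      User-agent: CCBot
      Disallow: /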

    Establish a marketplace for publisher information that can scale to serve AI Agents.

    While Cloudflare’s pay-per-crawl implementation answers unlicensed crawlers with an HTTP 402 (Payment Required) response that points AI developers to licensing information, TollBit and Dappier are building the first marketplaces for publisher libraries: a proof of concept bridging the search-engine results of the past and the marketplace of the future.
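
    As a sketch of those mechanics (the headers and body here are illustrative assumptions, not Cloudflare’s actual implementation), a pay-per-crawl refusal might look like this:

      HTTP/1.1 402 Payment Required
      Content-Type: application/json
      Link: <https://publisher.example/ai-licensing>; rel="license"

      {"error": "payment_required",
       "licensing": "https://publisher.example/ai-licensing"}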

    For the past 25 years, a search returned ten blue links, of which a human might read one or two. This referral traffic was the currency of the old internet, and that attention could be monetized in any number of ways.

    Google owned this marketplace.

    Now imagine a world where a query by an AI Agent may spawn 20-50 queries in which *every* article is scanned and synthesized into a single response. Ads and subscription funnels on these pages are ignored. There needs to be a different pricing model for this traffic. Source material will not be a “page” but could be a snippet of video, a schematic blueprint, a sound bite, or a product spec. The pricing model in this “marketplace of everything” needs to support dynamically pricing requests based on who is requesting the content and in what format, all in real time.
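
    A toy sketch of such a pricing function in Python; every requester class, content format, and rate below is invented for illustration, not a real rate card:

      # Hypothetical dynamic pricing for AI-agent content requests.
      BASE_RATES = {               # price per request, by format
          "article_snippet": 0.002,
          "video_clip":      0.010,
          "product_spec":    0.005,
      }

      REQUESTER_MULTIPLIERS = {    # who is asking matters
          "licensed_partner": 0.5, # negotiated discount
          "registered_agent": 1.0,
          "unknown_crawler":  5.0, # discourage anonymous scraping
      }

      def price_request(content_format: str, requester: str) -> float:
          """Return a per-request price in dollars, computed in real time."""
          return BASE_RATES[content_format] * REQUESTER_MULTIPLIERS[requester]

      print(price_request("video_clip", "licensed_partner"))  # 0.005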

    The programmatic advertising ecosystem, which has been the engine driving the growth of online publishing for the past 25 years, was subsidized by advertisers bidding for a reader’s attention on a web page. An entire tech stack was built to serve up the right ad for the right audience at the right time, all in under 500 ms.

    With AI Agents, you have readers paying directly for content with their subscriptions (ChatGPT Pro is $200/month) and the AI Agent acting as a proxy on their behalf. Once publishers have successfully shut off the free flow of their content, accurate, reliable, and up-to-date information will accrue value. In this world, there will be a need for a real-time marketplace to handle the access and metering of content, and this system will need to be built to the same scale and performance as the programmatic advertising platforms of today.

    No one owns this ecosystem.

    IAB Tech Lab - Content Ingest API

    Tokenization – core component of a market

    Once you have established a way to meter access to a publisher’s library (cost-per-crawl, cost-per-query, or some other subscription model), the final step is to establish a standardized way to track and count content as it travels through the ecosystem from publisher to AI platform.

    Tokenization is this final step. Once an AI has asked for something, the response needs to be tokenized in such a way that credit is properly attributed and the usage of that content is tracked, not only for the initial query but through all future derivative uses.
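
    To make that concrete, here is a minimal sketch of the attribution record that might travel with a tokenized response; every field name is an assumption, since no standard exists yet:

      import hashlib
      import time

      def tokenize_response(content: str, publisher: str,
                            source_url: str, query_id: str) -> dict:
          """Wrap licensed content in an attribution record that can
          follow it through derivative uses."""
          return {
              "content_hash": hashlib.sha256(content.encode()).hexdigest(),
              "publisher": publisher,
              "source_url": source_url,
              "query_id": query_id,   # the query that first pulled it
              "issued_at": time.time(),
              "parent_token": None,   # set when a derivative work reuses it
          }

      record = tokenize_response("A unique fact...", "example-news.com",
                                 "https://example-news.com/story", "q-123")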

    ProRata has a working implementation of this tracking in Gist Answers, its distributed search widget. “If you can track it, you can charge for it,” they say; their IP is focused on the attribution of AI responses drawn from their network of publishers.

    TollBit issues one-time-use tokens for access and is setting up a system where an AI Agent can query for information, ask about pricing, and generate and receive a token to retrieve the information as needed, on demand.
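
    From the agent’s side, that ask-price-token-fetch loop might look roughly like this (the endpoints and fields are invented for illustration; this is not TollBit’s actual API):

      import requests  # third-party HTTP client: pip install requests

      MARKETPLACE = "https://marketplace.example"  # hypothetical service

      # 1. Ask what a piece of content costs right now.
      quote = requests.get(f"{MARKETPLACE}/price",
                           params={"url": "https://publisher.example/story"}).json()

      # 2. Accept the price and receive a single-use access token.
      token = requests.post(f"{MARKETPLACE}/token",
                            json={"quote_id": quote["id"]}).json()["token"]

      # 3. Redeem the token for the content; once used, it is spent.
      article = requests.get("https://publisher.example/story",
                             headers={"Authorization": f"Bearer {token}"})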

    All of these are approaches to the same problem, and my company, SimpleFeed, aims to participate in the delivery of publisher content, whether it be tokenized, vectorized (to assist in discovery), or otherwise value-added with the filtering and metadata augmentation we have been doing for 20+ years.

    I look forward to staying engaged with the individuals and companies that gathered for this workshop. AI summarization is rapidly tearing down a business model that has worked for decades. Unless there is an agreed-upon business model that is accessible to all players, including small publishers and long-tail AI startups, we may lose the diversity of opinions and perspectives that has given us the open web we currently enjoy.

    It makes sense that IAB, an industry group that helped establish standards around online advertising, is taking the first steps to establish standards around the AI Agentic web of the future. I thank them for taking this first important step of getting all the players in the room together and publishing the first framework for publisher content monetization and brand content management for LLMs and AI agents.

    Publishers underwrite new projects based on forward-looking financials. If nothing is done to improve the economics of publishing online today, the investigative reporting of the future will not be funded and we will all be poorer for it.

  • Perplexity’s YouTube Moment

    The recent Perplexity acquisition rumors echo the moment Google was getting ready to buy YouTube back in 2006.

    Back then, pundits were concerned Google was taking on a whole host of potential copyright-infringement lawsuits, as YouTube was chock-full of pirated videos. YouTube has since built a sophisticated copyright-detection algorithm that does a pretty good job of detecting not only pirated videos but also when copyrighted music is used as a video soundtrack.

    In hindsight, the then-unfathomable purchase price of $1.65 billion was a bargain. YouTube’s technology was transformational. As YouTube expanded, it was no longer necessary to download a QuickTime plugin or other special player software. Once YouTube’s JavaScript libraries became ubiquitous, online video was solved for good. YouTube TV launched in 2017 to disrupt an entrenched cable television industry and now generates more total viewing time than cable and broadcast television combined.

    Could the same be said for the potential acquisition of Perplexity? While all LLMs share an index made up of the Common Crawl and anything else they can find on the open internet, will Perplexity’s vectorized index of exclusively licensed news sources drive enough usage and value for a potential acquirer?

    While YouTube’s pre-acquisition copyright concerns ended up being nothing more than a speed bump, they eventually started a formal conversation around licensing content. Could Perplexity’s fledgling licensing program be the start of a more sustainable way to grow the new AI ecosystem?

  • Preserving Publisher Rights in the Era of AI Chatbots

    Last September, I gave a talk at the Media Party conference in New York to propose a method to track the origin of text as it travels through a Large Language Model (LLM). Tracking provenance is important because it lets us evaluate reputation and assign credit so that licensing revenues can be properly allocated to the publishers that provide source material to an LLM.

    What follows are the slides from the talk with some annotations to help explain.

    The rough outline of the proposal is a simple type of HTML markup which allows the publisher or author of a page to mark unique phrases, facts, quotes, or figures for which they would like to retain credit. This markup, if retained along with the indexed text, would allow the LLMs to store and trace the origin of these unique phrases back to the originating URL or domain, tracking the “knowledge” as it travels from the originating website to an LLM and then back out via a genAI chatbot in the form of an “answer.”

    Setting some historical context, I explained how incentives can shape ecosystems. The pageview-and-advertising economy of online publishing incentivizes publishers to seek out traffic and has given rise to an ecosystem that puts Google and its “ten blue links” at the center. A link drives traffic, traffic drives ad impressions, and impressions equal revenue in this ecosystem.

    This well-established ecosystem is being upended by AI chatbots which efficiently extract knowledge from a page and serve it back to the user without generating a pageview. This cuts out an important way for publishers to make money, grow audience, and promote their brand.

    To get a jump on this new ecosystem, large publishers are cutting deals with the AI companies but only the biggest will have the resources to benefit from such arrangements. Smaller publishers will be left out.

    SimpleFeed (where I work) released a simple WordPress plugin that monitors who is crawling your site and allows the site admin to block selected bots. The idea is to show smaller site owners how much indexing is going on and build awareness of how the LLMs are interacting with their content.
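
    The core of that monitoring is simple enough to sketch: scan the web-server access log for known AI user-agent strings and tally who is crawling. This is not the plugin’s actual code, and the log path is a placeholder:

      from collections import Counter

      # A few published AI crawler user-agent names; extend as needed.
      AI_BOTS = ["GPTBot", "CCBot", "ClaudeBot", "PerplexityBot"]

      def tally_ai_crawls(log_path: str) -> Counter:
          """Count hits per AI bot in an access log."""
          hits = Counter()
          with open(log_path) as log:
              for line in log:
                  for bot in AI_BOTS:
                      if bot in line:
                          hits[bot] += 1
          return hits

      print(tally_ai_crawls("/var/log/nginx/access.log"))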

    According to Cloudflare, bots make up 30% of a site’s traffic, and this figure will surely increase.

    Referrals from social networks are falling. This puts pressure on site owners who wish to control who comes to crawl their site. Who do you let in, and who do you block? The act of publishing is to distribute your information far and wide but, right now, many are defending their sites from aggressive crawlers strip-mining their content without compensation.

    If we play this situation out to its conclusion, the largest publishers will survive off whatever licensing terms they can secure while the smaller sites get starved of traffic and miss out on any significant licensing revenue. The result is that we lose the diversity of the web. This leads to the gentrification of anything going into and coming out of the LLMs. This is what is called an ecosystem collapse.

    Tim O’Reilly is my North Star when it comes to understanding technological tectonic shifts. Much of my thinking here is inspired by an O’Reilly piece, How to Fix “AI’s Original Sin”, in which he writes about how incentives can influence ecosystem design and how the pageview incentives of the past result in the block-and-tackle behavior of publishers towards the LLM platforms today.

    The challenge for the LLMs, if they are to break out of this cycle, is to create a system for “detecting content ownership and providing compensation” so that publishers can share in the enormous, untapped potential everyone anticipates for the LLM platforms. In O’Reilly’s words,

    This is one of the great business opportunities of the next few years, awaiting the kind of breakthrough that pay-per-click search advertising brought to the World Wide Web.

    In the world of digital art (audio, photos, videos), the people and companies behind Content Credentials are already hard at work creating this system.

    If a picture is worth 1,000 words, there must be value assigned to text. If something has value, it’s worth tracking. I propose a few elements worth tracking: quotes, statistics, and even unique phrases.

    The next few slides told the story of how, when blogs and blogging were just getting started, there was a huge problem with comment spam. This was largely the result of incentives to get a high-reputation site to link back to the commenter’s website to help improve its ranking in Google’s search results.

    Over the course of a few days (the internet was a smaller place back then), engineers at Google and Six Apart (where I worked at the time) agreed on the rel="nofollow" attribute to negate the ranking value of the link back to the commenter’s site, dealing a blow to the comment spam problem. A small group of engineers extended the web and, in a very simple way, removed the incentives that rewarded bad behavior.

    I told this story because I see the rel= link qualifier as something that could be used to mark up text and prove provenance. I proposed something called a “knowledge unit,” or KU for short.

    The syntax of the markup works alongside HTML: just bracket anything you want to track in the rel=ku markup and, as long as the consuming LLM keeps that markup intact, that text will be tagged as originating from the URL cited in the markup.
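
    A guess at what that could look like in practice; the talk left the exact attribute set open, so treat this markup as illustrative:

      <p>As first reported by Example News,
        <a rel="ku" href="https://example-news.com/scoop">
          the quoted phrase or figure you want credit for
        </a>
        can be bracketed inline like this.</p>

    An LLM that preserves the rel="ku" annotation while ingesting the page can attribute that phrase back to its source wherever it resurfaces in an answer.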

    This provenance can be used to track the number of times a particular knowledge unit is mentioned in an LLM’s response. This enables a fundamentally different ecosystem from that of pageviews in that there is no need to constantly re-post something you wrote years ago to keep it fresh, relevant, and trending in Google’s search results. Hard work to produce durable knowledge should pay dividends far into the future.

    More akin to the Wikipedia reputation model, a good, unique fact can continue to be cited over time. Revenues should flow towards durable knowledge units, hopefully rewarding those that gather and present unique knowledge rather than the hot takes and re-writes that are rewarded in today’s pageview economy.

    Taken a step further, we would then return to a web from before ad targeting and enragement metrics, a world where we reward those that teach us something new.

    This new internet no longer drives you to “acquire” a “user” to package up and sell to an advertiser. Publishers no longer need to lock their stories behind a paywall to prevent non-monetized access. In this new ecosystem, the incentive is to share knowledge, getting paid directly for the broad distribution and citation of your work.

    This is just the germ of an idea that may well be totally naive. While I do like the bottom-up simplicity of the markup approach, it requires everyone to adopt it and trust each other to collectively make it work.

    What is to keep bad actors from hijacking Knowledge Units and claiming something as their own? Page-index timestamps will need to be the arbiter of provenance, I suppose, but how do you guarantee your post is indexed ahead of the copies?

    Also, why would LLMs adopt a system that would fundamentally make their indexes more complex and expensive? My hope is that the LLMs eventually see that strip-mining the web is unsustainable. Just as in agriculture, an ecosystem that does not replenish its resources, both large and small, will not be a diverse, healthy, and long-lasting ecosystem.

    If you’ve made it this far, I’m super-interested in your thoughts and encourage you to get in touch.

  • Access as a Service

    Tim O’Reilly popularized the term “Web 2.0” to explain the network effects of the participatory web enabled by dynamic web pages tied to personalization. He is excellent at summarizing large technical trends in a way that not only makes them relatable but also provides a useful framework when I need to explain these concepts to others.

    So it was with great anticipation that I saw O’Reilly had posted his thoughts on the intersection of copyright and AI.

    The Risk

    If the long-term health of AI requires the ongoing production of carefully written and edited content—as the currency of AI knowledge certainly does—only the most short-term of business advantage can be found by drying up the river AI companies drink from. Facts are not copyrightable, but AI model developers standing on the letter of the law will find cold comfort in that if news and other sources of curated content are driven out of business.

    How to Fix “AI’s Original Sin”

    The Opportunity

    While large licensing deals are being cut by publishers that have the leverage and lawyers to negotiate massive, one-time deals, these are ultimately short-lived and only serve to build up the large AI providers that can afford to subsidize premium materials for their users. These deals just make the rich even richer.

    The longer-term, sustainable opportunity he proposes is in allowing the internet-of-many to share in the revenues enabled by the output of these large AI systems.

    But what is missing is a more generalized infrastructure for detecting content ownership and providing compensation in a general purpose way. This is one of the great business opportunities of the next few years, awaiting the kind of breakthrough that pay-per-click search advertising brought to the World Wide Web.

    How to Fix “AI’s Original Sin”

    The Challenge

    Build a shared provenance and attribution service that keeps track of all documents available to AI systems and the permissions and royalty-payment requirements around those documents.

    O’Reilly alludes to the Unix/Linux filesystem architecture of files with permissions set at the user, group, and world levels as a potential model for what publishers allow AI vendors seeking out material for their training sets.

    If we expand this analogy out to internet scale, could we apply the architecture of hosts tables and the modern Domain Name System to provide a dynamic infrastructure: a public “lookup” service where any particular AI could locate the origin of any attributable fact, quote, or yet-to-be-determined “knowledge unit,” along with the license fee, should that AI wish to leverage the data?

    In Unix, the chmod command is used to change permissions. Could setting copyright permissions via a specialized version of “chmod” be the key to a new way to control access and compensate publishers at scale?
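
    To make the analogy concrete: the first command below is real Unix; the “aichmod” command and its t/q/s bits are pure invention for illustration:

      # Real Unix: owner may read/write; group and world may only read.
      chmod 644 article.html

      # Hypothetical AI-rights equivalent. Positions are publisher,
      # licensed partners, everyone else; bits are t (train on),
      # q (quote with attribution), s (summarize).
      aichmod tqs,qs,--- article.html

    A DNS-like lookup service could then resolve any knowledge unit to its origin and its current permission string.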

    Food for thought.

  • OpenAI has an App Store

    OpenAI’s DevDay keynote had the look and feel of all Silicon Valley product announcements – a well-scripted parade of announcements, a couple of live demos, and even a “one more thing” that is revealed with low-key fanfare but, by its placement at the end of the talk, signals to the world that this is the game-changer.

    That thing was the app store for custom AI chatbots. To make it easier to grok and talk about, OpenAI has co-opted the acronym for the rather technical mouthful that is “Generative Pre-trained Transformers” and made it into a product name. Custom versions of ChatGPT are now GPTs. This makes it easier for the broader public to understand and a whole lot easier for marketers to fold into their campaigns. In the same way “There’s an app for that” became a catchphrase for Apple’s app ecosystem, I can see “Just GPT it!” becoming a verb for leveraging AI to do some grunt work for you.

    That’s my 30,000-foot view before diving in and playing around more. Stratechery has a much more informed deep dive on the significance of what was announced, and I recommend reading Ben Thompson’s analysis, which includes important observations around the significance of OpenAI using Microsoft’s infrastructure and what that partnership means for the market going forward.

    As a teaser, I found this passage thought-provoking,

    This has two implications. First, while this may have been OpenAI’s first developer conference, I remain unconvinced that OpenAI is going to ever be a true developer-focused company. I think that was Altman’s plan, but reality in the form of ChatGPT intervened: ChatGPT is the most important consumer-facing product since the iPhone, making OpenAI The Accidental Consumer Tech Company. That, by extension, means that integration will continue to matter more than modularization, which is great for Microsoft’s compute stack and maybe less exciting for developers.

    The OpenAI Keynote

  • Search finds, chat extracts

    The title of this post is from last week’s People vs. Algorithms newsletter. What starts with a grim evaluation of BuzzFeed’s latest earnings leads into a bleak prospectus for the online media industry in a world where platforms such as TikTok and ChatGPT upend established publishing business models.

    In this world, publishers that have built their reputation on listicles curating the best posts from Reddit lose out to TikTok accounts scratching that same itch but wrapped up in bite-sized, personality-driven, 20-second video clips. People don’t go to BuzzFeed for random amusement; they go to TikTok.

    Then there’s search. When you know what you’re looking for, you realize that Google’s search results page is no longer the efficiently clean place it used to be. There are more distractions on a Google SERP than in a suburban strip mall lined with used-car lots’ inflatable tube men and their flailing limbs.

    Search for the best hotels in NYC and you’ll notice that not only are the first couple of results sponsored, but the embedded map, the People Also Ask box, and the other remaining links are heavily SEO’d and lead to pages that are full of sponsored links as well. Anyone who has searched for a recipe knows that the actual list of ingredients is buried at the bottom of the page, after you’ve scrolled past the history, etymology, and evolution of the dish, all while generating impressions on the accompanying advertisements that may or may not be related.

    Conversational AI interfaces harken back to the utility of early Google as they cut right through all this. I have to admit that 80% of my ChatGPT use is asking for the ingredients of a cocktail. The response is wonderfully refreshing with its “just the facts” presentation.

    The web starts to look different, half chat box, half vertical video.

    BuzzFeed’s Dirty Laundry

    Lifestyle publishers that get their revenue via ads running on their site need to prepare for this new world. If curation of the social web is no longer seen as a value-add and the “how to…” or recipe post just becomes raw material for a ChatGPT response, then how does this publisher, who is paid to introduce advertisers to their audience, get paid?

    The arc of the internet is long and unpredictable but bends toward user empowerment and ever increasing fidelity. An endless stream of algorithmically sorted vertical video is the current endpoint. Robots that do much of the work to make sense of things for you are coming faster than you can say “human augmentation.”

    BuzzFeed’s Dirty Laundry

    John Battelle, who wrote the book on search, the last great technical innovation, has some ideas. The first two (affiliate and subscription) are the logical continuation of existing business models, but the second two are more interesting.

    “NPR-style” underwriting – There’s an opportunity for a specialized AI to be sponsored by a brand in the same way certain brands feature prominently in certain magazines. Going back to my search for a cocktail recipe, adding a classy, relevant brand ad to an AI that’s been specifically trained on a curated dataset for the purpose can, if done tastefully, not only help pay for the experience but also add to it.

    Building programmatic search ads “at scale” ruined the curation of high-end brand advertising. To make a good conversational search experience takes time and expertise. Great care should go into curating training sets and iterating continually to produce quality results. Hopefully the same care will be given to accompanying advertising.

    The branded agent – This brings to mind something that was pondered but never came to be when search became a consumer product. Search can go both ways: there’s retrospective search, where we search the past, and there’s prospective search, a standing query that notifies you when there’s a new “hit” in the future. Prospective search is familiar to anyone who’s played around with a financial news service, Google Alerts, or services such as IFTTT or Zapier.
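
    At its core, a standing query is just a saved filter applied to a stream of future items; a minimal sketch in Python (all names invented):

      def standing_query(keywords: set[str]):
          """Return a filter that fires only when a new item matches."""
          def matches(item: str) -> bool:
              return any(kw in item.lower() for kw in keywords)
          return matches

      # Retrospective search scans the past; this watches the future.
      alert = standing_query({"italian restaurant", "reservation"})
      for item in ["Weather update", "New Italian restaurant opens downtown"]:
          if alert(item):
              print("notify:", item)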

    Come to think of it, I have multiple standing search queries across multiple services that vie for my attention when they get a hit. Spotify lets me know when an artist I have on repeat is coming to town, American Express tells me every week how much I’ve spent on my card, and ESPN is laughing at me right now because my NCAA bracket is a mess.

    These are better known as push notifications and, if you’re like me, you get too many of them. Maybe this is where conversational AI will provide help. Notifications are a one-way conversation – various services trying to start a conversation, most of them failing. Apple has attempted to offer user controls, but they’re so complicated to set up that entire articles are written about how to configure the Notification Center.

    World War I U-boat controls

    Maybe notification management is where we’ll see sponsored conversational AI agents provide value. Give an AI access to your notifications and get back filtered or enhanced notifications and chat conversations informed by your lifestyle and interests.

    Invite The New Yorker AI, sponsored by Calm, to manage your weekend notifications and allow you uninterrupted time with their long-form journalism partners.

    Let Bicycling’s AI, sponsored by Peloton, look at health-related notifications and suggest that you take your indoor training on the road with the upcoming Five-Borough ride.

    Use the Eater AI, sponsored by Resy, to look for food & drink recommendations and get access to a branded conversational AI module that has a history of not only where you’ve been but also all the places you have “on your list.”

    We give Google access to our retrospective search; are we prepared to give an AI access to our prospective search in return for personalized AI?

    Imagine asking your personal AI which evenings that Italian restaurant your friend texted you about last week has a table available. You then ask it to check which of those evenings works for your date and, when you hear back, you ask the AI to secure the reservation with your credit card. Skip a few beats and then your Peloton AI pipes in to suggest a longer-than-normal ride the following day to work off all that pasta. Respond “sure” and it’s on your calendar for Saturday morning.

    Is this a dream come true or a nightmare you want to avoid? We’ve been here before. What privacy will you give up in return for convenience? It comes down to trust. We’ve been burned by platforms that took our trust and used it to spam us with irrelevant messages in pursuit of CPCs at scale.

    Would things have been different if we had opted in to brands we trust to broker our preferences? What if publishers such as The New Yorker, Bicycling, or Eater managed our privacy and brokered it to their advertisers? Wouldn’t you feel differently if you were putting your trust in an editorial voice you identified with as a subscriber and reader, and not some faceless technology stack that only sought to harvest your clicks?

    Now is the time for publishers to jump in front of conversational AI development and use their editorial expertise to craft experiences that cater to their readers. Use this Precambrian period to establish a reputation for quality and avoid disintermediation by the platforms again.

  • That was fast

    AI-generated junk suffocating online platforms like algal blooms that choke the life out of ponds. 

    Hustle bros are jumping on the AI bandwagon

    Well, that was fast. While still pondering the impact of generative AI technologies such as ChatGPT, we already have hucksters rushing in to take it to market and make a quick buck. On a more serious note, a Colombian judge has used it to help him draft his judgment, and we’ve already read about the robots taking over CNET.

    As the graphic in the tweet below predicted, the first use cases for generative AI will be to scale up correspondence so that we can produce customized messages on a grand scale.

    Chat-support vendor Intercom demonstrated how AI can be used as an add-in to summarize, make text more formal, translate, or even write a new article based on simple inputs. Microsoft is already cashing in on its $10 billion investment in OpenAI, making Bing search more conversational, and the AI has already been integrated into its enterprise software platforms.

    Viva Sales, which connects Microsoft’s Office and video conferencing programs with customer relations management software, will be able to generate email replies to clients using OpenAI’s product for creating text. The AI tools, which include OpenAI’s GPT 3.5 — the system that is the basis for the ChatGPT chatbot— will cull data from customer records and Office email software. That information will then be used to generate emails containing personalized text, pricing details and promotions. 

    Microsoft Will Use OpenAI Tech to Write Emails for Busy Salespeople

    The AI hype race has a nasty habit of pushing the “should we really do this?” stage of innovation to the side in pursuit of the almighty first-mover advantage. Threatened by Microsoft releasing a conversational AI search engine, Google is now pressured to release its own version. Despite careful deliberation to date, Google is making investments in what feels like an AI arms race.

    All this to say that it’s going to take a while for the “algal bloom” mentioned at the top of this article to run its course. In time the valuable use cases will become obvious but, to most, only in hindsight. There are going to be some wrecks along the way, but hopefully we will not break the internet, democracy, or society while we learn how to work smarter.

    It’s useful to gain perspective on the coming AI revolution from the great technological historian Kevin Kelly, who spoke six years ago at TED about how AI would lead to a second industrial revolution.

    Everything we electrified, we can now cognify. . . The most popular AI product in 20 years from now, that everybody uses, has not been invented yet.

    Kevin Kelly

  • Is Social Media Just Network Television Reborn?

    What if we follow the trend of the “app-ification” of media to its next logical step? What if Snapchat’s Discover feature is just the modern version of network television, where channels control distribution and readers become passive again, replacing their allotted 5 hours of TV with 5 hours of browsing Facebook, Twitter, Snapchat, and the rest?

    If in five years I’m just watching NFL-endorsed ESPN clips through a syndication deal with a messaging app, and Vice is just an age-skewed Viacom with better audience data, and I’m looking up the same trivia on Genius instead of Wikipedia, and “publications” are just content agencies that solve temporary optimization issues for much larger platforms, what will have been the point of the last twenty years of creating things for the web?

    – The Next Internet is TV

  • Where do you get your music?

    How we pay for music, 1983-2014

    Digital Music News put together a visual showing the mix of revenue streams for music over the past 30 years. CDs, which represented only 0.5% of revenue in 1983, grew to be the dominant medium by 2003, when they accounted for 95.5% of revenue.

    In 2004, downloads appeared on the scene (or began to be counted) at 1.5% and by 2013 brought in more revenue than CDs, with both downloads and streaming/subscription revenues eating away at CD market share.

    2013 Music Revenue Mix