{"id":77373,"date":"2025-12-12T15:47:38","date_gmt":"2025-12-12T15:47:38","guid":{"rendered":"https:\/\/creativecommons.org\/?p=77373"},"modified":"2026-01-08T16:51:16","modified_gmt":"2026-01-08T16:51:16","slug":"where-cc-stands-on-pay-to-crawl","status":"publish","type":"post","link":"https:\/\/creativecommons.org\/2025\/12\/12\/where-cc-stands-on-pay-to-crawl\/","title":{"rendered":"Where CC Stands on Pay-to-Crawl"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">As we\u2019ve discussed before, the rise of large artificial intelligence (AI) models has <\/span><a href=\"https:\/\/creativecommons.org\/wp-content\/uploads\/2025\/06\/Human-Content-to-Machine-Data_Final.pdf\"><span style=\"font-weight: 400;\">fundamentally disrupted the social contract governing machine use of web content<\/span><\/a><span style=\"font-weight: 400;\">. Today, machines don\u2019t just access the web to make it more searchable or to help unlock new insights; they feed algorithms that fundamentally change (and threaten) the web we know. What once functioned as a mostly reciprocal ecosystem now risks becoming extractive by default.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In response, new approaches are emerging to help creators, publishers, and stewards of content reclaim agency over how their works are used.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pay-to-crawl is one approach beginning to come into focus. Pay-to-crawl refers to emerging technical systems used by websites to automate compensation when their digital content\u2014such as text, images, and structured data\u2014is accessed by machines. 
We\u2019ve recently published our interpretation and observations of pay-to-crawl systems in <\/span><a href=\"https:\/\/bit.ly\/493T52v\"><span style=\"font-weight: 400;\">this dedicated issue brief<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<figure id=\"post-77376 media-77376\" class=\"align-none\"><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/creativecommons.org\/wp-content\/uploads\/2025\/12\/LoneThomasky-BitsBaumeDistortedSand-Mine-2560x3612-1-e1765218855354-1024x867.png\" alt=\"A bird's eye view photo of an orange sand mine with transport lorries, but the image is slightly distorted by digital artefacts.\" \/><figcaption class=\"attribution\">&#8220;<a href=\"https:\/\/betterimagesofai.org\/images?artist=LoneThomasky&amp;title=DistortedSandMine\">Distorted Sand Mine<\/a>&#8221; by Lone Thomasky &amp; Bits&amp;B\u00e4ume, licensed under <a href=\"https:\/\/creativecommons.org\/licenses\/by\/4.0\/deed.en\">CC BY 4.0<\/a>.<\/figcaption><\/figure>\n<h2><b>CC\u2019s Position on Pay-to-Crawl<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Implemented responsibly, pay-to-crawl <\/span><i><span style=\"font-weight: 400;\">could<\/span><\/i><span style=\"font-weight: 400;\"> represent a way for websites to sustain the creation and sharing of their content and to manage substitutive uses, keeping content publicly accessible where it might otherwise go unshared or disappear behind even more restrictive paywalls.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, we do have significant reservations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pay-to-crawl may represent an appropriate strategy for independent websites seeking to prevent AI crawlers from knocking them offline or to generate supplementary revenue. 
But elsewhere, pay-to-crawl systems could be cynically exploited by rightsholders to generate excessive profits, at the expense of human access and without necessarily benefiting the original creators.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pay-to-crawl systems themselves could become new concentrations of power, with the ability to dictate how we experience the web. They could seek to watch and control how content is used in ways that resemble the worst of Digital Rights Management (DRM), turning the web from a medium of sharing and remixing into a tightly monitored content delivery channel.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We\u2019re also concerned that indiscriminate use of pay-to-crawl systems could block access to content for researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest. Legal rights to access content afforded by exceptions and limitations to copyright law, such as noncommercial research (in the EU) or fair use (in the US), as well as provisions for translation and accessibility tools, have been carefully negotiated and adjusted over time. These rights could be impeded by the introduction of blunt, poorly designed pay-to-crawl systems.<\/span><\/p>\n<h2><b>Proposed Principles for Responsible Pay-to-Crawl<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Pay-to-crawl systems are not neutral infrastructure. 
It\u2019s vital that these systems are built and used in ways that serve the interests of creators and the commons, rather than simply creating barriers to the sharing of knowledge and creativity and benefiting the few.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We\u2019re proposing the following principles to guide the development of pay-to-crawl systems in line with this vision:<\/span><\/p>\n<ol>\n<li style=\"margin-bottom: 15px;\"><strong>Pay-to-crawl should not become a default setting.<br \/>\n<\/strong><strong><span style=\"font-weight: 400;\">Pay-to-crawl is a strategy that may work for some websites, but not all websites share the same underlying concerns. Third parties such as domain hosts, content delivery networks, and other web service providers should not deploy pay-to-crawl systems as an automatic or assumed setting on behalf of websites.<\/span><\/strong><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should enable choice and nuance, not blanket rules.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems should enable websites to distinguish between\u2014and set variable controls for\u2014different types of content users (such as commercial AI companies, nonprofits, researchers, or even specific organizations), as well as types and purposes of machine use (such as model training, indexing for search, and inference\/retrieval). Systems should not affect direct human browsing and use of content, including by restricting translation or accessibility services.<\/span><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should allow for throttling, not just blocking.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems should enable websites to manage hosting costs and other impacts of heavy machine traffic without walling off content entirely. 
For instance, systems could allow websites to throttle traffic driven by \u2018agentic browsing\u2019 or \u2018inference\u2019 undertaken by large AI models, while permitting other forms of machine access that involve far lower traffic, such as research or archiving.<\/span><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should preserve public interest access and legal rights.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems should not obstruct access to content for researchers, nonprofits, cultural heritage institutions, educators, and other actors working in the public interest. Nor should these systems block lawful uses of content permitted under copyright exceptions and limitations or other legal rights afforded in the public interest. Choosing not to abide by a pay-to-crawl system should not, by itself, turn an otherwise lawful use into an unlawful one.<\/span><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should use open, interoperable, and standardized components.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems should not become proprietary chokepoints or gatekeepers. We urge particular caution in the use of proprietary components for authentication and payment that could lock websites into a particular pay-to-crawl system.<\/span><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should enable collective contributions to the commons.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems that only enable financial transactions between individual websites and content users risk creating a highly transactional future, where the value of content is atomized. 
Pay-to-crawl systems should support collective forms of payment, such as payments to coalitions of creators and publishers, and should reflect wider conceptions of what it means to contribute to the digital commons.<\/span><\/li>\n<li style=\"margin-bottom: 15px;\"><b>Pay-to-crawl systems should avoid surveillance and DRM-like architectures.<br \/>\n<\/b><span style=\"font-weight: 400;\">Pay-to-crawl systems must not introduce excessive logging, fingerprinting, or behavioral tracking related to the use of content. Systems should limit data collection to what is needed to authenticate users and settle payments, rather than seek to follow content downstream or dictate how it can be used.<\/span><\/li>\n<\/ol>\n<h2><b>The Path Forward: Showing Up Where the Future Is Being Decided<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">We believe now is the moment to engage, to influence, and to infuse pay-to-crawl systems with values that prioritize reciprocity, openness, and the commons.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We welcome feedback and dialogue on the principles outlined here. Your input will help guide our engagement with pay-to-crawl systems and related initiatives moving forward, as well as inform the wider CC community\u2019s understanding of them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Thank you to Jack Hardinges for his contributions to this post.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we\u2019ve discussed before, the rise of large artificial intelligence (AI) models has fundamentally disrupted the social contract governing machine use of web content. Today, machines don\u2019t just access the web to make it more searchable or to help unlock new insights; they feed algorithms that fundamentally change (and threaten) the web we know. 
What once functioned as a mostly reciprocal ecosystem now risks becoming extractive by default.<\/p>\n","protected":false},"author":30,"featured_media":77376,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[21,22],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/posts\/77373"}],"collection":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/comments?post=77373"}],"version-history":[{"count":11,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/posts\/77373\/revisions"}],"predecessor-version":[{"id":77396,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/posts\/77373\/revisions\/77396"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/media\/77376"}],"wp:attachment":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/media?parent=77373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/categories?post=77373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/tags?post=77373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}