artificial intelligence Archives - Creative Commons

From Signals to Infrastructure: Strengthening the Commons for the AI Era

Anna Tumadóttir,Sarah Hinchliff Pearson — Wed, 13 May 2026 08:00:45 +0000

We recently shared an update on the evolution of CC signals. As AI systems increasingly extract value from the commons without adequate consent, attribution, or transparency, sustaining a healthy commons requires stronger governance and accountability. This reflects a shift in our approach: from expressing preferences to rebalancing power to protect the commons.

In this post, we outline our plans to build upon and strengthen CC signals in order to support our goal of sustained access to human knowledge. We do not have all the answers yet. What we do have is a framework for how we will work toward them.

Recap: What’s At Stake

When it comes to AI, copyright operates in a landscape that is uneven and often unclear. Because of this, the CC licenses, while still important, are not sufficient to address how content is used in AI systems. You can read more on this here. CC licenses also do not fully capture the range of intentions creators and data holders have in an AI-mediated world.

Across the web, creators, communities, and institutions are turning to multiple forms of defensive enclosure to restrict access. These include:

Legal (e.g. licensing), such as open access publishers recommending CC BY-NC-ND as a mechanism of control, as ACM now does, which negatively impacts human collaboration.
Technical (e.g. CAPTCHAs, bot blocking, rate limiting), as news publishers are now doing, which negatively impacts archiving efforts.
Financial (e.g. paywalled APIs), as X did post-acquisition, which negatively impacts researchers.

The problem is that these tools treat all machine use as the same, regardless of the purpose. In trying to limit large-scale extraction by AI developers, they also block public interest uses like research, preservation, and accessibility.

While our research is ongoing, there are early indications of a more fragmented and potentially shrinking commons, along with a weakening of long-standing public interest protections.

Building the Next Generation Infrastructure of Sharing

Open access through CC licenses created a spectrum of sharing. Today we need something similar for AI: a spectrum of participation, where creators and data-holding stewards are active participants in how knowledge is produced, shared, and used.

The commons we have built over the past 25 years did not emerge on its own. It was designed through legal frameworks, technical standards, and shared norms. The AI era requires the next generation of that infrastructure. We want a future where the global knowledge commons remains accessible, and where AI systems engage with it in ways that are transparent, accountable, and aligned with the public good.

Our Plans

CC is advancing several high-impact interventions as part of the CC signals framework to restore trust, strengthen participation, and embed public interest values into the AI knowledge ecosystem.

Helping People Make Informed Decisions in the Current Moment
Making Attribution the Norm in AI
Building New Tooling that Protects Public Interest Uses While Restoring Agency

Helping People Make Informed Decisions in the Current Moment

AI systems are using CC-licensed works in ways that are causing many to question whether the existing CC license suite still aligns with their goals.

These concerns take different forms: attribution that disappears inside AI systems, sensitive knowledge stripped from its original context, growing concentrations of value and power, and no clear mechanisms for reciprocity or accountability. But they share a common root: uncertainty about what the CC licenses actually mean in this new environment.

We want people who choose to CC license to do so with confidence. We also want institutions with CC licensing embedded in their policies to have a clear picture of what the licenses do and do not cover when it comes to AI. Over the next six months, we will provide sector-specific interim guidance to support CC licensors in navigating the new questions that AI raises for them. This guidance is not intended to resolve all legal ambiguity. Instead, during this period of uncertainty, we want to preserve the practice of sharing that AI is currently putting at risk, while we develop new tools and practices that address our communities’ concerns.

We will be holding a series of sector-specific virtual events to collect feedback on this interim guidance. Sign up for the CC newsletter for more information as soon as it becomes available.

Making Attribution the Norm in AI

Attribution has always been a cornerstone of the commons. It supports participation, enables transparency, and allows knowledge to be traced, evaluated, and built upon.

Today’s AI ecosystem is eroding this norm. Most generative systems do not meaningfully acknowledge the sources they rely on. As AI increasingly mediates access to knowledge, this has serious consequences: loss of provenance, reduced trust, and fewer incentives to share. The first iteration of CC signals included attribution as a preference; today we believe that attribution must be a requirement.

Our plan is to define best practices for attribution in AI contexts. AI developers often claim that attribution is simply not possible in LLMs. But this is a consequence of choices made during design, not a technical inevitability. We believe there is value in envisioning what attribution practices could look like in an AI ecosystem that prioritized them. And while there is no going back in time, we can demand attribution where it is technically possible within existing systems, such as Retrieval Augmented Generation (RAG), a method where AI systems pull from specific, traceable sources to generate responses.

Our work will involve detailing ideal attribution guidance for AI systems, end users, and creators. We will then demonstrate how attribution can be realized in RAG models. This initiative serves two purposes: building shared understanding of what attribution in AI can and cannot currently achieve, and giving creators and AI users the tools to advocate for attribution as a baseline expectation. Strengthening attribution helps ensure that knowledge can circulate widely without losing connection to the people and communities who created it.
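As a rough illustration of why attribution is tractable in RAG, the sketch below keeps creator, license, and link metadata attached to each retrieved passage and returns a credit line alongside the context a generator would see. Everything in it is hypothetical: the `Source` fields, the toy keyword-overlap retriever, the sample works and creators, and the credit format are assumptions for illustration, not CC guidance or any real system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A passage plus the provenance needed to credit it (hypothetical schema)."""
    title: str
    creator: str
    license: str
    url: str
    text: str


def retrieve(query: str, corpus: list[Source], k: int = 2) -> list[Source]:
    """Toy retriever: rank passages by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(s.text.lower().split())), s) for s in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep at most k passages, and only those that share a word with the query.
    return [s for score, s in ranked[:k] if score > 0]


def credit_line(source: Source) -> str:
    """Format title, creator, license, and link for one source."""
    return f'"{source.title}" by {source.creator}, {source.license}, {source.url}'


def answer_with_attribution(query: str, corpus: list[Source]) -> dict:
    """Return the retrieved context together with credits for every passage used."""
    hits = retrieve(query, corpus)
    return {
        "context": [s.text for s in hits],          # what a generator would be prompted with
        "credits": [credit_line(s) for s in hits],  # provenance travels with the answer
    }


# Invented sample works, for illustration only.
corpus = [
    Source("Tide Pools", "A. Rivera", "CC BY 4.0", "https://example.org/tide-pools",
           "tide pools shelter anemones and crabs"),
    Source("Desert Soil", "B. Chen", "CC BY-SA 4.0", "https://example.org/desert-soil",
           "desert soil crusts store carbon"),
]
result = answer_with_attribution("what lives in tide pools", corpus)
print(result["credits"])
```

The design point is that provenance is never separated from the text: because the retriever returns whole `Source` records rather than bare strings, crediting every passage that informed a response is a formatting step, not a research problem.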

CC is looking to connect with experts working on attribution standards and developers working on AI systems that preserve attribution. If that describes your work, we would love to hear from you.

Building New Tooling that Protects Public Interest Uses While Restoring Agency

Copyright alone cannot do this work. We believe maintaining a human-centered internet requires meaningful guardrails, upheld collectively. Our goal is to support an ecosystem that balances openness with agency, and access with accountability.

First, we are advocating for the development and usage of carefully scoped AI opt-outs that simultaneously sustain creator agency while protecting public interest uses. In an effort to address this need, we proposed additions to the IETF (the body that sets foundational internet standards) AI Preferences vocabulary that would help strike the right balance between creator agency and public interest reuse. It is essential that opt-out tooling and any related legislation protect public interest uses. This includes enabling cultural heritage institutions to preserve and analyze content, and supporting not-for-profit research and educational organizations in their work.

Second, we are doing research and development for a new tool designed to enable conditional access to openly shared collections and compilations. It will allow data stewards to set terms for accessing and using a collection or compilation that protect the sustainability of their technical infrastructure. These stewards may include libraries, archives, research institutions, data repositories, public knowledge projects, and cultural heritage organizations. Resource-heavy bulk reusers of data may be subject to more conditions, and public interest uses would be exempt from those conditions entirely.

Without practical legal tools to define conditions for AI development, collections are left with blunt options: allow unrestricted extraction by AI developers, or restrict access entirely. Neither option reflects the goals of most knowledge stewards. This research and development is informed by close consultation with community members and stakeholders, such as dialogue with practitioners in the African context this past year, as well as broader explorations in the movement, such as this analysis on sharing of cultural heritage by Open Future Foundation, and the development of NOODL to rebalance power for marginalized language communities.

Many want to continue sharing their collections while ensuring that AI developers use them responsibly by respecting attribution, ensuring transparency, and meeting other safeguards aligned with their public interest missions. We want to build tooling to enable this in standardized, legally enforceable ways.
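To make the idea of conditional access concrete, here is a minimal sketch of how a steward's terms might be evaluated. It is not the tool described above, which is still in research and development. The purpose categories, the bulk threshold, and the condition names are all hypothetical, and the sketch treats public interest uses as exempt from conditions, which is one reading of the plan.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration; a real steward would define its own.
PUBLIC_INTEREST_PURPOSES = {"preservation", "nonprofit_research", "education", "accessibility"}
BULK_THRESHOLD_ITEMS_PER_DAY = 10_000


@dataclass
class AccessRequest:
    purpose: str        # what the requester says the data is for
    items_per_day: int  # how heavily the request loads the steward's infrastructure


def conditions_for(request: AccessRequest) -> list[str]:
    """Return the conditions a steward might attach to a request."""
    if request.purpose in PUBLIC_INTEREST_PURPOSES:
        return []  # public interest uses carry no extra conditions
    conditions = ["attribution", "transparency_report"]
    if request.items_per_day > BULK_THRESHOLD_ITEMS_PER_DAY:
        # Resource-heavy bulk reuse is subject to more conditions.
        conditions.append("infrastructure_contribution")
    return conditions


print(conditions_for(AccessRequest("preservation", 1_000_000)))
print(conditions_for(AccessRequest("ai_training", 50_000)))
```

The branch order matters: checking purpose before volume means a large-scale archive crawl is not penalized for its size, which is the distinction that blanket bot blocking cannot make.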

What Happens Next

The exploration of these kinds of tools requires us to look beyond copyright alone, which is a real paradigm shift for CC, and not one we take lightly. We believe that investigating the risks and benefits of legal tools that support conditional access is an essential part of stewarding the long-term health of the commons. We need to preserve access to valuable knowledge resources while ensuring that the institutions and communities who steward them remain active participants in shaping the AI ecosystem.

Here is where things stand. This month, we are convening a workshop in London to begin working through the design and governance questions that new tooling raises. Later this year, we will be seeking pilot adopters to help us test and refine the approach in practice. We will share updates as this work develops.

We have a clear plan, with these initiatives entering pilot phases within the year. Like many nonprofits, our ability to accelerate depends directly on the resources we have available. Support from our Open Infrastructure Circle has made progress to date possible, and as we mark our 25th anniversary, we have set a goal to raise $5 million to advance the next iteration of CC signals. If you are able, we invite you to support this work.

Let’s collectively build what the commons needs next.

The post From Signals to Infrastructure: Strengthening the Commons for the AI Era appeared first on Creative Commons.

How to Keep the Internet Human

Sarah Hinchliff Pearson — Thu, 12 Feb 2026 19:16:53 +0000

It is time to update our mental models about open knowledge

I like to say I am a “writer who lawyers”. I begin here because I want to name my biases up front. I am a lawyer, but I come to this work first and foremost as a writer thinking about the conditions that will allow us to continue to share knowledge publicly. And in spite of—or perhaps because of—the fact that I am a lawyer, I have a healthy skepticism about the power of legal terms and conditions. The law will play a role, but the challenge of keeping the internet human will ultimately be navigated by the stories we imagine and tell.

We need new stories.

I spent the first 15 years of my legal career working in intellectual property. For most of that time, I was part of the open movement, fighting overly restrictive intellectual property laws to promote access to knowledge. But over time, I began to feel like the message of open licensing did not resonate with me in the same way, especially in my identity as a writer. Eventually I left the open movement to go into the field of privacy.

Immersing myself in digital privacy led me to realize why the story of open felt incomplete. We had been undervaluing the role of boundaries around reuse. The tension between the instinct to share and the need for boundaries around reuse is the point. And right now, that tension is completely out of balance. Instead, what exists online is a free-for-all.

If you are familiar with the concept of a commons, you know it requires shared rules that govern reuse of resources. Those shared rules represent a mutual commitment by producers and reusers, and they ensure that the cycle leads to collective benefit and begins again. A free-for-all, on the other hand, has no shared rules. As a result, we are losing the instinct to share.

What happened to the commons?

It would be easy to blame AI for this situation, but it is not so straightforward. AI is simply speeding up and exacerbating longstanding challenges with open knowledge. As privacy scholar Daniel Solove has written, “AI is continuous with the data collection and use that has been going on throughout the digital age.”

In preparation for this talk, I went back and reread the brilliant CC Summit keynote “Open As In Dangerous” by Chris Bourg from 2018 and the seminal Paradox of Open report by the Open Future Foundation. For many years, these and countless other voices have been warning us about the vulnerabilities that open knowledge creates. Whether it is the use of CC-licensed photos for facial surveillance technology or the creation of Grokipedia, it is clear that open content is particularly vulnerable to abuse.

But of course, it is not just open content that is vulnerable. All content online today has essentially been treated as fair game. The free-for-all extends to everything online.

This has led to a vast renegotiation of what it means to share publicly, still currently underway. We see this in the massive wave of litigation against AI services, the rise of paywalls and commercial licensing deals, the introduction of new technologies to increase control over content in ways that scale back the open web, and the extreme backlash against AI by creators and the general public.

All of this constitutes a threat to open access to knowledge. It is unlikely that the incentives to share can outweigh all of the growing countervailing forces at play: economic, moral, safety, and more. We cannot respond by accepting these risks and harms as inherent and inevitable costs of publicly sharing knowledge.

Changing our mental models

To meet the moment, we need to rethink our most fundamental assumptions about open knowledge.

The old taxonomies no longer apply.

For a very long time, we have used categories to help us determine the appropriate rules for sharing knowledge. Open content could be licensed one way, while open data had different parameters. This distinction no longer applies when everything online is used as data by machines. Even the difference between copyrighted material and public domain is not very useful, since even copyrighted works are largely used by machines for the public domain material within them (e.g., facts and ideas).

Copyright is not the main event.

The original “enemy” of the open movement was copyright, and things were simpler back then. Even the most restrictive open license was more permissive than the default under copyright law, so any boundaries we set around the commons were still fighting the copyright war. Overly restrictive copyright laws still cause problems today, but they are no longer the biggest threat to the commons. In fact, it is copyright’s weakness in the context of machine reuse that is the real challenge. Because copyright does not protect against unwanted machine reuse, the CC licenses, which rest on copyright, cannot do so either, and the free-for-all extends even to CC-licensed content. Importantly, this was by design: the aim was to avoid having CC licenses impose restrictions on activity that copyright otherwise allows.

We have to stop confusing property with morality.

This is where I depart from my younger self and from many of my peers in the open movement. I think we have let important principles like the notion that facts and ideas should not be privately owned, or the fact that some permissionless reuse plays a critical role in free expression, convince us that the scope of copyright is an ethical line. The logic goes: if no one can own it, then no rules should apply. This leads to an impoverished sense of morality, where the only justification for constraint is property rights. As Robin Wall Kimmerer says, “In that property mindset, how we consume doesn’t really matter because it’s just stuff and the stuff all belongs to us. There is no moral constraint on consumption.”

The ethics of sharing—which is what open is about—needs to be broader than what we can own.

Boundaries benefit us all.

Boundaries on reuse are what create the reciprocity that fuels a commons. Without them, there is no assurance that sharing leads to collective benefit, and people lose their instinct to share. But boundaries can also have social value in their own right. Even when sharing in public, people rightfully expect some boundaries around how their works are used, regardless of what copyright law says. This is foundational in the field of privacy, but somehow we lose sight of it when we are sitting in the realm of content sharing. Daniel Solove writes: “People expect some degree of privacy in public, and such expectation is reasonable as well as important for freedom, democracy, and individual wellbeing.” Similarly, we establish boundaries around reuse of knowledge because those protections serve us all.

Open should not be a purity test.

The open movement has had incredible success creating global standards, and this has helped make it so successful. But the emphasis on standardization has led us to hyper-focus on definitions, and this focus is distracting us from the bigger picture. What matters is not open versus closed, or even abundance versus scarcity. We need to focus on values, not prescriptions. Open licensing has always been conditional, and it has always been a spectrum. This means we have to accept that there will be gray areas. What we lose in certainty, we will gain in relevance and moral clarity. As Rebecca Solnit says, “Categories are where thoughts go to die.”

Where do we go from here?

All of this leads back to where we began. We have to reconstruct the mutual commitment that keeps the commons cyclical.

Rebuilding the mutual commitment that comes with sharing knowledge requires us to balance opposing values. On the one hand, we must protect important freedoms of the reusing public. On the other, we must establish boundaries around responsible reuse. The goal is to be as open as possible and as restrictive as necessary. And before we start panicking about slippery slopes, we should remember there is an important limiting principle we can leverage: does the boundary shift power in ways that further concentrate it or redistribute it? We can also ask whether there are ways to mitigate a boundary’s effect on access.

We already have a good sense of the dimensions of boundaries around responsible reuse. They all have roots in the existing CC license suite.

Attribution: While the AI landscape complicates methods and norms for attribution, the principle is more important than ever for informational integrity, authors’ rights, and transparency.

Reciprocity: Molly Van Houweling calls this “extractability,” the idea that those extracting facts and ideas from others’ works have a moral responsibility to ensure that knowledge remains extractable by others. This is essentially about crafting a ShareAlike obligation for the age of AI.

Financial sustainability: This has been a longtime challenge in the open movement, and it is more urgent than ever. It is not about preserving business models, it is about financially sustaining the production of knowledge and culture as public goods.

Prohibitions on harmful use cases: This dimension may feel less familiar in open licensing, but the sentiment is one we hear regularly. There are simply some use cases or even actors that feel out of bounds for people sharing knowledge because of the harm they cause.

How do we catalyze a mutual commitment around prosocial boundaries in the current free-for-all environment? Open Future Foundation’s Paul Keller has written: “For any response to succeed in preserving a diverse and sustainable information ecosystem, collective action is required—both bottom-up, through coordinated action by information producers, and top-down, through political will to enable redistribution via fiscal interventions.” There is no single solution, and we need to tackle it from all directions.

For the bottom-up efforts, we can leverage the tools we have. Norms and social pressure have a role to play, though it is hard to put full faith in voluntary action right now. We can also explore methods for legal control, including both contract and copyright law. As Nilay Patel has said, “Copyright is the only functioning regulation on the internet,” which makes it impossible to avoid considering it as one lever to employ.¹ Finally, there is the strategy of controlling access. This is the most uncomfortable tactic because of the collateral damage it risks, and it requires extreme care. But if AI companies will not pay attention voluntarily, technical controls around access look increasingly necessary.

There are many in the open movement already experimenting with these efforts, including the Mozilla Data Collective, the differentiated access model proposed by Europeana and the Open Future Foundation, the NOODL license, and many more. Creative Commons is also actively thinking about how to build a framework that re-instills mutual commitment into the ecosystem. Many of you have been following along as we experiment with an AI preference signals framework we’ve been calling CC signals. While the path we will take is evolving, the goal is the same. We need to come together to define and sustain the boundaries that serve us all.

I will end with the words of Ruha Benjamin: “We need to give the voice of the cynical, skeptical grouch that patrols the borders of our imagination a rest.”

We can imagine a better way.

¹ While copyright law is ill-equipped to function as a method of control over machine reuse (and rightly so, considering the importance of not treating facts and ideas as private property), copyright law still has a role to play because of the uncertainty around its application on a global scale. Granting copyright permission in exchange for agreement to certain conditions could still be a valuable offer to some reusers.


Exploring a Books Data Commons for AI Training

Creative Commons — Mon, 08 Apr 2024 15:00:35 +0000

Our work on copyright has long focused on supporting libraries and archives in the service of their missions to preserve and ensure access to culture. Our 2022 copyright reform agenda centers those sorts of institutions (and more generally GLAMs: galleries, libraries, archives, and museums) and the critical role they play in society. Among other things, that agenda calls attention to the ways in which copyright might impede libraries and archives that wish to make their collections available for research uses, including use for AI training, in order to fulfill their public interest missions.

That issue – AI training – has become ever more relevant. The concept of mass digitization of books, including to support text and data mining, of which AI training is a subset, is not new. But AI training is newly of the zeitgeist, and its transformative use makes questions about how we digitize, preserve, and make accessible knowledge and cultural heritage salient in a distinct way.

In 2023, multiple news publications reported on the availability and use of a dataset of books called “Books3” to train large language models (LLMs), a form of generative AI tool. The Books3 dataset contains text from over 170,000 books, which are a mix of in-copyright and out-of-copyright works. It is believed to have been originally sourced from a website that was not authorized to distribute all of the works therein. In lawsuits brought against OpenAI, Microsoft, Meta, and Bloomberg related to their LLMs, the use of Books3 as training data was specifically cited.

The Books3 controversy highlights a critical question at the heart of generative AI: what role do books play in training AI models, and how might digitized books be made widely accessible for the purposes of training AI for the public good? What dataset of books could be constructed and under what circumstances?

Earlier this year, we collaborated with Open Future and Proteus Strategies on a series of workshops to explore these questions and more. We brought together practitioners on the front lines of building next-generation AI models, as well as legal and policy scholars with expertise in the copyright and licensing challenges surrounding digitized books. Our goal was also to bridge the perspective of stewards of content repositories, like libraries, with that of AI developers. A “books data commons” needs to be both responsibly managed, and useful for developers of AI models. Today, we’re releasing a paper based on those workshops and additional research.

While this paper does not prescribe a particular path forward, we do think it’s important to move beyond the status quo. Today, large swaths of knowledge contained in books are effectively locked up and inaccessible to most everyone. Large companies have huge advantages when it comes to access to books for AI training (and access to data in general). At the same time, as the paper highlights, there are already relevant examples of nonprofit and library-led efforts to provide responsible, fair access to books for many more people, not just the privileged few. We hope this paper can support further research, collaboration and investment in this space.

Read the full paper.


CC Joins Civil Society Letter Urging U.S. to Support Openness and Transparency in AI

Creative Commons — Thu, 28 Mar 2024 21:37:08 +0000

Over the last year, Creative Commons (CC) worked with other stakeholders to support open science and open source in the context of artificial intelligence (AI) and, specifically, the EU AI Act. This policy debate has now ratcheted up in the U.S. as well, after President Biden directed the National Telecommunications and Information Administration (NTIA) at the Department of Commerce to start a public consultation process regarding “Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights.”

This week, we joined a broad coalition of civil society and leading academics urging a tailored, evidence-driven approach. The letter highlights the critical benefits that open models can provide, and encourages the government to consider carefully the best ways to address the marginal risks that openness can create. As the letter states, “We do not claim that openness is always beneficial, and there are some situations where openness may exacerbate risks from AI.” However, risk should be evaluated relative to alternatives (e.g., the use of closed models or other digital tools to accomplish the same ends) and may be addressed through less restrictive means than direct limits on openness.

Read the full letter.

For more information on CC’s work on AI, contact us at info@creativecommons.org.


EU adopts landmark Artificial Intelligence Act

Brigitte Vézina,Mark Foster — Thu, 14 Mar 2024 15:29:17 +0000

Yesterday, the European Parliament (EP) adopted the Artificial Intelligence Act (AIA), the world’s first piece of legislation comprehensively regulating artificial intelligence (AI).

Creative Commons (CC) has been actively engaged with EU policymakers on the AIA for the past several years, especially as copyright issues arose front and center in the negotiations last June. In this blog post, we share a few high-level assessments of where the text landed across various key aspects impacting the commons as well as some thoughts on what lies ahead.

Open source software

We welcome the fact that the final text excludes open-source AI from some obligations, even if these exclusions are subject to conditions and come alongside ambiguity about the meaning of “open” in the context of AI overall and in the specific context of this law. Other noteworthy positives include an exemption for AI systems “specifically developed and put into service for the sole purpose of scientific research and development” and another exemption from the detailed transparency requirements for open-source general purpose AI (GPAI) models, even if they still have the obligation to put in place a policy to respect copyright law and produce a summary about the content used for model training.

AI and creativity

The AIA is largely agnostic when it comes to how the creative industries (and indeed individual creators) use AI. From our community consultation outcomes, we see some creators embracing AI, using it as a creative tool to further develop new and innovative works. Others, however, remain concerned about the impact of AI on employment in certain sectors as well as the competition aspect of more cost-effective digital/virtual solutions vis-a-vis human created works.

In response to ChatGPT’s explosive adoption, the EP introduced significant provisions, including on copyright. CC provided direct input to policymakers in public and in private on this aspect of the legislation, which touches directly on CC’s core mission.

As ever with compromises, there are aspects of the deal on copyright that are positive and others that will require further reflection and iteration. The AIA makes several cross-references to EU copyright legislation, in particular Articles 3 and 4 of the CDSM Directive on text and data mining. There are already some stakeholders who are pushing for revision of related EU legislation, including said Directive, in light of the AIA and recent technological developments. At CC we look forward to facilitating practical application of Articles 3 and 4. However, we caution against any fundamental revisions of the exceptions provided for in these articles. These are critical for ensuring a balanced copyright system.

What happens next? A gradual application process

The Council is expected to give its final approval in the coming weeks and the Act will likely be published in the EU’s Official Journal later this spring. The Act will enter into force 20 days later, while the individual provisions will start applying in a staggered manner, with articles on prohibited AI applications applicable after 6 months (circa end of 2024), provisions on general purpose AI applicable after 12 months (circa mid-2025) and the remaining provisions applicable after 24 months (circa mid-2026). Looking further into the future, EU policymakers will no doubt have to assess how the AIA works in practice and ensure appropriate enforcement.
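The staggered schedule is simple date arithmetic, sketched below. The publication date used here is a placeholder assumption, since the actual Official Journal date was not yet known; only the 20-day and 6, 12, and 24 month offsets come from the description above. The `add_months` helper is likewise just an illustrative utility.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


publication = date(2024, 6, 1)  # hypothetical publication date, for illustration only
entry_into_force = publication + timedelta(days=20)

milestones = {
    "prohibited AI applications": add_months(entry_into_force, 6),
    "general purpose AI": add_months(entry_into_force, 12),
    "remaining provisions": add_months(entry_into_force, 24),
}
for name, when in milestones.items():
    print(f"{name}: {when.isoformat()}")
```

With this placeholder date the three milestones fall in December 2024, June 2025, and June 2026, in line with the approximate timings given above; a later publication date shifts all three by the same amount.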

We welcome the creation of the EU’s AI Office, which will soon have to start crafting technical rules, as mandated by the AIA. We look forward to engaging with the Office as it further develops important technical standards, in particular in developing an opt-out standard in alignment with existing copyright law. The Office will also have an important role in interpreting and further clarifying vaguely worded concepts, such as the requirement to publish “sufficiently detailed” summaries of inputs for large language models (LLMs) and foundation models. These rules will be crucial to provide clarity for firms and individuals regarding regime compliance.

CC will continue to engage constructively with EU policymakers to ensure that the EU’s copyright regime remains fit for purpose in an ever-evolving digital age and that AI can be harnessed for good in support of a thriving commons.

The post EU adopts landmark Artificial Intelligence Act appeared first on Creative Commons.

An Invitation for Creators, Activists, and Stewards of the Open Movement

Shannon Hong,Creative Commons — Sun, 11 Feb 2024 12:00:52 +0000

Dear Open Movement Creators, Activists, and Stewards,

A key question facing Creative Commons as an organization, and the open movement in general, is how we will respond to the challenge of shaping artificial intelligence (AI) towards the public interest, growing and sustaining a thriving commons of shared knowledge and culture.

So much of generative AI is built on the digital infrastructure of the commons and uses the vast quantity of images, text, video, and rich data resources of the internet. Organizations train their models with trillions of tokens from publicly available datasets like Common Crawl, GitHub open source projects, Wikipedia, and arXiv.

Access to the commons has enabled incredible innovations while creating the conditions for the concentration of power in entities that are able to amass the immense energy and data needed to train AI models. Community consultations at conferences like MozFest, RightsCon, Wikimania, and the CC Global Summit have also revealed concerns about transparency, bias, fairness, and attribution in AI.

Alignment Assembly

To start addressing some of these challenges, between 13 February and 15 March, Open Future will host an asynchronous, virtual alignment assembly for the open movement to explore principles and considerations for regulating generative AI. We hope to reach participants spread across different fields of open and coming from different regions of the world. We are organizing the assembly in partnership with Open Future and Fundación Karisma.

We want to bring to the conversation the perspectives of:

Activists and experts, including digital rights advocates and legal experts
Stewards: people from organizations that steward collections that are part of the digital commons such as Wikimedia, open access repositories, and cultural heritage collections
Creators: people who create works that form part of the digital commons, broadly: not only visual artists and musicians but also researchers who do open science or open source programmers

We will use the process of an alignment assembly, an experiment in collective deliberation and decision-making. This model was pioneered by the Collective Intelligence Project (CIP), led by Divya Siddarth and Saffron Huang. The model has been used by OpenAI, Anthropic, and the government of Taiwan.

You can sign up to take part in the process by registering your interest here (we will use the contact information only to invite you to the assembly and to provide updates, and we will delete it once the assembly process is complete).

Background

Creative Commons has long been considering the intersection of copyright and AI. CC submitted comments to the World Intellectual Property Organization’s consultations on copyright and AI in 2020. When considering usage of CC-licensed work in AI, the organization explored in 2021 “Should CC-licensed work be used to train AI”. More recently, CC carried out consultations at MozFest, RightsCon, Wikimania, and the CC Global Summit, while publishing ongoing analysis of the AI landscape.

Ahead of the Creative Commons Global Summit last year, Creative Commons and Open Future hosted a workshop on generative AI and its impact on the commons. The group agreed on and released a set of principles on “Making AI work for Creators and the Commons.” Now, we would like to test and expand this work.

Outcome

The Alignment Assembly on AI and the Commons builds on and continues all of this work.

We treat the principles as a starting point. We are using the alignment assembly methodology and the pol.is tool to understand where there is consensus and which principles generate controversy. In particular, we want to learn how much alignment there is between the perspectives of activists, creators, and stewards of the commons.

At the end of the process, we will produce a report with the outcomes of the assembly and a proposal for a refined set of principles. As the policy debate about the commons and AI develops, we hope the assembly will provide insights into better regulation of generative AI.

The post An Invitation for Creators, Activists, and Stewards of the Open Movement appeared first on Creative Commons.

What does the CC Community Think about Regulating Generative AI?

Shannon Hong,Timid Robot Zehta,Nate Angell,Kat Walsh — Thu, 08 Feb 2024 12:00:31 +0000

In the past year, Creative Commons, alongside other members of the Movement for a Better Internet, hosted workshops and sessions at community conferences like MozFest, RightsCon, and Wikimania, to hear from attendees regarding their views on artificial intelligence (AI). In these sessions, community members raised concerns about how AI is utilizing CC-licensed content, and discussions touched on issues like transparency, bias, fairness, and proper attribution. Some creators worry that their work is being used to train AI systems without proper credit or consent, and some have asked for clearer guidelines around public benefit and reciprocity.

In 2023, the theme of the CC Global Summit was AI and the Commons, focused on supporting better sharing in a world with artificial intelligence — sharing that is contextual, inclusive, just, equitable, reciprocal, and sustainable. A team including CC General Counsel Kat Walsh, Director of Communications & Community Nate Angell, Director of Technology Timid Robot, and Tech Ethics Consultant Shannon Hong collaborated to use alignment assembly practices to engage the Summit community in thinking through a complex question: how should Creative Commons respond to the use of CC-licensed work in AI training? The team identified concerns CC should consider in relation to works used in AI training and mapped out possible practical interventions CC might pursue to ensure a thriving commons in a world with AI.

At the Summit, we engaged participants in an Alignment Assembly using Pol.is, an open-source, real-time survey platform, for input and voting. 25 people voted using Pol.is, and in total 604 votes were cast on over 33 statements, with an average of 24 votes per voter. The statements included both pre-written seed statements and ideas suggested by participants.

The one thing everyone agreed on wholeheartedly: CC should NOT stay out of the AI debate. All attendees disagreed with the statement: “CC should not engage with AI or AI policy.”

Pol.is aggregates the votes and divides participants into opinion groups. Opinion groups are made of participants who voted similarly to each other, and differently from other groups. There were three opinion groups that resulted from this conversation.

Group A: Moat Protectors

Group A comprises 16% of participants and is characterized by a desire to focus on Creative Commons’ current expertise, specifically some relevant advocacy and the development of preference signaling. They uniquely support noncommercial public interest AI training, unlike B and C. This group is uniquely against additional changes like model licenses and strongly against political lobbying in the US.

Group B: AI Oversight Maximalists

Group B, the largest group with 36% of participants, strongly supports Creative Commons taking all actions possible to create oversight in AI, including new political lobbying actions or collaborations, AI teaching resources, model licenses, attribution laws, and preference signaling. This group uniquely supports political lobbying and new regulatory bodies.

Group C: Equitable Benefit Seekers

Group C, containing 32% of participants, is focused on protecting traditional knowledge, preserving the ability to choose where works can be used, and prioritizing equitable benefit from AI. This group strongly supports requiring authorization for using traditional knowledge in AI training and sharing the benefits of profits derived from the commons. Like group A, this group is against political lobbying in the US.

There are two key limitations of this assembly: participant sample size and participant representativeness. There are over 22,000 members in the Creative Commons Slack community, which is only a subset of the broader CC community. 30 people were present and active voting members of the assembly. While many participants were open movement leaders in their countries and represented the perspectives of more individuals, this sample is too small to give a complete picture of the CC community’s desires. We did not perform a demographic survey of the room, but data from the overall conference suggests that American and European perspectives may be overrepresented in our assembly.

Want to learn more about the specific takeaways? Read the full report.

We invite CC members to participate in the next alignment assembly, hosted by Open Future. Sign up and learn more here.

The post What does the CC Community Think about Regulating Generative AI? appeared first on Creative Commons.

Dispatches from Wikimania: Values for Shaping AI Towards a Better Internet

Shannon Hong,Creative Commons — Wed, 07 Feb 2024 22:12:48 +0000

Isolated Araneiform Topography, from UAHiRISE Collection on Flickr. Public Domain Mark.

AI is deeply connected to networked digital technologies — from the bazillions of works harvested from the internet to train AI to all the ways AI is shaping our online experience, from generative content to recommendation algorithms and simultaneous translation. Creative Commons engaged participants at Wikimania on August 15, 2023 to shape how AI fits into the people-powered policy agenda of the Movement for a Better Internet.

The session at Wikimania was one of a series of community consultations hosted by Creative Commons in 2023.

The goal of this session was to brainstorm and prioritize challenges that AI brings to the public interest commons and imagine ways we can meet those challenges. In order to better understand participant perspectives, we used Pol.is, a “real-time survey system, that helps identify the different ways a large group of people think about a divisive or complicated issue.” This system is a powerful way to aggregate and understand people’s opinions through written expression and voting.

Nate Angell and I both joined the conference virtually, two talking heads on a screen, while the majority of approximately 30 participants joined in-person in Singapore. After introducing the Movement for a Better Internet and asking folks to briefly introduce themselves, we immediately started our first Pol.is with the question: “What are your concerns about AI?” If you’re curious, you can pause here, and try out Pol.is for yourself.

In Pol.is, participants voted on a set of ten seed statements (statements that we wrote, based on previous community conversations), added their own concern statements, and then voted on concern statements written by their peers in the room. Participants could choose “Agree,” “Disagree,” or “Unsure.” Overall, 31 people voted and 532 votes were cast (an average of 17.16 votes per person).

96% of participants agreed that “Verification of accuracy, truthfulness and provenance of AI-produced content is difficult.” This statement drove the most consensus among all participants in the group. Consensus indicates that people from different opinion groups have a common position, or in other words, people who do not usually agree with each other agree on this topic. The other two most consensus-driving concerns were: “Large-scale use of AI may have a negative impact on the environment” and “I suspect a push for greater copyright control would eventually be appropriated and exploited by big companies. E.g. Apple and privacy.”

The most divisive statement was: “AI is developing too fast and its impact is unclear.” Divisive here refers to the statements on which opinions differed the most (rather than those with the most disagreement, as widespread disagreement is a consensus too). The other three most divisive statements were also the most unclear statements, with more than 30% voting “Unsure”: “AI can negatively impact the education of students,” “AI can use an artist’s work without explicit permission or knowledge,” and “AI and the companies behind them steal human labor without credit and without pay.”
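The difference between consensus and divisiveness described above can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the metric Pol.is actually computes: the group labels, votes, and statement names are invented, and divisiveness is approximated as the gap in agreement rates between opinion groups.

```python
def agree_rate(votes):
    """Share of votes that are 'agree' (+1); 'unsure' (0) and 'disagree' (-1) do not count."""
    return sum(1 for v in votes if v == 1) / len(votes)


def divisiveness(votes_by_group):
    """Gap between the most and least agreeing opinion groups (0 means the groups align)."""
    rates = [agree_rate(votes) for votes in votes_by_group.values()]
    return max(rates) - min(rates)


# Invented votes for two hypothetical opinion groups, A and B.
statements = {
    "shared concern": {"A": [1, 1, 1, 0], "B": [1, 1, 1, 1]},
    "shared rejection": {"A": [-1, -1, -1, -1], "B": [-1, -1, 0, -1]},
    "split": {"A": [1, 1, 1, 1], "B": [-1, -1, 1, -1]},
}

# Rank statements from most to least divisive.
ranked = sorted(statements, key=lambda s: divisiveness(statements[s]), reverse=True)
print(ranked)
```

Note that the statement both groups reject scores as the least divisive here: the groups agree with each other, even though they disagree with the statement.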

Back in our workshop room, we viewed the data report live, which was somewhat difficult due to limitations in text size. Participants in the room elaborated on their concerns, highlighting why they agreed or disagreed on particular points.

In the second half of the workshop, we asked participants to imagine ways we can meet one particular challenge. We focused our discussion on the only statement with 100% agreement: “AI makes it easier to create disinformation at scale.”

Participants were asked to write down their ideas in a shared document, and stand up to share their thoughts in front of the audience. The three major buckets for innovation in this space were education, technical advancement, and cultural advocacy. In education, participants brought up the need for critical thinking education to reinforce the ability to identify reliable sources and AI tools education to allow more people to understand how misinformation is created. Technical projects included developing AI to tackle disinformation, building a framework for evaluating AI tools during development, and creating better monitoring systems for misinformation. Participants also highlighted the need for cultural advocacy, from building the culture of citations and human-generated reference work to policy advocacy to maintain the openness of the commons.

Creative Commons will continue community consultations with Open Future Foundation in the next month. Sign up and learn more here.

The post Dispatches from Wikimania: Values for Shaping AI Towards a Better Internet appeared first on Creative Commons.

On Openness & Copyright, EU AI Act Final Version Appears to Include Promising Changes

Creative Commons — Mon, 11 Dec 2023 20:00:56 +0000

Throughout the last year, Creative Commons has actively engaged in the EU’s development of an AI Act. We welcomed its overall approach, focused on ensuring high-risk systems that use AI are trustworthy and safe. At the same time, we had concerns about the way it might impede better sharing and collaboration on the development of AI systems, and we joined with a coalition of AI developers and advocates offering suggestions for how to improve it. Rather than advocating for blanket exemptions, we supported a graduated, tailored approach – differentiating merely creating, sharing, and doing limited testing of new tools, versus offering a commercial service or otherwise putting powerful AI models into service, particularly at broad scale and impact.

We also raised concerns about late additions to the text related to copyright. While we generally support more transparency around the training data for regulated AI systems, the Parliament’s text included an unclear and impractical obligation to provide information specifically about use of copyrighted works.

This week, the EU’s political institutions announced that they have reached a tentative final agreement. We’re still awaiting a final text, and there are many other issues at stake related to the specific regulations on high-risk systems; a number of civil society organizations have raised concerns with, for example, changes to rules around predictive policing and biometric recognition, among other things.

At the same time, from the initial reported details (including this draft compromise text published by POLITICO), the final agreement appears promising relative to the recent Parliament text, both in its support for open source and open science and in its treatment of copyright. The devil is in the details, and we will update our views based on further review of the final text.

Open Source & Open Science

Consistent with our advocacy, the final version appears to clarify that merely providing and collaborating on AI systems under an open license is not covered by the Act, unless the system is one regulated by the Act (e.g., a defined “high-risk” system) that is commercially available or put into service.

As the AI Act progressed, focus shifted from particular high-risk systems to general purpose AI models (GPAI), sometimes referred to in terms of “foundation models.” This is a tricky issue, because it could have unintended consequences for a wide variety of beneficial uses of AI. In light of the Parliament’s proposed inclusion of these models, we had advocated for a tiered approach, requiring transparency and documentation of all models while reserving stricter requirements for commercial deployments and those put into service at some level of broad scale and impact.

On the one hand, the final Act also takes a tiered approach, reserving the strict requirements for models of “high impact” and “systemic risk.” On the other hand, the initial tiering is based on an arbitrary technical threshold, which at best only has a limited relationship to measuring actual real-world impact. Fortunately, it appears this tiering can be updated by regulators in the to-be-created AI Office in the future based on other quantitative and qualitative measures, and we hope that the final rules also appropriately distinguish between development of the pre-trained model, and follow-on, third party developers “fine-tuning” a model.

Interestingly, the draft text will exempt models that do not have “systemic risk” and are “made accessible to the public under a free and open-source license whose parameters, including the weights, the information on the model architecture, and the information on model usage,” with the exception of certain transparency requirements around training data and respect for copyright (see below). This provides further breathing room for open source developers, although it is worth noting that the definition of what constitutes an “open source license” in this context is still a matter of some debate. We hope those continuing discussions will help ensure these protections in the law are applied to those models that, by virtue of their openness, do provide critical transparency that facilitates robust accountability and trustworthy systems.

The exact rules will continue to evolve as the AI Act is implemented in the coming years, and other countries are also considering the role of openness. For instance, the U.S. Department of Commerce is soliciting input on “dual-use foundation models with widely available weights,” pursuant to the White House’s recent Executive Order.

As AI development and regulation continue to evolve next year, we will continue to work with a broad coalition to ensure better support for open source and open science. This fall, we were proud to join with a wide range of organizations and individuals in an additional joint statement emphasizing the importance of openness and transparency in AI – not only because it helps make the technology more accessible, but also because it can support trust, safety and security. We look forward to continuing to work with all stakeholders to make this a reality.

Copyright & Transparency

The final Act appears to take a more flexible approach to transparency around use of training data. Rather than expecting GPAI providers to list every specific work used for training and determine whether it is under copyright, it instead indicates that a summary of the collections and sources of data is enough (for example, it might be sufficient to state that one uses data from the web contained in Common Crawl’s dataset). The AI Office will create a template for meeting these transparency requirements. We welcome the new wording, which clarifies that the transparency requirement applies to any training data — not only to copyright-protected works. We will continue to engage on this topic to ensure it takes a flexible, proportionate approach, free of overreaching copyright restrictions.

The Act also requires that foundation model providers have policies in place to adhere to the copyright framework. It’s unclear exactly what this means besides restating that they must comply with existing law, including the opt-out stipulated in Article 4(3) of the DSM Directive. If that’s the intent, then it is an appropriate approach. As we said previously:

“We also believe that the existing copyright flexibilities for the use of copyrighted materials as training data must be upheld. The 2019 Directive on Copyright in the Digital Single Market and specifically its provisions on text-and-data mining exceptions for scientific research purposes and for general purposes provide a suitable framework for AI training. They offer legal certainty and strike the right balance between the rights of rightsholders and the freedoms necessary to stimulate scientific research and further creativity and innovation.”

The draft does create some uncertainty here, however. It states that models must comply with these provisions if put into service in the EU market, even if the training takes place elsewhere. On the one hand, the EU wants to avoid situations of “regulatory arbitrage,” where models are trained in a more permissive jurisdiction and then brought into the EU, without complying with EU rules. On the other hand, this threatens to create a situation where the most restrictive rules set a global standard; to the extent that simply putting a model into service on a globally accessible website could put a provider in legal jeopardy, it could create uncertainty for developers.

The post On Openness & Copyright, EU AI Act Final Version Appears to Include Promising Changes appeared first on Creative Commons.

CC Responds to the United States Copyright Office Notice of Inquiry on Copyright and Artificial Intelligence

Creative Commons — Tue, 07 Nov 2023 19:44:45 +0000

In August, the United States Copyright Office issued a Notice of Inquiry seeking public responses to 34 questions (and several sub-questions) about the intersection of copyright law and artificial intelligence. The comment period closed on 30 October with over 10,000 individuals and organizations responding, representing a broad spectrum of interests on how copyright should apply in relation to generative AI. CC joined the conversation to provide our own thoughts on copyright and AI to the Copyright Office.

Since our founding, we have sought out ways that new technologies can serve the public good, and we believe that generative AI can be a powerful tool to enhance human creativity and to benefit the commons. At the same time, we also recognize that it carries with it the risk of bringing about significant harm. We used this opportunity to explain to the Copyright Office why we believe that the proper application of copyright law can guide the development and use of generative AI in ways that serve the public and to highlight what we have learned from our community through the consultations we have held throughout 2023 and at our recent Global Summit about both the risks and opportunities that generative AI holds.

In this post we summarize the key points of our submission, namely:

AI training generally constitutes fair use
Copyright should protect AI outputs with significant human creative input
The substantial similarity standard should apply to infringement by AI outputs
Creators should be able to express their preferences
Copyright cannot solve everything related to generative AI

AI training generally constitutes fair use

We believe that, in general, training generative AI constitutes fair use under current U.S. law. Using creative works to train generative AI fits with the long line of cases that have found that non-consumptive, technological uses of creative works in ways that are unrelated to the expressive content of those works are transformative fair uses, such as Authors Guild v. Google and Kelly v. Arriba Soft. Moreover, the most recent Supreme Court ruling on fair use, Andy Warhol Foundation v. Goldsmith, supports this conclusion. As we commented upon the decision’s release, the Warhol case’s focus on the specific way a follow-on use compares with the original use of a work indicates that training generative AI on creative works is transformative and should be fair use. This is because the use of copyrighted works for AI training has a fundamentally different purpose from the original aesthetic purposes of those works.

Copyright protection for AI outputs subject to significant human creative input

We believe that creative works produced with the assistance of generative AI tools should only be eligible for protection where they contain a significant enough degree of human creative input to justify protection, just like when creators use any other mechanical tools in the production of their works. The Supreme Court considered the relationship between artists and their tools vis-a-vis copyright over 100 years ago in Burrow-Giles v. Sarony, holding that copyright protects the creativity that human artists incorporate into their works, not the work of machines. While determining which parts of a work are authored by a human when using generative AI will not always be clear, this issue is not fundamentally different from any other situation where we have to determine the authorship of individual parts of works that are created without AI assistance.

Additionally, we believe that developers of generative AI tools should not receive copyright protection over the outputs of those tools. Copyright law already provides enough incentives to encourage development of these tools by protecting code, and extending protection to their outputs is unnecessary to encourage innovation and investment in this space.

Infringement should be determined using the substantial similarity test

We believe that the substantial similarity standard that already exists in copyright law is sufficient to address where AI outputs infringe on other works. The debate about how copyright should apply to generative AI has often been cast in all-or-nothing terms — does something infringe on pre-existing copyrights or not? The answer to this question is certainly that generative AI can infringe on other works, but just as easily it may not. As with any other question about the substantial similarity between two works, these issues will be highly fact specific, and we cannot automatically say whether works produced by generative AI tools infringe or not.

Creators should be able to express their preferences

In general, we believe there is value in methods that enable individuals to signal their preferences for how their works are shared in the context of generative AI. In our community consultations, we heard general support for preference signals, but there was no consensus on how best to do this. Opt-ins and opt-outs may be one way, but we do not believe they need to be required by US copyright law; instead, we would like to see voluntary schemes, similar to approaches to web scraping, which allow for standardized expression of these preferences without creating strict barriers to usage in cases where it may be appropriate.
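As one illustration of what a voluntary, standardized signal looks like in the web scraping context, the sketch below reads a robots.txt file with Python's standard urllib.robotparser module. The crawler names (ExampleAIBot, ExampleArchiveBot) and the URL are hypothetical, and robots.txt is used only as a familiar analogy, not as the scheme we are proposing. Such signals work only when crawlers choose to honor them.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one AI training crawler is asked to stay out,
# while every other crawler (for example, an archiving bot) remains welcome.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/artwork/1"
ai_allowed = parser.can_fetch("ExampleAIBot", url)
archive_allowed = parser.can_fetch("ExampleArchiveBot", url)

print(f"ExampleAIBot may fetch: {ai_allowed}")
print(f"ExampleArchiveBot may fetch: {archive_allowed}")
```

The file expresses a preference per crawler rather than blocking all machine access, which is the property that makes this kind of signal less of a strict barrier than a technical block.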

Transparency is necessary to build trust — Copyright is only one lens through which to consider AI regulation

We urge caution and flexibility in any approach to regulating generative AI through copyright. We believe that copyright policy can guide the development of generative AI in ways that benefit all, but that overregulation or inappropriate regulation can hurt both the technology and the public. For example, measures that improve transparency into AI models can build trust in AI models by allowing outside observers to “look under the hood” to investigate how they work. But these measures should not be rooted in copyright law. Copyright is just one lens through which we can view generative AI, and it is ill equipped to deal with many of the social harms that concern us and many others. Attempting to use copyright to solve all of these issues may have unintended consequences and ultimately do more harm than good.

We are happy to see the Copyright Office seeking out guidance on these many difficult questions. We will have to wait to see what comes from this, but we will hope for the best, and continue to engage our community so we can more fully understand what role generative AI should play in building the commons and serving the public good.

Read CC’s full submission to the Copyright Office >

The post CC Responds to the United States Copyright Office Notice of Inquiry on Copyright and Artificial Intelligence appeared first on Creative Commons.