{"id":76543,"date":"2025-05-15T17:32:39","date_gmt":"2025-05-15T17:32:39","guid":{"rendered":"https:\/\/creativecommons.org\/?page_id=76543"},"modified":"2025-05-28T14:00:51","modified_gmt":"2025-05-28T14:00:51","slug":"using-cc-licensed-works-for-ai-training-2","status":"publish","type":"page","link":"https:\/\/creativecommons.org\/using-cc-licensed-works-for-ai-training-2\/","title":{"rendered":"Using CC-Licensed Works for AI Training"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">The application of copyright law to AI training is complex.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Around the world, there are varying exceptions and limitations to copyright that permit AI training. Jurisdictions also often consider the specific uses of the copyrighted work within AI training and outputs.\u00a0<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">*Reminder* CC licenses are copyright licenses.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">That means that applying CC licenses to AI training is just as complex.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In an effort to untangle some of this complexity, we have prepared the following primer that tackles the issues from two distinct angles.\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Practical guidance for how to follow the CC licenses for training data, even in situations where copyright may not require it.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A legalistic approach, analyzing <\/span><i><span style=\"font-weight: 400;\">when<\/span><\/i><span style=\"font-weight: 400;\"> compliance with the CC licenses is legally required.\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">This primer may also be a useful tool for creators applying CC licenses to their work to better understand how their works may be used in AI 
training.\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">How to Follow the CC Licenses for Training Data*<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">*Even in situations where copyright may not require it.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Keep in mind: following this guidance is likely to lead to overcompliance with both copyright law and CC license terms. It assumes the most restrictive legal interpretation for those who wish to take a conservative approach and minimize risk.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">What follows is a quick look at CC-licensed content and AI training. Please <a href=\"https:\/\/creativecommons.org\/wp-content\/uploads\/2025\/05\/Using-CC-licensed-Works-for-AI-Training.pdf\">review the full analysis<\/a> for more details.\u00a0\u00a0\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/buttons\/88x31\/png\/by.png\" width=\"100\" height=\"35\" \/> \u00a0<b>Attribution (BY)<\/b><\/p>\n<p><span style=\"font-weight: 400;\">All CC licenses require attribution to the creator(s) of the licensed material. 
For AI model training, attribution could be a simple link to the source of the dataset used to train the model.\u00a0 Where retrieval-augmented generation (RAG) or other methods are available, providing attribution to the CC-licensed work tied to the particular model output with a link to the source is ideal.\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/buttons\/88x31\/png\/by-sa.png\" width=\"99\" height=\"35\" \/> <b>ShareAlike (SA)<\/b><\/p>\n<p><a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-SA<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-NC-SA<\/span><\/a><span style=\"font-weight: 400;\"> require that adaptations be shared under the same license. If AI models or outputs are based on ShareAlike content and they will be shared publicly, following the ShareAlike condition would require AI developers to use the same CC license as the original works.\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/buttons\/88x31\/png\/by-nc.png\" width=\"100\" height=\"35\" \/> \u00a0<img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/buttons\/88x31\/png\/by-nc-sa.png\" width=\"100\" height=\"35\" \/> <b>NonCommercial (NC)<\/b><\/p>\n<p><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-NC<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-NC-SA<\/span><\/a><span style=\"font-weight: 400;\"> give permission for NonCommercial uses only. 
If AI training data includes NC-licensed works, then following the NC restriction would require that no stage, from copying the data during training to sharing the trained model, be for commercial gain.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/buttons\/88x31\/png\/by-nd.png\" width=\"100\" height=\"35\" \/> <b>NoDerivatives (ND)<\/b><\/p>\n<p><a href=\"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-ND<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><span style=\"font-weight: 400;\">CC BY-NC-ND<\/span><\/a><span style=\"font-weight: 400;\"> prohibit creating derivative works. Following the NoDerivatives restriction would require that ND-licensed content not be used as training data.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">An Analysis of When Compliance with the CC Licenses is Legally Required\u00a0<\/span><\/h2>\n<p><i><span style=\"font-weight: 400;\">I\u2019m an AI developer. Do I have to comply with the CC license?<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">It depends. CC licenses apply only when copyright permission is required. If exceptions or limitations apply, then the CC license terms don\u2019t apply.\u00a0<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Step 1 &#8211; Does Copyright Apply?<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">There are two main aspects to the analysis of whether copyright applies when using CC-licensed works as inputs to training.\u00a0<\/span><\/p>\n<ul>\n<li aria-level=\"1\"><b>Dataset Acquisition and Preparation for Training<\/b><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The process of training AI models usually requires copying the works that are used as training data. 
When those copied works are copyrighted, the training step can have copyright implications. Around the world, copyright law varies regarding whether and how such copying can be permissible.\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Memorization in Training<\/b><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Sometimes, AI models reproduce and store certain copyrightable expressions of the works on which they were trained (as opposed to simply copying data about them, for example). This is commonly referred to as <\/span><i><span style=\"font-weight: 400;\">memorization<\/span><\/i><span style=\"font-weight: 400;\">. Currently, the only way to know whether a model has memorized content is by identifying model outputs that are substantially similar to the original.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This means that the extent of memorization is impossible to quantify in the abstract. It is also important to note that not all instances of memorization are infringing. 
To varying degrees, model developers adapt their methods to try to avoid memorization, and this can mitigate, but may not eliminate, the risk of copyright infringement.<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Step 2 &#8211; When Do CC License Conditions Apply?<\/span><\/h3>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The Attribution (BY) and ShareAlike (SA) conditions and the NoDerivatives (ND) restriction are triggered only when works or adaptations of works are publicly shared.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The NonCommercial (NC) restriction applies to all uses requiring permission under copyright.\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For a deeper dive, <a href=\"https:\/\/creativecommons.org\/wp-content\/uploads\/2025\/05\/Using-CC-licensed-Works-for-AI-Training.pdf\">see our detailed legal analysis<\/a> and <a href=\"https:\/\/creativecommons.org\/faq\/#artificial-intelligence-and-cc-licenses\">flow chart<\/a>.<\/span><\/p>\n<figure id=\"post-76418 media-76418\" class=\"align-left\"><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The application of copyright law to AI training is complex.\u00a0 Around the world, there are varying exceptions and limitations to copyright that permit AI training. Jurisdictions also often consider the specific uses of the copyrighted work within AI training and outputs.\u00a0 *Reminder* CC licenses are copyright licenses. 
That means that applying CC licenses to AI&hellip;<\/p>\n","protected":false},"author":23,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"acf":[],"_links":{"self":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/pages\/76543"}],"collection":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/comments?post=76543"}],"version-history":[{"count":7,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/pages\/76543\/revisions"}],"predecessor-version":[{"id":76590,"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/pages\/76543\/revisions\/76590"}],"wp:attachment":[{"href":"https:\/\/creativecommons.org\/wp-json\/wp\/v2\/media?parent=76543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}