Theft of Fanworks Perpetrated by nyuuzyou
Event | |
---|---|
Event: | Unauthorized AI Scraping of Fanworks |
Participants: | |
Date(s): | April 2025 |
Type: | Controversy |
Fandom: | pan-fandom |
URL: | |
On 15 April 2025, the website PaperDemon broke the news that a user by the name of nyuuzyou on the machine-learning platform HuggingFace had scraped artwork and writing from several platforms, notably including AO3, for use in training AI models. The scraping was widely denounced and prompted heated discussions.
By April 23rd, DMCA takedown notices had resulted in all of the datasets either being deleted or temporarily disabled.[1] However, by this same date, nyuuzyou had uploaded the datasets and code to the website modelscope (based in China) and to their personal website (based in Russia) in an apparent attempt to skirt the DMCA takedown notices.[2] Within hours, the datasets were taken off of modelscope, but remained live on the personal website.[1]
The datasets were reportedly published in English, Chinese, and Russian, with some of the uploads connected to servers in China and Russia.[3][4] The use of Russian- and Chinese-based platforms was flagged as particularly troublesome because American DMCA takedown notices would not be honored in those jurisdictions. This led some users to suggest reporting the hosting websites to the Chinese and Russian governments on grounds of "hosting data [from AO3] riddled with LGBTQ content, porn, furry porn, underage content and so on."[5] Others noted that AO3 is banned in China and suggested that this could be used as a strategy to take down the data.[5]
Nyuuzyou filed a counterclaim to the DMCA takedown notices, but had not heard back after a week.[4]
Websites Scraped
Altogether, seven platforms were scraped, all of which acted quickly to get their users' content removed. These include:
- PaperDemon (both PaperDemon Art and PaperDemon Writing were impacted)
- Archive of Our Own
- Artfol
- Artgram
- Character Hub
- Itaku
- PaintBerri
The scope of the datasets was noted to be extremely large. It was reported that all unlocked AO3 fics with IDs ranging from 1 to 63,200,000 were scraped.[3] Because locked works were reported to be safe from the scrape, many users locked their works to prevent further theft and advised others to do the same.[6]
PaperDemon reported that the scrape included 49,382 artworks and 2,950 written pieces.[1]
Reaction
Users of the affected platforms quickly responded in anger at the theft of their works, with community members denouncing the use of works without the creators' consent.
You're waking the sleeping giant by publishing an AO3 scrape here. Fandom is VIOLENTLY anti-AI right now.
Expect to be fighting DMCA takedown notices for the next century because once word gets out, everyone and their brother is going to be coming for this data. When fandom gets mad about something... ask app developers that try and make a profit off the archive.
This isn't a threat, honest, this is "Are you sure you know what you're in for once this gets into the general fandom zeitgeist?"
AO3userwhoisnothappy, April 2025 https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/6
One AO3 user took to HuggingFace itself to write a passionate denunciation of the theft.
Once again worth reiterating to any lurkers keeping score at home: people who know they are in the right do not have to tie themselves in knots with thought experiments and hypotheticals to excuse their actions. The people who keep doing so know they violated people's consent, and they know they can't make things without violating people's consent. Bottom line.
The Archive is an Archive -- exactly what it says on the tin: a place for writers to store their work. If everyone who's posted their work to the site decided after this to lock their fics in unseen or invite-only collections, the site would still be performing its primary purpose. A place for creators to store their work without fear of it being pulled down like fansites that used to have to deal with Anne Rice's litigation team, or Strikethrough, or any fan rupture experienced previously.
Reader experience is nice, but not the core of the site. It's why we don't have dislike buttons, or algorithms/More Like This functions, or even DMs. Readers can leave comments and kudos, and we can respond to them and build rapport if we chose to. But ultimately, writers are at the center of the site. If someone never gets a single comment or kudo, that has no impact on the status of the work itself. Readers can look for fics with tons of hits and kudos if that's how they want to decide their reading experience, but it makes no tangible difference on our end of things. We post things, and we know they will stay. Any reader interaction is lagniappe.
The people who are trying to frame theft as being "pro-information" are still anti-consent. They are anti-writer. They see writing as an end product, something they can use or collect or potentially profit from. They do not see it as the summation of a process, or choices, or even work. They are fundamentally extracting the human element from what they see as something collectible and storable. They have dehumanized you, the writer, and your creative labor, and your autonomy as to where your work goes and what is done with it. They will try to tell you that this is inevitably what happens to any text or medium on the internet, but one person uploading one fic is not the same as a studio conglomerate getting their movie pirated, or even a book released by a Big Five publisher. This is a solo, very occasionally collaborative, effort, undertaken by the artist's choice and from love. We wrote it for us, first and foremost, because we wanted to see it exist. Them trying to frame us protecting our work as somehow being "anti-reader" fundamentally misunderstand the fic writing process, because they only see the end product, not the other elements that go into it.
Their models cannot just exist -- they need end users to make them profitable, or to justify their creation. They can't train their models on writing they alone have generated. Even if they just trained it on just the works by people who claim to be all for it, they still can't make it worthwhile. They cannot make this stuff without violating people's consent, and while they might try to justify doing so, they can't get around that basic fact. It doesn't work without stolen effort. They don't have anything else. It's why they're so defensive and juvenile, because they know that however they try to spin it, they're still violating people's basic autonomy on a 1:1 level.
You made something. They took it to use it without your permission, in a way that violates the integrity of your project and your rights as a creator. If you objected to this, as many of us obviously do, they can't change that. They can only find loopholes, or talk down to you out of both sides of their mouth, or hide behind "everyone's doing it, why can't I?" because a lack of spine will do that to you.
The people here might try to intimidate you from standing up for your autonomy and your rights by threatening that the readers will react badly, that their machine will be an alternative, but that's because at the end of the day they need us to still do the heavy lifting for them. If they didn't, they wouldn't have involved us by taking our work at all.
If you want to take actions to protect your work, do that. If you want to restrict access, that's your choice alone. It just means you see your work as more than just a commodity.
People who care about you as a human and a creative, who don't condone exploitation, will understand.
The creeps here can't do anything without us. That's why they want you to think it's all inevitable and there's no point in demanding being treated fairly. Because they're counting on creativity and labor that isn't theirs, to violate your consent and autonomy without you objecting. You don't have to give them that, or even humor a "debate." There is no debate when it comes to you and your creative work.
RaraeAves, 1 May 2025 https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/3
Conversations between people in support of the scraping and those against it became rather heated. Despite the backlash, some individuals defended the scraping as being "pro-information."[4] A HuggingFace user by the name of hikitoxin went so far as to claim the DMCA takedown notices were "unfounded" and constituted an "act of perjury."[7] The user claimed that, since AO3 and the OTW did not own the rights to the fanworks published on the site, they did not have the legal right to submit the claims.[7] While conceding that individual users could issue DMCA takedown notices, hikitoxin claimed that the dataset fell under fair use.[7]
Kalomaze, another HuggingFace user, also expressed disagreement with critiques of the dataset, writing "from my perspective, this seems sort of like protesting big pharma by bringing a flamethrower to a clinic in your town or something. this only hurts people who are doing what they are doing downstream of the existence of large AI companies who aren't being transparent about their data practices to begin with."[7]
Further Reading
- The Endless Appetite for Fanfiction by Elizabeth Minkel (December 31, 2024)
Similar Controversies
- Podfic and Last.fm (October 2010)
- Ebooks Tree (2015)
- AO3 App Wars (2020)
- AO3 & AI Generated Content (May 2023)
- Lore.fm (May 2024)
- Theft of Fanfiction Perpetrated by Cliff Weitzman, WordStream, Speechify (December 2024)
Sources
1. https://web.archive.org/web/20250428154521/https://www.paperdemon.com/app/g/pdarpg/events/view/994/immediate-action-required-your-art-and-writing-has-been-scraped-and-published-in-an-ai-dataset/1
2. https://weenwrites.tumblr.com/post/782277032063205376
3. https://www.reddit.com/r/FanFiction/comments/1k6mmxs/ao3s_data_was_scraped_for_ai_what_to_know/
4. https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/3
5. https://old.reddit.com/r/AO3/comments/1k6a3t6/ao3_has_been_scraped_again_for_genai_purposes/moosipe/
6. https://queenofcats17.tumblr.com/post/781998121219653632
7. https://web.archive.org/web/20250429031724/https://huggingface.co/datasets/nyuuzyou/archiveofourown/discussions/193