This new data poisoning tool lets artists fight back against generative AI

Elle · 1 year ago

@MamboGator@lemmy.world · edit-2 4 months ago

deleted by creator

@0xD@infosec.pub · 1 year ago

I don’t see a problem with it training on all materials, fuck copyright. I see the problem in it infringing on everyone’s copyright and then being proprietary, monetized bullshit.

If it trains on an open dataset, it must be completely and fully open. Everything else is peak capitalism.

@Smoogs@lemmy.world · 1 year ago

You’re not owed nor entitled to an artist’s time and work for free.

Turun · edit-2 1 year ago

Of course not, it’s the artist’s decision to put it on the internet for free.

Technically that’s the root of the issue. This does not grant a license to everyone who looks at it, but whether a license is required to train a model is unclear and is currently being argued in court.

@kayrae_42@lemmy.world · 1 year ago

The problem is that the only way for artists to get people to see and eventually buy their art or commissions is to post some of their work publicly. Historically you would go out on the street and set up a stall; now social media is our digital street. Galleries don’t take everyone, and even getting a meeting with one is difficult without the right connections. Most artists are never successful enough to completely live off their art; if they can make any money at all it is great for them. Then along comes an AI model that takes their work because it’s on the internet, scrapes it into its training set, and now any chance they had in an oversaturated market is even smaller, because hey, I can just do this with AI. This idea that copyright and IP shouldn’t exist at all is kinda absurd. Would you just go through a street art walk, take high res photos of every picture they have on display, not take any business cards, and when they ask what you are doing, go “it’s ok, I’m training an AI data model so people can just make work that looks exactly like this. They shouldn’t have to ever buy from you. Capitalism is a joke. Bye!” The art walk was free, but it was also a sales pitch, because that’s how the art world works. You are hoping to get seen, that someone likes it enough to buy, and maybe buy more.

V H · 1 year ago

This idea that copyright and IP shouldn’t exist at all is kinda absurd.

For the majority of human existence, that was the default.

Copyright exists as an explicit tradeoff between the rights of the public to be able to do as they please with stuff introduced into the public sphere, and a legal limitation infringing on the public’s liberty for a limited time for the purpose of encouraging the creation of more works for the public benefit. It was not introduced as some sort of inherent right, but as a trade between the public and creators to incentivise them.

Stripping it away from existing artists who have come to depend on it without some alternative would be grossly unfair, but there’s nothing absurd about wanting to change the bargain over time. After all, that has been done many times, and the copyright we have now is vastly different and far more expansive and lengthy than early copyright protection.

Personally, I’d be in favour of finding alternative means of supporting creators and stripping back copyright as a tradeoff. The vast majority of creators earn next to nothing from their works; only a very tiny minority makes a livable wage off art of any form at all, and for the rest the vast majority of profits come in a very short period of initial exploitation of a work. So we could allow the vast majority to earn more from their art relatively cheaply, and affect the rest to a relatively limited degree, while benefiting from the reduced restrictions.

@kayrae_42@lemmy.world · 1 year ago

I agree that copyright lasts far too long, but the idea that I can post a picture today, and in an hour it’s in an AI model without my consent, bothers me. Historically there was a person to person exchange. But now we are so detached from it all that I don’t think we can afford to have no protections of any type. I’m not saying one person can solve this. But I don’t see UBI or anything like that ever happening. As a person who has lived on disability most of their life, people don’t like to share their wealth with anyone for any reason. I’ve never been able to sell art for a living and am now going to school for data science. So I know about both ends of this. Just scraping without consent is unethical and many who do this have no idea about the art world or how artists create in general.

V H · 1 year ago

It doesn’t need to be full on UBI. In a lot of countries grant mechanisms and public purchasing mechanisms for art already make up a significant proportion of income for artists. Especially in smaller countries, this is very common (more so for literary works, movies and music where language provides a significant barrier to accessing a bigger audience, but for other art too). Imagine perhaps a tax/compulsory licensing mechanism that doesn’t stop AI training but instead massively expands those funding sources for people whose data are included in training sets.

This is not stoppable, not least because it’s “too cheap” to buy content outright.

I pointed out elsewhere that e.g. OpenAI could buy all of Getty Images for ~2% of their currently estimated market cap based on a rumoured recent cash infusion. Financing vast amounts of works for hire just creates a moat for smaller players while the big players will still be able to keep improving their models.

As such it will do nothing to protect established artists, so we need expansion of ways to fund artists whether or not inclusion of copyrighted works in training sets becomes restricted.

@kayrae_42@lemmy.world · 1 year ago

Those grants and public purchases make up a significant portion of income for established mainstream artists. If you work on commission only online, or never went to art school, those won’t cover you.

These large tech companies become so highly valued at the start because of venture capital and then in 5-10 years collapse under their own weight. How many of these have come up and are now close to drowning after pushing out all competitors? Sorry if I’m not excited about an infusion of cash into a large for-profit company that is just gobbling up anything anyone posts online without consent to make a quick buck.

I’m not against AI. I’m against the ethics of AI at the moment because it’s awful. And AI leans into biases it finds and there is not a lot of oversight on this.

Turun · 1 year ago

This idea that copyright and IP shouldn’t exist at all is kinda absurd

I don’t hold this opinion at all.

I’m just saying that there are uses for which you don’t need a license. Say, visiting an art exhibition and then going home and trying to draw similar pictures. Whether AI training falls into this category or instead requires a license is currently unclear.

Btw, two spaces before the line break
Creates the spacing you want.

@kayrae_42@lemmy.world · 1 year ago

As an artist who studies data science, I would say doing art and generating art are entirely different processes. AI has no reference outside of the information we give it. It has no real understanding of lighting or spatial awareness. We can tell it every tank is a cat, every flashlight is a pig and it will never question it. If we tell a toddler that every tank is a cat, they may call a tank a cat, but they will never think that that “cat” is a house pet. They will never think that “pig” will oink or be turned into steaks. An AI however would if your language conventions were the same in the prompt.

If you go to the art walk and go home and try to recreate a style, you were inspired. If an AI model is trained on many styles and you tell it “portrait, woman, Van Gogh style, painterly, blue tones” then do you understand what you asked for? Was the AI inspired by Van Gogh? Did the AI study his techniques? No. It broke down his art pixel by pixel and rearranged it in a filter styled overlay over a woman, most likely a young woman (because of algorithmic bias, which has been studied), in shades of blue. Humans take the time to study the why, the how. AI does not. Humans are not just meat robots.

I should say I’m not against AI art. I’m against gathering without consent. If it was opt in, or if there was some type of pay-for program, that would be fine. Even if it was pennies each month. But the fact that they scrape without consent, or are now going back and adding it into TOS where it never was before, feels scummy. AI art has a place, and is a helpful tool. But it’s not a replacement for artists; it has many flaws still that might never be worked out.

Thank you for helping me with line break.

@barsoap@lemm.ee · edit-2 1 year ago

I am perfectly entitled to type random stuff into google images, pick out images for a mood board and some as reference, regardless of their copyright status, thank you. Studying is not infringement.

It’s what every artist does, it’s perfectly legal, and what those models do is actually even less infringing because they’re not directly looking at your picture of a giraffe and my picture of a zebra when drawing a zebra-striped giraffe, they’re doing it from memory.

@Smoogs@lemmy.world · 1 year ago

Art takes effort. You’re not entitled to that for free.

@barsoap@lemm.ee · edit-2 1 year ago

And if you think that working with AI does not take effort you either did not try, or don’t have an artistic bone in your body. Randos typing “Woman with huge bazingas” into a UI and hitting generate don’t get copyright on the output, rightly so: Not only did they not do anything artistic, they also overlook all the issues with whatever gets generated because they lack the trained eye of an artist.

@9thSun@midwest.social · 1 year ago

How is training AI with art on the web different to a person studying art styles? I’d say if the AI is being monetized in some capacity, then sure maybe there should be laws in place. I’m just hard-pressed to believe that anyone can have sole control of anything once it gets on the Internet.

@Zeth0s@lemmy.world · edit-2 1 year ago

I work in AI and I believe it is different. Society is built to distribute wealth, so that everyone can live a decent life. People and AI should be treated differently in front of the law. Also, non-commercial, open source AI should be treated differently than commercial or closed source models.

V H · 1 year ago

Society is built to distribute wealth, so that everyone can live a decent life.

As a goal, I admire it, but if you intend this as a description of how things are it’d be boundlessly naive.

@Zeth0s@lemmy.world · edit-2 1 year ago

That’s absolutely not how it is now, just the goal we should set for ourselves. A goal I believe we should consider when regulating AI

V H · edit-2 1 year ago

To me, that’s not an argument for regulating AI, though, because most regulation we can come up with will benefit those with deep enough pockets to buy themselves out of the problem, while solving nothing.

E.g. as I’ve pointed out in other debates like this, Getty Images has a market cap of <$2bn. OpenAI may have had a valuation in the $90bn range. Google, MS, Adobe all also have share prices that would trivially allow them to purchase someone like Getty to get ownership of a large training set of photos. Adobe already has rights to a huge selection via their own stock service.

Bertelsmann owns Penguin Random House and a range of other publishing subsidiaries. Its market cap is around 15 billion Euro. Also well within price for a large AI contender to buy to be able to insert clauses about AI rights. (You think authors will refuse to accept that? All but the top sellers will generally be unable to afford to turn down a publishing deal, especially if it’s sugar-coated enough, but they also sit on a shit-ton of works where the source text is out-of-copyright but they own the right to the translations outright as works-for-hire)

That’s before considering simply hiring a bunch of writers and artists to produce data for hire.

So any regulation you put in place to limit the use of copyrighted works only creates a “tax” effectively.

E.g. OpenAI might not be able to copy artist X’s images, but they’ll be able to hire artist Y on the cheap to churn out art in artist X’s style for hire, and then train on that. They might not be able to use author Z’s work, but they can hire a bunch of hungry writers (published books sell ca 200 copies on average; the average full time author in the UK earns below minimum wage from their writing) as a content farm.

The net result for most creators will be the same.

Ever wonder why Sam Altman of OpenAI has been lobbying about the dangers of AI? This is why. And it’s just the start. As soon as these companies have enough capital to buy themselves access to data, regulations preventing training on copyrighted data will be them pulling up the drawbridge and making it cost-prohibitive for people to build open, publicly accessible models in ways that can be legally used.

And in doing so they’ll effectively get to charge an “AI tax” on everyone else.

If we’re going to protect artists, we’d be far better off finding other ways of compensating them for the effects, not least because it will actually provide them some protection.

@Zeth0s@lemmy.world · 1 year ago

UBI is the known solution to protect workers. Solution is there, people aren’t ready for it

V H · 1 year ago

As long as people aren’t ready for it, then it doesn’t solve the immediate problem that needs to be solved today.

@BearOfaTime@lemm.ee · 1 year ago

Lol.

How does UBI break trademark and copyright law (and therefore legal cases)?

Do you really think the current power brokers will suddenly sit on their hands and stop trying to (mostly successfully) control as much as they can?

@Zeth0s@lemmy.world · edit-2 1 year ago

UBI is needed because most of the jobs people are currently doing are already not needed. They are needed just to redistribute wealth, but most of the jobs are currently already useless (if you work in corporate, public sector or retail you know what I am talking about). In the future more will become useless. Current copyright laws are already outdated and don’t work anymore. The only safe solution for people who want to dedicate their lives to visual art is UBI, for the known reasons. Most “artists” are not really doing art, simply a job for the entertainment industry that in the future will be done by far fewer people due to technological and organizational changes. This is already happening now, even before AI.

UBI is a solution for similar situations, which will be even more common in future. We need better solutions to redistribute wealth, from what you call “power brokers” to larger society.

@realharo@lemm.ee · edit-2 1 year ago

How is training AI with art on the web different to a person studying art styles?

Human brains clearly work differently than AI, how is this even a question?

The term “learning” in machine learning is mainly a metaphor.

Also, laws are written with a practical purpose in mind - they are not some universal, purely philosophical construct and never have been.

V H · 1 year ago

Human brains clearly work differently than AI, how is this even a question?

It’s not all that clear that those differences are qualitatively meaningful, but that is irrelevant to the question they asked, so this is entirely a strawman.

Why would the way AI learns, versus the way the brain learns, make training AI with art different to a person studying art styles? Both learn to generalise features in a way that allows them to reproduce them. Both can do so without copying specific source material.

The term “learning” in machine learning is mainly a metaphor.

How does the way they learn differ from how humans learn? They generalise. They form “world models” of how information relates. They extrapolate.

Also, laws are written with a practical purpose in mind - they are not some universal, purely philosophical construct and never have been.

This is the only uncontroversial part of your answer. The main reason why courts will treat human and AI actions differently is simply that AIs are not human. It will for the foreseeable future have little to do with whether the processes are similar enough to how humans do it.

@realharo@lemm.ee · edit-2 1 year ago

Now you’re just cherry picking some surface-level similarities.

You can see the difference in the process in the results, for example in how some generated pictures will contain something like a signature in the corner, simply because it resembles the training data - even though there is no meaning to it. Or how it is at least possible to get the model to output something extremely close to the training data - https://gizmodo.com/ai-art-generators-ai-copyright-stable-diffusion-1850060656.

That at least proves that the process is quite different to the process of human learning.

The question is how much those differences matter, and which similarities you want to focus on.

Human learning is similar in some ways, but greatly differs in other ways.

The fact that you’re picking and choosing which similarities matter and which don’t is just your arbitrary choice.

V H · edit-2 1 year ago

You can see the difference in the process in the results, for example in how some generated pictures will contain something like a signature in the corner

If you were to train human children on an endless series of pictures with signatures in the corner, do you seriously think they’d not emulate signatures in the corner?

If you think that, you haven’t seen many children’s drawings, because children also often pick up that it’s normal to put something in the corner, despite the fact that to children pictures with signatures are a tiny proportion of visual input.

Or how it is at least possible to get the model to output something extremely close to the training data

People also mimic. We often explicitly learn to mimic - e.g. I have my son’s art folder right here, full of examples of him being explicitly taught to make direct copies as a means to learn technique.

We just don’t have very good memory. This is an argument for a difference in ability to retain and reproduce inputs, not an argument for a difference in methods.

And again, this is a strawman. It doesn’t even begin to try to answer the questions I asked, or the one raised by the person you first responded to.

That at least proves that the process is quite different to the process of human learning.

Neither of those really suggests that at all (that diffusion is different to how humans learn to generalize images is likely true, but what you’ve described does not provide even the start of any evidence of that), but again that is a strawman.

There was no claim they work the same. The question raised was how the way they’re trained is different from how a human learns styles.

@9thSun@midwest.social · 1 year ago

I appreciate your responses, thank you!

@FooBarrington@lemmy.world · 1 year ago

I agree that the training isn’t fundamentally different, but that monetization of the output has to be controlled. The big difference between AI and humans is the speed with which they create - you have to employ an army of humans to match the output of a couple of GPUs. For noncommercial projects this is amazing. For commercial projects, it destroys artists’ livelihoods.

But this simply means that training shouldn’t be controlled; inference in commercial contexts should be.

@rhombus@sh.itjust.works · 1 year ago

The real issue comes in ownership of the AI models and the vast amount of labor involved in the training data. It’s taking what is probably hundreds of thousands of hours of labor in the form of art and converting it into a proprietary machine, all without compensating the artists involved. Whether you can make a comparison to a human studying art is irrelevant, because a corporation can’t own an artist, but they can own an AI and not have to pay it.

@regbin_@lemmy.world · edit-2 1 year ago

Disagree. It’s only unethical if you use it to generate the artist’s existing pieces and claim it as yours.

@MamboGator@lemmy.world · edit-2 4 months ago

deleted by creator

@9thSun@midwest.social · 1 year ago

I don’t see how AI training couldn’t be considered transformative as the whole idea is to consume input, break it down into data, and output something new. The way I’m understanding what you’re saying is like this: Instead of only paying royalties when I try to monetize a cover song, I’d have to pay every time I practiced it.

@MamboGator@lemmy.world · edit-2 4 months ago

deleted by creator

@9thSun@midwest.social · 1 year ago

I don’t understand how you’re separating the generated artworks from the AI that’s generating the work, but I do see your point. If a company puts out a tool for free I don’t think they should be on the hook for someone using that and creating a product. At the end of it all though, I think whoever has made any hard financial gains should pay out whoever contributed.

Elle · 1 year ago

Until the law catches up with the technology, people need ways of protecting themselves.

I agree, and I wonder if the law might be kicked into catching up quicker as more companies try to adopt these tools and inadvertently infringe on other companies’ copyrighted material. 😅