The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

FatCat@lemmy.world · 2 years ago

calcopiritus@lemmy.world · 2 years ago

I’ll train my AI on just the Bee Movie. Then I’m going to ask it, “Can you make me a movie about bees?” When it spits out the whole movie, I can just watch it or sell it or whatever; it was a creation of my AI, which learned just like any human would! Of course I didn’t even pay for the original copy to train my AI. It’s for learning purposes, and learning should be a basic human right!

Valmond@lemmy.world · 2 years ago

In the meantime, I’ll break into the servers of large corporations and read their emails, codebases, Teams chats and strategic analyses. It’s just learning!

stephen01king@lemmy.zip · 2 years ago

That would be like you writing out the Bee Movie yourself after memorizing the whole thing and claiming it as your own idea, or using it as proof that humans memorizing a movie is copyright infringement. Just because an AI violates copyright by outputting the whole Bee Movie doesn’t mean that training the AI on copyrighted material violates copyright.

Let’s just punish the AI companies for outputting copyrighted material instead of for training on it. Maybe that way they would actually go out of their way to make their LLMs intelligent enough not to spit out copyrighted content.

Or we could just make it so that any output from an AI trained on copyrighted material cannot be copyrighted.

calcopiritus@lemmy.world · 2 years ago

If the solution is making the output non-copyrighted, it fixes nothing. You can sell the pirating machine as a subscription. And it’s not like Netflix, where the content goes away when the subscription ends: you have already downloaded all the non-copyrighted content you wanted, and the internet would be full of non-copyrighted AI output.

Instead of selling the Bee Movie, you sell a Bee Movie maker, and a Spider-Man maker, and a Titanic maker.

Sure, file a copyright infringement claim each time you manage to make an AI output copyrighted content. Just run it in a loop and it’s a money-making machine. That’s fine by me.

stephen01king@lemmy.zip · 2 years ago

Yeah, because running the AI also has some cost, so you are selling a subscription to run the AI on their servers, not its output.

I’m not sure what the legality of selling a Bee Movie maker is, so you’d have to research that one yourself.

It’s not really a money-making machine if you lose more money running the AI on your server farm, but whatever floats your boat. Also, there are already lawsuits based on outputs generated from ChatGPT, so this is exactly what is already happening.

calcopiritus@lemmy.world · 2 years ago

Yeah, making sandwiches also costs money! I have to pay my sandwich making employees to keep the business profitable! How do they expect me to pay for the cheese?

EDIT: also, you completely missed my point. The money-making machine is the AI, because the copyright owners could just sue them every time it produces copyright-protected material if we decided to take that route, which is what the parent comment suggested.

stephen01king@lemmy.zip · 2 years ago

They should pay for the cheese, I’m not arguing against that, but they should pay the same amount a normal human would to access that cheese. No extra fees for access to copyrighted material if you want to use it to train AI rather than consume it yourself.

And I didn’t miss your point. My point was that this is already happening: people are already suing OpenAI over ChatGPT outputs that they generated themselves, so it’s no longer just a hypothetical. We’ll see whether it becomes a money-making machine for them or they just waste their resources doing it.

calcopiritus@lemmy.world · 2 years ago

Media is not exactly like cheese though. With cheese, you buy it and it’s yours. Media, however, is protected by copyright. When you watch a movie, you are given a license to watch the movie.

When an AI watches a movie, it’s not really watching it, it’s doing a different action. If the license of the movie says “you can’t use this license to train AI, use the other (more expensive) license for such purposes”, then AIs have extra fees to access the content that humans don’t have to pay.

stephen01king@lemmy.zip · 2 years ago

Both humans and AI consume the content, even if they don’t do so in exactly the same way. I don’t see the need to differentiate them. It’s not like we know the mechanism by which humans consume content well enough to make that distinction in the first place.

Danterious@lemmy.dbzer0.com · 2 years ago

There’s actually already a website where people recreated the Bee Movie by hand, so idk, it might actually work as a legal argument.

Anti Commercial-AI license (CC BY-NC-SA 4.0)

ZILtoid1991@lemmy.world · 2 years ago

I don’t think that’s a feasible dream in our current system. They’ll just lobby to change it, some senators will say something like “art should always have been a hobby, not a profession”, and then they’ll adjust the current copyright laws so that the output can be copyrighted.

NeoNachtwaechter@lemmy.world · 2 years ago

learning should be a basic human right!

Education is a basic human right (except maybe in the USA, where it should become one)

calcopiritus@lemmy.world · 2 years ago

Yeah. A human right.

FatCat@lemmy.world · 2 years ago

I am thrilled to see the output you get!

TriflingToad@lemmy.world · 2 years ago

I don’t think LLMs should be taken down; it would be impossible for that to happen anyway. I do, however, think they should be forced to be open source.

Camzing@lemmy.world · 2 years ago

No but you would definitely design a car based on other designs made before.

General_Effort@lemmy.world · 2 years ago

Let’s engage in a little fantasy. Someone invents a magic machine that is able to duplicate apartments, condos, houses, … You want to live in New York? You can copy yourself a penthouse overlooking Central Park for just a few cents. It’s magic. You don’t need space. It’s all in a pocket dimension like the TARDIS or whatever. Awesome, right? Of course, not everyone would like that. The owner of that penthouse, for one. Their multi-million dollar investment is suddenly almost worthless. They would certainly demand that you must not copy their property without consent. And so would a lot of people. “And what about the poor construction workers?” ask the owners of construction companies. And who will pay to have any new house built?

So in this fantasy story, the government goes and bans the magic copy machine. Taxes are raised to create a big new police bureau to monitor the country and to make sure that no one uses such a machine without a license.

That’s turned from magical wish fulfillment into a dystopian story. A society that rejects living in a rent-free wonderland but instead chooses to make itself poor. People work to ensure poverty, not to create wealth.

You get that I’m talking about data, information, knowledge. The first magic machine was the printing press. Now we have computers and the Internet.

I’m not talking about a utopian vision here. Facts, scientific theories, mathematical theorems, … All of these are free for all. Inventors can get patents, but only for 20 years and only if they publish them. They can keep their invention secret and take their chances. But if they want a government-enforced monopoly, they must publish their inventions so that others may learn from them.

In the US, that’s how the Constitution demands it. The copyright clause: [The United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

Cutting down on Fair Use makes everyone poorer and only a very few, very rich people richer. Have you ever thought about where the money goes if AI training requires a license?

For example, to Reddit, because Reddit has rights to all those posts. So do Facebook and Xitter. Of course, there’s also old money, like the NYT or Getty. The NYT has the rights to all its old issues going back about a century. If AI training requires a license, they can sell all their old newspapers again. That’s pure profit. Do you think they will give their employees raises out of the pure goodness of their hearts if they win their lawsuits? They have no legal or economic reason to do so. The belief that this would happen is trickle-down economics.

Kühlschrank@lemmy.world · 2 years ago

I thought the larger point was that they’re using plenty of sources that are not in the public domain. Like if I download a textbook to read for a class instead of buying it, I could be prosecuted for stealing. And they’ve downloaded and read millions of books without paying for them.

FatCat@lemmy.world · 2 years ago

And they've downloaded and read millions of books without paying for them.

Do you have a source on that?

Katzastrophe@feddit.org · 2 years ago

Most AI models used Books3 as part of their dataset which is a collection of pirated books. Here are a few articles talking about it:

https://www.theverge.com/2024/8/20/24224450/anthropic-copyright-lawsuit-pirated-books-ai

https://www.abc.net.au/news/2023-09-29/australian-authors-copyright-books3-generative-i-chatgpt/102914538

https://www.theatlantic.com/technology/archive/2023/08/books3-ai-meta-llama-pirated-books/675063/

FatCat@lemmy.world · 2 years ago

Thank you

Shanedino@lemmy.world · 2 years ago

Maybe if you would pay for training data they would let you use copyright data or something?

T156@lemmy.world · 2 years ago

Had the company paid for the training data and/or left it as voluntary, there would be less of a problem with it to begin with.

Part of the problem is that they didn’t, but are still using it for commercial purposes.

andrew_bidlaw@sh.itjust.works · 2 years ago

Their business strategy is built on the assumption that they won’t. They don’t want this door opened at all. It was a great deal for Google to buy Reddit’s data for some millions of dollars, because it is a huge collection behind one entity. Now imagine negotiating with each individual site owner whose resources they scraped.

If it had started that way, the development of these AI tools could have been much slower because of (1) data being added to the pile only after an agreement, (2) higher expenses meaning less money for hardware expansion, and (3) investors and companies being less hyped about the technology because it doesn’t grow like a mushroom cloud while following legal procedures. Also, (4) the ability to investigate and compile a public list of the sites they have agreements with would be pretty damning, generating its own news stories and conflicts.

kibiz0r@midwest.social · 2 years ago

Not even stealing cheese to run a sandwich shop.

Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.

TheKMAP@lemmynsfw.com · 2 years ago

Whatever happened to copying isn’t stealing?

I think the crux of the conversation is whether or not the world is better with ChatGPT. I say yes. We can tackle the disinformation in another effort.

calcopiritus@lemmy.world · 2 years ago

When you copy to consume yourself it’s way different than when you copy to sell the copy for a lower price.

TheKMAP@lemmynsfw.com · 2 years ago

They’re not selling the copy, bruh. They’re selling a technology that very few understand. Smart people pretend they get it, but they don’t. That’s how rare the math is.

calcopiritus@lemmy.world · 2 years ago

So because you don’t understand it, everything it does should be legal?

It’s not rare maths. There are tens of thousands of AI experts. And most CS graduates (millions) have a good understanding of how they work, just not the specifics of the maths.

Yeah, they’re not selling a copy, they are just selling a subscription to a copying machine loaded with the information needed to make a copy. Totally different.

I should start a printer business and attach a USB stick with a PNG of a dollar bill. And of course my printers won’t have any government-mandated firmware that disables printing fake money.

I’m not printing fake money! It’s my clients! Totally legal.

gcheliotis@lemmy.world · 2 years ago

Though I am not a lawyer by training, I have been involved in such debates personally and professionally for many years. This post is unfortunately misguided. Copyright law makes concessions for education and creativity, including criticism and satire, because we recognize the value of such activities for human development. Debates over the excesses of copyright in the digital age were specifically about humans finding the application of copyright to the internet and all things digital too restrictive for their educational, creative, and yes, also their entertainment needs. So any anti-copyright arguments back then were in the spirit of specifically protecting the average person and public-serving non-profit institutions, such as digital archives and libraries, from big copyright owners who would sue and lobby for total control over every file in their catalogue, sometimes in the process severely limiting human potential.

AI’s ingesting of text and other formats is “learning” in name only, a term borrowed by computer scientists to describe a purely computational process. It does not hold the same value socially or morally as the learning that humans require to function and progress individually and socially.

AI is not a person (unless we get definitive proof of a conscious AI, or are willing to grant every implementation of a statistical model personhood). Also, AI is not vital to human development, and as such one could argue it does not need special protections or special treatment to flourish. AI is a product, even more clearly so when it is proprietary and sold as a service.

Unlike past debates over copyright, this is not about protecting the little guy or organizations with a social mission from big corporate interests. It is the opposite. It is about big corporate interests turning human knowledge and creativity into a product they can then use to sell services to - and often to replace in their jobs - the very humans whose content they have ingested.

See, the tables are now turned and it is time to realize that copyright law, for all its faults, has never been only or primarily about protecting large copyright holders. It is also about protecting your average Joe from unauthorized uses of their work. More specifically uses that may cause damage, to the copyright owner or society at large. While a very imperfect mechanism, it is there for a reason, and its application need not be the end of AI. There’s a mechanism for individual copyright owners to grant rights to specific uses: it’s called licensing and should be mandatory in my view for the development of proprietary LLMs at least.

TL;DR: AI is not human, it is a product, one that may augment some tasks productively, but is also often aimed at replacing humans in their jobs - this makes all the difference in how we should balance rights and protections by law.

31337@sh.itjust.works · 2 years ago

AI are people, my friend. /s

But, really, I think people should be able to run algorithms on whatever data they want. It’s whether the output is sufficiently different or “transformative” that matters (along with other laws, like those on using people’s likeness). Otherwise, I think the laws will get complex and nonsensical once you start adding special cases for “AI.” And I’d bet that if new laws are written, they’d be written by lobbyists to further erode the threat of competition (from free software, for instance).

Michal@programming.dev · 2 years ago

What do you think “ingesting” means if not learning?

Bear in mind that training AI does not involve copying content into its database, so copyright is not an issue. AI is simply predicting the next token/word based on statistics.

You can train AI on a book and it will give you information from the book, and information is not copyrightable. You can read a book and talk about its contents on TV. That’s not illegal if you’re a human, so should it be illegal if you’re a machine?

There may be moral issues with training on someone’s hard-gathered knowledge, but there is no legislation against it. Reading books and using that knowledge to provide information is legal. If you try to outlaw automating this process with computers, there will be side effects, such as search engines no longer being able to index data.

Eccitaze@yiffit.net · 2 years ago

Bear in mind that training AI does not involve copying content into its database, so copyright is not an issue.

Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model’s weights afterwards.

You can train AI on a book and it will give you information from the book, and information is not copyrightable. You can read a book and talk about its contents on TV. That’s not illegal if you’re a human, so should it be illegal if you’re a machine?

Yes, because copyright law is intended to benefit human creativity.

If you try to outlaw automating this process with computers, there will be side effects, such as search engines no longer being able to index data.

Wrong. Search engines retain a minimal amount of the indexed website’s data, and the purpose of the search engine is to generate traffic to the website, providing benefit for both the engine and the website (increased visibility, the opportunity to show ads to make money). Banning the use of copyrighted content for AI training (which uses the entire copyrighted work and whose purpose is to replace the organizations whose work is being used) will have no effect.

Michal@programming.dev · 2 years ago

What do you mean, search engines retain a minimal amount of a site’s data? Obviously they need to index all of the contents to make them searchable. If you search for keywords within an article, you can find the article, so all of it needs to be indexed.

Indexing is nothing more than “presenting data to the algorithm” so it’d be against the law to index a site under your proposed legislation.

Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model’s weights afterwards.

This is an interesting take, I’d be inclined to agree, but you’re still facing the problem of how to distinguish training AI from indexing for search purposes. I’m afraid you can’t have it both ways.

gap_betweenus@lemmy.world · 2 years ago

Copyright law protects the ability of copyright holders to make money. The laws were created before AI and now obviously have to be adapted to new technology (just as you didn’t really need copyright before the invention of printing). How exactly AI will be regulated is ultimately up to society to decide, which will most likely come down to who has the better lobby.

spacesatan@lazysoci.al · 2 years ago

Am I the only person who remembers that it was “You wouldn’t steal a car”, or has everyone just decided to pretend it was “You wouldn’t download a car” because that’s easier to dunk on?

✺roguetrick✺@lemmy.world · 2 years ago

You wouldn’t shoot a policeman and then steal his helmet.

C126@sh.itjust.works · 2 years ago

These anti piracy commercials have gotten really mean.

Cornelius_Wangenheim@lemmy.world · 2 years ago

People remember the parody, which is usually modified to be more recognizable. Like Darth Vader never said “Luke, I am your father”; in the movie it’s actually “No, I am your father”.

ShittyBeatlesFCPres@lemmy.world · 2 years ago

Maybe add a spoiler alert next time. Jeez.

Eccitaze@yiffit.net · 2 years ago

Spoiler alert, but Rosebud was his sled all along.

JasonDJ@lemmy.zip · 2 years ago

I’m pretty sure it’s either Mandela Effect or a massive gaslighting conspiracy. Though I guess that’s true for everything that’s collectively misremembered.

renrenPDX@lemmy.world · 2 years ago

Then OpenAI should pay for a copy, like we do.

mightyfoolish@lemmy.world · 2 years ago

Is there an official statement on whether OpenAI pays for at least one copy of whatever they throw into the bots?

daniskarma@lemmy.dbzer0.com · 2 years ago

Counteroffer. We eliminate copyright laws altogether. For anyone and everyone.

Let’s move to a system in which we fund projects before their release. And once released, they are available to everyone for free.

Also, let’s make a system where everyone can work a basic job for around 20-30 hours a week and earn a living wage, and the rest of the time we can produce art or anything else for free for anyone, since our needs would already be covered.

jasonkozdra@lemmy.world · 2 years ago

And free cotton candy and rainbows for everybody!

Loki@discuss.tchncs.de · 2 years ago

Even if you come to the conclusion that these models should be allowed to “learn” from copyrighted material, the issue is that they can and will reproduce copyrighted material.

They might not recreate a picture of Mickey Mouse that already exists, but they will draw a picture of Mickey Mouse. Just like I could, except I’m aware that I can’t monetize it in any way. Well, the newer Mickey Mouse, anyway.

ClamDrinker@lemmy.world · 2 years ago

This is an issue for the AI user, though. And I do agree that it needs to be more present in people’s minds. But I think time will change that. Perhaps when the photo camera came out, there were some schmucks who took pictures of people’s artworks and claimed them as their own because the novelty of the technology allowed that for a while, but eventually those people were distinguished from people using it properly.

Roflmasterbigpimp@lemmy.world · 2 years ago

Okay, that’s just stupid. I’m really fond of AI, but that’s just common greed.

“Free the Serfs?! We can’t survive without their labor!!” “Stop Child labour?! We can’t survive without them!” “40 Hour Work Week?! We can’t survive without their 16 Hour work Days!”

If you can’t make profit yet, then fucking stop.

Nimo@lemmy.world · 2 years ago

I hate to say this, but “let the market decide”: if AI is something consumers want or need, they’ll pay for it; otherwise, let it die.