@masterspace

masterspace@lemmy.ca · 11 hours ago

Everything is a bell curve.

masterspace@lemmy.ca · 7 days ago

The work is reproduced in full when it’s downloaded to the server used to train the AI model, and the entirety of the reproduced work is used for training. Thus, they are using the entirety of the work.

That’s objectively false. It’s downloaded to the server, but it should never be redistributed to anyone else in full. As a developer, for instance, it’s illegal for me to copy code I find in a Medium article and use it in our software. I’m perfectly allowed to read that Medium article, learn from it, and then write my own similar code.

And that makes it better somehow? Aereo got sued out of existence because their model threatened the retransmission fees that broadcast TV stations were being paid by cable TV subscribers. There wasn’t any devaluation of broadcasters’ previous performances, the entire harm they presented was in terms of lost revenue in the future. But hey, thanks for agreeing with me?

And Aereo should not have lost that suit. That’s an example of the US court system abjectly failing.

And again, LLM training so egregiously fails two out of the four factors for judging a fair use claim that it would fail the test entirely. The only difference is that OpenAI is failing it worse than other LLMs.

That’s what we’re debating, not a given.

It’s even more absurd to claim something that is transformative automatically qualifies for fair use.

Fair point, but it is objectively transformative.

masterspace@lemmy.ca · 7 days ago

Tell me you’ve never developed software without telling me you’ve never developed software.

A closed source binary that is copyrighted and illegal to use is totally the same thing as all the trained weights and underlying source code for a neural network published under the MIT license that anyone can learn from, copy, and use however they want, right guys?

masterspace@lemmy.ca · edit-2 7 days ago

You said open source. Open source is a type of licensure.

The entire point of licensure is legal pedantry.

No. Open source is a concept. That concept also has pedantic legal definitions, but the concept itself is not inherently pedantic.

And as far as your metaphor is concerned, pre-trained models are closer to pre-compiled binaries, which are expressly not considered Open Source according to the OSD.

No, they’re not. Which is why I didn’t use that metaphor.

A binary is explicitly a black box. There is nothing to learn from a binary, unless you explicitly decompile it back into source code.

In this case, literally all the source code is available. Any researcher can read through their model, learn from it, copy it, twist it, and build their own version of it wholesale. Not providing the training data is more similar to saying that Yuzu or any other emulator isn’t open source because it doesn’t provide copyrighted games. It is providing literally all of the parts of it that it can open source, and then letting the user feed it whatever training data they are allowed access to.

masterspace@lemmy.ca · edit-2 12 days ago

LLMs use the entirety of a copyrighted work for their training, which fails the “amount and substantiality” factor.

That factor is relative to what is reproduced, not to what is ingested. A company is allowed to scrape the web all they want as long as they don’t republish it.

By their very nature, LLMs would significantly devalue the work of every artist, author, journalist, and publishing organization, on an industry-wide scale, which fails the “Effect upon work’s value” factor.

I would argue that LLMs devalue the author’s potential for future work, not the original work they were trained on.

Those two alone would be enough for any sane judge to rule that training LLMs would not qualify as fair use, but then you also have OpenAI and other commercial AI companies offering the use of these models for commercial, for-profit purposes, which also fails the “Purpose and character of the use” factor.

Again, that’s the practice of OpenAI, but not inherent to LLMs.

You could maybe argue that training LLMs is transformative,

It’s honestly absurd to try and argue that they’re not transformative.

masterspace@lemmy.ca · 12 days ago

For the purposes of this conversation, that’s pretty much just a pedantic difference. They are paying to train those models and then providing them to the public to use completely freely in any way they want.

It would be like developing open source software and then not calling it open source because you didn’t publish the market research that guided your UX decisions.

masterspace@lemmy.ca · 12 days ago

More to the point, we still don’t know if Tesla FSD can actually outperform a human. It is, again, based on cameras that are worse than the human eye.

This whole conversation so far has entirely missed the point.

The only thing that’s important here is being a better driver than a human. Not perfect - better.

Not sure if you read the above?

masterspace@lemmy.ca · edit-2 12 days ago

Making a copy is free. Making the original is not.

Yes, exactly. Do you see how that is different from the world of physical objects and energy? That is not the case for a physical object. Even once you design something and build a factory to produce it, the first item off the line takes the same amount of resources as the last one.

Capitalism is based on the idea that things are scarce. If I have something, you can’t have it, and if you want it, then I have to give up my thing, so we end up trading. Information does not work that way. We can freely copy a piece of information as much as we want. Which is why monopolies and capitalism are a bad system of rewarding creators. They inherently cause us to impose scarcity where there is no need for it, because in capitalism things that are abundant do not have value. Capitalism fundamentally fails to function when there is abundance of resources, which is why copyright was a dumb system for the digital age. Rather than recognize that we now live in an age of information abundance, we spend billions of dollars trying to impose artificial scarcity.

masterspace@lemmy.ca · 12 days ago

they did NOT predict generative AI, and their graphics cards just HAPPEN to be better situated for SOME reason.

This is the part that’s flawed. They have actively targeted neural network applications with hardware and driver support since 2012.

Yes, they got lucky in that generative AI turned out to be massively popular, and required massively parallel computing capabilities, but luck is one part opportunity and one part preparedness. The reason they were able to capitalize is because they had the best graphics cards on the market and then specifically targeted AI applications.

masterspace@lemmy.ca · edit-2 12 days ago

Me too, but real human creativity comes from having the time and space to rest and think properly. Automation is the only reason we have as much leisure time as we do on a societal scale now, and AI just allows us to automate more menial tasks.

Do you know where AI is actually being used the most right now? Automating away customer service jobs, automatic form filling, translation, and other really boring but necessary tasks that computers used to be really bad at before neural networks.

masterspace@lemmy.ca · edit-2 12 days ago

Well it is one thing to automate a repetitive task in your job, and quite another to eliminate entire professions.

No it is not. That is literally how those jobs are eliminated. 30 years ago CAD came out and helped to automate drafting tasks to the point that a team of 20 drafters turned into 1 or 2 drafters and eventually turned into engineers drafting their own drawings.

What you call “menial bullshit” is the entire livelihood and profession of quite a few people, speaking of taxis for one.

Congratulations, despite you wanting to look at it with rose coloured glasses, that does not change the fact that it is objectively menial bullshit.

What are all these people going to do when taxi driving is relegated to robots?

Find other entry level jobs. If we eliminate *all* entry level jobs through automation, then we will need to implement some form of basic income as there will not be enough useful work for everyone to do. That would be a great problem to have.

Will the state have enough cash to support them and help them upskill or whatever is needed to survive and prosper?

Yes, the state has access to literally all of the profits from automation via taxes and redistribution.

A technological utopia is a promise from the 1950s. Hasn’t been realized yet. Isn’t on the horizon anytime soon. Careful that in dreaming up utopias we don’t build dystopias.

Oh wow, you’re saying that if human beings can’t create something in 70 years, then that means it’s impossible and we’ll never create it?

Again, the only way to get to a utopia is to have all of the pieces in place, which necessitates a lot of automation and much more advanced technology than we already have. We’re only barely at the point where we can start to practice biology and medicine in a meaningful way, and that’s only because computers completely eliminated the former profession of computer.

Be careful that you don’t keep yourself stuck in our current dystopia out of fear of change.

masterspace@lemmy.ca · edit-2 12 days ago

Better system for WHOM? Tech-bros that want to steal my content as their own?

A better system for EVERYONE. One where we all have access to all creative works, rather than spending billions on engineers and lawyers to create walled gardens and DRM and artificial scarcity. What if literally all the money we spent on all of that instead went to artist royalties?

But tech-bros that want my work to train their LLMs - they can fuck right off. There are legal thresholds that constitute “fair use” - Is it used for an academic purpose? Is it used for a non-profit use? Is the portion that is being used a small part or the whole thing? LLM software fails all of these tests.

No. It doesn’t.

They can literally pass all of those tests.

You are confusing OpenAI keeping their LLM closed source and charging for access to it with LLMs in general. The open source models that Microsoft and Meta publish, for instance, pass literally all of the criteria you just stated.

masterspace@lemmy.ca · edit-2 12 days ago

You sound like someone unwilling to think about a better system.

masterspace@lemmy.ca · edit-2 13 days ago

My fucking god.

“Buying a lottery ticket, and designing the best GPUs, totally the same thing, amiriteguys?”

masterspace@lemmy.ca · 13 days ago

I think that’s a huge risk, but we’ve only ever seen a single, very specific type of intelligence, our own / that of animals that are pretty closely related to us.

Movies like Ex Machina and Her do a good job of pointing out that there is nothing that inherently means that an AI will be anything like us, even if they can appear that way or pass at tasks.

It’s entirely possible that we could develop an AI that was so specifically trained that it would provide the best script editing notes but be incapable of anything else for instance, including self reflection or feeling loss.

masterspace@lemmy.ca · 13 days ago

We all should. Copyright is fucking horseshit.

It costs literally nothing to make a digital copy of something. There is ZERO reason to restrict access to things.

masterspace@lemmy.ca · 13 days ago

I mean we’re having a discussion about what’s fair, my inherent implication is whether or not that would be a fair regulation to impose.

masterspace@lemmy.ca · edit-2 13 days ago

We are human beings. The comparison is false on its face because what you all are calling AI isn’t in any conceivable way comparable to the complexity and versatility of a human mind, yet you continue to spit this lie out, over and over again, trying to play it up like it’s Data from Star Trek.

If you fundamentally do not think that artificial intelligences can be created, the onus is on you to explain why it’s impossible to replicate the circuitry of our brains. Everything in science we’ve seen thus far has shown that we are merely physical beings that can be recreated physically.

Otherwise, I asked you to examine a thought experiment where you are trying to build an artificial intelligence, not necessarily an LLM.

This model isn’t “learning” anything in any way that is even remotely like how humans learn. You are deliberately simplifying the complexity of the human brain to make that comparison.

Or you are over complicating yourself to seem more important and special. Definitely no way that most people would be biased towards that, is there?

Moreover, human beings make their own choices, they aren’t actual tools.

Oh please do go ahead and show us your proof that free will exists! Thank god you finally solved that one! I heard people were really stressing about it for a while!

They pointed a tool at copyrighted works and told it to copy, do some math, and regurgitate it. What the AI “does” is not relevant, what the people that programmed it told it to do with that copyrighted information is what matters.

“I don’t know how this works but it’s math and that scares me so I’ll minimize it!”

masterspace@lemmy.ca · edit-2 13 days ago

It’s not though.

A huge amount of what you learn, someone else paid for, then they taught that knowledge to the next person, and so on. By the time you learned it, it had effectively been pirated and copied by human brains several times before it got to you.

Literally anything you learned from a Reddit comment or a Stack Overflow post for instance.

masterspace@lemmy.ca · 13 days ago

Go ahead and design a better pickaxe than them, we’ll wait…