Researchers claim GPT-4 passed the Turing test

vegeta@lemmy.world · 1 year ago

Researchers claim GPT-4 passed the Turing test

tourist@lemmy.world · 1 year ago

The participants judged GPT-4 to be human a shocking 54 percent of the time.

ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

Try talking to the thing: https://web.njit.edu/~ronkowit/eliza.html

I refuse to believe that 22% didn’t misunderstand the task or something.

catastrophicblues@lemmy.ca · 1 year ago

> Hello, I am Eliza. I'll be your therapist today. 
* Hello! How are you?
> Would you prefer if I were not ? 
* Not what?
> Have you asked such questions before? 
* No
> You are being a bit negative and that won't help us move forward. 
* What are you smoking?
> Perhaps in your fantasies I am smoking?

Yeah, it took me one message lol

webghost0101@sopuli.xyz · edit-2 1 year ago

The public versions of the ais used in Turing tests usually have less computing power. The test itself is often also highly specific in what and how questions can be asked.

This hardly news because models have passed the test before and as a result the test is made more difficult. It says nothing about intelligence and only about the ability to convincingly simulate a human conversation.

HumaShah@mastodon.social · 1 year ago

@tourist @vegeta

In the 1970s, psychiatrists couldn’t distinguish between PARRY, a chatbot simulating paranoia, and a human patient suffering from schizophrenia. So this ‘bot convinces judges that it’s a human’ phenomenon is not new and tells us more about how humans think.
#AI #DeceptionDetection #Chatbots #TuringTest #LLMs #GPT4

Downcount@lemmy.world · 1 year ago

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

I did some stuff with Eliza back then. One time I set up an Eliza database full of insults and hooked it up to my AIM account.

It went so well, I had to apologize to a lot of people who thought I was drunken or went crazy.

Eliza wasn’t thaaaaat bad.

CaptainBasculin@lemmy.ml · 1 year ago

This is the same bot. There’s no way this passed the test.

.

KISSmyOSFeddit@lemmy.world · 1 year ago

14% of people can’t do anything more complicated than deleting an email on a computer.
26% can’t use a computer at all.

https://www.nngroup.com/articles/computer-skill-levels/

So right off the bat, 40% probably don’t even know what a chatbot is.

technocrit@lemmy.dbzer0.com · 1 year ago

It was a 5 minute test. People probably spent 4 of those minutes typing their questions.

This is pure pseudo-science.

MonkderDritte@feddit.de · 1 year ago

So it’s good enough to fool people into thinking it’s a human?

Cool. Anyway…

lowleveldata@programming.dev · 1 year ago

I feel like the turing test is much harder now because everyone knows about GPT

DudeDudenson@lemmings.world · 1 year ago

I wonder if humans pass the Turing test these days

Nougat@fedia.io · 1 year ago

I don’t.

NeoNachtwaechter@lemmy.world · 1 year ago

Which of the questions did you get wrong? ;-)

Nougat@fedia.io · 1 year ago

That one.

SkyeStarfall@lemmy.blahaj.zone · edit-2 1 year ago

If you read into the study, they also include the pass rates for humans. It’s higher than AIs, but still less than 75%

TheBigBrother@lemmy.world · 1 year ago

Oh no!! the AImageddon it’s closer everyday… Skynet it’s coming for us!!

harrys_balzac@lemmy.dbzer0.com · 1 year ago

Skynet will gets the dumb ones first by getting them put toxic glue on thir pizzas then the arrogant ones will build the Terminators by using reverse psychology.

dustyData@lemmy.world · 1 year ago

Turing test isn’t actually meant to be a scientific or accurate test. It was proposed as a mental exercise to demonstrate a philosophical argument. Mainly the support for machine input-output paradigm and the blackbox construct. It wasn’t meant to say anything about humans either. To make this kind of experiments without any sort of self-awareness is just proof that epistemology is a weak topic in computer science academy.

Specially when, from psychology, we know that there’s so much more complexity riding on such tests. Just to name one example, we know expectations alter perception. A Turing test suffers from a loaded question problem. If you prompt a person telling them they’ll talk with a human, with a computer program or announce before hand they’ll have to decide whether they’re talking with a human or not, and all possible combinations, you’ll get different results each time.

Kogasa@programming.dev · 1 year ago

Your first two paragraphs seem to rail against a philosophical conclusion made by the authors by virtue of carrying out the Turing test. Something like “this is evidence of machine consciousness” for example. I don’t really get the impression that any such claim was made, or that more education in epistemology would have changed anything.

In a world where GPT4 exists, the question of whether one person can be fooled by one chatbot in one conversation is long since uninteresting. The question of whether specific models can achieve statistically significant success is maybe a bit more compelling, not because it’s some kind of breakthrough but because it makes a generalized claim.

Re: your edit, Turing explicitly puts forth the imitation game scenario as a practicable proxy for the question of machine intelligence, “can machines think?”. He directly argues that this scenario is indeed a reasonable proxy for that question. His argument, as he admits, is not a strongly held conviction or rigorous argument, but “recitations tending to produce belief,” insofar as they are hard to rebut, or their rebuttals tend to be flawed. The whole paper was to poke at the apparent differences between (a futuristic) machine intelligence and human intelligence. In this way, the Turing test is indeed a measure of intelligence. It’s not to say that a machine passing the test is somehow in possession of a human-like mind or has reached a significant milestone of intelligence.

https://academic.oup.com/mind/article/LIX/236/433/986238

dustyData@lemmy.world · edit-2 1 year ago

Turing never said anything of the sort, “this is a test for intelligence”. Intelligence and thinking are not the same. Humans have plenty of unintelligent behaviors, that has no bearing on their ability to think. And plenty of animals display intelligent behavior but that is not evidence of their ability to think. Really, if you know nothing about epistemology, just shut up, nobody likes your stupid LLMs and the marketing is tiring already, and the copyright infringement and rampant privacy violations and property theft and insatiable power hunger are not worthy.

Kogasa@programming.dev · 1 year ago

U good?

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.net · 1 year ago

It took them this long?

technocrit@lemmy.dbzer0.com · edit-2 1 year ago

500 people - meaningless sample
5 minutes - meaningless amount of time

yetAnotherUser@lemmy.ca · 1 year ago

Add in a test that wasn’t made to be accurate and was only used to make a point, like what other comments mention

vegeta@lemmy.world · 1 year ago

The Study

https://arxiv.org/html/2405.08007v1

massive_bereavement@fedia.io · 1 year ago

The interrogators seem completely lost and clearly haven’t talk with an NLP chatbot before.

That said, this gives me the feeling that eventually they could use it to run scams (or more effective robocalls).

treefrog@lemm.ee · 1 year ago

I imagine some people already are.

dhork@lemmy.world · 1 year ago

In order for an AI to pass the Turing test, it must be able to talk to someone and fool them into thinking that they are talking to a human.

So, passing the Turing Test either means the AI are getting smarter, or that humans are getting dumber.

pewter@lemmy.world · edit-2 1 year ago

Humans are as smart as they ever were. Tech is getting better. I know someone who was tricked by those deepfake Kelly Clarkson weight loss gummy ads. It looks super fake to me, but it’s good enough to trick some people.

zbyte64@awful.systems · 1 year ago

Detecting an LLM is a skill.

Hackworth@lemmy.world · 1 year ago

Hackworth@lemmy.world · 1 year ago

doodle967@lemdro.id · 1 year ago

The Turing test is about tricking people into believing that LLMs are humans, and given that the public still doesn’t use LLMs, it’s much easier to fool them. Over time, this deception will decrease as people interact with LLMs.

NeoNachtwaechter@lemmy.world · 1 year ago

Turing test? LMAO.

I asked it simply to recommend me a supermarket in our next bigger city here.

It came up with a name and it told a few of it’s qualities. Easy, I thought. Then I found out that the name does not exist. It was all made up.

You could argue that humans lie, too. But only when they have a reason to lie.

Chozo@fedia.io · 1 year ago

The Turing test doesn’t factor for accuracy.

Lmaydev@programming.dev · 1 year ago

That’s not what LLMs are for. That’s like hammering a screw and being irritated it didn’t twist in nicely.

The turing test is designed to see if an AI can pass for human in a conversation.

NeoNachtwaechter@lemmy.world · edit-2 1 year ago

turing test is designed to see if an AI can pass for human in a conversation.

I’m pretty sure that I could ask a human that question in a normal conversation.

The idea of the Turing test was to have a way of telling humans and computers apart. It is NOT meant for putting some kind of ‘certified’ badge on that computer, and …

That’s not what LLMs are for.

…and you can’t cry ‘foul’ if I decide to use a question for which your computer was not programmed :-)

webghost0101@sopuli.xyz · edit-2 1 year ago

In a normal conversation sure.

In this kind Turing tests you may be disqualified as a jury for asking that question.

Good science demands controlled areas and defined goals. Everyone can organize a homebrew touring tests but there also real proper ones with fixed response times, lengths.

Some touring tests may even have a human pick the best of 5 to provide to the jury. There are so many possible variations depending on test criteria.

NeoNachtwaechter@lemmy.world · edit-2 1 year ago

you may be disqualified as a jury for asking that question.

You want to read again about the scientific basics of the Turing test (hint: it is not a tennis match)

webghost0101@sopuli.xyz · 1 year ago

There is no competition in science (or at least there shouldn’t be). You are subjectively disqualified from judging llm’s if you draw your conclusions on an obvious trap which you yourself have stated is beyond the scope of what it was programmed to do.

Lmaydev@programming.dev · 1 year ago

It wasn’t programmed for any questions. It was trained hehe

foggy@lemmy.world · 1 year ago

Meanwhile, me:

(Begin)

[Prints error statement showing how I navigated to a dir, checked to see a files permissions, ran whoami, triggered the error]

Chatgpt4: First, make sure you’ve navigated to the correct directory.

cd /path/to/file

Next, check the permissions of the file

ls -la

Finally, run the command

[exact command I ran to trigger the error]>

Me: stop telling me to do stuff that I have evidently done. My prompt included evidence of me having do e all of that already. How do I handle this error?

(return (begin))

werefreeatlast@lemmy.world · 1 year ago

It does great at Python programming… everything it tries is wrong until I try and I tell tell it to do it again.

A_A@lemmy.world · edit-2 1 year ago

Edit :
oops : you were saying it is like a human since it does errors ? maybe i “wooshed”.

Hi @werefreeatlast,
i had successes asking LLaMA 3 70B with simple specific questions …
Context : i am bad at programming and it help me at least to see how i could use a few function calls in C from Python … or simply drop Python and do it directly in C.
Like you said, i have to re-write & test … but i have a possible path forward. Clearly you know what you do on a computer but i’m not really there yet.

werefreeatlast@lemmy.world · 1 year ago

But people don’t just know code when you ask them. The llms fo because they got trained on that code. It’s robotic in nature, not a natural reaction yet.

harrys_balzac@lemmy.dbzer0.com · 1 year ago

So…GPT-4 is gay? Or are we talking about a different kind of test?