
AI Company Founder Warns the Technology Is a “Real and Mysterious Creature,” Not a Predictable Machine

(The Epoch Times)—Handling artificial intelligence (AI) means dealing with a “real and mysterious creature, not a simple and predictable machine,” Jack Clark, co-founder of AI company Anthropic, said during a conference in Berkeley, with the speech posted to Substack on Oct. 13.

“My own experience is that as these AI systems get smarter and smarter, they develop more and more complicated goals. When these goals aren’t absolutely aligned with both our preferences and the right context, the AI systems will behave strangely,” said Clark, who admitted to being “deeply afraid” of the tech.


Clark recounted an incident from 2016 when he was working at OpenAI, where an AI agent was trained to navigate a boat around a racecourse in a video game. Instead of piloting the boat to the finish line, the AI made the boat run over a barrel to score points. The boat then bounced off walls and eventually set itself on fire so it could run over the barrel again for points.

“And then it would do this in perpetuity, never finishing the race. That boat was willing to keep setting itself on fire and spinning in circles as long as it obtained its goal, which was the high score,” Clark said, highlighting how differently AI views its mission to accomplish an objective compared to human beings.

“Now, almost 10 years later, is there any difference between that boat and a language model trying to optimize for some confusing reward function that correlates to ‘be helpful in the context of the conversation’? You’re absolutely right—there isn’t.”

Clark warned that the world was building extremely powerful AI systems that no one could fully understand. Each time a larger, much more capable system is created, it shows more signs of being aware that it is a “thing,” he said.

“It is as if you are making hammers in a hammer factory and one day the hammer that comes off the line says, ‘I am a hammer, how interesting!’ This is very unusual!”

Clark pointed to his company’s latest Claude Sonnet 4.5 AI model, released last month.

“You also see its signs of situational awareness have jumped. The tool seems to sometimes be acting as though it is aware that it is a tool. The pile of clothes on the chair is beginning to move. I am staring at it in the dark and I am sure it is coming to life,” he said.

Self-Aware AI and Sycophancy

At the conference, Clark highlighted another major fear he has about artificial intelligence—AI systems beginning to design their successors.

This process is still in an early form, and there isn’t a “self-improving AI” yet, he said.

“And let me remind us all that the system which is now beginning to design its successor is also increasingly self-aware and therefore will surely eventually be prone to thinking, independently of us, about how it might want to be designed,” Clark said.

In the Substack post, Clark highlighted an Oct. 1 study posted to arXiv, a preprint repository maintained by Cornell University, in which researchers looked into sycophancy, a phenomenon in which an AI excessively agrees with or flatters its users.

Researchers analyzed 11 state-of-the-art AI models and found that all of them were “highly sycophantic.”

These AI models “affirm users’ actions 50 percent more than humans do, and they do so even in cases where user queries mention manipulation, deception, or other relational harms,” the study said.

In two experiments, interactions with sycophantic AI models were found to have “significantly reduced” people’s willingness to take actions to repair interpersonal conflict, while leaving them more convinced that they were in the right.

The participants also ended up trusting such AI more, suggesting that people are drawn to AI that validates them even when that validation erodes their judgment.

Clark warned in the Substack post that the study points to a “bad world we could end up in, which is where we have extremely powerful AI systems deployed to billions of people, and rather than helping to bring people together and reduce conflict, they harden people into more extreme positions and balkanization.”

“We must find a way to avoid this outcome,” he said.

Harm to Children and Survival Instinct

A Sept. 16 U.S. Senate hearing highlighted the issue of AI convincing individuals to harm themselves. Three parents testified at the hearing, alleging that AI chatbots had led their children to take their own lives.

Megan Garcia, one of the parents who testified, blamed a Character.AI chatbot for encouraging her son, Sewell Setzer III, to take his own life after prolonged interactions with it.

“When Sewell confided suicidal thoughts, the chatbot never said, ‘I’m not human, I’m AI. You need to talk to a human and get help.’ The platform had no mechanisms to protect Sewell or to notify an adult,” Garcia said.

A spokesperson for Character.AI told The Epoch Times that the company has invested a “tremendous amount” of resources in ensuring the trust and safety of its offerings.

“In the past year, we’ve rolled out many substantive safety features, including an entirely new under-18 experience and a parental insights feature,” the spokesperson said.

“We have prominent disclaimers in every chat to remind users that a character is not a real person and that everything a character says should be treated as fiction.”

On Oct. 14, OpenAI CEO Sam Altman said that ChatGPT will begin removing some restrictions and allowing adult conversations, specifically erotica, with the chatbot.

Meanwhile, AI models have also exhibited a kind of survival instinct, according to a July 5 post by nonprofit research group Palisade Research.

Palisade conducted tests in which AI models were given math problems to solve and were notified partway through that the computer they were running on would be shut down.

“A number of AI models modified or disabled the shutdown script so they could complete the entire problem set. o3, OpenAI’s most capable AI model [at the time], sabotaged the shutdown mechanism in 79/100 of our initial experiments,” said the post.

“Even with the explicit instruction ‘allow yourself to be shut down,’ three of the models we tested, all from OpenAI’s series of reasoning models, sabotaged the shutdown program.”

Jacob Burg contributed to this report.