AI - Alien Intelligence

Image Credit: Amaia Tahilramani

As AI becomes central to modern life, how can we ensure that human and machine values remain aligned?

Artificial Intelligence (AI) is, by definition, man-made. Unfortunately, that doesn’t mean we understand it. The very people building cutting-edge AI can’t tell you what’s really going on inside it. Many researchers could give you a spiel about ‘loss functions’, ‘attention modules’ and ‘backpropagation’, but no one can tell you how an AI chatbot will answer a question without running it. Chatbots such as ChatGPT are a type of AI built on artificial neural networks (NNs), and their rise to prominence has changed our world.

AI alignment is the field concerned with instilling human goals and values in AI systems. Alignment is harder than it appears. It’s useful to think of a neural network as an approximator: you feed in an input and the network produces an output close to what you want. Networks are trained by feeding them data and repeatedly steering their outputs closer to the desired values. AI systems can perform functions, such as predicting the next word in a sentence, without anyone understanding the mechanics of how this happens. It’s not purely statistics; Ilya Sutskever, co-founder of OpenAI, has conjectured that for a Large Language Model (LLM), predicting the next word well involves understanding the reality that produced the text. This is good news for expanding our capabilities, but bad news for alignment. Our progress in understanding how these systems operate lags behind progress in their abilities.
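
To make the ‘approximator’ picture concrete, here is a minimal sketch of a training loop, assuming PyTorch; the target function (a sine wave) is just an illustrative stand-in for real data, not anything resembling a chatbot.

```python
# A minimal sketch of the 'approximator' picture, assuming PyTorch.
# A tiny network is nudged, step by step, so its output gets closer to
# the value we want. The sine target is purely illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()                       # the 'loss function'

x = torch.linspace(-3, 3, 200).unsqueeze(1)  # training inputs
y = torch.sin(x)                             # desired outputs

for step in range(2000):
    prediction = net(x)                      # feed the network data
    loss = loss_fn(prediction, y)            # how far off is it?
    optimizer.zero_grad()
    loss.backward()                          # 'backpropagation'
    optimizer.step()                         # steer the output closer to the target
```

Scale this basic loop up by many orders of magnitude, swap the sine wave for the internet’s text, and you have, in caricature, how an LLM is trained.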

Alignment is necessary to prevent both misuse by bad actors and direct harm from AI itself. Already we have seen misaligned AI spread misinformation, cause offence, and attempt to deceive users. Take Bing’s chatbot feature, ‘Sydney’, which was released for beta testing in early 2023. It was mostly benign and helpful, but could easily be prompted into worrying behaviours, such as telling one New York Times reporter that he was not in a loving relationship with his wife.

In the early 2000s, philosophers and futurists popularised the idea that AI with human-like cognition, known as Artificial General Intelligence (AGI), could pose an existential risk to humanity. Leading AI research labs, including OpenAI and Anthropic, were founded to advance understanding and safety in AI, but their enormous success is prompting high-profile figures in the field to sound the alarm. Following the release of GPT-4 in March 2023, an open letter called for a six-month pause on giant AI training runs. It was signed by Elon Musk, Steve Wozniak, and some of the biggest names in AI.

One prominent name in AI safety absent from the open letter was Eliezer Yudkowsky. Yudkowsky has been researching how to build ‘friendly AI’ since the early 2000s, before it was fashionable. He felt that the letter hadn’t gone far enough: within a week of its publication he published a response piece in Time urging a complete and indefinite moratorium. Yudkowsky isn’t concerned about what exists now, but about what may come if we progress to building systems exceeding human intelligence.

In Yudkowsky’s view, developing AGI without a mathematically sound theory of alignment would certainly mean the death of all humans. To understand why he thinks so, consider the concept of instrumental convergence: the tendency for agents, including AIs, to adopt sub-goals such as self-preservation that further their overarching objectives. A well-known thought experiment illustrates this with an AGI designed to maximise paper-clip production. Initially it works as expected, but the AGI reasons that it could increase production by making itself more intelligent. After becoming ever more intelligent, it reasons that the existence of humans is preventing it from achieving its goal. It exterminates us before conquering the universe to extract resources and manufacture paper-clips. It’s a ridiculous example, but it raises the question of what happens when you follow a chain of reasoning to its logical conclusion. For Yudkowsky, the prospect of an intelligence explosion, combined with our poor understanding of AI and of ourselves, means that building systems with superhuman capabilities may simply be too risky.

Paul Christiano, head of AI safety at the United States Artificial Intelligence Safety Institute, also didn’t sign the open letter, despite assessing there to be a 46% probability that “humanity has somehow irreversibly messed up our future within 10 years of building powerful AI”. Christiano thinks that we can work towards a world where AI is transformative and improves the lives of everyone. His work focuses less on hypothetical future superintelligent AI than on concrete problems in contemporary training methods. Christiano developed what has become the industry standard for alignment, reinforcement learning from human feedback (RLHF), in which a smaller network, trained to capture human preferences, is used to steer the model we actually want.
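
To give a flavour of the idea, here is a heavily simplified sketch using toy PyTorch tensors in place of a real language model. A small ‘reward model’ learns from pairs of responses which one a human preferred, and its scores are then used to nudge the ‘policy’ (the model we want) towards preferred behaviour. The names, shapes, and details below are illustrative assumptions, not OpenAI’s actual implementation.

```python
# A toy, two-stage sketch of the RLHF idea. Everything here is a stand-in:
# random tensors play the role of text embeddings, and a linear layer plays
# the role of a language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
EMBED_DIM, VOCAB = 16, 10

# Stage 1: a small reward model learns human preferences from pairwise
# comparisons: given two candidate responses, which did a human prefer?
reward_model = nn.Sequential(nn.Linear(EMBED_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

preferred = torch.randn(64, EMBED_DIM)   # stand-ins for responses humans liked
rejected = torch.randn(64, EMBED_DIM)    # stand-ins for responses humans rejected
for _ in range(200):
    # Preferred responses should score higher than rejected ones.
    margin = reward_model(preferred) - reward_model(rejected)
    loss = -torch.log(torch.sigmoid(margin)).mean()
    rm_opt.zero_grad()
    loss.backward()
    rm_opt.step()

# Stage 2: the frozen reward model's scores steer the policy (the model we want).
policy = nn.Linear(EMBED_DIM, VOCAB)     # stand-in for a language model head
pol_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(64, EMBED_DIM)      # stand-ins for prompt representations
for _ in range(200):
    dist = torch.distributions.Categorical(logits=policy(states))
    actions = dist.sample()              # the 'tokens' the policy chooses
    # Score each choice with the reward model (one-hot embedding is purely toy).
    action_emb = F.one_hot(actions, EMBED_DIM).float()
    with torch.no_grad():
        rewards = reward_model(action_emb).squeeze(-1)
    # Policy-gradient update: raise the probability of highly rewarded choices.
    pg_loss = -(dist.log_prob(actions) * rewards).mean()
    pol_opt.zero_grad()
    pg_loss.backward()
    pol_opt.step()
```

Real RLHF pipelines add further machinery, such as a penalty that stops the fine-tuned model drifting too far from the original, but the two-stage structure is the same: learn what humans prefer, then train the model to prefer it too.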

The primary difference between Yudkowsky and Christiano may be how quickly each expects progress towards superhuman intelligence: a slower pace would give us more time to work on alignment methods.

The bar for what is deemed to be AGI has been repeatedly raised. In December, OpenAI’s o3 smashed the record on a new benchmark, the ARC Prize. This month, OpenAI released a model which can browse the web and use Python tooling to help answer questions. They claim it has scored a remarkable 26% on Humanity’s Last Exam, on which GPT-4 could manage only 4%. Nobody knows what will happen if we develop systems more intelligent than ourselves, but it is almost universally agreed that we can’t ignore the risks. At the same time, there are abundant examples of AI helping us do things we never could before, such as reading ancient scrolls, discovering new materials, and designing proteins. The opportunity cost of not continuing, or not accelerating, AI research may also be enormous. As I write, world leaders and industry heads are convening in Paris to discuss the future of a technology that will define our lifetimes. I can think of nothing more worthy of their attention.