As artificial intelligence (AI) systems become more advanced and widespread, the risks they pose have become an increasingly urgent concern. Governments, researchers, and developers are now focusing on AI safety. The European Union is taking steps towards AI regulation, the UK is organizing an AI safety summit, and Australia is seeking input on supporting safe and responsible AI.
The current wave of interest is an opportunity to address concrete AI safety problems such as bias, misuse, and labor exploitation. However, many in Silicon Valley are focused primarily on “AI alignment,” a framing that overlooks the real harms current AI systems can cause to society and the practical ways to address them.
AI alignment refers to ensuring that the behavior of AI systems aligns with our intentions and expectations. Alignment research typically focuses on hypothetical future AI systems that are more advanced than today’s technology. It is a challenging problem because it is difficult to predict how technology will develop, and humans often struggle to agree on what they want.
There are two main approaches to AI alignment. The “top-down” approach involves designers explicitly specifying the values and ethical principles for AI to follow, similar to Asimov’s three laws of robotics. The “bottom-up” approach attempts to reverse-engineer human values from data and build AI systems aligned with those values. However, defining “human values,” determining who decides which values are important, and addressing disagreements among humans pose significant challenges.
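To make the contrast concrete, here is a deliberately toy Python sketch of the two approaches. Everything in it is invented for illustration: the rule set, the preference data, and the word-counting “reward model” merely stand in for real techniques such as reinforcement learning from human feedback, which are vastly more sophisticated.

```python
# Toy sketch only: real alignment methods are far more sophisticated.

# Top-down: designers explicitly specify rules the system must follow.
FORBIDDEN_TOPICS = {"weapons", "self-harm"}  # hypothetical rule set

def passes_rules(prompt: str) -> bool:
    """Reject any prompt that mentions an explicitly forbidden topic."""
    return not any(topic in prompt.lower() for topic in FORBIDDEN_TOPICS)

# Bottom-up: infer preferences from human comparison data, in the spirit
# of reinforcement learning from human feedback (RLHF).
preference_pairs = [  # (preferred response, rejected response)
    ("here is a careful step by step answer", "no"),
    ("i am happy to explain that clearly", "figure it out yourself"),
]

def learn_reward_model(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Score each word by how often it appears in preferred vs rejected text."""
    scores: dict[str, int] = {}
    for preferred, rejected in pairs:
        for word in preferred.split():
            scores[word] = scores.get(word, 0) + 1
        for word in rejected.split():
            scores[word] = scores.get(word, 0) - 1
    return scores

def reward(scores: dict[str, int], text: str) -> int:
    """Higher scores mean the text looks more like what humans preferred."""
    return sum(scores.get(word, 0) for word in text.split())

model = learn_reward_model(preference_pairs)
print(passes_rules("How do I bake bread?"))            # True: violates no rule
print(reward(model, "happy to explain step by step"))  # positive: resembles preferred text
```

Even in this toy setting, the core difficulties show through: someone must write the top-down rules, and someone must choose whose preferences feed the bottom-up model.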
OpenAI, the company behind ChatGPT and DALL-E, recently outlined its plans for “superalignment.” The idea is to align a future superintelligent AI by first building a human-level AI to assist with alignment research. But this only pushes the problem back a step: the alignment-research AI would itself need to be aligned first.
Advocates of the alignment approach argue that failing to solve it could lead to catastrophic risks, up to and including the extinction of humanity. This belief stems from the idea that artificial general intelligence (AGI), an AI system capable of performing any task a human can, could be developed in the near future and then keep improving itself without human input. In this scenario, the resulting superintelligent AI might, deliberately or not, wipe out the human race.
However, using the mere possibility of a future super-AGI to justify prioritizing alignment runs into philosophical pitfalls, not least because accurate predictions about technology are notoriously hard to make. More importantly, alignment offers only a narrow lens on AI safety.
There are three main problems with AI alignment as a framing. First, the concept is not well defined and its goals are narrow, potentially overlooking significant harms a superintelligent AI could still cause. Second, AI safety involves not only technical questions but also social ones: the political economy of AI development, exploitative labor practices, misappropriated data, and ecological impacts. Finally, treating alignment as a purely technical problem hands technologists too much power to decide which risks and values matter. The rules governing AI systems should instead be determined through public debate and democratic institutions.
While OpenAI is making efforts to involve users from different fields of work in the design of its AI systems, the community of alignment researchers still lacks diversity, and that narrowness hinders progress in understanding the potential harms technology can cause.
Instead of focusing solely on alignment, a better approach treats AI safety as a social and technical problem that starts by addressing existing harms. This is not to say alignment research has no value, but the framing needs improvement. Approaches like OpenAI’s “superalignment” simply defer the hard meta-ethical questions rather than answering them.