Many influential people have expressed serious concern about artificial intelligence lately. I'm not influential, but for me too, the introduction of ChatGPT was a shock. Somehow I knew that this had to happen soon: in 2021 I wrote a blog post containing the statistics published by OpenAI about AI progress, and it was clear already in 2018 that the field was progressing exponentially with a very short doubling time. But I was still not emotionally prepared for the sudden appearance of this kind of sci-fi technology.
As a result, my interest in AI safety has grown considerably. During the past 12 weeks I participated in a course on AGI safety organized by BlueDot Impact (a project funded by Open Philanthropy). The course was of very high quality (thank you!) and I met many interesting people from all over the world. I learned about the fundamental problems of creating powerful and safe AI and how they might be controlled or mitigated. It has become clear to me that AGI safety is an extremely challenging problem which needs to be investigated intensively right now.
But I also realized that we probably need a broader view of the problem. If powerful AI falls into the wrong hands (which will happen sooner rather than later), it could do great harm already today. We don't need artificial general intelligence or even superintelligence to end up in a complete disaster soon. The greatest dangers in the near future come from humans using AI for the wrong purposes, not from out-of-control paperclip factories. What if we knew exactly what to wish from a godlike AI "genie in a bottle" (i.e. we could solve the so-called "outer alignment problem") but could not act on it because we are forced to compete with other humans we perceive as evil? What if we have to use increasingly powerful AIs to maintain dominance over other groups of people we fear?
Most people believe that human nature cannot be changed; that there will always be some evil people (other than me, of course!) who need to be controlled by society. I believe that this view of ourselves is wrong and will soon become very dangerous. I am quite sure that evil is present in all humans (which unfortunately includes me). Of course, not all humans express it (or are forced to express it) to an equal degree, but all are fully capable of doing evil in certain situations. We must learn to understand evil not so much as the product of specific childhood experiences (even if this view is also valid and useful in some contexts) but rather as the result of evolutionary game theory doing its job over billions of years.
What is the advantage of looking at the problem of evil from this angle? I believe that a solution becomes possible if we understand human nature better. If we understand the fundamental processes which lead to human non-cooperative behavior, we might find an antidote to it. Surprisingly, research results from the field of AI safety could be very useful in this endeavour. We could simulate the evolution of an artificial mind (implemented as an artificial neural network) in simple, well-defined virtual environments and study its behavior. We could use techniques invented to analyze artificial intelligences (e.g. techniques for studying the role of individual neurons in neural networks) to examine such simulated minds. From this we could learn which properties of the experimental setup lead to which kinds of behavior (desirable or not), and why.
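To make the idea of simulated evolution a bit more concrete, here is a minimal sketch of the basic loop: a population of tiny "genomes" lives in a toy environment, the fitter half survives each generation, and mutated copies refill the population. The environment, fitness function, and all parameters here are invented for illustration; a real experiment would replace the genomes with neural network weights and the fitness function with performance in a virtual environment.

```python
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS = 10, 20, 50

def fitness(genome):
    # Toy "environment": agents are rewarded for each 1-bit
    # (a stand-in for whatever behavior the environment selects for).
    return sum(genome)

def mutate(genome, rate=0.1):
    # Flip each bit independently with a small probability.
    return [1 - g if random.random() < rate else g for g in genome]

# Random initial population.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Selection: keep the fitter half, refill with mutated copies.
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP_SIZE - len(survivors))]

print(max(fitness(g) for g in population))  # selection pushes this toward GENOME_LEN
```

Even in this trivial setup, the interesting questions are the ones from the post: which features of the environment and the selection rule produce which behavior, and why.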
The problem of AGI safety is in fact an aspect of a much larger problem: in the near future we will have a highly complex ecosystem of artificial and natural agents which interact. We need to understand all these different agents in the context of this diverse zoo.
I therefore want to suggest a new field of research: the study of human minds using the simulated evolution of artificial neural networks. Maybe there are already people working on this somewhere (if you know about such activities, please let me know!).
But how could this help to improve our society? In the following I will try to convey an intuition for this (I do not have much more than an intuition myself at the moment).
In my capstone project for the BlueDot Impact "AGI safety fundamentals" course I created a simple Python framework for research on the prisoner's dilemma game (a classic problem from decision theory / game theory). I do not want to waste time on a detailed discussion of the problem here, as you can find a good one on Wikipedia. Many real-world situations have a similar structure; for instance, the problem of climate change is ultimately equivalent to the prisoner's dilemma.
The prisoner's dilemma is a game between two "players" in which mutual cooperation would give both a better outcome, yet defection (i.e. not cooperating) is the preferable strategy for a rational agent. The result is therefore suboptimal for both players. If the game is played several times (the "iterated prisoner's dilemma"), it makes sense for the agents to cooperate under certain circumstances.
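The one-shot dilemma can be shown in a few lines of Python. The payoff values below (5, 3, 1, 0) are the textbook choice, not numbers from my framework; what matters is only their ordering.

```python
# One-shot prisoner's dilemma: my payoff for (my move, opponent's move).
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # "sucker's payoff": I cooperate, they defect
    ("D", "C"): 5,  # temptation: I defect, they cooperate
    ("D", "D"): 1,  # punishment for mutual defection
}

def best_response(opponent_move):
    """The move that maximizes my payoff against a fixed opponent move."""
    return max(["C", "D"], key=lambda my: PAYOFF[(my, opponent_move)])

# Defection is the best response to BOTH possible opponent moves...
print(best_response("C"), best_response("D"))  # D D
# ...yet mutual defection (1 each) is worse than mutual cooperation (3 each).
print(PAYOFF[("D", "D")], "<", PAYOFF[("C", "C")])  # 1 < 3
```

This is the whole paradox: two individually rational players end up at the worst collective outcome.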
Now consider the following argument. During the past hundred years the world has changed dramatically: while we once lived in villages where our peers were mostly family members and other well-known villagers, today we operate in a highly interconnected global village where we have brief contacts with a large number of mostly unknown or anonymous people ("Twitter").
This means that today we much more often play "single-shot" games (or iterated games with only a few iterations) compared to a hundred years ago (and the millions of years before!), when we played mostly iterated games with many iterations. It is not surprising that non-cooperative behavior (i.e. exploitation of others) has become a dominant strategy. Of course this development cannot make us happy: under the conditions in which our minds evolved, we prefer to be predominantly cooperative. Evil was once reserved for very rare and special situations, mostly when dealing with members of competing groups.
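The difference that iteration makes can be demonstrated directly. The sketch below (my own illustration, using the textbook payoffs and the classic "tit-for-tat" strategy, not code from my framework) pits a conditionally cooperative player against an unconditional defector over many rounds:

```python
# Iterated prisoner's dilemma: payoffs are (player A, player B) per round.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def play(strat_a, strat_b, rounds):
    """Play `rounds` rounds and return the total scores (A, B)."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

# Two tit-for-tat players cooperate every round: 3 points each per round.
print(play(tit_for_tat, tit_for_tat, 100))    # (300, 300)
# Against a defector, tit-for-tat is exploited only once, then both stagnate.
print(play(tit_for_tat, always_defect, 100))  # (99, 104)
```

With many iterations, mutual cooperators vastly outperform the mutual-defection trap; in a single round (or the last known round), defection wins. That is exactly the shift from village to global village described above.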
This would mean that we have created a society which makes us unhappy (and threatens nature) because we don't understand ourselves. I'm therefore quite sure that an increased consciousness of these mechanisms could help us to enjoy the huge benefits of mutual cooperation.
In the coming months I will publish some results from this field on this blog (from various researchers, and some from my own experiments, for which I will also publish the code).
Image: Gino Crescoli (from Pixabay)
Follow me on X to get informed about new content on this blog.