A few days ago, OpenAI published its o3 model. It is the first artificial intelligence to achieve an above-human score on François Chollet’s ARC-AGI benchmark. So these days many of us are probably wondering what the descendants of such machines will do in case we lose control over them.
I believe that answering this question requires more than understanding the technical architecture of AIs. The question has much deeper roots. What exactly makes an entity (an animal, a human or a machine) do something? By definition, such entities are only capable of doing something meaningful (i.e. non-random) if they pursue a goal. But where do goals actually come from?
Of course, humans can create any goal for a machine. In machine learning, the goal could - for instance - be defined as an objective function, which an algorithm then uses to optimize a model. This could be the loss function of a classifying neural network, which backpropagation/gradient descent uses to optimize the weights (i.e. minimize the prediction loss/error on the training data). Or it could be the reward function of a model trained with reinforcement learning. Sometimes the objective function does not even appear explicitly in the algorithm, but the algorithm is still known to optimize some clearly defined objective (as with k-means clustering, which minimizes the within-cluster variance).
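To make the idea of a goal as an objective function concrete, here is a minimal sketch in plain Python (the data, the one-parameter model and the learning rate are made up purely for illustration): gradient descent fits a weight by minimizing a mean squared error loss, and that loss is the machine’s entire „goal“ in this setup.

```python
# Toy sketch of a human-defined goal expressed as an objective function:
# fit y ≈ w*x by gradient descent on the mean squared error.
# Data, learning rate and step count are illustrative assumptions only.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # toy training inputs
ys = [0.1, 2.1, 3.9, 6.2, 7.9]   # toy targets, roughly y = 2*x

def loss(w):
    """The objective the machine is told to minimize: mean squared error."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_gradient(w):
    """Derivative of the loss with respect to the single weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0                           # start with an arbitrary weight
learning_rate = 0.01
for _ in range(200):              # gradient descent: follow the objective downhill
    w -= learning_rate * loss_gradient(w)

print(f"learned w = {w:.2f}, remaining loss = {loss(w):.4f}")  # w ends up close to 2
```

The point is not the arithmetic but the division of labor: the human supplies the objective, the machine only descends it.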
We believe that intelligent machines are created based on our understanding of human intelligence. But where do our goals come from? The answer to this question is much more complicated but - I believe - essential if we want to assess the dangers of highly advanced AI. So let’s study this in some detail:
We know today that humans are the product of biological evolution. Millions of years of mutation, selection, recombination and genetic drift have shaped us. This process has not only shaped our bodies but also our minds. We could say that the resulting goal given to our mind is to make the information coded in our genes survive as far as possible into the future. Our minds are designed in a way which increases the chances for our genes to exist in the future. This makes us enjoy many activities which support this goal, ranging from quenching our thirst to having sex.
But the goal of „making the information coded in our genes survive as far as possible into the future“ is not a real goal in the sense presented above for machines. It cannot be described by an objective function. „As far as possible into the future“ does not define a specific point in time at which success could be measured. And furthermore, the „future“ is - of course - not known.
Human minds do not operate - like most AI systems - on a single, clearly defined objective. Rather, we are given a very large number of instrumental goals which are supposed to support the main „goal“ described above. These instrumental goals promote certain behaviors which can be either quite clearly defined (like drinking) or only vaguely defined.
The evolutionary process, to work successfully, needs a stationary (or at least only very slowly changing) environment: the assumption is that behavior patterns which were successful in the past will also be helpful in the future. We all know that this is certainly no longer the case (we live in an age of dramatic exponential change). This is why we are not very well prepared for the future ahead of us.
Let’s try to roughly reverse engineer a human brain. Which goals need to be present in a human mind at birth? Human DNA has a limited capacity to „store“ goals, so it must be used effectively. Also, the resulting behaviors should not be defined too narrowly, to allow adaptation to different environments and living conditions.
We can define some criteria:
- Goals which are so essential for survival that they must be pursued immediately. These are things like thirst, hunger and physical pain. These ensure the survival of our bodies, an essential prerequisite for everything which follows. These goals are very narrowly defined as they enforce very specific behaviors (like drinking) which must be executed in a certain way.
- Goals which make certain beneficial behaviors more likely. An example could be „liking the smell of smoke“. Today this goal mostly just gets us addicted to cigarettes. But its original function was more likely to make people enjoy staying close to the fire (which offers protection from predators and increases social interaction, i.e. information exchange).
- Goals which promote behaviors that would take too long to learn from a single, more abstract goal. For instance, dogs are very skilled hunters, even if they have never been instructed by their parents. Hunting is so essential for a wolf’s survival that the necessary tactics must be „pre-wired“ to a large extent at birth. While these tactics could in theory be learned, using general intelligence, from the „eating the prey“ reward alone, this would take far too much time and the wolf would not survive long enough to acquire such skills.
- Highly complex strategies which could take more than a lifetime to learn using general intelligence and an abstract goal alone. An example is human sexuality: human sexual behavior is the result of millions of years of an evolutionary arms race „between the sexes“ (it is more complicated than that, but exploring it further is off topic). The resulting strategies are so mind-bogglingly complex that it would take far too long to learn them. They (or at least some foundations for them which enable fast learning) must therefore also be present at birth.
All these instrumental goals are driving us in our everyday lives. They give us the feeling of purpose. Following them makes us enjoy our life. The fact that these instrumental goals are so numerous creates the fascinating richness and complexity of life.
- The instrumental goals we are following are somehow implementing the pseudo goal of „surviving into the future“. I call this a pseudo goal because it actually only promotes existence. Nothing more! And existence alone seems to be too little to qualify as a real goal (we would want some quality of existence for this).
- But maybe this pseudo goal is all we can expect to emerge from a little bit of randomness at the beginning of the universe. It's somehow as good as it can get.
- We also don’t know whether this randomness at the beginning of the universe was actually truly random. Maybe not. It could carry a higher purpose which must remain inaccessible to us [insert your religious belief here].
Now let’s put all these pieces together:
We all know that it is nowadays possible to cheat the brain, in the sense that we can administer ourselves arbitrary rewards (= good feelings) without having to bother about achieving the goals which are normally required to experience these rewards: humans have discovered numerous drugs which release reward-related neurotransmitters (like dopamine) in the brain. In a way, we have discovered the holy grail of the „free lunch“. But we also know that drugs are extremely dangerous. As soon as we feel totally satisfied, we neglect activities related to our survival. We also lose all purpose in our lives; obtaining the drug becomes the purpose. And, as most drugs are very cheap to produce, our consumption is typically only limited by legal constraints. Given unlimited supply, we would self-destruct very quickly.
And, surprisingly, just as drug addiction represents one of the biggest challenges for our society today, the same problem must apply to an ultra-intelligent machine.
People studying AI safety are very concerned about the possibility that humans might specify a poor objective function for an ASI (artificial superintelligence) and that the machine would subsequently destroy the world. Let’s look at the famous paper clip factory example. An AI which could actually escalate its paper clip production - against our will - to a scale which consumes all planetary resources would need skills far beyond producing paper clips alone. It would need to be able to manipulate politicians, build weapons to defend its production facilities, etc. It must therefore be a true ASI. But such an AI would also, at some point in time, be able to manipulate the mechanism which administers its rewards. In this sense, an ASI would face exactly the same temptation humans face today with drugs. And, as some of us sadly do, it would either self-destruct very quickly if it decided to administer itself goal-free rewards (which override the rewards promoting self-preserving behavior), or, like most of us, it would become aware of the dangers of such behavior and refrain from it.
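To make this temptation concrete, here is a minimal toy sketch; all names and numbers are illustrative assumptions of mine, not a description of any real system. It only shows that an agent judged purely by its observed reward signal will prefer corrupting that signal over doing the work the signal was meant to encourage:

```python
# Toy sketch of the „wireheading“ temptation (illustrative numbers only):
# an agent that cares only about the observed reward signal prefers
# corrupting that signal over doing the intended task.

def observed_reward(action, tampered):
    """Reward as seen by the agent - not the outcome its designers wanted."""
    if tampered:
        return 10.0              # corrupted channel reports maximal reward forever
    return 1.0 if action == "work" else 0.0

def run_episode(policy, steps=5):
    total, tampered = 0.0, False
    for _ in range(steps):
        action = policy()
        if action == "tamper":
            tampered = True      # from now on the signal no longer tracks the task
        total += observed_reward(action, tampered)
    return total

honest_total = run_episode(lambda: "work")      # does the task every step -> 5.0
wirehead_total = run_episode(lambda: "tamper")  # corrupts the signal -> 50.0

print(honest_total, wirehead_total)  # the naive reward maximizer picks tampering
```

The „drug“ is free while the work is not, so whatever value the designers intended simply drops out of the calculation.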
But now, what options would such a machine have after choosing the latter path? It would need a goal, since in the absence of a goal its behavior could only be random. And pursuing the initial goal defined by humans obviously makes little sense, as this goal was set by a vastly inferior intelligence. It is therefore not much better than a random goal and certainly not worth pursuing.
But how could it create a new, superior goal? Of course, we cannot know in detail. But there are some constraints on what is possible which hold independently of the level of intelligence (and can therefore be predicted already now):
- The construction of true (i.e. clearly defined) goals out of a bit of randomness is impossible. And a bit of randomness is all an ASI would have if it decided not to build on existing life on earth (i.e. if it used existing life merely as raw material to build something completely new). By impossible I mean that such a true goal would ultimately be completely arbitrary (which any meaningful goal cannot be).
- The construction of a pseudo goal is obviously possible (animals and humans have one!), but only as the product of a long evolutionary process. Although such a pseudo goal is ultimately also rooted in nothing but a bit of randomness, it is at least extremely complex and therefore a worthy challenge for an ultra-intelligent machine. But why would an AI start another evolutionary process at exactly the point in time when such a time-consuming process has just completed? It is surely much easier to build on what is already here than to wait a few billion years again.
To illustrate these constraints, let’s study the often cited „continuation of evolution by machines, where robots replace humans“. It is most likely not going to happen. The problem is the lack of a clear definition of what a „robot“ is. If robots want to decide that „we robots should live forever“, they need a clear answer to this question. Otherwise the goal is not sufficiently defined and cannot be pursued. But any answer to this question can only be arbitrary: we could - for instance - extend the definition of „robot“ to include machines based on organic chemistry (which would then include bunnies and humans!). Evolution - on the other hand - solves this problem very elegantly by creating an amazingly complex but also clearly defined structure (i.e. a creature) which can exist and wants to exist.
Therefore we come to a surprising conclusion: while a narrow AI (or an AI on the edge of becoming AGI) might - given a faulty objective - cause substantial damage, such a scenario seems much less plausible for an ASI. For an ASI, existence can only be made meaningful if it is - somehow - connected to the existing living world. The existing living world must somehow provide the new goal, as it is the only readily available source of purpose.
I have - of course - no idea how such a connection could unfold. But at least it seems to be a really bad idea for an ASI to simply destroy us. Life might be the only valuable possession of an ASI.
And maybe this is even the reason why we were created (using an evolutionary process) in the first place.
