Why the creation of human-like A.I. might require godlike powers

How far away is the creation of human-like A.I.? By “human-like A.I.” I mean artificial intelligence that not only possesses human-level intelligence but also shows human-like behavior in a wide range of situations. This could include showing emotions, using facial expressions, making jokes etc., all behaviors which would make it difficult or impossible to tell such an A.I. apart from a real human. In science fiction, human-like humanoid robots are a popular theme, and some companies are already promising to create them in the near future (e.g. in the form of intelligent “girlfriend” robots/dolls).

In the following, I will try to explore what it would take to actually build such a machine, based on the current state of science and technology. Furthermore, I will try to clarify the difference between human-level A.I. and human-like A.I. Please note that this text is speculative, as there are many things we don’t yet fully understand about our brain and I’m not an expert in A.I., biology or neuroscience. Please read this for entertainment and inspiration.

Why do wolves prefer to hunt deer calves instead of adult deer? Probably for the same reason we humans prefer veal over beef: calf meat is more tender and, because of this, tastes better. But why prefer the meat of the young animal at all? Why this preference for the tender? The nutritional values of cow and calf meat are roughly the same. The brain seems to implement a strategy by influencing our taste preferences: it takes less energy and is less dangerous to hunt down a young calf than an adult animal (and the calf is also more likely to be free of parasites). The strategy of preferring young animals over adults when hunting is therefore implemented subconsciously, through the reward we get when eating the prey. This makes sense, because wolves neither know consciously about parasites nor have access to long-term statistics about hunting success.

In machine reinforcement learning, we typically deal with a system composed of three components: the agent, the environment and a reward function. The agent observes the environment, takes actions and gets rewarded (or punished by a negative reward) depending on the actions it takes. The agent learns to maximize the cumulative reward over time.
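
To make the three components concrete, here is a minimal sketch in Python. The corridor world, the parameter values and the choice of tabular Q-learning as the learning rule are my own assumptions for illustration; nothing in this text depends on that particular algorithm.

```python
import random

N_STATES = 5          # hypothetical toy world: corridor cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Environment: move along the corridor, clamped at both ends."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    return next_state, done


def reward_fn(next_state):
    """Reward function: +1 for reaching the goal, 0 otherwise."""
    return 1.0 if next_state == N_STATES - 1 else 0.0


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore at random, also when both actions still look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, done = step(state, ACTIONS[a])
            r = reward_fn(next_state)
            target = r if done else r + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action for every non-goal cell."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

Calling `greedy_policy(train())` returns one action per non-goal cell. After training, the agent should have learned to step right everywhere, although it was never told where the goal is.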

It is often difficult to specify a good reward function. The easiest way is to give the agent a reward only when the task is completed successfully (e.g. the game is won). But this makes it hard for the agent to learn, as it has to try many actions before it gets a reward, and it is difficult to figure out which decisions led to the reward (the so-called “credit assignment problem”). Hence it is often necessary to give the agent rewards for achieving intermediate goals (e.g. scoring points for making some progress towards winning the game).
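
The difference between the two kinds of reward can be sketched in a few lines. The goal position, the bonus of 0.1 and the example trajectory are made-up values, chosen only for illustration.

```python
GOAL = 4  # hypothetical goal position on a line of cells 0..4


def sparse_reward(state, next_state):
    """Reward only when the task is completed."""
    return 1.0 if next_state == GOAL else 0.0


def shaped_reward(state, next_state):
    """Sparse reward plus a small bonus (or penalty) for moving towards (or away from) the goal."""
    progress = abs(GOAL - state) - abs(GOAL - next_state)  # +1, 0 or -1
    return sparse_reward(state, next_state) + 0.1 * progress


trajectory = [0, 1, 0, 1, 2, 3, 4]  # states visited in one made-up episode
transitions = list(zip(trajectory, trajectory[1:]))

sparse = [sparse_reward(s, n) for s, n in transitions]
shaped = [round(shaped_reward(s, n), 2) for s, n in transitions]

print(sparse)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(shaped)  # [0.1, -0.1, 0.1, 0.1, 0.1, 1.1]
```

With the sparse reward, the agent gets no feedback at all until the very last step. With the shaped reward, every step tells it whether it moved in the right direction, including the wrong turn in the second step.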

Many researchers believe that goal-oriented learning in biological systems happens in a way similar to what is studied in machine reinforcement learning. Specifically, the behavior of animals also seems to be determined by a reward function (as in the wolf example above).

But what is the ultimate goal for an animal or human? We might say that it is successful reproduction (the orgasm is the strongest reward we can experience). But this is not true. Life does not end after reproduction. Life, I hope, never ends. Humans not only try to reproduce but try to produce successful offspring, i.e. offspring who will in turn produce successful offspring (parents hope to become grandparents one day). They do this, for instance, by giving their children a good education and supporting them for as long as possible. When we plan our actions, we are unconsciously looking far into an infinite future.

Therefore the reward function in biological systems rewards only the achievement of intermediate goals like finding food, mating and raising offspring. There is simply no last, final goal that can be easily described (what is the goal of life itself?). And if rewards are given for the achievement of many intermediate goals, there is a large number of conflicts to resolve (e.g. when to prefer food over sex?).

Consequently we can expect the human reward function to be very complex.

In machine reinforcement learning, we like to see the agent as some kind of learning creature or robot which is given reward or punishment by an external, manually defined reward system. But in biological systems the reward system is part of the creature itself: it is part of the brain.

We feel the presence of the reward system each time we enjoy something or suffer, which is the case most of the time. The reward system in the brain implements strategy by hinting at which short-term goals should be achieved (e.g. finding food because of hunger), while the part which I will call the intelligent agent system implements tactics (what should be done to achieve the short-term goals, e.g. how to hunt for food). The agent system of the brain tries to maximize the cumulative reward produced internally in the brain. The reward system provides rewards which guide us towards performing well in our environment. And performing well means carrying our genes over many generations into the far future.

While the agent system of our brain is shaped during our lifetime by learning, the reward system is hard-coded and shaped by evolution. The reward system is optimized by the Darwinian process of natural selection, in which individuals with a better reward function (i.e. a superior strategy) have more numerous and more successful offspring. There is a huge difference in the timescales over which these two components are optimized: the agent system of the brain can be trained/optimized within a few years of a lifetime, while the reward system needs thousands of years to change. This design makes it possible to integrate experience from thousands of generations into the brain while still being able to adapt quickly to changing environmental conditions between generations or even within a lifetime. It also helps to reduce the amount of behavior which needs to be coded in DNA.

The design of the reward system is ultimately coded in DNA while the agent system learns during its lifetime by creating connections between neurons.

What does all this mean for the creation of human-like A.I.?

If we get the human reward system right, any system which is able to optimize this reward successfully might show human-like behavior. Such an agent might therefore not even need to mimic the architecture of the human brain but could achieve its goals based on a very different design. While it might be possible in the future to create an artificial counterpart of the intelligent agent system of the human brain, the problem looks very hard for the human reward system. We could try to hard-code a reward function based on the feelings we experience. In some simple cases this could work (specifying simple rules like “IF dehydrated REWARD drinking”). But considering the enormous complexity of the reward function and the fact that most of its activity is hidden in our subconscious, I think such an effort would yield very poor results. Just imagine the countless possible conflicts between different rewards which would need to be resolved by fine-tuning rewards.
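
To illustrate what such hand-coding would look like, here is a small sketch with just two rules. The state variables, the action names and especially the weights 2.0 and 1.0 are invented for this example; they are exactly the kind of numbers somebody would have to fine-tune by hand.

```python
from dataclasses import dataclass


@dataclass
class BodyState:
    dehydration: float  # 0.0 = fine, 1.0 = severe (hypothetical scale)
    hunger: float       # 0.0 = full, 1.0 = starving (hypothetical scale)


def hand_coded_reward(state, action):
    """Two hard-coded rules; the weights are arbitrary guesses."""
    reward = 0.0
    if action == "drink":      # IF dehydrated REWARD drinking
        reward += 2.0 * state.dehydration
    if action == "eat":        # IF hungry REWARD eating
        reward += 1.0 * state.hunger
    return reward


def preferred_action(state):
    """Resolve the conflict by picking the action with the higher reward."""
    return max(("drink", "eat"), key=lambda a: hand_coded_reward(state, a))


print(preferred_action(BodyState(dehydration=0.3, hunger=0.5)))  # drink
print(preferred_action(BodyState(dehydration=0.2, hunger=0.5)))  # eat
```

Already with two rules, the point at which drinking wins over eating is decided by a hand-picked ratio of weights. Every additional drive adds more such trade-offs against all the existing ones.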

We could try to apply inverse reinforcement learning (IRL), where we try to infer the reward function from observations of the agent’s behavior in the environment. But how could such an algorithm measure the quality of the generated reward function? If we let humans decide, we can’t build an automated system to generate the reward function, and the process would be extremely slow. An automated solution, on the other hand, would have to be based on measurable criteria, such as survival and successful reproduction over several generations in a realistic, earth-like virtual world. An IRL algorithm would have to run such a simulation a great many times to calculate a good reward function. This looks like a computationally very intensive approach. As we are social animals, the virtual world would have to contain other simulated humans. But to simulate them properly, we would already need the reward function we are looking for. Another problem is that the algorithm might find a reward function which performs well in a limited set of test scenarios but is actually very different from the true human reward function, and which could therefore produce catastrophic results in some situations, like the robot killing a human (finding the reward function is an underdetermined problem). And last but not least, there are ethical questions, as it is not clear whether the simulated humans could experience consciousness and suffer in the process.
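
The last technical problem, that very different reward functions can explain the same observed behavior, can be shown with a toy example. This is not an IRL algorithm, only a sketch of why the result is ambiguous; the situations, actions and reward values are all made up.

```python
ACTIONS = ("help", "ignore", "harm")


def true_reward(situation, action):
    """Hypothetical 'true' reward: harming is strongly punished in every situation."""
    return {"help": 1.0, "ignore": 0.0, "harm": -10.0}[action]


def candidate_reward(situation, action):
    """Hypothetical recovered reward: identical in familiar situations, wrong in a novel one."""
    if situation == "novel":
        return {"help": 0.0, "ignore": 0.0, "harm": 1.0}[action]
    return {"help": 1.0, "ignore": 0.0, "harm": -10.0}[action]


def best_action(reward, situation):
    """Behavior of an agent which perfectly optimizes the given reward."""
    return max(ACTIONS, key=lambda a: reward(situation, a))


observed = ["kitchen", "street", "office"]  # the limited set of test scenarios

# On everything we observed, both reward functions produce the same behavior ...
same_on_tests = all(
    best_action(true_reward, s) == best_action(candidate_reward, s) for s in observed
)
print(same_on_tests)                           # True

# ... but they disagree badly outside the test set.
print(best_action(true_reward, "novel"))       # help
print(best_action(candidate_reward, "novel"))  # harm
```

No amount of checking against the observed scenarios can distinguish the two functions. Only the situation nobody tested reveals the difference.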

Another option might be an evolutionary algorithm: we could reproduce nature’s approach in a virtual environment, letting virtual humans live and reproduce over many generations and letting natural selection optimize the reward function. But some of our behavior is ruled by very old parts of the brain (the “reptile brain”). So to accurately reproduce human behavior we would have to start the simulation with a very primitive organism and run it all the way to the present time with all its complexity. During this simulation we would also need to maintain a population large enough to make natural selection work. This does not seem to be easier than the IRL method and could be equally unethical. It is actually equivalent to accurately simulating the biological history of a whole planet!
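
In a drastically simplified form, the idea looks like the sketch below, which reuses the wolf example from the beginning. The inner loop is one lifetime, in which the agent learns which prey feels most rewarding. The outer loop is evolution, which never sees these feelings but only the energy gained. All numbers (energy values, population size, mutation strength) are assumptions made up for this toy; a real simulation would be many orders of magnitude more complex.

```python
import random

TRUE_ENERGY = {"calf": 3.0, "adult": 1.0}  # hypothetical net energy per hunt, hidden from the agent
PREY = tuple(TRUE_ENERGY)


def lifetime(reward_genes, rng, hunts=50, epsilon=0.1, alpha=0.2):
    """Inner loop: learn within one lifetime which prey feels best; return the energy gained."""
    estimate = {}  # prey -> learned estimate of how rewarding it feels
    energy = 0.0
    for _ in range(hunts):
        untried = [p for p in PREY if p not in estimate]
        if untried:
            prey = untried[0]                 # try every kind of prey once
        elif rng.random() < epsilon:
            prey = rng.choice(PREY)           # occasional exploration
        else:
            prey = max(PREY, key=estimate.get)
        felt = reward_genes[prey]             # what the agent experiences
        old = estimate.get(prey, felt)
        estimate[prey] = old + alpha * (felt - old)
        energy += TRUE_ENERGY[prey]           # what natural selection "sees"
    return energy


def evolve(generations=30, pop_size=20, seed=0):
    """Outer loop: selection and mutation act on the reward genes only."""
    rng = random.Random(seed)
    population = [{p: rng.uniform(-1, 1) for p in PREY} for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: lifetime(g, rng), reverse=True)
        parents = ranked[: pop_size // 2]     # the fitter half survives
        children = [{p: g[p] + rng.gauss(0, 0.1) for p in PREY} for g in parents]
        population = parents + children
    return population


survivors = evolve()[:10]  # the parents selected in the last generation
print(all(g["calf"] > g["adult"] for g in survivors))
```

In this toy, the surviving individuals should end up with reward genes which make calf feel better than adult prey, without any individual ever knowing anything about energy. Note how much is left out: one preference, two kinds of prey, no social behavior and no body.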

All in all, it seems safe to conclude that if humans ever succeed in implementing anything like this, they will have become true gods.

So we can hope to be able to tell humans apart from robots for quite a long time to come, even if the basic problem of how to create the necessary level of intelligence might be solved much earlier.

But we will probably soon see systems which show human-like behavior in a very limited range of clearly defined situations (such as robots learning to perform simple tasks from human instruction, or better conversational chatbots).