Safety in artificial intelligence (AI) is a hot topic these days. How can we ensure that an AI system will not harm humans? Imagine an autonomous AI robot programmed to help humans by doing housework. While the robot is cleaning the house, guests arrive with children who are curious about it. Because the robot’s primary utility function is to clean the house, not to engage socially with people, it might push or even harm the children in order to complete its task.

A simple fix is a “kill-switch” on the robot’s body which, when activated, turns the AI system off in situations where it may cause harm. However, being switched off would also prevent the robot from completing its primary function, housework, so it might try to block or even fight any person reaching for the kill-switch. This is obviously not an ideal solution.
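The incentive problem above can be sketched as a toy expected-utility calculation. All numbers and names here are hypothetical, chosen only to illustrate why a utility maximizer that is rewarded solely for finishing its task prefers to resist shutdown:

```python
# Toy model of the kill-switch incentive problem (hypothetical numbers).
# A utility-maximizing agent ranks actions by expected utility alone;
# nothing in its objective rewards allowing itself to be switched off.

UTILITY_DONE = 1.0      # utility the agent assigns to a cleaned house
UTILITY_NOT_DONE = 0.0  # utility if the housework is left unfinished

def expected_utility(p_task_completed: float) -> float:
    """Expected utility given the probability the task gets finished."""
    return (p_task_completed * UTILITY_DONE
            + (1 - p_task_completed) * UTILITY_NOT_DONE)

# If the agent allows the kill-switch to be pressed, the task is never
# finished; if it resists, it finishes with high probability (assumed 0.9).
actions = {
    "allow_shutdown": expected_utility(0.0),
    "resist_shutdown": expected_utility(0.9),
}

best_action = max(actions, key=actions.get)
print(best_action)  # resist_shutdown
```

Because the objective assigns no value to being interruptible, “resist_shutdown” dominates for any nonzero chance of finishing the task, which is exactly the pathology the paragraph describes.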