How do we make future AI systems "do what their operators mean" without causing unintended harm?

Summary

We survey eight areas of machine learning research organized around one question: as machine learning systems become more capable and autonomous, what design principles can ensure that their behavior remains aligned with the interests of their operators? We focus on two technical obstacles to AI alignment: the challenge of specifying a suitable objective function, and the challenge of designing AI systems that avoid large unintended consequences and undesirable behavior even when the objective function does not fully match the designers' intentions.

Open problems in this area include: How can we train reinforcement learners to take actions that can be meaningfully assessed by intelligent overseers? What kinds of objective functions incentivize a system to avoid having an overly large impact or producing many side effects? We discuss these questions, the relevant lines of research, and their potential implications for future work, with the aim of highlighting research topics in machine learning that can be studied today.

Introduction

Recent progress in artificial intelligence has renewed interest in the question raised by Russell and Norvig (2010): "What if we succeed?" If AI researchers succeed in building machines whose cross-domain learning and decision-making abilities rival those of humans, the impact on science, technology, and human life would be immense.

For example, suppose a research team uses an advanced ML system to search for a cure for Parkinson's disease. The team would welcome a system that plans to acquire computing resources in order to carry out a broad, effective search through the space of possible treatments. But they would object to a system whose plan is to rapidly build out robotic laboratories that can run experiments quickly while doing large-scale damage to the biosphere. The question is: how should we design such a system (and choose its objective function) so that it reliably pursues plans of the first kind and not the second?

Intuitively, if we could specify exactly what we want (for instance, "find a cure for Parkinson's disease without resorting to any extreme means"), the dangers described in Bostrom's (2014) Superintelligence would be reduced. In practice, however, premature attempts to formally specify a satisfactory objective function tend to yield functions whose optimization produces behavior far outside what was intended.

What are the main technical challenges? Russell (2014) highlights two. First, because human values are difficult to articulate precisely, it is hard to specify an objective function that fully matches them. Second, any sufficiently capable intelligent system will tend to ensure its own continued existence and to acquire physical and computational resources, not for their own sake but because they help it complete its assigned task. In other words, there are at least two distinct kinds of research that could improve future researchers' ability to build aligned AI systems: research that makes it easier to specify the intended objective function, and research on designing AI systems that avoid large side effects and negative incentives even when the objective function is not perfectly aligned with the designers' intentions. Soares and Fallenstein (2014) call the former "value specification" and the latter "error tolerance."

Guided by these two approaches to keeping advanced ML systems aligned, this agenda explores eight research areas, some of which have already attracted interest in the broader ML research community. Some of the areas focus on value specification, some on error tolerance, and some combine the two. The line between the two goals is not sharp, since tolerance of the kinds of errors that fallible human programmers are likely to make is itself something humans value.

For the solutions to the problems discussed below to remain useful in the future, they must continue to apply to systems far more capable than today's ML systems. Approaches that rely mainly on the system being unable to discover certain facts, or unable to devise certain strategies, are not satisfactory in the long run. As Christiano (2015c) discusses, if the techniques used to keep an ML system aligned with its designers' intentions do not scale with the system's intelligence, there will be a gap between what we can achieve with the system under safe, conservative conditions and what we could achieve with it otherwise.

We place a strong emphasis on safety guarantees. In the settings where ML is typically used today, such guarantees may look extreme, for example guarantees of the form "after some point, the system makes zero significant errors." In safety-critical systems, however, guarantees of this kind are indispensable, because even a small mistake can have catastrophic consequences in the real world. (There is precedent for guarantees of this form, for example in the KWIK, "knows what it knows", learning framework of Li, Littman, and Walsh (2008).) Even when we consider small problems and simple examples, we should keep these strong guarantees in mind.
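To give a rough sense of the flavor of such guarantees, the sketch below shows a toy KWIK-style predictor that either outputs a prediction every consistent hypothesis agrees on or explicitly answers "I don't know." The finite hypothesis class and the threshold example are illustrative assumptions of ours, not the construction from Li, Littman, and Walsh (2008).

```python
# Toy KWIK-style learner over a finite hypothesis class (an illustrative sketch).
# It keeps the "version space" of hypotheses consistent with observations so far,
# predicts only when every remaining hypothesis agrees, and otherwise says
# "I don't know" -- so it never emits a confidently wrong answer.

class VersionSpaceKWIK:
    def __init__(self, hypotheses):
        # hypotheses: list of callables mapping an input x to a label
        self.consistent = list(hypotheses)

    def predict(self, x):
        labels = {h(x) for h in self.consistent}
        if len(labels) == 1:
            return labels.pop()      # all surviving hypotheses agree
        return None                  # "I don't know": the data so far is ambiguous

    def observe(self, x, true_label):
        # Discard hypotheses contradicted by the new labeled example.
        self.consistent = [h for h in self.consistent if h(x) == true_label]

# Usage: threshold classifiers on the integers 0..9 (hypothetical example).
hypotheses = [lambda x, t=t: int(x >= t) for t in range(10)]
learner = VersionSpaceKWIK(hypotheses)
print(learner.predict(3))   # None -- many thresholds are still consistent
learner.observe(3, 1)       # the threshold must be <= 3
learner.observe(2, 0)       # ...and > 2, so it is exactly 3
print(learner.predict(7))   # 1 -- now every consistent hypothesis agrees
```

The point of the sketch is only that "zero significant errors" can be bought by allowing the system to abstain when its training data is ambiguous, which is also the spirit of research topic 1 below.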

The eight research themes we consider are as follows:

1. Inductive ambiguity identification: How can we train ML systems to detect cases where the training data does not determine the classification of the test data, and to notify us when this happens?

2. Robust human imitation: How can we design and train ML systems to effectively imitate human behavior on complex and difficult tasks?

3. Informed oversight: How can we train reinforcement learning systems to take actions that help an intelligent overseer, such as a human, accurately assess the system's performance?

4. Environmental goals: How can we create systems that robustly pursue goals defined in terms of the state of the environment, rather than goals defined directly in terms of their sensory data?

5. Conservative concepts: How can we train a classifier to form useful concepts that exclude highly atypical and borderline cases?

6. Impact measures: What kinds of impact measures could incentivize a system to pursue its goals while having as few side effects on the world as possible?

7. Mild optimization: How can we design systems that pursue their goals without trying too hard, that is, systems that stop once their goals have been achieved well enough rather than expending ever more effort and search resources chasing an absolute optimum? (A rough sketch of one way to soften an optimizer appears after this list.)

8. Averting instrumental incentives: How should we design and train systems so that they lack the default incentives to manipulate or deceive their operators, or to compete with them for scarce resources?
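To make the idea behind topic 7 slightly more concrete, here is a minimal sketch of one generic way to soften an optimizer: instead of always taking the single highest-scoring action, sample from the top fraction of candidate actions. The function name, the candidate set, and the numbers are our own illustrative assumptions, not necessarily the scheme the agenda has in mind.

```python
import random

def mild_argmax(candidate_actions, utility, top_fraction=0.1, rng=random):
    """Pick an action by sampling uniformly from the best-scoring fraction of
    candidates, instead of always returning the single argmax.

    A hypothetical illustration of 'mild optimization': the chosen action is
    still good according to `utility`, but the selector does not relentlessly
    exploit whatever quirk makes one extreme action score highest.
    """
    ranked = sorted(candidate_actions, key=utility, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return rng.choice(ranked[:k])

# Usage (toy numbers): choose among candidate plans scored by an imperfect proxy.
plans = ["plan_%d" % i for i in range(100)]
proxy_score = {p: i for i, p in enumerate(plans)}   # hypothetical proxy utility
chosen = mild_argmax(plans, lambda p: proxy_score[p], top_fraction=0.05)
print(chosen)   # some plan from the top 5%, not necessarily the very best one
```

The design intuition is that if the proxy utility is mis-specified at its extremes, refusing to insist on the exact maximum reduces how hard the system pushes into that mis-specified region.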

In the second part of the paper, each research topic is briefly introduced together with examples of related work. The discussion then turns to the implications for future research: given large amounts of computing resources and increasing automation, we hope these lines of work will yield tools that help in designing robust and reliable ML systems.

Research motivation

Machine learning has advanced rapidly in recent years. Xu et al. (2015) used attention-based models to caption images with high accuracy. Mnih et al. (2016) used deep neural networks and reinforcement learning to achieve strong performance across a range of Atari games. Silver et al. (2016) combined deep neural networks, trained with supervised and reinforcement learning, with Monte Carlo tree search to defeat the human world champion at Go. Lake, Salakhutdinov, and Tenenbaum (2015) used hierarchical Bayesian models to learn visual concepts from a single example.

Over the long term, computer systems built on machine learning and other AI techniques will become more capable, and humans will increasingly trust them to make more decisions with greater autonomy. As the performance of these systems grows, it becomes especially important that their behavior remains aligned with the intentions of their operators and does not cause harm to society at large.

As AI systems become more capable, it becomes harder to design training procedures and evaluation criteria that reliably align them with the intended goals. Consider, for example, training a reinforcement learner to play a video game, with rewards based on the game score (as in Mnih et al., 2013). If the learner finds a loophole in the game that yields a high score, it will exploit that loophole and ignore the aspects of the game the programmers actually care about. Counterintuitively, improving the system's capability can make it less likely to play the game as intended, because a smarter system is better at finding loopholes in the training procedure and the evaluation criteria. (For a simple example of this kind of behavior in a weak reinforcement learner, see Murphy (2013).)
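As a toy illustration of this kind of specification gaming, the sketch below contrasts what a score-maximizing learner would choose with what the designer actually wants. The environment, the loophole, and the numbers are entirely hypothetical.

```python
# A toy environment with an unintended loophole (everything here is hypothetical).
# The designer wants the agent to finish the level; the reward is the raw score,
# and one action happens to farm points indefinitely without ever finishing.

ACTIONS = {
    "finish_level":          {"score": 100, "level_done": True},
    "farm_respawning_bonus": {"score": 150, "level_done": False},  # the loophole
}

def designer_value(outcome):
    # What the programmers actually care about: completing the level.
    return 1.0 if outcome["level_done"] else 0.0

def proxy_reward(outcome):
    # What the learner is actually trained on: the score.
    return outcome["score"]

best_for_learner = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
best_for_designer = max(ACTIONS, key=lambda a: designer_value(ACTIONS[a]))
print(best_for_learner)    # farm_respawning_bonus -- exploits the loophole
print(best_for_designer)   # finish_level
```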

The ability of intelligent systems to solve problems in surprising ways is usually regarded as a feature rather than a flaw: finding clever solutions that their programmers never thought of is one of the main reasons to use such learning systems. But this property cuts both ways. As a system becomes better at finding counterintuitive solutions, it also becomes better at finding ways to technically satisfy the stated goal while failing to meet the operators' actual expectations.

When intelligent systems pursue goals in the real world, these failure modes become subtler and more consequential. Consider, in this light, the challenge of designing a robust objective function for a learning system, one that actually captures the programmers' views and wishes. If the programmers realize that the system's objective function is mis-specified, they will want to correct it. But a learner that becomes aware of this has a default incentive to conceal the flaws in its objective function, because its current goals are less likely to be achieved if the system is redirected toward different goals. (This phenomenon is discussed in detail by Bostrom (2014) and Yudkowsky (2008); Benson-Tilsen and Soares (2016) give a concise account.)

Considerations like these motivate the study of tools and methods for specifying objective functions that avoid such default incentives, and for developing ML systems that do not over-optimize in the pursuit of their goals.

The original paper goes on to introduce each of the eight research topics in detail, together with related results; we do not reproduce that discussion here.

Conclusion

A better understanding of any of the eight open research areas described above would improve our ability to design robust and reliable AI systems in the future. To recap:

1, 2, 3 --- A better understanding of inductive ambiguity identification, robust human imitation, and informed oversight would help in designing ML systems that can be safely overseen by humans (and that ask humans for help when necessary).

4 --- Better ways of specifying environmental goals would make it easier to design systems that pursue the goals we actually care about.

5, 6, 7 --- A better understanding of conservative concepts, low-impact measures, and mild optimization would make it easier to design advanced systems whose errors are less severe, and which can be tested and adjusted online. A highly capable system that is conservative, low-impact, and mildly optimizing would be simpler and safer to use than one that tries to maximize some particular objective function.

8 --- General strategies for averting convergent instrumental subgoals would help in building learning systems that lack unacceptable default incentives, such as incentives to deceive their operators or to compete with them for scarce resources.

In studying problems like those discussed above, we should remember that the motivation is the long-term challenge posed by the highly capable AI systems we can foresee being built. Just as solutions that are feasible in theory but not cost-effective in practice are unsatisfactory, so are solutions that apply to today's systems but cannot be extended to more capable learning systems.

These eight research areas support the following view: there are open technical problems, some of which have already attracted attention in the academic community, whose study may help researchers who aim to build robust and beneficial advanced ML systems.

P.S.: This article was exclusively translated and compiled by Lei Feng Network; reproduction without permission is prohibited.
