This post was originally published by Jeremy Kahn at Fortune
The pandemic has driven many companies to accelerate digital transformation. This is particularly true in manufacturing, where the pandemic has forced businesses to think about how to operate with fewer workers on machine shop floors and assembly lines.
Automation is the order of the day. And, increasingly, artificial intelligence is playing a role in these efforts—predicting when machines will need maintenance, directing growing armies of factory robots on their daily rounds, and optimizing workflows throughout the entire manufacturing process.
In the coming weeks, I will be writing more about this trend. But today, I want to talk about the way in which industry’s turn to A.I. is accelerating a particular type of machine learning: deep reinforcement learning (or deep RL). This combines deep neural networks, the kind of machine learning software very loosely based on the wiring of the human brain, with reinforcement learning, which involves learning from experience rather than historical data. Deep RL is behind most of the big breakthroughs in computers that can play various kinds of games better than top humans, including DeepMind’s achievements in Atari, the strategy game Go, and, most recently, StarCraft II, as well as OpenAI’s work on the video game Dota 2 and Facebook’s and Carnegie Mellon University’s poker-playing software.
It has been difficult, until very recently, to transfer these same methods to the real world. The algorithms for deep RL can be hard to implement. They usually require a good simulator in which the A.I. can be trained, and most businesses don’t have one. Even with a simulator, there can be concerns about how exactly the system will perform—or even whether it will be safe—if there are subtle differences between the simulator and the real world.
This is starting to change, says Chris Nicholson, the founder and CEO of Pathmind, a San Francisco company that helps industrial customers use deep reinforcement learning. He says that many companies now have enough data that they can create decent simulators of their operations. Even relatively simple simulations, he says, can be used to find ways to do things more efficiently. The most sophisticated businesses graduate to “digital twins,” complete virtual copies of their operations. This allows them to see in advance exactly how any adjustment to a process will impact the whole operation. They can run predictive analytics not just for single machines, but across the whole business.
Nicholson also points to another startup, based in nearby Berkeley, called Covariant, that has used deep RL to teach industrial robots to identify, grasp, manipulate, and sort a variety of different-sized objects, a major milestone in robotics. Covariant trains the software that will control the robots in a simulator before loading it onto the real robots, which then apply those skills in the real world. Covariant has a partnership with ABB, the world’s largest producer of industrial robots, so deep RL, at least for teaching robots, may become mainstream far faster than people realize.
Nicholson says there are several advantages to using deep reinforcement learning over traditional supervised learning methods. In supervised learning, software is trained by examining a large set of historical data and learning what set of inputs is most likely to result in what set of outputs. But what happens when the future no longer looks like the past? In these circumstances, supervised learning systems often fail.
Deep RL systems, meanwhile, are potentially more robust to shifting inputs, Nicholson says. You can train the system in the simulator to respond to all kinds of Black Swan events, such as the supply chain disruptions caused by a global pandemic, even if your business has never encountered that situation before. “A year ago, the supermarket in my town ran out of toilet paper,” Nicholson says. “That’s because traditional optimizers cannot handle novel disruptions.” Deep RL systems, on the other hand, can learn how to cope with data variations, big and small.
Deep RL systems can also find all kinds of ways to improve the performance and efficiency of a complex system that humans have never thought of, because the software can experiment endlessly, and cheaply, inside a simulation. But, ironically, the counterintuitive nature of some of the insights deep RL algorithms produce can be an impediment to the technology’s adoption—managers don’t trust what the system is telling them if it seems to violate a long-held rule of thumb. This is especially true because most deep RL systems can’t really explain the rationale for their choices.
“People really like deterministic systems and they like them even when they fail,” Nicholson says. Pathmind has overcome some of the hesitancy to use deep RL by deploying a deterministic optimization algorithm—one that uses an explicit and explainable set of rules and which always produces the same output for a given set of inputs—alongside a deep RL algorithm, so customers can see when the deep RL system provides a better solution. After a few cases where unconventional suggestions provide a big boost to the company’s bottom line, most customers become converts. “One customer told us, if it wasn’t counterintuitive, I wouldn’t need to pay for it,” he says.
Nicholson notes that in the science of information theory, one way to assess the value of a given piece of communication is to ask how surprising it is. The greater the surprise, the more informational value that message carries. It’s that way for A.I. too, he says. We want A.I. to surprise us—but only in a good way.
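For readers who want to see the idea in numbers, here is a minimal sketch of the standard information-theory measure of surprise, usually called self-information or surprisal: the negative base-2 logarithm of an event's probability, measured in bits. The function name and the example probabilities are illustrative assumptions, not anything Nicholson or Pathmind supplied.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Self-information of an event, in bits: -log2(p).

    Hypothetical helper for illustration; rarer events score higher.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Illustrative probabilities (assumed, not from the article):
coin_flip = surprisal_bits(0.5)     # an outcome you half expected
rare_event = surprisal_bits(0.01)   # a 1-in-100 outcome

print(f"coin flip:  {coin_flip:.2f} bits")
print(f"rare event: {rare_event:.2f} bits")
```

An outcome that was a coin toss carries one bit of information, while a 1-in-100 outcome carries more than six, which is the sense in which a more surprising message is a more informative one.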
We often complain A.I. systems lack common sense. But that’s not the same thing as saying conventional wisdom is always right. Instead, what we want is a system that won’t do stupid things that a human would never do, but will do clever things that we never would. It’s a fine balance to strike, but deep RL might just be the path to get there.