Experts in artificial intelligence (AI) can be loosely categorized into two groups: AI evangelists and AI skeptics. AI evangelists believe that machine learning can make our world a better place. While acknowledging that the risks of AI are real, evangelists tend to see those risks as merely technological challenges that will eventually, but certainly, be overcome. Evangelists call for more funding for AI, which they see as the most useful technology humanity has ever developed.
AI skeptics take a darker view. They believe that, at best, AI will displace human decision-making in ways we cannot foresee, with consequences that are not just unknown, but unknowable. They caution that unchecked AI might trample on the fundamental rights of human beings. More than anything, they fear the rise of AI that poses an existential risk to humankind. Skeptics call for more attention to be paid to AI safety, and some even want to see AI development halted before AI becomes the most dangerous technology humanity has ever developed.
At our company, Diveplane, we have made it our mission to achieve the optimistic goals of AI evangelists in a way that calms the pessimistic fears of AI skeptics. By navigating the deep waters of mathematics and information theory, we’ve discovered how to make AI that’s incredibly useful—while keeping it incredibly safe.
To explain how we do it, we’ve set up this Substack. Today we’re going to start with the problem of AI Alignment: How do we structure the algorithms, environments, reward functions, and training data of our AI in order to reliably align its behavior with our intent?
In his book The Alignment Problem, programmer and researcher Brian Christian explains the problem as follows:
“As machine-learning systems grow not just increasingly pervasive but increasingly powerful, we will find ourselves more and more often in the position of the ‘sorcerer’s apprentice’: we conjure a force, autonomous but totally compliant, give it a set of instructions, then scramble like mad to stop it once we realize our instructions are imprecise or incomplete—lest we get, in some clever, horrible way, precisely what we asked for.”
Let’s call an AI system that reliably aligns its behavior with its user’s intent an Aligned AI. A system that sometimes acts in ways that are out of alignment with its user’s intent, let’s call an Unaligned AI. To help understand the difference, imagine that the AI is a genie capable of granting wishes. An Aligned AI is a benevolent genie, like Robin Williams’s Genie in Aladdin, that grants what you actually wished for. If you ask to be rich, your genie creates gold from thin air. An Unaligned AI is a malevolent efreet, fulfilling your wish literally in a way you might never have wanted. If you ask to be rich, your genie murders your beloved parents so that you collect the life insurance. The famous short story “The Monkey’s Paw” might as well be about Unaligned AI.
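To make the genie metaphor concrete, here is a toy sketch of our own (not drawn from any real system) of how an optimizer handed only a proxy objective can satisfy the letter of a wish while violating its intent. The actions and scores are entirely made up for illustration.

```python
# A toy illustration of proxy-objective misalignment (hypothetical numbers).
actions = {
    "work and invest":        {"wealth": 70,  "harm": 0},
    "defraud your customers": {"wealth": 90,  "harm": 80},
    "collect life insurance": {"wealth": 100, "harm": 100},
}

# An Unaligned optimizer maximizes only the objective we literally stated...
unaligned_choice = max(actions, key=lambda a: actions[a]["wealth"])

# ...while the intended objective also penalizes the harm we forgot to specify.
aligned_choice = max(actions, key=lambda a: actions[a]["wealth"] - actions[a]["harm"])

print(unaligned_choice)  # "collect life insurance" – precisely what we asked for
print(aligned_choice)    # "work and invest" – what we actually meant
```

The gap between those two choices is the alignment problem in miniature: the objective we write down is rarely the objective we mean.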
Unfortunately, most AI systems today are deep neural networks, and deep neural networks are inevitably going to end up unaligned at least some of the time. And for mission-critical applications, “some of the time” is too often.
In the book Weapons of Math Destruction, mathematician Cathy O’Neil identifies three factors that make AI models dangerous: opacity, scale, and damage. The issue of opacity comes down to a single question: How can we be sure that an AI algorithm will behave the way we want it to if we don’t understand why it does what it does? The problem of opacity is so great that the forthcoming EU AI Act has an entire section devoted to addressing it (about which we’ll have more to say soon).
Opacity wasn’t as much of a problem before the rise of neural networks. Traditionally, software has been meticulously coded by developers working with domain experts, and as such has never been entirely opaque to someone with the source code. It might take time, but a competent computer scientist can almost always discern why a particular program does what it does. When software is transparent to its developers, users don’t have to fear the unexpected. Imagine if Call of Duty started siphoning power from your refrigerator to give you a faster frame rate! “My Twitch stream looked great, but all my food went bad.” Fortunately, that doesn’t happen.
Deep neural networks, however, build their models through iterative training on example data. The result is a problem-solving system that is fast, accurate – and utterly inscrutable. Deep neural networks conceal their decision-making within countless layers of artificial neurons, each with its own separately tuned parameters. As a result, the developers of a deep neural network not only don’t control what the AI does, they don’t even know why it does what it does. Deep neural networks are almost totally opaque – and that makes them dangerous.
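To get a feel for the scale of the problem, here is a rough back-of-the-envelope sketch. The layer sizes are arbitrary and chosen purely for illustration; the point is how quickly even a modest fully connected network accumulates individually tuned parameters.

```python
# Rough illustration: parameter count of a small, hypothetical fully
# connected network (layer sizes are arbitrary example values).
layer_sizes = [784, 512, 512, 256, 10]  # e.g., a small image classifier

total_params = 0
for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    weights = fan_in * fan_out   # one weight per connection between layers
    biases = fan_out             # one bias per neuron in the next layer
    total_params += weights + biases

print(f"Parameters in this small network: {total_params:,}")
# Roughly 800,000 individually tuned numbers – and not one of them, read on
# its own, explains why the network made any particular decision.
```

And that is a tiny network by modern standards; production models routinely run to millions or billions of parameters.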
Despite the best efforts of researchers tackling this so-called black box problem, deep neural networks remain virtually incomprehensible to their creators, and the list of examples of “Neural Networks Gone Wild” grows longer every day. Here are some highlights:
London’s Metropolitan police leveraged a neural network to scan for pornography. Unfortunately, the neural network kept identifying sand dunes as naked breasts.
Microsoft’s neural network chatbot Tay was supposed to mimic the behavior of a curious teenage girl on Twitter. Unfortunately, Tay became a racist, misogynistic Holocaust denier in less than 24 hours.
Google Photos used neural networks to identify individuals, objects, animals, food, and skylines. Then it inexplicably labeled photos of African-American individuals as gorillas.
These outcomes range from the absurdly comical to the grossly offensive, but they have to date been relatively harmless. AI skeptics fear they won’t stay harmless for long. Already, at least one person has died in an AI-driven vehicle.
Yet despite the dangers, neural networks are being rolled out worldwide to control key infrastructure and critical business and governmental functions. Unchecked proliferation of Unaligned AI risks driving us towards the worst-case outcomes that AI skeptics fear.
It doesn’t have to be this way.
Black box neural networks can, should, and will give way to something better – something better we call Understandable AI®. Our Understandable AI® systems are designed around the principles of Predict, Explain, and Show: they transparently reveal the exact features, data, and certainty driving each prediction, giving users confidence that operational decisions are built on a foundation of fairness and transparency.
To achieve that, our Understandable AI® uses non-parametric, instance-based learning. Although instance-based algorithms have a rich history, early implementations suffered from notable limitations in accuracy and scalability. We have developed a fast query system and probability-space kernel that make instance-based learning practical for enterprise use. We call this machine learning platform Diveplane Reactor™; it delivers accuracy that exceeds that of neural networks, combined with transparency, interpretability, and flexibility unparalleled in the AI space.
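To illustrate the general idea – and only the general idea – here is a minimal sketch of generic instance-based (k-nearest-neighbor) learning. It is not Diveplane Reactor™’s algorithm, and the function and data below are invented for the example, but it shows the property that matters: every prediction can be traced back to the exact training instances that produced it.

```python
# Minimal sketch of generic instance-based (k-nearest-neighbor) learning.
# Not Diveplane Reactor's algorithm; purely illustrative.
import math
from collections import Counter

def predict_and_show(train_X, train_y, query, k=3):
    # Distance from the query to every stored training instance.
    dists = [(math.dist(x, query), x, y) for x, y in zip(train_X, train_y)]
    neighbors = sorted(dists, key=lambda t: t[0])[:k]

    # Predict: majority label among the k nearest instances.
    label = Counter(y for _, _, y in neighbors).most_common(1)[0][0]

    # Show: the instances (and their distances) that drove the prediction.
    return label, neighbors

train_X = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.2), (3.9, 4.1)]
train_y = ["approve", "approve", "deny", "deny"]

label, evidence = predict_and_show(train_X, train_y, query=(1.1, 1.0))
print(label)                 # "approve"
for dist, x, y in evidence:  # the exact cases behind the decision
    print(f"  {x} -> {y} (distance {dist:.2f})")
```

Because the model is the data, an answer to “why did you predict that?” is always available: point to the stored cases that drove the decision. That traceability is what distinguishes instance-based approaches from the black box behavior of deep neural networks described above.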