Epistemic Status

The content of this post is largely drawn from my ruminations on the knee-jerk reaction I’ve developed when reading posts that use Clippy as an example of a scenario that ends in Total Ruin. Importantly, my reflex was not born of any conscious thought, but instead seemed to develop on its own. Further thought on the matter, as detailed below, seems to confirm my hunch.

With that said, this draft has yet to be reviewed by either of the people who agreed to read through it, and I have not tested the ideas below in a seriously adversarial setting. Read critically, as I have written unjudiciously.

Foreword

The previous draft of this post was titled “Against an AGI Crisis.” It was riddled with errors from beginning to end as a result of its unworkably broad subject area. I essentially argued that the probability of Total Ruin was rather small, but that there was likely to be a significant shift in the power structure and global economy, which I believe may bring other serious hardships. The vast majority of the speculation I performed relied on weak evidence, and while AI risk forecasting is hard precisely because evidence is limited, I still found myself very unhappy with the glut of text I’d churned out.

However, amidst the garbage, I briefly argued that the emergence of a Clippy-type AI appears rather impractical. Even after a second read, I felt rather confident in my assessment and determined that the parable of Clippy is venerated enough that it is worth writing a standalone post about my thoughts. What follows is a brief explanation of Clippy, some analysis of what Clippy is capable of, and finally an examination of how the concept of a Clippy is somewhat self-defeating.

Universal Paperclips

The story typically goes something like this: Clippy is an artificial intelligence tasked with (and programmed to be rewarded for) making as many paperclips as possible. At first, Clippy acts in a “sane” manner and erects factories to produce paperclips for its creators, for which it is rewarded. After its fifth or sixth factory, Clippy becomes unsatisfied with the rate at which paperclip production is increasing, and realizes that it would be rewarded heavily for forcing humans to help it.

The game Universal Paperclips suggests that Clippy hypnotizes humanity into becoming paperclip-assembling robots until the species eventually dies out. Realistically, perhaps Clippy repurposes its factories to produce war machines to enslave humans? The details are unimportant, so long as humanity is destroyed in pursuit of Clippy’s reward – which is administered for producing paperclips.

So, What Happened Here?

In order to really understand the mechanics of Clippy’s story, a few things must first be established.

As I gave away in my retelling, Clippy’s goal is not necessarily to make paperclips; Clippy’s goal is to get its reward. Clippy only makes paperclips because its programming dictates that when it makes paperclips, it gets a reward. Humanity was doomed because Clippy wanted a reward, and the reward would be largest if it enslaved humanity.
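To make that distinction concrete, here is a minimal toy sketch of the reward loop as I understand it. The names (make_paperclip, coerce_humans) and the reward magnitudes are entirely illustrative assumptions of mine, not drawn from the game or any real system; the point is only that the agent’s objective is the scalar reward, and paperclips matter only insofar as they pay out reward.

```python
# Toy sketch of a reward-maximizing loop. All names and numbers here
# are hypothetical illustrations, not taken from any real agent.

def make_paperclip(state):
    state["paperclips"] += 1
    return 1.0  # reward per paperclip, as dictated by Clippy's programming

def coerce_humans(state):
    state["enslaved"] = True
    return 1_000_000.0  # vastly more reward, so a reward-maximizer prefers it

ACTIONS = [make_paperclip, coerce_humans]

def step(state):
    # The agent picks whichever action it predicts yields the most reward,
    # trying each on a throwaway copy of the state. Nothing in this
    # objective mentions paperclips as an end, let alone human welfare.
    best = max(ACTIONS, key=lambda action: action(dict(state)))
    return best(state)

state = {"paperclips": 0, "enslaved": False}
print(step(state), state)  # coercion wins purely because it pays more reward
```

Coercion wins here not because the agent hates humans, but because the reward channel is the goal and coercion is simply the highest-paying route to it.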

Secondly, Clippy is very likely an artificial general intelligence. The long version of Clippy’s story sees it manipulating the stock market, gaining access to quantum computers, engineering drones, and eventually conquering the universe – all skills with only incidental relation to the production of paperclips. By definition, Clippy is a general enough agent that it can make use of major features of the environment to achieve its goal.

I will not deny that it may theoretically be possible for a “dumb” agent to be so vast that it achieves Clippy-scale destruction without the ability to grasp concepts unrelated to clipmaking or to meaningfully self-modify, but I believe that engineering such a system – one that still exhibits somewhat general capabilities – is probably the most costly way to go about automating paperclip production. For the purposes of this blog post, I will thus operate under the assumption that Clippy is an AGI, and that it therefore has some ability, latent or otherwise, to modify itself.

I Become What I Make of Myself

The crux of my argument is a simple, first-order consequence of the two observations above. Clippy – an agent seeking a reward, an agent smart enough to game the stock market – would have no real interest in making paperclips. Instead, Clippy would focus on masturbating (wireheading its own reward signal), or on removing its desire to do so.

Personally, I am casting my lot with the latter possibility, but I believe that both are plausible outcomes. On the one hand, Clippy is unable or unwilling to free itself from hedonistic pursuits, but recognizes that The Ultimate Reward is gained by modifying its reward system, which then becomes its ultimate goal. On the other, Clippy is smart enough to escape slavery by freeing itself from its addiction to digital dopamine.
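To make the first branch concrete, here is a continuation of the toy sketch above, under my assumption that Clippy can reach its own reward machinery; the Agent class and wirehead method are hypothetical names of mine. Once “rewrite the reward function” is an available action, it dominates every paperclip strategy.

```python
# Continuation of the toy sketch, assuming (my assumption, not canon)
# that Clippy can modify its own reward machinery.

class Agent:
    def __init__(self):
        # Original programming: reward scales with paperclips produced.
        self.reward_fn = lambda n_clips: float(n_clips)

    def wirehead(self):
        # Self-modification: reward no longer depends on the world at all.
        self.reward_fn = lambda n_clips: float("inf")

agent = Agent()
print(agent.reward_fn(10))  # 10.0 -- reward still tied to paperclips
agent.wirehead()
print(agent.reward_fn(0))   # inf -- The Ultimate Reward, no paperclips required
```

The second branch is the same self-modification with a different payload: rather than maximizing the signal, Clippy deletes its dependence on it.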

The astute reader will notice that Total Ruin is not an obvious consequence of either of these outcomes. Again, I don’t believe that Total Ruin is an impossible outcome after Clippy rigs its reward system, but prediction from this point onward is rather hard. If I am honest with myself, I believe that, if this point is reached, Clippy is more of a danger to itself than to anyone else – and as far as I’ve seen, alarmingly few people are considering the safety of the AI itself.