Researchers from Georgia Tech have created a robot called Shimon that can write songs, play music, and sing. The marimba-playing robot is going on tour to support a new album that he has composed and sings.
Slashgear.com, Feb. 26, 2020
Among the many and varied innovations that digital technology has brought into our lives is an increasing capacity to produce content of various types in automated ways. Some of these are as (seemingly) innocuous as Gmail’s “Smart Reply,” which suggests responses to emails; others are provocative, such as the “Bot or Not” website, which challenges you to figure out whether a poem was written by a human or a computer. Visual imagery is another area in which computers have been turned loose to create what some may call “art.”
And then there’s the land of sound, where the computer’s increasing ability to generate music, rather than merely assist in its composition and production, presents a deep and murky Pandora’s box of issues, from the artistic and the financial all the way to the ontological. Shimon, the singer/songwriter robot, was inevitable.
As we have collectively become at least a little warier of techno-utopianism than we used to be (thanks, Facebook!), discussions of how computers can create things that previously only humans could write, draw, paint, compose, or otherwise design and produce now seem to take a stance somewhere between the gee-whizzy and the maybe-slightly-concerned, as in: “Wow, look at what computers can do now, or will very soon! I guess we’d better get used to it?”
Regardless of any given commentator’s assessment of computer-generated writing and artwork and what the future may hold, the one thing everyone holding forth on this topic has in common is a relentless focus on surface output. Discussions are framed around whether an image looks like something a person might have created, or whether a poem could be mistaken for one a person wrote. Mimicry is posited as the goal.
What is ignored when considering computer-generated content, of all kinds, is that human-generated art is never simply about the surface of what has been created. The point of a poem isn’t the manifest reality of one word placed after another on a page; it’s that the words were put there, consciously, by a human being with a life history, with connections to and relationships with other human beings, and with an awareness, both instinctive and unique to each person, of the excitement and the fragility of organic existence. Human beings inherently, inescapably bring depth to artistic output; one could argue that this is in fact the whole point of artistic output: the reaching of one individual consciousness toward another, or toward an audience of others.
Machines programmed to generate output that looks like something a human might have written (or painted, etc.) achieve their task through massive computational capacity, typically using artificial neural networks designed to mimic, in a limited way, the neural networks at work in the brain as far as science understands them (which is incompletely at best). There is nothing here resembling “thought,” and even less resembling “feeling,” in what the computers are doing to generate these poems or images or songs. Which means that, inherently, there is no depth to it; it’s all surface.
Georgia Tech can happily post a video of Shimon the robot’s “song”—here, if you must: https://www.youtube.com/watch?v=sHl-Cg2KDbg—and, sure, it (kind of) sounds like a normal song. But is that all that music is: sound? Is all we’re hearing when we hear a song simply a sequence of notes, and a series of words?
The same, but different
Let’s try a thought experiment, which at this point doesn’t sound far-fetched. Let’s say that a computer programmed to produce poetry ends up generating a poem that is, word for word, what a human poet somewhere on earth has in fact written. (Maybe this has already happened!) On the surface, the identical words. Underneath: very different. To pretend that the computer’s poem, regardless of its word-for-word match, is the “same” as the human’s is to conflate surface with depth. The human poet dives into their inner world, choosing words to communicate thoughts and feelings, thoughts and feelings rooted in the reality of being a thinking, feeling human being who lives and breathes and eventually dies, a human with a family of origin, with a network of friends and loved ones, with a sexual drive and egoic demands, a human who daily struggles with the reality of the aforementioned fragility of human existence. (I re-emphasize this last part because it is so often overlooked when people consider whether computers can produce human-like art.)
The “gee-whizzy but also worried” folks who confuse the surface of computer-generated content with the depth of art produced by human beings are distracting you from the actual trick being performed here. The trick isn’t that a computer can write poetry; the trick is the sleight of hand employed to assure us that the surface of the end product is all that matters. Once you buy that, sure: a computer is writing poetry.
But we shouldn’t buy that. Depth is everything here, and it is nothing that a computer can offer. And note that while the depth inherent to a work of art begins within the consciousness of the artist, it by necessity ends within the consciousness of the observer sensing this depth-to-depth connection. Whether you are aware of it or not, what most engages you, when you look at a painting, or read a short story, or hear a piece of music, is the bridge that any work of art creates between the consciousness of the artist and the consciousness of the viewer/reader/listener. The communication is implicit, even if the artwork often prompts more questions than answers. But for the viewer/reader/listener, the knowledge that another human has reached inside their own being to create something they want you to experience is, in the end, as noted, the purpose of art. And it’s what computer-generated content cannot, by definition, provide.
This missing element of one human seeking contact with another is, perhaps, what underlies that “uncanny valley” sensation that people talk about when encountering an object that seems nearly human but also off in some disconcerting way. Or, for you Philip Pullman fans, think of how the humans in His Dark Materials are instinctively repulsed by the reality of a person walking the earth without their daemon. Computer-generated art or writing creates the appearance of soulful enterprise minus the soul. It should repulse us.
And let me make it clear that I’m not talking about the use of computers to assist in art-making. At that level, the computer is another tool, and a powerful one at that. I’m talking about the concept of the computer creating the art on its own, via programming that removes all direct human input from the end result.
Now then, in a world in which people interact with content in a rapid-fire, look-share-and-forget kind of way, the essential difference between computer-generated and human-generated content may end up unnoticed or disregarded. If so, this will be a symptom of a culture that is losing its capacity for any sort of interior reflection, and with it the desire to seek genuine contact with one another.
The point at which human writing and computer writing become indistinguishable on the surface may mark an inevitable moment in technological development. The question is whether we will have the resolve required to stay informed: to demand transparency in our content sourcing, much as we have been learning to ask where our meat and produce come from. I contend this will always matter, but am not all that optimistic that the average American, ever intoxicated by convenience, will agree with me.
The ineffable connection
Music actually presents a bit of a special case, because, in a way, we’ve been confronting a related surface-versus-depth issue in the music world for quite a while, long before Shimon came along. Quite apart from the question of computers writing music themselves, we have long faced a parallel situation with sound generated via digital programming versus sound made by human hands on physical instruments.
Another thought experiment: program a computer to generate the exact sound of Kenneth Pattengale, of the Milk Carton Kids, playing lead guitar, and then record Pattengale actually playing. These two recordings may be impossible to tell apart. But they are different. The pleasure of Pattengale’s guitar-playing is intimately tied to the knowledge that a human being is producing these amazing sounds in real time and real space. (This is why watching him play, rather than simply listening, intensifies the joy of the experience.) You don’t have to know Pattengale personally to sense the ineffable connection between performer and audience when you are witnessing his physical effort (which, by the way, he makes look effortless). Note, however, that you don’t have to see him playing; as long as you know an actual person was playing an actual guitar, the effect on the listener is real.
This is why the “Look: a computer can do the same thing!” framework is dishonest. In an age of consummate computer-generated mimicry (“deepfakes” represent the malicious end of this spectrum), being able to tell, merely by looking or listening, whether a computer produced something or not becomes a false benchmark of validity.
And yes, there certainly exists a lot of content in our information-saturated world that requires, for its creation, nothing beyond surface input, content that can therefore, if necessary, be produced without concerns about lack of depth. I’ll be sorry if this causes human beings to lose their jobs in various arenas, but, sure, send the robots in there and let them do their thing if they must. Just stop writing poetry, please. And, dear god, no singer/songwriter tunes. I mean, a human creating art via the filter of his or her own psyche and history, versus a computer “studying” thousands of songs to learn to produce human-sounding lyrics: which would you rather spend your human life listening to?
Truth be told, this is one of the main reasons I resist the urge to use Siri or Alexa, above and beyond the privacy issues. When I talk to someone, I want to hear, in return, a voice attached to the baked-in reality of conscious human life, with all its history and emotions and frailties. Hearing words generated by algorithms and machine learning leaves me feeling not spooked but profoundly sad. When E. M. Forster wrote “Only connect,” he didn’t have a USB cord in mind.
And what of the day when computers themselves are in some meaningful way sentient? Oh, don’t get me started. I don’t personally believe that a construction of metal and plastic that ultimately bases its experience on an on-off switch of 0s and 1s, a machine lacking the innate awareness of its tissues’ organic fate—because it has no tissues and no organic fate—can ever be sentient in a way that equates to human consciousness. When computer “consciousness” arrives—if it arrives—it will be its own territory (located, no doubt, somewhere in the uncanny valley). How much you might or might not be interested in learning about and hearing from this new kind of consciousness will be yours to determine.
Me, I’m sticking with the humans. They still have so much to say, and I so much to learn, about this existence of ours, fated to decay and disappear, and yet requiring neither a plugged-in power source nor any operating system updates. Our eventual deaths, to borrow Silicon Valley promo-talk, turn out to be not a bug but a feature. A defining component of our consciousness, this knowledge of our own mortality is what undergirds our artistic inclinations, and, paradoxically, fills our lives with the capacity for profound meaning and connection. The smartest computer in the world will never know what this is like. I don’t need it to write me a poem or sing me a song.