I held an idea in my head: the colossal necrotic lord of a wasteland empire lumbering through the ruins of his kingdom. I imagined a towering creature formed of bones, millions of repurposed femurs and fibulas piled and fused to shape a god-like behemoth whose inhuman gaze stared down from a cold sky. I didn’t surface this idea for any noble reason, nope– there will be no tale of cyclopean elder-gods or any eerie vision to post to Instagram. I wanted to feed my idea to an AI and see what came out.
Unfortunately, I wrote the description above after I fed the beast. What I pumped into the text box of the image generator was:
Towering, 400 ft. tall, slender creature made from four-million bones, walking on two legs through an ancient city and casting it’s faceless gaze, through many tiny eyes, down at the viewer. In the style of Salvador Dali.
I conjured this vision with the goal of familiarizing myself with the creative AI tools that are getting so much attention. I’ve touched on this before, but I was a little behind in my explorations. In the past, I’ve supplied most of the art and design details that appear on this site and in my projects. The only time I’ve used AI was in a post specifically about AI. While I’m as apprehensive as most creative-minded netizens, I refuse to be afraid of the robot overlords who’ve come to replace me– and if these are the tools of the future, I’m not going to balk… so I rolled up my sleeves.
Spoiler: I’m not replacing my work with robo-work any time soon, though I do see opportunities where it could speed things up. I will also happily share my adventures in the space (that way, if I go missing, you guys can find me in Tron).
The Contestants
In my post on writing with ChatGPT I included AI-rendered images of giant robots attacking cities. Midjourney created those images. I didn’t revisit Midjourney for this foray into the robot psyche, partly because it’s no longer free. Surprise, surprise. That being said, I did use paid generators in this experiment, but I don’t plan to pay for Midjourney. While it’s largely considered the leading “text to image” AI tool, engaging it on Discord is a bit wonky, and I don’t have the need to generate a ton of images. All of the tools tried here are paid products, though two of the platforms offer free demos.
Without further ado, today’s contestants are:
- Davinci (free demo)
- Firefly/Photoshop
- OpenArt (free demo)
- DALL-E 3
Part of what sparked this ad-hoc experiment was curiosity about how literally a machine would take my prompts. Based on what I was imagining, I had the idea of including a very specific, astronomical number of bones in the construction of my colossus. I specifically wondered: if I asked an A.I. to generate something in the millions, would it crash a server? Or would the number simply be ignored?
I also wanted to see if any of the images resembled my vision– but I’ll get more into that later.
Davinci
I didn’t know that Davinci existed until I ended up on their website. It turns out there are lots and lots of these things, though there are far fewer engines running underneath them. One thing that caught my attention about Davinci was that a person could upload their own image for the tool to riff off of.
This is one of the more interesting aspects of this type of tool, in my opinion. This is where my needs could actually be met: helping me scale a process where my own work is used. Worth noting: when we talk about “engines” we’re also talking about the data set, which means some of these tools tend to have a “look” because their logic is informed by a body of real artists’ work. It’s the ugliest part of this movement and a large reason why I don’t have interest in making AI art willy-nilly– because AI art has the genetics of someone else’s creation. If I could use it to scale my own creations, that would be different.
Here are the first three options Davinci put in front of me.
It took me aback to see how much of my description didn’t factor into the final outcomes. Some of that is on me. I realized that my term “viewer”, which meant POV to me, could be interpreted as other subjects in the composition viewing the creature. You will see a lot of “viewers” wandering these ancient cities.
Lots of it isn’t. Within my own imagination the sky seemed to be the limit, even within the parameters of my prompt, but all of these tools deviated enough that no image fully matches my description. The simple request for two legs gets ignored a lot. “Faceless” and “tiny eyes” also tend to be more than these A.I.s want to consider– maybe there isn’t a ton of surrealist monster art in these databases to reference, I don’t know.
Davinci wasn’t so bad, in that regard, though the pictures are lackluster. The middle image is the closest to what I had in my head (and may actually depict four-million bones), though the setting was completely nixed. I also had something more dynamic in mind: like a skeletal kaiju looking directly down on you. These compositions are all flat… though I kinda like the giant vole/dog.
I didn’t play with Davinci much. They advertise photo-like images, but the “brushstrokes” in these renderings have that blurred, smoothed-over surface that’s indicative of a lot of early AI models. It’s a look I associate with software trying to save bandwidth. In general, I wasn’t enthusiastic about any of these.
Firefly/Photoshop
It’s worth noting that there is a lot of value in the Adobe Creative Suite, so the hefty price is not purely for their image generation tools– which did not hold up well. I’ve been making blog post covers and digital art in Photoshop for years, so this contestant was of particular interest. What’s most promising about the Adobe tools is how they’re implemented in Photoshop (you can try Firefly, the A.I. engine, on its own site without subscribing or going near Photoshop). Photoshop uses AI as a practical aid at different steps of the photo-editing, or painting, process. You can select your whole canvas and say “make this,” or you can make selections and ask for varying degrees of changes. This is more of what I would want, but…
… these very METAL renderings weren’t really what I was looking for. I think the t-shirt could be a banger, but I was imagining something much different.
Firefly took my “faceless” description and made two monsters that were almost entirely face. It also snagged its backgrounds from Tomb Raider 2 (the game, not the movie). Firefly is probably best suited to helping Photoshop users expand backgrounds and handle other arduous, but common, tasks. Don’t be fooled by the “leopard in the library” commercial quite yet.
Aside: Canva competes with Photoshop in this space, though it’s more elementary in a lot of aspects (but also more accessible). I didn’t play with it here, but I do believe that there’s free access to most of its features.
OpenArt
OpenArt is another generator that I stumbled upon. It’s also not “free,” but they give you fifty free credits to get going. The A.I. took my “bones” characteristic to heart and gave me giant skeletons. Though I never said so explicitly, I’d certainly hoped that naming bones as my material, rather than saying “skeleton,” would yield a more creative rendering. I did not get that.
At least the composition had more perspective, and the ribs look kinda nice. Composition is probably the only category that OpenArt excelled in. I don’t think there are a million bones here and this skeleton’s feet MIGHT be fused to the ground, but at least it has a chance to find help in Santa Fe, or whatever this not-so-ruined city is.
There isn’t much more to say here: skeletons. That’s it. We’ve all got ’em, let’s move on.
DALL-E 3
DALL-E is probably the closest competitor to Midjourney. What it really has going for it is that it’s a product of OpenAI and therefore bundled with ChatGPT– everyone’s darling A.I. (except Elon Musk’s). The downside is that it also made me giant skeletons, though they are much prettier skeletons.
What I appreciate most about these renderings is how attractive they are. The first skelo-kaiju leans way into the Dali look and is actually appealing– though still way off from what I asked for. Is the third leg just to spite me?
The second creature ALSO has a third leg (or appendage) for some reason, but gets kind of close to what I had in my mind– despite ignoring a lot of stuff. The composition is much closer to what I pictured, at least, and the city is notably desolate.
The Winner…
This really comes down to what a person wants from the machine. A creative person who already has a vision will always be (at least slightly) disappointed, while someone in need of a creative vision will probably get what they need. There’s an iteration factor that could be valuable, but it’s also concerning to hand your “style” to an intelligence that might replace you– especially when its abilities seem capped by the pool it’s spoofing. It can only go as far as the data it already has, which means it really needs your “style” to extend its “imagination.” I, for one, am not talented enough to be too concerned (God help the person who gets a recommendation riffed from my artistic decisions), but I absolutely understand why others should be.
If our victor is the tool that best matches what I’m imagining, the second image from Davinci is probably the closest– though still far from close– rendering. Combining it with the second image from DALL-E would get us a bit closer, with the composition and feel nearing what I had in mind, but the use-case of creating what’s in my mind simply cannot be met by these tools.
Can I describe what I had in mind in more detail? Sure: I saw something tens of stories tall and only vaguely anthropomorphic, despite standing on two legs. This colossus among ruination would be shaped from the conjunction of millions of bones and stare blankly down at us through a constellation of pits in the flat bone face of its animal-like skull. Horn-like protrusions would extend from there, entangling its elusive neck and shoulders in wind-hewn brambles; depths from which something like appendages extend in an odd, incalculable number. Is this calcified apparition, as dusty as the ruins it’s emerged from, part of that apparatus? A titan left inseparable from its lost civilization? Or is it a cluster of pixels translated through a machine?
You’re thinking: why didn’t you type that to begin with, dumb shit?
That’s fair, and maybe I should try, but I also failed to get a one-hundred percent match for the limited prompt I entered. Semantics and my writing skills certainly contribute to the gap, but there’s no doubt that the problem runs deeper.
For the best results, a person needs to follow up their prompt with edits. They need to massage the results. Eventually, after adding some of the above details, I did get DALL-E to make this lovely creature (I’d love to know whose art trained it to make the decisions that it did). I pushed the generator extra hard to avoid anthropomorphic features, killed the surrealist attributes, and asked for something much, much bigger. While it’s hard to deny how beautiful the outcome is, it also underscores my final thesis: it’s really hard to get what you ask for and nearly impossible to get the image you’re imagining.
My take-away from this exercise is that these tools are more likely to help those who don’t have the breadth of imagination to conjure the image themselves. If the use-cases for replacing artists, versus helping artists, are in a race: replacing us is winning. I could draw what I’m thinking, but most people would just take this much cooler thing, for a small price, instead.
I have a passion for creating monsters and it’s been fun to spin them up so quickly, but they aren’t mine. There’s a dirty delight in seeing what comes out, but the grand satisfaction isn’t there. You can type into the void, but it’s not you staring back– so what happens when art becomes the void?