AI image generation has arrived
My official assessment of AI image generators has always been this: they’re impressive and fun, but not ready for real creative work. You could maybe get something usable occasionally, but overall, DALL-E, Midjourney, and the rest were toys, not tools.
OpenAI’s 4o image generation (formerly DALL-E) is the first AI image generator that is a genuine tool. Gotta say, I’m shocked. I didn’t expect to be here yet. Here’s what’s changed.
Prompt: a perfectly realistic portrait photo of an average looking 40 year old korean man
It creates realistic (enough) portraits
Portraits of people now look real. Well, real enough. The image above does look a bit plastic, a bit too right, but 99%+ of human beings will think that is real. If I saw that image in a profile photo, I wouldn’t give it a second look.
Part of this realism is that ChatGPT 4o can generate regular-looking people, rather than defaulting to models from a Louis Vuitton ad. AI-generated portraits have mostly been excessively beautiful. Below is a pretty typical “Midjourney lady.”
Just your average Midjourney lady kickin’ back
Professional-level type
The text rendering in 4o is a giant leap. DALL-E’s type was simply dreadful. With a single upgrade, it’s vaulted to the level of a good professional graphic designer. 4o does still make mistakes, especially if it sets a bunch of type. But for titles and brief bits of text, it’s quite reliable and gives good results.
Prompt 1: set the words Everything is a Remix in a gothic typeface, white on black
Prompt 2: in the background, place a 1950s style color image that looks like something from a criterion collection film
Sketching + prompting
For visual work, prompting is a blunt instrument. I’ve often referred to AI image prompting as “spaghetti at the wall.” No matter how detailed your prompt, you never know what you’re gonna get.
In 4o you can fluently combine sketches and prompts. This means you can sketch out your idea, then write a prompt describing how to execute the sketch. The result is a more precise way to art direct the AI.
Here’s an impressive result from X user EP.
Beyond single images
Past image generators were limited to full-frame images and couldn’t do layouts or infographics – at least, not unhorribly. The leap in quality in 4o is just as big as the one in type.
Again, this is very practical stuff. Layouts and infographics have endless real-world applications: thumbnails, directions, ads, training, social media graphics, reports, and loads more.
Here’s an example ad from X user Jacob Posel. ChatGPT also wrote all the copy. (Note that the product shot is garbled, but you could replace that with the original image.)
Prompt: Create a madmen-style print ad using this image (with uploaded product shot)
Extremely weird photo editing
Source: Unsplash.com
Here’s where things get weird. Let’s say you upload an image, like the one above.
Then you say “remove the black man’s hat.”
Wow, right?
But look a bit closer and compare those images. It’s not the same image. It’s now an AI image, with that same slightly off feeling.
And here’s the weirdest weird part: that black guy is a whole different dude now. He looks similar to the original person, but it’s definitely not him. This applies to everybody in the image.
And what’s this guy’s problem?
This guy is unimpressed.
When you upload an image for editing in 4o, it recreates the whole image and AI-ifies it. The effect is glaring with faces because we humans are extremely attuned to faces. With images of animals, objects, and natural scenes, the effect is less obvious. So the places you can use this feature are limited, but it’s powerful nonetheless.
Mindblowing but still glitch-galore
Let me be frank: 4o image gen is plenty glitchy. Every session I spend with it is littered with hopelessly messed-up images. For example, I’ve tried to create “spaghetti being thrown at a wall” with every image generator. 4o’s result is the best I’ve gotten – and it took some back-and-forth just to get that.
Prompt 1: a 30 year old man throws a fistful of cooked pasta at a blue wall
Prompt 2: make his hand open, like he is throwing. make him less angry. he's just focused on throwing.
Prompt 3: make his arm blurry, like the shutter speed is too slow for the fast motion of his arm
If you’re accustomed to the reliability of traditional graphics apps, you’ll want to lower your expectations a bit here. (4o image generation is also way more error-prone than ChatGPT’s text generation.)
But 4o can do real work right now. If you already have visual skills, it can do important support work. And if you’re not skilled with visuals, 4o can create the practical graphics that we all need on occasion. For example, if you have to create a little promo graphic for a social media post, 4o will almost certainly do that job a lot better than you would.
I’m placing a bet on image generation
I’m so impressed with ChatGPT 4o image generation that I’m going to place a serious bet that this technology now matters. More about that shortly.