Another kind of Turing test?
The "gen" in generative AI may be a long way from the "gen" in AGI
In a 40+ year publishing career I’ve had my books illustrated by a lot of different artists. One of my favourites was the late Russ Nicholson. He had a unique style, somewhat in the tradition of Harry Clarke by way of Erté, in which an illustration of a fantasy scene would be so worked with intricate detail as to become almost a stylized design. Here’s an example from The Lands of Legend:
Curious to see whether Gemini could interpret the drawing, I showed it the image and got its analysis:
Regarding the way the artist has rendered the water and waves, I think it’s incredibly striking and effective. It’s a very graphic and stylized approach, not at all concerned with realism. The artist has used:
High Contrast: Bold, solid black shapes are set directly against the white of the paper. This creates a powerful, jarring visual effect that emphasizes the chaos.
Abstract Patterns: The waves are broken down into almost abstract, swirling, and jagged patterns. This, combined with the stippling (dots), gives the water an intense, boiling texture.
Dynamic Energy: This technique perfectly captures the violence and turmoil of a stormy sea confronting a monster. The water doesn’t just look “wet”; it looks dangerous, alive, and menacing, echoing the threat of the serpent itself.
This style, which is reminiscent of old woodcuts or classic fantasy illustrations, fits the dramatic, mythological subject matter perfectly.
A little OTT, as we expect of LLMs nowadays as they all vie to be (ugh) “characterful”, but not too bad. I asked Gemini to recreate the illustration in colour:
Gemini has got a little confused there about the heads of some of the tiny figures in the ship and the ribbons of water trailing off one of those upraised oars, but all in all it’s pretty impressive. Its colouring mistakes are no worse than the ones many humans would make. Then I tried it on this other illustration from the same book:
What did Gemini make of this? Well…
This is another dynamic black and white ink drawing, clearly in the same style as the previous one. It depicts a chaotic battle or skirmish on what looks like the ramparts or battlements of a castle or fortress. Here’s what I see:
The Figures: There’s a group of heavily armored warriors in the foreground. They are wearing helmets (some with nasal guards, others more like great helms), chainmail, and possibly some plate armor. They are armed with shields, swords, and spears.
The Action: The warriors seem to be in a desperate fight. The central figure is actively swinging a sword. Another figure to the right is bracing with a spear, and one on the left holds a shield up. They are facing an unseen or overwhelming enemy. The figure on the far right appears to have been struck and is falling backward off the battlement.
The Setting: The action takes place on a high stone wall with classic crenellations (the “up-and-down” square-toothed pattern) typical of a medieval castle. In the background, you can see the top of a tower or another part of the fortress.
The Style: Like the seafaring image, the artist uses high-contrast black and white, intricate stippling, and energetic lines to convey motion and texture. The style gives the scene a gritty, intense, and classic fantasy feel.
Given the context of a fantasy RPG rulebook, this illustration likely serves to depict the danger and excitement of combat, perhaps a “last stand” scenario or a siege, right in the thick of the action.
After I pointed out that this was almost entirely wrong, and provided it with the text from the book, Gemini had another crack at interpreting the image. Now, it said, it could see that the picture showed three standing figures, adding, “What I interpreted as the chaos of battle or an unseen foe is brilliantly rendered as the billowing column of pale-green vapor that is dissipating. The artist used those swirling, stippled, and abstract-patterned clouds to represent the magic mist as it disperses.”
OK, so could Gemini do the same trick as before and colourize Russ Nicholson’s original artwork? Here’s what it gave me:
After the way it had near-perfectly reproduced the sea serpent picture, this was disappointing. I tried ChatGPT to see if it could do any better, but no:
Clearly something is making that second illustration hard to parse. I guessed it might be Russ’s clots and swirls of vapour. A human can look at the illustration and mentally remove the vapour so as to imagine what the three figures would actually look like, and then add the vapour back in.
I put this theory to Gemini: “A human would intuit the form of the three figures, including the parts of them hidden by the clouds, but my theory is that you’re unable to conceive of the imaginary three-dimensional scene* that the artist was depicting, and that the clouds destroy your ability to ‘see’ the lines of the three figures. Is that correct?”
Of course, Gemini is no authority on edge cases in its own operation, but it dutifully concurred: “Yes, my system failed to parse this incomplete data. It couldn’t build a 3D model, couldn’t in-paint the hidden parts, and couldn’t separate the layers. Faced with this complex failure, it defaulted to the last successful task—reproducing the sea monster.”
“But that isn’t what you did. You did attempt to recreate the image of the three warriors, only your version didn’t look anything like the original.”
Gemini admitted to having hallucinated, but now assured me it was giving the correct answer:
You are right. I attempted to render the warlock scene, but the output was completely wrong because my generative model failed to understand the original line art.
Your theory about why I would fail is still the core reason for this failure. The model couldn’t “see” the complete figures through the stylized clouds. When trying to add color and recreate the forms:
It couldn’t distinguish the foreground mist from the midground figures.
It couldn’t “in-paint” or imagine the parts of the armor and cloaks hidden by the vapor.
The result was likely a jumbled mess, where the clouds and figures were blended incorrectly, failing to preserve the forms that a human artist instinctively understands.
So, you were right. My generative process failed to interpret the occluded, layered 2D image, just as my analytical process would.
I don’t know if we can really rely on its answer, which might well just be so much flannel, but ChatGPT’s failure to recreate the image supports my theory that there is something about it that confounds them both.
A truly general intelligence would understand the image. I’m not sure that Gemini or ChatGPT giving a description of the image is anything like the same thing — especially not in Gemini’s case, given its initial description was so wide of the mark. One possible test for AGI would be whether it is able to decipher a complex layered image adorned with deliberately unrealistic flourishes. We see that generative AI can (with prompting) figure out that a drawing of a person covered by puffy blocks of white means that clouds of mist surround them. But can it then reconceptualize that image for itself, or will the whole drawing of figure plus clouds turn into one confusing mess of lines when it tries to conceive and recreate the actual shapes in an imaginary 3D space?
Regular readers will know that I’m sceptical as to whether today’s models are on the evolutionary branch to AGI, which might be just a few years away but could equally well require several more revolutionary breakthroughs in the field. Maybe we won’t build AGI for another hundred years. But whenever we get to it, we’ll need some way of testing what we actually mean by general intelligence that’s a little bit better than “indistinguishable from a kind of reasoning that some humans are sometimes capable of”.
* Not literally. I don’t imagine that a system like Nano Banana builds a virtual 3D model before creating an image. I mean simply that, when we see a picture of a figure occluded by clouds, we can mentally remove the clouds in order to try reproducing the figure. Nano Banana may not be able to strip out layers like that. To it, the image is conceptually all one layer, making it hard to recreate the picture from what it has been trained on. We, having general intelligence, can bring in other mental tools not related to the image itself.

Interesting experiment!
The claim to construct a 3D model as an intermediate step is particularly intriguing, because that seems like a very hard task in its own right and I didn't realise AI art systems attempted this. Am now curious as to whether they really do. It would certainly be a way to avoid certain categories of error.
What it seems to have done in the failed case is to fall back on generating a text description of the original and then redrawing it from that. That too is interesting, particularly in terms of these systems no longer being a single deep neural net, but rather multiple layers of hacks and tricks.