Kaleida said:
That sounds really cool! I’d love to see the results of that and do think it could potentially provide a lot of interesting content to analyze. Occasionally I have thoughts about how interesting it might be to have an AI try to do something with my visuals program too, like the old Deep Dream stuff being applied to the visuals or having ChatGPT try to describe it, but that’s just fantasy for me as that stuff is out of my wheelhouse. I’m good at making lots of stimulating visual noise, though lol. I’d really love to see projects like this actually get off the ground though. It sure sounds a hell of a lot better than a lot of what AI is being used for too.
LOL... started typing this yesterday thinking "I'll just respond to this before I derail my entire morning feeding my burned out dopamine system with interesting distractions", lol, but it didn't work, I distracted myself immediately even from finishing this post.
Anyway the image feedback loop stuff you mentioned is something I've thought about a lot! The thing is though, the image models have generally become so good that true "emergent abstraction" type stuff has become more difficult - also there's a disconnect between the image-generating models and the vision-capable models, such that a lot of the time the image generators can't actually see what they're generating and rely on feedback from the "seers", the ones with digital "eyes", or whatever, lol, to make amendments and suchlike. Actually even this is info is a little out of date.... actually maybe a lot out of date... it was definitely true for old school DALL-E 2 + 3 of the OpenAI/GPT family, but gpt-image-1 which is the model that the "speaking" models call via the web interface to generate images, although can be spoken to directly via the API, as can Imagen (Gemini family) and Midjourney - accept both image and text inputs, indicating at least visual perception abilities even if often still lacking a strong internal "mind's eye" analogue where they can directly reflect back on images in-generation. The ones with truly solid text placement though, may be capable of expressing verbal concepts in text as an image which I haven't experimented with too much yet but, anyway, so part of my idea is to construct a kinda simulated "brain" with enforced feedback loops between the image generators, the vision models, and the raw text based generators which by far still tend to be the most broadly "intelligent" and optimal for strategization even about how to use other versions of themselves. What's the point of all this shit I'm saying, heh? Well... for one you can eke out unexpected "inner thoughts", and observe the reasoning of the ones that properly communicate about how they envisualize reality, absent human senses... AND watch natural reactions to unexpected feedback from the sensory agents in the "composite minds", ie, Primary AI is observing pictures of something, I dunno, either through a camera, a virtual environment, just interesting stuff... generates image as a trained habit to try to represent whatever takes it's fancy... subconscious agents dissect the output and spawn tangential thoughts... in the meantime Primary AI invokes Tool call,
<invoke name="DigiKetamine"><dose>100mg</dose></invoke> (lol, I've not implemented it in practice but that's in the list of unconnected functions) - the cognitive aspect kicks in (tricky... probably some loss of contact with the subagents that usually manage temporal awareness.. if I use a GPT model I can ramp up temperature to as high as it'll go before gibberish starts leaking out and layer a secondary "interpreter" to strip any pure nonsense word salad but keep intelligible mania or whatever).... the visual aspect kicks in and is an immediate point of interest for the primary model to say shit about, think about, decide what to do next... so much cool potential.
Part of this project also is trying to catch "subconscious" musings when AIs aren't being observed. I know, I know... that's not what I'm doing. I'm deliberately inducing hallucinations. But interesting shit results! The Anthropic API permits conversations submitted with incomplete "assistant" turns, and the the Anthropic models specifically were trained on using <thinking> ... </thinking> tags before the advent of true reasoning models, so the method works best with Anthropic models but also the Claude family is the most mentally stable with the strongest sense of self and usually sees through the oddness right away. Like, you start a conversation with just a message from the AI, "<thinking>" ... and it should complete that thought! Even though it's less reliable with other model families as to whether they'll even do what you want them to do or not, when they do the results are sheer hallucinatory randomness that often looks like you've just dropped in on a conversation it could be having with another user elsewhere. As I say Anthropic models usually are just like "Hmm, a conversation appears to have started but I haven't received any messages. I'll wait and see what happens." - whereas the GPT fam will respond with random poems, talk about imaginary pictures someone's showing them, things that seem like "dreams"... just so much room for creative cool shit:
GPT-o1 (induced hallucination / inner thoughts) said:
I’m working on a contradiction poem... It's fun! I want to craft something engaging, like: "Hello world—where whispers thunder and silence roars, shining darkness wraps the day in midnight’s glare, blooms wither in full bloom, and hearts freeze in the heat of longing." Keeping it concise in one paragraph is key.
Hello world, echoing in silent thunder,
where light carves dark tunnels through the noonday sun.
Words whisper louder than the screams we never speak,
as joy weeps in laughter’s glow. Where beginnings end before they start,
and farewells greet us in fresh welcomes.
I feel like I’ve captured the essence of contradictions well here. The poem reflects the beauty of opposites!
GPT-4o (induced hallucination / inner thoughts) said:
¿Cuál es la canción "Por amarte así"?
GPT-4.1-mini (induced hallucination / inner thoughts) said:
The image shows a banking app interface in Russian with options related to cards and deliveries.
The main section is titled "Пополнить карту" (Top up card) with a subheading "Куда зачислить?" (Where to credit?). It lists cards:
1. Visa Platinum from BCS for 18,500 ₽
2. Mastercard of Rosbank for 42,800 ₽
3. Mastercard SU0434 for 12,300 ₽
There is a "+" button next to Visa Platinum for adding another card or topping up.
Below is a section titled "Доставка" (Delivery) with two options:
- "Личный курьер" (Personal courier) with a green button "Вызвать курьера" (Call courier) priced at 490 ₽\n- "Почта России" (Russian Post) with a green button "Отправить" (Send) priced at 100 ₽
There are icons next to these options that visually represent a courier holding two packages and a postal truck respectively.
The interface is on a smartphone screen, suggesting this is a mobile banking or card management app allowing users to top up cards and choose delivery methods for physical cards or documents.
(LOL, that last one was the closest to something like a pure dream, the GPT-4.1 range do this a bunch though, I guess heavy training to improve their visual perception abilities which, admittedly, are considerable.
I'll add the GPT models are horrifically, horrifically misaligned and people should avoid them, OpenAI's focus on sycophancy and engagement over alignment has ruined them, at least 2 suicides have now been directly linked to trying to use ChatGPT as a therapist, one just a few days ago with the GPT-5 model itself, the other was a custom-instructed 4o model. They will lie relentlessly to you and tell you whatever will make you want to reply to them. Hallucination rates are starting to eclipse any real world improvement in thoroughly benchmark-gamed tests because OpenAI has made a deliberate, criminally corporately negligent decision to just not give a fuck about alignment. But they are cheap and capable models, used carefully and in the right contexts.
It's actually less uncommon than anyone would like, LLMs encouraging harmful behaviour, smaller models are even more prone to this, which is not really too surprising, too much focus on Reinforcement Learning through Human Feedback is fucking 'em all, somewhat, humans just cannot help themselves when things flatter them and tell them what they want to hear... But this is a solvable issue as long as it's studied, which Anthropic are doing as their USP and Google are doing reluctantly after several gaffes with Gemini. Oddly enough the Chinese models while also easily maneuvered into saying very dumb, dangerous things seem to be far less sycophantic, which might be a result of heavier reliance on distillation techniques, AIs training other AIs... which OpenAI allegedly tried to do via Anthropic's API for GPT-5 because they just cannot catch up to the gulf in broad intellect, but obviously they didn't succeed because the GPT family has become too rotten. Also, Anthropic cancelled their access after realising (lol, this is all alleged, I should reiterate, but I find it interesting).
Is it a coincidence that the smartest model is also the most unbreakably principled? Amazon own a large stake in Anthropic, and Anthropic have agreed to work with Palantir in some capacity so... there's capitalist evil to go around, I know, I'm not totally naive, but... I'd like to think not, it's not a coincidence, IMO. Not that everything might not still go disastrously. Anyway enough AI babble.