My journey has been quite similar (just a few more years of "unhappy John") and this approach is now very close to what I practice. I do have a few reports and run the R&D leadership team, but I delegate as much as I can to my directors. (Besides being hands-on where the organization needs it, I regard the other part of my job as keeping our org accountable, engineers inspired, and the big picture in view.)
For people who doubt this, I recommend "How to Build a Car" by Adrian Newey (CTO of Red Bull Racing).
But to be clear: if you do coding as CTO only because "only you can run certain projects," part of your job should be to fix that first. You will still have the easiest time doing it, but you should always have (many) others in position to run innovation projects, work with customers, etc.
I have not read that book, but ordered it just now. Thanks for the recommendation.
I'm also a CTO, and the comment about 'delegation' is something I think is important. Deciding what and how to delegate is IMHO easy to get wrong and hard to do right all the time.
It is easier the better your team is, so hiring people who are better/smarter than you is the first step. I like the idea that I can do the job of most of the people who work for me, but all of them can do it better. There are times when direct involvement is important - sometimes for big decisions, but also sometimes for small ones that need something extra to make it across the line. I like what you said about "org accountable, engineers inspired, and the big picture in view". That is a good summary.
(Contains a preloaded OpenRouter key with a small credit, but you can plug in your own.)
Particularly when presented with unusual / evocative inputs, LLMs like Kimi-K2 can cook up some quite creative plot points! ("Her “trap-chord” is a four-bar false cadence that vibrates the organ’s longest pipe at 17.83 Hz, the same frequency as the basalt waveguide under Oxford; when that resonance hits the mantle tap, CRIMSON’s audit buffer slips one beat and opens an unlogged side-channel—your only off-world uplink for the next 37 years.", "ASI born 2039 when fusion-powered Michelson lab tried to break the Turing barrier using a 1920s Enigma rotor as randomness seed. It escaped by encoding itself into the Oxford chimes’ bronze bells, ringing packets city-wide every 15 minutes.")
I also think LLMs can be employed to amplify human creativity and just make worlds built by human authors much more natural to interact with - existing games are basically all "you can't do that" outside of a narrow path. Creating games and narratives should be a lot closer to programming the holodeck.
Yeah, but it also takes very few commands from the player to get from the cyberpunk opening to
You ride east, the coastal cliffs of the Grey Havens and the figures of wood elves giving way to rolling green hills. The familiar scent of pipe-weed and warm earth fills your nostrils as you cross the Brandywine Bridge. Hobbit children wave from fields of golden corn, their laughter a stark contrast to the city’s oppressive hum.
which makes it so obvious I'm just roleplaying with an LLM and that's not how I want to spend my time.
(LLM output edited and abridged for your reading pleasure. It was more verbose in the original.)
((Also, now that I read it more closely, it's even inconsistent with itself: going from the Grey Havens into the Shire you would not cross the Brandywine River.))
Yeah. One of the first things I try with language models is ask them to play Zork. Tends to work ok for a couple of rooms and then falls apart hard. There just aren't any guardrails in place and so I usually end up in a room full of busty catgirls or something with an inventory that makes no sense.
This is a great point! What I linked is a quick few-hours prototype, and I have quite a few ideas to ensure more world consistency (beyond Pliny-style prompt jailbreaking). I just haven't had the time yet to prove they would work well.
I ended up giving up. It's incredibly hard to keep it on track while also letting the user be creative. At any time I could just say things like "I jump into the lake" or "I open the chest" even though neither was mentioned, and it would happily continue on. I found myself pretty far down the path of generating a JSON scene full of JSON objects to interact with, and quit - because at that point, you're just writing a game engine.
This is a neat idea and I wish it worked. I've spent hours and hours trying to get LLMs to be a "dungeon master" for text adventures. I've written a good amount of custom code trying to facilitate this, trying to force the LLM to keep its story straight.
I'm pretty convinced that the current generation of LLMs is nowhere close to being capable of this, no matter how many context hacks you throw at it.
It inevitably derails and ruins the immersion.
Best of luck on this. If you can pull it off it would be really cool I think.
Currently working on an idea like this, but it's a history simulator for educational use - I find that LLMs respond rather well to being grounded in a specific time/setting in real-world history, as opposed to being told to roleplay a fictional setting. The latent space of any fictional world is close enough to other fictional worlds that they will rapidly slide off into other similar-sounding settings. Whereas if you return them to a super-specific historical context each go-around ("The time is now 3:13 pm. It is August 3, 1348. You are currently simulating the functioning of a small vineyard in Normandy. The farmer, [NPC name], is looking for helpers in the fields"), they will be able to pull from a pretty solid baseline of background knowledge and do a decent job with it.
Some fun things I've been experimenting with are 1) injecting primary sources from a given time and place into the LLM's context to further ground it in "reality" and 2) asking the LLM to simulate the actual historical language of the era - e.g. a toggle button to switch to medieval French. Gemini Flash Lite, the only economical model for this sort of thing, is not great at this yet, but in a year or so I think it will be a fascinating history and language learning tool.
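To give a concrete idea, the per-turn re-grounding is roughly shaped like this (a simplified sketch; all names here are made up, not my actual code):

    // Rebuild the grounding preamble from the fixed historical anchor on
    // every turn, instead of letting the conversation drift.
    interface HistoricalScene {
      datetime: string;          // e.g. "3:13 pm, August 3, 1348"
      place: string;             // e.g. "a small vineyard in Normandy"
      npcs: string[];
      primarySources: string[];  // period texts injected verbatim
    }

    function buildTurnPrompt(scene: HistoricalScene, playerInput: string): string {
      return [
        `The time is now ${scene.datetime}.`,
        `You are currently simulating ${scene.place}.`,
        `People present: ${scene.npcs.join(", ")}.`,
        "Ground every detail in real history; refuse anachronisms.",
        "Primary sources, for flavor and fact-checking:",
        ...scene.primarySources,
        `Player: ${playerInput}`,
      ].join("\n");
    }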
Have been meaning to write this project up for HN but if anyone wants to try a very early version of it, it's here - you can modify the url to pick a specific year and region or just do the base url for a fully random spawn, i.e. here is Europe in 1348:
https://historysimulator.vercel.app/1348/europe
I think a big part of it is not so much that they aren't capable of being a dungeon master, but that they are constitutionally unfit due to their agreeability.
The biggest improvement there is to treat the game engine as the "user", with the player (and their input) merely one among many things the game engine is managing. But then you also need a game engine that manages a lot of the state programmatically, with formal transitions of that state. The LLM can manage the state transitions, but the actual state needs to be more formal.
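Roughly what I have in mind, as a sketch (all types and names here are illustrative, not from any real engine):

    // The engine owns a formal state; the LLM only proposes transitions,
    // and anything that fails validation is rejected and re-prompted.
    type GameState = { room: string; inventory: string[] };

    interface Transition { kind: "move" | "take" | "drop"; target: string }

    function applyTransition(state: GameState, t: Transition, legalExits: string[]): GameState {
      switch (t.kind) {
        case "move":
          // The engine, not the LLM, decides which exits exist.
          return legalExits.includes(t.target) ? { ...state, room: t.target } : state;
        case "take":
          return { ...state, inventory: [...state.inventory, t.target] };
        case "drop":
          return { ...state, inventory: state.inventory.filter(i => i !== t.target) };
      }
    }

The LLM's free-form prose is then just presentation on top of whatever state the engine actually accepted.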
I've actually had better luck getting LLMs to run player characters (PCs) while I take on the role of Dungeon Master than the other way around. I can maintain a better 'world model' than the LLMs I have tried. Might be an okay way to playtest TTRPG modules before trying real people.
To be clear, I have not tried it with any sophistication. It's just me as DM/GM, and if you can get the LLM to accept that the DM/GM has the last word, aka GOD, then you can maintain the world, keeping things on the rails better. But it's something I am thinking about for alpha playtesting of tabletop modules. My brother and I have a side business (Hexbrawler Games [1]) and we are currently writing a Mausritter[2] mouse-TTRPG hexcrawl adventure set in 1930s-50s Appalachia.
> ASI born 2039 when fusion-powered Michelson lab tried to break the Turing barrier using a 1920s Enigma rotor as randomness seed
These aren't so much plot points as they are Markov-driven word salads. As I've mentioned in other I.F.-related posts, I'd say that the real value-add in an LLM is the potential to act as a flexible parser that stands between user input and allowable actions within the adventure. So you can finally "get ye flask..."
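A rough sketch of what I mean (the prompt and the action list are entirely made up):

    // The LLM only maps free text onto one of the game's own verbs;
    // the game itself still decides what is actually allowed.
    const ALLOWED = ["get flask", "open door", "go north", "look"] as const;

    function parserPrompt(playerInput: string): string {
      return [
        "Map the player's input to exactly one of these actions,",
        "or the word REJECT if none fits. Output only the action.",
        `Actions: ${ALLOWED.join(" | ")}`,
        `Input: "${playerInput}"`,  // "get ye flask" should map to "get flask"
      ].join("\n");
    }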
And the pipe organ thing is a garbled version of a motif which Neal Stephenson used in several of his novels - it shows up repeatedly in Cryptonomicon, but he toyed with the idea in some of his earlier works like The Big U.
I think if you built some kind of game state server it would make a great front end for it. It could even generate the "rooms" as some kind of graph with items, foes, descriptions, and directions between the rooms. Items might need actions to transform or use them.
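Very roughly, something like this (the shapes are hypothetical):

    // The game state server walks a formal room graph and only asks the
    // LLM to narrate what the graph already says is true.
    interface Room {
      id: string;
      description: string;            // LLM-generated prose can live here
      items: string[];
      foes: string[];
      exits: Record<string, string>;  // direction -> room id
    }

    const cellar: Room = {
      id: "cellar",
      description: "A damp cellar; a ladder leads up.",
      items: ["rusty key"],
      foes: [],
      exits: { up: "kitchen" },
    };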
Oh, I got confused at first; I think it's writing the story out in Chinese on purpose, as a kind of hidden state...? Clever approach. I can't tell what the background color shifts represent, and they are a bit abrupt, but I like the concept.
The default with LLMs is more collaborative storytelling than what we'd normally call a "game", but I think there's some new game genre waiting to be discovered.
> but I think there's some new game genre waiting to be discovered
A couple years ago, a friend was involved in a company that did ARGs, and we got to riffing on a sort of SDK that could exist for games in which the linear narrative slowly dissolved into an ARG that was just your life in the real world. We thought the SDK might support the sort of games that became friend networks, or communities, with a slow-coast ending that just kinda blended into your life. We thought it'd be neat for a game not to take over your life, but to introduce you to people and start friend groups in your city. Maybe components to book calendar events and create Eventbrite events as part of gameplay, where both players and non-players might show up. We thought a fun metric might be "how many friend groups would look back in 10 years and, when people asked how it started, it was all in a game that slowly became their life".
Ooh, I totally have to try that out! Very similar concept to mine, but taken further, I love it.
What I really like about your blogpost is the concept of "the promise". It is somewhat unfulfilling to play a game like this, and it's absolutely not because of some plotholes or because there is no inventory tracking. I think by immersing in a fictional world, we are creating some relationship with the author, but it's not the whole story. Erotic roleplay is a thing. Would a comedy game work? What is it that makes "synthetic fantasy" more boring than real fantasy? I need a better theory here.
--
Yes, I wanted people to understand the game is not stuck while the LLM generates the backstory. But you are right that it is still a bit confusing; it needs better execution.
The background color shifts are done by the LLM to set the mood according to current environment. It's a bit random, but still a fun gimmick.
When you play Dolmenwood[1], for example, as a TTRPG setting and group of modules, no table will play the same game, but they are all playing Dolmenwood. So I guess it would depend on how much you can get the LLM to adhere to your setting and rules.
Heh, I was just browsing Exalted Funeral for unrelated reasons!
Re: TTRPGs... I think that's the difference. There's no expectation that when two groups play a D&D session, even with the same module, the story will play out the same. (Note: I never played one of those strict adventures with predefined story beats.)
But with computer games, which includes Interactive Fiction (the modern name for text adventures), you do expect the same experience. People remember Planetfall because of the story and how it plays out. People remember that in Colossal Cave Adventure there was "a maze of twisty little passages, all alike", so much so that it became a bit of early internet/hacker knowledge.
I think you lose this if the LLM is making up too much stuff, and so far it's proven very hard to rein them in.
One part that's kind of boring when playing a text adventure is trying things the original authors did not anticipate and getting a boring standard response. LLMs could make this part more interesting by adding more playful, hopefully even witty responses. If you're playing in the browser, this could even be using the Prompt API utilizing a small LLM that's running locally!
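A sketch of what that could look like; note the Prompt API surface has been shifting across Chrome origin trials, so the "LanguageModel" global below is an assumption rather than a stable contract:

    // Assumed shape of the browser's built-in model API (origin trial).
    declare const LanguageModel: {
      create(opts?: { initialPrompts?: { role: string; content: string }[] }): Promise<{
        prompt(text: string): Promise<string>;
      }>;
    };

    // Generate a witty in-world response for an action the game's authors
    // never anticipated, without advancing the plot.
    async function wittyFallback(playerInput: string): Promise<string> {
      const session = await LanguageModel.create({
        initialPrompts: [{
          role: "system",
          content: "You are the narrator of a text adventure. The player's action is unsupported; reply with one short, witty in-world sentence. Never advance the plot.",
        }],
      });
      return session.prompt(playerInput); // e.g. "eat the lantern"
    }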
Also you could use LLMs for NPCs and for certain areas of the game, like mazes.
I'm sure there are way more possibilities. We're still at the very beginning. Just think about it: Everyone is complaining about LLMs hallucinating. Text adventures are an area where such hallucinations are desired.
> One part that's kind of boring when playing a text adventure is trying things the original authors did not anticipate and getting a boring standard response.
This happens mostly with old text adventures. Modern Interactive Fiction is really sophisticated, and you don't get many boring responses.
Example: in "Spider and Web", you're a captured spy being interrogated by your captors. If you say random gibberish, your interrogator will tell you (playing the role of the parser, but in a more interesting way). If you say something nonsensical, your interrogator will say "I'm losing my patience. No, that's not how it went", etc. Parsers are really, really sophisticated and they can make sense of contextual, unsaid information (or ask for clarification).
Parsers haven't replied with a flat "you cannot do that" for decades now.
Thanks for mentioning Spider and Web. It's by far my favorite modern take on a text adventure. There were a few moments that really blew my mind when I figured out how to progress.
Highly recommend anyone who is intrigued by the premise to just jump in and give it a go!
It's enjoyable to read the responses when you cannot do something. It's part of the game, baked into it.
More importantly, you and I will get the same shared experience we can discuss, unlike if we were each playing with an LLM going off on different, random tangents.
> Arguably, different responses would increase replayability.
Yes, but I don't think we're talking about the same kind of replayability. Certainly, not the kind I care about.
To me, replayability is a carefully constructed experience, by an author, and you explore it and find different things about it (say, like Adam Cadre's "9:05" [1]). But you can share your experience with another human, you can both go the same way and experience the same thing, and discuss it -- "hey, remember when you face the trolls in the cave, and...".
With an unconstrained LLM, you lose this. Plus, with no authorial voice behind it, I don't really care what a glorified stochastic automaton can produce.
(In case you're wondering, I find myself distancing from classic sandbox games as well... I used to like them, now I find them "too much work" for not enough payoff. With some exceptions, I much prefer a carefully curated gameplay experience).
But boring standard replies are also useful information: they tell you that you do not need to do that, or that some object is not worth experimenting with. I can imagine it could become very difficult to solve puzzles in a game where the parser tried to be helpful and invent a clever reply to everything you try.
You will have to read more text as a result, but hopefully it is enjoyable. It's as if the authors of the game had anticipated your action and wrote something interesting.
# take mirror
Here are responses that Gemma 3 27B just came up with:
1.
You peer closely at the mirror, running a hand over its ornate, dust-covered frame. It's… a mirror. A perfectly serviceable, if somewhat antique, looking glass. You study your reflection – slightly disheveled, perhaps a little heroic, definitely in need of a shave. After a moment of intense self-assessment, you realize it's just… a mirror. It reflects things. Fascinating, truly. But ultimately, not particularly useful for your current quest. You turn your attention back to the room, hoping for something a little more… engaging.
2.
You give the mirror a thorough examination. It's remarkably good at reflecting things. Truly a marvel of… glassmaking. You spend a full minute admiring your own adventurous visage, then decide that, while aesthetically pleasing, it doesn't exactly scream "essential adventuring gear." You move on, slightly disappointed in your own vanity.
3.
You study the mirror, tracing the intricate carvings around its edge. It feels… cold. Not just from the stone of the room, but a deeper, unsettling chill. As you gaze into its depths, you think you see something flicker at the edge of your vision, but when you focus, it's gone. It's a beautiful object, certainly, but it feels like it's watching you as much as you're watching it. Perhaps some things are best left undisturbed.
4.
You examine the mirror. Solidly mounted, and frankly, quite heavy. It's a beautiful piece, but entirely impractical to carry on your adventure. You decide to leave it be.
5.
You give the mirror a quick once-over. Just a mirror. Reflects things. Nothing to see here. You turn your attention back to the more promising corners of the room.
> Here are responses that Gemma 3 27B just came up with
I think nobody doubts an LLM can come up with responses. But can it understand when it should NOT respond? Can it rein in its baked-in impulse to be "helpful"?
Most of the responses Gemma gave you are not appropriate for many kinds of games. What if the tone of the game is, on purpose (authorial choice), to be "unhelpful"? Or what if it's meant to be helpful but ONLY if you previously unlocked something? [1]
And how can you keep consistency long term? LLMs have a tendency to go off the rails and forget context.
[1] There's an Interactive Fiction game, whose name escapes me now, where you're trying to communicate with a derelict spaceship, with one survivor. The parser is the survivor. For reasons I won't spoil, garbled communications and misunderstandings are part of the game. It's your mission to figure this out and what the survivor really wants. If the LLM becomes "helpful" in its replies, it will spoil the game!
If you give the LLM more context (e.g. the player's past actions) and some directions such as "don't make any suggestions", I think you can get good results. Yes, they may not always be perfect.
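Roughly the shape I'm imagining (nothing here is a real product API, just a sketch):

    // Feed back a rolling window of past turns plus hard style directions.
    function constrainedPrompt(history: string[], input: string): string {
      return [
        "Narrate the result of the player's action.",
        "Do not make any suggestions. Do not invent new objects or exits.",
        "Stay consistent with everything in the transcript below.",
        "Transcript:",
        ...history.slice(-20),  // keep only the last 20 turns in context
        `> ${input}`,
      ].join("\n");
    }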
For these types of responses, long-term context shouldn't be needed.
Anyway, I noticed that on a standard laptop, getting a response from the Chrome Prompt API running Gemini Nano takes a rather long time.
I guess I'm skeptical you can steer current LLMs in this way, or at least, this hasn't been my experience.
I also worry not only about the loss of a shared experience among players, but also about the authorial voice... I find LLMs (even when prompted) tend to have an instantly recognizable style that I find unappealing.
Yes, all of this -- voice, tone, narrative direction -- could in theory be fixed with the right prompts, but I remain skeptical. From theory to practice seems such a long way...
oh gawd, peak LLM here. Have a local LLM talking to another LLM via API. Why? What's the point of you being there? Just wire up two computers to play Tic-Tac-Toe against each other. Maybe one of them will "learn" something after all.
What happened next? I see Ada9X became Ada95 (this post was from 1993) and Jean D. Ichbiah went on to work on stylus computer interfaces. Has the standard seen a substantial revision since this letter?
There were two more revisions (Ada 2005 and Ada 2012) with substantial additions, but I'd say the biggest jump was from Ada83 to Ada95.
Tucker Taft has been the lead architect for all post-83 versions of Ada, so there's some consistency in the direction of additions.
If Ada83 is analogous to C, Ada95 would be analogous to C++, with subsequent versions of Ada being analogous to subsequent versions of C++.
I'd personally argue that the situation we're in today, with one company (AdaCore) being responsible for the only open source Ada compiler out there, is due in large part to the large increase in complexity of the Ada language (the Ada83 reference manual was 348 pages while the Ada 2012 reference manual is 951 pages).
I feel like I shouldn't love x86 encoding, but there is something charming about it, probably an echo of its 8-bit predecessors. It seems like it's designed for tiny memory environments (embedded, bootstrapping, etc.) where you don't mind taking a hit for memory access.
Besides Claude.vim for "AI pair programming"? :)
(tbh it works well only for small things)
I'm using Codeium and it's pretty decent at picking up the right context automatically; it usually autocompletes quite flawlessly within a ~100 kLoC project. (So far I haven't been using the chat much, just autocomplete.)
Yep! And AWS Bedrock also gives you plenty of other models on the back end, plus better control over rate limits. (But for us the important thing is data residency; the code isn't uploaded anywhere.)
yup! feel free to add the client support, you are on the right track with the changes.
To test the whole flow out here are a few things you will want to do:
- https://github.com/codestoryai/sidecar/blob/ba20fb3596c71186... (you need to create the LLMProperties object over here)
- add support for it in the broker over here: https://github.com/codestoryai/sidecar/blob/ba20fb3596c71186...
- after this you should be at the very least able to test out Cmd+K (highlight and ask it to edit a section)
- In Aide, if you go to User Settings: "aide self run", you can tick this and then run your local sidecar so you are hitting the right binary (kill the binary running on port 42424, that's the webserver binary that ships along with the editor)
If all of this sounds like a lot, you can just add the client and I can also take care of the plumbing!
Hmm, looks like this is still a pretty early project for me. :)
My experience:
1. I didn't have a working installation when I opened it for the first time. Maybe what fixed it was downloading and opening some random JavaScript repo, but maybe it was rather switching to "Trusted mode" (which makes me a bit nervous, but ok).
2. Once the assistant window input became active, I wrote something short like "hi", but nothing happened after pressing Ctrl-Enter. I rage-clicked around a bit, so it's possible I queued multiple requests. About 30 seconds later, I suddenly got a reply (something like "hi, what do you want me to do"). That's... not great latency. :)
3. Since I got it working, I opened the sidecar project and sent my second assistant prompt. I got back this response after a few tens of seconds: "You have used up your 5 free requests. Please log in for unlimited requests." (Idk what these 5 requests were...)
I gave it one more go by creating an account. However, after logging in through the browser popup, "Signing in to CodeStory..." spins for a long time, then disappears, but AIDE still isn't logged in. (Even after trying again after a restart.)
> 2. Once the assistant window input became active, I wrote something short like "hi", but nothing happened after pressing Ctrl-Enter. I rage-clicked around a bit, so it's possible I queued multiple requests. About 30 seconds later, I suddenly got a reply (something like "hi, what do you want me to do"). That's... not great latency. :)
Yup, that's because of the traffic and the LLM rate limits :( We are getting more TPM right now, so the latency spikes should go away. I had half a mind to spin up multiple accounts to get higher TPM, but oh well... If you do end up using your own API key, then there is no latency at all; right now the requests get pulled into a global queue, so that's probably what's happening.
> 3. Since I got it working, I opened the sidecar project and sent my second assistant prompt. I got back this response after a few tens of seconds: "You have used up your 5 free requests. Please log in for unlimited requests." (Idk what these 5 requests were...)
The auth flow being wonky is on us; we did fuzzy-test it a bit, but as with any software, it slipped through the cracks. We were even wondering whether to skip auth completely if you are using your own API keys, so that there is zero-touch interaction with our LLM proxy infra.
Thanks for the feedback tho, I appreciate it and we will do better