Interesting strategy considering it's in line with something I'm tracking. What does the devtool you're building do?
DISCLAIMER: I’m building LLM Signal around this broader shift. The idea is to understand how models reference and recommend tools/services, and what visibility means when agents are the ones making choices.
The tool compiles a schema (tables, procs, and queries) into a chunk of Rust or C++ code that implements a lightweight in-memory database, which you can include in your project.
I keep finding myself building little assemblies of structs, vectors, maps, and sets which behave like tables and indexes[1]. Wouldn't it be nice, I keep thinking, if I could just declare the data and the queries I want, then let some tool compute an efficient implementation?
The tool is meant for situations where SQLite would be overkill. Serialization, migration, ALTER TABLE, and such are all out of scope. While you could probably use it as an app's central data store, its footprint is meant to be small enough that you might whip up a little schema to help implement a single module, or a single process, within a larger piece of software.
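For concreteness, here’s roughly the kind of hand-rolled table-plus-index assembly I mean: a vector of structs as the table, a map as a secondary index. All names here are illustrative, not the tool’s actual output.

```rust
use std::collections::HashMap;

// A hand-rolled "users table" with a secondary index on email --
// the kind of assembly the generator is meant to compute for you.
#[derive(Debug, Clone)]
struct User {
    id: u32,
    email: String,
    active: bool,
}

#[derive(Default)]
struct UserTable {
    rows: Vec<User>,                  // primary storage
    by_email: HashMap<String, usize>, // secondary index -> row offset
}

impl UserTable {
    fn insert(&mut self, user: User) {
        // Keeping the index consistent by hand is exactly the
        // boilerplate you'd rather declare than write.
        self.by_email.insert(user.email.clone(), self.rows.len());
        self.rows.push(user);
    }

    // The "query" we would rather just declare in a schema.
    fn find_by_email(&self, email: &str) -> Option<&User> {
        self.by_email.get(email).map(|&i| &self.rows[i])
    }
}

fn main() {
    let mut users = UserTable::default();
    users.insert(User { id: 1, email: "a@example.com".into(), active: true });
    users.insert(User { id: 2, email: "b@example.com".into(), active: false });

    let hit = users.find_by_email("b@example.com").unwrap();
    assert_eq!(hit.id, 2);
    println!("found user {}", hit.id);
}
```

Multiply this by a few tables, a couple of indexes each, and the consistency bookkeeping gets tedious fast, which is the gap the generator targets.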
In theory an LLM coding agent should find the consistency & performance guarantees available with this approach as useful as a human would.
I’m less focused on engineers using AI to code and more on agents being the “users” of software, especially now that agents are doing all these tasks (e.g. OpenClaw and others). Even if engineers stay critical, if the end consumer shifts from human clicks to agent decisions, distribution and ranking mechanics change.
Would you agree or do you think this stays human driven long term?
I think that's true but do you see MCP as enough of a discovery primitive on its own, or does it still lack a ranking/trust layer? My intuition is that capability exposure is only half the problem and the harder part is how agents evaluate and choose between multiple similar tools.
Take Supabase for example. It’s disproportionately recommended by LLMs when people ask for backend/database stacks. That can’t come down to capability alone, since a lot of tools expose similar primitives. Something in the model’s training data, ecosystem visibility, or reinforcement layer is shaping that ranking.
If agents start choosing tools autonomously, the real leverage point isn’t just “can you describe your capabilities in MCP?” but “how does the agent decide you’re preferred over five near-identical alternatives?”
Do you think that ranking layer sits inside the model providers, or does it become an external reputation network?
Right there with you: SEO has evolved to a place that incentivizes a lot of bad behavior, and the end result made search worse for everyone. I’m personally less interested in “gaming” LLMs than in understanding what they already do. From my side, this feels closer to observability than optimization: the goal is to see whether AI systems are even reading or understanding a site, not to trick them into ranking something low quality.
The Raymond Chen analogy brings up something interesting. If everyone forces themselves on top, the signal collapses. My hope is that AI systems end up rewarding genuinely useful, well explained things rather than creating another arms race...but I’m not naive about how incentives tend to play out.
A huge concern of mine has been the introduction of ads. Once ads enter LLM responses, it’s hard not to ask whether we’re just rebuilding the same incentive structure that broke search in the first place.
You bring up something I've been trying to figure out as well. It feels like AI favors pages that give a direct, honest answer and make that answer immediately available. Plain, readable HTML with no friction seems to perform well. If the intent is obvious at fetch time, that seems to matter more than how “optimized” the page is. It feels less like SEO and more like “can this page be understood immediately.”
If the initial response is basically <div id="root"></div> plus a big JS bundle, you’re betting the crawler will execute it, and I’m not convinced they consistently do.
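One crude way I think about it (my own heuristic, not any crawler’s actual logic): strip the tags from the initial response and see how much readable text is left before any JS runs.

```rust
// Crude "content at fetch time" heuristic: count the non-whitespace
// characters that sit outside HTML tags in the raw initial response.
// A client-rendered shell scores near zero; an SSR/static page doesn't.
fn visible_text_len(html: &str) -> usize {
    let mut in_tag = false;
    let mut count = 0;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag && !c.is_whitespace() => count += 1,
            _ => {}
        }
    }
    count
}

fn main() {
    let ssr = "<html><body><h1>Pricing</h1><p>Free tier includes 5 projects.</p></body></html>";
    let spa = "<html><body><div id=\"root\"></div><script src=\"/bundle.js\"></script></body></html>";
    // The SSR page exposes real text at fetch time; the SPA shell exposes none.
    assert!(visible_text_len(ssr) > visible_text_len(spa));
    println!("ssr={} spa={}", visible_text_len(ssr), visible_text_len(spa));
}
```

Obviously real extraction is more involved (scripts, boilerplate nav, etc.), but if a fetcher that doesn’t execute JS sees roughly zero text, there’s nothing for it to understand.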
Curious if you’ve run into that too? Have you seen AI recommendations skew toward SSR/static pages vs client-rendered apps, even when the content is technically “there” once the JS runs?
This is exactly what set me off trying to figure out the visibility gap.
What’s strange is that we’re moving into a world where recommendations matter more than a click, but attribution still assumes a traditional search funnel. By the time someone lands on your site, the most important decision may have already happened upstream and you have no idea.
The UTM case you mentioned is a good example: it only captures direct "AI to site" clicks, but misses scenarios where AI influences the decision indirectly (brand mention, then a later search, then a visit). From the site’s perspective, though, that looks indistinguishable from organic search. It makes me wonder whether we’ll need a completely new mental model for attribution here. Perhaps less about “what query drove this visit” and more about “where did trust originate.”
Not sure what the right solution is yet, but it feels like we’re flying blind during a pretty major shift in how people discover things.
This is why most of these AI search visibility tools focus on tracking many possible prompts at once. LLMs give zero insight into what users are actually asking, so all you can do is put yourself in the user’s shoes and guess what they might prompt.
Disclaimer: I've built a tool in this space (Cartesiano.ai), and this view mostly comes from seeing how noisy product mentions are in practice. Even for market-leading brands, a single prompt can produce different recommendations day to day, which makes me suspect LLMs are also introducing some amount of entropy into product recommendations.
I don’t think there’s a clean solution yet, but I’m not convinced brute-force prompt enumeration scales either, given how much randomness is baked in. I guess that’s why I’ve started thinking about this less as prompt tracking and more as signal aggregation over time: looking at repeat fetches, recurring mentions, and which pages/models seem to converge on the same sources. It doesn’t tell you what the user asked, but it can hint at whether your product is becoming a defensible reference versus a lucky mention.
As someone who's built a tool in this space, I'm curious: have you seen any patterns that cut through the noise? Or is entropy just something we have to design around?
Disclaimer: I've built a tool in this space as well (llmsignal.app)
Signal aggregation is definitely the right mental model. We've found that tracking 'Share of Model' over time (e.g. how often a brand appears in the top 3 recommendations for a category query) is much more stable than individual prompt outputs, which can vary wildly due to temperature.
It's similar to share-of-voice in traditional PR. You can't control every mention, but you can track the aggregate trend of whether the model 'knows' you exist and considers you relevant.
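A toy version of that aggregation looks something like this; the run data is made up for illustration, and this is a sketch of the idea rather than our actual pipeline.

```rust
use std::collections::HashMap;

// "Share of Model" sketch: across repeated runs of the same category
// prompt, the fraction of runs in which each brand lands in the top 3.
fn share_of_model(runs: &[Vec<&str>]) -> HashMap<String, f64> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for run in runs {
        for brand in run.iter().take(3) {
            *counts.entry(brand.to_string()).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        .map(|(brand, n)| (brand, n as f64 / runs.len() as f64))
        .collect()
}

fn main() {
    // Three samples of the same prompt on different days; the ordering
    // varies with temperature, but the aggregate is comparatively stable.
    let runs = vec![
        vec!["Supabase", "Firebase", "PocketBase"],
        vec!["Firebase", "Supabase", "Appwrite"],
        vec!["Supabase", "Appwrite", "Firebase"],
    ];
    let shares = share_of_model(&runs);
    assert_eq!(shares["Supabase"], 1.0); // present in the top 3 of every run
    println!("Supabase share: {:.2}", shares["Supabase"]);
}
```

The individual orderings above all differ, but the top-3 membership barely moves, which is why the aggregate is a more usable signal than any single output.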
What stood out to me is that AI seems far less concerned with domain age than Google is. If there’s enough contextual discussion around a product (e.g. Reddit threads, blog posts, docs, comparisons), then AI models seem willing to surface it surprisingly early.
That said, what I’m still trying to understand is consistency. I’ve seen cases where a product gets recommended heavily for a week, then effectively disappears unless that external context keeps getting reinforced.
So it feels less like “rank once and you’re good” (SEO) and more like “stay present in the conversation.” Almost closer to reputation management than classic content marketing.
Curious if you’ve seen the same thing, especially around how long external mentions keep influencing AI recommendations before they decay.