
sjkoelle · 2026-03-10T18:12:18 1773166338

Oceania has always been context engineering. It's been interesting to see this take priority in the zeitgeist over the last 6 months, displacing the "long context" framing.

sjkoelle · 2026-03-09T22:46:00 1773096360

article explains that it is not sizable

sjkoelle · 2026-03-04T21:41:14 1772660474

Can we get a shout out to the anti code review folks?

xlii · 2026-03-05T22:21:49 1772749309

I think I can imagine them clearly, stay tuned.

sjkoelle · 2026-03-03T00:35:01 1772498101

yeah im curious if people will end up liking it. sucks from my perspective.

sjkoelle · 2026-03-02T18:53:16 1772477596

is one function per file the righteous path here?

nicola_alessi · 2026-03-02T19:20:10 1772479210

vexp works regardless of how you organize files. The graph is at the symbol level (functions, classes, types), not the file level, so whether you have one function per file or 50, it resolves the same dependency chain and serves the same capsule.

The savings actually increase with larger files because that's where the baseline wastes the most: Claude reads a 500-line file to use 20 lines of it.
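To make the symbol-level point concrete, here is a rough sketch in Python. It is not vexp's actual implementation or API; the symbol table, the `resolve_capsule` and `lines_served` helpers, and all file names and line spans are hypothetical, made up only to illustrate why file layout would not change the result.

```python
from collections import deque

# Hypothetical symbol table: symbol -> (file, (start_line, end_line), direct deps).
# Names, files and spans are invented for illustration only.
SYMBOLS = {
    "handle_request": ("server.py", (10, 30), ["parse_body", "Response"]),
    "parse_body": ("server.py", (200, 215), ["ParseError"]),
    "Response": ("types.py", (1, 12), []),
    "ParseError": ("errors.py", (5, 8), []),
    "unrelated_helper": ("server.py", (300, 480), []),
}


def resolve_capsule(root, symbols=SYMBOLS):
    """Breadth-first walk of the dependency chain, keyed by symbol, not by file."""
    order = []
    visited = {root}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        order.append(name)
        for dep in symbols[name][2]:
            if dep not in visited:
                visited.add(dep)
                queue.append(dep)
    return order


def lines_served(names, symbols=SYMBOLS):
    """Total source lines covered by the given symbols only."""
    return sum(symbols[n][1][1] - symbols[n][1][0] + 1 for n in names)


capsule = resolve_capsule("handle_request")
print(capsule, lines_served(capsule))
```

Because the walk follows symbol names, moving `parse_body` into its own file would change only the file field in the table, not the set of symbols returned. The large `unrelated_helper` in the same file is never pulled in, which is the waste the comment attributes to reading whole files.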

sjkoelle · 2026-03-02T18:10:42 1772475042

it depends how long of a leash you give it

sjkoelle · 2026-02-25T19:33:00 1772047980

amara must be this dataset https://en.wikipedia.org/wiki/Amara_(organization)

sjkoelle · 2026-02-06T22:11:03 1770415863

efficiency is not a given. also this is an eval set - they acknowledge the challenge themselves.

imho this is v cool

sjkoelle · 2026-01-12T20:44:28 1768250668

Would this generate the same completion for 'the cat sat on the' as 'on the cat sat the'?

sjkoelle · 2025-06-02T16:39:56 1748882396

Marvelous! What gain beyond zero-shot would motivate a humble citizen to implement this instrument? How was the superiority assessed?

deepsquirrelnet · 2025-06-02T17:52:20 1748886740

Good question - my best assessment is just the text classifier, i.e., was the LLM able to “trick” the classifier into believing the text came from the IPJ?

And it came quite a long way in training. Initially the classifier scores were very low (mean around 0.05, meaning modern). Over training, the scores came up and ended close to 0.95 (IPJ). The standard deviation of the group also declined, so the consistency of responses improved as well.

My thought on the application of this is that you could use it to create different voices for your responses, and probably even add several at a time to a single model. I chose this one to experiment with because it is easy to classify and the data was available in the public domain.

GRPO kind of opens up RL to lower tiers of hardware, and I’ve been able to experiment with it at home. I think this is something people can do themselves, and it’s fun and potentially useful in games, or possibly in applications for kids with lower reading levels (e.g., using a reading level classifier instead).
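For readers unfamiliar with GRPO, the group statistics mentioned above map onto its core step: each completion's reward is normalized against the mean and standard deviation of its own sampling group. A minimal sketch follows, assuming the classifier probability is used directly as the reward (the comment does not spell out the exact reward function) and using a hypothetical helper name `group_relative_advantages`.

```python
import statistics


def group_relative_advantages(scores, eps=1e-6):
    """Normalize per-completion rewards within one sampled group.

    `scores` are assumed to be classifier probabilities in [0, 1] for
    completions of the same prompt. Population std is used here;
    implementations differ on population vs. sample std.
    """
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / (std + eps) for s in scores]


# Early in training: mostly low scores, one completion that fooled the classifier.
print(group_relative_advantages([0.05, 0.04, 0.06, 0.60]))
# Late in training: uniformly high scores give much smaller raw differences.
print(group_relative_advantages([0.94, 0.95, 0.96, 0.95]))
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below get a negative one. No separate value network is needed, which is part of why the method fits on smaller hardware.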

dwringer · 2025-06-02T19:53:08 1748893988

Yet, one might justly question the imperative of cultivating a distinct model for such an endeavour, when a judiciously framed prompt, enriched by apposite examples, might suffice to imbue a sophisticated engine with the desired stylistic graces. Though it is undeniable these modern engines shall wax greatly in their proportions, and the art of discovering the exact prompt to elicit their most felicitous expressions is a task far from trivial, yet, it must be admitted, the pursuit holds a certain diversion for the inquisitive mind! It is, perchance, not the creation of manifold engines, but rather the artful disposition of singular contexts, that shall bestow upon diverse interlocutors their proper and unique voices.