Hacker News | 1st1's comments

I don't think this is related in any way.

> Having a CI job that identifies places where the docs have drifted from the implementation seems pretty valuable.

https://docs.python.org/3/library/doctest.html

> To check that a module’s docstrings are up-to-date by verifying that all interactive examples still work as documented. To perform regression testing by verifying that interactive examples from a test file or a test object work as expected. To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of “literate testing” or “executable documentation”.

Seems pretty related to me.
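For readers who haven't used it: a doctest embeds interactive examples in a docstring, and `doctest.testmod()` re-runs them and reports any whose output no longer matches. The `add` function here is only a placeholder to show the mechanism.

```python
import doctest


def add(a: int, b: int) -> int:
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b


if __name__ == "__main__":
    # Re-run every >>> example in this module's docstrings. Silent when all
    # examples pass; prints a report for each example whose output differs.
    doctest.testmod()
```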


I appreciate doctest very much, but that isn’t the kind of documentation I’m worried about drifting. I’m thinking more of “how does the communication protocol between this server and this client work?”, which is generally terrible to try to summarize in doctests. If you want to take the idea to the extreme, imagine a CI test that answers “does this server implementation conform to this RFC?”

Not really.

> Having a CI job that identifies places where the docs have drifted from the implementation seems pretty valuable.

Testing with lat isn't about ensuring consistency of code with public API documentation. It is about:

* ensuring you can quickly analyze which tests were added or changed by looking at their English descriptions

* ensuring you spot when an agent randomly drops or alters an important functional or regression test

The problem with coding agents is that they produce enormous diffs. Reading test code is very important, but in practice your attention drifts and you can't do a thorough analysis.

This isn't a new problem, though. The same thing applies to classic code reviews: coding is rarely the bottleneck, getting humans to review and vet the change is.

Lat shifts the focus from reading test code to understanding the semantics of the tests. Instead of reviewing 2,000 lines of code, you can focus on reviewing a 100-line change in lat.md, which lets you keep much tighter control over your tests and implementation.

For projects where code quality isn't paramount, I now just glance over the code to spot anti-patterns and places where models fail to keep things DRY and duplicate large swaths of code instead.


Take a look at this section; I've just updated it to be clearer, and hopefully it answers your question:

https://github.com/1st1/lat.md?tab=readme-ov-file#the-idea


> Test specs with enforcement — test cases can be described as sections in lat.md/ and marked with require-code-mention: true. Each spec then must be referenced by a // @lat: comment in test code. lat check flags any spec without a backlink, so you can review and maintain test coverage from the knowledge graph.

if you mean this paragraph - imo that's still too hand-wavy compared to enforcement through generative tests for spec conformance.


can you tell me more about what you mean by generative tests for spec conformance?

tests generated from a normative spec to validate that the implementation accepts the right inputs, rejects the wrong ones, produces correctly shaped outputs and, to some extent, behaves correctly.

normative specs allow generation of such tests deterministically - no need to spend tokens, no risk of hallucinations, much higher level of confidence that generated code is correct and much more accurate feedback from the test system into the LLM loop.
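A minimal sketch of that idea, assuming a toy normative spec written as a Python dict (field name → type and inclusive bounds). `SPEC`, `validate_request`, and the generator functions are all hypothetical; a real spec such as an RFC grammar would need a richer generator, but the shape is the same: valid cases come from boundary values, invalid cases break exactly one rule, and both are derived mechanically from the spec with no model in the loop.

```python
from itertools import product

# Hypothetical normative spec: each field names its type and inclusive bounds.
SPEC = {
    "port": {"type": int, "min": 1, "max": 65535},
    "retries": {"type": int, "min": 0, "max": 10},
}


def valid_cases(spec):
    """Yield every combination of per-field boundary values (all conforming)."""
    names = list(spec)
    choices = [(spec[n]["min"], spec[n]["max"]) for n in names]
    for combo in product(*choices):
        yield dict(zip(names, combo))


def invalid_cases(spec):
    """Yield inputs that each violate exactly one rule of the spec."""
    base = {n: rule["min"] for n, rule in spec.items()}
    for name, rule in spec.items():
        # Out of range below, out of range above, wrong type.
        for bad in (rule["min"] - 1, rule["max"] + 1, str(rule["min"])):
            case = dict(base)
            case[name] = bad
            yield case
        # Required field missing.
        missing = dict(base)
        del missing[name]
        yield missing


def validate_request(req):
    """Implementation under test (hypothetical): accept iff req conforms."""
    if set(req) != set(SPEC):
        return False
    for name, rule in SPEC.items():
        value = req[name]
        if type(value) is not rule["type"]:  # also rejects bool for int
            return False
        if not rule["min"] <= value <= rule["max"]:
            return False
    return True


def check_conformance(impl, spec):
    """Return the generated cases the implementation gets wrong."""
    failures = []
    for case in valid_cases(spec):
        if not impl(case):
            failures.append(("rejected valid", case))
    for case in invalid_cases(spec):
        if impl(case):
            failures.append(("accepted invalid", case))
    return failures


if __name__ == "__main__":
    print(check_conformance(validate_request, SPEC))  # → []
```

Swapping in `lambda req: True` as the implementation makes `check_conformance` report all eight invalid cases, each paired with the exact input that slipped through, which is the kind of precise feedback that can go back into the agent loop.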


I'm working on a blog post and on benchmarks. Here [1] Armin suggested I take something like quickjs, build a lat base for it, and compare side by side how, say, Claude Code works with lat vs. without.

I'm very early into this and need to build a proper harness, but I can see lat sometimes allowing for up to 2x faster coding sessions. The main benefit to me isn't speed, though; it's that I can now review diffs faster and stay more engaged with the agent.

[1] https://x.com/mitsuhiko/status/2037649308086902989?s=20


Very cool, interested to read more once you post! FWIW I've been building eval infra that does something adjacent: replaying real repo work against different agent configs and measuring the agent's quality along several dimensions (pass/fail, but also alignment with human intent, code review, etc.). If you want to compare notes on harness design, or if an independent eval of lat vs. no-lat on quickjs would be useful, happy to chat :)

++

> I suspect there's ways to shrink that context even more.

Yeah, I'm experimenting with some ideas there, like adding a `lat agent` command that acts as a subagent, searching through lat and summarizing related knowledge without polluting the parent agent's context.


I think you can have your workflow with lat and it might make it even nicer. Would love feedback from you.

I think it's a great idea and I'm considering building this in lat too. Code embedding models can definitely speed up grepping further, but they still wouldn't help much when you have a business logic detail encoded across multiple complex files. With lat you'd have it documented in a paragraph of text + a few [[..]] links into your code.

Because lat gives agents more tools and enforces the workflow.

Unlike Obsidian, lat lets markdown files link into functions/structs/classes/etc. too.
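As a rough illustration of what linking into code can mean, here is a checker that finds `[[...]]` links in markdown and verifies that each target resolves to a function or class in a Python file. The `[[file.py#symbol]]` form is an assumption made for this sketch, not lat's documented syntax, and `broken_links` is a hypothetical helper; the point is only that links of this kind can be checked mechanically whenever the code changes.

```python
import ast
import re

# Assumed link syntax for illustration only: [[path/to/file.py#symbol]].
LINK_RE = re.compile(r"\[\[([^\]#]+)#([^\]]+)\]\]")


def defined_symbols(source: str) -> set[str]:
    """Names of functions, async functions and classes defined anywhere in a module."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def broken_links(markdown: str, files: dict[str, str]) -> list[str]:
    """Return links whose file is unknown or whose symbol is not defined in it."""
    broken = []
    for path, symbol in LINK_RE.findall(markdown):
        source = files.get(path)
        if source is None or symbol not in defined_symbols(source):
            broken.append(f"{path}#{symbol}")
    return broken


if __name__ == "__main__":
    files = {
        "billing.py": "class Invoice:\n    def total(self):\n        return 0\n",
    }
    doc = (
        "Invoices are built by [[billing.py#Invoice]]; refunds go through "
        "[[billing.py#refund]]."
    )
    print(broken_links(doc, files))  # → ['billing.py#refund']
```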

This saves agents time on grepping but also allows you to build better workflows with tests.

Test cases can be described as sections in `lat.md/` and marked with `require-code-mention: true`. Each spec then must be referenced by a `// @lat:` comment in test code. `lat check` flags any spec without a backlink, so you can review and maintain test coverage from the knowledge graph.
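The backlink rule described above can be sketched in a few lines. This is not lat's implementation: it assumes the `require-code-mention: true` flag sits on its own line inside a markdown section and that each `// @lat:` comment names the section title verbatim. `required_specs` and `missing_backlinks` are hypothetical names.

```python
import re

HEADING_RE = re.compile(r"^#+\s+(.+?)\s*$")
# Assumed: the flag appears on its own line inside the section body.
FLAG_RE = re.compile(r"^require-code-mention:\s*true\s*$")
# Assumed: the backlink comment names the section title after "// @lat:".
BACKLINK_RE = re.compile(r"//\s*@lat:\s*(.+?)\s*$", re.MULTILINE)


def required_specs(markdown: str) -> set[str]:
    """Section titles whose body contains the require-code-mention flag."""
    required, current = set(), None
    for line in markdown.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(1)
        elif current and FLAG_RE.match(line.strip()):
            required.add(current)
    return required


def missing_backlinks(markdown: str, test_sources: list[str]) -> list[str]:
    """Required specs that no test source mentions, sorted for stable output."""
    mentioned = {m for src in test_sources for m in BACKLINK_RE.findall(src)}
    return sorted(required_specs(markdown) - mentioned)


if __name__ == "__main__":
    spec_md = """# Auth
## Rejects expired tokens
require-code-mention: true
## Refreshes tokens
require-code-mention: true
## Background
Not a test spec.
"""
    tests = ['// @lat: Rejects expired tokens\ntest("expired", () => {});\n']
    print(missing_backlinks(spec_md, tests))  # → ['Refreshes tokens']
```

Run in CI, a non-empty result is exactly the "spec without a backlink" case: a described test that an agent dropped or never wrote.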


This is interesting. I've been using a system-wide Obsidian vault that all my agents use for stuff that's platform-specific rather than project-specific (think common Android/Samsung-related issues or ANR fixes). So far it hasn't been mind-blowing, but it's only been a month, so the knowledge in it is still limited.

lat seems like it could be useful to cross-reference company-wide projects.


potentially. I'm toying with the idea of distributed lat ;)

I'm working on making lat hierarchical, e.g.

  - lat.md           # high-level description of the project
  - frontend/lat.md  # frontend-related knowledge
  - backend/lat.md   # details about your backend
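One plausible way an agent could consume such a hierarchy is to load every lat.md from the project root down to the directory of the file it is editing, most general first. This lookup rule and the `lat_chain` helper are assumptions for illustration, not how lat resolves files; paths are POSIX-style and relative to the project root, so the sketch needs no filesystem access.

```python
from pathlib import PurePosixPath


def lat_chain(lat_files: set[str], target: str) -> list[str]:
    """Return the lat.md files that apply to `target`, most general first.

    Hypothetical lookup rule: walk from the project root down to the
    directory containing `target`, keeping every lat.md found on the way.
    """
    directory = PurePosixPath(target).parent
    # `parents` is ordered nearest-first; reverse it to go root-first.
    dirs = list(directory.parents)[::-1] + [directory]
    candidates = [str(d / "lat.md") for d in dirs]
    return [c for c in candidates if c in lat_files]


if __name__ == "__main__":
    files = {"lat.md", "frontend/lat.md", "backend/lat.md"}
    print(lat_chain(files, "frontend/components/Button.tsx"))
    # → ['lat.md', 'frontend/lat.md']
```

Putting the root file first means the high-level project description is always read before frontend- or backend-specific details.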

It’s a start, but it wouldn’t solve my use cases. I developed my own skill for this, called morning-routine, that does a recursive `ls -R` on all the Claude markdown files. It’s split into multiple stages so I don’t waste as much context when I don’t need to.

Keep going, prompt engineering is fascinating.


Fixed

