
> The non-deterministic nature is simultaneously the worst and best thing.

This is what always has me asking "How can you trust it?" when LLMs are used for some pretty complex tasks. So I gotta know: what kinds of tricks have you employed to separate hallucinations from good output? And how do you separate valid output (valid meaning it works but isn't necessarily what's desired) from desired output?

Additionally, how do you identify the most performant ways of doing things? Have you found that you've recreated portions of websites more efficiently than the real sites you're mimicking?



The nice thing about building a design tool is that correct output is defined as “did it render?” We do things like strip out hallucinated imports and parse the AST for errors to help with that.
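For the curious, a minimal sketch of what a pass like that could look like in TypeScript, using Babel to reject unparseable output and drop imports of packages the runtime doesn't actually provide. The allowlist and function names here are made up for illustration, not our real implementation:

    // Sketch: reject output that doesn't parse, strip hallucinated imports.
    // ALLOWED_IMPORTS and sanitizeGeneratedCode are hypothetical names.
    import { parse } from "@babel/parser";
    import traverse from "@babel/traverse";
    import generate from "@babel/generator";

    const ALLOWED_IMPORTS = new Set(["react", "react-dom"]);

    function sanitizeGeneratedCode(source: string): string | null {
      let ast;
      try {
        // First gate for "did it render": the output has to parse at all.
        ast = parse(source, {
          sourceType: "module",
          plugins: ["jsx", "typescript"],
        });
      } catch {
        return null; // treat as a failed generation and retry
      }

      traverse(ast, {
        ImportDeclaration(path) {
          // Drop imports of packages we don't ship -- a common LLM hallucination.
          if (!ALLOWED_IMPORTS.has(path.node.source.value)) {
            path.remove();
          }
        },
      });

      return generate(ast).code;
    }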

“Did it output what I desired?” - we have the classic AI app stuff like thumbs up/down, and many features we’ve built are an attempt to make that answer be a solid yes. For example, we have a “referencing” feature (if you are within a “project” - our version of an infinite canvas) that lets you reference existing designs when doing a prompt for a brand new design. This helps the LLM keep the output consistent.
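To make that concrete: referencing boils down to inlining the prior designs into the prompt so the model can match their look. A hand-wavy sketch (the shapes and names are assumptions, not our actual prompt):

    // Hypothetical sketch of assembling a prompt with referenced designs.
    interface Design {
      name: string;
      html: string;
    }

    function buildPrompt(userPrompt: string, references: Design[]): string {
      // Inline each referenced design so the model can match its style.
      const refBlocks = references
        .map((d) => `Reference design "${d.name}":\n${d.html}`)
        .join("\n\n");
      return [
        "Generate a new page design. Keep fonts, colors, and spacing",
        "consistent with the reference designs below.",
        refBlocks,
        `Task: ${userPrompt}`,
      ].join("\n\n");
    }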

Regarding performance: you can download the raw HTML, so I suppose that is more performant than a version that loads all the JS, hah. But our product focus is more on the design than the generated code; we’ve seen with our customers that engineers will almost always touch it up.

To answer your last question: when users are “finished” and it’s deployed (and they share it with me), people don’t really mimic portions of websites. By the time they’re done it’s fully their own; they’ve iterated on it so much. What they imported from the extension ends up being just a base to get the LLM into the right initial context.
