
AnimalMuppet · 2026-03-31T18:47:06 1774982826

Well, as a general rule, I don't do business with people who lie to me.

You've got a business, and you sent me junk mail, but you made it look like some official government thing to get me to open it? I'm done, just because you lied on the envelope. I don't care how badly I need your service. There's a dozen other places that can provide it; I'll pick one of them rather than you, because you've shown yourself to be dishonest right out of the gate.

Same thing with an AI (or a business that creates an AI). You're willing to lie about who you are (or have your tool do so)? What else are you willing to lie to me about? I don't have time in my life for that. I'm out right here.

otterley · 2026-03-31T19:17:10 1774984630

Out of curiosity, given two code submissions that are completely identical—one written solely by a human and one assisted by AI—why should its provenance make any difference to you? Is it like fine art, where it’s important that Picasso’s hand drew it? Or is it like an instruction manual, where the author is unimportant?

Similarly, would you consider it to be dishonest if my human colleague reviewed and made changes to my code, but I didn’t explicitly credit them?

AnimalMuppet · 2026-03-31T19:27:04 1774985224

Why does the provenance make any difference? Let me increase your options. Option 1: You completely hand-wrote it. Option 2: You were assisted by an AI, but you carefully reviewed it. Option 3: You were assisted by an AI (or the AI wrote the whole thing), and you just said, "looks good, YOLO".

Even if the code is line-for-line identical, the difference is in how much trust I am willing to give the code. If I have to work in the neighborhood of that code, I need to know what degree of skepticism I should be viewing it with.

otterley · 2026-03-31T19:35:20 1774985720

That's the thing. As someone evaluating pull requests, should you trust the code based on its provenance, or should you trust it based on its content? Automated testing can validate code, but it can't validate people.

ISTM the most efficient and objective solution is to invest in AI more on both sides of the fence.

AnimalMuppet · 2026-03-31T20:52:46 1774990366

In the future, that may be fine. We're not in that future yet. We're still at a place where I don't fully trust AI-only code to be as solid as code that is at least thoroughly reviewed by a knowledgeable human.

(Yes, I put "AI-only" and "knowledgeable" in there as weasel words. But I think that with them, it is not currently a very controversial case.)

feature20260213 · 2026-03-31T19:55:24 1774986924

Yes, because you can be sued for copyright violation if you don't know the origin of one, but not the other.

otterley · 2026-03-31T20:17:40 1774988260

As an attorney, I know copyright law. (This is not legal advice.) There's nothing about copyright law that says you have to credit an AI coding agent for contributing to your work. The person receiving the code has to perform their due diligence in any case to determine whether the author owns it or has permission from the owner to contribute it.

hajile · 2026-03-31T21:08:35 1774991315

Can you back this up with legal precedent? To my knowledge, nothing of the sort has been ruled on by the courts.

Additionally, this raises another big issue. A few years ago, a couple of guys used software (what you could argue was a primitive AI) to generate around 70 billion unique pieces of music, which amounts to essentially every piece of copyrightable music using standard music scales.
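A quick sanity check on the "around 70 billion" figure. This assumes (the comment does not say so) that the software enumerated every melody of 12 notes, each drawn from the 8 pitches of a diatonic octave; under that assumption the count is:

```latex
8^{12} = \left(2^{3}\right)^{12} = 2^{36} = 68{,}719{,}476{,}736 \approx 6.9 \times 10^{10}
```

A different melody length or pitch set would change the total, so treat this only as a plausibility check.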

Is the fact that they used software to develop this copyrighted material relevant? If not, then their copyright should certainly be legal and every new song should pay them royalties.

It seems that using a computer to generate results MUST be added as an additional bit of analysis when it comes to infringement cases and fair use, if not as a more fundamental acknowledgement that computer-generated content falls into a different category (I'd imagine the real argument would be over how much of the input was human vs. how much was the system).

Of course, this all sets aside the training of AI on copyrighted works. As it turns out, AI can regurgitate large sections of copyrighted works verbatim (up to 80% according to this study[0]), showing that they are, in point of fact, outright infringing on those copyrights. Do we blow up current AI to maintain the illusion of copyright, or blow up current copyright law to preserve AI?

[0] https://arxiv.org/pdf/2603.20957

otterley · 2026-03-31T21:24:24 1774992264

You're asking a lot of very good and thoughtful questions, but none are directly related to the immediate issue, which is "do I have to credit the AI model?".

To begin to answer your questions, I would suggest you study the Copyright Office's report (which is also not law, but their guidance for laypeople as written by their staff lawyers) at https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

simianwords · 2026-03-31T18:49:24 1774982964

What’s the lie? It’s just asking to not reveal internal names

BoredPositron · 2026-03-31T19:29:46 1774985386

You are spamming the whole fucking thread with the same nonsense. It is instructed to hide that the PR was made via Claude Code. I don't know why people who are as AI-forward as yourself have such a problem with telling people that they use AI for coding/writing; it's a weirdly insecure look.

simianwords · 2026-03-31T19:33:43 1774985623

I can do that right now with Claude Code without this undercover mode. In fact, I do it many times at work. What's the big deal?

Do you not think it is an overreaction to panic like this if I can do exactly what the undercover mode does by simply asking Claude?

BoredPositron · 2026-03-31T19:51:55 1774986715

It's different if it's an institutional decision versus a personal one, like in your case. Which, and I am repeating myself here, is borderline insecure.

simianwords · 2026-03-31T19:54:23 1774986863

what's insecure about it? if it is up to the institution to make that decision - you can still do it. Claude is not stopping you from making that decision

BoredPositron · 2026-03-31T20:00:42 1774987242

You have to work on your reading comprehension, or you are being intentionally deceptive. Bye.

simianwords · 2026-03-31T21:25:39 1774992339

?? why doesn't your panic apply to other agents like Codex that don't advertise that the commit was made by an AI by default? strange!

BoredPositron · 2026-03-31T22:14:47 1774995287

Because this thread is about claude. Are you that challenged?

anonymoushn · 2026-03-27T12:44:30 1774615470

are those tools known for their fast json parsers?

rennokki · 2026-04-01T09:07:18 1775034438

If we talk about TB or PB+ scales, then yes.

anonymoushn · 2026-04-01T15:18:39 1775056719

Oh, can you post some benchmarks? I didn't know that parser throughput per core would change with the amount of data like that.
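One way to put numbers on this is to time a single parser on inputs of increasing size and report MB/s on one core. The sketch below assumes Python's stdlib `json` as the parser under test, and `make_payload` generates hypothetical synthetic records. If per-core throughput really is size-independent, the MB/s column should stay in the same ballpark as the input grows, apart from cache and allocation effects.

```python
import json
import time


def make_payload(n_records: int) -> bytes:
    # Hypothetical synthetic records; real workloads will have different shapes.
    records = [
        {"id": i, "name": f"user{i}", "tags": ["a", "b"], "score": i * 0.5}
        for i in range(n_records)
    ]
    return json.dumps(records).encode("utf-8")


def throughput_mb_per_s(payload: bytes, repeats: int = 5) -> float:
    """Best-of-N single-core parse throughput of the stdlib json module, in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        json.loads(payload)  # json.loads accepts UTF-8 bytes directly
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1e6


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        payload = make_payload(n)
        print(f"{len(payload) / 1e6:8.2f} MB -> {throughput_mb_per_s(payload):8.1f} MB/s")
```

A faster parser (simdjson bindings, etc.) would slot in where `json.loads` is called; the measurement loop stays the same.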

anonymoushn · 2026-02-17T19:54:39 1771358079

ideally users could be banned for posting LLM outputs as if they were authored by humans https://www.pangram.com/history/49335ddf-118d-43e4-9340-a58a...

irickt · 2026-02-17T20:41:14 1771360874

I think not "ideally" in any case. Rather "practically" could be banned, for what badness?

It doesn't claim it was authored by humans. It is clearly the work product of a human who is transparently using AI.

The work product, if it works as claimed, is rather amazing. Maybe even an inflection point in AI use, if it proves sustainable.

rmsaksida · 2026-02-17T21:27:14 1771363634

The generated post is even in the repo: https://github.com/christopherkarani/Wax/blob/main/SHOW_HN_P...

anonymoushn · 2026-01-17T06:41:57 1768632117

Hello, the part about canonical filtering in https://openreview.net/pdf?id=DFybOGeGDS doesn't seem to try to account for pretokenization. For example, if you receive " 天天中彩票APP" in o200k, it means there has to be a lowercase letter within the span of letters, and while tokens like (4 spaces) may be pairwise compatible with tokens like "123" according to the BPE merge rules, the pretokenizer would split the span of spaces to give (3 spaces), " ", "123" instead. Are you aware of any work that does actual canonical generation for models with this kind of pretokenization regex?
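To make the whitespace example concrete, here is a minimal sketch of the check being asked about: a token sequence can only be canonical if no token straddles a pretoken boundary of the joined text. The regex is a simplified, hypothetical stand-in for a GPT-style pretokenizer (letters approximated with `[^\W\d_]`, numbers with `\d{1,3}`); it is NOT the real o200k pattern, which also has case-aware letter rules and contraction handling, so it does not model the " 天天中彩票APP" case. It does reproduce the spaces-before-digits split from the comment.

```python
import re

# Simplified approximation of a GPT-style pretokenization regex (assumption:
# not the real o200k pattern; no case rules, no contractions).
PRETOKEN_RE = re.compile(
    r"\d{1,3}"              # up to three digits, no leading space
    r"| ?[^\W\d_]+"         # optional space + a run of letters
    r"| ?(?:[^\s\w]|_)+"    # optional space + punctuation/symbols
    r"|\s+(?!\S)"           # whitespace not followed by a non-space char
    r"|\s+"                 # any remaining whitespace
)


def pretokenize(text: str) -> list[str]:
    return PRETOKEN_RE.findall(text)


def respects_pretokenization(tokens: list[str]) -> bool:
    """True if every pretoken boundary of the joined text is also a token boundary.

    This is only a necessary condition for canonicality: within each pretoken,
    the BPE merge order must also reproduce the given split.
    """
    token_ends, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        token_ends.add(pos)
    pos = 0
    for piece in pretokenize("".join(tokens)):
        pos += len(piece)
        if pos not in token_ends:
            return False
    return True


print(pretokenize("    123"))                         # ['   ', ' ', '123']
print(respects_pretokenization(["    ", "123"]))      # False
print(respects_pretokenization(["   ", " ", "123"]))  # True
```

A canonical-generation scheme would apply a check like this as a mask during decoding, in addition to the pairwise BPE-merge compatibility the paper considers.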

anonymoushn · 2026-01-13T13:09:45 1768309785

use claude code if you want to use opus

anonymoushn · 2026-01-12T16:24:50 1768235090

what does "logprobs look off" mean

blixt · 2026-01-12T17:02:03 1768237323

If the immediate next token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out of distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. But examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc. And sometimes using very rare tokens AKA "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).
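A minimal sketch of that sanity check: turn the returned next-token logprobs into a normalized entropy, where 1.0 means perfectly flat over the candidates. This assumes the API hands back only the top-k candidates (so flatness is measured over those alone), and the 0.9 cutoff in `looks_flat` is a hypothetical value, not a standard one.

```python
import math


def normalized_entropy(top_logprobs: dict[str, float]) -> float:
    """Entropy of the renormalized top-k distribution divided by log(k).

    1.0 means uniform over the returned candidates; values near 0 mean the
    model is confident in a single token.
    """
    k = len(top_logprobs)
    if k < 2:
        return 0.0
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize over the top-k only
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(k)


def looks_flat(top_logprobs: dict[str, float], threshold: float = 0.9) -> bool:
    # threshold is a hypothetical cutoff; tune it for your model and k
    return normalized_entropy(top_logprobs) >= threshold


uniform = {t: math.log(0.25) for t in ("a", "b", "c", "d")}
peaked = {"the": math.log(0.97), "a": math.log(0.01), "an": math.log(0.01), "this": math.log(0.01)}
print(looks_flat(uniform), looks_flat(peaked))  # True False
```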

anonymoushn · 2026-01-06T12:39:22 1767703162

Hello, a couple years ago I participated in a contest to count word frequencies and generate a sorted histogram. There's a cool post about it featuring a video discussing the tricks used by some participants. https://easyperf.net/blog/2022/05/28/Performance-analysis-an...

Some other participants said that they measured 0 difference in runtime between pshufb+eq and eqx3+orx2, but I think your problem has more classes of whitespace, and for the histogram problem, considerations about how to hash all the words in a chunk of the input dominate considerations about how to obtain the bitmasks of word-start or word-end positions.
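For readers unfamiliar with the bitmask approach: each chunk of input is classified into a whitespace mask, and word starts fall out of one shift and a couple of bitwise ops. The sketch below is a scalar Python stand-in for what SIMD code does per register. It uses the "eqx3+orx2" variant (three equality compares against assumed whitespace bytes, ORed together) rather than a pshufb lookup, and the `prev_was_ws` carry between chunks is this sketch's own convention.

```python
WHITESPACE = (ord(" "), ord("\n"), ord("\t"))  # assumed whitespace classes


def whitespace_mask(chunk: bytes) -> int:
    """Bit i is set if chunk[i] is whitespace (scalar stand-in for eq x3 + or)."""
    mask = 0
    for c in WHITESPACE:          # one "compare-equal" mask per whitespace byte...
        eq = 0
        for i, b in enumerate(chunk):
            if b == c:
                eq |= 1 << i
        mask |= eq                # ...ORed together into one mask
    return mask


def word_start_mask(chunk: bytes, prev_was_ws: bool = True) -> tuple[int, bool]:
    """Return (bitmask of word-start positions, whether the chunk ends in whitespace).

    prev_was_ws carries state across chunks; the start of input counts as whitespace.
    """
    full = (1 << len(chunk)) - 1
    ws = whitespace_mask(chunk)
    prev_ws = ((ws << 1) | int(prev_was_ws)) & full  # "previous byte is whitespace"
    starts = ~ws & prev_ws & full                    # non-space byte after a space
    last_is_ws = bool((ws >> (len(chunk) - 1)) & 1) if chunk else prev_was_ws
    return starts, last_is_ws


starts, carry = word_start_mask(b"hi there  you")
print(bin(starts), carry)  # 0b10000001001 False  (words start at bytes 0, 3, 10)
```

Word-end masks come out the same way with a right shift, and the carry lets a word that spans two chunks be counted once.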

stabbles · 2026-01-06T14:01:56 1767708116

Awesome! The slides with roofline analysis are great! https://docs.google.com/presentation/d/16M90It8nOK-Oiy7j9Kw2...

anonymoushn · 2026-01-03T16:32:11 1767457931

requires fully deterministic inference, which turns out to be unusual, but for this sort of thing it's probably fine if you do really slow inference on cpu. cool idea.
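A small illustration of why deterministic inference is unusual: floating-point addition is not associative, so a reduction (for example the sums inside a matmul or softmax) that runs in a different order, as parallel GPU kernels may, can return a slightly different value, and a slightly different logit can flip the chosen token. This is a general sketch, not a claim about any specific inference stack; `reduction_results` is a hypothetical helper.

```python
import random

print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6


def reduction_results(values: list[float], trials: int = 100, seed: int = 0) -> set[float]:
    """Sum the same floats in many random orders and collect the distinct totals."""
    rng = random.Random(seed)
    results = set()
    for _ in range(trials):
        order = values[:]
        rng.shuffle(order)        # emulate a different reduction order
        total = 0.0
        for v in order:
            total += v
        results.add(total)
    return results


print(len(reduction_results([0.1, 0.2, 0.3])) > 1)
```

Pinning a single reduction order (e.g. single-threaded CPU inference) removes this source of variation, which is why slow CPU inference is the easy way out.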

anonymoushn · 2025-12-21T04:50:40 1766292640

please write your own posts from now on

anonymoushn · 2025-12-16T22:29:49 1765924189

i love stemming, i love searching for "anime" and getting "animal"
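The complaint is the classic over-stemming failure: strip suffixes aggressively and unrelated words collapse to the same stem. The toy suffix stripper below is hypothetical and far cruder than real stemmers such as Porter or Snowball, but it shows how a stemmed index ends up returning "animal" for "anime".

```python
# Hypothetical toy suffix-stripping stemmer, for illustration only.
SUFFIXES = ("ing", "al", "es", "s", "e")


def toy_stem(word: str, min_stem: int = 3) -> str:
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest suffix first
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def matches(query: str, document_words: list[str]) -> list[str]:
    """Return the document words whose stem equals the query's stem."""
    q = toy_stem(query)
    return [w for w in document_words if toy_stem(w) == q]


print(toy_stem("anime"), toy_stem("animal"))               # anim anim
print(matches("anime", ["anime", "animal", "animation"]))  # ['anime', 'animal']
```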