Hacker News

As part of my consulting, I've stumbled upon this issue in a commercial context. A SaaS company that has open-sourced the mobile apps of its platform approached me with the following concern.

One of their engineers was able to recreate their platform by letting Claude Code reverse engineer their apps and the web frontend, creating an API-compatible backend that is functionally identical.

It took him a week, working after hours. It's not as stable, the unit tests need more work, the code has some unnecessary duplication, and hosting isn't fully figured out, but the end-to-end test harness is even more stable than their own.

"How do we protect ourselves against a competitor doing this?"

Noodling on this at the moment.



You're not describing anything new, you're describing progress. A company invests time, money, and expertise into building a product, it becomes established, people copy it in 1/10th of the time, and the quality of products across the industry improves. Long before generative AI, Instagram famously copied Snapchat's stories concept in a weekend, and that is now a multi-multi-multi-billion contributor to Meta's bottom line.

As engineers, we often think only about code, but code has never been what makes a business succeed. If your client thinks that their business's primary value is in the mobile app code they wrote, 1) why is it even open source? 2) the business is doomed.

Realistically, though, this is inconsequential, and any time spent worrying about this is wasted time. You don't protect yourself from your competitor by worrying about them copying your mobile app.


> You don't protect yourself from your competitor by worrying about them copying your mobile app.

They did not copy the mobile app. They copied the service.


Replace “mobile app” with “backend” in my comment.


You might be interested in the dark factory work here https://factory.strongdm.ai/

They do something very similar for some of their work. It’s hard to use external services, so they replicate them, and the cost of doing so has come down from “don’t be daft, we can’t reimplement Slack and Google Drive this sprint just to make testing faster” to realistic. They run the SDKs against the live services and their own implementations until they don’t see behaviour differences. Now they have a fast Slack and Drive and more (that do everything they need for their testing), accelerating other work.

I’m dramatically shifting my concept of what’s expensive and what’s not in development. What you’re describing could have been done by someone before, but the difficulty of building that backend has dropped enormously. Even if the application were closed source, you could probably, either now or soon, do the same thing by working back to the core user stories and building the app as well.

You can view some of this as treating things like the application as a very precise specification.

Really fascinating moment of change.
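
To make the "run against both until you don't see behaviour differences" idea concrete, here is a minimal sketch of that kind of differential test. It is not their actual tooling: `ReferenceStore`, `ReplicaStore`, `observe`, and `find_differences` are hypothetical names, and a tiny key-value store stands in for the live service so the example runs on its own.

```python
class ReferenceStore:
    """Hypothetical stand-in for the live third-party service."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value
        return "ok"

    def get(self, key):
        return self._items.get(key, "not_found")


class ReplicaStore:
    """Hypothetical re-implementation under test, with a deliberate bug in get()."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value
        return "ok"

    def get(self, key):
        # Bug: a missing key yields "" instead of "not_found".
        return self._items.get(key, "")


def observe(service, method, args):
    """Run one call and capture either its result or the exception type."""
    try:
        return ("result", getattr(service, method)(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_differences(reference, replica, calls):
    """Replay the same call sequence against both and record every mismatch."""
    differences = []
    for step, (method, args) in enumerate(calls):
        expected = observe(reference, method, args)
        actual = observe(replica, method, args)
        if expected != actual:
            differences.append(
                {"step": step, "call": (method, args),
                 "expected": expected, "actual": actual}
            )
    return differences


calls = [("put", ("a", "1")), ("get", ("a",)), ("get", ("missing",))]
diffs = find_differences(ReferenceStore(), ReplicaStore(), calls)
for diff in diffs:
    print(diff)  # only the lookup of the missing key (step 2) differs
```

In an agentic loop, the list of differences would presumably be fed back to the model as the next thing to fix, and the loop stops when the list is empty for a broad enough set of call sequences.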


> It’s hard to use external services

I think it's interesting to add what they use it for and why it's hard.

What they use it for:

- It's about automated testing against third party services.

- It's not about replicating the product for end users

Why using external services is hard/problematic:

- Performance: They want super fast feedback cycles in the agentic loop, which means in-memory tests. So they let the AI write full in-memory simulations of (for example) the Slack API that are behaviorally equivalent for their use cases.

- Feasibility: The sandboxes offered by these services usually have usage limits (= number of requests per month, etc.) that would easily be exhausted if attached to a test harness that runs every other minute in an automated BDD loop.
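
As a rough illustration of the in-memory simulation idea, here is a minimal sketch. It assumes a made-up chat interface: `InMemoryChatService`, `post_message`, `history`, and `notify_deploy` are hypothetical names and do not mirror Slack's real API. The point is only that the code under test talks to a fake that lives in process memory, so no network call or request quota is involved.

```python
import itertools


class InMemoryChatService:
    """Hypothetical in-memory stand-in for a chat API (not Slack's real interface)."""

    def __init__(self):
        self._channels = {}              # channel name -> list of message dicts
        self._ids = itertools.count(1)   # monotonically increasing message ids

    def post_message(self, channel, text):
        if not text:
            return {"ok": False, "error": "empty_text"}
        message = {"id": next(self._ids), "text": text}
        self._channels.setdefault(channel, []).append(message)
        return {"ok": True, "message": message}

    def history(self, channel):
        if channel not in self._channels:
            return {"ok": False, "error": "channel_not_found"}
        return {"ok": True, "messages": list(self._channels[channel])}


def notify_deploy(chat, channel, version):
    """Code under test: it only depends on the post_message method."""
    return chat.post_message(channel, f"Deployed {version}")["ok"]


chat = InMemoryChatService()
sent = notify_deploy(chat, "releases", "1.4.2")
messages = chat.history("releases")["messages"]
print(sent, messages)
```

Whether a fake like this is "behaviorally equivalent" still has to be checked against the real service for the cases you care about; the fake only makes the inner test loop fast.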


> "How do we protect ourselves against a competitor doing this?"

If the platform is so trivial that it can be reverse engineered by an AI agent from a dumb frontend, what's there to protect against? One has to assume that their moat is not that part of the backend but something else entirely about how the service is being provided.


Interesting case. IANAL, but it sounds legal and legit. The AI did not have exposure to the backend it re-implemented. The API itself is public and not protectable.


OTOH as of yesterday the output of the LLM isn't copyrightable, which makes licensing it difficult


As others have pointed out, this case is really about refusing to allow an LLM to be recognised as the author. The person using the LLM waived any right to be recognised as the author.

It's also US-only. Other countries will differ. This means you can only rely on this ruling at all for something you are distributing only in the US. Might be OK for art, definitely not for most software. Very definitely not OK for a software library.

For example UK law specifically says "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."

https://www.legislation.gov.uk/ukpga/1988/48/section/9


> The person using the LLM waived any right to be recognised as the author.

They can't waive their liability from being identified as an infringer though.


> the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

This seems extremely vague. One could argue that any part of the pipeline counts as an "arrangement necessary for the creation of the work", so who is the author? The prompter, the creator of the model, or the creator of the training data?


The courts will have to settle that according to circumstances. I think it is likely to be the prompter, and in some cases the creator of the training data as well. The creator of the model will have copyright on the model, but unlikely to have copyright on its outputs (any more than the writer of a compiler has copyright on its output).


I wrote this comment on another thread earlier, but it seems relevant here, so I'll just c/p:

I don't think we've even begun to consider all the implications of this, and while people ran with that one case where someone couldn't copyright a generated image, it's not that easy for code. I think there needs to be way more litigation before we can confidently say it's settled.

If "generated" code is not copyrightable, where do we draw the line on what generated means? Do macros count? Does code that generates other code count? Protobuf?

If it's the tool that generates the code, again where do we draw the line? Is it just using 3rd party tools? Would training your own count? Would a "random" code gen where you pick the winners (by whatever means) count? Would brute-forcing the whole space (silly example, but hey, we're in silly space here) count?

Is it just "AI"-adjacent output that isn't copyrightable? If so, how do you define AI? Does autocomplete count? Intellisense? Smarter intellisense?

Are we gonna have to have a trial where there's at least one lawyer making silly comparisons between LLMs and power plugs? Or maybe counting abacuses (abaci?)... "But your honour, it's just random numbers / matrix multiplications..."


In terms of adoption, "it's not settled" is even worse


Maybe we should build an LLM that can be the judge of that :)


That's a very incorrect reading.

AI can't be the author of the work. The human driving the AI can, unless they zero-shotted the solution with no creative input.


Only the authored parts can be copyrighted, and only humans can author [0].

"For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology—not the human user."

"In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that 'the resulting work as a whole constitutes an original work of authorship.'"

"Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself."

IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

[0]: https://www.federalregister.gov/d/2023-05321/p-40


> IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

Actually this is very much how people think for code.

Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of said code. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working code generated by Claude to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.


No software company thinks this, not Oracle, not Google, not Meta, no one. See: the guy they sued for taking things to Uber.


The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.


(the person you originally replied to is also me, tl;dr: I think engineers don't think they're authoring, but companies do)

The core feature of generative AI is the human isn't the author of the output. Authoring something and generating something with generative AI aren't equivalent processes; you know this because if you try and get a person who's fully on board w/ generative AI to not use it, they will argue the old process isn't the same as the new process and they don't want to go back. The actual output is irrelevant; authorship is a process.

But, to your point, I think you're right: companies super think their engineers have the rights to the output they assign to them. If it wasn't clear before it's clear now: engineers shouldn't be passing off generated output as authored output. They have to have the right to assign the totality of their output to their employer (same as using MIT code or whatever), so that it ultimately belongs to them or they have a valid license to use it. If they break that agreement, they break their contract with the company.


(oops, I didn't check the usernames properly, sorry about that)

I still don't think this is fully accurate.

The view I'm noticing is that people consider that they have a right to the programs they produce, regardless of whether they are writing them by hand or by prompting an LLM in the right ways to produce that output. And this remains true both for work produced as an employee/company owner, and for code contributed to an OSS project.

Also, as an employee, the relationship is very different. I am hired to produce solutions to problems my company wants resolved. This may imply writing code, finding OSS code, finding commercial code that we can acquire, or generating code. As part of my contract, I relinquish any rights I may have to any of this code to the company, and of course I commit to not use any code without a valid license. However, if some of the code I produce for the company is not copyrightable at all, that is not in any way in breach of my contract - as long as the company is aware of how the code is produced and I'm not trying to deceive them, of course.

In practice, at least in my company, there has been a legal analysis and the company has vetted a certain suite of AI tools for use for code generation. Using any other AI tools is not allowed, and would be a breach of contract, but using the approved ones is 100% allowed. And I can guarantee you that our lawyers would assert copyright to any of the code generated in this way if I was to try to publish it or anything of the kind.


Every contract I've seen has some clause where the employee affirms they have the right to assign the rights to their output (code, etc) to the company.

I'm not really convinced; I think if I vibe code an app, and you vibe code an app that's very, very similar, and we're both AI believers, we probably both go "yup, AI is amazing; copyright is useless." You know this because people are actively trying to essentially un-GPL things with vibe coding. That's not authoring, that's laundering, and people only barely argue about it. See: this chardet situation, where the guy was like "I'm intimately familiar with the codebase, I guided the LLM, and I used GPL code (tests and API definitions, which are all under copyright) to ensure the new implementation behaved very similarly to the old one." Anything in the new codebase is either GPL'd or LLM generated, which according to the copyright office, isn't copyrightable. If he's right, nothing prevents me from doing the exact same thing to make a new public domain chardet. It's facially absurd.


So if I want to publish a project under some license and I put a comment in an AI generated file (never mind what I put in the comment), how do you go about proving which portion of that file is not protected under copyright?

If the AI code isn't copyrightable, I don't have any obligations to acknowledge it.


You're looking at this as the infringer rather than the owner. How do you as a copyright owner prove you meaningfully arranged the work when you want to enforce your copyright?


I was looking at it from the perspective of an owner who simply wants to discourage use outside of some particular license.

There's close enough to zero enforcement of infringement, it's all self policing or violation.


Copyright office says this has to be done case-by-case. My guess is they'd ask to see prompts and evidence of authorship.


The human is still at best a co-author, as the primary implementation effort isn't theirs. And I think effort involved is the key contention in these cases. Yesterday ideas were cheap, and it was the execution that matters. Today execution is probably cheaper than ideas, but things should still hold.


No, effort is explicitly not a factor in copyright. It was at one point, but "sweat of the brow" doctrine went away in Feist Publications in 1991, at least in the US.


That's not really what the ruling said. Though, I suspect this type of "vibe rewrite" does fall afoul of the same issue.

But for this type of copyright laundering, it doesn't really matter. The goal isn't really about licensing it, it's about avoiding the existing licence. The idea that the code ends up as public domain isn't really an issue for them.


As of yesterday?



No serious enterprise SaaS company differentiates themselves solely on the product (the products are usually terrible). It's the sales channel, the fact that you know how to bill a big company, the human engineer who is sent on site to deploy and integrate the product, the people on the support line 24/7, the regulatory framework that ensures the customer can operate legally and obtain insurance, the fact that there's a deep pool of potential hires who have used and understand the product. Those are the differentiators.


> "How do we protect ourselves against a competitor doing this?"

You can try patenting; but not after the fact. Copyright won't help you here. You can't copyright an algorithm or idea, just a specific form or implementation of it. And there is a lot of legal history about what is and isn't a derivative work here. Some companies try to forbid reverse engineering in their licensing. But of course that might be a bit hard to enforce, or prove. And it doesn't work for OSS stuff in any case.

Stuff like this has been common practice in the industry for decades. Most good software ideas get picked apart, copied and re-implemented. IBM's BIOS for the first PC quickly got reverse engineered and then other companies started making IBM-compatible PCs. IBM never open sourced their BIOS and they probably did not intend for that to happen. But that didn't matter. Likewise there were several PC-compatible DOS variants that each could (mostly) run the same applications. MS never open sourced DOS either. There are countless examples of people figuring out how stuff works and then creating independent implementations. All that is perfectly legal.


IBM never open sourced their BIOS, but they did publish complete source code listings:

https://bitsavers.org/pdf/ibm/pc/pc/6025008_PC_Technical_Ref...

https://bitsavers.org/pdf/ibm/pc/xt/1502237_PC_XT_Technical_...

https://bitsavers.org/pdf/ibm/pc/at/1502494_PC_AT_Technical_...

Between this and the fact that their PC-DOS (née MS-DOS) license was nonexclusive, I'm honestly not sure what they expected to happen.

The nature of early IBM PC advertising suggests to me that they expected the IBM name and established business relationships to carry as much weight as the specifications themselves, and that "IBM PC compatible" systems would be no more attractive than existing personal computers running similar if not identical third-party software (PC-DOS wasn't the only example of IBM reselling third-party software under nonexclusive license), and would perhaps even lead to increased sales of first-party IBM PCs.

Which, in fact, they did, leading me to believe the actual result may have been not too far from their original intent, only with IBM capturing and holding a larger share of the pie.


If your backend is trivial enough to be implemented by a large language model, what value are you providing?

I know it's a provocative question, but the answer to it explains why a competitor is not a competitor.


I suspect you're underestimating the capabilities of today's LLMs.


> "How do we protect ourselves against a competitor doing this?"

I have been thinking about this a lot lately, as someone launching a niche b2b SaaS. The unfortunate conclusion that I have come to is: have more capital than anyone for distribution.

Is there any other answer to this? I hope so, as we are not in the well-capitalized category, but we have friendly user traction. I think the only possible way to succeed is to quietly secure some big contracts.

I had been hoping to bootstrap, but how can we in this new "code is cheap" world? I know it's always been like this, but it is even worse now, isn't it?


Maybe a better question is:

How do our competitors protect themselves against us doing this?


Particularly if you're named "Google", "Amazon", "Microsoft", or "Apple".


I think the genie is out of the bottle on this one and there's really no putting it back.

There is a certain amount of brand loyalty and platform inertia that will keep people. Also, as you point out, just having the source code isn't enough. Running a platform is more than that. But that gap will narrow with time.

The broader issue here is that there are people in tech who don't realize that AI is coming for their jobs (and companies) too. I hope people in this position can maybe understand the overall societal issues for other people seeing their industries "disrupted" (ie destroyed) by AI.


The famous Google v. Oracle case may need to be re-evaluated in light of agents making API implementation trivial.

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....


"How do we protect ourselves against a competitor doing this?"

That's the neat thing: you don't!


> "How do we protect ourselves against a competitor doing this?"

DMCA. The EULA likely prohibits reverse engineering. If a competitor does that, hit 'em with lawyers.

Or, if you want to be able to sleep at night, recognize this as an opportunity instead of a threat.


What about jurisdictions where reverse engineering is an inalienable right?


Which are those?


Afaik, the EU and Russia say that observing/experimenting with the external behavior of the program to determine its internal logic is legal.

Russia even allows decompiling object code if you need to solve private compatibility issues.


Even in the US, are there any non-DRM examples where reverse engineering for the purpose of interoperability in violation of a license agreement has been used as the basis for copyright claims, even when the results are incorporated into a competing product?

For example, I don't recall Microsoft ever being sued by WordPerfect or Lotus for reading and writing their applications' unpublished file formats, which wouldn't have necessarily involved disassembly or decompilation, but was still the result of reverse engineering that almost certainly involved using a licensed or unlicensed copy of the competitor's product.


Google LLC v. Oracle America, Inc. is also a relevant case, I suspect. Found for Google against Oracle's claim of copyright infringement, for non-clean-room RE of Java APIs:

<https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...>


Nothing. This is why SaaS stocks took a dump last week.


Makes me wonder when AI will put the mobile phone OS duopoly to an end.



