
The security of package managers is something we're going to have to fix.

Some years ago, in offices, computers were routinely infected or made unusable because the staff were downloading and installing random screen savers from the internet. The IT staff would have to go around and scold people not to do this.

If you've looked at the transitive dependency graphs of modern packages, it's hard not to feel we're doing the same thing.

In the linked piece, Russ Cox notes that the cost of adding a bad dependency is the sum of the cost of each possible bad outcome times its probability. But then he speculates that for personal projects that cost may be near zero. That's unlikely. Unless developers entirely sandbox projects with untrusted dependencies from their personal data, company data, email, credentials, SSH/PGP keys, cryptocurrency wallets, etc., the cost of a bad outcome is still enormous. Even multiplied by a small probability, it has to be considered.
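In symbols, just restating that sentence, with p_i the probability of bad outcome i and c_i its cost:

    E[cost] = \sum_i p_i \cdot c_i

Every p_i may be small, but some of the c_i (leaked SSH keys, a drained wallet) are large enough that the sum is nowhere near zero.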

As dependency graphs get deeper, this probability, however small, only increases.

One effect of lower-cost dependencies that Russ Cox did not mention is the increasing tendency for a project's transitive dependencies to contain two or more libraries that do the same thing. When dependencies were more expensive and consequently larger, there was more pressure for an ecosystem to settle on one package for a task. Now there might be a dozen popular packages for fancy error handling and your direct and transitive dependencies might have picked any set of them. This further multiplies the task of reviewing all of the code important to your program.

Linux distributions had to deal with this problem of trust long ago. It's instructive to see how much more careful they were about it. Becoming a Debian Developer involves a lengthy process of showing commitment to their values and requires meeting another member in person to show identification to be added to their cryptographic web of trust. Of course, the distributions are at the end of the day distributing software written by others, and this explosion of dependencies makes it increasingly difficult for package maintainers to provide effective review. And of course, the hassles of getting a library accepted into distributions are one reason for the popularity of tools such as Cargo, NPM, CPAN, etc.

It seems that package managers, like web browsers before them, are going to have to provide some form of sandboxing. The problem is the same. We're downloading heaps of untrusted code from the internet.



After using Go and Dart on a number of projects and using very few dependencies (compared to JavaScript projects), I'd say a good starting point is having a great standard library.

For example, it's a bit ridiculous that in 2019 we cannot decode a JWT with a simple browser API, we still need Moment for time and date operations, there is no observable type (a four-year-old proposal is still in draft stage), and there is still no native data binding.
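To make the JWT point concrete, this is roughly the helper everyone ends up hand-rolling today (the name is mine, purely illustrative, and it only decodes the payload without verifying the signature):

    // Hand-rolled JWT payload decoder, written because there is no built-in API.
    // No signature verification; decoding only.
    function decodeJwtPayload(token: string): unknown {
      const payload = token.split(".")[1];                          // header.payload.signature
      const base64 = payload.replace(/-/g, "+").replace(/_/g, "/"); // base64url -> base64
      return JSON.parse(atob(base64));                              // atob is a browser global
    }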

TC39 is moving too slowly, and that's one of the reasons NPM is so popular.


I mean, even all of those examples you listed aren't as crazy as the fact that you need a library to parse the cookie string and deal with individual cookies...
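For the record, this is the kind of thing you end up writing by hand; a rough sketch, not a spec-complete parser:

    // Minimal ad-hoc cookie parser. Real libraries also handle quoting,
    // attributes, and the spec's edge cases.
    function parseCookies(cookieString: string): Record<string, string> {
      const cookies: Record<string, string> = {};
      for (const pair of cookieString.split("; ")) {
        const eq = pair.indexOf("=");
        if (eq === -1) continue;
        cookies[decodeURIComponent(pair.slice(0, eq))] =
          decodeURIComponent(pair.slice(eq + 1));
      }
      return cookies;
    }

    // Usage: parseCookies(document.cookie)["session"]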


> I'd say a good starting point is having a great standard library.

True, and at this stage I am also wrestling with JavaScript dates and timezones, and I would be happy with even a mediocre standard library.
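For what it's worth, the closest thing the platform gives you today is Intl.DateTimeFormat, which can at least format in an arbitrary time zone; any arithmetic across zones is still on you (or a library):

    // Formatting a date in a specific time zone with the built-in Intl API.
    const fmt = new Intl.DateTimeFormat("en-GB", {
      timeZone: "America/New_York",
      year: "numeric", month: "short", day: "numeric",
      hour: "2-digit", minute: "2-digit", timeZoneName: "short",
    });
    console.log(fmt.format(new Date())); // e.g. "7 Mar 2019, 09:30 EST"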


> Becoming a Debian Developer involves a lengthy process of showing commitment to their values and requires meeting another member in person to show identification to be added to their cryptographic web of trust

At the very least. More often, people receive months of mentoring and meet in person.

> this explosion of dependencies makes it increasingly difficult for package maintainers to provide effective review

It makes packaging extremely time-consuming, and that's why a lot of things in Go and JavaScript are not packaged.

The project cares about security and compliance with licensing.


> ... the increasing tendency for a project's transitive dependencies to contain two or more libraries that do the same thing. When dependencies were more expensive and consequently larger, there was more pressure for an ecosystem to settle on one package for a task. Now there might be a dozen popular packages for fancy error handling and your direct and transitive dependencies might have picked any set of them.

It's not just a security problem. It also hampers composition, because when two libraries talk about the same concept in different "terms"/objects/APIs (because they rely on two different other libraries to wrap it), you have to write a bridge to make them talk to each other.

That's why large standard libraries are beneficial - they define the common vocabulary that third-party libraries can then use in their API surface to allow them to interoperate smoothly.


> The security of package managers is something we're going to have to fix.

Why the generalization? Lots of package managers have been serviceable for decades, with their security model based solely on verifying the maintainer's identity and clients deciding which maintainers to trust.

Look at the number of security advisories that are just 'malicious package': https://www.npmjs.com/advisories

This is of course an issue with all package managers, but it's the lack of trusted namespacing that makes it easy to fall for. (There are scopes, which sound similar, but the protection model of the scope name is currently unclear to me, and they're optional anyway.)

Compare to Maven, where a package prefix gets registered along with a cryptographic key, and only the key holder can upload under it to the central repo.

Sure, you still get malicious packages going around, but it's far easier not to fall for them, because it's significantly harder to get a user to download a random package from outside the namespaces they know.

> We're downloading heaps of untrusted code from the internet.

This is not something a package manager can fix; it's a culture problem. Even including a gist or something off CodePen is dangerous. A package manager cannot handle the 'downloading whatever' issue, and it's not reasonable to put that in its threat model, because no package manager maintainer can possibly guarantee that there is no malicious code in the repository, and it's not their role anyway. A package manager is there to get a package to you as it was published at a specific point in time, identified by its version, and its threat model should be people trying to publish packages under someone else's name.

Speaking of which, it took npm four years to stop people from publishing a package with new code under an existing version number: https://github.com/npm/npm-registry-couchapp/issues/148 - they eventually came to their senses, but heck, the whole Node.js ecosystem's gung-ho attitude is scary.


> Why the generalization? Lots of package managers have been serviceable for decades, with their security model based solely on verifying the maintainer's identity and clients deciding which maintainers to trust.

What happens when the maintainer of a package changes?

The big problem I see happening is maintainers getting burned out and abandoning their packages, and someone else taking over. You might trust the original maintainer, but do you get notified of every change in maintainer?


> The security of package managers is something we're going to have to fix.

Companies that care about this already have dependency policies in place. The companies that don't care so much about security already have an approach they will employ if a significant threat is revealed: spend the time and money to fix it then.

It's a herd approach. Sheep and cattle band together because there's strength in numbers and the wolves can only get one or two at a time. It's extremely effective at safeguarding most of the flock.


>Companies that care about this already have dependency policies in place. The companies that don't care so much about security already have an approach they will employ if a significant threat is revealed: spend the time and money to fix it then.

I think that probably the majority of companies actually fall into a third group: those who don't really care enough about this, but also don't really have a good policy for dealing with it.


> It's instructive to see how much more careful they were about it.

"Much more careful" would have been a requirement to consult upstream on all patches that are beyond the maintainer's level of expertise. Especially so for all patches that potentially affect the functioning of cryptographic libraries.

Debian has already had a catastrophe (the 2008 OpenSSL key-generation bug, introduced by a maintainer patch) to show the need for such a guideline. Do they currently have such a guideline?

If not, it's difficult to see the key parties as anything more than security theatre.

Edit: clarification


> If you've looked at the transitive dependency graphs of modern packages, it's hard to not feel we're doing the same thing.

Are there any tools to do this visually? I mean visualizing the graph and interacting with it.


Yeah, you can see them all really easily here:

https://anvaka.github.io/pm/#/galaxy/npm?cx=-1345&cy=-7006&c...


Dang, that's _beautiful_.


> The security of package managers is something we're going to have to fix.

Inclusiveness, and the need for Jeff Freshman and Jane Sophomore to have a list of 126 GitHub repos before beginning their application process for an intern job, are at odds with having vetted entities as package providers.

When I was developing Eclipse RCP products, I had three or five entities that provided signed packages I used as dependencies.

Plus: with npm, you even have tooling dependencies, so the formerly theoretical threat of a malicious compiler injecting malware is now the sad reality[0].

I'm not claiming the "old way" is secure, but the "new way" is insecure by design and by policy (inclusiveness, gatekeeping treated as a fireable offense).

[0] I have tooling dependencies in Gradle and Maven too, but again, these are from large vendors and not from some random resume-padding GitHub user.


I'm a big fan of kitchen-sink frameworks for this reason. Whenever I want to do something in JS, the answer is to install a package for it. When I want to do something in Rails, the answer is it's built in. I have installed far, far fewer packages for my back end than for the front end, and the back end is vastly more complex.


Same here.

Turbo Vision, Clipper, OWL, MFC, Forms, JEE, ASP.NET, Liferay, Sitecore,...

I don't have the patience to play the hunt-the-dependency game, followed by getting all of them to play well together.


Here is some recent research where we are trying to solve the security and update issues in Rust/Cargo:

https://pure.tudelft.nl/portal/files/46926997/main2.pdf

TLDR: it boils down to analysing dependencies at the level of the callgraph; but building those callgraphs isn't easy. The benefit in the security use case is ~3x increased accuracy when identifying vulnerable packages (by eliminating false positives).
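Not the paper's implementation, but the core idea in a few lines: instead of flagging every project that merely depends on a vulnerable package, flag only those whose code can actually reach a vulnerable function in the call graph (sketch in TypeScript, names made up):

    // caller -> callees, using fully-qualified function names
    type CallGraph = Map<string, string[]>;

    // Does any entry point transitively call a known-vulnerable function?
    function reachesVulnerable(
      graph: CallGraph,
      entryPoints: string[],
      vulnerable: Set<string>,
    ): boolean {
      const seen = new Set(entryPoints);
      const queue = [...entryPoints];
      while (queue.length > 0) {
        const fn = queue.shift()!;
        if (vulnerable.has(fn)) return true;
        for (const callee of graph.get(fn) ?? []) {
          if (!seen.has(callee)) {
            seen.add(callee);
            queue.push(callee);
          }
        }
      }
      return false; // the dependency is present, but the vulnerable code is unreachable
    }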


This right here is why Go's 'statically link everything' approach is going to become a big problem in the long run, when old servers are running that software and no one has the source code anymore.


I find that Go developers also tend to have a 'vendor everything' philosophy, which significantly decreases that risk.

At least the ones I meet tend to take that approach...


If anything, it hugely increases the risk of vulnerable libraries being built into the binary and forgotten.

Same with containers. Statistics clearly show the security impact.


I don't see how that's true. In both worlds, a developer has to take manual action to review published vulnerabilities, track down the affected repos they own, and upgrade the dependencies.


No: with dynamic linking, and especially with Linux distributions, most of the work is automated and the patching is done by the distribution security team.

The time to write a patch and deliver it to running systems goes down to days or, more often, hours.


Do you have a reference to such statistics? I'd love to use them as yet another reason against vendoring dependencies when I talk to colleagues.


https://blog.acolyer.org/2017/04/03/a-study-of-security-vuln...

Cautiously posting that link, because I'm not against vendoring. You just need a process around keeping your dependencies up to date / refreshed automatically. The ability to vendor is one thing, how you use it is another.


Thanks!

I agree with your statement, but what I usually see in real life is that once dependencies are vendored in, they never change.


> You just need a process around keeping your dependencies up to date / refreshed automatically

That's what dynamic linking and Linux distributions are for.


> when old servers are running that software and no one has the source code anymore

If only there was some kind of "open" source, so that it could be included along with the binaries. :P

I think the bigger problem is all the bootstrapping issues where you need "living" build systems to build more build systems etc.


It would be nice if our compilers had the ability to directly incorporate the source code into the binary in some standard way. E.g. on Win32, it could be a resource, readily extractable with the usual resource viewer. On Unix, maybe just a string constant with a magic header, easy to extract via `strings`. And so on.


I agree this would be positive. But the source only gets you halfway there: you still need to actually be able to reproduce a compatible build system. And the longer ago the software was originally developed, the more challenging that becomes.

There are lots of old projects out there relying on some ancient VS2003 installation. The same will happen with modern languages in a decade: code goes stale, and it gets more and more difficult to pull down the versions of the software it was originally built with.


I hate (read: love) to be pedantic, but all scripting languages already have this feature built-in, and thanks to modern VMs and JIT compilers and the like, performance is much less of an issue.

It would be interesting to see, e.g., a Go executable format that ships with the source, build tools, and documentation, and compiles for the current platform on demand. Should be doable in a Docker image at least.


No one's going to waste resources putting source code on the server, dude. They'll host it somewhere else and then something will happen to it, or they just won't see the need to give the source to anyone because they're the only people in the company who understand it anyway, etc.


Given the ease with which the parser and AST are made available to developers, we should be able to implement tools which can detect naughty packages. Also, given the speed at which projects can be compiled, the impetus to keep the source code should remain strong.
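Purely as an illustration of that idea, here is a deliberately crude scanner sketched with the TypeScript compiler API (the same approach works with Go's go/ast); the rule set is made up, and real scanners use far richer rules plus data-flow analysis:

    import * as ts from "typescript";

    // Flag a few obviously suspicious constructs in a package's source.
    // Crude on purpose: nested matches may be reported more than once.
    function suspiciousCalls(fileName: string, code: string): string[] {
      const suspects = ["eval", "child_process"];
      const findings: string[] = [];
      const sf = ts.createSourceFile(fileName, code, ts.ScriptTarget.Latest, true);

      const visit = (node: ts.Node): void => {
        if (ts.isCallExpression(node)) {
          const text = node.getText(sf);
          if (suspects.some((s) => text.includes(s))) {
            findings.push(`${fileName}: suspicious call: ${text}`);
          }
        }
        ts.forEachChild(node, visit);
      };
      visit(sf);
      return findings;
    }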


> we should be able to implement tools which can detect naughty packages

We can! It's one thing to know that there's no major technical obstacle to having a security-oriented static analysis suite for your language of choice. It's quite another for one to actually have already been written.

The primary wrinkle tends to be around justifying the cost of building one. For companies that use small languages, that means a non-trivial cost in engineer time just to get a research-grade scanner. For companies whose products are security scanners, it means waiting until there's a commercial market for supporting a language.

This is a problem I've been struggling with. I sympathize a great deal with developers who want to use the newest, most interesting, and above all most productive tools available to them. This stacks up awkwardly against the relatively immature tooling ecosystem common in more cutting-edge languages with smaller communities and less corporate support.


https://en.m.wikipedia.org/wiki/List_of_tools_for_static_cod...

They're either going to miss things or have false positives. They sure improve the situation, but you can't find all of the issues automatically.


Granted. But it will at least raise the bar for building an exploit package from "knows how to code" to "knows how to code, knows something about exploits, and knows how to avoid detection by an automated scanner."


It really depends on how the developers work; if they know the software will have to run for 10+ years, mostly unmaintained / unmonitored, they can opt to vendor all the dependencies so that a future developer can dive into the source of said dependencies.

Also, the Go community tends to frown on adding superfluous dependencies; this is a statement I got while looking for web API frameworks. Said frameworks are often very compact as well, a thin wrapper around Go's own APIs.

I've also worked on a project which had to be future-proofed solidly; all documentation for all dependencies had to be included with the source code.

TL;DR: it's down to the developer's approach.


Go also allows you to 'dynamically link everything' nowadays.



