Ah, I misread your OP - the only part I really planned to support is "this package requires building from source, and may run arbitrary code now (not just after installation) to do so; are you sure you want to proceed?". Although the other stuff certainly seems worthwhile - I think it would be easier to plug into my design than into pip. Especially since I'm explicitly creating an internal API first and wrapping a separate CLI package around that.
The project is still in quite early development stages, and not yet really usable for anything - I've been struggling to make myself sit down and implement even simple things, just personal issues. But the repository is at https://github.com/zahlman/paper and I am planning a Show HN when it seems appropriate. Hopefully the existing code at least gives an idea of the top-level design. I also described it a bit in https://news.ycombinator.com/item?id=43825508 .
I've also written a few posts on my blog (https://zahlman.github.io) about design problems with pip, and going forward I'll be continuing that, along with giving a basic overview of the PAPER design and explaining how it addresses the problems I've identified.
It's not visible on the screenshot for some reason, but if you run the latest version, you'll notice a little underline under the CVE mention. It's actually a hyperlink (Cmd+click in iTerm2) that leads to https://osv.dev/vulnerability/CVE-2024-24762 where you can find out more.
Yes, the reason I had to fork pip was that the dependency resolution logic is too complex and I couldn't recreate it from scratch with fidelity.
You're right that I don't vendor dependencies, and I hope to get away with that precisely because I don't have the bootstrapping problem. In practice, you'll want to install pipask with pipx so that its dependencies don't interfere with your local environment.
Ideally, you should use lockfiles for your CI/CD or Docker builds. To create or update the lockfile, a developer first installs the dependencies manually (as in `pip install X` -> `pip freeze`), at which point the checks run and the user can give consent.
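As a concrete sketch of that workflow (the venv path and package name here are just illustrative; the install step is where pipask's checks would run and prompt, and it's commented out only so the sketch works offline):

```shell
# Hypothetical lockfile workflow: install interactively once, pin the result,
# then reuse the pinned file non-interactively in CI/CD or Docker.
python3 -m venv .venv
# .venv/bin/pip install requests          # pipask's checks run (and prompt) here
.venv/bin/pip freeze > requirements.txt   # pin the resolved versions as a lockfile
# CI/CD then reproduces the environment with:
#   pip install -r requirements.txt
```

The key point is that the interactive consent happens once, on the developer's machine, and CI only ever installs the already-vetted pins.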
That said, it's pretty uncommon to use lockfiles with pip, so I'm considering creating something like a plugin for Poetry or uv, if there is demand?
Quite a few people use requirements.txt files with pip actually. I've seen many projects that even expect end users to do so. You might not notice - exactly because they aren't packaging for PyPI.
Sure, they presumably have a local dev environment where they install dependencies to test their own code.
But there are a lot of possible workflows around that. Some people might install things one at a time as they appear to be needed during development, and then use `pip freeze` to create the `requirements.txt` file. Others might edit `requirements.txt` directly and repeatedly re-create their environment from it. Still other workflows might involve any number of tools, such as pip-tools (https://pypi.org/project/pip-tools/), pipenv (https://pypi.org/project/pipenv/), etc.
As long as they run `pip install` locally at any point in their process before pushing to the repo, they should get the opportunity to see the pipask report.
Thanks!
Good question. I think the main downsides are:
- installation takes a few more seconds to do the checks
- you need to trust me, a random person from the internet
- if there are subtle differences between pip versions, the checks may run against a different version than will actually be installed (I've done my best to prevent this for pip 22.2 through the current release); and if I've missed any bugs, you may get an error you wouldn't get with plain pip
The current version is also interactive-only - it requires user confirmation - though I'm open to adding a non-interactive mode in the future.
Great point! If you alias pip to pipask in your .*rc file, then this should already work out of the box for some tools, but there may be problems such as the need for non-interactive flows and configurable failure thresholds.
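For reference, the alias itself is a one-liner (assuming pipask is already on your PATH, e.g. installed via pipx):

```shell
# In ~/.bashrc or ~/.zshrc: route interactive `pip` invocations through pipask.
# Note: scripts that call pip by absolute path, or via `python -m pip`,
# will bypass this alias.
alias pip='pipask'
```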
It occurs to me that if people are executing pip over requirements.txt outputs, this would work and be very helpful. But if they're giving LLM agents shell access directly, the main problem is probably going to be finding a way for pipask to confirm that it's talking to a human and not just the LLM again (impossible in general, but still)... probably out of scope, though!
Perhaps it's not clear from my description above, but I'm afraid the flaw is in the Python package ecosystem itself rather than pip. I'm not very familiar with uv, but from what I can tell from the documentation, it needs to execute the same steps as pip to resolve metadata, as this is required by various PEPs. (You can have a look at the diagram in the linked blog post https://medium.com/data-science-collective/pipask-know-what-...).
But I also get your point - advanced users who care about security may not be using pip. Implementing the functionality as a plugin for uv or poetry is actually the next step I'm considering, if people find the concept of pipask useful. What do you think?
Dependency metadata is published within packages - both for wheels (prebuilt) and for sdists (source distributions that require a build step). Projects distributed as sdists often include non-Python code, though you can still request an entirely pointless build step for a pure-Python project (but please read https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... and don't do that).
PyPI even extracts it in most cases, to my understanding, so that installers can solve for the right versions of dependencies without downloading entire packages. (But the metadata for the project is still fragmented across multiple per-distribution files.)
However, source distributions are still allowed to have metadata - including dependencies - marked as "dynamic" (i.e., declaring that it will be determined during the build process). This is rarely necessary (though it probably happens far more often than it needs to), but a very complex project might, for example, have different dependencies based on details of the user's environment that aren't expressible in the existing environment markers (see https://peps.python.org/pep-0508/).
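In `pyproject.toml` terms, a project opts into this with a declaration along these lines (project name and version here are invented for illustration):

```toml
[project]
name = "example-project"
version = "1.0.0"
# Dependencies are deliberately NOT listed statically here; instead the
# build backend computes them at build time (a PEP 621 "dynamic" field).
dynamic = ["dependencies"]
```

This is exactly the case where an installer cannot know the dependencies without running the build backend's code.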
My experience with the Python ecosystem has been that it tries a bit too hard to make absolutely everyone happy (despite all the times that quite a few people end up unhappy). Today's concessions to backward compatibility always seem to make tomorrow's even harder to implement.
Short version, although you already got an answer:
If everyone had to use it, and everyone were only allowed to use "static" dependencies determined ahead of time, yes. But:
* legacy projects that don't use pyproject.toml are still supported
* it's possible to publish an "sdist" source package that's built on the user's machine (for example, because it includes C code that's highly machine specific and needs to be custom built for some reason; or because the user wants to build it locally in order to link against large, locally available libraries instead of using a massive wheel that copies them)
* when something is built locally, it's permissible to determine the dependencies during that build process (in some rare cases, that may be another reason an sdist gets used - the user's environment needs to be inspected in order to figure out which dependencies to fetch)
* even if it did always work that way, `pyproject.toml` is really more like "source code" for the metadata (about dependencies and other things). The real metadata is a file called `PKG-INFO` when it appears in an sdist, or `METADATA` in a wheel. The format is based on email headers (yes, really).
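To make that last point concrete, here's what a hypothetical wheel's `METADATA` file looks like (all names and versions invented) - RFC 822-style headers, one field per line:

```
Metadata-Version: 2.1
Name: example-project
Version: 1.0.0
Summary: An illustrative package
Requires-Python: >=3.8
Requires-Dist: requests>=2.0
Requires-Dist: tomli>=1.1 ; python_version < "3.11"
```

Each `Requires-Dist` line is a PEP 508 dependency specifier, optionally with an environment marker after the semicolon.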
In short, you can get metadata from pyproject.toml, but (a) doing so can still involve executing code due to PEP 517 hooks, and (b) a malicious package would use a legacy setup.py to get its code executed anyway.
That's a super helpful diagram. Saved it for later in case I have to explain to someone else. Thank you. I can see why something like pipask would be helpful. I saw in another comment that you are looking to make a uv plugin. I'll be on the lookout for that getting released!