Yeah, artifact caching is the obvious interpretation of caching when you're used to being compared to Bazel, but the conversation was conflating "cache artifacts" and "cache should-run?" features.
I'm ignorant about the exact situation in Polars, but it seems like this is the same problem that web frameworks have to handle to enable registering arbitrary functions, and they generally do it with a FromRequest trait and macros that implement it for functions of up to N arguments. I'm curious if there were attempts that failed for something like FromDataframe to enable at least `|c: Col<i32>("a"), c2: Col<f64>("b")| {...}`
1. There are no variadic functions, so you need to take a tuple: `|(Col<i32>("a"), Col<f64>("b"))|`
2. Turbofish! `|(Col::<i32>("a"), Col::<f64>("b"))|`. This is already getting quite verbose.
3. This needs to be general over all expressions (such as `col("a").str.to_lowercase()`, `col("b") * 2`, etc), so while you could pass a type such as Col if it were IntoExpr, its conversion into an expression would immediately drop the generic type information because Expr doesn't store that (at least not in a generic parameter; the type of the underlying series is always discovered at runtime). So you can't really skip those `.i32()?` calls.
Polars definitely made the right choice here — if Expr had a generic parameter, then you couldn't store Expr of different output types in arrays because they wouldn't all have the same type. You'd have to use tuples, which would lead to abysmal ergonomics compared to a Vec (can't append or remove without a macro; need a macro to implement functions for tuples up to length N for some gargantuan N). In addition to the ergonomics, Rust’s monomorphization would make compile times absolutely explode if every combination of input Exprs’ dtypes required compiling a separate version of each function, such as `with_columns()`, which currently is only compiled separately for different container types.
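A minimal sketch of the trade-off (toy types, not Polars' actual definitions): with the dtype stored as runtime data, expressions with different output types share one `Vec`, and `with_columns()` compiles once.

```rust
// Toy types (not Polars' real API) illustrating why Expr is type-erased:
// the output dtype is runtime data rather than a generic parameter, so
// heterogeneous expressions can live in a single Vec.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Int32,
    Float64,
}

#[derive(Debug, Clone)]
struct Expr {
    name: String,
    dtype: DType, // discovered at runtime, not carried in the type
}

// A generic Expr<T> would force a tuple here and a separate
// monomorphization per combination of input dtypes.
fn with_columns(exprs: Vec<Expr>) -> usize {
    exprs.len()
}

fn main() {
    let exprs = vec![
        Expr { name: "a".into(), dtype: DType::Int32 },
        Expr { name: "b".into(), dtype: DType::Float64 },
    ];
    assert_eq!(exprs[1].dtype, DType::Float64);
    assert_eq!(with_columns(exprs), 2);
}
```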
The reason web frameworks can do this is because of `$( $ty: FromRequestParts<S> + Send, )*`. All of the tuple elements share the generic parameter `S`, which would not be the case in Polars — or, if it were, would make `map` too limited to be useful.
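Here's a sketch of that pattern (names modeled on axum's `FromRequestParts`, but simplified and synchronous; a real framework's macro would emit the tuple impls up to length N): every tuple element shares the single generic state parameter `S`.

```rust
// Simplified, synchronous stand-in for the web-framework extractor trait.
trait FromRequestParts<S>: Sized {
    fn from_request_parts(state: &S) -> Self;
}

// A toy extractor that clones the shared state.
#[derive(Debug, PartialEq)]
struct AppState(String);

impl FromRequestParts<String> for AppState {
    fn from_request_parts(state: &String) -> Self {
        AppState(state.clone())
    }
}

// In a real framework a macro generates this impl for tuples up to
// length N via `$( $ty: FromRequestParts<S>, )*` -- the key point is
// that both T1 and T2 are constrained by the same S.
impl<S, T1, T2> FromRequestParts<S> for (T1, T2)
where
    T1: FromRequestParts<S>,
    T2: FromRequestParts<S>,
{
    fn from_request_parts(state: &S) -> Self {
        (T1::from_request_parts(state), T2::from_request_parts(state))
    }
}

fn main() {
    let state = "shared".to_string();
    let (a, b): (AppState, AppState) = FromRequestParts::from_request_parts(&state);
    assert_eq!(a, AppState("shared".into()));
    assert_eq!(b, AppState("shared".into()));
}
```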
(a) The final category can never be lower than the highest hazard-based category;
(b) The TCSS should adequately reflect the case of high potential risk of two or more hazards. We consider a hazard of high risk when its respective category is classified as 3 or higher (equal to the definition for a Major Hurricane on the SSHWS). Whenever (at least) two high-risk hazards have the same category value and the third hazard has a lower category value, the final category should increment the highest hazard-based category. This implies that a TC scoring a Category 3 on both wind and storm surge, and a Category 1 on rainfall, will be classified as a Category 4.
(c) To warn the general public for an event with multiple extreme hazards, a high-risk TC can be classified as a Category 6 when either 1. at least two of the hazard-based categories are of Category 5; or 2. two categories are of Category 4, and one of Category 5.
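Rules (a)–(c) can be sketched as a small function (this is my reading of the rules, not the paper's code; the function and parameter names are mine):

```rust
// Sketch of the final TCSS category from the three hazard-based
// categories (wind, storm surge, rainfall), per rules (a)-(c).
fn final_category(wind: u8, surge: u8, rain: u8) -> u8 {
    let mut cats = [wind, surge, rain];
    cats.sort_unstable();
    let [low, mid, high] = cats;
    // (c) multiple extreme hazards: two Cat 5s, or two Cat 4s plus a Cat 5.
    if (mid == 5 && high == 5) || (low == 4 && mid == 4 && high == 5) {
        return 6;
    }
    // (b) two high-risk (>= 3) hazards tied at the top, third one lower:
    // increment the highest hazard-based category.
    if high >= 3 && mid == high && low < high {
        return high + 1;
    }
    // (a) never lower than the highest hazard-based category.
    high
}

fn main() {
    assert_eq!(final_category(3, 3, 1), 4); // the paper's example
    assert_eq!(final_category(1, 2, 3), 3); // rule (a) only
    assert_eq!(final_category(5, 5, 1), 6); // rule (c), case 1
    assert_eq!(final_category(4, 4, 5), 6); // rule (c), case 2
}
```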
Feb 2024 (the last year there's data for, I think) was a record low at 1.4% vacant, according to NYC[1].
But I don't really know the methodology, and according to other NYC government data it's surprising, since we still haven't recovered our population from COVID[2].
The first statistic (housing pressure) is based on population growth, but the NYC population statistics suggest still meaningful population loss since 2020.
I have seen articles in the past that suggest that apartment vacancy rates in NYC are self-reported and misleading at best, but I don't really understand how that would work and I can't find any sources on that now.
It's also my understanding that some classes of landlords can write off empty apartments as income losses, partially or wholly making up for the lost revenue in tax rebates. But that's also not something I understand well, just something I have seen asserted.
1.4% vacancy in a housing market is extraordinarily low. Remember: there is structurally always some material amount of vacancy, because people vacate housing units well before new people move into them. This, by the way, is a stat whose interpretation you can just look up. Real estate people use it as a benchmark.
Yeah I know it's among the lowest in the world, it's still an ~order of magnitude higher than a few tenths of a percent, which would be shocking for the reasons you mention.
My point though was just that I've seen arguments that these numbers can be manipulated, and the city's own data doesn't make sense by itself: either the 1.4% number is wrong or the slowly recovering population estimate is wrong. Especially considering the 60,000 housing units created (representing 2% growth).
Vacancy doesn’t mean units held empty as a parking place for cash or held off the market. Vacancy happens when you’re painting and repairing between rentals. Vacancy happens when there’s a renovation. Things like that are normal and not nefarious. A 1.4% vacancy rate means there is essentially no usable housing for rent.
I was talking about the myth that there are tons of apartments held by rich people who don’t use them for anything.
My understanding is that vacancy means available units for rent. So, plausibly, if you say 50 of the 100 units in your building aren't available for rent because you say they're being painted then they don't contribute to the vacancy of your building.
That's almost the exact opposite of your definition, but I agree that a 1.4% vacancy rate means there's almost nothing available for rent.
Do you have any actual data on the rate of unoccupied properties that are not recently or soon to be available to rent in any major US markets? It seems like kind of hard data to find from my brief perusing around. I'm very interested in seeing some reliable data on this.
I had thought such units would have been included in the housing vacancy statistics, but apparently they are not.
I haven’t spent much time looking at any place other than New York. But there’s census data, tax data, and a lot of public records. The number of empty units is small. The total is probably close to 40k, but that’s a fuzzy number and moving target. That includes regular vacant units.
I recently wrote a similar tool focused more on optimizing the case of exploring millions or billions of objects when you know a few aspects of the path: https://github.com/quodlibetor/s3glob
It supports glob patterns like `*/2025-0[45]-*/user*/*/object.txt`, and will do smart filtering at every stage possible.
I haven't done real benchmarks, but it's parallel enough to hit S3 parallel-request limits and file system open-file limits when downloading.
I have been chasing the gerrit code review high since I left a company that used it almost 5 years ago.
Stacked pull requests are usually what people point to for getting this back, but this article points out that _just_ stacked pull requests don't handle it correctly. Specifically, with GitHub you can't really see the differences in response to code review comments; you just get a new commit. Additionally, GitHub often loses conversations on lines that have disappeared due to force pushes.
That said, I have a couple of scripts that make it easier to work with stacks of PRs (the git-*stack scripts in[1]) and a program git-instafix[2] that makes amending old commits less painful. I recently found ejoffe/spr[3] which seems like a tool that is similar to my scripts but much more pleasant for working with stacked PRs.
There's also spacedentist/spr[4] which gets _much_ closer to gerrit-style "treat each commit like a change and make it easier for people to review responses" with careful branch and commit management. Changes don't create new commits locally; they only create new commits in the PR that you're working on. Unfortunately, it has many more rough edges than ejoffe/spr and is less maintained.
The dependency list[0] looks pretty reasonable, AFAICT the overwhelming majority of that line-of-code count comes from autogenerated Windows API methods.
If you're intending to install the package on an image, [dev-dependencies] are not going to be included in the package. So, no, it's not actually relevant to the surface area of the package.
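To make that split concrete, here's an illustrative Cargo.toml (the crate names are just examples): Cargo only uses `[dev-dependencies]` for tests, examples, and benches, so they never end up in the built artifact you install.

```toml
[dependencies]
# Compiled into the final artifact.
serde = "1"

[dev-dependencies]
# Only built for `cargo test` / `cargo bench`; not part of the
# installed package's supply-chain surface.
criterion = "0.5"
```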
If a C library used Python to run its tests, I don't think we would consider the whole Python interpreter to be part of the software supply chain for that library. Sure it's possible that running tests on a build machine could let an attacker corrupt the build later, with a bad PyPI package or something. But that feels more like a "not having a clean build environment" problem than a "this project has too many dependencies" problem. I think the fact that Cargo manages these two lists in the same file makes the relationship feel tighter, but I'm not sure it's actually tighter.
I see your point, but if you're going to consider all code that runs on the dev machine as a source of supply chain attacks, that's going to include all LOC for the Linux kernel. And the LOC for the dev's web browser that they use to browse the issue tracker. And so on.
If you start doing a commit (via `c` in the magit status buffer, with the standard semantics of "you're going to commit everything that's currently staged") you can press capital F for an instant fixup, or capital S for instant squash.
When you press either of those, magit pops up a commit picker which shows the current git log. Selecting a commit will then instantaneously apply your staged changes to the selected commit. It's much simpler than any of the other workflows I've seen in response to your question.
The gif in this repo (for a tool I made that simulates this behavior as a cli tool for some jealous coworkers) tries to show the workflow: https://github.com/quodlibetor/git-fixup
That said, this _doesn't_ support the "automatically figure out which commits to apply hunks to" workflow. I personally find that I use both workflows depending on the nature of my changes.
One small thing: we do now have tables[1]! At the moment they are ephemeral and only support inserts -- no update/delete. We will remove both of those limitations over time, though!
If you specify sources but not "outputs" then mise will auto-track whether sources have been modified.
I requested the auto-track feature to speed up Docker builds a pretty long time ago, and it's been fantastic.
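A rough sketch of what that looks like in a mise.toml task (check the mise docs for exact syntax; the task name and paths here are made up):

```toml
[tasks.build-image]
run = "docker build -t myimage ."
sources = ["Dockerfile", "src/**/*"]
# With no "outputs" key, mise auto-tracks whether the sources have
# been modified and skips the task when they haven't.
```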