This is pretty cool. The title would be a bit more descriptive if it were “Fast Rolling Quantile Filters for Python”, since the high-pass/low-pass filter functionality seems to be the focus.
The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....
You can also just get/use exact answers with the `accumulation_tree` package that Python `tdigest` impl uses, though for large windows a B-tree is faster than a red-black ranked/instrumented binary tree. Either tree also scales better than heap pairs to multiple quantiles per moving window (like 5%, 50%, 95%). Python's `blist` package may not have the exact API you need to build from, but it is very close. (`blist` also does not optimize for the "histogramming case" where there can be many duplicate values, IIRC. Instead each value gets its own independent slot in the data structure.)
Awesome! Rolling quantiles were a major bottleneck in my work a couple years ago, so much so that I ended up removing them and using approximations instead. It'll be good to try this if I ever go back to that code.
Perhaps, but as far as I'm aware, none of them are as flexible as my interface (to support e.g. streaming, pipelining.) I have not looked at all of pandas' time-series functionality that closely.
The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....
There is an open source Python implementation, although I haven’t used it and can’t vouch for its quality: https://github.com/CamDavidsonPilon/tdigest