Gethsemane · 2025-10-17T10:05:02 1760695502

A highlight of my time in Turkey was the cats - thank you for your efforts! Antalya had a lot of cat hotels in the park and most looked very healthy.

Gethsemane · 2025-10-15T14:44:25 1760539465

If I was less lazy I could probably find this answer online, but how do you find the battery life these days? I'd love to make the switch, but that's the only thing holding me back...

Gethsemane · 2025-10-06T21:27:17 1759786037

I'd love to see some benchmarks for this on some common genomic formats (fa, fq, sam, vcf). Will be doubly interesting to see its applicability to nanopore data - lots of useful data is lost because storing FAST5/POD5 is a pain.

jltsiren · 2025-10-06T22:09:08 1759788548

OpenZL compressed SAM/BAM vs. CRAM is the interesting comparison. It would really test the flexibility of the framework. Can OpenZL reach the same level of compression, and how much effort does it take?

I would not expect much improvement in compressing nanopore data. If you have a useful model of the data, creating a custom compressor is not that difficult. It takes some effort, but those formats are popular enough that compressors using the known models should already exist.

terrelln · 2025-10-06T22:19:22 1759789162

Do you happen to have a pointer to a good open source dataset to look at?

Naively and knowing little about CRAM, I would expect that OpenZL would beat Zstd handily out of the box, but would need additional capabilities to match the performance of CRAM, since genomics hasn't been a focus as of yet. It would be interesting to see how much of what we need to add is generic to all compression (but useful for genomics) vs. specific only to genomics.

We're planning on setting up a blog on our website to highlight use cases of OpenZL. I'd love to make a post about this.

bede · 2025-10-06T22:55:36 1759791336

For BAM this could be a good place to start: https://www.htslib.org/benchmarks/CRAM.html

Happy to discuss further

terrelln · 2025-10-06T23:04:08 1759791848

Amazing, thank you!

I will take a look as soon as I get a chance. Looking at the BAM format, it looks like the tokenization portion will be easy. Which means I can focus on the compression side, which is more interesting.

fwip · 2025-10-07T03:45:55 1759808755

Another format that might be worth looking at in the bioinformatics world is HDF5. It's sort of a generic file format, often used for storing multiple related large tables. It has some built-in compression (gzip IIRC) but supports plugins. There may be an opportunity to integrate the self-describing nature of the HDF5 format with the self-describing decompression routines of OpenZL.

felixhandte · 2025-10-07T18:06:57 1759860417

Wanna hop over to https://github.com/facebook/openzl/issues/76?

jayknight · 2025-10-06T21:38:51 1759786731

And a comparison between CRAM and openzl on a sam/bam file. Is openzl indexable, where you can just extract and decompress the data you need from a file if you know where it is?

terrelln · 2025-10-06T21:40:30 1759786830

> Is openzl indexable

Not today. However, we are considering this as we are continuing to evolve the frame format, and it is likely we will add this feature in the future.
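To make "indexable" concrete, here is a minimal sketch of the general idea, not OpenZL's frame format or API. It uses `zlib` from the Python standard library as a stand-in compressor, and the function names `compress_indexed` and `read_range` are hypothetical. Each fixed-size block is compressed independently and its offset is recorded, so a reader can decompress only the blocks that overlap the range it needs (the same broad approach BGZF takes for BAM).

```python
import zlib


def compress_indexed(data: bytes, block_size: int = 1 << 16):
    """Compress data in independent blocks and return (payload, index).

    index[i] is (compressed_offset, compressed_length) for block i.
    zlib is only a stand-in compressor for illustration.
    """
    blobs = []
    index = []
    offset = 0
    for i in range(0, len(data), block_size):
        blob = zlib.compress(data[i : i + block_size])
        index.append((offset, len(blob)))
        blobs.append(blob)
        offset += len(blob)
    return b"".join(blobs), index


def read_range(payload: bytes, index, start: int, end: int, block_size: int = 1 << 16) -> bytes:
    """Return the bytes data[start:end], decompressing only overlapping blocks."""
    if start >= end or not index:
        return b""
    first = start // block_size
    last = min((end - 1) // block_size, len(index) - 1)
    out = bytearray()
    for block in range(first, last + 1):
        off, length = index[block]
        out += zlib.decompress(payload[off : off + length])
    # Trim the decompressed blocks down to the requested range.
    lo = start - first * block_size
    return bytes(out[lo : lo + (end - start)])
```

The trade-off is that independent blocks cannot share context, so smaller blocks give finer-grained random access at the cost of a worse compression ratio.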

Gethsemane · 2025-09-15T14:49:19 1757947759

Unfortunately, when you write a program that doesn't wrap output FASTAs, you have a bunch of people telling you off because SOME programs (cough bioperl cough) have hard limits on line length :)

sharedptr · 2025-09-16T06:28:39 1758004119

Is BioPerl still standard, or did people move to BioPython?

When I was shown BioPerl I was tempted to write a better, C++ version, but was overwhelmed by other university stuff and let it go.

o11c · 2025-09-15T19:27:03 1757964423

You can use content-defined chunking to wrap at a predictable place so that compression still works.
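A minimal sketch of what that could look like, under some assumptions: the function name `cdc_wrap` and all parameter values are hypothetical, and it recomputes a CRC32 over a small window at each position rather than using a true rolling hash, which is simpler but slower. A line break is placed wherever the hash of the last `window` bases matches a bit pattern, with `min_len` and `max_len` bounds so that tools with hard line-length limits still work.

```python
import zlib


def cdc_wrap(seq: str, window: int = 8, mask: int = 0x3F,
             min_len: int = 20, max_len: int = 120) -> list[str]:
    """Wrap a sequence at content-defined positions.

    A break is placed after position i when the CRC32 of the last
    `window` characters has its low bits (selected by `mask`) all zero,
    or when the current line has reached `max_len` characters.
    """
    if not (window <= min_len <= max_len):
        raise ValueError("require window <= min_len <= max_len")
    lines = []
    start = 0
    for i in range(len(seq)):
        length = i - start + 1
        if length < min_len:
            continue
        if length >= max_len:
            at_boundary = True  # forced break to respect the hard limit
        else:
            h = zlib.crc32(seq[i - window + 1 : i + 1].encode("ascii"))
            at_boundary = (h & mask) == 0
        if at_boundary:
            lines.append(seq[start : i + 1])
            start = i + 1
    if start < len(seq):
        lines.append(seq[start:])  # trailing partial line
    return lines
```

Because break positions depend on local content rather than on a fixed column, an insertion or deletion should only disturb the lines near the edit, and later lines tend to realign with their previous breaks instead of every subsequent line shifting.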

Gethsemane · 2025-08-26T12:29:14 1756211354

I really want to like typer, and frequently go down the rabbit hole of rewriting all my argparse into typer, but I keep getting put off by its high import cost and the fact that development seems to be a bit up in the air (see https://github.com/fastapi/typer/issues/678#issuecomment-319...). A shame because otherwise it's a really nice library!
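For anyone wanting to check the import cost on their own machine, a rough sketch (the helper name `import_seconds` is hypothetical): it times the import in a fresh interpreter so that already-cached modules don't hide the cost. The standard `python -X importtime` flag gives a more detailed per-module breakdown.

```python
import subprocess
import sys


def import_seconds(module: str) -> float:
    """Time `import <module>` in a fresh interpreter, in seconds."""
    code = (
        "import time; t = time.perf_counter(); "
        f"import {module}; "
        "print(time.perf_counter() - t)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    # Use the last line in case the module prints something on import.
    return float(result.stdout.strip().splitlines()[-1])
```

Calling `import_seconds("argparse")` and `import_seconds("typer")` (if typer is installed) gives a like-for-like comparison, though a single run is noisy, so it is worth repeating a few times.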

Gethsemane · 2025-06-26T20:48:41 1750970921

Agreed, there have been some interesting developments in this space recently (e.g. AgroNT). Very excited for it, particularly as genome sequencing gets cheaper and cheaper!

I’d pitch this paper as a very solid demonstration of the approach, and I’m sure it will lead to some pretty rapid developments (similar to what RoseTTAFold/AlphaFold did).

Gethsemane · on Nov 29, 2024

Something I found useful is that you can create a much more minimal pandoc template for typst than for LaTeX. Obviously if you're familiar with LaTeX it probably won't be an issue, but when I tried to make my own barebones pandoc LaTeX template (i.e., stripping out beamer) I gave up.

Gethsemane · on Nov 29, 2024

I've similarly found the combination of pandoc + typst to be quite exciting. I've found it particularly useful for typesetting academic papers - I'm quite averse to Word in general, don't require extensive mathematical typesetting support, and find LaTeX to generally be quite unapproachable (just look at the size of the default pandoc template!), so it gives me a method of making a decent PDF whilst simultaneously producing a .docx for my collaborators. Being able to track changes with git is also a huge advantage, although I've never had the chance to work with someone who is comfortable using git :(

The recently added support for PDF/A is also quite exciting, as I've never found a satisfactory solution to this with latex. Now I just wish journals would support markdown submissions...

Gethsemane · on Nov 25, 2024

I am a fan of the colour scheme they selected in figure 2 - very relevant. https://pubs.rsc.org/image/article/2024/na/d4na00601a/d4na00...

Gethsemane · on Sept 26, 2024

As an example of a more informative map of income/deprivation, I recently encountered the Scottish Index of Multiple Deprivation website (https://simd.scot). Only applicable to Scotland (obviously), but it is interesting to see how each city is a mosaic of social status. From personal experience, it is extremely accurate down to the street level!