btw, xsv has solved most of my problems dealing with 'large' 40GB csv files

throwaway2037 · on March 26, 2024

xsv? I never heard of it. This one? https://github.com/BurntSushi/xsv

If yes, looks very cool. Plus, bonus HN/Internet points for being written in Rust!

jgord · on March 26, 2024

yep .. his utils are most excellent.

vmchale · on March 26, 2024

its parser is buggy! https://github.com/BurntSushi/xsv/issues/337

(I ran into this issue myself)

burntsushi · on March 26, 2024

I just responded to that. It isn't the parser that's a buggy. The parser handles the quotes just fine. If it didn't, that would be a serious bug in the `csv` crate that oodles of users would run into all the time. There would be forks over it if it had persisted for that long.

The problem is that `xsv table` doesn't print the parsed contents. It just prints CSV data, but with tabs, and then those tabs are expanded to spaces for alignment. Arguably it ought to print the parsed contents, i.e., with quotes unescaped.

It almost looks like it's doing that because the quotes are removed in one case, but that's only because the CSV writer knows when it doesn't need to write quotes.

mardifoufs · on March 26, 2024

Ok this might sound stupid, and a bit unrelated, but you make so many great tools that I can't help but ask. How do you start planning and creating for a tool that needs to "follow standards"(in this case I know CSV is under specified but still!), is it by iteration or do you try to set and build a baseline for all the features a certain tool needs? Or do you just try to go for modularity from the get go even if the problem space is "smaller" for stuff like csv for example.

burntsushi · on March 26, 2024

I suppose https://old.reddit.com/r/burntsushi/ might be a good place for questions like this.

I don't really have a simple answer unfortunately. Part of it is just following my nose with respect to what I'm interested in. So there's an element of intrinsic motivation. The other part is trying to put myself in the shoes of users to understand what they want/need. I typically do a lot of background research and reading to try and understand what others have done before me and what the pain points are. And iteration plays a role too. The `csv` crate went through a lot of iteration for example.

I think that's about it. It's hard to answer this question in a fully general way unfortunately. But if you want to get into it, maybe you can be the first person who opens a thread on r/burntsushi haha.