Notebooks are primarily a _teaching_ tool, not code or an experimentation interf...

rcxdude · on June 14, 2024

The issue with CLI is it's crap for editing the code. It's rare that your data analysis will just be one-liners, it'll be some mess of steps and you want to go edit one in the middle (but not re-run the whole script). That's the first thing that the notebooks give you that you don't tend to get elsewhere.

The second thing is the presentation of the plots: it's really handy to have multiple interactive plots laid out right next to the code block that produced them.

There's nothing that requires that this is done in a web interface, but no one seems interested in making a non-web version (IIRC there was a Qt version in the very, very early days of IPython Notebook, but it's long dead). One big thing that any replacement will need is the easy ability for the code to actually be running on a remote machine, something that's trivial for a web interface but requires active effort for a native client (and makes a lot of interactive chart work more difficult: certainly there's currently no other matplotlib backend which can do it).

I'm fairly sure a better interface than jupyter is possible for data exploration, but I haven't found one yet since there's not really anything else that gives you the two features above.

(One thing people often miss about notebooks and the kind of work that goes into them is that the code itself isn't actually really hard code. It's 100% about interacting with the data and the algorithms running on it. Working in one isn't much like software engineering.)

jofer · on June 14, 2024

You're not editing code in the CLI. You're editing code in your preferred editor/IDE. ipython + %edit works wonders. That's one of the main reasons to avoid notebooks and stick with CLI-based or similar tools.

Jumping into the middle of a script and editing is often a bit dangerous. You usually wind up with something that can't be reproduced, as it depends on modifying state in a way that wasn't recorded. You do often need some sort of caching during exploration, for sure (it's not really feasible to wait an hour every time you want to tweak a visualization), but it's often better to break pipelines up a bit and do disk I/O rather than leaning on cells in a notebook.
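As a rough illustration of breaking a pipeline up with disk I/O, here is a minimal sketch that caches an expensive step to a file so later steps can be tweaked without recomputing it. The `cached` helper, the file name, and `slow_step` are all hypothetical names made up for this example; pickle is just one convenient format, not a recommendation.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path, compute):
    """Return the pickled result at `path`, computing and saving it on first use."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how often the expensive step actually ran


def slow_step():
    # Stand-in for the hour-long computation (hypothetical).
    calls.append(1)
    return [x * x for x in range(5)]


cache_dir = Path(tempfile.mkdtemp())
first = cached(cache_dir / "squares.pkl", slow_step)   # computes and writes to disk
second = cached(cache_dir / "squares.pkl", slow_step)  # read back from disk; slow_step is not called again
```

Because the intermediate result lives in a file rather than in interpreter state, a fresh process picks up where the last one left off, and deleting the file forces a clean recompute.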

I do agree on the remote client part, though. Browser-based interactivity often is a good solution for a dev instance + a local thin client. It's slower than native for a local solution, though.

fock · on June 14, 2024

> It's rare that your data analysis will just be one-liners, it'll be some mess of steps and you want to go edit one in the middle (but not re-run the whole script).

And that's why I use elpy to execute code blocks marked by comments in IPython. VSCode supports this workflow out of the box (but you don't get the matplotlib windows out of the box ;)) - although it is very, very liberal in using screen space...
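For anyone unfamiliar with the workflow, a minimal sketch of a plain script split into cells by comments. The `# %%` marker is the one VS Code's Python extension recognizes; the marker another editor expects may differ depending on configuration, so treat the exact syntax as an assumption. The data is made up for illustration.

```python
# %% Load data
import statistics

samples = [2.0, 4.0, 4.0, 5.0, 7.0]  # made-up values

# %% Summarize (re-run only this cell after tweaking it)
mean = statistics.mean(samples)
spread = statistics.pstdev(samples)

# %% Report
summary = f"mean={mean:.2f} pstdev={spread:.2f}"
print(summary)
```

The file stays an ordinary Python script, so it still runs top to bottom with a plain `python` invocation and diffs cleanly in version control.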

> One big thing that any replacement will need is the easy ability for the code to actually be running on a remote machine

VSCode remote (proprietary sadly). Also tramp + SSH-X-forwarding likely beats whatever ergonomics jupyterlab offers.

__mharrison__ · on June 14, 2024

Not sure if this is sarcastic or not...

I've taught and worked with some of the biggest companies in the world. They were using Jupyter and sound EDA to great effect.

jofer · on June 14, 2024

So have I (not FAANG, though). I've been working in scientific Python for over 20 years, since long before notebooks came to the Python world (yes, pre-NumPy). And I've seen notebooks abused far more than used well. The same general thing was heavily abused in Matlab/etc too. It's not only Jupyter and it's not new.

It's not that they're not a good tool, it's that they're the wrong tool for a lot of what they're used for. They're an excellent teaching and documentation tool. However, they're remarkably bad for actually doing any type of analysis or exploration. They're a great way of presenting your analysis after you complete it, but they're bad for actually doing that analysis. Write analysis code and data exploration code using the same tools you would for production code. There's nothing wrong with an edit + run + explore loop. Do that with a standard REPL like ipython + your preferred editor.

But there's a lot wrong with trying to use a notebook to write and execute code, as it 1) encourages writing code that won't actually run later (it depends on non-linear execution order and on unsaved changes made to earlier cells), 2) is a poor environment for writing code compared to more full-featured editors or IDEs, and 3) discourages interactivity and data exploration by being browser-based (performance is poor compared to running directly, which limits interactivity during data exploration).
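To make point 1 concrete, here is a toy sketch of the hidden-state problem. It models notebook cells as strings run with `exec` in a shared namespace; the `cells` list and the `run` helper are hypothetical stand-ins, not how Jupyter itself executes anything.

```python
# Three "cells" as they appear top to bottom in the notebook.
cells = [
    "scale = 10",                            # cell 1
    "result = [v * scale for v in values]",  # cell 2: needs `values`, defined further down
    "values = [1, 2, 3]",                    # cell 3
]


def run(order):
    """Execute the cells in the given order in one fresh shared namespace."""
    namespace = {}
    for i in order:
        exec(cells[i], namespace)
    return namespace


# The interactive session happened to run cells 1, 3, 2, so everything "works".
session = run([0, 2, 1])

# A fresh top-to-bottom run fails because cell 2 comes before cell 3.
try:
    run([0, 1, 2])
    reproducible = True
except NameError:
    reproducible = False
```

The saved notebook looks fine and shows the right output, but only the execution order that lived in the session made it work, and that order is not something a later reader can recover from the file.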

Your data analysis work needs to actually run independently later. Notebooks rarely do unless they're re-made from scratch. At that point, you're better off working in a more full-featured environment for the exploration phase and then using the notebook as later documentation / communication.

__mharrison__ · on June 14, 2024

Your experience differs from mine... Shrug