
Notebooks are primarily a _teaching_ tool, not code or an experimentation interface. I know that's not how they're used, but it's really bizarre to me that people use them for exploratory data analysis. They're great for teaching and for _documenting_ the results of exploration, but fairly bad for any type of interactive data exploration, let alone actually writing code.

Frankly, notebooks are pretty terrible for exploratory data analysis, especially in python. The main issue is that they absolutely kill interactivity. Straight up CLI based ipython is far more interactive, and often better for exploratory data analysis. In a lot of ways, pretty much any other interface is better than code blocks that aren't independent but can be run independently.

Part of it is the interface (i.e. non-linear state, code blocks, etc), which is great for teaching, but bad for exploration. However, i.m.o. even more of it is the fact that it's browser based. E.g. folks often think it's best to use matplotlib in a notebook. However, using a notebook means you get very little interactivity out of a highly interactive library and remarkably poor performance out of something that's actually _very_ interactive and quite performant on standard Tk/Qt/etc backends.

If you ditch browser based stuff, you're usually far better off. This part is mostly just me griping, but I don't understand why we've stepped back 20 years in basic usability, interactivity, and performance by making everything use a web browser (and then making that not even remotely cross platform).



The issue with CLI is it's crap for editing the code. It's rare that your data analysis will just be one-liners, it'll be some mess of steps and you want to go edit one in the middle (but not re-run the whole script). That's the first thing that the notebooks give you that you don't tend to get elsewhere.

The second thing is the presentation of the plots: it's really handy to have multiple interactive plots laid out right next to the code block that produced them.

There's nothing that requires that this is done in a web interface, but no one seems interested in making a non-web version (IIRC there was a Qt version in the very very early days of ipython notebook but it's long dead). One big thing that any replacement will need is the easy ability for the code to actually be running on a remote machine, something that's trivial for a web interface but requires active effort for a native client (and makes a lot of interactive chart work more difficult: certainly there's currently no other matplotlib backend which can do it).

I'm fairly sure a better interface than jupyter is possible for data exploration, but I haven't found one yet since there's not really anything else that gives you the two features above.

(one thing people often miss about notebooks and the kind of work that goes into them is that the code itself isn't actually really hard code. It's 100% about interacting with the data and the algorithms running on it. Working in one isn't much like software engineering)


You're not editing code in the CLI. You're editing code in your preferred editor/IDE. ipython + %edit works wonders. That's one of the main reasons to avoid notebooks and stick with CLI-based or similar tools.

Jumping into the middle of a script and editing is often a bit dangerous. You usually wind up with something that can't be reproduced, as it depends on modifying state in a way that wasn't recorded. You do often need some sort of caching during exploration, for sure (it's not really feasible to wait an hour every time you want to tweak a visualization), but it's often better to break pipelines up a bit and do disk I/O rather than leaning on cells in a notebook.
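To make the "break pipelines up and do disk I/O" point concrete, here is a minimal sketch of caching a slow step on disk so it can be skipped on the next run. The helper name `cached`, the pickle format, and the toy `expensive_step` are all assumptions for illustration, not anything from a particular library.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path, compute):
    """Return the pickled result at `path` if it exists, else compute and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how often the slow step really runs


def expensive_step():
    # Hypothetical stand-in for an hour-long computation.
    calls.append(1)
    return [x * x for x in range(10)]


cache_dir = Path(tempfile.mkdtemp())
first = cached(cache_dir / "squares.pkl", expensive_step)   # computes and writes
second = cached(cache_dir / "squares.pkl", expensive_step)  # reads from disk
```

The script can then be re-run top to bottom while tweaking the visualization step, and the state it depends on is a file you can inspect or delete rather than whatever happens to be in memory.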

I do agree on the remote client part, though. Browser-based interactivity often is a good solution for a dev instance + a local thin client. It's slower than native for a local solution, though.


> It's rare that your data analysis will just be one-liners, it'll be some mess of steps and you want to go edit one in the middle (but not re-run the whole script).

And that's why I use elpy to execute codeblocks marked by comments in IPython. VSCode supports this workflow out of the box (but you don't get the matplotlib windows out of the box ;)) - although it is very, very liberal in using screenspace...
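For anyone who hasn't seen the comment-marked workflow, a rough sketch of what such a file looks like is below. It assumes the `# %%` marker that VS Code's Python extension recognises; I believe elpy's cell marker pattern is configurable, so check your own setup. The data and variable names are made up.

```python
# %% load
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %% transform
ordered = sorted(values)

# %% summarize
# Median of an even-length list: mean of the two middle elements.
median = (ordered[3] + ordered[4]) / 2
```

Each block can be sent to the running IPython session on its own, but the file is still a plain script that runs top to bottom with `python file.py`.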

> One big thing that any replacement will need is the easy ability for the code to actually be running on a remote machine

VSCode remote (proprietary sadly). Also tramp + SSH-X-forwarding likely beats whatever ergonomics jupyterlab offers.


Not sure if this is sarcastic or not...

I've taught and worked with some of the biggest companies in the world. They were using Jupyter and sound EDA to great effect.


So have I (not FAANG, though). I've been working in scientific python for over 20 years and long before notebooks came to the python world. (Yes, pre-numpy) And I've seen notebooks abused far more than used well. The same general thing was heavily abused in Matlab/etc too. It's not only Jupyter and it's not new.

It's not that they're not a good tool, it's that they're the wrong tool for a lot of what they're used for. They're an excellent teaching and documentation tool. However, they're remarkably bad for actually doing any type of analysis or exploration. They're a great way of presenting your analysis after you complete it, but they're bad for actually doing that analysis. Write analysis code and data exploration code using the same tools you would for production code. There's nothing wrong with an edit + run + explore loop. Do that with a standard REPL like ipython + your preferred editor.

But there's a lot wrong with trying to use a notebook to write and execute code, as it: 1) encourages writing code that won't actually run later (dependent on non-linear code execution and unsaved changes made to earlier cells), and 2) is a poor environment for writing code compared to more full featured editors or IDEs, and 3) discourages interactivity and data exploration due to being browser-based (poor performance compared to running directly limits interactivity during data exploration).
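Point 1 is easy to reproduce. The sketch below fakes a kernel with a shared namespace and `exec`; the helper `run_cells` and the cell contents are hypothetical, and a real notebook kernel is more involved, but the failure mode is the same.

```python
def run_cells(cells, namespace=None):
    """Execute cell sources in order against one shared namespace, like a kernel."""
    ns = {} if namespace is None else namespace
    for src in cells:
        exec(src, ns)
    return ns


# Interactive session: an early cell defined `scale`.
session = run_cells(["scale = 10", "data = [1, 2, 3]"])

# That cell is later deleted from the notebook, but the kernel still remembers
# `scale`, so the next cell keeps working in the live session.
session = run_cells(["scaled = [x * scale for x in data]"], session)

# What actually got saved, run from scratch in a fresh namespace:
saved_notebook = ["data = [1, 2, 3]", "scaled = [x * scale for x in data]"]
try:
    run_cells(saved_notebook)
    fresh_run_ok = True
except NameError:
    fresh_run_ok = False
```

The live session has a perfectly good `scaled`, while the saved notebook fails on a clean run because `scale` only ever existed in memory.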

Your data analysis work needs to actually run independently later. Notebooks rarely do unless they're re-made from scratch. At that point, you're better off working in a more full-featured environment for the exploration phase and then using the notebook as later documentation / communication.


Your experience differs from mine... Shrug



