But usually, I know how the thing works. I have data in this format, and I need data in this other format. The transformation may include cross-referencing among the data, processing strings in various ways, computing sums and counts, calling external libraries, etc. These are simple to code and well-defined tasks. In my experience, they rarely benefit from prototyping.
My recent example is processing around 10m chess games to get statistics on all positions that occurred inside them (above an occurrence threshold). This required parsing the games in chess notation, using a chess library to simulate the moves to get positions, and counting how often each position occurred, in which matches, etc. My first try was with Python. After I realized it's unbearably slow, I tried using PyPy, running multiple processes for each core, etc., and in the end my approximation was that the job would finish in a couple of hours. I tried more optimizations and nothing helped. And there's no number crunching to use numpy for. Then I wrote the same script in Rust, and it ran in a couple of minutes, finishing well before the original Python script would have finished, had I left it to run. I arguably didn't save time by using Python here.
My recent example is processing around 10m chess games to get statistics on all positions that occurred inside them (above an occurrence threshold). This required parsing the games in chess notation, using a chess library to simulate the moves to get positions, and counting how often each position occurred, in which matches, etc. My first try was with Python. After I realized it's unbearably slow, I tried using PyPy, running multiple processes for each core, etc., and in the end my approximation was that the job would finish in a couple of hours. I tried more optimizations and nothing helped. And there's no number crunching to use numpy for. Then I wrote the same script in Rust, and it ran in a couple of minutes, finishing well before the original Python script would have finished, had I left it to run. I arguably didn't save time by using Python here.