
If you don't use the REPL (and for many good reasons you shouldn't) or some other such horror, every plot is the first plot. And it's painful. And not just plots, really. Doing anything in Julia is painful if you try to use it as a programming language instead of as an app for buggy, unreproducible and misunderstood ad-hoc analyses.

Seeing these answers makes me think Julia will never be fixed. I forecast Julia will be back in a niche within five years if they don't get their act together. And it's sad, because the alternatives are fundamentally broken. Julia isn't fundamentally broken, but the devs and the community seem to insist on superficial breakage.



> (and for many good reasons you shouldn't)

Could you elaborate on this? I'd say REPL-based interactive programming is one of Julia's greatest strengths, and avoiding the REPL is probably setting yourself up for pain.

That said, if you do find yourself running lots of scripts and paying this penalty all the time, I'd suggest https://github.com/dmolina/DaemonMode.jl as a great way around these pains.
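If I remember its README correctly, the basic idea is to keep one long-running daemon and send your scripts to it, so compiled code survives between runs. Roughly (the script name here is just an example):

    # start the daemon once, in the background
    julia --startup-file=no -e 'using DaemonMode; serve()' &

    # then run scripts against it; compiled methods persist between invocations
    julia --startup-file=no -e 'using DaemonMode; runargs()' analysis.jl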


> Could you elaborate on this? I'd say REPL-based interactive programming is one of Julia's greatest strengths, and avoiding the REPL is probably setting yourself up for pain.

With the REPL you have invisible global state, you can't reproduce what you have done, changes earlier in the code path don't propagate to results, and you have no documentation of what you did.

For me it's really like trying to write a book by dictation. Except that you're dictating to somebody who's gonna give an independent summary of it to a third party and never write down what you dictated. It boggles my mind how people can work like this, but they probably get hooked on the REPL from the first tutorials and just don't know better.

I'll look into DaemonMode.jl. Not a fan of using a daemon (and I'm guessing there will be problems with e.g. interactive plots), but in the short term I'll take anything that could make Julia programming tolerable.


> With the REPL you have invisible global state, you can't reproduce what you have done, changes earlier in the code path don't propagate to results, and you have no documentation of what you did.

Mhm, that's fair. I think Pluto.jl has a really neat approach to this, using reactivity (and, technically, even more state) to eliminate that experienced statefulness.

If I could use it from Emacs it might even be my go-to way to interact with Julia, but I also don't mind the statefulness and find it manageable.

For me, the most important thing is that when I'm writing serious code, I create a local package. Then, in the REPL, I load that package and have Revise.jl active so that it can watch the package source code and constantly do hot code reloading for me, so that I'm never stuck with old versions of code running.

Then I do all my interactive analysis in the REPL, and keep the plumbing in the package module. This eliminates a lot of statefulness and keeps restarts to a minimum.
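Roughly, that workflow looks like this (the package and function names are just placeholders):

    # one-time setup: pkg> dev /path/to/MyAnalysis  so the local package is loadable
    using Revise       # load first so subsequently loaded packages are tracked
    using MyAnalysis   # the local package holding the "plumbing"

    result = MyAnalysis.run_pipeline("data.csv")
    # edit src/*.jl, call run_pipeline again -- Revise hot-reloads the changes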


I briefly looked at Pluto.jl, and I think it's probably a good way. As I understood it, it's a "notebook" that always runs the whole file. Like e.g. RMarkdown or Sweave. All good. The state is fine too if it's explicit. But I'm fine with just the CLI, print, and the occasional plot, which should be a much simpler use case for development.

I create a "local package", meaning a file from which I relatively import. During development/analysis it's hard to foresee what the package structure is gonna be, so it's quite pointless to go through the whole packaging ceremony at this point. FromFile works fine for this.

As a temporary hack I could use the REPL to call my "main" function and let Revise.jl update automatically. (In the long term this is bad for interoperability with the rest of the system.) But in my experience Revise.jl tends to break a lot. Julia breakage is hard to analyze by itself, and Revise.jl often makes this more or less impossible.

I have to repeat that I really don't see how caching compilation results comes even close to the complexity of what Revise.jl or Pluto.jl have to do.


> I briefly looked at Pluto.jl, and I think it's probably a good way. As I understood it, it's a "notebook" that always runs the whole file.

Not quite. It builds a dependency graph of your code and can figure out which definitions depend on others. So depending on what you change, maybe only one or two cells need to be re-run. Or in other circumstances, the whole notebook will have to re-run. It just depends on what changes.
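A tiny illustration of what that means in practice (hypothetical cells; `samples` is assumed to be defined in some other cell):

    # cell 1
    threshold = 0.5

    # cell 2 -- Pluto sees it depends on `threshold`, so editing cell 1
    # automatically re-runs this cell (and only the cells downstream of it)
    kept = filter(x -> x > threshold, samples)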

> I have to repeat that I really don't see how caching compilation results comes even close to the complexity of what Revise.jl or Pluto.jl have to do.

I think the main trouble with the caching is that the native code you cache can depend very strongly on the exact combination of packages you have loaded. This means you can hit a combinatorial explosion of different methods to cache pretty quickly, so you'd need a very clever way to decide which methods to keep and which ones to delete once the cache gets too big.

I think there are also other potential issues that I understand less well. This is being actively worked on though.


> Not quite. It builds a dependency graph of your code and can figure out which definitions depend on others. So depending on what you change, maybe only one or two cells need to be re-run. Or in other circumstances, the whole notebook will have to re-run. It just depends on what changes.

But the effect is still that any changes up-file will always be reflected down-file? If so, I don't care how it's implemented (as long as it's fast enough and doesn't break); the semantics are the point.

> I think the main trouble with the caching is that the native code you cache can depend very strongly on the exact combination of packages you have loaded. This means you can hit a combinatorial explosion of different methods to cache pretty quickly, so you'd need a very clever way to decide which methods to keep and which ones to delete once the cache gets too big.

Yes, I think this is a problem for a clean solution. But for a big fat ugly hack that isn't too picky about wasting disk space or occasionally recompiling stuff needlessly, it's probably less of a problem.

For a lot of cases very rough invalidation would probably suffice. E.g. invalidate all definitions from all files that have changed since the last run (i.e. like Make does), and invalidate all definitions for any name that gets a new definition. I'd guess accomplishing this would cut the startup time greatly; end-user code rarely redefines (at least intentionally) anything that's in the packages, and the vast majority of the time is spent (re)compiling the packages themselves.
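Purely as a sketch of the Make-style rule I mean (the invalidation hook and the tracked-file variables here are imagined, not a real Julia API):

    # files changed since the previous run are considered stale, like Make does
    stale(files, last_run) = [f for f in files if mtime(f) > last_run]

    for f in stale(tracked_source_files, last_run_time)
        invalidate_cached_code_for(f)   # imagined hook: drop cached native code for f
    end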

I'm sure there are complications with type inference. But I'd be willing to pepper some explicit typing into my code if it means I don't have to recompile it every time I run it. The binary of a method with concrete argument types should at least be trivially cacheable (given no library changes between runs).

> This is being actively worked on though.

It's been worked on for as long as I've known of Julia. AFAIK there's still absolutely zero logic for caching compilations of "end-user stuff" (as opposed to stuff like package precompilation). I don't think this is necessarily due to technical issues, but because the community says the REPL (or a notebook) is the only way of using Julia, and those don't suffer from the problem that much (Revise.jl breakage notwithstanding).

Technically it's probably very difficult to do "perfectly", and I'm thinking that's how the compiler devs want to do it. I'm not sure they even mean caching that persists between runs when they say "caching" in compiler-related discussions. It may well just be some run-time caching of compilation artefacts that are now compiled multiple times. And that would probably not yield such dramatic performance gains for the re-run case.

As an AOT compiler Julia is clearly fast enough. There are probably no easy tricks left to make it a lot faster. But re-run performance doesn't need faster AOT; it just needs the compiler not to recompile the same identical stuff every time.


> But the effect is still that any changes up-file will always be reflected down-file? If so, I don't care how it's implemented (as long as it's fast enough and doesn't break); the semantics are the point.

Yes, I was just bringing this up because it means that various things can be significantly faster, causing you to experience less latency than you normally would by re-running a whole file.

As to the rest of your post, I agree it'd be interesting to see a quicker and dirtier solution. It appears that everyone who has the know-how to do this wants to 'do it right', so on the public-facing side there's very little visible progress.

> It's been worked on for as long as I've known of Julia. AFAIK there's still absolutely zero logic for caching compilations of "end-user stuff" (as opposed to stuff like package precompilation)

This is not really true. E.g. there's PackageCompiler.jl, which does sysimage-based caching and works quite well (at the expense of slow compilation and large binaries), and briefly there was StaticCompiler.jl, which did good small-binary compilation but then bitrotted quite fast.
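For what it's worth, the sysimage route looks roughly like this (the package list and file names are just examples):

    using PackageCompiler

    # bake heavy dependencies into a custom sysimage, then start julia with
    #   julia --sysimage sys_plots.so
    create_sysimage([:Plots, :DataFrames];
                    sysimage_path = "sys_plots.so",
                    precompile_execution_file = "precompile_workload.jl")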

All of our GPU compilation stuff is built using a small-binary, static AOT compiler (currently hosted in GPUCompiler.jl), and it's quite reliable. There's active work being done to make this work on the CPU again (basically a modern version of StaticCompiler.jl). So while I feel your frustration that this has been 'coming soon!' for a long time, progress has been made. The new compiler hooks in version 1.6 are partially designed to make this whole process less hacky and easier to iterate on.


Nice to hear about the progress. I did read somewhere that AOT is already possible for GPUs. But I actually like the "just-in-time AOT" for development. For deployment a real AOT would be nice (but in the short term it could even be something like a precompiled blob with an embedded runtime).

It would be huge if Julia could be compiled to shared objects with e.g. a C interface. I don't even care if they are bloated or hacky. Any way of accomplishing this would be an instant boost for using Julia in production, and would put it beyond anything that comes even close to it in productivity.
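For a sense of what such an entry point could look like: Julia already has Base.@ccallable for declaring C-callable functions; the missing piece is a reliable, supported path from there to a small shared object. The function below is purely illustrative:

    # a C-callable mean over a double array; a C caller would see
    #   double julia_mean(double *x, int n);
    Base.@ccallable function julia_mean(x::Ptr{Cdouble}, n::Cint)::Cdouble
        v = unsafe_wrap(Array, x, Int(n))
        return sum(v) / length(v)
    end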

I think Julia people may underestimate the potential Julia has as a general-purpose language, and overestimate the short-term effort needed to make it happen. Just add some hacks like AOT caching and any way to call it through a C FFI, and it would spread like wildfire.

I understand that most of Julia's community is about crunching data, and that's what I do most of the time too. But with that background, it's probably not that clear how dire the situation is in more general development. An expressive, reasonably performant and interoperable language would be revolutionary.


I, and probably many others, intend to develop reproducible figures. The way to do that in Python / MATLAB is to have a script which loads data from disk and then produces a figure (a PNG / PDF). You then execute that file many times, each time tweaking one aspect of the figure. Julia makes that workflow almost impossibly slow.


Then write a function that does everything your script would do in the clean local scope of that function, and call it as many times as needed. I mean heck, that's a more elegant solution even if script latency weren't in the equation.
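Something like this, just as a sketch (CSV.jl/Plots.jl and all the names here are assumptions, not from the parent comment):

    using CSV, DataFrames, Plots

    function make_figure(datapath, outpath)
        df = CSV.read(datapath, DataFrame)
        plt = plot(df.x, df.y; xlabel = "x", ylabel = "y")
        savefig(plt, outpath)
    end

    # tweak the function and call it again in the same session -- no restart,
    # no recompilation of the whole plotting stack
    make_figure("results.csv", "figure1.pdf")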


I usually use a clean local scope in a function anyway. The problem is the reloading. Revise.jl works sometimes, but sometimes doesn't, and it makes debugging more difficult (which is difficult enough in Julia as is, in my experience). Another problem is having to use the REPL, which doesn't integrate as nicely with the rest of the OS as the shell does.

I don't see why REPLing it is a more elegant solution. With that solution I can call the function from the REPL if I want, but also from the shell if I want. With the shell I get the elegance of having a persistent, complete and reproducible description of the state all the time, which can be e.g. version controlled.
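i.e. the same function can be driven as a one-liner from the shell, which is the part I want to keep version-controlled (the file names here are just examples):

    julia --project=. -e 'include("figures.jl"); make_figure("results.csv", "figure1.pdf")'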


Why not just write a function? You can do this without restarting your Julia session every time...


Well, to be more precise, the script of course contains functions, each of which produces one of the figures and takes the path to the data / the data itself as input. In any case, the REPL has no place in such a workflow, because I want to be able to check in the script at the end and have a reliable way of reproducing any of the figures I put into the paper later.


You could literally just stick your script into a function and have it be just as reproducible (well, probably more reproducible) and never need to restart Julia...



