> But they do because it's not just syntax -- the language and runtime are really a part of the algorithm.
I don't think you understood my statement above—what you're attributing to the runtime (memory-model et al) is part of the abstract machine specified by the language's semantics, but that's orthogonal to the runtime.
A runtime can be a better or worse fit for a given language's semantics, but any VM can virtualize any abstract machine with enough glue. VMs are Turing machines, after all. (In practice, the overhead can be surprisingly low; you don't have to emulate what you can trace or dynamically recompile. The work around asm.js is rich in data about how to squeeze performance out of the not-particularly-direct mapping of the C abstract machine to the JS VM.)
The abstract machine formed by the language semantics can, of course, demand live support machinery of the underlying runtime (like a GC, say) that may have to be "polyfilled." This is what we've been doing forever -- a PIT signalling a processor interrupt that's checked between cycles is just a polyfill for having concurrent processes on other processors sitting in blocking-sleep for a bounded time, for example. Any hardware emulator is full of these. Usually you can find ways to make them less necessary -- IronRuby doesn't contain its own GC with Ruby semantics; it just translates Ruby's GC requirements into calls to the CLR allocator+GC, and the result works "well enough."
> Not only marshalling/demarshalling of data, but fanning-in and then fanning out your concurrency on each end.
Why are you assuming message-passing distribution? Shared-memory cross-runtime distribution works too. That's how ZeroMQ works, for example.
> what you're attributing to the runtime (memory-model et al) is part of the abstract machine specified by the language's semantics
OK, I see where you're going with this. But now you'll need a "bottom representation" which, even if it is easier to transform than machine instructions, will be pretty hard to decompile to another language. For example, most interesting lock-free algorithms require some sort of garbage collection; how do you describe that GC behavior, which doesn't necessarily need to be general-purpose, in a way that can be ported to both C and Go?
Theoretically it may be doable, but in practice the problem is very, very hard.
The big question, then, is why bother? The JVM already provides excellent interoperation with excellent performance for the vast majority of (at least) server-side applications out there, and Graal/Truffle are extending the range of languages for which the JVM can provide the fastest implementation (it's early days, and it's already pretty much on par with V8 when running JS, and faster than PyPy for Python). Applications outside that profile will use other languages (like C++), but those languages are already more expensive to develop in, and their developers happily pay more for more specialized algorithm implementations.
> Shared-memory cross-runtime distribution works too.
That's true (provided both sides can agree on an ownership and concurrency behavior).