"Professors usually have this legacy code on hand (often code they wrote themselves decades ago) and pass this code on to their students. This saves their students time, and also takes uncertainty out of the debugging process."
This is so true. I'm a PhD student in physics using Fortran for pretty much that reason. At the start of my PhD, in response to my supervisor telling me I should learn Fortran to modify our current codebase, I asked if I could rewrite what I'd be working on into C++ first, since I was already familiar with it and wanted to bring future development into a more "modern" language.
His response was "You could do that and it would probably be enough to earn your PhD, since it'll take you at least three years. But I suspect you'll want to work on something else during that time".
He was right. I later learnt one of our "rival groups" attempted the same thing, and it took three PhD students working full time for a year to rewrite their code from Fortran to C++.
"Within a month of his arrival, Randy solved some trivial computer problems for one of the other grad students. A week later, the chairman of the astronomy department called him over and said, “So, you’re the UNIX guru.”
"At the time, Randy was still stupid enough to be flattered by this attention, when he should have recognized them as bone-chilling words. Three years later, he left the Astronomy Department without a degree, and with nothing to show for his labors except six hundred dollars in his bank account and a staggeringly comprehensive knowledge of UNIX."
1. "A staggeringly comprehensive knowledge of UNIX", three years of domain-specific education, and a network of people who trust you to--no, who depend on you to be able to get things done with that knowledge
2. A piece of paper just like everyone else in the department.
And while the former might be more difficult to make use of, I think it could be much more valuable in the long run.
In context, he was underpaid tech support who was taken advantage of and got no grad school education while he paid for grad school. A bunch of scientists who know you as "that guy who fixes my email and doesn't publish or know anything" is not a great network.
1. "A staggeringly comprehensive knowledge of UNIX", three years of domain-specific education, and a network of people who trust you to--no, who depend on you to be able to get things done with that knowledge
2. A piece of paper just like everyone else in the department, three years of domain-specific education, and a network of people who trust you to--no, who depend on you to be able to get things done with that knowledge (and probably a girlfriend)
> three years of domain-specific education, and a network of people who trust you to--no, who depend on you to be able to get things done with that knowledge
This might be the case, but he's now like the janitor who unplugs the toilet. His work may be appreciated, and it might be necessary, but it's not respected or well remunerated. I guess he gets a little bit of respect for unplugging more theoretical pipes, but a little respect isn't a PhD.
Not useful if he wanted to be an astronomer, but a much more marketable set of skills if his goal was to find a job in industry. Probably also made a more positive (albeit uncredited/unrewarded) contribution to astronomy than most of the grad students.
Trusting legacy code with few users is a dangerous proposition. My roommate was given some "state-of-the-art" code and told to run simulations with it. The only graphical output was PostScript (for some reason), so every frame was 150 MiB and took minutes to dump - so usually, this was only done at the end to show the result.
I managed to hack in a step which just dumped the memory of the resulting frame to a file, and then we wrote a Python script to read that and produce a PNG.
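For what it's worth, the raw-dump-to-PNG step can be tiny. A minimal sketch in Python, assuming a hypothetical frame size and file names (with a random array standing in for the simulation's actual dump):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical frame geometry; the Fortran side just writes its array
    # to disk as a raw stream of float64 values.
    nx, ny = 512, 512
    np.random.rand(ny, nx).tofile("frame_0042.raw")  # stand-in for the real dump

    frame = np.fromfile("frame_0042.raw", dtype=np.float64).reshape(ny, nx)
    # Fortran arrays are column-major, so a real dump may need a transpose (.T).
    plt.imsave("frame_0042.png", frame, cmap="viridis")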
We combined the PNGs into an animation and showed someone else in the department, because the supervisor wasn't in that day. "Cool! But, hrmm, those boundary conditions look wrong." Sadly, this was two-thirds of the way into his Masters.
Like much code in physics these days, the original code was simply incorrect. This happens all the time, and papers get retracted because of it - well, that's the best case, if somebody notices.
At the end of the day, programming languages come with ecosystems, and must be chosen simply as tools. The problem is not with Fortran itself, but that Fortran often means outdated development practices and, in physics, horrible code hacked on by 10+ people without prior programming experience.
Another problem is that most people who wrote this code aren't programmers - they don't write clean code, don't write tests, etc. They don't really know those things are important. Sometimes code that has been used and updated for years looks like a dirty prototype.
I don't know what can be done about it except hiring programmers to write the code, which would be neither easy nor cheap.
Well, I guess one moral of the story is that doing image manipulation is easier in Python than Fortran, so it's again using the right tool for the job. And I think the push for Physicists to use Python if possible is good, as the learning curve is less steep. And once you have that tool at your disposal, you might use it more often (instead of Excel).
I can see many projects being improved by providing a Python pre-processor that writes out e.g. a binary config file, which the hardcore Fortran/C simulation code reads before spitting out the simulation results, and then a Python post-processor that does the pretty stuff at the end.
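A minimal sketch of that split, with hypothetical parameter names and file names (and a random array standing in for what the compiled simulation would produce):

    import numpy as np

    # Hypothetical run parameters.
    nx, ny, dt, t_end = 256, 256, 1e-3, 10.0

    # Pre-processor: write a flat binary config the Fortran/C code can
    # slurp with a single unformatted/stream read.
    np.array([nx, ny, dt, t_end], dtype=np.float64).tofile("run.cfg")

    # ... the compiled simulation reads run.cfg and writes results.bin ...
    np.random.rand(ny, nx).tofile("results.bin")  # stand-in for that step

    # Post-processor: load the raw results and do the pretty stuff.
    data = np.fromfile("results.bin", dtype=np.float64).reshape(ny, nx)
    print(data.min(), data.max(), data.mean())  # or hand it to matplotlib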
Academically, it seems that pairing CS undergrads with Physics undergrads to do e.g. a molecular dynamics (MD) course would be cool. The Physics behind MD isn't too hard, and given the right parameters the programming part would be manageable. Then again, CS undergrads aren't necessarily great programmers either...
Clean code and tests are overrated. Version control and a big ecosystem are underrated. It's like maths: physicists don't understand maths, they just use it, the way a carpenter uses a nail without understanding metallurgy, or a programmer uses a CPU without understanding solid-state physics. And that's okay.
I would have thought a more natural candidate for scientists would be languages like Java or C# rather than Python: you get a language that is relatively easy to write, with no memory to manage, no hornet's nest of pointers to pointers, nice debuggers, and lots of third-party libraries, while at the same time getting the performance of static typing and of running compiled code once the program has been JIT-compiled (which for long-running code is a negligible cost). It's never going to be as fast as C++ or Fortran, but you get a compromise between performance and ease of use, whereas I understand Python goes to the other extreme, i.e. all ease of use at the expense of performance. That doesn't matter for a simple simulation, but for some large data analysis projects, I assume it could make a big difference.
But that doesn't seem the case. The choice seems to be between python and c++/fortran. Does anyone know why?
1. Nice bindings already exist for numerical libraries in python. It is more accurate in some ways to say that scientists are using a domain specific language based on numpy than they are using python.
2. The choice is either "fastest possible" or "I just don't care how long it takes" - for either development or run time. There is usually no middle ground.
3. Tooling - the scientist is most likely using a text editor not an IDE (especially when they start working). Fortran and python are both low enough on boilerplate to not need IDE support.
4. Abstraction level. Scientists in general don't care about abstraction level at all. Procedural programming's abstractions are usually more than enough for them (and may actually be the correct level for some numerical work - think cache misses vs. hits).
5. Some areas of science do use Java (check out imagej for example).
Why not use both? You can reuse existing Fortran code within Python, and benefit from the prior experience and speed while enjoying the qualities of Python.
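For example, numpy's f2py will wrap existing Fortran routines into an importable module. A rough sketch, where legacy.f90 and the routine name are hypothetical:

    # Build the extension once from the existing Fortran source:
    #   python -m numpy.f2py -c legacy.f90 -m legacy
    import numpy as np
    import legacy  # the module f2py generated above (hypothetical name)

    x = np.linspace(0.0, 1.0, 1001)
    y = legacy.integrate(x)  # call the wrapped Fortran routine on a numpy array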
If you don't have clean code and tests, a VCS just lets you switch between old and new bad, probably non-working, code. That's nice, but probably not what you'd want. If I had to choose one out of all of them, I'd choose tests. From my experience, it's nearly impossible to implement a big system without tests. Maybe I'm just not good enough as a programmer.
They probably did at some point in the misty past. The problem is that running relevant experiments is often very expensive (otherwise they wouldn't need to do it in software), and they can only test a tiny handful of cases. More often than not you're calibrating your model against another model that was calibrated against another model that was calibrated against a set of simple experiments done in the '60s.
I've heard horror stories of supposedly quite serious software modelling programs behaving quite differently when their input problem was rotated a few degrees.
> I've heard horror stories of supposedly quite serious software modelling programs behaving quite differently when their input problem was rotated a few degrees.
Sounds perfectly normal :)
The standard advice here at work for one of our modelling packages is: if it crashes try rotating your input model by one degree and try again. Most of the time that will fix it. If it doesn't fix it try one degree in the other direction.
I'm sure lots of people do. We're a commercial company selling consultancy services around this software (technically around some in-house tools built on top of this software), among many other things.
Is correctness somehow assured provided it doesn't crash?
On the whole, well-tested PDE solvers either crash, give answers that are off by several orders of magnitude, or give a correct answer. (Whether the answer they give is relevant to what you're trying to model is left as an exercise for the reader.)
We're reasonably sure that, if the calculations converge, the solution provided is a correct numeric approximation (within the error bounds given) of the PDEs we're claiming to solve. We also believe that the PDEs we're using provide a reasonable balance between modeling what we claim we're modelling and our computation running in a reasonable amount of time.
Yep. Most of my programming classes were in either Pascal or C when I was in undergrad, but in my physics courses, all the way through my graduate education, I used FORTRAN. It's because physics doesn't change that much over the years: time-tested FORTRAN code that works already exists, and there's no need to re-invent the wheel in another language when more interesting and important problems exist to solve.
David Baker's Rosetta code, which made him a pile of money, is IMO a spectacular example of craptastic C++ written by people who really didn't understand C++ but didn't let that deter them from using every single feature of the language, badly. Some years back we tried to port it to CUDA but there were so many levels of indirection, dereferencing, and virtual functions that it was nearly impossible to make any progress.
In contrast, porting the 30+ year old molecular dynamics package AMBER to CUDA took about 3 months and probably established me as a CUDA expert. In my opinion its well-maintained Fortran 90 code was far easier to understand and refactor.
While my primary languages today are C++ and CUDA, there is something clean about Fortran when it comes to understanding underlying algorithms. I have a similar opinion about well written C code.
I wonder if you can formalize this. Assume TDD, and then measure the compressed size of the codebase vs. the compressed size of the test suite. Code is "worse" when it requires more "stuff" (informational entropy) to do less "stuff" (pass test cases).
The compression would presumably remove the redundancy of the language itself as a factor (including differences in idiomatic cyclomatic-complexity "depths" of various stdlibs), and also remove any redundancy in the way the test cases were specified. So it'd be down to a measure of how much circumlocution and over-engineering you did in the process of implementing the solution.
I'd worry slightly that code-golf solutions would be rewarded, though. Maybe pass everything through an obfuscator + linter before computing the metric, so that things like identifier lengths and spaces aren't considered.
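A rough sketch of the metric, assuming a hypothetical layout with the implementation under src/ and the test suite under tests/:

    import zlib
    from pathlib import Path

    def compressed_size(paths):
        """Concatenate the files and return their zlib-compressed size in bytes."""
        blob = b"".join(p.read_bytes() for p in paths)
        return len(zlib.compress(blob, level=9))

    code_bits = compressed_size(sorted(Path("src").rglob("*.py")))
    test_bits = compressed_size(sorted(Path("tests").rglob("*.py")))

    # "Worse" code needs more compressed implementation per unit of compressed spec.
    print("entropy ratio (code / tests):", code_bits / test_bits)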
Oh, you're that Scott Grand? Allow me to thank you for the work you did on pmemd.cuda, my old research group wouldn't be where they are now without you.
You're welcome; it was ironically based on ideas drawn from a C port of the Amber potential function I wrote back in grad school. While I'm not proud of the code these days, I put it on SourceForge a long time ago.
Why write the port in the first place? Because back in the days of yore, I was doing the equivalent of adversarial search to try and design a better potential function for predicting protein tertiary structure. I ultimately arrived at the result that there were too many adversaries to make linear models and single hidden layer neural networks work.
And unfortunately, my postdoctoral advisor at the time didn't consider this publishable research.
It's a product of the UW, so no surprise the quality is crap. IMO their Cybersecurity course (which is mandatory for CS students) is the biggest load of shit I've ever seen in a class.
The head of information security at the UW calls the PSTN bulletproof secure, but VOIP insecure, then babbles on about how he shares classified info with his buddies casually (happens to be a felony to share said info).
Needless to say, UW's Avaya phone system barebacks the internet, with no regard for using silly security things like TLS or SRTP.
Now you need to do a follow on study to see how much science the 'rival' group does with a more modern codebase than your 'legacy' group does. Would you know if there are enough examples of two groups who have diverged like this to get meaningful (as in statistically significant) results on the cost benefit of porting / not porting?
> Now you need to do a follow on study to see how much science the 'rival' group does with a more modern codebase than your 'legacy' group does.
I would guess that a C++ codebase written by PhD students, not by seasoned C++ experts, is more complicated and much slower to debug, than a corresponding Fortran codebase.
Don't just assume that a 30-page-long main function is bad code. The code can still be well organized and readable. Functions are not the only way to organize code. Knuth's literate programming, for example, was invented to better organize code.
Functions have their own downsides. Using one for code that is only ever called once is really questionable, justified mostly by the lack of other means.
The very fact that the article itself, with a completely straight face, presents all that malloc nonsense as "the way you have to do it in C++" is very telling of how easy it is to get the wrong end of the stick with C++ unless you are au fait with the history of the language and its evolution towards better, more modern idioms.
Fortran is a much simpler and safer language than C++. And faster to learn. Fortran is even somewhat simpler than C or Java, whereas C++ is probably the most complicated language in the known universe.
So especially in the hands of non-experts, Fortran should produce fewer bugs.
> Now you need to do a follow on study to see how much science the 'rival' group does with a more modern codebase than your 'legacy' group does. Would you know if there are enough examples of two groups who have diverged like this to get meaningful (as in statistically significant) results on the cost benefit of porting / not porting?
I don't know of any studies like this. But there are two aspects to consider. One is what you've pointed out, which is the long-term scientific gains and the productivity of the research group as an entity. But the other aspect is whether those students who rewrote the code into C++ were more or less successful than the students who weren't rewriting code.
Obviously, how you judge success in the latter will be tricky, since there may be differences in the interest of students who rewrote the code and their desired outcome after the PhD (e.g., stay in academia versus go into industry).
That is a good point, if the majority of the code is living in Fortran then there is definitely some value in understanding large legacy Fortran code bases.
This is true. Though I realized I wasn't specific enough in what I meant. I was thinking more about the limited time available to students, and that rewriting code might mean the students do less science and so are less able to obtain a good postdoctoral position (or whatever position typically follows a PhD in the particular field).
True, but the students that re-implement these codebases also might have a much greater understanding of the underlying techniques, theory, and the interaction between them; compared to others who produce more results, but use these codebases as black boxes. I think both pathways lead to equally good academic careers, they just branch out into separate paths.
> His response was "You could do that and it would probably be enough to earn your PhD, since it'll take you at least three years. But I suspect you'll want to work on something else during that time".
As someone who had contact with that codebase, do you have any insights as to why that is?
Was it the sheer size of the thing? Was it some nuance that Fortran had as an advantage over other languages? Was the math just difficult to follow?
I reimplemented all of my professor's Fortran code at CERN in C during several undergrad classes in my physics major. Then I quit physics and became a developer.
Pretty much just size! There were about 20,000 lines of it; it's legacy code that has been gradually added to since the 1980s, so it would have taken a while to rewrite all the parts to work with each other. Perhaps some day though.
Started in the 80s doesn't mean the code is 30 years old, as they keep adding to it and modifying it in Fortran. Plus most of the analysis part is probably just math routines that don't need modification ever after.
20,000 lines of code really doesn't seem like that much. I probably output that much in about 2-3 months of biomedical research so there has to be more to it than that.
>> legacy code that has been gradually added to since the 1980s
> 20,000 lines of code really doesn't seem like that much.
It's not "20k lines of code" but "n lines of code that grew organically over decades to 20k lines" - with the help of probably way more than 100 people who all are not trained as developers. I think it's a safe bet to say the current state only has a faint memory of being a consistent code base.
> 20,000 lines of code really doesn't seem like that much.
Once I spent close to a month hunting down a subtle but nasty bug in a number crunching module which was perhaps around 2k LoC. Writing code is very easy. Verifying and validating number crunching code is very hard and very time consuming.
20,000 lines of your code is one thing. 20,000 lines of someone else's code is another thing entirely. 20,000 lines of many other people's code can be insane.
To do such a rewrite, you need domain knowledge much more than you need software engineering knowledge. This means you need to get research physicists to contribute major effort to such a rewrite. To get there, you need to make them see a clear benefit to translating a system (that works fine today, thank you) from language A to language B.
A Ph.D. student or a young researcher might be more interested if a rewrite lets them run experiments using Amazon / CUDA / whatever for a few thousand dollars instead of spending scarce grant money on dedicated hardware.
They are talking about large-scale, massively parallel code to run on a supercomputer. Most grad students in physics who need it have access to a system much faster than Amazon. For example, my friend's code runs on Cori @ NERSC, which is a 622,336-core Cray.
I'm taking an (educated) guess, because the people estimating the rewrite were all physicists and not developers. And certainly not developers with 5+ years of experience, which is the minimum I'd trust with architecting a major rewrite.
Yes, C++ does not buy you that much. Sure, it is a more modern language, but when used by scientists (not software engineers) the advantages are hardly earth-shattering. Much of the complexity is hidden (as it should be) in libraries.
IMO, Python stands a better chance of breaking Fortran's lock on physics-related computing. Give it a few more years and enough numpy-based libraries might make Python a real competitor. My 2c
I have to be honest with you, as someone who writes in both C++ and Python, I really do not see Python being more of a candidate than C++ for displacing Fortran.
Can you clarify why you think Python might be able to do it? For scientific computation with high performance requirements, Python is not competitive with Fortran or C++. For work that continues to happen in Fortran due to "academic inertia", my impression (and vague experience) is that researchers find the convenience of what they're accustomed to (Fortran) to be greater than the convenience of things like rapid prototyping offered by the various Python scientific computing and data analysis libraries. There is a mental overhead in switching, and I think most academics are sort of okay writing code in whatever is familiar and battle-tested if it means they can focus more on the research at hand.
In other words, I'm not saying you're wrong, but I'm not following your reasoning.
A disclaimer -- I do not work in physics, however a couple of recent physics Ph.D.s that I work with expressed similar views.
Speed is not nearly as important now as it was 10-15 years ago. A typical scientist's workstation has 20 CPU cores. If I want to use 100 CPUs for a few days, it is trivial, and 1000 is easy to get. Thus the fact that Python is slow(er) does not bother me unless I am setting up something major.
What matters though, is the availability of libraries that let me reliably run my experiments. If there is a bug, it must be in my code, not the library -- a "discovery" caused by a software bug is humiliating. This is where Fortran shines and Python is not quite there yet -- the decades of beating on those libraries made them very well understood and trusted.
What users? I do work in research and this is what I see almost every time -- doubling the number of CPUs available for a simulation is trivial; getting someone to rewrite existing code to make it twice as fast on the same CPU is hard.
I am not talking about interactive programs -- a slow browser is annoying. I am talking about scientific computing. This is just my experience, can you provide some counterexamples?
I worked as a programmer in a molecular dynamics research group for a while. I was asked to work in Python because it was what the boss was familiar with - so there's some of the same inertia happening again, just with a new(er) language, I guess.
Speed is not a huge issue if you're happy to leave your simulation running overnight anyway, or if you have the option to just throw more and more cores at the problem (or in our case, both). The goal is to get your papers published. Code developing/interpreting/debugging time is generally a far more serious obstacle to that goal than simulation speed. I was the only developer there. Everyone else in the group coded on an almost daily basis, but none of them considered themselves programmers, and very few of them ever actually learned about good programming practices. They're biophysics researchers.
Having not worked in Python much before, I was also pretty pleased to find that, when I needed Dijkstra's algorithm for path-finding and then to work out the smallest standard deviation among certain sets of data points, Python came with libraries to do both of those things off the cuff. It's just so easy, I can see why it has a favoured place in this field.
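For the curious, both of those are a few lines on top of scipy and numpy (a toy sketch, not the actual code from that group):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import dijkstra

    # Toy weighted graph as an adjacency matrix (0 = no edge).
    adj = np.array([[0, 1, 4, 0],
                    [0, 0, 2, 5],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]], dtype=float)
    dist, pred = dijkstra(csr_matrix(adj), directed=True,
                          indices=0, return_predecessors=True)
    print(dist)  # shortest distances from node 0 to every other node

    # Smallest standard deviation among several sets of data points.
    datasets = [np.random.normal(0.0, s, size=100) for s in (0.5, 1.0, 2.0)]
    tightest = min(datasets, key=np.std)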
As long as Fortran is able to keep up with Python's capabilities (or any other language's, for that matter), there seems to be absolutely no reason to make a change, considering Fortran's history and familiarity in physics. In other words, other languages need to make a big enough leap forward, or Fortran needs to lag behind enough, to justify a change. Your example just shows that it's fine to use other languages, and I agree, but it's not very compelling in promoting a change of language.
> Speed is not a huge issue if you're happy to leave your simulation running overnight anyway, or if you have the option to just throw more and more cores at the problem
What about those cases where it's the difference between throwing it at a 1000-core cluster and whether we can have results before the next conference in a couple of months? That is what really defines scientific programming -- it's about feasible versus infeasible. For other jobs, isn't it just a matter of taste?
It depends; not all scientific programming is at universities.
When I worked for BHR Group (a hydrodynamics research organisation), sometimes we had emergency projects with a < 24-hour turnaround. I recall one where a client had had a serious issue at a plant, and we ran the simulation and produced a report in a single day.
Just because Python is slow doesn't necessarily mean that every library is also slow. With numpy/scipy, which are written in C with Python bindings, scientific computation is pretty fast. Python is fast becoming the standard language for data science, with so many tools like notebooks that make it extremely easy to do many things. I am not sure how well it compares to Fortran, but it is a no-brainer to use it over C++, especially for research needs.
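A quick illustration of that point; the exact timings vary by machine, but the gap between a Python-level loop and one call into numpy's compiled code is the thing to notice:

    import numpy as np
    from timeit import timeit

    x = np.random.rand(1_000_000)

    python_loop = lambda: sum(v * v for v in x)  # interpreted, element by element
    numpy_call = lambda: float(np.dot(x, x))     # one call into compiled C/BLAS

    print("pure Python:", timeit(python_loop, number=10))
    print("numpy      :", timeit(numpy_call, number=10))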
I don't see Python displacing Fortran or C++, but I do see Python showing up in HPC as a platform for DSLs. Instead of handling the expensive part of the computation, Python acts as a UI layer / configuration manager / code generator. For an example, see fenics: [https://fenicsproject.org/]
In a previous discussion of this kind (I'm not in this area myself), a scientist had said that the speed of the languages is less important than iterating the algorithms, and it was a lot easier for non-developer scientists to work with python than with C++.
Hardly any of numpy is written in Fortran. It's basically all C and Python. Numpy does, however, link to and provide wrappers around some existing Fortran libraries.
Potentially your "rival groups" will have a long term advantage now though. This might be bad for the graduate students that did the work, but could be good for the professor and group over the next decade.
Graduate students need to think in terms of 4-6 years while forward looking professors might want to think in terms of 5-15 years.
I do wonder about this a little, but so far I haven't seen anything implemented elsewhere in my field that hasn't been possible to do with our existing code. In fact, I have spent a fair bit of my time replicating others' results with our code and getting sub-percent level agreement.
However, I guess one drawback is that a lot of the things we currently implement are written from scratch (for very standard things such as numerical differentiation and parameter optimisation), which has the advantage of giving us "control" over and more understanding of the code, but saves less time and is potentially not as efficient as using pre-existing libraries.
I agree, it is very project/field dependent. There are times when a painful redesign might pay off in the future, but I am sure there are also times when it is a waste of effort. In your case a redesign might have been a waste, you are in the best position to judge this.
I use Python for most things and write C extensions, which can use OpenMP or CUDA and which are hooked in via Cython, for the slow parts. I find this works well, although it can lead to you duplicating things unnecessarily sometimes (making a C function callable from Python requires you to write a wrapper for it).
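The same wrapping pattern can be sketched with ctypes instead of Cython (the library, build line, and function here are all made up for illustration; a real Cython wrapper would look different):

    import ctypes
    import numpy as np

    # Hypothetical shared library, e.g. built with:
    #   gcc -O3 -fopenmp -shared -fPIC hotloop.c -o libhotloop.so
    lib = ctypes.CDLL("./libhotloop.so")
    lib.sum_sq.restype = ctypes.c_double
    lib.sum_sq.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]

    x = np.ascontiguousarray(np.random.rand(10_000_000))
    total = lib.sum_sq(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), x.size)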
I might actually disagree on this one. If it was the first year of their PhD, then yes, it does seem like grunt work which is dissatisfying, but they will have fully learned the ins and outs of their group's code framework.
In the process, they would have understood nearly every approach taken by the group towards producing the results that it does, and I bet that has helped them when they've ended up modifying that same code later for their research. And with fewer supervisor meetings to work out exactly what X, Y and Z part of the code does because they will have worked on it themselves.
I'd say I've easily spent 6 months just getting my head around all the code we work with anyway, so at the very least, I hope that made their following years of research more productive.
Not necessarily. It was definitely bad if they want to become academics and needed to publish papers. But if they knew from the start what they were getting into and wanted to transition from physics to software development then it might not have been so bad (although there might have been a more optimal path).
A bad advisor wouldn't care what his students wanted, but a good advisor might still have students work on this kind of project as long as they were aware it wasn't going to help them get a tenure-track position in the future. If they wanted to go work at a national lab doing HPC work and programming, it might have been perfectly fine for their career (this is what I am transitioning to now), or if they want to go work for a hedge fund or Apple, it might also be an okay option.
I don't know. Having "rewrote X kloc of scientific Fortran to modern idiomatic C++" on your CV should get you to the head of the line in many places when looking for a job.
Only if you go after run of the mill coding jobs after your phd, in which case why bother at all. When applying for a postdoc, you'll very much want to bury that part of your work.
Plenty of high-paying quantitative jobs require such a degree and experience. The majority of PhDs don't go on to support their family with a tenure-track academic salary.
Or you could be applying for an HPC research job in one of the countless non-university settings that also do that sort of thing. Then having both a relevant PhD and some relevant hands-on experience will be extremely helpful.
>>I later learnt one of our "rival groups" attempted the same thing and it took three phd students working full time for a year to rewrite their code from fortran to C++.
I wonder if there are any transcoders for Fortran to C++? I wonder if there is even a market for something like that?
I've written alpha versions of transcoders for C to Java and Java to Objective-C and I think I could do the same for Fortran to another language but why?
Writing a transcoder first is the right way to do it and it should take you a couple of months to do it, especially if you already have some experience with this kind of stuff. Definitely less than a year for a single person. I would not try to translate the code by hand. It would be a never ending project full of bugs.
Edit: Apparently there are plenty of Fortran to C++ transcoders. Here is just one I found during my google search:
http://cci.lbl.gov/fable/
When I was working for LROC, most of the planetary geologists used IDL (Fortran-ish in many ways) but there was still plenty of actual FORTRAN floating around. Oh, look, in the last couple of years this legend of photometry worked up this new method and here's the associated Fortran code. That kind of thing.
I did actually rewrite both IDL and Fortran, but it was always smallish, single-purpose programs or functions.
Note that the SPICE Toolkit[1] is still written in Fortran today, translated to C using f2c, and the C version used as the base for other language support. The Fortran part is unlikely to ever go away, since support for processing past missions is crucially important and all that processing code was written in Fortran. Also, talk about your stable, backward compatible APIs...
Just before I left JPL, the SPICE team had announced a plan for a complete rewrite in C++. The SPICE team is exceptionally small (especially considering how widely used and impactful the software has been). IIRC from the team's inception to the time I left, it has been about 4-5 people. Their primary goal has always been stability and correctness over speed. Recalling a conversation with Boris, they have something like 2.5 million lines of test code. So it would take some time to port over. The codebase is probably the most documented I have ever seen; every mathematical deduction is described in great detail.
That being said, as someone who has integrated CSPICE into several C++ and python projects, the modernization would be a very welcome change. The current arch depends far too much on global state, and none of it is threadsafe.
I recall that a local aerospace outfit was looking for C++ programmers to redo a metric F%^K ton of Fortran into C++.
I actually rang the recruiter to ask why it would not be simpler to train their existing staff to use Fortran - the ad stayed up for years and years; I always wonder if it ever got ported.
Not really; it's very odd, because when I worked at one of this organisation's peers straight out of high school, I was told to get a Fortran book from the company library and learn it. I also had an hour's basic instruction in how to boot the PDP11.
"...and it took three phd students working full time for a year to rewrite their code from fortran to C++"
There's a salient lesson here. Clearly, this was a stupid decision for two reasons:
(1) There are many thousands of scientific routines that have had 40-60 years of fine honing and careful debugging, and they just work! For instance, we send probes—Voyager, Cassini, etc.—to the ends of the solar system and they invariably get there; the Fortran routines that get them there do exactly what they're supposed to do (unlike much of today's poorly written C code)!
(2) Rewriting that already-reliable fully-debugged Fortran code into any other language will almost certainly make it far less reliable, thus it's a no-brainer to stick with the original .F source.
Yes, Fortran is simpler than C [consider it BASIC on steroids], and years ago it had long since evolved well past John Backus's 1954 incarnation of the language into a solid workhorse that both physicists and engineers use regularly.
Just because something is old and out of fashion doesn't mean that it's broken or doesn't work well. (Longevity ensures that there's been sufficient time for many hands to make it reliable.)
(Oh, BTW, I'm reminded that decades ago when I was just beginning to learn Fortran using punch cards on an IBM360 mainframe with the WATFOR FORTRAN-IV compiler, that I made four errors in only six lines of code. After dutifully printing ERROR against each offending line, the compiler finished with the message: "YOU NEED TO SEEK ANOTHER CAREER " or words to that effect [yes, it was in uppercase]. Eventually, I got considerably better.)
It's not just that the code is already written and debugged; in many cases results have been published using analysis done with this legacy software. Using the same code ensures a certain amount of consistency between experiments from the same lab.
From this point of view, rewriting code is an extremely high-risk proposition. As we know, the likelihood of discovering bugs during this process is quite high.
>> Professors usually have this legacy code on hand (often code they wrote themselves decades ago) and pass this code on to their students
So cut-n-paste code. The students are running code without really knowing what it does, and it might not even be correct... it's like Stack Overflow in academia.
He certainly used to, back when qmail vs. Postfix as the better sendmail replacement was a real debate 'we' were regularly having. If he's not getting any flak today, it's most likely because he's fallen more or less into irrelevance when it comes to practical day-to-day operations.
Speaking from a government contracting point of view: Nobody is going to pay you to rewrite existing code that's already working. Nobody. The customer doesn't give a flying shit about the implementation. He'd be happy with a box of diodes as an implementation, as long as it worked and came in on time and on budget.
When you're writing up your proposal for a contract or a grant, the theme should always be that you're "adding capabilities" (which should be well-defined and constrained) to the existing codebase. If you get the money, then you've got carte blanche to rewrite to your heart's content - just don't tell the customer that this is what you're doing. Just make sure that those new capabilities indeed make it into the re-write and that you introduce no regressions in the new code.
People don't tend to write tests for their Fortran code, so the assumption that it's already working and the numbers coming out are correct is a matter of faith.
Writing tests for the kind of numerical code that FORTRAN is usually used for is hard. Sometimes there is no direct way of testing it, because if you knew any of the results already you wouldn't need to run the simulation in the first place. Quite often, the best that you can do is proper sanity checks like conservation of energy and momentum, or things like that.
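A minimal example of that kind of sanity check, using a toy 1-D harmonic oscillator in Python rather than any real simulation code:

    def leapfrog_step(x, v, dt, omega=1.0):
        """One velocity-Verlet step for the oscillator a = -omega**2 * x."""
        v_half = v - 0.5 * dt * omega**2 * x
        x_new = x + dt * v_half
        v_new = v_half - 0.5 * dt * omega**2 * x_new
        return x_new, v_new

    def test_energy_is_conserved(n_steps=10_000, dt=1e-3, rtol=1e-4):
        x, v = 1.0, 0.0
        e0 = 0.5 * v**2 + 0.5 * x**2  # total energy at t = 0
        for _ in range(n_steps):
            x, v = leapfrog_step(x, v, dt)
        e = 0.5 * v**2 + 0.5 * x**2
        assert abs(e - e0) / e0 < rtol, "energy drifted more than expected"

    test_energy_is_conserved()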
As with any sort of big transition (changing e-mail, using a new password manager, changing programming language), the solution is always to do it incrementally.
For e-mail, I generally create a new one, and over a year or two, I create new accounts with the new e-mail and gradually move accounts until the old one is seldom used. Similarly here, it may be a bit tricky, and it really depends on how intertwined it all is, but gradually write the new pieces that you're adding in a new language, or use C++ for pieces that you're rewriting; eventually you'll be much closer than if you'd tried to do it all at once.
The one thing that helps, I think, is writing codes that are open source. Yes, it's a sticky point regarding getting funding, but in that imaginary world where you are funded well, transitioning to open codes (save the things that are... export controlled) would be beneficial for all of us.
I hate how I can't publish easily on modifications I make to our PIC code because it isn't open source; eventually I'm planning to switch to another code (and might implement a needed solver for it) just for the sake of my publications.
As someone working on an Exascale project for electronic structure calculations, I have a theory about the longevity of Fortran. It's the fact that many of these codes were started years ago, and the people who have the credentials and ability to get funding for supercomputing projects learned on Fortran and stayed with Fortran because they were scientists first and programmers second.
Modern Fortran has many nice features in 2017, but the people that wanted these features moved to C/C++ long before the features became available in Fortran, and those that are left using Fortran are usually scientists, not programmers, and so don't care so much about these features. I think it is largely the older generation that says they will never stop using Fortran, in the survey mentioned in the article.
Just to suggest where the field is moving though. NwChem is a large successful electronic structure package using Fortran. Its next gen version NwChemEx that is being designed for exascale will exclusively be written in C++ (https://www.pnnl.gov/science/highlights/highlight.asp?id=441...).
Also, just from experience, people who work in HPC mostly would rather be writing C/C++, but use Fortran because they have to, not because they want to.
It makes no sense for people to have moved to C for things that have only recently appeared in Fortran. What are the features they've been missing which have appeared in Fortran in the last 10 years?
So why is NWChem being re-written in C++, and what relationship does that have to exascale? Richard O'Keefe said 10 years ago, "Why not start by rewriting the Fortran code _in_ Fortran? Fortran 90 is a very pleasant language." (NWChem is just in Fortran 77 as far as I remember.)
I work in HPC, though I don't write numerical code these days, but I'd definitely prefer to write it in Fortran than C, and I'm not clever enough to use C++. I'd also much rather maintain typical scientific Fortran.
I don't use Fortran, but I agree that as of today (and probably post-Fortran 90) it is nice enough; people moved to C++ for things like polymorphism, templates, and pointers, among other things. You could probably also rewrite NWChemEx in modern Fortran just fine. I'll give you one example of why C++ might be easier, though. Next-gen linear algebra libraries are being written in C/C++, such as Elemental and DPLASMA, plus the Cyclops Tensor Framework and probably others. When the libraries you want to use are in C/C++, that can be the easiest path forward.
Yes, the electronic structure community is rapidly moving away from Fortran.
In my opinion, Fortran will fall behind as newer hardware, libraries, etc. drop Fortran support. Also, newer grad students are all much more interested in C/C++/Python. I think this is in part because the newer languages are widely used outside of science, and therefore there is much more documentation and there are more tutorials/guides. Not to mention that skills in those languages are transferable to other areas (data science, machine learning, etc.).
As a side note: Wow someone working on the ECP on HN? I'm tangentially related to the project (and just visited PNNL last week)
> Yes, the electronic structure community is rapidly moving away from Fortran.
Really? Seems to me that with very few exceptions (e.g. GPAW which is python/C and nwchemEx which I've never heard about until the parent poster mentioned it), electronic structure is pretty much a Fortran bastion.
(Source: I did a Phd doing mostly electronic structure calculations, graduated ~5 years ago)
New libraries are being written in C/C++, and maybe Python. This includes libraries that should form the foundation of the QM community (matrix/tensor and integral libraries). These are meant to take advantage of newer hardware and libraries which themselves are written in C/C++, and often in a way that is inaccessible from Fortran.
The old Fortran code will be around for a long time, but I don't know of any large-scale, serious efforts to develop new packages or major new functionality that are starting with Fortran.
(I'm not totally against Fortran - I just spent a week developing in it. But I still much prefer C++ and Python.)
I don't understand why the implementation language of low level libraries should determine a high level language that uses them. In what way are C libraries inaccessible from Fortran, given that it defines interoperability with C?
I'm afraid you need a 10-, or preferably 20-, year perspective, not a week.
I have been developing in Fortran for years. I just mentioned that as an aside.
About C compatibility: Many C libraries use pointers in their interfaces. Interoperability is indeed defined by the Fortran 2003 standard, which I have used several times to wrap existing C libraries. However, much of the existing code is F90 only (some even F77...), and a vast majority of Fortran developers in the field are not familiar with even modules and other F90 features, let alone iso_c_binding from 2003.
Also, newer libraries tend to be C++ as well, which is more difficult (or at least more awkward) to wrap.
It's partly the power of library authoring that is moving things, I think. To my knowledge, most post-LAPACK tensor algebra is all C/C++ as well. I don't know Fortran, but I am not sure it has the same flexibility when it comes to generic code and things like writing expression-template math libraries.
Finally, the fact that groups like Facebook and Google are writing their machine learning code in C++ shows that (1) they find it useful and (2) it's plenty performant. This kinda became a response to the comment above yours, sorry.
If you don't mind, I'd like to maybe chat with you a bit and get your opinion on some things (and maybe see if I've actually met you before). My (mainly throwaway) email is ytterbium35 (at major email service run by google).
Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is written in C++ and it's over 20 years old. The FORTRAN codebases seem to be centered around the finite elements/difference methods and the fluid dynamics community.
Sorry, I wasn't specific. My comment mostly applies to the quantum mechanics (QM) community, rather than molecular dynamics. In QM, many people still run Gaussian/GAMESS/ADF/MolCAS/MolPro/Dalton, which are all Fortran (or majority Fortran). And most are Fortran 90 or earlier.
My background is in QM, so I guess that's my bias showing through :)
There is a lot of electronic structure code in C and C++. For SMP there are Psi4 and ORCA; also, Garnet Chan has a new Python/C++ package. And some widely used integral code generators generate C or C++ (libcint, Libint2).
On the parallel front MPQC has been around a long time and is C++.
I have a somewhat amusing FORTRAN story from my undergrad days at the Florida Institute of Technology...
So my first programming class ever was a Numerical Analysis class taught at FIT, and to be honest, this was my first exposure to a "real" editor (vi on a PDP11 in this case)...up to then it was all MS-BASIC with that wonky line editor and, of course, gotos and line numbers.
At the end of the first class (8am, ugh), the instructor announced that anyone looking to get extra credit, and perhaps skip having to come to the early class, should talk to him after class. Of course that sounded good to me, so I went to see him and he said "ok...if you can write me a bowling league manager in 10 weeks you will get an A and not have to ever come to class."
Ok...hell yes in fact! This sounded a ton more interesting than sitting around in a silly class talking about programming. He gave me a spec sheet and away I went to the lab to begin my struggles with vi and FORTRAN.
It wasn't easy, but holy shit did I learn a lot...more than I ever could have just doing the exercises in floating-point rounding error and non-linear simulations (I ended up doing those later as well) that were "taught" in class.
I can still remember FORTRAN (77, I believe) had a very strict formatting scheme where the column had to match the keyword in order for the program to compile, or something stringent like that. But mostly, coming from BASIC, it was a breath of fresh air.
I ended up completing the program with extra bells and whistles...sorting, multiple leagues and other things...and the instructor was duly impressed.
I knocked out a bunch of lower division CS classes at a JC (taking 5 at a time). I think I went to the first class and a few before midterms and finals. Just got the labs and handed them in the next day.
Hurray for attendance-optional JCs! :D
Most of it transferred to a UC and then the fun began:
- caching http/1.0 forking select() proxy server as the third project in a networking class, circa 2002
- Java subset to MIPS assembly compiler
- Reimplement most of the OpenGL pipeline in C++, quaternions and write a trapezoid (scanline to scanline) engine (on which a triangle engine could be built). Oh and then model the interior of the building.
- Pipelined, microcoded, simple branch-predicting processor. Bonus points for smallest microcode and fewest microcycles. (I Huffman mapped the histogram of the sample assembly programs’ executed instructions to the user-defined binary macro ISA (students had to write the assembler too), and then used progressive decoding in the microcode (43 micro ops long microprogram IIRC). Blew the doors off the extra credit in that class.)
Some of the points brought up here are in fact correct, mainly legacy, testing, and awesome compilers tuned for supercomputers. However, a lot of these "why Fortran" articles (on both sides) I find are written by people who don't dabble enough on both sides of the fence, and are ignorant of what the other side offers. For example, numpy implements a lot of the stuff from Fortran the author listed, like broadcasting operations across arbitrarily shaped arrays, striding and negative indices, etc., not to mention the scipy library that contains leagues of the famous Fortran codes... and you get all that with a quick and easy-to-prototype language for the stuff that isn't the bottleneck.
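For instance, the numpy features named above:

    import numpy as np

    a = np.arange(12.0).reshape(3, 4)    # a 3x4 array

    centered = a - a.mean(axis=0)        # broadcasting a length-4 row over all rows
    last_col = a[:, -1]                  # negative indices
    every_other_reversed = a[::2, ::-1]  # strided views, no copy made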
Another issue is that computational people think C++ is about OOP. FFS, what a way to sell C++ short and ignore the more significant tool C++ brings to the table: generic programming. Whenever I talk to my computational colleagues, they talk about "C++ and OOP" as if they are two peas in a pod; what if I told you you didn't need to use inheritance to leverage the best of what C++ offers (what if I told you you didn't need inheritance to even leverage OOP!?). Templates have the potential to be a powerhouse for performance in these codes, I feel; it's just that no one on the computational side has leveraged them, because they quite simply don't understand them.
The same sort of thing is true for CS people and their critiques of Fortran usage, but I'll leave my scathing comments for one of those stories when they get shared here.
Indeed! But stuff like parsing input files and plotting is tons easier to code in Python than in Fortran. The point is that the performance-critical parts are not in Python, while the parts where we don't want the difficult baggage can be written in a much easier and kinder language like Python.
C++'s generic-programming feature still has shortcomings - I think functional-programming has more relevance to scientific computing, but C++'s functional features are "okay" but still not as capable or proven as, say, Haskell's or OCaml's - for example for tail-recursion you still depend on the compiler supporting that optimization, you can't force it or necessarily assume it will happen, with fun consequences for your stack if it doesn't.
For scientific computing, whether a language has very advanced or merely okay functional features is only a superficial style issue.
Regarding more important performance issues like low level control of memory layout and avoiding pointless copying and indirection, C++ and Fortran are both at the most effective end of the spectrum, while typical functional languages lie between "don't even think about it, by design" and "it might be OK but only a fool would put a project at the mercy of what optimizations a relatively unproven compiler opts to do".
Functional languages force you to be more correct, more often. Eliminates a bunch of classes of bugs which are anathema to scientific computing, and are generally so high level that compilers can optimize extremely aggressively. Also, scientific programming is usually much more about data flow and transformation, which is FP’s wheelhouse.
Output results would be the same - that's a matter of program correctness, regardless of whether it's written in a functional, object-oriented, or procedural paradigm.
The comparison should instead be how long it took to engineer and build the system or program in a particular paradigm, and the qualitative engineering aspects of a particular platform. FP may be amazing for certain areas, but basing a large-scale project or business on it would be hampered by the small supply of developers who can comfortably program in it.
The article is wrong or misleading in a number of respects. For instance, OpenMPI doesn't define the language interfaces -- the MPI standard does. It talks about "no aliasing of memory" -- the rules actually concern "storage association" -- and then claims Fortran passes by reference, misunderstanding the whole thing. The Benchmarks Game is pretty useless generally, but it's clearly useless to compare supposed language speed by using two different compilers anyway.
I don't mean to knock Fortran.
One of the key points of the article is that there is a lot of legacy code written in Fortran. As a former high energy physicist, I have an anecdote here that some people might find interesting.
There was a library written in Fortran called CERNLIB which included a broad variety of miscellaneous numerical algorithm implementations (e.g. minimization, integration, special functions, random number generation) [1]. I couldn't tell you exactly when the library was first released, but my best guess would be the early 80s. It can't possibly be later than 1986, when PAW was initially released [2]. The field has since transitioned from the Fortran-based PAW to the C++-based ROOT, but many high energy collaborations still rely on CERNLIB for their own analysis frameworks (keep in mind that many of these experiments had been in planning and development stages for over a decade before they actually turned on).
The thing about this that I find interesting is that compiling CERNLIB has become a lost art and that this fact has had far reaching consequences. The last available binaries were compiled with GCC 4.3 in 2006 and packages are only available for Scientific Linux 4 [3]. This crucial dependency has led to collaborations using extremely outdated Linux distributions and GCC versions in their computing facilities. The majority of analysis code is written in C++, but not even C++11 additions can be used because everything is frozen on GCC 4.3. Nobody can even run the analysis environment on their local machines without resorting to the use of virtual machines running SL4. It was really a nightmare to deal with.
Cernlib is available in current Debian (and also in EPEL 6). I don't know why it's not in anything else under the Fedora banner, and wonder what nasty non-standard stuff CERNLIB does. I don't remember ever having to look at more than bits of it.
I've heard an anecdote of HEP analysis code that was written by a team in C++ and wasn't ready for impending data collection on LHC. Someone apparently rescued the situation by turning up with a working Fortran system he'd written on his own. I don't know details other than which university group the report originated from; I'd be interested to know more about it.
Huh, funny thing. I had to use a heavy-ion collision simulation program written in Fortran, which had to be compiled using a certain compiler implementation (1). After 2-3 weeks of debugging and trying different compilers in vain, my supervisor put me on the phone with a guy who had been more successful than me, and he told me which Fortran compiler to use.
(1) Each compiler gave different results: compilation errors, or code that goes into an infinite loop.
Getting locked to a particular compiler isn't usually a property of the language, but of the development team. It happens with all languages: when the underlying platform remains fixed at a site, the code evolves to rely on every capability of that particular instance, not the spec. This is why it is important to have platform diversity from the start. Run your unit tests on at least two toolchains, hopefully on at least two different distros if not kernels. I like to use GCC/Clang and CentOS/Ubuntu as my base platform matrix.
One good example of this is to run your code on 32-bit, 64-bit, big-endian and little-endian machines, in all of those combinations. That works pretty well as a way to keep you on your toes with respect to portability.
The bigger experiments at CERN use C++14 and Python quite a lot. There's still a bit of wrapped FORTRAN code kicking around, and we definitely use rather old-school distributions, but I haven't seen anything as bad as your anecdote.
> The benchmarks where Fortran is much slower than C/C++ involve processes where most of the time is spent reading and writing data, for which Fortran is known to be slow.
Why would IO be slow in any language? What does the language have to do besides buffering and system calls?
> In Fortran, variables are usually passed by reference, not by value. Under the hood the Fortran compiler automatically optimizes the passing so as to be most efficient.
Aren't arrays implicitly passed by reference in C also?
I believe many Fortran implementations default to unbuffered io.
Which is probably easy enough to change.
But I think that's really the core issue. Physicists don't want to learn more about programming languages. They want whatever mostly works out of the box and has local documentation and expertise specific to their problem domain.
>Aren't arrays implicitly passed by reference in C also?
He covers that. C passes arrays by reference, but the individual elements aren't contiguous. He says Fortran passes an optimized reference.
he's definitely doing it wrong there. As a scientific C programmer, I would never do that, I would malloc one contiguous array of nrows*ncolumns. And my freeing of the array would be as simple as his Fortran deallocate.
Sure, C could make multi-dimensional array handling nicer, but I have macros that basically do his A[x,y,z] for me, admittedly a bit more verbosely.
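For anyone who hasn't seen that style, a minimal sketch of what a contiguous allocation plus an indexing macro can look like in C; the struct and macro names here are mine, not the commenter's actual macros:

    #include <stdlib.h>

    /* One contiguous block holding an nx*ny*nz array of doubles. */
    typedef struct {
        size_t nx, ny, nz;
        double *data;
    } Array3;

    /* A[x,y,z]-style access, flattened to a single offset. */
    #define IDX3(a, x, y, z)  ((a).data[((x) * (a).ny + (y)) * (a).nz + (z)])

    int array3_alloc(Array3 *a, size_t nx, size_t ny, size_t nz) {
        a->nx = nx; a->ny = ny; a->nz = nz;
        a->data = malloc(nx * ny * nz * sizeof *a->data);  /* one malloc */
        return a->data != NULL;
    }

    void array3_free(Array3 *a) {
        free(a->data);   /* one free, mirroring the single malloc */
        a->data = NULL;
    }

Usage is then IDX3(a, i, j, k) = 0.0; and deallocation is a single call, much like a Fortran deallocate.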
Many of his points about Fortran are true, but almost all his statements about C are false, or nearly so. Yes, copying an array of floats requires calling a function, memcpy, rather than just using an equals sign, but that isn't rough, and it fits his own idea of showing "what actually happens 'under the hood' inside a computer". And others are easily handled by adopting things like the MKL (want to take the sin of everything in an array? see https://software.intel.com/en-us/mkl-developer-reference-c-t... )
Fair enough then; I suppose that in C you have to do something like that if you want your matrix size to be determined at runtime and still be able to index it with m[x][y] rather than through some function.
That’s not strictly true. You can malloc an m*n sized array and then fill a second array with pointers to the start of each row (or column, depending on your layout), which gives you standard-style indexing.
If you do that, every access actually dereferences that extra pointer, and the compiler can't optimize the standard-style indexing down to a multiplication and an addition, so this still carries a significant performance cost.
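A sketch of the pointer-table approach being described, assuming row-major storage (the function names are my own):

    #include <stdlib.h>

    /* Allocate an m x n matrix as one contiguous block, plus a table of
       row pointers so it can be indexed as mat[i][j]. */
    double **matrix_rows(size_t m, size_t n) {
        double *block = malloc(m * n * sizeof *block);
        double **rows = malloc(m * sizeof *rows);
        if (!block || !rows) { free(block); free(rows); return NULL; }
        for (size_t i = 0; i < m; i++)
            rows[i] = block + i * n;   /* each entry points into the block */
        return rows;
    }

    void matrix_rows_free(double **rows) {
        if (rows) { free(rows[0]); free(rows); }  /* the block, then the table */
    }

The data itself stays contiguous here; the objection above is the extra pointer load on each mat[i][j] access compared with computing a flat index directly.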
Fortran is much easier to use than its direct competitors if you are writing array-based number crunching. For all other uses, it’s hopelessly outdated. If I were to explain its purpose to a non-Fortran programmer, I’d say that Fortran is useful for number crunching kinda like regular expressions are great for certain text-processing tasks. It’s basically a domain-specific language and should be used accordingly. Wrap it with C++ or f2py and call the routines from Python/C++, where you do the «software parts»: IO, GUI, ...
That being said, I usually just use python and surf on other people’s hard work!
Oh, and the author is wrong on many of the specific details. For instance, MPI is available to many languages, including python.
> [Fortran is] basically a domain specific language and should be used accordingly.
I read this whole discussion with interest, and I think this is the most compact and insightful statement here. Thinking about it this way makes the situation very clear.
The paragraph on "legacy code" is a bit weak and half-hearted because it underemphasizes one of the most important arguments for using old code: it's been thoroughly debugged already. The most the author can summon on the topic is the fact that legacy code "takes uncertainty out of the debugging process." What? There is no debugging process, because that code has been debugged for 40 years and is damn near bulletproof at this point!
Everybody is used to cringing when they hear "legacy code," and that's justifiable for several good reasons. Note that "not wanting to learn an unfamiliar language" isn't one of them. And "not having, or not being willing to use/cultivate, the skill set of reading someone else's code" isn't one of them either.
But there is obviously a lot of bad code out there. And that's the thing, there are only two kinds of code: good code and bad code. And by extension there is bad legacy code and there is good legacy code. Don't assume legacy code is always bad code. If something has been used successfully for 40 years, do yourself a favor and try to have the humility to assume people implemented it well, found all the bugs, know what they're doing, and/or generally are rational-thinking adults who make good choices... instead of the usual naïve assumption that everybody's an idiot but I'm going to change all that! No, you're going to duplicate a lot of effort, and possibly (depending on the faithfulness of your reading of the code) reintroduce some of the same bugs that were dealt with years ago.
I don't know about the mathematical domain, so this might be off-topic, but in some cases a legacy code base that works perfectly can still be brittle and full of holes and latent bugs that never manifested themselves, because the code that doesn't directly interface with the input data is only ever exercised by a subset of the data it should support, and many of the possible code paths have never been evaluated.
But if you try to refactor the code to, say, support another feature or optimise it, you might get into nastiness that is beyond comprehension, and you cannot count on the code being coherent or correct.
This is my experience maintaining a legacy base that has been in production for many years. It's just a pile of frozen code that nobody has properly refactored, probably out of fear of breaking something. You end up with unreadable layers and weird technology-specific hacks that were carefully made to just barely work, probably written without understanding what the existing code actually did but cargo-culting it, resulting in a massive amount of code that does very little.
Other reasons not mentioned: 1) built-in support for complex numbers; 2) a Fortran compiler usually generates faster code (with fewer pointers to worry about, it can make more aliasing assumptions). But recently I see more and more physics code written in C++.
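On point 1, for comparison: Fortran has had a COMPLEX type since its earliest versions, while C only gained a native complex type with C99's <complex.h>. A minimal C example:

    #include <complex.h>
    #include <stdio.h>

    int main(void) {
        double complex a = 1.0 + 2.0 * I;   /* native complex type since C99 */
        double complex b = 3.0 - 1.0 * I;
        double complex c = a * b;           /* arithmetic operators work directly */
        printf("%.1f%+.1fi\n", creal(c), cimag(c));   /* prints 5.0+5.0i */
        return 0;
    }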
> Interestingly, C/C++ beats Fortran on all but two of the benchmarks, although they are fairly close on most.
I think this is fairly recent that C/C++ wins. I don’t know how recent exactly, but I remember a colloquium not too long ago by a compiler researcher who said that cross-compiling to Fortran and then optimizing almost always produced faster code than the C/C++ compiler could. Fortran is apparently easier to optimize.
For science codes, where Fortran is still used, the most expensive pieces (think DGEMM kernels) are largely written in architecture-dependent ASM anyway, so the programming language doesn't have that much of an effect on the most time-critical pieces of large HPC programs.
If a specific function is important enough it will be hand optimized in a way that the language doesn't really matter. Sometimes that means calling MKL, or using Cuda, or writing your own assembly.
DGEMM doesn’t necessarily win over MATMUL for small matrices. Useful in e.g rotation matrices in finite element codes, and lots of other areas. Since matmul can be done with inlined loops, you also avoid the function call overhead, but it looks like a function call which is good for readability.
What I don't get is why someone doesn't make a superb template library for C++ that hides the restricts; it would make writing performant codes in C++ easier, and you'd get the whole host of other things C++ has to offer.
Agreed. However, C++ pragmatically does have restrict in both clang and g++, which is still useful in cases where you can deductively prove the lack of aliasing.
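A minimal sketch of what that buys you, using C99's standard restrict (g++ and clang accept the equivalent __restrict spelling in C++); the function below is my own illustration, not something from the thread:

    #include <stddef.h>

    /* restrict promises the compiler that out, a and b never alias,
       so the loop can be vectorized without runtime overlap checks. */
    void axpy(size_t n, double alpha,
              const double *restrict a,
              const double *restrict b,
              double *restrict out)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = alpha * a[i] + b[i];
    }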
> Even if old code is hard to read, poorly documented, and not the most efficient, it is often faster to use old validated code than to write new code.
Amen. A one-character mistake might take a week to find as it exhibits only subtly wrong behavior (e.g. wrong grid convergence rate, overly noisy boundary condition, odd symmetry breaking beyond IEEE floating point). During that week no science happens.
Yup. Think of C as a rusty straight razor and Fortran as a barn full of rusty implements, about ready to fall down at any time. C++ is maybe a rusty safety razor.
Originally, Fortran had manual memory management, as per the times. Thankfully, the language progressed.
Overall, the evolution of languages from assembly/raw instructions to procedural ones needed early languages like Fortran, on which other higher-level languages, tools and OSes could later be built/bootstrapped.
Our physics group has a core library, first written in 1987, that is in Fortran (a Microsoft dialect, to be specific).
Why haven't we moved to something else? It works, it is time-tested, and the original author continues to maintain it.
(P.S. I'd like to compile it with the gfortran tools, in order to preserve the library for the future. Is there any documentation for simple conversions from Microsoft's implementation of the language to the more-traditional spec?)
A problem for computational science is that people care about their publications more than about others being able to reproduce their work. The funny thing is that an open-source code is a sure way to attain a legacy.
Is MS Fortran still a supported product? I'm not sure I'd call software that depends on unsupported and unmaintained other software properly maintained.
There are some caveats AFAIK. Intel changed some default behaviors. I believe you can change them back with a command-line switch though. Of note is the SAVE property on all local variables being the default on MS. Now that’s an insane default if there ever was one!
Sounds like it's "legacy" or "heritage" supported, which is not a good place to be. Yeah, it still manages to work, and it's available for use because it costs Intel effectively nothing to leave an installer up on the web for download, but there's no real "support".
That's not a piece of software I'd want to tie my code to.
It's not, unfortunately. But how are you "keeping the lights on" with your old MS Fortran based code? Still using an ancient Powerstation? compiler, or Compaq Visual Fortran, or ... ?
We're still running the old compiler, probably within DOSBox or something similar. There are one or two boxes on which the executable can currently be built.
A slight gap in knowledge in the piece: Most Python libraries for numerical computation are written in C/C++ or... Fortran. Last time I had to compile scipy from scratch I had to install gfortran.
An immense amount of effort was put into the performance, correctness, feature set, and numerical stability of such widely used libraries. Replacing them without a very good reason is hardly feasible.
BLAS is an interface. All performant BLAS libraries need to be written specifically to take advantage of modern hardware. Things like Netlib exist as reference implementations; they are not high performance.
The point I am trying to make is that Numpy is unlikely to still be using the exact same code that was written in the 1970's unless its performance isn't critical. It is making the same function calls but the actual code will look pretty different.
Numpy can use optimised BLAS but I believe it ships with an f2c transpiled reference BLAS so it can run (although slower) without Fortran where necessary.
Some from the 1960s. Quoting numpy/lib/function_base.py:
We use the algorithm published by Clenshaw [1]_ and referenced by
Abramowitz and Stegun [2]_, for which the function domain is ...
.. [1] C. W. Clenshaw, "Chebyshev series for mathematical functions", in
*National Physical Laboratory Mathematical Tables*, vol. 5, London:
Her Majesty's Stationery Office, 1962.
.. [2] M. Abramowitz and I. A. Stegun, *Handbook of Mathematical
Functions*, 10th printing, New York: Dover, 1964, pp. 379.
http://www.math.sfu.ca/~cbm/aands/page_379.htm
I needed a .NET math library. My IT department insisted on my trying a wrapper around a Fortran library (I believe purely because they already had a license). All the class and variable names are short hexadecimal strings. None of the interface is idiomatic to .NET (the library only has void-returning methods which pass error codes as byref arguments instead of using exceptions, and it requires a parameter for the length of any array passed as an argument instead of reading it from the array), etc...
Basically I refused to touch this thing; using a library which makes the code unreadable is going to be a bug magnet. I would be surprised if I was the only one having this reaction to 1960s coding conventions surfacing in modern code.
So selling new licenses should be a good enough reason.
Your .NET wrapper doesn't need to expose the same name or calling convention. That's why you use a wrapper: to abstract away distasteful non-idiomatic decisions in the wrapped API.
Sounds like your IT department botched the wrapper. Wouldn't a more successful candidate likely just have used an abstraction layer between the .net conventions you were expecting and whatever the native code was doing?
I spent several summers working at Los Alamos National Lab in their HPC group on I/O, among other things. Supporting legacy codes was a requirement, even if it meant that we couldn't explore obvious and important optimizations. I remember an anecdote from the division lead, that the reason a lot of legacy code is never replaced was because of certification. Codes are designed to simulate some critical system, and it takes years to trust the result. So any change that forces a multi-year validation process was a non-starter.
My understanding is that the code modernization and co-design efforts that are a part of the Exascale initiative are changing this.
That was my initial reaction as well. However, I would guess that C++ is mostly used as a "C with classes" in the domain the author is talking about, so "C/C++" wouldn't be so incorrect.
Maybe that's more just because some people just need C and some people need C++, but both are very standard, go to compiled languages, and are often learned in tandem.
I don't think the choice to continue using Fortran is as well thought out as the author suggests. I work in computational physics and have written a lot of Fortran, C++ and Python.
The predominant issue is that physics codes are typically written by PhD candidates, often with little to no programming experience. The projects can span 3-5 years and continue existing in the physics ecosystem for decades. Good programming practices are seldom employed, and the codes become these massive patchwork ships, leaking everywhere with holes plugged by spaghetti code.
The issue is not that the students aren't smart enough to learn good programming practices, it's that the advisers are not patient enough to wait for documentation, unit tests and so forth. They view good practices as wasted time, after all it's "physics not computer science".
Unfortunately, the path of least resistance is actually to adopt unit tests, write code documentation and generally employ industry programming practices; it's just that this path has a barrier to entry, and the benefits are not immediately apparent to older academics who have fallen out of the loop.
The end result is quite sad: students with advanced programming skills are chronically underappreciated in the field. Professors will happily bring them on board as postdocs to develop their simulations, but they balk at giving them jobs based on their computational skill set. It is thus no surprise that the computational talent leaves physics for careers in the private sector, where the union of math skills and programming is in high demand.
A lot of the arguments in this post can simply be rebutted with basic abstractions. Things like "Dynamically allocating and deallocating ... 2D array" are easy in C++. You could easily have someone define a MathArray<Type, Rows, Columns> class and turn the messy Fortran allocation code into something that looks like:
auto *my_matrix = new MathArray<double, 10, 10>();
The code the author showed demonstrates a lack of understanding of C and C++. Even if you restrict it to C your matrix code should look something like this
For me "real, dimension(:,:), allocatable :: " is much more complicated than "matrix_make"
Many of the issues people see in the speed difference between Fortran and C code are likely based on a misunderstanding of how Fortran actually lays out their data, and of what the computer hardware (and what you're describing to the C compiler) actually does. This "double array" that was defined would never be allowed in production code. The amount you'd be hitting the OS for even small allocations is crazy.
The arguments for Fortran, as far as I'm concerned, are:
1. We already know it
2. We're not going to get grant money to rewrite a library
3. We've built a bunch of computer clusters and have to justify what we spent (rather than buying 2 GPUs for your workstation)
4. We've all spent a lot of time learning how to use MPI that we're never getting back.
You're making a pretty big deal about trivial syntactic issues.
> Into something that looks like
>
> auto *my_matrix = new MathArray<double, 10, 10>();
The dimensions here are a template parameter. They must be known at compile time. Also, if you go this route and you want to write a function that, say, adds two matrices, you get one copy of that function in your binary for every matrix size that occurs in your program. You also can't naturally interoperate with someone else's matrix code unless that someone else specifically wrote against your MathArray template class.
> And if you'd really like you can hide the sizeof via a macro...
>
> #define MATRIX_MAKE(r, c, type) matrix_make(r, c, sizeof(type))
>
> For me "real, dimension(:,:), allocatable :: " is much more complicated than "matrix_make"
It looks uglier, but it's language syntax rather than custom code. Whoever is reading your code doesn't have to unpack a macro and then look into a function to figure out what you're doing. (And as a bonus, the Fortran user can index the matrix without mentioning that it's a 'double' matrix when he's indexing.)
> Many of the issues people see in the speed difference between Fortran and C code are likely based on a misunderstanding of how Fortran actually lays out their data, and of what the computer hardware (and what you're describing to the C compiler) actually does. This "double array" that was defined would never be allowed in production code. The amount you'd be hitting the OS for even small allocations is crazy.
Which many issues are you thinking about? Idiomatic Fortran has an inherent advantage over idiomatic C in that better aliasing information is available to the compiler.
If you want to make an implementation that supports runtime-specified matrix sizes, you change your implementation from...
auto *my_matrix = new MathArray<double, 10, 10>();
Into
auto *my_matrix = new MathArray<double>(10, 10);
Most programs don't need dynamically sized arrays (rows and columns) and as such it makes sense to also provide a template Row and Column width. By doing this you can likely implement matrix multiplication and addition as a constexpr (with some effort) and thus get....
1. Compile time matrix evaluation
2. Vectorized multiplication/addition of matrices
3. Pipeline-efficient code
I think your 3. and 4. are phrased a bit dismissively, though I don't entirely disagree. Not every workload in scientific computing maps cleanly into the GPGPU paradigm. Though that's not really a compelling reason to use Fortran over C++
An additional reason I've heard (I don't personally use Fortran) for keeping Fortran around is that it's straightforward to convert Matlab prototype code into Fortran
Physicists / engineers / mathematicians are the target audience for Fortran. For heavy number crunching it's still quite good, it's people trying to use it for other stuff that causes problems.
That said, Fortran really is dying. Scientific code is much larger nowadays with more functionality and scientists want to do everything in one language. C++ and Python are taking over.
The vast majority of scientists are not able to write idiomatic Fortran, let alone idiomatic C++. Scientific C++ code that didn't have oversight from a professional C++ developer will always be horrible. Scientific Fortran code written without such oversight can sometimes be bearable. This is perhaps the main advantage of Fortran.
Eh, I'm talking mostly about the large scientific code packages that are being developed with millions of dollars in funding and large, organized teams. The people writing these sorts of codes know what they are doing and a lot of migration to C++ is because they are more familiar with it and it's easier to hire skilled people.
1) Some stuff is already written in Fortran so they don't want to rewrite that. I dig it.
2) It's fast (though C is sometimes faster) but easier to write than C. Like 100x faster than Python.
I'm not sure about number two. With the gpu processing revolution wouldn't a python/TensorFlow stack be faster than Fortran? Am I missing something?
I remember talking to someone who had worked heavily on atmospheric weather predictors recently and her description of the program was: we divide the space into tiny little cubes and then run some differential physics equations to predict what will happen next. My basic questions to her:
1) From a computational perspective this seems very GPU friendly.
2) Why not use a convolutional neural network? If you use the same data for training you will probably wind up with a more accurate prediction than a theoretically based physics system.
Her reaction was basically that she hadn't heard of these things before so my impression is in fact that the physics community is just behind and they will catch up when they are ready.
TensorFlow is for machine learning, not general purpose computations. And no, you will not get better results from a neural network than state of the art computational fluid dynamics.
As for general-purpose GPU programming, some parts of physics are GPU friendly but not all of them.
IIRC there was some project doing more or less real-time weather forecasts for small areas (think airports and such) that used deep learning instead of traditional CFD style simulations. They were able to do it with several orders of magnitude less CPU usage than the CFD calculations.
But was it more accurate? Was it for the same purpose? Getting within a few percent of a CFD model might be good enough for industrial use. For studying systems (from the perspective of scientific investigation) it's not even close.
Dearest me. I hope you're not under the impression that the performant part of your ML code is written in Python. Looking at its repo on GitHub, the Python in TensorFlow is the interface to a C++ library. I don't use TensorFlow, but if it's anything like numpy and scipy, those are interfaces to C libraries which are extremely performant. If the question is why a CFD code can't be in Python... well, I don't know if anyone has tried it, due to your point 1).
I don't think you understand Tensorflow. Tensorflow compiles tensor mathematics into a shader that can be run on the GPU. This is usually faster than a CPU computation.
A lot of science is done with GPUs ... in FORTRAN and C++. I believe an n body simulation in GPUs is what started the idea of using GPU for general computation.
I would be extremely surprised if you could get similar or better predictions for the same amount of computation from an off the shelf CNN than from a carefully tuned physical model. (I would love to see counterexamples).
Even then I would question how well it extrapolates; you generally have a very good idea of where your physics simulation will break down.
I think your friend is right; the physics community can't afford to throw away as much code as the tech community. I wouldn't spend years building a simulation in tensorflow because it will likely be obsolete before I finished (leading to the same problems as FORTRAN but increasing the amount of dependencies).
In any case, if it works well enough it's not going to be rewritten.
Eventually ML will become part of physics simulations, but I don't think there will be easy wins in these well developed areas.
What do you mean by "discrete problems"? By my understanding, weather prediction is basically solving a large system of partial differential equations. Sure the grid methods are a discrete approximation of the true problem, but it is not what I would normally think of as a "discrete problem".
Training a neural network to map "current state of atmosphere" to "state of atmosphere in the future" is definitely possible to do with a neural network, and sounds like a good idea to me.
While I can imagine that neural networks are useful in predicting rainfalls or precipitations, I don’t see how they are suitable for some tasks, such as the forecasting of typhoon paths. A neural network is basically a function interpolation device. To use it for prediction, the implicit assumption is that the function it approximates is somehow well behaved, but some weather systems are chaotic. Function interpolation doesn’t seem very useful in this regard.
From what I see from googling, cyclone track forecasting by neural network is definitely an active area of research, but apparently it isn't practical yet. The lady you talked to was probably working on some chaotic system like this.
Why not create a DSL in C++ that gives you the same syntax as Fortran for doing array/matrix manipulations? That's really the one main advantage I gleaned from this article
Things like this exist, but there are two issues. They aren't part of the standard, and Fortran compilers are better at optimizing built-in Fortran abstractions than C++ compilers are at optimizing user-made abstractions.
I suspect it’s the lack of standardization that’s really the issue.
I wish there was a clean nomenclature for the context of program use. My world of programming is academia and industrial research. In that setting the vast majority of software is fashioned for ad-hoc use. There is no expectation that a large population will ever make use of what we code except to adapt it to some new specific related need. Scientific publishing promotes novel investigations, novel investigations promote one-off programs. And the user interface doesn't have to be, and is very seldom, "elegant" in any sense. It's in that context of "rapid cycle until it works then seldom use it again" that Fortran (or in my hands, python scripts which call Fortran numerical libraries) still makes sense. Understanding the context will free us from the head-scratching.
On array convenience: Fortran also supports optional runtime bounds checking of array indexes. I've worked writing signal processing code in Fortran, and it's really quite nice to have that when doing dev, even if you turn it off in prod for performance reasons.
GP is saying the contrapositive for Fortran: you can have runtime bounds checks during development instead of disabling them for production builds.
I'm not sure which feature you're referring to, because while GCC has -fbounds-check, that's for the GNU Compiler Collection and only for frontends that support it (to wit, Fortran and Java). I don't know of any runtime bounds checking that ever made it into vanilla. Clang and GCC both have some limited array bounds checking, but it's static, and there are plenty of issues it won't catch. People maintained third party patches for a long time, but these are obsolete now. Perhaps you're thinking of ASAN/-fstack-protector-*?
I think this just shows how scientists don't understand programming. Pretty much every 'advantage' listed for Fortran over C++ could be added to C++ in an afternoon. C++ is an extensible language, you can define your own types and operators, so all the examples can be easily implemented. For some of the examples it's just sad that they were brought up. For example, 'you have to write a loop to allocate an array' should say 'you have to write a function one time to allocate arrays and never write this loop again'.
I don't think that's the case; that seems a little too extreme. I believe the central point, for someone whose actual job isn't programming (or who doesn't treat it as a serious hobby), of picking Fortran over C/C++ has to do with the features it has out of the box, or at least those that are apparent from a surface-level knowledge of the language. Even if it takes one afternoon to implement a certain feature, would scientists WANT or NEED to do that?
Don't get me wrong, this is probably an opinion I would defend if it were exclusively related to programmers being "too lazy to learn a new language", but this is a different learning purpose.
I can't agree, comparing two languages purely by a superficial "this one lets me write code like this out of the box" ignores everything that matters.
Let's say I wanted to write a web blog server. One language has "import blog; blog.run()" and I am up and running instantly. Another language makes me install a blog library and some other side stuff, and choose a webserver. The point is it isn't just built right in. Which language is better for writing web blog software? The answer is, you have literally no idea from what I just told you. My analysis is insanely superficial and meaningless. Presumably, if I am going to spend hundreds or thousands of hours in some coding environment, what is 'built in with no effort' matters somewhat at hour 0, but virtually not at all by hour 200. Professional scientists presumably spend thousands of hours on this stuff; it's really not too much to ask that they become somewhat competent with the tools they are using.
language built-ins are not fully replaceable with extensions and libraries. Most notably: (1) Compilers have more ways to optimize if given higher level abstractions, (2) built-ins are easier to use for the user because of specific syntax (e.g. highlighting), (3) built-ins have support in debuggers (e.g. you can easily print an array slice in a Fortran debugger).
Just a case in point: which do you think is easier to use, more standards-conformant, and more typesafe:
a) a typedef that gives you "__restrict const * const double"
Actually for C++ you are almost perfectly wrong. It's specifically a design goal of the language to not have the builtins be 'superior' to what you can implement yourself (well the builtins are built by world class experts, but the point is there is no bias in the language).
> researchers at MIT have decided to tackle this challenge with full force by developing a brand new language for HPC called Julia
This and the linked news piece [1] from MIT News sound pretty weird to me. The OP article probably takes the bit about "researchers at MIT" developing Julia from the MIT News page, so that's the real source of the issue - the MIT News pages seems to have been written with weird biases, making it sound like a primarily MIT project that other people have just tacked a few things on to. And then there's:
> A few years ago, when an HPC startup Edelman was involved in [...] was acquired by Microsoft, he launched a new project with three others.
That to me sounds like an implication that Edelman was the one to initiate the project and take in the others. They seem to be writing from the usual academic bias of "the senior faculty gets the credit even if the actual work is done by the PhD/graduate students". Edelman was Bezanson's thesis advisor and a crucial part of Julia's history, but this article seems to be downplaying the role of the other core contributors and the open source community.
I had assumed university news, at least in such technical topics, would be more reliable and less inherently biased, learned something new today.
In the late 90’s, I helped port a nuclear reactor simulator to Win32. It was around 20 million lines of Fortran and was actively developed by physicists and engineers (none were really software engineers). And, at that time, the codebase was around 40 years old. Apart from disabling virtual memory, it worked nearly flawlessly on Windows on a COTS PC and ran about 50% faster than the fastest *nix test lab box.
It’s done mostly for historical tradition reasons, and it costs nontrivial time and money to switch.
The same reason many big systems still use COBOL: it works, it's well tested, why change it? Usually they just run it as long as they have hardware they can run it on...
I've always wondered about the following approach for old apex predators like COBOL and FORTRAN, it follows what .NET does.
The idea is to take current COBOL and FORTRAN code and compile it down to IL similar to .NET's CIL, in the .NET world once code is brought down to CIL it can be read back in VB.NET, C#, C++.NET
Essentially, bring it down to some sort of lossless IL to convert to another language. It should be possible to do this given that we have the source code. In certain cases where the source code doesn't match the binary (which happens over the years due to monkey-patching the binaries, etc.), we'd have to take the approach a few folks at IBM are discussing: recompilation and reoptimization of existing old COBOL binaries. [1]
Don't throw away that debugged and battle-hardened code, change the IL and the final compile target, if possible re-interpret the IL into a newer language if its not lossy.
"Compile COBOL applications directly to Microsoft intermediate language for deployment within Microsoft .NET... Compile COBOL applications to Java byte code for deployment within the Java Virtual Machine"
I've actually seen this idea suggested, at least as part of a bake-off between design ideas. There's a certain amount of sense to it.
I remember when I was a junior engineer working on aerospace simulation software (defense contracting). I was primarily writing C++, but we had a large collection of physics algorithms written in Ye Olde FORTRAN that we had to link in.
I brought up a possible rewrite, and one of the greybeards told me that they had looked at that in the past, but the govt V&V process on any rewritten algorithms would have been so onerous that they eventually dropped the idea.
The last guy who actually understood that old code retired about a year after I started there and then things started to get... interesting.
As an aside, for 'new' code, I'm actually ok with the science folks dumping a pile of matlab script on me so I can rewrite it in Python, Java, etc. (rather than letting scientists write production code, which I'll have to rewrite later anyway).
Often there are many (more or less reasonable) extrinsic reasons for non-tech domains staying with the stacks they use.
I've done some programming for cognitive psychology experiments, fMRI analysis etc, and although I didn't like the often proprietary systems used (E-Prime, Presentation etc), I could see it would have required hefty investments of very scarce time to switch to something 'better'. The vast bulk of the software was written by non-programmer grad students, for whom the tech was a very 3rd order issue: they just needed their experiments up and running. This was generally done by finding a close-enough prior experiment, and tweaking it in a hurry, often with limited understanding of how the system worked. There was in most cases no possibility of paying programmers to do the work.
Modern fortran is quite awesome for matrix manipulation. The ease of using it for math makes me wonder why we don't use it more often. The main downside to modern fortran is its I/O capabilities are quite stunted. I honestly think we need an open source fortran95 to gpu compiler.
If you are using the GNU compilers (and the appropriate compiler flags with gcc), there isn't much difference in performance --- and there shouldn't be.
There are a few FORTRAN-only compilers out there; I'd be curious to see how well they do.
What would you use for a HPC application?
You need a very fast language that doesn't get in your way in data management.
Trust me, even in the basic examples we did in our parallel algorithms course, minor changes in data layout could save hours of computational time, and that was in C, where no crappy garbage collection gets in your way. Not to even mention the masses of good low-level performance analysis tools and parallel libraries made for Fortran and C/C++, but not for other languages.
Why would you use anything else?
I think some people misunderstand what makes languages good. It is not generalisable; it depends on the case. Sure, to write a script, e.g. to quickly automate a few things, shell or common scripting languages like Ruby or Python may make sense, because it is relatively easy to get going and write something in them. But that is not an important question for an HPC application. You need to write code that will definitively give the correct result and that will run extremely fast on the cluster of machines that make up the supercomputer. You need language-level means to define in great detail how your memory is to be laid out, etc. The very thing that makes a language annoying to use in a scripting context is a feature here. On the other hand, you don't care about ease of portability; in fact you will want to optimise it for one specific architecture as much as time allows. That the program will have to be recompiled is a minor issue in comparison to the memory layout, threading schedule or network communication changes to the algorithm needed to optimise it for a new system.
No language is truly superior to all others; the question is always the context and the conditions and constraints it puts on the developer.
For physicists, Fortran or C are the best choices. Even Go uses a garbage collector, which breaks it for large HPC scenarios. VM-based languages are completely useless. Their low speed is already a nuisance for simple common tasks, never mind problems that already take days or weeks to execute when they are properly optimised. If you think Java, Ruby or any such language could be used, look at benchmarks. You will find CPU time of 1.5-2.5x and memory at least 5-7x the amount needed by the same problem executed by a program written in C.
Fortran and Cobol both originated in the late 1950s and are still in widespread use in certain domains. It's extraordinarily cheap in these domains to keep using legacy Fortran and Cobol software. These folks are passionate about their domains, but they probably couldn't care less about so-called modern software languages, new development tools, and new hardware. Those of us who thrive on change find this very hard to accept. We need to get over it.
I think one problem with Fortran is that it has no standard library. To a Fortran beginner, it isn't obvious where to look for high quality third-party libraries.
FORTRAN is still great for fast numerical applications and easier for me to read than C++.
My industry is rewriting its FORTRAN base in C++ and a lot of physicists are switching to Python + Numpy for all but the most intensive tasks. I see FORTRAN being only used in legacy systems within the next 5-10 years.
FORTRAN is still in much of the R Core. Fortran is faster than C++ and I actually feel that many people are starting to realize why we still have FORTRAN around.
Though many are using Python + Numpy, R still has a larger user base in scientific and mathematical spaces.
When people say that FORTRAN is faster than C++ it is obviously for numerical code, since general systems code is very hard to do in FORTRAN. Many people will complain that C++ can be faster than FORTRAN, however the problem is that you need to know a lot of C++ or have access to the right libraries to write competitive numeric code. In FORTRAN you can do that out of the box.
Moreover, the article mentions this point, references the benchmarks game, and says "However, the two benchmarks where Fortran wins (n-body simulation and calculation of spectra) are the most physics-y".
"However, the two benchmarks where Fortran wins (n-body simulation and calculation of spectra) are the most physics-y. The results vary somewhat depending on whether one compares a single core or quad core machine with Fortran lagging a bit more behind C++ on the quad core."
Though I don't see how the single core version is measured. As far as I can tell, the spectra calculations are always using 4 CPUs while the n-body is always using 1 CPU.
I'd be interested in some stats around this. My father was a physicist, and I did some programming work for him at a large research institute, and while there was Fortran around, most of the scientists were reaching for new tools where they could.
I don't have much to add to this conversation, but it surprises me that Rust hasn't come up in the comments. Would/could Rust be appropriate for these kinds of use cases today/in the future?
In theory, yes. In practice, we need to get SIMD working on stable, which is underway but not done yet. We also need to get RFC 2000, const integers, implemented. And I'm sure other things too.
Basically, Rust could be excellent at this in the future, but right now, is merely okay.
I wonder if Haskell wouldn't be a better fit. I realize there's no rewriting all that legacy code in any language though -- but I'm thinking of new code here.
glmnet, one of the core packages for machine learning is written in Fortran. So are many of the linear algebra packages commonly used. It's not just physicists that still rely on this stuff.
I think that's more a matter of: once a language has a hold on a particular industry (Ex: TCL in semiconductors), it takes a while for something else to replace it due to all the in-house code written in it. If the vendor only offers a TCL API, then double the time to switch. Assuming TCL isn't doing a great job serving their needs.