C, what the fuck??! (bowero.nl)
73 points by bowero on Dec 15, 2019 | 36 comments


This is about trigraphs, an ancient hack for character sets like EBCDIC (IBM's ASCII "competitor") that lack some of C's punctuation. In modern compilers they are opt-in anyhow: "To understand this, I have to admit one thing: I have to pass -trigraphs to a modern version of gcc before this actually works."

C++17 actually removed support for these.


Trigraphs were not specific to EBCDIC: they were also a bad accommodation for the ISO 646 ASCII variants that supported European languages, which used characters like {\|} for letters. This is why the newer, better header with alternate spellings for restricted character sets is called <iso646.h>

https://en.wikipedia.org/wiki/ISO/IEC_646


The reasonable accommodation is to substitute with characters that are available in the ISO 646 ASCII variant being used, preferably the characters with the same binary representation as in US ASCII.

Outside of silly games like IOCCC, did anybody actually use trigraphs? I find it really hard to believe that anybody tolerated trigraphs for serious C programming. Was there an obscure European government agency that actually wrote code with trigraphs?


IBM opposed their removal in C++1x

From http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n291...

> The biggest argument against deprecating trigraph is the fact that we know there are source code from world-wide companies, based on an internal survey of our customer problem reports that make heavy use of trigraphs for one reason or another, and customer testimonials. Google code search doesn't find them because these are proprietary code. Further discussions with some of these customers have shown serious dissatisfaction with WG21’s intention to deprecate trigraphs, because they do not wish to make any changes to their source code. It doesn't seem fair to force this group of existing users to change their source code for no apparent benefit, other then reducing an annoyance to another group.

ISO 646 is probably nowhere near as relevant to their argument as EBCDIC is.

> Trigraph deprecation is vexing for existing world-wide companies (of which IBM is one such user) because of their use of EBCDIC variant code pages. It makes it difficult to start in a code page neutral manner, establish what the active code page is, then start using invariant characters. We use a pragma filetag("IBM-1047") to toggle the code page, then we can use #. Prior to that we are in a code page neutral environment and must use the trigraph ??=


Digraphs were supposed to be an easier-to-read alternative to trigraphs. Unfortunately, C++'s extended list, with many additional tokens for the logic operators, never entered C, i.e.

    %:%:    ##
    compl   ~
    not     !
    bitand  &
    bitor   |
    and     &&
    or      ||
    xor     ^
    and_eq  &=
    or_eq   |=
    xor_eq  ^=
    not_eq  !=
C only has the basic ones (still more intuitive than trigraphs):

    <:  [
    :>  ]
    <%  {
    %>  }
    %:  #
The last time I had to type a C program on a touchscreen, those symbols were extremely difficult to enter. I knew digraphs could help, but then realized that I couldn't use the digraphs for the logic operators without compiling my C as C++.


C has the named operators as macros in <iso646.h>.


Great tip, thanks. Have to remember this magic ISO standard number.


Also, have to remember to #undef them in any file that includes something that includes iso646.h.


Should've been `bitxor`


TL;DR: Trigraphs. That at least makes the ‘??!’ in the title meaningful.

It's still clickbait, though.


Especially since it requires enabling a compiler flag. If you need to require that, then you could just as well require "#define true false".


What is the probability of something like this appearing in a normal program written by somebody who is unaware of trigraphs?

I am also not aware of any professional who would use multiple question marks in any kind of serious code that could be read by anybody else, because it would reflect negatively on the perception of their professionalism.


The article mentions that they had to explicitly enable it with a compiler flag to get it to work. I doubt any modern compiler would still support these by default.


The chance of this accidentally happening is almost 0. This is what `gcc` does:

`test.c:6:31: warning: trigraph ??/ ignored, use -trigraphs to enable [-Wtrigraphs]`


So it would take some clueless developer who adds the compiler flags to get rid of the warnings during the yearly code cleanup season.

Therefore the probability of this happening by accident is 0 but the probability of this happening by incompetence (after the trigraph already slipped in undetected) hovers around 82%.


Sadly, you might have a very good point here.


> What is the probability of something like this appearing in a normal program written by somebody who is unaware of trigraphs?

A good compiler (e.g. gcc) will warn if any token is interpreted as a trigraph.

Edit: digraphs work a bit differently [0]

[0] https://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C


Nice "No true Scotsman" logic there. Sure, if you see multiple question marks as an indicator of unprofessionalism, then by definition, no professional would use that. At the same time, your definition of professional is not one that is useful in any other context.

In code written by people whose job it is, in part, to write code, i.e. what I would call professionals, multiple question marks do sometimes occur.


Did I really produce a Scotsman there? I would break it down like this:

* Professionals want to appear professional to the people who pay them.

* People who read the code can have an influence on the people who select which professional gets hired. (At least negatively if they find examples of poor practices.)

* Professionals therefore avoid any behavior that would be interpreted as unprofessional.

* Multiple question marks appear unprofessional and are therefore avoided. (Or would you concur???)

* I could not find a counterexample in my memory.

If this can be interpreted in a "no true Scotsman" way, then it would be: "All Scotsmen who showed the behavior have been denied citizenship" — and then they really wouldn't be true Scotsmen anymore, because they lost the citizenship.


I think that last point illustrates the problem: you base your idea of "professional" on the professionals you have worked with or whose code you have read, who might not write such things. At the same time, in code I have read, I have seen it in ways that did not look out of place to me. What that tells me is that this is not a professional vs. unprofessional thing, it is just different programming norms. It is similar to how I think we can agree that neither tabs nor spaces are unprofessional, but insisting on either in a codebase already written using the other style is.


> Multiple question marks appear unprofessional and are therefore avoided.

Seems like a really weak assumption to me. I've seen enough swear words etc. in actual code bases, and "???" in bug trackers, to doubt that people worry about the number of "?" they use in comments.


Please refrain from misusing fallacies.


Pro dev here. I am working on a codebase that is large and juvenile. The original authors didn't know about functions or something; it's pure spaghetti PHP.

I definitely add comments with tons of questions and potential trigraphs when I use punctuation in lieu of actual cursing. Don't get me wrong. I curse a lot in comments, too.

TL/DR: bad code -> angry comments


Similar here. Deepest sympathies.

However: https://www.jetbrains.com/help/phpstorm/refactoring-source-c... ?


The author purposely misleads the reader by doing syntax highlighting wrong. A trigraph-aware editor would keep the third line green.


Plot twist: Many developers are using editors that are not trigraph-aware.


> Therefore, this line actually says:

    !didIMakeAMistake() || CIsWrongHere();
> If you understand how short-circuit evaluation works, you can understand that this will result in the following:

    if (!didIMakeAMistake())
        CIsWrongHere();
Can someone explain this? I’d have thought that was right without the !


It must be a typo.

Function calls have the highest precedence here, followed by logical NOT and then logical OR.

    !didIMakeAMistake() || CIsWrongHere();
!didIMakeAMistake() is evaluated first (or rather: didIMakeAMistake() is executed, then its result is inverted). If it's true, evaluation is finished. If it's false, CIsWrongHere() is evaluated.

It is indeed equivalent to

    if (didIMakeAMistake())  /* !didIMakeAMistake() == false? */
        CIsWrongHere();
Without the inversion.


Thank you guys so much! I've changed it immediately.


Well, I did always think less of people who // wrote comments like this????


Seriously, at this point C should be considered dangerous and should be actively discouraged. I would even go so far as to legislate it. We can't have this sort of thing in 2019.


We don't really have this in 2019. All modern C compilers disable trigraphs by default, so you have to go out of your way to see this in person.

They only existed because really old terminals sometimes lacked various punctuation characters used by C, so this ugly workaround was needed to keep the people trapped on those terminals from being left out in the cold.


The sad thing is that in safety-critical embedded systems, often the only choices you have are C and C++, because nobody will shell out for Ada developers. Rust would be a contender as well; however, I'm not sure how many embedded architectures outside the major ones (like ARM) even have a Rust target.


Please don't comment on the state of programming in 2019 if you're an Angular developer.


Angular is one of the many tools I use.


I would definitely vote for you.



