Well, except for Windows (which insists on UTF-16/UCS-2), as far as I can see we are at that point. Modern Linux distros have shipped with everything configured for UTF-8 for years. Fedora has been that way since Fedora Core 1.
I don't know what OSX is doing. I would assume it normally comes pre-configured with UTF-8, but somehow that was lost in his setup. I don't have an OSX machine to confirm this, though.
It is, but all POSIX operating systems are brain-damaged when it comes to anything other than interactive terminal windows – they default to LANG=C, so e.g. the same script that runs fine interactively crashes when you run it from cron or some other launcher.
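To make it concrete, here's a minimal sketch (assuming Python 3 on a POSIX box; newer interpreters may coerce the C locale to C.UTF-8, so the exact behaviour varies):

    import sys

    # Interactively this typically prints 'utf-8'; under cron with LANG unset
    # or LANG=C it usually reports an ASCII codec instead.
    print("stdout encoding:", sys.stdout.encoding)

    # Fine in a UTF-8 terminal, may raise UnicodeEncodeError under LANG=C.
    print("naïve output")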
Hmm, is OXS still actually POSIX compliant? I thought they haven't bothered with certification for the past few versions.
Anyway, if a changed LANG breaks your script, I'd primarily place the blame on the script. You should be able to shuffle UTF-8 bytes around non-interactively without paying much mind to the locale.
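Something along these lines, say (just a sketch, the filenames are placeholders) – nothing in it consults LANG at all:

    # Pass UTF-8 through as raw bytes; no dependence on the locale.
    with open("in.txt", "rb") as src, open("out.txt", "wb") as dst:
        for line in src:
            dst.write(line)

    # And when you do need text, name the encoding explicitly instead of
    # letting LANG pick one for you.
    with open("in.txt", encoding="utf-8") as f:
        text = f.read()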
The reason the Linux distributions don't set UTF-8 in the default environment is backwards (bugwards?) compatibility with legacy code: something like Python or Perl might start actually throwing exceptions once it thinks you want UTF-8 and you hand it either a different encoding or just unvalidated garbage.
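In Python 3 terms the failure mode looks roughly like this (a sketch; "legacy.txt" is a made-up file of Latin-1 bytes):

    # Simulate a legacy data file: Latin-1 bytes that are not valid UTF-8.
    with open("legacy.txt", "wb") as f:
        f.write(b"caf\xe9\n")

    # Reading it back with no explicit encoding uses the locale's encoding,
    # so this is fine under a Latin-1 locale but raises UnicodeDecodeError
    # under a UTF-8 (or C/ASCII) locale.
    with open("legacy.txt") as f:
        print(f.read())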
There's no solution to this which will make everyone happy.
I prefer to set UTF-8 for everything so I can find the things that break and fix them, but a lot of legacy shops choose not to spend the time fixing things which are “working”.
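When I do flip everything over, the first thing I check in the non-interactive environments is what the process actually ended up with – roughly this diagnostic (Python 3, just a sketch to drop into a cron job or service):

    import locale, sys

    # Report which encodings the process really got before hunting for
    # the code that breaks.
    print("locale         :", locale.setlocale(locale.LC_ALL, ""))
    print("preferred enc  :", locale.getpreferredencoding())
    print("stdout enc     :", sys.stdout.encoding)
    print("filesystem enc :", sys.getfilesystemencoding())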