I seriously applaud the writer to dive into Unicode in such detail and compare m...

babuskov · on Sept 9, 2019

> Imagine writing a game. Everything works nice and dandy with your ASCII format and now your boss approaches you and wants to distribute the game for the asian market!

You just make sure all translated text is in UTF-8, and use Google Noto fonts for those languages. All game engines I know render UTF-8 text without problems if you supply a font that has the needed glyphs.

Source: I'm an indie game developer and have recently localized my game to Chinese. The game is a mix of RPG and roguelike, so it has a lot of text (over 10000 words). I used SDL_TTF to render text, specifically the TTF_RenderUTF8_Blended() function. The only issue I had was with multiline/wrapped text. SDL_TTF doesn't break lines on Chinese punctuation characters (.,;:!?) so I would search+replace strings at runtime to add a regular space character after those.

thaumasiotes · on Sept 9, 2019

> SDL_TTF doesn't break lines on Chinese punctuation characters (.,;:!?)

Those aren't Chinese punctuation characters. Chinese punctuation characters are full-width, including the spacing that should follow them (or in the case of "(", precede) within the glyph itself: （。，；：！？）. (You may also notice that the period is radically different.) Chinese text should almost never include space characters.

Chinese applications seem happy to break lines anywhere including in the middle of a word, but punctuation seems like an especially good place for a line break, so I'm confused why SDL_TTF would go out of its way to avoid breaking there.

simonask · on Sept 9, 2019

It sounds more like a bug in SDL_TTF than a deliberate attempt to not break the line on Chinese punctuation marks.

I wonder if SDL_TTF works with Unicode Zero-Width Space (U+200B). If so, that would probably be the right choice.

babuskov · on Sept 9, 2019

> Those aren't Chinese punctuation characters.

I know, I meant the actual ones you wrote above.

> I'm confused why SDL_TTF would go out of its way to avoid breaking there.

SDL_TTF doesn't break at all. If you have a long Chinese text which uses proper punctuation characters, it would never break, because it only breaks on ASCII whitespace.

I wanted to avoid breaking lines in the middle of a word, so I added extra "regular" space characters to force breaking the line.
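For illustration, the runtime search+replace described above could look roughly like this. It is a minimal Python sketch rather than SDL_TTF code; the function name `add_break_opportunities` and the punctuation set are assumptions made up for the example. Passing U+200B instead of a space is the variant suggested earlier in the thread, and whether SDL_TTF treats it as a break point is not verified here.

```python
# Full-width punctuation after which a line break is acceptable.
# Illustrative assumption: this set is not exhaustive.
FULLWIDTH_PUNCTUATION = "。，；：！？"

ZERO_WIDTH_SPACE = "\u200b"


def add_break_opportunities(text: str, marker: str = " ") -> str:
    """Insert `marker` after each full-width punctuation character.

    Nothing is inserted at the end of the string, or where the next
    character is already whitespace or the marker itself, so running
    the function twice gives the same result as running it once.
    """
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch in FULLWIDTH_PUNCTUATION and i + 1 < len(text):
            nxt = text[i + 1]
            if not nxt.isspace() and nxt != marker:
                out.append(marker)
    return "".join(out)


print(add_break_opportunities("你好，世界！再见。"))
# prints: 你好， 世界！ 再见。
```

The visible downside of a regular space is extra gap after glyphs that already carry their own spacing; a zero-width marker avoids that only if the renderer honors it.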

buntsai · on Sept 9, 2019

You don't really need to only break on punctuation. There is no convention to do so, and so long as you do not break any logograms in half, the resulting text reads perfectly fine. In fact, the convention is to have left- and right-justified text with equal numbers of monospaced logograms, including punctuation, on each line (or the equivalent for vertical text). Classical Chinese before the 20th century was seldom punctuated.
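A minimal Python sketch of that kind of wrapping, assuming every character (punctuation included) occupies one full-width cell. The name `wrap_fixed_width` is hypothetical, and the sketch ignores mixed Latin text and combining sequences.

```python
def wrap_fixed_width(text: str, chars_per_line: int) -> list:
    """Split text into lines of at most `chars_per_line` characters.

    Python strings are indexed by code point, so a slice never cuts
    through the UTF-8 bytes of a single character.
    """
    if chars_per_line < 1:
        raise ValueError("chars_per_line must be at least 1")
    return [
        text[i:i + chars_per_line]
        for i in range(0, len(text), chars_per_line)
    ]


for line in wrap_fixed_width("你好，世界！再见。", 4):
    print(line)
# prints:
# 你好，世
# 界！再见
# 。
```

In a C engine the same idea needs care to step over whole UTF-8 sequences rather than individual bytes.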

babuskov · on Sept 10, 2019

I wasn't aware of this. Thanks.

coldtea · on Sept 9, 2019

>You just make sure all translated text is in UTF-8, and use Google Noto fonts for those languages. All game engines I know render UTF-8 text without problems if you supply a font that has the needed glyphs.

The game could have used a custom engine (like tons of games do), or the requirement could include e.g. Arabic or some such RTL text, further messing up the display...

flohofwoe · on Sept 9, 2019

First rule: don't panic :)

Also disclaimer: I've been working on games which were translated to >20 languages including some exotic ones, but I'm in no way a Unicode expert.

- most important: consider using UTF-8 as text encoding everywhere, and only encode and decode from and to other text encodings when needed (for instance when talking to APIs which don't understand UTF-8, like Windows)

- be very careful in all places where users can enter strings, and with filesystem paths, this is where most bugs happen (one of the most popular bugs is when a user has Unicode characters in his login name, and the game can't access the user's "home directory", happens to the big ones too: https://what.thedailywtf.com/topic/15579/grand-theft-auto-v-...)

- get familiar with how UTF-8 encoding works and how it is "backward compatible" with 7-bit ASCII, there are good chances you don't need to change much of your old string processing code

- rendering is where it gets interesting, and here it makes sense to only do what's needed:

(1) The easiest case is American and European languages, these are all left-to-right, have fairly small alphabets and don't have complicated 'text transformation' rules

(2) East-Asian languages with huge alphabets can be a problem if you need to pre-render all font textures.

(3) The next step is languages which render from right-to-left, the interesting point is that substrings may still need to be rendered left-to-right (for instance numbers, or "foreign" strings)

(4) And finally there are languages like Arabic which rely heavily on modifying the shape of 'characters' based on where they are positioned in words or in relation to other characters, you need some sort of language-specific preprocessing of strings before you forward them to the renderer. HarfBuzz is a general solution for this problem, but it's also a lot of code to integrate (we created a specialized transformation only for Arabic).

(5) For actual rendering, all text rendering engines which can use TTF fonts are usually ready for rendering UNICODE text
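To make point (3) a bit more concrete: each Unicode character carries a bidirectional class, which is the input a layout engine uses to decide run direction. The Python sketch below only groups characters by that class using the standard `unicodedata` module. It is not an implementation of the Unicode bidirectional algorithm, and the helper name `bidi_runs` is made up for the example.

```python
import unicodedata
from itertools import groupby


def bidi_runs(text: str) -> list:
    """Group consecutive characters that share a bidirectional class.

    Returns (substring, class) pairs, e.g. 'R' for right-to-left
    letters, 'L' for left-to-right letters, 'EN' for European digits
    and 'WS' for whitespace.
    """
    return [
        ("".join(group), bidi_class)
        for bidi_class, group in groupby(text, key=unicodedata.bidirectional)
    ]


# The Hebrew word "shalom" (written with escapes) followed by a number.
sample = "\u05e9\u05dc\u05d5\u05dd 123"
for run, bidi_class in bidi_runs(sample):
    print(repr(run), bidi_class)
```

The digits come out as their own `EN` run, which is why a number inside right-to-left text still has to be drawn left-to-right.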

So basically, the problem becomes a lot easier if you only need to support a specific set of languages.
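On the "backward compatible with 7-bit ASCII" point, a small Python sketch of why old byte-oriented string code often keeps working: ASCII characters keep their byte values, and every byte of a multi-byte sequence has the high bit set, so splitting on an ASCII delimiter cannot land inside a character. The helper names and the `name=...;level=...` format are invented for the example.

```python
def split_fields(data: bytes, delimiter: bytes = b";") -> list:
    """Split UTF-8 bytes on an ASCII delimiter, then decode each field."""
    return [field.decode("utf-8") for field in data.split(delimiter)]


def non_ascii_bytes_have_high_bit(text: str) -> bool:
    """Check that non-ASCII characters encode only to bytes >= 0x80."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for byte in ch.encode("utf-8")
    )


text = "name=玩家;level=7"
data = text.encode("utf-8")

# Pure ASCII text encodes to identical bytes in ASCII and UTF-8.
assert "level=7".encode("utf-8") == "level=7".encode("ascii")

print(split_fields(data))                   # ['name=玩家', 'level=7']
print(non_ascii_bytes_have_high_bit(text))  # True

# What does change: byte length no longer equals character count.
print(len("玩家"), len("玩家".encode("utf-8")))  # 2 6
```

The code that usually does need changes is anything that assumes one byte per character, such as truncating to a fixed length or indexing by position.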

gmueckl · on Sept 9, 2019

The last part about having support for a limited number of languages is bigger than you probably expect. Generally, unless a piece of software has actually been adapted to and tested with a specific language, it shouldn't claim to support it, even if it is just processing UTF-8 encoded text in that language.