>The problems only arise when talking about "a string of length 20" which indeed has a strong smell
It has no smell at all. We interface with all kinds of systems with limited display space, external forms, POS displays, databases with varchar columns, and so on all the time.
Whether certain strings should have a specific length is a business decision, not a code smell...
Limited display space is a case for "string length in px", which is notoriously hard to calculate and has poor library support. Just because 20 "x"s fit doesn't mean 20 "w"s will fit. Fixed-width fonts are an exception, but they have problems with Chinese.
Databases with varchar columns exist, but varchar(20) sounds generally suspect unless it's a hash or something else that's fundamentally limited in length.
>limited display space is a case for "string length in px", which is notoriously hard to calculate and has poor library support.
It is notoriously easy when the display is an LED display, a banking terminal, a form-based monospaced POS, something that goes onto a printed receipt (like an airline check-in or a luggage tag), a product / cargo label maker, and tons of other systems billions depend upon every day, where one visible glyph = 1 length unit, and type design doesn't come into much play...
It's only easy if the system forbids everything that would make calculating visible length hard, which I think constitutes extremely poor library support. I want to see the monospaced system that can correctly print Mongolian: ᠮᠣᠩᠭᠣᠯ ᠪᠢᠴᠢᠭ. If properly implemented, it should join the characters and display them vertically. But your browser is probably showing them horizontally right now, because support for vertical writing is seriously broken: https://en.wikipedia.org/wiki/Mongolian_script#Font_issues
Most of those systems are terrible at handling non-latin text because they get all these things wrong. Of course it's "easy" to handle length in these cases, they've selected out any kind of text that makes handling length hard.
The mere assumption that "one glyph" is a meaningful well-defined concept that works across languages is the problem here.
I was at one time responsible for code to populate preprinted forms. I only had to deal with ASCII, but the best solution was still to just do the layout and then check that the bounding box wasn't too big.
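The measure-then-check approach can be sketched like this. Everything here is illustrative: the per-glyph widths are made-up numbers standing in for real font metrics, and `text_width` / `fits_box` are hypothetical helpers, not a real layout API.

```python
# Illustrative per-glyph advance widths in px (made-up numbers, not real
# font metrics; a real implementation would query the font).
GLYPH_WIDTHS = {"i": 3, "l": 3, "w": 10, "x": 7}
DEFAULT_WIDTH = 7

def text_width(s: str) -> int:
    """Sum per-glyph advance widths (stand-in for real layout)."""
    return sum(GLYPH_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in s)

def fits_box(s: str, box_width_px: int) -> bool:
    """Do the layout, then check the bounding box, rather than counting chars."""
    return text_width(s) <= box_width_px

# 20 "i"s fit where 20 "w"s do not, even though both are "length 20":
print(fits_box("i" * 20, 100))  # True  (20 * 3 = 60 px)
print(fits_box("w" * 20, 100))  # False (20 * 10 = 200 px)
```

The point is that the fit check happens after layout, so character count never enters into it.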
I quite agree... mostly... but I tend to abbreviate input from untrusted sources using String.length() anyway, because it's so simple, and a single-digit overestimate is often safe.
For example, I have a phone app that abbreviates various things at 300 whatevers because I know that on one hand 300 is more than the phone will display, but on the other it's little enough that it'll cause no processing problems. If I process 300 whatevers and then the display can only show 120, I haven't burned much of the battery. I overestimated by a factor of 2.5, burned a minuscule part of the battery, and I gained simplicity.
I strongly agree that numbers as low as 20 are a code smell.
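A minimal sketch of the cap-early approach described above, assuming Python-style codepoint slicing; the constants mirror the numbers in the comment, not any real app:

```python
DISPLAY_BUDGET = 120   # roughly what the screen shows (assumed, per the comment)
PROCESSING_CAP = 300   # deliberate ~2.5x overestimate; cheap to be generous

def abbreviate(raw: str) -> str:
    """Cap input size before expensive processing; the display truncates further."""
    return raw[:PROCESSING_CAP]

long_input = "x" * 100_000
short = abbreviate(long_input)
print(len(short))  # 300
```

Because the cap is well above anything the display can show, an off-by-a-glyph error at the cut point never becomes visible.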
Mechanically abbreviating strings is likely to have nasty corner cases. Avoid if at all possible.
There are fancy Unicode corner cases, like the flags (cutting a flag emoji in half doesn't get you half a flag, it gets you one half of the flag's country-code identifier), but there are plenty of corner cases already in ASCII.
Abbreviating "Give all the money to Samantha's parents to hold in trust" as "Give all the money to Sam" doesn't involve any scary modern encoding, just the problem that this isn't how human languages work.
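The flag case above is easy to demonstrate in Python, where slicing counts codepoints:

```python
# A flag emoji is two "regional indicator" codepoints; slicing between
# them leaves one half of a country code, not half a flag.
flag = "\U0001F1FA\U0001F1F8"   # U+1F1FA U+1F1F8 renders as the US flag
print(len(flag))                # 2 codepoints, one visible glyph
half = flag[:1]
print(half == "\U0001F1FA")     # True: just the 'U' regional indicator
```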
Consider an unbounded input string (generally less than 1k, but sometimes more than 100k), a phone display large enough to display, say, 80-120 glyphs, and a mechanical abbreviation to a few hundred codepoints soon after input, before all expensive processing. What are the nasty corner cases?
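One candidate corner case for that scheme, offered for concreteness (a general Unicode fact, not specific to this thread): a codepoint cut can split a combining accent from its base letter.

```python
import unicodedata

# In decomposed (NFD) form, "café" is five codepoints: the accent on the
# final "e" is a separate COMBINING ACUTE ACCENT.
s = unicodedata.normalize("NFD", "caf\u00e9")
print(len(s))   # 5 codepoints for 4 visible glyphs
cut = s[:4]     # truncating at 4 drops only the accent...
print(cut)      # "cafe" -- the é silently became a plain e
```

With a cap of a few hundred and a display budget far below it, the damaged glyph sits past what's shown, which is presumably the point of overestimating.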
Embedded systems increasingly have screens with real pixels and support for weird languages. At least they are low-level enough that "give me the width and height of that string" is actually an achievable task.
The ambiguity of which length is being talked about is what smells. It's a bad odor, not bad enough that I'd refuse to use a library that has 'length' as a property, but it is absolutely a smell because I can't be sure that every single last usage of 'length', inside the library or in my code, uses the same definition of 'length'.
It's subtle, which is what makes it a smell, rather than a "DO NOT WANT".
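A quick Python illustration of the ambiguity, since the same string reports different lengths depending on which definition you ask for:

```python
# "héllo" with a precomposed é: one string, at least two defensible lengths.
s = "h\u00e9llo"
print(len(s))                  # 5 codepoints
print(len(s.encode("utf-8")))  # 6 UTF-8 bytes (é takes two)
# Visible glyphs, grapheme clusters, and on-screen px are further answers still.
```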