AFAIK this was not a clean room reimplementation. But since it was rewritten by hand, into a different language, with not just a different internal design but a different API, I could easily buy that chardetng doesn't infringe while Python chardet 7 does.
I've worked on a system where ULIDs (not UUIDv7, but similar) were used with a cursor to fetch data in chronological order and then—surprise!—one day records had to be backdated, meaning that either the IDs for those records had to be counterfeited (potentially violating invariants elsewhere) or the fetching had to be made smarter.
You can choose to never make use of that property. But it's tempting.
I made a service using something like a 64-bit-wide ULID, but there was never a presumption that data couldn't be inserted or updated earlier than the most recent record.
If the domain is modeling something like external events (as in my case), and that external timestamp is packed into your primary key, and you support receiving events out of chronological order, then it just follows that you might insert stuff earlier than your latest record.
You're gonna have problems "backdating" if you mix up time of insertion with when the event you model actually occurred. Like if you treat those as the same thing when they aren't.
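To make the failure mode concrete, here's a minimal sketch. The ID layout (48-bit millisecond timestamp plus 16-bit sequence) and the names `make_id` and `fetch_after` are made up for illustration; they aren't from any real ULID library or from the systems described above.

```python
def make_id(timestamp_ms: int, seq: int) -> int:
    """Hypothetical 64-bit ID: 48-bit millisecond timestamp, 16-bit sequence."""
    return (timestamp_ms << 16) | (seq & 0xFFFF)


def fetch_after(ids: list[int], cursor: int) -> list[int]:
    """Cursor pagination: return IDs strictly greater than the cursor, ascending."""
    return sorted(i for i in ids if i > cursor)


# Event time is packed into the ID, and the consumer pages by ID.
store = [make_id(1_000, 0), make_id(2_000, 0)]
cursor = max(fetch_after(store, 0))  # consumer has seen everything up to t=2000

# A late event arrives whose external timestamp is t=1500.
backdated = make_id(1_500, 0)
store.append(backdated)
missed = backdated not in fetch_after(store, cursor)  # the cursor skips it

# Keeping insertion order separate from event time avoids the problem:
# the cursor follows insertion_seq, and event_ms is just data.
log = [(1, 1_000), (2, 2_000)]  # (insertion_seq, event_ms)
seq_cursor = 2
log.append((3, 1_500))  # backdated event gets a fresh insertion_seq
new_rows = [row for row in log if row[0] > seq_cursor]
```

In the first half the backdated record sorts below the cursor and is never fetched. In the second half it's picked up because the cursor only depends on insertion order.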
Python managed to do this by not actually checking the types at runtime. If you declare a list[int] return type but you return a list[str] then nothing happens; you're expected to prevent that by running an offline typechecker.
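A quick sketch of what that looks like in practice (the function name `get_ids` is just for illustration, and this assumes Python 3.9+ for the `list[int]` syntax):

```python
def get_ids() -> list[int]:
    # The annotation says list[int], but the body returns strings.
    # The interpreter stores the annotation and never checks it.
    return ["a", "b"]


result = get_ids()  # no exception is raised here
annotation = get_ids.__annotations__["return"]  # the annotation is only metadata
```

An offline checker such as mypy or pyright would be expected to flag the return statement; the interpreter itself runs it without complaint.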
PHP chose to check types at runtime. To check that a value is really an array<int> the runtime could have to loop through the entire array. All the types PHP currently implements are simple and cheap to check. For more elaborate cases you need an offline checker like PHPStan and comment-based type annotations. (PHPStan catches 99% of issues before the runtime gets to it so for my own code I'd prefer the Python approach with its cleaner syntax.)
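To illustrate why a runtime check for an element type isn't cheap, here's a rough sketch in Python rather than PHP. The helper `is_list_of_int` is hypothetical and only shows the cost; it isn't how PHP or any real runtime implements this.

```python
def is_list_of_int(value: object) -> bool:
    """Runtime check for list[int]; the cost grows with the length of the list."""
    # type(x) is int (rather than isinstance) so that bools don't count as ints.
    return isinstance(value, list) and all(type(x) is int for x in value)


ok = is_list_of_int([1, 2, 3])
# One bad element at the end still forces a scan of every element before it.
bad = is_list_of_int([1] * 1000 + ["oops"])
```

Checking a scalar or a class name is a constant-time comparison, while this has to touch every element on every call.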
The runtime checking seems the key difference, not so much the historical strength of the type system. Python's language implementation does very little typechecking itself and PHP's third-party offline typecheckers are respectably advanced.
Precisely. PHP has tools for this too, but lacks the syntax. Right now you need to do all typings in comments, and that's just as bad as jsdoc was in 2005.
This could be the way PHP could go, they just need the lexer to handle types, and not do any runtime checking at all.
I guess that goes against what the PHP devs want, but it sounds so wasteful to typecheck the same code time after time even if it passed some sort of initial "compile time step".
The current amount of typechecking might be a net efficiency improvement AFAIK. It provides a hard runtime guarantee that variables are certain types while Python has to check whether something is supported at the last possible moment. But I don't know how much use the optimizer makes of that.
Python is not (usually) run like PHP. Python programs (like most other languages) "run", compared to PHP where you in 99% of all cases instead "execute". (run = the program is running for a long period of time, and execute = run/die immediately).
This subtle difference has huge implications. You could in theory have a "compile step" in Python, but in PHP you really can't as the program is never "running".
Python built syntax for types / generics etc. It's actually quite a capable type system (I'm not a Python developer, but use Python on some occasions). Python then has tools for static typechecking that can be run outside execution.
This means that if Python did actual static typechecking at runtime it would be nothing more than wasted CPU cycles.
That's why Python opted for the syntax only, as it's basically zero cost. In PHP land the typechecking is done on EVERY execution, even where the value is never used (say, a void function that has an int param but gets passed a string, and just discards the parameter). Even worse, a type error in code that isn't executed won't be caught, no matter how many times the program runs.
In short, PHP's type system is just runtime checks for primitives / classes and won't catch errors in code that isn't executed. It's like the worst of both worlds.
I don't think this shows deep thought on his part.
By Stallman's own telling a free Objective-C frontend was an unexpected outcome. Until it came up in practice he thought a proprietary compiler frontend would be legal (https://gitlab.com/gnu-clisp/clisp/blob/dd313099db351c90431c...). So his stance in this email is a reaction to specific incidents, not careful forethought.
And the harms of permissive licensing for compiler frontends seem pretty underwhelming. After Apple moved to LLVM it largely kept releasing free compiler frontends. (But maybe I'd think differently if I e.g. understood GNAT's licensing better.)
rustc is only loosely tied to LLVM. Other code generation backends exist in various states of production-readiness. There are also two other compilers, mrustc and gccrs.
mrustc is a bootstrap Rust compiler that doesn't implement a borrow checker but can compile valid programs, so it's similar to your proposed subset. Rust minus verification is still a very large and complex language though, just like C++ is large and complex.
A core language that's as simple to implement as C would have to be very different and many people (I suspect most) would like it less than the Rust that exists.
RFC 3629 says surrogate codepoints are not valid in UTF-8. So if you're decoding/validating UTF-8 it's just another kind of invalid byte sequence like a 0xFF byte or an overlong encoding. AFAIK implementations tend to follow this. (You have to make a choice but you'd have to make that choice regardless for the other kinds of error.)
If you run into this when encoding to UTF-8 then your source data isn't valid Unicode and it depends on what it really is if not proper Unicode. If you can validate at other boundaries then you won't have to deal with it there.
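As one concrete data point, here's how CPython's built-in strict UTF-8 codec behaves. The helper names `is_valid_utf8` and `can_encode_utf8` are just for illustration; other implementations may make different choices.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


def can_encode_utf8(text: str) -> bool:
    """True if the string encodes under Python's strict UTF-8 codec."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


surrogate_bytes = b"\xed\xa0\x80"  # would be U+D800 if surrogates were allowed
overlong = b"\xc0\xaf"             # overlong encoding of "/"
invalid_byte = b"\xff"             # never valid anywhere in UTF-8

decode_results = [is_valid_utf8(b) for b in (surrogate_bytes, overlong, invalid_byte)]
lone_surrogate_ok = can_encode_utf8("\ud800")  # a Python str can hold a lone surrogate
```

All three byte sequences are rejected the same way on decode, and the lone surrogate is rejected on encode, so the problem has to be handled wherever the non-Unicode data entered the program.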