I've noticed that Twitter will translate Hindi written in Roman characters, which is certainly common on the internet. But since (AFAIK) there's no formal standard for it, I'd think it would be hard to get good data to feed into an AI. Or is it close enough to a 1-1 transliteration that all you need to do is encode the Roman->Devanagari rules, and that will work nearly 100% of the time?
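To make the question concrete, here is a minimal sketch of what "encoding the Roman->Devanagari rules" might look like: a greedy longest-match transliterator. The rule table is hypothetical and deliberately tiny (no retroflex consonants, nasalization, or nukta letters), and it assumes colloquial conventions such as "aa" for ā and "ee" for ī. It handles 'namaste' and 'kaise' correctly, but the same rules turn 'aloo' into अलू instead of आलू and 'kuch' into कुच instead of कुछ.

```python
# Hypothetical, deliberately tiny rule table for colloquial romanized Hindi.
# Two-letter keys are tried before one-letter keys (greedy longest match).
CONSONANTS = {
    "kh": "ख", "gh": "घ", "ch": "च", "jh": "झ", "th": "थ", "dh": "ध",
    "ph": "फ", "bh": "भ", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "w": "व", "s": "स", "h": "ह",
}
# Each vowel maps to (independent letter, dependent sign after a consonant).
# The inherent vowel "a" needs no dependent sign.
VOWELS = {
    "aa": ("आ", "ा"), "ee": ("ई", "ी"), "oo": ("ऊ", "ू"),
    "ai": ("ऐ", "ै"), "au": ("औ", "ौ"),
    "a": ("अ", ""), "i": ("इ", "ि"), "u": ("उ", "ु"),
    "e": ("ए", "े"), "o": ("ओ", "ो"),
}
VIRAMA = "्"  # suppresses the inherent vowel, joining consonants into a cluster


def roman_to_devanagari(text: str) -> str:
    out: list[str] = []
    pending = False  # True if the last output was a consonant with no vowel yet
    text = text.lower()
    i = 0
    while i < len(text):
        for size in (2, 1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in CONSONANTS:
                if pending:
                    out.append(VIRAMA)  # consonant cluster, e.g. "st" in "namaste"
                out.append(CONSONANTS[chunk])
                pending = True
                break
            if len(chunk) == size and chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if pending else independent)
                pending = False
                break
        else:
            # Not a known letter (space, punctuation...): copy it through.
            # A word-final consonant keeps its implicit vowel, as Hindi spelling does.
            out.append(text[i])
            pending = False
            size = 1
        i += size
    return "".join(out)


print(roman_to_devanagari("namaste"))  # नमस्ते (correct)
print(roman_to_devanagari("aloo"))     # अलू (should be आलू)
```

The failures are the interesting part: colloquial "a" can stand for either अ or आ, and "ch" for either च or छ, so the rules alone can't recover which one the writer meant. You would need word-level knowledge (a dictionary or statistics over real text), which is where the data problem comes back.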
The challenge isn’t the script; it’s the lack of accepted technical vocabulary in Hindi.
Video works because Hinglish is how tech talk happens in many (not all) companies.
But you’d never see a technical doc in Hindi at any of these companies, because you can’t even translate simple terms like server or package management. Even if you find acceptable translations, they aren’t immediately obvious, because nobody has heard them before.
In a Python video, you might hear: “मेरा code requests library से Google server को HTTP request भेजता है” (“My code uses the requests library to send HTTP requests to Google’s server”).
It works on video, but not in text, because nobody is used to reading this in Hindi in the first place.
So if I understand correctly, Latin characters aren't used for loanwords like this in written text?
When I was young I used to play some MSX games in Japanese (the language doesn't really matter for a lot of these 1980s games), and you would frequently see English words and terms written in Latin characters all over the place.
Why won't this work for Hindi? Are people not familiar with these characters? Or is there just no tradition of doing so?
> So if I understand correctly, Latin characters aren't used for loanwords like this in written text?
It happens in casual text: WhatsApp forwards, SMS messages. But in official writing, you pick a language and stick to it as much as possible. This made more than a few notices impossibly hard to read when I was in college, because the Hindi felt archaic, even if it wasn't.
Other countries had a rich culture of research and scientific literature published in native languages. India never got that at a national scale, because India has hundreds of languages[0], so any efforts were local. A paper published in Tamil would be unreadable to folks a hundred miles away, so English became the technical lingua franca of the nation (the colonial imposition didn't help either).
When a developer searches Stack Overflow for an answer, English works better because it serves all developers in India.
[0]: India scores 0.914 on the Linguistic Diversity Index, which ranges from 0 (everyone has the same mother tongue) to 1 (no two people have the same mother tongue).
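The index in the footnote is, as far as I know, Greenberg's diversity index (the one Ethnologue publishes): the probability that two people picked at random have different mother tongues, computed as 1 − Σ pᵢ², where pᵢ is the share of the population whose mother tongue is language i. A minimal sketch, using made-up toy populations rather than real census data:

```python
def linguistic_diversity_index(speakers: dict[str, int]) -> float:
    """Greenberg's index: 1 minus the sum of squared population shares.

    `speakers` maps each mother tongue to its number of native speakers.
    The result is the chance that two randomly chosen people (drawn with
    replacement) have different mother tongues.
    """
    total = sum(speakers.values())
    if total == 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((count / total) ** 2 for count in speakers.values())


# Toy populations, not real data
print(linguistic_diversity_index({"A": 100}))          # 0.0: everyone shares one language
print(linguistic_diversity_index({"A": 50, "B": 50}))  # 0.5
```

With n equally common languages the index is 1 − 1/n, so a score of 0.914 corresponds to the diversity of roughly a dozen equally sized language communities.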
I think English is pretty much the lingua franca of science and computing everywhere now, just as Latin was in the past? Newton published his Principia in Latin, not English, as did most scholars of his day.
In Dutch, I would just say "de server is kapot" ("the server is broken"). There is no attempt to translate words like "server" into Dutch. You see the same in Indonesian (standard Indonesian, Bahasa Indonesia; there are many Indonesian languages), where these kinds of words are just copied verbatim from either English or (for older words) Dutch. For many technical terms in the IT world there are no "Dutch words", just the English ones. The exceptions seem to be the ones where there are Dutch words close enough to the English ones ("function" → "functie", "variables" → "variabelen"). It helps, I suppose, that both languages have similar Germanic roots with Latin/Greek influences.
And in those cases all the languages use the same Latin script, so it's easier to include loanwords and technical terms.
So it seems to me, unless I'm misunderstanding something, that it's at least partly an issue of script? Adapting the example someone else posted, why shouldn't "नमस्ते आप कैसे हैं? मेरा server ओली हूँ" be considered acceptable Hindi?
It depends a bit on the person and the word, but for most technical terms I'd say it's quite close to the English (other loanwords, a bit less so; my favourite example is "halve zool" ("half sole"), a way to call someone a fool or idiot, which is adapted from the British "arsehole").
> We all know how Dutch people like to pronounce their Gs :)
This depends on the regional accent; the south (and Belgium) has a "soft G", whereas the north (including Amsterdam, for example) has a "hard G".
That does happen: a common word for 'school', for example, is स्कूल ('skūl').
It's just that another phenomenon is writing Hindi (as in, actual Hindi words) in the Latin alphabet. 'namaste aap kaise hain? Mera naam Ollie hoon' (IAST āp, nām, olī, and hūñ) is a contrived sentence, but it's the sort of thing someone might text if they didn't have a keyboard for नमस्ते आप कैसे हैं? मेरा नाम ओली हूँ, or for whatever other reason.
The standard is IAST, but colloquially people don't use it, preferring a more Anglophone phonetic approximation (since English literacy is high), so you get 'aloo' instead of 'ālū' (potato) and 'jeera' instead of 'jīrā' (cumin), for example.
It's quite annoying as a learner, since it can make it difficult to map a new word back to devanāgarī (colloquially 'devnagari') to look it up. It's almost entirely true to say that devanāgarī is phonetic: if you write कुछ and I don't know the word, I still know how to pronounce it, and can ask someone what it means or look it up. That's a great feature, and one that English of course doesn't have at all. A phonetic approximation of Hindi in the Latin alphabet might get closer, but it's still non-standard, and different people will spell the same word differently.
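One way to cope with the spelling variation when looking words up is to reduce both IAST and colloquial spellings to a shared, lossy key. This is a minimal sketch under stated assumptions: the doubled-letter rules ("aa", "ee", "oo") and the four-word dictionary are hypothetical, and a real lookup would need far more rules. Because the key throws information away, it returns a list of candidates rather than a single answer.

```python
import unicodedata

# Assumed, non-exhaustive: colloquial spellings often double a letter for a long vowel.
LONG_VOWEL_DIGRAPHS = {"aa": "a", "ee": "i", "ii": "i", "oo": "u", "uu": "u"}

# Hypothetical mini-dictionary keyed by IAST spelling.
LEXICON = {"ālū": "आलू", "jīrā": "जीरा", "kal": "कल", "kāl": "काल"}


def loose_key(word: str) -> str:
    """Collapse a romanized word to a lossy key shared by IAST and colloquial spellings."""
    # NFD splits 'ā' into 'a' plus a combining macron; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word.lower())
    key = "".join(c for c in decomposed if not unicodedata.combining(c))
    for digraph, single in LONG_VOWEL_DIGRAPHS.items():
        key = key.replace(digraph, single)
    return key


def lookup(colloquial: str) -> list[str]:
    """Return every dictionary entry whose loose key matches the input."""
    key = loose_key(colloquial)
    return [deva for iast, deva in LEXICON.items() if loose_key(iast) == key]


print(lookup("aloo"))   # ['आलू']
print(lookup("jeera"))  # ['जीरा']
print(lookup("kal"))    # ['कल', 'काल']: the key can't tell a from ā
```

The 'kal' case shows the trade-off: making the key forgiving enough to accept 'aloo' also merges genuinely different words, so the learner still has to pick between candidates.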
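The "almost entirely phonetic" property is what makes a mechanical reading of Devanagari possible: each consonant carries an inherent "a" unless a vowel sign replaces it or a virama cancels it. Below is a minimal sketch of that rule, producing IAST. It covers only a subset of letters (no nukta letters or special conjuncts), and the final-schwa rule is a simplifying assumption: it drops the inherent "a" at the end of a multi-letter word, which is common in Hindi but not universal.

```python
# Small subset of Devanagari -> IAST (no nukta letters or conjunct exceptions).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "च": "c", "छ": "ch",
    "ज": "j", "झ": "jh", "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh",
    "ण": "ṇ", "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m", "य": "y",
    "र": "r", "ल": "l", "व": "v", "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
INDEPENDENT_VOWELS = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
    "ऋ": "ṛ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
}
VOWEL_SIGNS = {
    "ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū", "ृ": "ṛ",
    "े": "e", "ै": "ai", "ो": "o", "ौ": "au",
}
MARKS = {"ं": "ṃ", "ः": "ḥ", "ँ": "m̐"}
VIRAMA = "्"


def is_devanagari(ch: str) -> bool:
    return "\u0900" <= ch <= "\u097f"


def devanagari_to_iast(text: str, drop_final_schwa: bool = True) -> str:
    out: list[str] = []
    letters_in_word = 0
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            letters_in_word += 1
            if nxt in VOWEL_SIGNS:  # explicit vowel sign replaces the inherent "a"
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            if nxt == VIRAMA:  # virama: no vowel at all
                i += 2
                continue
            # Otherwise the consonant carries the inherent "a"; this sketch
            # drops it at the end of a multi-letter word (simplified Hindi rule).
            word_ends = nxt == "" or not is_devanagari(nxt)
            if not (drop_final_schwa and word_ends and letters_in_word > 1):
                out.append("a")
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
            letters_in_word += 1
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)  # space, punctuation, Latin text...
            letters_in_word = 0
        i += 1
    return "".join(out)


print(devanagari_to_iast("कुछ"))     # kuch
print(devanagari_to_iast("नमस्ते"))  # namaste
```

Hindi also drops some word-internal schwas, which this sketch ignores: देवनागरी comes out as 'devanāgarī', although it is usually pronounced closer to 'devnāgrī'. That is the "almost" in "almost entirely phonetic".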