Hacker News: papercrane's comments

One of the reasons Calibri was selected over Times New Roman was that it has a lower rate of OCR transcription errors, making documents set in it easier to use for people who rely on screen readers.

Link on that? OCR should be more reliable with Times New Roman due to its pronounced serifs.

I don't have a link on that, but the main difficulty with OCR isn't the OCR part (not anymore, at least), it's the cleanup part, and serifs are a pain in the ass, especially on slightly crumpled paper. My use case was an ERP plugin that digitized and read receipts to autofill reimbursement requests. Since most receipts use sans-serif fonts it was mostly fine, but some jokers used serif fonts (mostly on receipts you get when paying cash, not credit card receipts) and the error rate jumped from about 1% to 13% (not sure about the 1%, it might be a story I told myself to feel better; it was a decade ago, before I pivoted from AI to networking. I always make the best decisions, it seems).

I don't know what studies Blinken's State Department considered, but here are 2 studies on the matter.

https://www.academia.edu/72263493/Effect_of_Typeface_Design_...: "For Latin, it was observed that individual letters with serif cause misclassification on (b,h), (u,n), (o,n), (o,u)."

https://par.nsf.gov/servlets/purl/10220037: [Figure 5 shows higher accuracy for the two sans-serif fonts, Arial and DejaVu, compared to Times New Roman, across all OCR engines]
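For anyone wanting to compare fonts on their own scans: one common way to quantify OCR error rates like the 1% vs 13% mentioned earlier is character error rate (CER), the edit distance between the OCR output and a ground-truth transcription, divided by the ground-truth length. I'm not claiming either study uses exactly this metric; this is a minimal Python sketch under that assumption, and the function names `levenshtein` and `character_error_rate` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance between OCR output and ground truth,
    normalized by the ground-truth length (assumed metric)."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ocr_text, reference) / len(reference)


# An 'h' misread as 'b' (one of the serif confusions cited above)
# counts as a single substitution out of 7 reference characters.
print(round(character_error_rate("the bat", "the hat"), 3))  # prints 0.143
```

Averaging this over a batch of pages set in Calibri versus Times New Roman would give a rough comparison, assuming the same paper, scanner, and OCR engine for both.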


The memo at the time said the serifs can cause OCR issues.

https://x.com/John_Hudson/status/1615486871571935232


Just because they claimed it, doesn't make it true. OCR and screen reader software in 2023 did not have problems with serifs.

That doesn't make much sense, since a typewriter types neither Calibri nor Times New Roman. And OCR should only be needed for typewritten documents, because any document made with Calibri or TNR is already digital.

We have a process at work where clients export information from their database as a PDF, which they email to us so that we can OCR it and insert it into our database.

No one else seems to think this is batshit insane.


printed documents, images, horribly inaccessible pdfs, horribly inaccessible websites

> Printed documents - Use the original, which is digital.

> Images - Use the original, which is digital.

> horribly inaccessible pdfs - Use the original, which has real text in the PDF

> horribly inaccessible websites - All text on any web site is digital. Nobody uses OCR on a website.

A massive paper producer like the government shouldn't adapt its typesetting to people who are using technology wrongly.



It's easier to mandate a font than to excise every process within the federal bureaucracy that produces these.

Images being digital has no bearing on OCR ability.


It could've been, but it was probably a turbocharged L-series engine.

When Doyle wrote most of the Holmes stories, cocaine was a popular and novel drug; it wasn't until later that its risks became widely known. In one of his later stories, "The Adventure of the Missing Three-Quarter", Doyle portrays it as an addiction that Watson has weaned Holmes off, though Watson remains concerned that his friend may fall back into it.

"For years I had gradually weaned him from that drug-mania which had threatened once to check his remarkable career. Now I knew that under ordinary conditions he no longer craved for this artificial stimulus, but I was well aware that the fiend was not dead but sleeping, and I have known that the sleep was a light one and the waking near when in periods of idleness I have seen the drawn look upon Holmes’ ascetic face, and the brooding of his deep-set and inscrutable eyes. Therefore I blessed this Mr. Overton, whoever he might be, since he had come with his enigmatic message to break that dangerous calm which brought more peril to my friend than all the storms of his tempestuous life."

- https://en.wikisource.org/wiki/The_Return_of_Sherlock_Holmes...


In the US, copyright just requires a level of originality. The bar isn't very high, but simple logos, for example IBM's blue lines logo, are not copyrightable.

There are examples of software code that is probably not copyrightable, but that's limited to very simple code that has only obvious implementations.


I believe the RTL in RTL-SDR refers to Realtek, the manufacturer of the RTL2832U chip used in the early days of SDR. I don't think the chips these days are exclusively Realtek, but the name has persisted.


Thanks! I'm getting myself a RTL-SDR!


For clarity, they're requiring apps on certified Android devices to be signed by a verified developer. You can still sideload, but sideloaded apps still require that verification.


Future HN headline: Pam Bondi orders Google to revoke verification status and code signing certificates of authors of {partisan/politically-unfavourable Android app}


This still means that Google is effectively gatekeeping what can be installed on the hardware you own and what cannot.


It's 100k a year. H1B visas are normally valid for 3 years, so that's where the 300k comes from. The $1M figure is for the "Trump Gold Card" visa, which is unrelated to the H1B program.


I think the E.O. wording says $100K for an applicant to enter the lottery, and I'm assuming they'd hold that money until you won; what happens if you don't win is unclear. But I think this is all horrible lawmaking. E.O.s are not effective leadership or lawmaking because they're so underspecified, rushed, and haphazard. It's a shame that we can't have sensible immigration reform, and it is behavior like this that makes me feel Republicans simply don't care about immigration reform, just vibes. How they are doing it is simply unserious and punitive, but short-sighted.

This is simply going to push people away from coming to the US, and we will see more and more robust tech competition with laws like this. Like them or hate them, H1B visas are a major brain drain on all of the nations the US wants to compete with, which is good for us, not bad. Tech workers are not hurting in the salary department.


Please stop spreading the unsubstantiated rumor that it's 100K a year. It's not. It's for the lifetime of the visa which is 3-6 years, potentially longer, subject to employment.


This unsubstantiated rumor coming from......the Secretary of Commerce?

"Reuters was not immediately able to establish how the fee would be administered. Lutnick said the visa would cost $100,000 a year for each of the three years of its duration but that the details were "still being considered.""

"Lutnick said on Friday that "all the big companies are on board" with $100,000 a year for H-1B visas. "We've spoken to them," he said."

https://archive.is/WYuI1#selection-1571.0-1575.32


That's not what the official announcement from the White House said; it made it seem that the $100K fee applies for the full duration of the visa. That number is chump change for a 3+3=6 year visa.


I don't know why I was downvoted, since what I said is the truth, and it checks out against the announcements from the White House. The forces of disinformation are strong.


The Data Act allows for termination penalties in cases like that. You just have to make sure they're clearly disclosed in the contract.


That doesn't make it sound better. Generally "arrest them all and sort it out later" sounds like a 4th amendment violation.


As far as this administration is concerned, lawful immigrants aren't citizens, and due process is what they say it is. Til SCOTUS nuts up, that won't change.


That doesn't mean it isn't a 4th Amendment violation. The Supreme Court may declare what's legally enforceable, but it doesn't define the truth.


Do you have a source for that? MAI Systems Corp. v. Peak Computer, Inc. established that even creating a copy in RAM is considered a "copy" under the Copyright Act and can be infringement.


It's not an issue of where it's being copied, it's who's doing the copying.

Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.


There are many copies made as the text travels from Library Genesis to Anthropic. This isn't just of theoretical interest. English law has specific copyright exemptions for transient copies made by internet routers, etc. It doesn't have exemptions for the transient copies made by end users such as Anthropic, and they are definitely infringing.

Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?


> But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

Well, the question here is "who made the copy?"

If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?


Copyright law is literally about the copies. A xeroxed book is exactly one copy. Mailing and reading that book doesn't copy it any further. In contrast, you can't do anything with digital media without making another copy.

> "Who made the copy?"

This begs the question. With digital media everybody involved makes multiple copies.

