Quick and dirty way to rip an eBook from Android (shkspr.mobi)
192 points by Amorymeltzer on Dec 27, 2021 | 61 comments


Does anyone have any advice on how I can pay authors to get a copy of their eBook without going through Amazon or Google? I really don't want Big Tech or advertising companies profiling me based on the books I read, but I absolutely want to pay authors for their work.

I've considered torrenting books and just sending a check or something to authors, but I'd love to find a better way.


Go to bookshop.org. Buy the book you want. Then download from libgen or z-library guilt free. If you don't want the hardcopy, you can pass it on to a friend. Start a tiny library in your neighborhood. Or just leave it on a coffeeshop table to entertain some random person.

Bookshop.org is basically a coop organized by booksellers across the country. A local bookstore will fulfill your order and get the proceeds. Your purchase will support local businesses as well as pay the author.

Bookshop does offer ebooks via My Must Reads, but their ebook format looks to be non-standard. Still, you could buy via Bookshop / My Must Reads and still download a friendlier version from libgen, etc. So there's an option which would avoid getting an unwanted physical book while supporting bookshops and authors.

Kobo has a partnership with local bookstores, and supposedly gives them a cut. I don't know much about this, but it might be worth looking into. Kobo DRM is easily stripped so it would be a viable option. But Kobo is at least partially owned by Walmart and that might be something you'd rather not support.

https://www.kobo.com/indie


There are a few methods I use. First, my local library lends eBooks. In the UK, authors get paid for each book that is borrowed.

Secondly, sites like hive.co.uk sell eBooks outside of the Amazon / Google services. You can read them on any device which supports Adobe's DRM - or you can remove the DRM yourself.

Finally, go to the publisher's website. Some of them sell books directly.


> First, my local library lends eBooks. In the UK, authors get paid for each book that is borrowed.

I'm in the US, and I wonder how it's done here. I also get all of my eBooks and audiobooks from my local library. Just yesterday I was perusing titles at Barnes & Noble and putting stuff that looked interesting on my library queue. I ended up actually buying dead-tree manga books from there, since I really can't get into eBook-formatted manga for some reason. If it's illustration-heavy, I still want printed stuff.

For the audiobooks I don't even have to strip DRM. The ODM files provide links where you can pull down unprotected MP3 files.

For the eBooks I keep a Windows VM just for running an older version of Kindle for PC, Calibre with DeDRM, and Adobe Digital Editions.

I know there are websites where I can directly download DRM-stripped eBooks, but I figure that going through the library means that some kind of revenue is making it to the authors.

I'd be more than happy to tip the authors directly, but I'd imagine that publishers forbid that sort of thing contractually.


Here’s a list of DRM free ebook stores: https://www.defectivebydesign.org/guide/ebooks

Baen is one of the big sci-fi publishers which runs their own DRM free store: https://www.baen.com/


Baen also used to hand out CDs chock full of ebooks, and they're available for download here https://baencd.thefifthimperium.com/


I recently came across a site called Leanpub [https://leanpub.com/] while searching for Jeff Geerling's ebook 'Ansible for DevOps'.

The company looks to be ethical, and is aimed at technical authors who wish to self-publish ebooks. Each ebook for sale displays how much the author will receive from the purchase (usually 80%).

Most Leanpub books are available in PDF, EPUB, and MOBI.

If you buy a Leanpub book, you get free updates for as long as the author updates the book [which Jeff Geerling seems to do regularly].

The big plus is that Leanpub books don't have any DRM copy-protection.


I always wished there was a Bandcamp-like service for things other than music. Unfortunately they still take fees etc., but it's still quite an excellent user experience for buying and owning music.


Indeed. Bandcamp also has "Bandcamp Fridays" where every dollar goes straight to the artists you buy from (after payment processor fees, that is).


I'm pretty sure that's itch.io, which looks like a game site but it was originally (and still is) just a way for you to sell zip files.


Thanks for the heads up! I always thought it was some startup site lol!


itch.io is kind of nuts - IIRC the creator is also responsible for Lapis and MoonScript, basically building a whole Lua-based stack just for itch.


I don't know about authors who themselves chose to publish with Big Tech or traditional publishers (& not asking for money elsewhere) but how about supporting authors who are asking for funding in crowdfunding sites (e.g. Indiegogo), membership subscription sites (e.g. Patreon), or the author's own website?


Yes, that's definitely an option. But I'd like to choose books for their content, not for their publisher.


Well if the author you like chose Big Tech as their publisher, and you don't want to support Big Tech, and the author isn't asking for money in any other place, I don't know, maybe you can write a message to the author so you can give them money somehow (if the author is interested in it at all)?


Have you tried other eBook stores? Depending on your country, there’s Kobo, Barnes and Noble, and often massive freely available collections through local libraries (at least in the US and some European countries).


Buy at a bookstore, buy directly from the publisher, or ask the author.


Cryptocurrency. Not Bitcoin either, one that can actually be used for fast anonymous payments with low fees. Ask them to set up a Monero wallet and publish the address so people can send money there. There are easy-to-use apps like Cake Wallet.

I've actually gotten paid for code this way. Worked great. I sincerely hope this will become common. Cryptocurrency will never be real if we don't start using it.


I once did something similar with a Windows ebook reader that nicely let you copy up to 10 characters to the clipboard. I wrote an AutoHotkey script that would mouse down, move the mouse, mouse up, copy the selection to a variable, and then start again. I lost the source code but never came across that format again. I still have a video of the program running: https://youtu.be/umou8uuNIPU
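The select-copy-advance loop described above can be sketched as a small Python simulation. This is not the lost AutoHotkey script: the `LimitedReader` class is a hypothetical stand-in for a reader that truncates every copy to 10 characters, and a real script would drive the mouse and clipboard instead of slicing a string.

```python
class LimitedReader:
    """Hypothetical stand-in for a reader that copies at most `limit` characters."""

    def __init__(self, text, limit=10):
        self.text = text
        self.limit = limit

    def copy_selection(self, start, end):
        # The reader silently truncates any selection longer than the limit.
        return self.text[start:min(end, start + self.limit)]


def rip_text(reader, total_length):
    """Select, copy, and advance in limit-sized steps until the text ends."""
    chunks = []
    position = 0
    while position < total_length:
        chunk = reader.copy_selection(position, position + reader.limit)
        if not chunk:
            break  # nothing more to copy
        chunks.append(chunk)
        position += len(chunk)
    return "".join(chunks)


if __name__ == "__main__":
    book = "It was the best of times, it was the worst of times."
    print(rip_text(LimitedReader(book), len(book)))
```

Advancing by the length of what was actually copied, rather than by a fixed step, keeps the loop correct even if the reader returns fewer characters than requested.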


Nice use of AutoHotkey, it doesn't get enough love. I learned to program using AutoHotkey to automate parts of my job.


The workhorse behind this script is pdfsandwich

http://www.tobias-elze.de/pdfsandwich/

which itself uses the OCR engine Tesseract. Unfortunately, Tesseract is not quite accurate even for nicely aligned, printed computer texts. When I was working on an OCR problem within my firm, I was surprised to find that it was not a solved problem, and there are still public and for-profit efforts to improve the state of the art.


I would imagine it could be a "Hard Problem". I mean, look at all of the problems anyone has ever had with Unicode; now imagine trying to reverse engineer it from images of characters in a random font, without the bit-stream of the character, the benefit of textual context (i.e. is that a Latin-script letter O, a zero, or an O-shaped letter from another character set used in a quoted name or an excerpt from another language?), or metadata in a file header. I think this will probably be one of those things that will get better asymptotically until we have sufficiently advanced general intelligence, as a lot of it will be context dependent.


Could an OCR reader do a first pass on the easy characters to guess the font, eliminating the problem of a random font for the second pass of parsing more difficult characters?


I suppose this could be called the "semi-analog hole". I'd probably analyse the app first because these protections are usually quite trivial. I remember one a long time ago (PC-based), which was basically a PDF "encrypted" using a short XOR key and appended to the end of the executable.
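To illustrate why a short XOR key is so weak, here is a minimal Python sketch. It assumes the payload offset and key length are already known (both are hypothetical inputs, since the original app's layout isn't described) and uses the fact that every PDF begins with `%PDF-` as known plaintext to recover the key.

```python
from itertools import cycle

PDF_MAGIC = b"%PDF-"


def xor_bytes(data, key):
    """XOR every byte of `data` with a repeating `key`."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_key(ciphertext, key_length):
    """Recover a repeating XOR key from the known '%PDF-' header.

    Only works when key_length <= len(PDF_MAGIC); longer keys would
    need more known plaintext than the header provides.
    """
    if key_length > len(PDF_MAGIC):
        raise ValueError("not enough known plaintext for this key length")
    return bytes(c ^ p for c, p in zip(ciphertext[:key_length], PDF_MAGIC))


def extract_pdf(blob, payload_offset, key_length):
    """Decrypt the payload appended at `payload_offset` (assumed known)."""
    payload = blob[payload_offset:]
    key = recover_key(payload, key_length)
    return xor_bytes(payload, key)


if __name__ == "__main__":
    # Build a fake "executable" with an XOR-obfuscated PDF appended.
    fake_exe = b"MZ" + b"\x00" * 30
    secret_key = b"\x13\x37\x42"
    pdf = b"%PDF-1.4\n%%EOF\n"
    blob = fake_exe + xor_bytes(pdf, secret_key)
    print(extract_pdf(blob, len(fake_exe), len(secret_key)))
```

Because XOR is its own inverse, the same `xor_bytes` function both obfuscates and recovers the payload.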


> DRM on textbooks is an annoyance. For computer science books, it's little more than a fig-leaf.

The idea that you have to jump through hoops to format shift an ebook (or other item in digital form) that you paid for is extremely annoying.

Apple's iTunes music store tracks went DRM-free with iTunes Plus in 2007. Amazon also sells DRM-free music. Why are iBooks and Kindle so late to the party?


I wonder if this works for apps that have set WindowManager.LayoutParams.FLAG_SECURE.


That setting is essentially malware and it’s a huge shame that Android added it.


I'm flabbergasted that there at least isn't a developer-option to disable it.


You can use things like https://github.com/veeti/DisableFlagSecure to disable it if you're willing to unlock your bootloader and install Xposed. But yes, it's incredibly user-hostile of Google to provide this without giving the user a built-in way to override it.


The point is, for every stupid DRM "solution" you come up with, some technically capable guy will find a way to crack it. This has been happening for more than 40 years in computer science.


Unfortunately DRM is winning. It failed before because the user could always drop down to a lower level and gain access, but now DRM is baked into the hardware and OS of the devices people have. Those tricks to disable the secure flag feature do not work if the app uses SafetyNet, which can detect a rooted device and is backed up by boot chain attestation and hardware security chips.

Sure, for each part there is usually some exploit known, but the whole thing is getting much much harder to the point where if the average user said "How do I screenshot this app" the only real answer is to take a photo of the screen.


Yeah. At this point most computers up for sale actually belong to the manufacturer. The people who buy them have almost no control anymore. We'll never be truly free until we can make our own computers at home just like how we can make our own free software at home.


Agree with this. Cracking DRM was difficult and a PITA 20 years ago, but now that the black boxes are using the DRM modules that practically every device ships with, it will get to the point where we need to go full "bunnie" Huang on the mobos to extract the keys.


Aw, you two must be young. Remember the wars at the end of the '80s / beginning of the '90s over PPV from satellites, when everything was hardware protected? You know who won them? Pirates did. Hardware, software, whatever - if it's made by humans, it will be unmade by humans as well.


Absolutely - that was the point I was making. It'll be harder because we have to go into the hardware rather than the software, but we'll break it.

The PPV in the 90s was broken because too few bits were used on the encryption and it became possible to essentially brute force the keys, IIRC.


There's a way: don't use the app.


Android has become as user hostile as iOS these days. In some ways it's worse. iOS does not have a setting to block screenshots (although Telegram seems to have found an obscure hack to censor parts).


Blocking screenshots is developer-friendly but user-hostile. Why shouldn't I be able to screenshot what's on my screen? Moreover, Apple DOES allow developers to block out screenshots, although only for DRM-protected content via FairPlay, and people constantly complain about it.


It's really not.


Why do you think that? I made the claim because it’s a feature that acts against the user. The user wants to take a screenshot and there is never a good reason to stop them from doing that on their own device.


You can still use the tool scrcpy to stream a video of the screen via USB to a PC.


Huh! I just tried this and you are right. I would have thought scrcpy wouldn't be able to see those screens. Thanks!


Only works up to and including Android 11, though.


Yeah, I have even used scrcpy to capture a screenshot on a computer that was not possible to take in the app, which restricts screenshots.


I always thought that flag was to protect the user's sensitive information, like banking logins. Seems I was wrong and it’s mostly used for DRM purposes.


I've only encountered it with banking apps, so perhaps you were right.


Yeah it's a shame. Android allows for a lot of cool apps that have access to control or record other apps. I assume FLAG_SECURE was added for banking apps and the like (it's on for my banking apps), so it's a great addition.

If Android really cared about its users, all it would have to do is permit user-prompted screenshots and screen recordings but still block other app access. I can't think of a time where I'd want my own phone preventing me from screenshotting, but I can see the use case for general screen access prevention.

I don't personally know a root-only way to disable this, but Xposed has the disable_flag_secure module that I'm wary of using because it removes the flag everywhere.


Not without some kind of root access. But you can usually find an app that doesn't have that flag and can still decode DRM.


I just pop my phone on the photo-copier. Damp postit on the ADF, auto-scrolls the screen on each cycle.


You are joking right? If not, I'd kill for a video of this setup working.


Sadly I am. Although I'm idly just considering putting a magnet on the scan-head of a flatbed scanner. Could then just use tasker to detect the passing of the magnet as a 'page-down' scroll on the screen.


The "adb shell input" command as used in the linked article would be simpler.
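As a rough sketch of that approach (not the article's actual script), the Python below builds `adb shell input swipe` and `adb exec-out screencap -p` commands and loops over them. The swipe coordinates, page count, and file-name pattern are placeholder assumptions, and `capture_pages` needs a connected device with USB debugging enabled.

```python
import subprocess


def swipe_command(x1, y1, x2, y2, duration_ms=300):
    """Build an `adb shell input swipe` command that drags across the screen."""
    return ["adb", "shell", "input", "swipe",
            str(x1), str(y1), str(x2), str(y2), str(duration_ms)]


def screencap_command():
    """Build a command that writes a PNG screenshot to stdout."""
    return ["adb", "exec-out", "screencap", "-p"]


def capture_pages(page_count, out_pattern="page_{:04d}.png"):
    """Screenshot each page, then swipe to the next one."""
    for page in range(page_count):
        result = subprocess.run(screencap_command(), check=True,
                                capture_output=True)
        with open(out_pattern.format(page), "wb") as handle:
            handle.write(result.stdout)
        # Swipe right-to-left; coordinates are placeholders for a
        # 1080-pixel-wide screen and will differ per device and app.
        subprocess.run(swipe_command(900, 1000, 100, 1000), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so no device is needed.
    print(" ".join(screencap_command()))
    print(" ".join(swipe_command(900, 1000, 100, 1000)))
```

Keeping the command builders separate from the loop makes it easy to check the exact `adb` invocations before pointing the script at a real device.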


You can either use root access or disassemble the app, remove the flag, and then assemble it again. Impossible to do if the app uses SafetyNet though.


Back when Google Books still showed entire pages for every search result, you could just script a search for each page number, and rip an entire book that way.


>The text is "sandwiched" behind the image of the page, so you can't see it but can search for it.

I already have a bunch of these PDFs that have this. Does anyone know how to extract this text out?

In particular I'd like to make it into a nice mobi/epub file, but calibre doesn't do this on its own.


At a basic level, you can copy and paste it out. CTRL+A, CTRL+C, CTRL+V. Done.

You can also use Poppler's pdftotext: https://en.wikipedia.org/wiki/Pdftotext

Calibre struggles with some PDFs, especially those that don't contain metadata about chapter headings etc. That's a failing on the PDF, not Calibre.


As always, Ghostscript is the way.

$ gs -sDEVICE=txtwrite -o output.txt input.pdf
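For batch use, either tool can be wrapped in a few lines of Python. This is only a sketch: it assumes `pdftotext` (from Poppler) or `gs` is on the PATH, and the function names are made up for illustration.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path):
    """Build a Poppler pdftotext command; -layout tries to keep the page layout."""
    return ["pdftotext", "-layout", pdf_path, txt_path]


def ghostscript_command(pdf_path, txt_path):
    """Build a Ghostscript command using the txtwrite output device."""
    return ["gs", "-q", "-sDEVICE=txtwrite", "-o", txt_path, pdf_path]


def extract_text(pdf_path, txt_path):
    """Use pdftotext when installed, otherwise fall back to Ghostscript."""
    if shutil.which("pdftotext"):
        command = pdftotext_command(pdf_path, txt_path)
    elif shutil.which("gs"):
        command = ghostscript_command(pdf_path, txt_path)
    else:
        raise RuntimeError("neither pdftotext nor gs found on PATH")
    subprocess.run(command, check=True)
    return txt_path


if __name__ == "__main__":
    # Show the commands without running them.
    print(" ".join(pdftotext_command("input.pdf", "output.txt")))
    print(" ".join(ghostscript_command("input.pdf", "output.txt")))
```

The resulting text file can then be fed to Calibre or another converter, though chapter structure will still need manual cleanup.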


Awesome, I did not know that this works :-) Thank you.


Additionally, some PDFs (like textbooks) insert intentionally mangled embedded text to prevent copying/pasting. You can rasterize them and push them through pdfsandwich/ocrmypdf to wipe these hidden layers and re-OCR them.

Haven't done it in a while, but looks like this should work: https://askubuntu.com/a/845119

Obviously, replace convert with mogrify ./* after backing the files up, since mogrify overwrites them in place.


Great note! ocrmypdf can also output the OCR’d text as a file using the oddly named “sidecar” parameter.

$ ocrmypdf --sidecar text.txt input.pdf output.pdf
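If the PDF already carries a mangled text layer, ocrmypdf's `--force-ocr` option rasterizes every page and replaces that layer, which can be combined with `--sidecar`. Here is a hedged Python sketch of that combination; the wrapper function names are hypothetical, and it assumes the `ocrmypdf` command-line tool is installed.

```python
import subprocess


def ocrmypdf_command(input_pdf, output_pdf, sidecar_txt, force=True):
    """Build an ocrmypdf command that also writes the OCR text to a sidecar file."""
    command = ["ocrmypdf"]
    if force:
        # --force-ocr rasterizes every page and discards any existing text layer.
        command.append("--force-ocr")
    command += ["--sidecar", sidecar_txt, input_pdf, output_pdf]
    return command


def reocr(input_pdf, output_pdf, sidecar_txt):
    """Run ocrmypdf and return the recognized text from the sidecar file."""
    subprocess.run(ocrmypdf_command(input_pdf, output_pdf, sidecar_txt),
                   check=True)
    with open(sidecar_txt, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(ocrmypdf_command("input.pdf", "output.pdf", "text.txt")))
```

Passing `force=False` reproduces the plain `--sidecar` invocation shown above.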


I like this, it seems like the sort of thing I might do, except I would probably spend way too long extracting the actual text and attempting to produce a "better" pdf. I would probably fail, and end up back with what this article produced.



