Readable: A more readable version of "readability" (tastefulwords.com)
138 points by jbm on May 8, 2011 | 92 comments


"Legally, Readable's source is under the Creative Commons Attribution-NonCommercial-ShareAlike license."

Erm... Why? Correct me if I'm wrong, but Creative Commons wasn't made for code. There are plenty (even too many) open source licenses to choose from. Why not one of those?

EDIT: Yup, using cc for source code is not recommended: http://wiki.creativecommons.org/FAQ#Can_I_use_a_Creative_Com...


Honestly, I browsed around for less than ten minutes before deciding on Creative Commons for the code -- and I did so only because I previously saw source-code under various Creative Commons licenses. I did not know that it wasn't recommended; thanks for letting me know. I'll be sure to study the issue more thoroughly and re-license the code.


Open Source Licenses[1] is a good place to "shop" for licenses, though you might find the "by name" view a bit overwhelming. Maybe start with the "Licenses that are popular and widely used or with strong communities" section on the "by category"[2] page.

[1] http://www.opensource.org/licenses/index.html

[2] http://www.opensource.org/licenses/category


Thank you.


Why not release it under the MIT license?

http://www.opensource.org/licenses/mit-license.php


As soon as I get a few free minutes, I promise I'll look over licenses more carefully and re-release the code.

In case anyone is interested in using the source code right this second, just come back to the site tomorrow and you'll find the new license explicitly stated.

If you have any cool ideas for what to do with the code, let me know -- I like hearing cool ideas.


It's not recommended for code, but it's one of the few understandable licenses that includes "non commercial" clauses. I've got clients that want to put out code under "creative commons non commercial" because they don't want the code they paid for being used by commercial competitors, but they're fine with people using it non-commercially.

Almost all of the 'open source'-type licenses out there concern themselves solely with distribution, not with use. When people find the CC non-commercial, it makes sense to them. Perhaps there needs to be a good 'open source' license created that also includes some use restrictions for these types of cases?


In my experience commercial vs non-commercial is a can of worms. Not to mention that using it for code is not recommended, as you can read on the Creative Commons website.

This is especially true if you (the one licensing your work) do not include a definition of commercial use. Since the license itself does not define commercial use (unless things have radically changed lately), your interpretation of "commercial" may differ from that of the people using your work, which may cause issues for both parties.

If you do include a definition, then at least the people using your work should have some clue as to what constitutes commercial in your interpretation. However, 'translating' this across judicial systems/countries/cultures/social spheres will again most likely lead to issues you probably want to avoid.

In my personal opinion the commercial vs non-commercial clause should never have existed in the first place. Maybe it should be shelved just as the Developing Nations license[1] was shelved.

disclosure: I was part of the Creative Commons Netherlands team which introduced the licenses in The Netherlands. I'm still involved in the CC community, albeit less actively than I used to be.

[1] http://creativecommons.org/licenses/devnations/2.0/


An example of a library using CC for exactly that end: Highcharts. It's under CC-BY-NC.


I meet people who license code under Creative Commons. The license is made for lots of stuff, but it doesn't deal with the intricacies of source code.

Some of them are often confused about open source licenses. It probably doesn't help that there's such heated debate (GPL vs. BSD, for example) amongst the licenses.


Maybe a bit off topic, but readability, readable, instapaper, etc - weren't user-defined style sheets supposed to address at least some of the issues these services are addressing? I ask because it seems that sometimes we get decent tech ideas (the idea behind CSS is good at least) but then we settle for half-baked implementations and don't pursue good user tooling. CSS specifically - no browsers have good ways of defining/saving/using custom stylesheets, yet this (IIRC) was a selling point of CSS back in the day. Had we had better browser support for this, we may have had a much different web than we do today.

As it is, people think it's really cool when there's those stupid 3 font size icons at the top of a page and they can 'resize' the page's fonts between 9, 11 and 13 point fonts (w00t!)


Couldn't agree more. If user defined style-sheets worked (and if content and presentation were truly separated) the web would be a much better place.

When I was developing Readable, btw, I pretty much thought of it as a better implementation of user-defined styles. The text-parsing and main body text extraction is just my way of getting around the problem of content/presentation.
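
To make the "user-defined styles" idea concrete, here is a minimal sketch of turning a reader's preferences into a stylesheet. The function names, preference keys and default values are hypothetical and are not taken from Readable's source; extracting the main body text, which is the hard part, is not covered here.

```javascript
// Hypothetical helper: build a CSS string from a reader's preferences.
// The keys and defaults below are illustrative assumptions only.
function buildUserCss(prefs) {
  const defaults = {
    fontFamily: 'Georgia, serif',
    fontSize: '18px',
    lineHeight: '1.5',
    maxWidth: '36em',
    textAlign: 'left',
  };
  // Preferences supplied by the reader override the defaults.
  const p = Object.assign({}, defaults, prefs);
  return [
    'body {',
    `  font-family: ${p.fontFamily};`,
    `  font-size: ${p.fontSize};`,
    `  line-height: ${p.lineHeight};`,
    `  max-width: ${p.maxWidth};`,
    `  text-align: ${p.textAlign};`,
    '  margin: 0 auto;',
    '}',
  ].join('\n');
}

// In a browser, the generated rules could be applied through a <style> element.
// `doc` is expected to be a DOM Document (for example, window.document).
function applyUserCss(doc, cssText) {
  const style = doc.createElement('style');
  style.textContent = cssText;
  doc.head.appendChild(style);
  return style;
}
```

For example, `applyUserCss(document, buildUserCss({ fontSize: '20px', textAlign: 'justify' }))` would restyle a page whose content and presentation are cleanly separated; on pages full of inline styles it would not be enough, which is the chicken-and-egg problem discussed in this thread.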


BTW, this wasn't a bash at you at all (if you took it that way). More a lament that we ended up in this chicken-and-egg situation. No point in making user-css tools because there's so much inline style and mixing, and no point in moving towards building sites with user-defined CSS in mind because there's no good support in browsers.


Don't worry, I didn't take it in a bad way. And I do, actually, completely agree with you; I thought that my comment made that clear -- but I do appreciate you making sure I didn't take offense; that kind of courtesy is rare.

Back to our issue, though: if, tomorrow, browsers all got good enough at user-css that Readable wouldn't be needed anymore, I would gladly convert it into a one-page tutorial explaining to people how to set up their user-css :)


Yeah, I was probably writing back more to assure anyone else reading it that I wasn't bashing you :)

Thanks for some great work, by the way!


You are most welcome.

It was -- and still is -- my pleasure.


Awww..... group hug! :-)


Most of those services provide additional utility which CSS doesn't provide: separating the content from the site's other distractions. But yes, sites should offer more reading options and usability tools to users. Or maybe browsers should?


(For what it's worth, I was a fan of the old Readability plugin, but I am somewhat concerned about its direction now.)


Our company (EDIT: not related in any form to Readable or Readability) does non-RSS-based crawling (blogs, news, message boards), so we spent quite some time on automatically identifying the article, author, date and comments in articles.

If someone is interested in creating something similar to Readability (e.g. with us doing the article extraction for you) or needs website article extraction, you can contact me at t.britz@trendiction.com.

PS. We have so many other ideas of our own, and that's the reason we are not doing it ourselves.


Would've been nice if you mentioned that your company has nothing to do with Readable, as there is a chance someone reading might get that impression -- especially for the short time that your reply is the first reply.


Sorry, didn't think about that. Edited it. thanks


Thank you, too.


What about printable?

One of my major uses for Readability was to format articles so that I could print them out for later reading. But a recent change to Readability made it so that (at least under Firefox) the browser print function will not break an article into multiple pages. And that makes it pretty useless for printing.

Whatever the problem is, Readable has it, too. Can something be done about this? (Also, does anyone know what the problem is?)

In any case, improving the readability of the web is a worthy goal. Thanks for your efforts.


I'm not aware of the problem you're describing. And I can't seem to reproduce it.

Could you provide more details, please? -- URL of an example article, more details on what exactly happens that shouldn't happen.

If you'd like, you can get in touch to do this (http://readable.tastefulwords.com/about-and-contact/) -- as a matter of fact, it would probably be preferable, as opposed to using HN as a bug reporting forum.


I'll do both (report here & there).

I just tried the top five HN links. All exhibited the problem.

http://blogs.msdn.com/b/powershell/archive/2011/04/16/powers...

http://www.logolounge.com/article.asp?aid=lnPf

http://code.google.com/p/leveldb/

http://www.technologyreview.com/computing/37525/?p1=A2&a...

http://openfarmtech.org/weblog/2011/05/solar-fire/

For each, I clicked on the link, and then clicked my newly made Readable bookmark, which I had set up using the default settings. Then I went to Firefox's File:Print Preview. I have it set on the default settings (Shrink to Fit, Portrait). For each of the five articles listed above, the result was that only one of the displayed pages had article text on it, and this was not the entire article, since it would not fit on a single page. Sometimes other pages were displayed, sometimes not, but if there were any, then they were blank pages.

I am running Firefox 3.6.16 under Ubuntu 10.4 (Lucid). I'm perhaps a bit behind in my patching, but, in any case, this is not a new problem.


The problem, in the steps you said you followed, is the "Print Preview" step. File >> Print Preview will show you a preview of how the entire current page will look when printed -- but that's actually not what you want to print.

What you want to print is only the contents of Readable's overlay. To that end, please use the Print link, shown in the menu at the bottom of the overlay -- unfortunately, there is no "Print Preview" available.

The technical explanation for this is that Readable's overlay is actually an IFrame -- and browsers support printing the iframe contents as if the iframe were a window unto itself; but they will print the iframe as an element in the main page, when you're printing that instead.
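
The difference can be sketched in a few lines. This is an illustration of the general browser behavior, not Readable's actual code; `printOverlay` and the `overlayFrame` parameter are hypothetical names.

```javascript
// Print only the overlay by calling print() on the iframe's own window.
// `overlayFrame` is assumed to be a reference to the overlay's <iframe> element.
function printOverlay(overlayFrame) {
  const frameWindow = overlayFrame.contentWindow;
  // Focusing first is a common precaution: some browsers print whichever
  // window currently has focus.
  frameWindow.focus();
  frameWindow.print();
}

// By contrast, calling window.print() on the host page (which is also what
// File >> Print and Print Preview act on) treats the iframe as one element
// of the main page, so its contents are not paginated as a document.
```

In a browser this might be wired to a link with something like `printOverlay(document.getElementById('overlay'))`, where the `overlay` id is again an assumption.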


Wow.

Well, you're right. "Print" works fine.

Three suggestions for you:

(1) Make the "Print" link easier to get at. Not just 'way down at the bottom of the page. Maybe put one with the "Close" link at the upper-right?

(2) Change the text of the "Close" links. I'm not sure what they should say; but "Close" doesn't really convey the right idea. What it actually does, from the user's POV, is not closing, but rather returning to the original styling.

(3) Maybe figure out how to make Readable work with Firefox's Print Preview. (I realize that FF's behavior isn't your fault, but it would still be nifty if it worked.)

Thanks again for all your work.


Readable's interface is due for a slight overhaul. But I haven't yet decided what the new design is going to be. I will take your suggestions into consideration, though.

As for the work: you are most welcome.


Easily reproduced. Tried on the following page:

http://www.go-hero.net/jam/11/languages


Thanks for confirming. Read my reply here: http://news.ycombinator.com/item?id=2526937


I'm Readable's author.

If anyone has any suggestions for improvements, or even feature requests, I'm all ears.

I'm not promising I'll implement them; but I'll definitely listen and consider them carefully.

P.S. jbm, thanks a lot for posting this -- I've tried posting Readable to HN myself (twice), but it didn't stick.


I'm another one who uses such services for PRINTability. And one of my big frustrations is that the author_line / date_line / url_line are often missing: a real nuisance on a piece of paper that gets discovered months (years?) later.

Note: Readability only recently started displaying the URL at the end of articles.

EDITs:

1) Meant to say I'm pleased to see you working on a more flexible/powerful version than Readability (don't much like their latest approach of redirecting during reformatting).

2) You might want to try this sort of test-case - for which Readable only presents the first paragraph: http://boston.com/bostonglobe/ideas/articles/2011/05/08/seni...


Boston.com is now fixed; thanks for the report.

As for your issues on printing: thank you.

I had honestly not given printing very much thought -- as I don't really use it myself.

Your ideas make a lot of sense, though; so count on seeing them implemented.


Great, Gabriel...

I like the way your mind works - and so look forward to following your work.


Why do you strip quotes? Specifically, you're doing:

    blockquote, q {
        quotes: "" "";
    }
and

    blockquote::before, blockquote::after, q::before, q::after {
        content: "";
    }
You'd be better off completely removing this to begin with, as even if you did specify the quotes (which should only be on <q>, not <blockquote>, and it makes no sense to include <blockquote> in that rule anyways), you would have to re-define the quotes for every single language (q:lang(...)).

If this wasn't intentional, you should really pay attention to what your CSS reset does and make sure to explicitly define everything it resets.


This is a bug; thanks for catching it. It will be corrected.

As for the intentionality of that CSS rule: it was intentional -- but some time ago; when quotes were redefined in another part of the CSS.

Blockquotes are included because some browsers quote the content of that element too; don't ask which browsers though, as I honestly don't remember -- but older versions of one browser or another definitely did this.


Scroll marker support (a la Opera[1], although Opera seems to have dropped support for it) would be excellent! This would be a visual indicator on the screen marking the previous bottom of the screen when the user hits space or page down.

[1] Screen shot of scroll marker in action: http://my.opera.com/Tamil/albums/showpic.dml?album=210985...
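
The arithmetic behind such a marker is small. Below is a minimal sketch, assuming all values are in pixels and that a page-down keeps a small overlap with the previous screen; the function name, the parameters and the 40px default are hypothetical and do not describe how Opera or Readability implement it.

```javascript
// Compute where to scroll on page-down and where to draw the marker that
// shows the previous bottom of the screen.
function pageDownWithMarker(scrollTop, viewportHeight, documentHeight, overlap = 40) {
  // The line the reader's eye was on: the old bottom edge, in document coordinates.
  const previousBottom = scrollTop + viewportHeight;
  // Never scroll past the end of the document.
  const maxScrollTop = Math.max(0, documentHeight - viewportHeight);
  const newScrollTop = Math.min(scrollTop + viewportHeight - overlap, maxScrollTop);
  return {
    newScrollTop,
    markerDocumentY: previousBottom,
    // Position of the marker relative to the top of the new viewport.
    markerViewportY: previousBottom - newScrollTop,
  };
}
```

In a browser, the inputs would typically come from `window.scrollY`, `window.innerHeight` and `document.documentElement.scrollHeight`, and the marker would be an absolutely positioned element placed at `markerDocumentY`.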


Thanks for the suggestion. This is already a planned feature; so look forward to seeing it sometime soon.


Awesome, thanks!


One thing the new Readability does that I love: when paging down with the space bar, it puts a spot (•) in the margin where the previous bottom of the page was, and then smooth scrolls. Together, these make it much easier to keep your place on the page when scrolling; a big enhancement to readability.

It would be even better if you could bookmark your place in a long document and return to it later...


Noted. Count on finding both of those in a future release.

The first one I was actually doing before Readability -- the old version of Readable did it (and Readability didn't), but I haven't yet reimplemented the feature in the new version.


I put this above, but I'd like to have access to my Apture extension. For some reason it gets disabled.


I'll be sure to look into exactly how Apture works -- and, if possible, I'll make it work inside of Readable's overlay too.


Loving the script; it is breaking the right-click 'smooth gestures' extension in Chrome, though, which is how I normally close a tab.


I'll look into it, and see if I can fix that.


A great improvement over Readability! Its bookmarklet has been replaced, at least on my bookmark bar: it's faster, has increased customization options and justification, and is open-source... Add hyphenation to that mix and you've got a solid ten out of ten service.

(Justification and hyphenation were the two features I always wanted Readability to have, and it always seemed odd to me that something branded "readability" lacked them...)

In the FAQ, your answer to whether Readable is open-source is "no" in the sense that the source isn't on display somewhere. Not to split hairs here, but that sentence is incredibly confusing; what does having your code "on display" have to do with being open-source?


As for the license the code is under, I say "Yes and no" and detail them both. The "no" part addresses the fact that Readable's code isn't (and probably won't be) in any public code repositories.

If you're interested in the code, though, you can find the bulk of it here: http://readable-static.tastefulwords.com/_r/bulk.js

And, yes, it is open source; currently under a variation of the Creative Commons License; but, as detailed further up in this thread, that'll probably change by tomorrow -- as, apparently, Creative Commons isn't recommended for source code.


Why would you want justification in something that's supposed to make something more readable? Justification might make text look nice, but it's terrible for readability, and people with dyslexia often struggle to read it; the same usually goes for hyphenation.


"Readable text" does not mean the same thing for everyone.

I look at it as the long tail of text formatting preferences; and I think it's everyone's right to have text look exactly the way they want -- no compromises. That's why Readable is so incredibly customizable.

So don't worry: justification is optional; and hyphenation will be too.


Why, then, are books typeset justified and hyphenated?

> Justification might make text look nice.

And that's bad? You have to look at text to read it; I'd rather it look "nice" and as non-distracting as possible. Perhaps "readability" is subjective, but I think text that flows evenly is easy on the eyes.


On the contrary, variable spacing from justification is very distracting.


Hyphenation is a planned feature. As are hanging (typographic) quotation marks.

So hang tight: more cool stuff is coming -- I just have to find more hours in the day.


I tried using Readability the other day to zap the SpaceX article from here and it damn near froze my computer. When it was done, it didn't even show me the right article...


One thing that makes me prefer Readability (or in my case the Readability Redux Chrome extension[1]) is that it's able to stitch together multiple-page articles into one page. It makes reading articles on sites like The Register and Ars Technica much more bearable. I'd love it if Readable gained this feature. (Safari Reader mode also does this, using code from the old Readability.)

[1]: https://chrome.google.com/webstore/detail/jggheggpdocamneaac...


Quoted from an email, where I answered the same exact question:

  Most likely, Readable will never have multi-page support.
  The reason for this is Readable's philosophy, and not any technical difficulties that feature may imply.

  It is Readable's intention to take whatever is in your browser window right now, and make it better.
  But it is not Readable's purview to go beyond that.

  Readable tries to act like a browser, in this respect.
  Think about it like this: Readable getting subsequent pages in the background would be pretty much equal to web-browsers doing the same thing for all paging, on all websites.
  Just because it is technically possible, does not mean it should be done.


Fair enough. I like that you have a strong sense of Readable's goal(s) and know what features would be overextension. As for sites that publish articles spread across multiple pages, I should really vote with my feet (eyes?). Looking forward to hanging quotes!


It's been quite a small bit of time, now, but IIRC I had some direct correspondence with the author/creator of Readable, and he was a quite agreeable chap. I realize this doesn't speak to the technical merits, but the exchange left me feeling more comfortable about using his product. (Leaving aside the role of advertising and the need to support sites, a separate but not insignificant concern.)

Here's the old URL/site, which is also more visible with Javascript disabled.

http://readable-app.appspot.com/


Glad I made an impression; and thanks for the compliment. I'm actually about the same in real life, too.

Don't worry: Readable won't ever have advertising -- the most I'm looking for is something like a very small "Sponsored By XXX" banner, at the bottom -- if any cool company will ever be interested that is.

The new site will be redesigned -- and it will be a lot better when JavaScript is turned off. But I really don't recommend you use the old site any more, as that still runs the old version of the application -- which is way worse, in terms of performance.

I hope you're still using Readable, and that it's still helping you read the web more comfortably.


I hope this doesn't track what I'm reading, unlike the new Readability.


Yup; all text processing is done in the browser.

URLs are tracked, though -- they would be tracked in my server logs, even if I didn't do anything specific to track them.

But they are intentionally tracked -- and a percentage of them are run through an automated Readable test, every day, so that I can see, on average, how often Readable gets things wrong.


I will also be adding an option that will disable even this anonymous tracking, for those who really want/need it.

However, you'll just have to take my word for it that I won't be reading the server logs -- which contain the same, if not more, data.


Author says that all the text processing is done within the browser.

http://readable.tastefulwords.com/faq/

Still, not 100% sure if the URL is considered text too and is not sent anywhere.


So the source is licensed CC-Attribution-NonCommercial-ShareAlike, but there's no public repository or support? That's a curious approach.


The source code isn't in any repository because I don't work with repositories. And, if I were to list a specific version of the code, it would become slightly obsolete in less than a week -- as I am continually tweaking Readable's text-parsing algorithm. You can find an always up-to-date version of the code here: http://readable-static.tastefulwords.com/_r/bulk.js


>> The source code isn't in any repository because I don't work with repositories.

You really should be using version control. If you want people to contribute to this open-source project, you can't beat a public repository (like github) at which you can receive patches from other developers. Right now, it is not apparent how I can work to improve your project.


You are correct, sir. But I haven't really gotten into the open-source game just yet -- maybe I should; I don't know.

I am using version control; just not public version control.

If you want to help, get in touch (http://readable.tastefulwords.com/about-and-contact/). I'm happy to hear any ideas; and, if you want to do some actual work, that would be quite awesome too.


I don't know why you would use your own version control and not release it... Makes little sense.


Comparison of the new versions of Readable and Readability, here:

http://www.filterjoe.com/2011/04/11/web-page-reformatting-se...

Highlights:

The new Readability takes 6-12 seconds to reformat a page, vs. less than 2 seconds for both the new Readable and the old Readability.

New Readability has more features, including Instapaper-like sharing for those who pay.

It is possible to use the old Readability, for those who prefer it.


Readability 2.0 choosing to load and reformat the pages via their server instead of doing it in place in browser really slows it down.


It does make it nice that I can share links to the "readable" version with other people.


Presumably it could be done by embedding the link in the same page.


This is a planned feature for Readable, too.

Count on seeing it in a future release.


I still don't get how/why this is different/better than the Firefox Readability extension[1]?

The commercialized Readability was a step back from the bookmarklet, but this Firefox extension seems to work as well if not better. Could someone more experienced please explain this to an ignoramus ...

[1] https://addons.mozilla.org/en-US/firefox/addon/readability/


Readable doesn't really want to be "better" than Readability. In all, they're actually very different beasts.

And I honestly don't consider Readability to be my competition. Readable first started because of my own desire to have text formatted a certain way, no matter what website that text happened to be on.

Unfortunately, Readability beat me to the launch by 2 weeks -- otherwise you would all now be talking about Readability as a version of Readable :)

The only reason I didn't kill Readable after that was that it was different enough from Readability to deserve its own shot -- plus, I love working on Readable's text-parsing algorithm; it's a very cool problem to solve.

P.S. The extension you pointed to is based on the first Readability bookmarklet -- and it's made by a guy who also made an extension based on the first version of Readable.


Thank you for the clarification, I was unaware of the details you just mentioned.

Talking of algorithms, does Readable use something like the Knuth and Plass line break algo used by LaTeX? A Javascript implementation was mentioned a while ago on HN[1].

Good luck with Readable, anything that helps reduce clutter (in any part of life) is a great gift. Thanks for sharing!

[1]http://news.ycombinator.com/item?id=1974963


You're very welcome.

No; Readable doesn't use anything like the Knuth and Plass algorithm.

But I did thoroughly check out the JS implementation you mentioned; and Readable will probably use a part of the Knuth algorithm in the future -- as I am planning to support hanging quotation marks, hyphenation, as well as better (typographic) justification.
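
For readers unfamiliar with the idea, here is a much-simplified sketch of "total-fit" line breaking, the family of techniques Knuth and Plass belong to. It is not the full algorithm (no boxes, glue, penalties or hyphenation), and it is neither Readable's code nor the JS implementation mentioned above. It assumes word widths can be measured in characters, as with a monospaced font, and `breakLines` is a hypothetical name.

```javascript
// Choose line breaks that minimize the sum of squared trailing space over
// all lines except the last, instead of greedily filling each line.
function breakLines(words, maxWidth) {
  const n = words.length;
  // best[i]: minimal cost of laying out words[i..n-1].
  // next[i]: index of the first word on the line after the one starting at i.
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let width = -1; // offsets the space counted before the first word
    for (let j = i; j < n; j++) {
      width += words[j].length + 1;
      // Stop once the line overflows, but let a single overlong word stand alone.
      if (width > maxWidth && j > i) break;
      const slack = Math.max(0, maxWidth - width);
      // The last line may be short at no cost.
      const lineCost = j === n - 1 ? 0 : slack * slack;
      const total = lineCost + best[j + 1];
      if (total < best[i]) {
        best[i] = total;
        next[i] = j + 1;
      }
    }
  }
  // Follow the chosen break points to build the lines.
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(' '));
  }
  return lines;
}
```

With `breakLines(['aaa', 'bb', 'cc', 'ddddd'], 6)`, a greedy breaker would produce "aaa bb" / "cc" / "ddddd", while this sketch prefers "aaa" / "bb cc" / "ddddd" because it spreads the leftover space more evenly. Real justification would measure rendered widths in the browser rather than counting characters.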


Author of the JS implementation here. Let me know if you have any questions about the implementation or problems integrating it. I was hoping someone would pick it up and integrate it with Readability-like service. (I've slowly been working to add support for it to Treesaver http://www.treesaverjs.com/.)

For hyphenation you might also want to check out my Hypher project (https://github.com/bramstein/Hypher) which is a minimal hyphenation engine written in JavaScript. In my benchmarks it is about 4 times faster than Hyphenator.js (and a lot smaller.)


Thanks for the info, Bram. I'll be sure to hit you up if/when I have any questions.


This one is better than Readability, seems faster. It's more instant like the old readability.

I have been complaining about the new direction Readability has taken since day 1, when they decided first to abandon the old Readability and then to force a meaningless frame around the content with a very slow implementation.

My only critique for all: please pick a name that doesn't contain "read".


I agree with you about the names -- and I really wish I could've thought of something clever when I first named it; unfortunately, it's a bit late now.


A favicon. I just want a favicon. Every single bookmark in my bar has an icon, and no words, and ALL I want is a favicon! Readability doesn't have it; neither does Readable. Makes things much uglier, no matter how pretty the text is. A golden star to whoever hacks it first.


I believe there's an extension (for Firefox, at least) that allows you to create custom buttons for bookmarklets.

In the near future, a native solution will be available -- i.e. I'm working on a thin extension, for all browsers, that will act as Readable's launcher (with benefits like keyboard shortcuts, an icon, and slightly faster load times).


It's nice that you can customize it because the default - serif - font isn't that readable, at least not on win7+opera.

Also, I think they need to update the default font families because calibri and other post Vista fonts are as commonly used as Arial & co.


If you have any suggestions, I'm very open to them. As a matter of fact, if you're up to it, you can design your very own theme -- I'll let you name it, and give you full credit. Or, if you just have a couple of quick suggestions, feel free to list them here, or get in touch (http://readable.tastefulwords.com/about-and-contact/).


Design own theme? Sure, why not. How does this work?

Here's a bug report for you. Using Win7,32bit/Opera 11.10, the text field for specifying custom font families doesn't really show because the borders aren't visible. Screenshot http://i.imgur.com/gwb2O.jpg


Well, just customize Readable (http://readable.tastefulwords.com/?setup) to your heart's content. Then, get in touch, and send me the custom options you used.

If the theme is awesome, I'll add it to Readable's selection of themes (http://readable.tastefulwords.com/?setup#explain-style-theme...). Did you try those out, by the way?

If you know CSS, you can also heavily customize Readable via the "More CSS" option.


Doesn't work on this page: http://boston.com/bostonglobe/ideas/articles/2011/05/08/seni...

I like it though, keep it up. :-)


Boston.com is now fixed; thanks for the report.


I use NoScript, and they mentioned that possibility. I'm glad they showed respect for my choice, even though there's no escaping Javascript here.


I'd use it, but my Apture extension doesn't work while using your script.


Readability works just fine



