Thanks. I'm trying to get cURL to decode this, but it doesn't seem to handle it natively. Now I'm digging into .../escape.c

I feel like I must be missing something... :/


You can't simply decode each character without losing information. For example, &#x3c; means a literal < character to be shown on the page, as opposed to a < in the stream which starts an HTML tag.

If you're just planning on displaying the text in a browser, no decoding is needed. If you want to parse the text to do some sort of textual analysis, an HTML parser library might be best.
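To make the information-loss point concrete, here is a minimal Python sketch (Python is my choice, not something used in this thread). It assumes only the standard library's `html.unescape` and `html.parser.HTMLParser`; the sample string and the `TextExtractor` / `visible_text` names are made up for illustration.

```python
import html
from html.parser import HTMLParser

# Hypothetical sample: one escaped "<b>" meant as literal text, one real <b> tag.
raw = "a &#x3c;b&#x3e; tag and a real <b>bold</b> one"

# Blind decoding: the escaped "<b>" and the real "<b>" become indistinguishable.
decoded = html.unescape(raw)


class TextExtractor(HTMLParser):
    """Collects text nodes only; tags are dropped, character references decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Return the text a reader would see, using a real HTML parser."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.parts)


if __name__ == "__main__":
    print(decoded)
    print(visible_text(raw))
```

The parser route keeps the escaped `<b>` as text and strips only the real tag, which is usually what you want for textual analysis.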


I understand what you're talking about re: &#x3c; and '<' -- the JSON looks displayable on a page (a terminal in my case), barring the &#xhhhh; encoding. cURL has facilities for decoding %20 (for example), but not for what we're getting back with this JSON.
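The two encodings really are separate layers, which a quick Python sketch shows (standard library only; the sample string is made up, and this says nothing about how cURL's own code works): percent-decoding leaves &#xhhhh; references alone, and HTML unescaping leaves %20 alone.

```python
import html
from urllib.parse import unquote

# Hypothetical sample mixing URL percent-encoding and an HTML character reference.
sample = "Tom%20&#x26;%20Jerry"

url_decoded = unquote(sample)          # handles %20, ignores &#x26;
html_decoded = html.unescape(sample)   # handles &#x26;, ignores %20

if __name__ == "__main__":
    print(url_decoded)
    print(html_decoded)
```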

You've given me an idea though, so back to vi for me.

Thx.


not sure if you figured something out already, but just saw your comment and remembered that this exists in PHP:

http://us1.php.net/manual/en/function.get-html-translation-t...

absent another source, you could dump it out for your usage elsewhere.

  % php -r 'print_r(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES|ENT_HTML5));'

or

  % php -r 'print json_encode(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES|ENT_HTML5));' | jq .

edit: just found http://dev.w3.org/html5/html-author/charref (but might be harder to parse..)
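
For anyone without PHP handy, a rough Python counterpart might look like the sketch below. It assumes the standard library's `html.entities.html5` mapping (entity name to character); inverting it to a character-to-entity table is my own choice to mimic the shape of the PHP output, and `char_to_entity` is just an illustrative name. The result will not necessarily match PHP's table entry for entry.

```python
import html
import json
from html.entities import html5  # maps entity names such as "lt;" to characters

# Invert the mapping to get character -> entity, similar in shape to the PHP table.
# Only names ending in ";" are kept; when several names map to one character,
# the first one encountered wins (an arbitrary choice for this sketch).
char_to_entity = {}
for name, char in html5.items():
    if name.endswith(";"):
        char_to_entity.setdefault(char, "&" + name)

if __name__ == "__main__":
    # Dump as JSON so it can be piped to jq or reused elsewhere.
    print(json.dumps(char_to_entity, indent=2))
```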



