You can still avoid loading whole new pages: attach JavaScript event handlers to your anchor tags and do whatever Ajax content trickery you want that way. The page content itself stays maximally flexible and useful to every agent if the URLs inside it are actual URLs.
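A rough sketch of what I mean, assuming a container with id "content" to swap markup into (the id and the href filter are my own illustrative choices, not anything prescribed):

    // Keep real URLs in the markup, but hijack clicks for Ajax loading.
    // '#content' is an assumed container id; a real app would extract just
    // the relevant fragment of the response rather than the whole page.
    var links = document.querySelectorAll('a[href^="/"]');
    Array.prototype.forEach.call(links, function (link) {
      link.addEventListener('click', function (event) {
        event.preventDefault();                        // stop the full page load
        var xhr = new XMLHttpRequest();
        xhr.open('GET', link.getAttribute('href'));
        xhr.onload = function () {
          document.getElementById('content').innerHTML = xhr.responseText;
        };
        xhr.send();
      });
    });

With that in place the hrefs stay plain /shop/shoes-style URLs, so spiders and script-less browsers still get working links.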
The only problem with that is you end up with a mix of both. If a spider collects all the non-Ajax links and a search result sends a JavaScript-enabled browser to one of them, the user lands on e.g. /shop/shoes.
If the site is Ajax-enabled for a slicker experience, then as the user browses on from there they might end up with something like this in their address bar:
/shop/shoes#!shop/socks
or even
/shop/shoes#!help/technical
which starts to look really weird. The Google hashbang spec at least fixes this problem: the spider understands the app's normal URLs and will dispatch users to them.
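To make that concrete: a hash-based router typically only ever rewrites the fragment, so whatever path the page happened to be served from just stays put underneath it. A toy sketch (navigate, dispatch and loadContent are invented names for illustration):

    // Stand-in for the real Ajax fetch; invented for this sketch.
    function loadContent(route) {
      console.log('would load content for ' + route);
    }

    // Navigation only rewrites the fragment, so if this page was served from
    // /shop/shoes, navigate('shop/socks') leaves you at /shop/shoes#!shop/socks.
    function navigate(route) {
      window.location.hash = '#!' + route;
    }

    // Render whatever state the fragment names, on load and on every change.
    function dispatch() {
      var hash = window.location.hash;
      if (hash.indexOf('#!') === 0) {
        loadContent(hash.slice(2));   // 'shop/socks', 'help/technical', ...
      }
    }

    window.addEventListener('hashchange', dispatch, false);
    dispatch();

Serve that from /shop/shoes and the first "socks" click leaves you at exactly the half-and-half URL above.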
Can you not use JavaScript to figure out that your URL is a mess and redirect accordingly? JavaScript snippets for redirecting people to a site's homepage have been available on dynamicdrive.com for at least a decade now.
That's one redirect to the homepage (which you're already doing by 301-redirecting the JavaScript-free URLs anyway), so it's hardly going to be difficult.
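The 301 half of that is trivial too. A minimal Node sketch, assuming everything is meant to hang off the root; in practice this would more likely be a rewrite rule in the web server config, Node is used only to keep the example in JavaScript:

    var http = require('http');

    http.createServer(function (req, res) {
      var path = req.url.split('?')[0];
      if (path !== '/') {
        // 301 the JavaScript-free URL to its hashbang twin on the root.
        res.writeHead(301, { 'Location': '/#!' + path.slice(1) });
        res.end();
        return;
      }
      // The root serves the single-page app shell.
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('<html><body>app shell goes here</body></html>');
    }).listen(8080);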
Considering the haphazard redirects already going on for incoming links to hashbanged sites, I'm puzzled why this isn't a trivial problem:
Incoming link is to /shop/shoes#!shop/socks
JavaScript right at the top of /shop/shoes uses window.location to send the user to /#!shop/shoes
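Something like this, say, at the top of the /shop/shoes page. Whether the fragment or the path should win when both are present is my assumption about the desired behaviour, not anything specified above:

    // Runs before anything else on pages like /shop/shoes.
    //   /shop/shoes#!shop/socks  ->  /#!shop/socks   (a fragment is present, keep it)
    //   /shop/shoes              ->  /#!shop/shoes   (no fragment, convert the path)
    (function () {
      var path = window.location.pathname;
      var hash = window.location.hash;
      if (path === '/') {
        return; // already on the hashbang homepage, nothing to fix
      }
      if (hash.indexOf('#!') === 0) {
        window.location.replace('/' + hash);
      } else {
        window.location.replace('/#!' + path.slice(1));
      }
    })();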
1.) This is a limitation of Google's crawlable Ajax proposal, one that probably would not have occurred with a proper standards body. What sequence of events would have to happen to produce that as an inbound URL? I suspect some earlier JavaScript would have had to fail to allow that scenario.
2.) The site is already paying this price by redirecting _escaped_fragment_ URLs and the old clean-style URLs. All inbound links will have this problem, so you're only shifting some of the burden through this door instead of the others.
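For anyone who hasn't read the proposal: the crawler rewrites /#!shop/socks into something like /?_escaped_fragment_=shop/socks (URL-escaped) before requesting it, and the server is expected to answer with a crawlable snapshot or a redirect. A rough Node sketch of the detection side only; renderSnapshot is a stand-in of my own, not part of the scheme:

    var http = require('http');
    var url = require('url');

    // Stand-in for real server-side rendering of the state named by `fragment`.
    function renderSnapshot(fragment) {
      return '<html><body>Snapshot of ' + fragment + '</body></html>';
    }

    http.createServer(function (req, res) {
      var query = url.parse(req.url, true).query;
      var fragment = query._escaped_fragment_;

      if (fragment !== undefined) {
        // Googlebot asking about /#!shop/socks shows up here; hand back static HTML.
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(renderSnapshot(fragment));
        return;
      }

      // Normal browsers get the app shell plus the client-side router.
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('<html><body>app shell goes here</body></html>');
    }).listen(8080);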
No, with Google's proposal the #! links all hang off the site root; see Lifehacker's and Twitter's implementations. So these ugly half-and-half URLs never exist, and you're not paying a double-request price.
Ah, you're right, and yes, that could introduce redundant work on the server, depending on the implementation. However, the two major implementations I've seen (Twitter and Lifehacker) use it from the root and so don't have that problem.