Google search and Web Edition

If you use Real Studio Web Edition right now, the contents of your app are invisible to Google. I think a lot of people are using WE for internal projects or for apps that need a user to log in before anything useful is available. This isn’t a problem for that kind of project, but for some web apps it would be nice to have certain areas show up in Google’s search results.

This isn’t just an issue with WE. Other AJAX-based web apps have the same problem, and Google has a document that describes how to make AJAX web apps crawlable:
http://code.google.com/web/ajaxcrawling/

The basic idea is that you use special hash fragments (beginning with #!) for the parts of your app that should be indexed. When Google sees those, it asks your app for a snapshot of what the app would look like after the JavaScript has been evaluated.
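
As a rough illustration of that request, here is a minimal Python sketch of the URL mapping described in Google's document: a URL whose fragment starts with #! is requested by the crawler with the fragment moved into an _escaped_fragment_ query parameter. The function name and the example URLs are made up for illustration, and the percent-encoding here is a simplification that escapes somewhat more characters than the spec strictly requires.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment_url(url: str) -> str:
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler would request."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # No '#!' marker, so the URL is not flagged as crawlable; leave it alone.
        return url
    # Percent-encode the state after '#!'. Keeping '=' readable matches the
    # examples in Google's document; other reserved characters are escaped.
    state = quote(parts.fragment[1:], safe="=")
    param = "_escaped_fragment_=" + state
    # Append to any existing query string instead of replacing it.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(to_escaped_fragment_url("http://www.example.com/ajax.html#!key=value"))
```

The server then has to answer that second form of the URL with plain HTML, which is where the snapshot comes in.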

One approach that Google suggests for getting snapshots is to use something like HtmlUnit. HtmlUnit is a program that acts like a web browser but doesn’t have a user interface. It executes JavaScript just like a browser would and then you can extract the contents of the page.

I did a little bit of experimentation and found that this approach works for simple cases. I ran into some problems with a real application, but they probably have solutions. It seems like the best solution, though, is for WE to support Google’s approach itself. I’ve created a feedback report requesting this as a feature.

If there are parts of your app that could serve as entry points and you want to be able to drive search engine traffic to them, please sign on to this report:
feedback://showreport?report_id=19412

7 thoughts on “Google search and Web Edition”

  1. Did you actually get this to work in a WE app? I don’t understand how, since the app won’t “load”, nor will any of the links work, unless the JavaScript is executed first. In other words, were you able to get your WE app to respond to the Google crawler? If so, could you provide more details of how you accomplished that?

    Thanks,
    Jay

  2. HtmlUnit executes the JavaScript just like a browser does. I did get a proof of concept working on a very simple app, but when I try it on a real app it produces the error dialog every time. If I could get around that, here’s how it would work:

    (1) Create a Java app using HtmlUnit that produces the source for a WE URL that it’s given
    (2) Put a URL rewrite rule on the server that checks for Google’s indexer request. The rewrite would send the requested URL to the Java app which in turn would access the WE URL and produce a snapshot of it.

    I’m not sure what the best approach is for deploying Java apps on a server.
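
The check in step (2) could be sketched roughly as follows. This is only an assumption about how the decision might be made, written in Python rather than as an actual rewrite rule: it looks for the _escaped_fragment_ parameter that Google's crawler adds and rebuilds the #! URL that the snapshot program would need to load. The names snapshot_target and CRAWLER_PARAM are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameter the crawler adds when it wants a snapshot (from Google's spec).
CRAWLER_PARAM = "_escaped_fragment_"


def snapshot_target(request_url: str):
    """Return the '#!' URL to snapshot, or None for an ordinary request."""
    parts = urlsplit(request_url)
    # parse_qsl splits each pair on the first '=' and undoes percent-encoding.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    state = next((v for k, v in pairs if k == CRAWLER_PARAM), None)
    if state is None:
        return None  # normal browser request; serve the app as usual
    # Keep any other query parameters and move the state back into the fragment.
    rest = urlencode([(k, v) for k, v in pairs if k != CRAWLER_PARAM])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, rest, "!" + state))


if __name__ == "__main__":
    print(snapshot_target("http://www.example.com/ajax.html?_escaped_fragment_=key=value"))
```

A request that returns None would go to the app unchanged; anything else would be handed to whatever program produces the snapshot.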

  3. @Seth Verrinder
    Does the URL rewrite actually work with the indexers? I mean, will the search engines allow it, or flag the site as suspicious? There’s a lot of no-nos that will get your site excluded from the search indexes.

  4. A rewrite takes place on your server so I don’t think Google would actually be able to tell that it had happened. I don’t think they would care, though, even if they could tell because there are plenty of legitimate reasons for a server to rewrite URLs internally (for example to show pretty URLs to humans that map to ugly but easier to manage URLs internally).

    BTW, I’ve done a bit more experimentation and I think I’ve found a way to actually implement this. Look for a follow up sometime soon.

  5. Not yet. I went on vacation for a couple of weeks and have been working on catching up. As soon as I get a breather I’ll do a new post.

  6. Seth, I’ve finally finished building and testing a system that works with Google’s crawler. It uses a combination of Google’s AJAX crawling spec, redirects (mod_rewrite on Apache) and WE’s special URL handler. Turns out that the “special” entry into the app doesn’t use javascript – but the caveat is I have to hand-build the html to send back to Google. But since all my pages are generated from a database, it’s pretty easy, especially since I don’t have to worry about formatting or styling the html for the search engine.

    Is any of this similar to what you are working on?

Comments are closed.