Saturday, November 11, 2006

Amazon Web Services [or why I won't be sleeping anytime soon]

As part of All The Code I was planning on using AWS Alexa Web Search to find the bits of code to put into the database, saving me from having to write a crawler. Sadly, it would appear that Amazon's Alexa Web Search still has a long way to go before being usable.

So over the course of about an hour my program was able to pull 900 results out of AWS, which is nowhere near enough. AWS fails to return any data ~60% of the time. Another ~5% of the time it returns an error message [normally along the lines of "Timed Out"], and only about 25% of the time does it return some useful data. I could be more understanding if the service was in "beta", but it is supposedly release quality. In my books, a 60% failure rate is nowhere near release quality. I could live with the 60% failure rate, since my program automatically retries failed requests with exponential backoff, but I can't seem to get beyond 920 results, even for a query that reports 144000 results.
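
For anyone who hasn't met it, retrying with exponential backoff looks roughly like the Python sketch below. This is not the actual program: `fetch_with_backoff`, its parameters, and the delay values are all made up for illustration, and `fetch` stands in for whatever function issues one search request and returns its results (or nothing on failure).

```python
import time


def fetch_with_backoff(fetch, max_attempts=6, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it returns data, doubling the wait after each failure.

    All names and defaults here are hypothetical. `fetch` is any zero-argument
    callable that returns results, returns nothing, or raises on an error.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
        except Exception:
            # Treat an error response (e.g. "Timed Out") like an empty reply.
            result = None
        if result:
            return result
        if attempt < max_attempts:
            sleep(delay)
            # Double the wait each time, up to a ceiling.
            delay = min(delay * 2, max_delay)
    # Every attempt failed.
    return None
```

Capping the delay keeps a long run of failures from stretching the wait indefinitely, and passing `sleep` in as a parameter makes the loop easy to test without actually waiting.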

I suppose it serves me right for trusting another company's products to work, but now it looks like I'm going to have to write my own crawler, which I will probably have to limit to svn/cvs/git on SourceForge so I can get something done in time for the demo I agreed to give. No rest for the wicked :-)

