Sunday, December 31, 2006

Last minute fun

As a lot of people have noticed, you don't really find a lot of bugs until you actually try to ship something. One of the more frustrating ones is that the AMI (for Amazon EC2) I had constructed didn't want to boot, which is somewhat inconvenient to say the least. I hadn't noticed the problem, since all of the earlier versions were run on my one dedicated server [with a limited data set]. Fortunately, after a few days of butting my head against the wall I managed to get a working AMI, so now it's just down to getting it to run through my data in time [which is a bit questionable]. Part of the problem is that I did my benchmarking on my laptop and assumed that the EC2 instances would be faster, but my laptop appears to be about 4 times faster.

While I was doing some random poking around online, waiting for a batch job to run, I found which has a really cool preview feature. So I randomly added it to AllTheCode, but then I thought back to how I originally discovered it [I saw it on some website, went off to find out more about it, and completely forgot the original site]. That seems like a bad thing for visitors to do, so I'm not sure I should leave it in the soon-to-be-released version.

Working on code alone all the time can get a little bit boring, so I had a coding party where a few people (~4) all hung out in the same room, drank caffeinated beverages and hacked away on our own projects. If I have enough time in the new year, along with all of the other stuff going on, it is something I'd like to do more often.

Tuesday, December 19, 2006

Scaling & Pretty Printing

I've been spending a fair bit of time trying to get AllTheCode able to scale to handle more than just a few users, and I'm finding bottlenecks in the usual places, plus a few interesting ones.

Pretty printing is one of those things which has been around for a long time and most people just kind of ignore. However, for All The Code, I have to show people the results, and rather than just provide them with a download link [as I first wanted to do], most of my competitors seem to have decided to provide a pretty printed HTML version of the code. I can totally understand why: it is more convenient, since if you just want to look at some of the results quickly you don't want to download all of them. As luck would have it, vim can dump its wonderful pretty printing to HTML and Ruby on Rails has a built-in method to highlight the relevant terms in the output [yay!]. The only problem is that vim2html is surprisingly CPU intensive. Since there will probably not be too many users, I'm going to keep it for now and perhaps replace it with Kate eventually, but I still find it humorous that my next projected bottleneck is pretty printing :)
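For the curious, the term highlighting step is the cheap part. Here is a rough sketch of the idea in Python rather than Ruby; it is not the Rails helper and not what All The Code runs, and the `highlight_terms` name and the `<strong>` markup are just assumptions for illustration.

```python
import html
import re


def highlight_terms(source, terms):
    """Escape source code for HTML and wrap each search term in <strong>."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(source)
    # Match on the raw text first, then escape each piece, so a term can
    # never match inside an entity such as "&amp;".
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    pieces = []
    last = 0
    for match in pattern.finditer(source):
        pieces.append(html.escape(source[last:match.start()]))
        pieces.append("<strong>" + html.escape(match.group(0)) + "</strong>")
        last = match.end()
    pieces.append(html.escape(source[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight_terms("if (a < b) parse(csv);", ["csv"]))
```

The expensive part is still the syntax colouring, which this sketch does not attempt.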

The more traditional bottleneck of the DB [despite some early locking issues] seems to have been mostly solved. Things are easy for me since there are plenty of places for me to cheat [like handling updates, pshaw. Updates are for people who aren't me :)]. Despite all of the "Ruby on Rails is hard to deploy" talk, so far it's only marginally more difficult than deploying a traditional PHP script. I'll admit a single-server deployment has a lot of unnecessary hassle, but when doing a multi-server deployment, you have to deal with the things people complain about regardless of whether you use Ruby on Rails, Django, Perl, or really anything.

Sunday, December 17, 2006

Last Day

Friday was my last day @ Xandros (conveniently also the day of the company Christmas party). Looking back on the past four months, it's been more fun than my previous Co-Op jobs. I've had a chance to play around with the wonderful world of device drivers, file systems, and boot loaders for increased stylishness. Part of the reason I think it was so fun is that I was learning a lot of it as I went along, whereas I was pretty well qualified for most of my previous jobs.

The plan for the next week is to split my time 50/50 between All The Code and another project that I'm working on. I've fixed the concurrency bug with All The Code, but I've also noticed a not so good thing and a pretty awesome thing. The not so good thing is that ~10% of the records in the DB weren't Java programs (which I've fixed). The awesome thing is that I've figured out a way to get ~10 languages supported really quickly, so I'm going to try and get that done over the next few days.

Saturday, December 09, 2006

Slight Change of Plans

Software rarely, if ever, ships on time and it looks like All The Code will be no exception to this. Well, sort of. The software is sort of done, in the sense that it works, just not well enough. The main problem is one of scaling: in my less than scientific tests, it chokes at anything more than ~10 users. On the upside, I know why this is and how to fix it, but it's going to take a bit more time than I have. Instead of having a public alpha on December 11th, I will have a by-request pre-Alpha on December 11th and the public Alpha on Jan 2nd. The delay gives me a good solid week of time I can dedicate to fixing the two bottlenecks that are slowing it down the most.

If, for some crazy reason, you want to play around with all the code, e-mail me ( ) along with the subject line "All The Code pre-Alpha" and I will either send you back an e-mail with a username & password or I'll let you know if there isn't enough room.

Friday, December 08, 2006


Today has been one of those "D'oh" sort of days for me. I've done such classy things as having programs try to talk to DBs which only exist in my brain. I've "fixed" the same problem ~5 times [each attempt takes ~30 minutes to run through the batch job and get to the same error], etc. The problem turned out to be that my C wrapper was calling /us/home/holden/ninja/crazy rather than /us/home/holden/sadninja/crazy. Also, shortly after I fixed the problem, the AWSP framework drove off the metaphorical cliff and stopped allowing queries. D'oh :( Hopefully it's fixed for tomorrow?

I was hoping to have everything done for tomorrow afternoon, but Sunday afternoon looks like it may be a more reasonable target now.

There isn't all that much more code that needs to be written; it's just getting down to an issue of where I can split up the work so I can use EC2, and fixing the bugs that inevitably appear when I do this :)

On the upside, while I was looking for EC2 related information I found out that I could use S3 to host static pages, which is kind of useful (since my dedicated box's pipe is already pretty well loaded doing other things and I don't want to get another one). I'll probably use it to host the static pages for All The Code. Ninja style!

Thursday, December 07, 2006


That is how much money I've spent [in machine time] trying to figure out why my new and improved parser dies horribly on AWSP. I thought maybe there was a problem with the framework I was using, or perhaps I wasn't flushing the output to the file properly in my OCaml app, or any number of other possibilities. It turns out I had 7 machines all writing to the same file [where I thought they were all writing to different files in their local cache]. The reason it never showed up in my, albeit limited, testing is that I only have one machine, and only on the platform does it run on multiple nodes at the same time. I hope I don't have too many more gotchas like this hanging around, since Monday is fast approaching :)
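One way to avoid this kind of collision is to build the output file name from something unique to each worker. The sketch below is in Python rather than OCaml and is not the fix I actually shipped; the `worker_output_path` and `write_results` names, the `.out` suffix, and the host-plus-pid naming scheme are all assumptions made for illustration.

```python
import os
import socket
import tempfile


def worker_output_path(base_dir, job_name, host=None, pid=None):
    """Build an output path that is unique per machine and per process."""
    host = host or socket.gethostname()
    pid = pid if pid is not None else os.getpid()
    return os.path.join(base_dir, f"{job_name}.{host}.{pid}.out")


def write_results(base_dir, job_name, lines, host=None, pid=None):
    """Write one worker's results to its own file and return the path."""
    path = worker_output_path(base_dir, job_name, host, pid)
    # Mode "x" raises FileExistsError if the file is already there, so two
    # workers that somehow pick the same name fail loudly instead of
    # silently overwriting each other.
    with open(path, "x") as out:
        for line in lines:
            out.write(line + "\n")
    return path


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    print(write_results(scratch, "parse", ["record 1", "record 2"]))
```

Failing on an existing file would also have made the bug show up on a single test machine the second time the job ran, rather than only on the multi-node platform.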

Wednesday, December 06, 2006

What is All the Code?

Recently, I've found myself trying to explain to a few different people what exactly All The Code is. I used to be able to say something along the lines of "it's Google for code", but then the pesky people at Google released Google Code Search, which inconveniently (or conveniently, depending on how you look at it) is quite different from what All The Code is.

The interface for All The Code is much more like CPAN than Google Code Search. Google Code Search has a very powerful regular expression engine, which (in my obviously biased opinion) is not all that great for finding code. Sure, if you want to find places where people do silly things resulting in possible buffer overflows, it's a gold mine, but not so much for finding code that you might want to use.

With All The Code, the idea is you enter what you are looking for (say, csv library), and it does its best to return things that are a good match for csv library. You don't have to sit around and think about the relative importance of class name versus ID name versus comment; we have dozens of trained monkeys doing that for you*. Another classy thing that we do is consider which contexts the code is used in, rather than just looking in the library. Generalized, the idea is that libraries which are more popular in certain contexts might be related to those contexts, and might be better than the library that no one uses.
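To make that a bit more concrete, here is a toy sketch of the general shape of such a ranking, in Python. It is not the actual All The Code ranking: the weights, the field names, the `uses_by_context` counts, and the log dampening are all made up for illustration.

```python
import math

# Hypothetical weights: none of these numbers come from All The Code.
FIELD_WEIGHTS = {"class_name": 3.0, "identifier": 2.0, "comment": 1.0}


def field_score(query_terms, fields):
    """Add a field's weight once for every query term that appears in it."""
    score = 0.0
    for field, text in fields.items():
        words = set(text.lower().split())
        hits = sum(1 for term in query_terms if term.lower() in words)
        score += FIELD_WEIGHTS.get(field, 0.0) * hits
    return score


def rank(query, libraries, context):
    """Order libraries by text match plus a dampened usage count for the context."""
    terms = query.split()
    scored = []
    for lib in libraries:
        uses = lib["uses_by_context"].get(context, 0)
        total = field_score(terms, lib["fields"]) + math.log1p(uses)
        scored.append((total, lib["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


if __name__ == "__main__":
    libraries = [
        {"name": "obscure", "fields": {"class_name": "csv reader"},
         "uses_by_context": {}},
        {"name": "popular", "fields": {"class_name": "csv reader"},
         "uses_by_context": {"web": 50}},
    ]
    print(rank("csv library", libraries, "web"))
```

With identical text matches, the library that is used more often in the given context comes out on top, which is the whole point of looking at context.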

Another interesting question is, will it work any better? The answer is a resounding "maybe". At the start, Google and others will have an upper hand on us just based on how much code they index, but I'm hoping to close that gap by the end of 2007. For now, I'm interested in getting it out there and trying to get some people to use it so I can try and figure out what areas need the most work. Hopefully the "alpha" label will keep people from judging it too harshly, but I can always launch the next generation version under a different name if I turn out to be wrong :)

*Trained monkeys may or may not exist and may or may not be software

Tuesday, December 05, 2006


One thing which I find universal among people trying to make something is optimism, or at the very least localized optimism.

For me, this means thinking that I can do tasks in 30 minutes when they really take an evening, that the database will build on the first try, and all sorts of wonderful things like that. While this can be really useful at times (namely when starting or keeping on going), it can also make things unnecessarily difficult, by setting release dates too early and other fun things like that.

It still looks like I'll be able to hit December 11th, just 30 minutes to fix this one last bug, rebuild the DB, another half hour to put together a new front end, and a few other "minor" things ..... :)
