Wednesday, December 06, 2006

What is All the Code?

Recently, I've found myself trying to explain to a few different people what exactly All The Code is. I used to be able to say something along the lines of "it's Google for code", but then the pesky people at Google released Google Code Search, which inconveniently (or conveniently, depending on how you look at it) is quite different from what All The Code is.

The interface for All The Code is much more like CPAN than Google Code Search. Google Code Search has a very powerful regular expression engine, which (in my obviously biased opinion) is not all that great for finding code. Sure, if you want to find places where people do silly things that could result in a buffer overflow, it's a gold mine, but not so much for finding code that you might want to use.

With All The Code, the idea is that you enter what you are looking for (say, "csv library"), and it does its best to return things that are a good match for it. You don't have to sit around and think about the relative importance of class versus ID name versus comment; we have dozens of trained monkeys doing that for you*. Another classy thing we do is consider which contexts the code is used in, rather than just looking inside the library itself. In general, the idea is that a library that is popular in a certain context is probably related to that context, and is probably a better match than a library that no one uses.
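To make that a bit more concrete, here is a rough Python sketch of the idea. It is not the actual All The Code ranking. The field weights, the `Library` class, the `usage_count` number, and the log-based popularity boost are all made up for illustration; the only things taken from the description above are that different parts of the code count for different amounts, and that libraries people actually use in a context get a lift.

```python
import math
from dataclasses import dataclass

# Hypothetical weights: a hit in a class name counts more than a hit in a comment.
FIELD_WEIGHTS = {"class_name": 3.0, "identifier": 2.0, "comment": 1.0}


@dataclass
class Library:
    name: str
    fields: dict          # field name -> text found in that part of the code
    usage_count: int = 0  # assumed: how many projects in this context use the library


def text_score(library, query):
    """Weighted count of query terms across the library's fields."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = library.fields.get(field, "").lower().split()
        score += weight * sum(words.count(term) for term in terms)
    return score


def rank(libraries, query):
    """Return names of matching libraries, best match first."""
    scored = []
    for lib in libraries:
        # log1p dampens popularity so heavy usage helps without swamping relevance.
        total = text_score(lib, query) * (1.0 + math.log1p(lib.usage_count))
        if total > 0:
            scored.append((total, lib.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

With this toy scoring, a library with a weaker text match but lots of users can still outrank a better-matching library that no one uses, while a wildly popular library that doesn't mention the query at all never shows up.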

Another interesting question is, will it work any better? The answer is a resounding "maybe". At the start, Google and others will have the upper hand on us just based on how much code they index, but I'm hoping to close that gap by the end of 2007. For now, I'm interested in getting it out there and getting some people to use it so I can figure out which areas need the most work. Hopefully the "alpha" label will keep people from judging it too harshly, but I can always launch the next generation version under a different name if I turn out to be wrong :)

*Trained monkeys may or may not exist and may or may not be software.

