Tuesday, November 28, 2006

Going Live Dec 11th

In the spirit of releasing early, All The Code will be going live on December 11th, for the win. Now, everyone seems to be releasing their products in "Beta", but I'm not sure it will be exactly beta quality by then, so I'm going to call it a public alpha. Take that, you fancy Web 2.0 "beta" people :)

The planned features for the public alpha are basic search functionality, with only Java supported. I will probably set up a poll on which language I should support next. I'm hoping to get searching inside archives and popular source control systems done over Christmas. After that I will probably add about one additional language a month (perhaps a little more, depending on how much free time I end up with).

Thursday, November 23, 2006


The most interesting technical suggestion I got back from the demo I gave was to offer people the ability to write their own plugins for various languages.

The most powerful part of All The Code is naturally the analysis it does, but because of the level it goes to, it actually requires building an AST to be able to extract all of the interesting bits of information. Writing parsers (or even finding existing ones and modifying them to keep the extra information that most parsers throw out) takes a reasonable amount of time. Initially I was planning on implementing support for the big languages only and slowly adding support for the less popular languages.
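To make that concrete, here is a toy sketch of the kind of relation extraction an AST enables. Every type and name below is invented for illustration; the real parsers look nothing like this:

```ocaml
(* Hypothetical sketch of the information an AST lets you keep.
   These types are invented purely for illustration. *)
type expr =
  | Call of string * expr list   (* callee name and argument expressions *)
  | Var of string                (* a variable reference *)

type decl = {
  class_name : string;
  methods : (string * expr list) list;  (* method name and its body *)
}

(* Walk the AST and extract (caller, callee) relations, which is the
   sort of thing a plain keyword index never sees. *)
let call_relations (d : decl) : (string * string) list =
  let rec calls e =
    match e with
    | Call (name, args) -> name :: List.concat_map calls args
    | Var _ -> []
  in
  List.concat_map
    (fun (m, body) ->
      List.map
        (fun callee -> (d.class_name ^ "." ^ m, callee))
        (List.concat_map calls body))
    d.methods
```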

I'm not sure if anyone would actually write plugins for it, but I think making the plugin system will force me to keep (at least that part of) the code clean. On the downside, it gives me a whole lot of potential headaches to deal with in addition to all of the other things which need to get done. I haven't decided one way or the other, but I'm leaning towards implementing a basic API and seeing if anyone bites.
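For what it's worth, a minimal plugin API might be no more than a module signature. This is a hypothetical sketch, not the actual design, and every name in it is made up:

```ocaml
(* Hypothetical sketch of a minimal language-plugin API. A plugin only
   has to turn source text into the relations the indexer stores. *)
module type LANGUAGE_PLUGIN = sig
  val name : string            (* e.g. "java" *)
  val extensions : string list (* file extensions this plugin handles *)

  (* Parse one source file and return (entity, related-entity) pairs. *)
  val relations : string -> (string * string) list
end

(* A do-nothing stub, just to show the shape an implementation takes. *)
module Stub : LANGUAGE_PLUGIN = struct
  let name = "stub"
  let extensions = [ ".stub" ]
  let relations _source = []
end
```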

Tuesday, November 21, 2006

DemoCamp Ottawa 2, A look back

DemoCamp Ottawa was quite interesting, although not exactly what I was expecting.

My spam filter managed to eat the confirmation e-mail from the organizer, so I didn't appear on the schedule, but fortunately I was able to demo anyway. I'm certainly going to have to look into my spam filter; it's eaten a few other decidedly non-spammy e-mails that I'd much rather have.

All of the other presentations had actual UIs [most of them quite nice], and the technology was way beyond my "I built the DB yesterday/this morning, I hope it works" stage, but hey, whatever :-)

One of the demos (Context Discovery) looked really cool to me: essentially it attempts to create a summary of a text document based on an enhanced version of an algorithm that came out of the NRC. They seem to be getting ready to do a beta, and I'm certainly going to take a look at it once it's released [although the demo was on a Win32 platform, so there is a reasonable chance they won't release a Linux version in their beta; I forgot to ask...]

I think it's safe to say that the Race Dv people had the slickest user interface there. Their product is aimed at racers (which I'm not), but it looks really cool. Spending Profile seems to be aimed at being a web-based, slimmed-down Quicken/MsMoney alternative for people looking to track their spending habits. The buttons for the UI were a bit hard to make out on the projector screen, but it looked like it could be a useful tool for people trying to make/follow a budget. ChoiceBot seems to be aimed at making it easier for consumers to find what they are looking for, which increases the conversion rate. Being able to specify the relative importance of various features seems like it would be useful for buying electronics and whatnot, but since the majority of my purchases are books, I probably won't see it anytime soon.

The All The Code demo didn't go particularly well, but the database didn't fall over dead either, which is a good start. Some people wanted to know what All The Code does differently from Google Code and similar, and the best way I've found to summarise it is that it considers the relations between pieces of code. There are so many other things I would like to implement in the future to make it even more useful, but for now it's important to get the foundation well built. I'm going to have to get better at giving demos, although I should probably make a reasonable user interface before I try to give another one :).

After the demo one of the other presenters suggested that I e-mail him, since he might know someone I could try to sell an early version to, but he cautioned me that they might just try to steal it. For some reason I'm not particularly worried about people trying to steal All The Code; I think it's one of those things that requires someone really crazy to implement, with so many places where a lot of people would be put off and just give up [and hopefully buy my product :)]. Of course, I could be wrong, but such is life.

I actually met up with a number of people from a computer camp that I used to volunteer at back in the day; it was nice to see them again.

Monday, November 20, 2006

All The Code Pre-Alpha DB Built

The All The Code database finished building at about noon today, which means that I can give a reasonable demonstration of it this evening at DemoCamp Ottawa (2). This is a big relief :-)

If it goes reasonably well, I'll see if anyone is interested in seeing an improved version on December 2nd (when Bar Camp Ottawa meets again). On the todo list: generalizing the OCaml DB back end, moving as much of the computation as possible onto the Amazon Web Search Platform and Amazon Grid, and putting a usable user interface on it. Not to mention adding support for cvs/svn and compressed archives :-)
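One way to generalize a DB back end is to hide the concrete store behind a signature so the build code doesn't care where the data lands. This is a hypothetical sketch of that idea, with all names invented for illustration:

```ocaml
(* Hypothetical sketch of a generalized DB back end: the build code
   programs against STORE and never sees the concrete storage. *)
module type STORE = sig
  type t
  val open_store : string -> t
  val put : t -> key:string -> value:string -> unit
  val get : t -> key:string -> string option
end

(* An in-memory stand-in, handy for testing the build code without a
   real database behind it. *)
module Memory_store : STORE = struct
  type t = (string, string) Hashtbl.t
  let open_store _name = Hashtbl.create 64
  let put t ~key ~value = Hashtbl.replace t key value
  let get t ~key = Hashtbl.find_opt t key
end
```

A real back end would just be another module satisfying the same signature.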

Depending on how much time I have, I may add support for C/C++, which is actually more challenging than Java thanks to templates, macros, and a reasonable amount of variation between different implementations. I probably won't add any more languages before the beta, since I'm expecting to find out that I've missed something during the beta :-)

Either way All The Code will probably go public beta near the middle of December.

Sunday, November 19, 2006

Final Stage

The final stage is now building, albeit with a whole bunch of limits placed on it. It would appear that I definitely need to figure out how to parallelize the build process in the future. I've got a few ideas, but I'll also check with the people at AWSP to see if they have any recommendations on how to do it [since they have much more experience with that sort of stuff than I do]. I'm not really sure the final stage is going to be built in time for my presentation tomorrow, but in the worst case I can just go with the phase two stuff I have from before. It doesn't demonstrate the unique parts of ATC so well, but I suppose that is life.
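The most obvious first step towards parallelizing the build is splitting the work list into chunks that independent workers could process. A small sketch of that, purely illustrative and not the real build code:

```ocaml
(* Split a work list into fixed-size chunks that independent workers
   (or AWSP nodes) could each process. Assumes size >= 1. Illustrative
   sketch only. *)
let chunk size items =
  let rec go acc cur k = function
    | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
    | x :: rest ->
        if k = size then go (List.rev cur :: acc) [ x ] 1 rest
        else go acc (x :: cur) (k + 1) rest
  in
  go [] [] 0 items
```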

Top 5 Signs that your IDE is bloated

I've been playing around with a lot of new languages recently as part of getting All The Code ready, and I've tried numerous IDEs (Camelia, RadRails/Eclipse, etc.) for different languages (OCaml, Ruby, C, etc.), and they all seem to suffer from extreme bloat.

#5 It uses more RAM than Firefox with 30 tabs
#4 You can see a refresh rate as you type
#3 It takes longer to load than the rest of your operating system
#2 If someone replaced it with a web-based application you wouldn't notice the delays
#1 The developers claim it does more than emacs.

What's the lesson of this story? There isn't one, except that writing blog posts while waiting for batch jobs to run is a good way to pass the time.

Phase 1 Done, Writing Phase 2

So the Phase 1 build finished at about 5am this morning and I started writing the Phase 2 build code at about 8am.

The Phase 2 code is responsible for most of the heavy lifting, but to be able to get things finished up for Monday, I'm limiting the scope to three critical components and will expand it later.

Why am I writing this blog post rather than coding? I'm waiting for the coffee to be ready so I can think clearly again :-)

Saturday, November 18, 2006

Phase 0 Done, Phase 1 Started

So Phase 0 of the ATC build process is complete. What is Phase 0, you may ask? It's a cool-sounding name for "I finally have the data on my computer, ready to be processed." Since I'm only indexing Java programs for Monday (I haven't written parsers for anything else yet), I wasn't expecting there to be too much data I'd be able to pull from AWSP, maybe 700MB or something in that range. However, without doing any of the fancy things that I want to do later (like looking inside archives and cvs/svn/git), I now have 2.4GB of data to sift through for Monday. Sadly, the Phase 1 build code I wrote has a few bugs, which I need to get fixed.

Monday, November 13, 2006

Amazon WebServices, or why I might get a chance to sleep after all

I wasn't expecting such a fast response from Amazon with regards to my problems using AWS, but I must say I'm pleasantly surprised. Monday afternoon my time (Monday morning their time) I got a conference call, and we went over how I was trying to use the Alexa Websearch service, and they explained the reason for all my frustration. Namely, it times out with very large start values, which is why I couldn't get more than a few hundred records. They didn't place a hard limit on the start values, since when the problems start to occur is apparently dependent on a large number of factors.

On the upside, it looks like I won't have to write my own crawler, since they also enabled my Amazon Web Search Platform account, which should not have these problems [AWSP is designed for people trying to build search engines, and has a different model than the regular AWS system]. The next seven days are going to be killer finishing my demo, but it should be possible :-)

Saturday, November 11, 2006

Amazon Web Services [or why I won't be sleeping anytime soon]

As part of All The Code I was planning on using AWS Alexa Websearch to find the bits of code to put into the database, saving me having to write a crawler. Sadly, it would appear that Amazon's Alexa Web Search still has a long way to go before being usable.

So over the course of about an hour my program was able to pull 900 results out of AWS, which is totally not enough. AWS fails to return any data ~60% of the time. Another ~5% of the time it returns an error message [normally along the lines of "Timed Out"], and the remaining ~35% of the time it returns some useful data. I could be more understanding if the service was in "beta", but it is supposedly release quality. In my books, a 60% failure rate is nowhere near release quality. I could live with the 60% failure rate, since my program automatically retries failed requests with an exponential backoff, but I can't seem to get beyond 920 results, even for a query with 144000 results.
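The retry strategy itself is simple enough to sketch. Here [fetch] stands in for the real AWS request (returning Some data on success, None on failure) and [sleep] is passed in so the policy is easy to test; both names are illustrative, not the actual code:

```ocaml
(* Retry a flaky request with exponentially growing delays between
   attempts. Illustrative sketch of the strategy, not the real code. *)
let retry_with_backoff ?(max_tries = 5) ?(base_delay = 1.0) ~sleep fetch =
  let rec go attempt delay =
    match fetch () with
    | Some data -> Some data
    | None when attempt < max_tries ->
        sleep delay;                 (* back off before the next try *)
        go (attempt + 1) (delay *. 2.0)
    | None -> None                   (* give up after max_tries *)
  in
  go 1 base_delay
```

Of course, no amount of retrying helps when the service caps how far you can page into the results, which is exactly the wall I hit.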

I suppose it serves me right for trusting another company's products to work, but now it looks like I'm going to have to write my own crawler, which I will probably have to limit to svn/cvs/git on SourceForge so I can get something done in time for the demo I agreed to give. No rest for the wicked :-)

Thursday, November 09, 2006


So, as part of DemoCampOttawa, I'm going to be giving a brief demo of All The Code. Hopefully I can get it working well enough before then :-)
