Tuesday, October 29, 2013

I've written a book "Fast Data Processing with Spark" which covers Python, Scala, and Java

I recently finished writing "Fast Data Processing with Spark". Apache Spark is a framework for writing fast, distributed programs.


Fast Data Processing with Spark covers how to write distributed MapReduce-style programs with Spark. The book guides you through every step required to write effective distributed programs, from setting up your cluster and interactively exploring the API to deploying your job to the cluster.


Personally, while the fast nature of Spark is not to be understated, I really enjoy its functional style APIs and find it a lovely environment to code in.

Sunday, August 11, 2013

The ICFP 2013 contest is over :)

For the past few years I've been participating in the ICFP contests as a way of keeping in touch with some friends from university. This year's contest was run by MSR http://research.microsoft.com/en-us/events/icfpcontest2013/ . The vague "plan" for next year is to try to be more awesome during the lightning round, as we only seem to actually write much in the way of code during the first day. Our code is up at https://github.com/abarbu/icfp2013 (not that you actually want to read it, but if you are crazy :)).

Monday, July 22, 2013

What's new in Spark this week #3

It's been a busy week in the world of Spark, with some interesting announcements. First off, there is another AMP Camp being hosted at Berkeley http://ampcamp.berkeley.edu/amp-camp-three-berkeley-2013 from August 29th to 30th, which, fair warning, is in the middle of Burning Man for those of you who participate in that. More importantly, there is a new Spark stable release (0.7.3) and you can read the release notes at http://spark-project.org/spark-release-0-7-3/. This release is mostly bug fixes; however, heavy users of the Spark shell may wish to use the new shell environment variable ADD_JARS to add JARs to the Spark shell workers.
In this week's check-ins

Monday, July 15, 2013

What's new in Spark this week #2

I've been somewhat distracted, and there has been a lapse of several weeks between updates for what I intended to be a weekly series (and that is only after the first one). That being said, I'm going to take another crack at this and hopefully get a streak of longer than one.
Let's look at what has happened in the past week on Spark:

Here's hoping I remember to do this again next week :)

Monday, May 06, 2013

What's new in Spark this week #1

What's new in Spark will look at the activity in the Spark commit logs every week and attempt to summarize what new features and bug fixes have occurred. This is not intended to summarize everything, mostly things that might be useful to application developers. Without further ado, let's get started:


That is all that I found interesting in skimming this week's commit logs. If I missed something important, feel free to let me know :)

Sunday, August 28, 2011

Automatic spelling corrections on Github

English has never been one of my strong points (as is fairly obvious from reading my blog), so my latest side project might surprise you a bit. Inspired by the results of tarsnap’s bug bounty and the first pull request received for a new project (slashem - a type-safe rogue-like DSL for querying Solr in Scala), I decided to write a bot for GitHub to fix spelling mistakes.



The code itself is very simple (albeit not very good; it was written after I got back from clubbing @ JWZ’s club [DNA lounge]). There is something about a lack of sleep which makes Perl code and regexes seem like a good idea. If despite the previous warnings you still want to look at the code, https://github.com/holdenk/holdensmagicalunicorn is the place to go. It works by doing a GitHub search for all the README files in markdown format and then running a limited spell checker on them. Documents with a known misspelled word are flagged and output to a file. Thanks to the wonderful GitHub API the next steps are easy: it forks the repo and clones it locally, performs the spelling correction, commits, pushes and submits a pull request.



The spelling correction is based on Pod::Spell::CommonMistakes, which works from a very restricted mapping of misspelled words to their corrections.
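
To make the idea concrete, here is a minimal sketch of that kind of dictionary-driven correction. It is written in Python rather than the bot's Perl, and it is not the bot's actual code: the word list, the function names, and the capitalisation handling are all assumptions made up for illustration.

```python
import re

# Hypothetical sample entries. The bot's real word list comes from
# Pod::Spell::CommonMistakes and is not reproduced here.
CORRECTIONS = {
    "recieve": "receive",
    "seperate": "separate",
    "occured": "occurred",
}

# One pattern that matches any known misspelling as a whole word,
# ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, CORRECTIONS)) + r")\b",
    re.IGNORECASE,
)


def _fix(match):
    """Look up the correction for one matched misspelling."""
    word = match.group(0)
    fixed = CORRECTIONS[word.lower()]
    # Keep a leading capital so words at the start of a sentence still look right.
    return fixed.capitalize() if word[0].isupper() else fixed


def correct_spelling(text):
    """Return (corrected_text, number_of_replacements)."""
    return _PATTERN.subn(_fix, text)
```

Because only words in the table are ever touched, a README with no known misspellings comes back unchanged with a count of zero, which is a cheap way to decide whether a repo is worth forking at all.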



Writing a “future directions” section always seems like such a cliche, but here it is anyways. The code as it stands is really simple. For example, it only handles one repo of a given name, and the dictionary is small. The next version should probably also try to submit corrections only against the canonical repo. Future plans include extending the dictionary. In the longer term I think it would be awesome to attempt to detect really simple bugs in actual code (things like memcpy(dest,0,0)).



You can follow the bot on Twitter: holdensunicorn.



Comments, suggestions, and patches always appreciated. - holdenkarau (although I’m going to be AFK at burning man for awhile, you can find me @ 6:30 & D)

Sunday, October 18, 2009

I <3 Topatoco

I just had a wonderful customer service experience with them, and I feel the need to share and let everyone know how awesome Topatoco is.

Topatoco sent me what has to be one of the best packages I've received in a long time. After some initial problems with my first bear-monster hoodie, they mailed a replacement, which (thanks to USPS) got mixed up, but no worries, they re-sent it. With buttons! and stickers :) And most of all a hand written note.

Thursday, August 06, 2009

Translation of Pigs Can Fly Site Monitor

Pigs Can Fly Site Monitor is now available in Spanish and French. The French translation is a bit more shifty than the Spanish one. If you find a translation error, drop me an e-mail at holden@pigscanfly.ca.

Sunday, August 02, 2009

Pigs Can Fly Site Monitor Notification

I'm pleased to announce the launch of Pigs Can Fly Site Monitor for the Google Android platform, a free application. Pigs Can Fly Site Monitor (pcfsm) behaves like other regular site monitoring software, polling your website at customizable intervals to ensure it is online. Since the site monitor runs directly on your phone, you don't have to worry about having a second computer to act as the monitoring station, or any difficulties with slow SMS delivery.

Pigs Can Fly Site Monitor is available from the Google Market for free. Users with the Google Market on their Android can download it by clicking here or looking under the Tools section in the Google Market on their Android phone. Users without access to the Google Market (like Neo FreeRunner users such as myself) can install it from pcfsm.com.

In addition to the traditional polling, PCFSM also handles basic regular expression matching, and can optionally check if your site is linked from slashdot or reddit (as being linked to from there may cause massive spikes in traffic).
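
For the curious, the polling-plus-regex idea can be sketched in a few lines. This is a Python illustration only, not the app's own code (PCFSM runs on Android), and the function names, the default interval, and the injectable `fetch` and `sleep` parameters are assumptions made up to keep the example small and testable.

```python
import re
import time
import urllib.request


def fetch_page(url, timeout=10):
    """Download a page body as text. Network failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_site(url, pattern=None, fetch=fetch_page):
    """Return True if the page loads and, when a pattern is given, matches it."""
    try:
        body = fetch(url)
    except OSError:
        # Connection refused, DNS failure, timeout, HTTP error, and so on.
        return False
    if pattern is None:
        return True
    return re.search(pattern, body) is not None


def poll(url, pattern=None, interval_seconds=300, rounds=3,
         fetch=fetch_page, sleep=time.sleep):
    """Check the site `rounds` times, waiting between checks, and return the results."""
    results = []
    for i in range(rounds):
        results.append(check_site(url, pattern, fetch))
        if i < rounds - 1:
            sleep(interval_seconds)
    return results
```

Something like `poll("http://example.com/", pattern=r"Welcome", interval_seconds=3600)` would check once an hour that the page is both reachable and still contains the expected text, which catches the case where the server is up but serving an error page.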

If you don't have an unlimited data plan, I'd recommend setting the polling interval to a very high number. On the other hand, if you do have an unlimited data plan, go wild :)

PCFSM is still pre-1.0, so there may be some bugs. If you find any, please e-mail me at holden@pigscanfly.ca and make sure to include PCFSM in the subject so that I notice it.

Friday, July 24, 2009

DeviceScape now available on the OpenMoko

I'm pleased to be able to post the DeviceScape ipkgs for download. The binaries consist of two packages: devicescape (also mirrored on the csc) and a different wpa version (also mirrored on the csc). It has been tested on the ASU software image of the OpenMoko.

While I will respond to bug reports ( holden@pigscanfly.ca ), this is likely the end of the line for this software package.

It seems like the OpenMoko software stack doesn't have a lot of life left in it, and I've got an application (involving site monitoring) I'd like to write for Android. As such, I'm going to be putting Android on my FreeRunner and hopefully crank out that application this Saturday/Sunday.

Wednesday, July 15, 2009

xkcd404 - the xkcd that wasn't

I was reading the xkcd archives to switch my mind away from math (temporarily) and I was reminded of xkcd #404. xkcd comics are of the form http://xkcd.com/[comicnumber] and #404 just gives back a 404 error.

Wednesday, July 08, 2009

update to web2.0collage


my web2.0collage
Originally uploaded by dmcopernicus
I've updated web2.0collage; it now uses stateless web servlets (thanks to a patch in plt scheme :)), which gives the load balancer a much easier time, since sessions are no longer sticky. It is covered in a few other places besides slashdot now (like gigaom/nyt), and being the hobo that I am, I'm trying to see if I can get the story up on digg ftw :)

If you run into any bugs with it (which is likely) or have suggestions I'd love to hear them :)

Thursday, July 02, 2009

Your browser history is showing (an open source web application in scheme)


my web2.0collage
Originally uploaded by dmcopernicus
Over the course of last weekend I wrote web2.0collage, a browser history sniffing collage generator in scheme. Web2.0collage is designed to graphically illustrate just how easy it is for sites to determine what your browser history is. When you visit the site it sniffs your browser history, and creates a collage of the (safe for work) sites that you visit. It is an interesting application of potentially scary technology (imagine a job application site using this to screen candidates). Ideally, given some time in my schedule, I'd like to make it a bit more user friendly and robust so that I could perhaps show it to the general public to increase awareness of privacy issues on the web.

The code, while not good since I was learning how the plt-webserver & imagemagick bindings worked at the time, is available under the agpl. Today it hit the front page of slashdot, causing some less than fortunate scaling issues to be discovered. Hatguy & myself managed to fix them (sort of) without too many interruptions.

Tuesday, June 23, 2009

Devicescape, OpenMoko, StarBucks & Boingo mobile

I finally got a replacement battery for my FreeRunner allowing me to perform a rather important test, namely Starbucks support. Unfortunately the Canadian Starbucks use a different Wi-Fi provider than the American Starbucks, so the free wifi login support with Devicescape doesn't currently work. However, Boingo has a free 30 day trial for boingo mobile, which is a roaming partner with Bell (one of the Canadian Starbucks wireless providers) and Devicescape does support boingo hotspots.

Much to my pleasant surprise, the existing code worked with only a few minor modifications. I came across and fixed a minor bug involving not being able to stop the connection process, so you can now take back manual control if you so desire. Once again, if you are interested in testing this release give me a shout ( holden@pigscanfly.ca ), and make sure to include openmoko in the subject somewhere so it gets through.

Now that my FreeRunner is working again I'm hoping to get a UI prototype up within the next weekend or two.
