Later this month I am starting a new job with the Mobile Firefox team at the Mozilla Corporation. I'll be staying in Seattle, with occasional trips to the Mozilla offices in Mountain View. I'll work mostly at home but I plan to come downtown about once a week to work at the library, coffee shops, StartPad, or friends' offices. Let me know if you are in Seattle or the Bay Area and want to get together sometime. After I start working from home I should be eager for company.
Kiha is still doing good things and I'll be watching their progress eagerly. I learned a lot there and I like the direction Kiha is taking, but Mozilla's offer was too good to pass up: working on free software, open standards, new platforms, and with many hackers whose work I know and admire.
Trivia: I've now had more non-student jobs in eight years than my dad has in thirty-eight.
Nathanael Boehm wrote a nice essay last month called The Future of Employment?, about a disconnect between workers' and employers' views of social networks. (This post is based partly on the ensuing Hacker News thread.) Boehm wrote:
When I need help with a challenge at work or need to run some ideas past people I don’t turn to my co-workers, I look to my network of colleagues beyond the walls of my workplace. Whilst my co-workers might be competent at their job they can’t hope to compete with the hundreds of people I have access to through my social networks.
The late Sun Microsystems taught us that the network is the computer. It's true: we still use non-networked computers for specialized tasks, but nobody wants one on their desk – it's just so useless compared to one that talks to the entire world. Boehm could have titled his essay The Network is the Employee. There are still tasks that people do in isolation, but the ability to contact a network of peers and experts makes the difference in my job, and many others.
Alone together
The lone computer programmer in a small business has thousands of colleagues on Stack Overflow, Reddit, and so on. It's a messy way to find answers, but it's sure better than the days when your only choice was to call tech support – or smack the box with your fist, whichever seemed more useful. I can't begin to list all the problems I've solved and things I've learned by Googling for others with experience, and getting help from a different expert for every problem.
Decades before the web, computer geeks had virtual communities on mailing lists, Usenet, and IRC. Now every job in the world has its corresponding forum. Even the night clerk at the gas station has Not Always Right.
Teaching has long been a solitary profession. Despite working in a crowded classroom, teachers are isolated; they rarely have colleagues observing or participating directly in their work. This has such an impact that teachers are sometimes trained in meditation or reflection techniques, to make up for the lack of external feedback. So I'm curious what happens when teachers start to work together remotely the way programmers do.
You will be assimilated
Boehm's essay also reminded me of a vague sci-fi idea I've been kicking around: the first group minds will evolve from the intersection of Mechanical Turk, virtual assistants, social networking, and augmented reality.
Starting around the 1990s, it was possible to instantly "know" any fact that was published online. Since then, we've increased the amount of content online, our tools for searching it, and ways of connecting to the network. Today we have instant access to almost any published knowledge, anywhere.
There are more people on the net too, and more ways to find and talk to them. Most of us can contact dozens of friends at any given moment, plus friends-of-friends, co-workers, fellow members of communities like Hacker News or MetaFilter, and also complete strangers. Along with raw facts, we have access to vast amounts of human judgement, experience, and skill.
One product of this is the "virtual assistant," who provides a service that was once exclusive to high-powered executives. Now personal assistants can work remotely (often overseas), spread costs by serving many masters, and leverage the internet superpowers listed above. Their services are mostly targeted at small business owners and the Tim Ferriss crowd, but I'm sure someone soon will market virtual assistance to all sorts of other creative workers, teachers, even stay-at-home parents.
So, how long before I can touch a button to let a remote assistant see what I'm seeing in real-time and help me make transportation plans, translate foreign signs and speech, look up emails related to whatever I'm doing or thinking, or even advise me on what to say? Some of these queries will go to my circle of friends, others to the general public, and some to a personal assistant who is paid well to keep up with my specific needs. And that assistant of course will subcontract portions of each job to computer programs, legions of cheap anonymous Turkers, or his or her own network of helpers. At that point, I'm augmenting my own perception, memory, and judgement with a whole network of brains that I carry around, ready to engage with any situation I meet.
If nothing else, I hope someone writes a good sci-fi thriller story in which a rogue virtual assistant manipulates the actions of unsuspecting clients, leading them to some unseen end.
It’s trivial to wrap your C implementation so that it can be called directly from C++. So translating your implementation from C to C++ provides no real benefit; it’s more like busy work. Plus C++ is such an ugly language that it’s hard to imagine that it won’t die out over the next decade, replaced by C#/Objective C/Java etc. C, on the other hand, has virtues of simplicity, clarity, portability, and performance that other programming languages have yet to supplant.
Vim keybindings extension for Chrome, based on vimlike-smooziee, but much more features.
Urbit: an operating function
[patch] dns API docs are out of date
I'm working on some ideas for finance or news software that deliberately updates infrequently, so it doesn't reward me for reloading it constantly. I came up with the name "microhertz" to describe the idea. (1 microhertz ≈ once every eleven and a half days.)
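The conversion in that parenthetical is easy to check with a few lines of arithmetic. This is only a small sketch; the helper name `periodInDays` is made up for illustration.

```javascript
// Convert a frequency in hertz to the length of one cycle in days.
// `periodInDays` is a hypothetical helper name, not part of any library.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function periodInDays(hertz) {
  const periodInSeconds = 1 / hertz;
  return periodInSeconds / SECONDS_PER_DAY;
}

// One microhertz is one cycle per million seconds.
const days = periodInDays(1e-6);
console.log(days.toFixed(2) + " days");
```

A million seconds divided by 86,400 seconds per day comes to a little over eleven and a half days.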
As usual when I think of a project name, I did some DNS searches. Unfortunately "microhertz.com" is not available (but "microhertz.org" is). Then I went off on a tangent and got curious about which other SI units are available as domain names.
This was the perfect opportunity to try node.js so I could use its asynchronous DNS library to run dozens of lookups in parallel. I grabbed a list of units and prefixes from NIST and wrote the following script:
var dns = require("dns"), sys = require('sys');

// SI prefixes and unit names, from the NIST list.
var prefixes = ["yotta", "zetta", "exa", "peta", "tera", "giga", "mega",
  "kilo", "hecto", "deka", "deci", "centi", "milli", "micro", "nano",
  "pico", "femto", "atto", "zepto", "yocto"];

var units = ["meter", "gram", "second", "ampere", "kelvin", "mole",
  "candela", "radian", "steradian", "hertz", "newton", "pascal", "joule",
  "watt", "coulomb", "volt", "farad", "ohm", "siemens", "weber", "henry",
  "lumen", "lux", "becquerel", "gray", "sievert", "katal"];

// Start a lookup for every prefix + unit combination and print the
// names that turn out to be available.
for (var i=0; i<prefixes.length; i++) {
  for (var j=0; j<units.length; j++) {
    checkAvailable(prefixes[i] + units[j] + ".com", sys.puts);
  }
}

// Call back with the name if the lookup fails because no such domain exists.
function checkAvailable(name, callback) {
  dns.resolve4(name).addErrback(function(e) {
    if (e.errno == dns.NXDOMAIN) callback(name);
  });
}
Out of 540 possible .com names, I found 376 that are available (and 10 more that produced temporary DNS errors, which I haven't investigated). Here are a few interesting ones, with some commentary:
To get the complete list, just copy the script above to a file, and run it
like this: node listnames.js
Along the way I discovered that the API documentation for Node's dns module
was out-of-date. This is fixed in my GitHub fork, and I've sent a pull
request to the author Ryan Dahl.
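The script above was written against the Node API of the time; `sys.puts` and the `addErrback` promise are no longer provided by current Node releases. As a rough sketch of the same idea using the promise-based `dns.promises.resolve4` call, something like the following might work. The helper names `candidateNames` and `findUnresolved` are invented here, and a failed lookup is only a hint that a name might be unregistered, not proof.

```javascript
const dns = require("dns");

// Build every prefix + unit + TLD combination, e.g. "microhertz.com".
function candidateNames(prefixes, units, tld) {
  const names = [];
  for (const prefix of prefixes) {
    for (const unit of units) {
      names.push(prefix + unit + tld);
    }
  }
  return names;
}

// Resolve all names in parallel and keep those whose lookup fails with
// ENOTFOUND. `resolve` is injectable (an assumption of this sketch) so the
// logic can be tried offline; by default it is Node's A-record lookup.
async function findUnresolved(
  names,
  resolve = (name) => dns.promises.resolve4(name)
) {
  const results = await Promise.all(
    names.map((name) =>
      resolve(name).then(
        () => null, // the name has an A record, so it is taken
        (err) => (err.code === "ENOTFOUND" ? name : null) // other errors: unknown
      )
    )
  );
  return results.filter((name) => name !== null);
}

// Example (needs network access):
//   findUnresolved(candidateNames(["micro"], ["hertz"], ".com"))
//     .then((names) => names.forEach((n) => console.log(n)));
```

Errors other than ENOTFOUND (timeouts, server failures) are deliberately left out of the result, which mirrors the "temporary DNS errors" set aside in the original run.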
I keep almost all of my notes and to-do lists in plain text files, so I can
edit and search them with Vim, grep, and other standard Unix tools. I often
indent lines in these files to create a simple outline structure, and use the
autoindent and foldmethod=indent options to make Vim into a simple
outliner.
To get useful output when searching through these outline-structured files, I
wrote a simple grep replacement. Given a text file with a Python-style
indentation structure, ogrep searches the file for a regular expression. It
prints matching lines, with their "parent" lines as context. For example, if
input.txt looks like this:
2009-01-01
  New Year's Day!
    No work today.
    Visit with family.
2009-01-02
  Grocery store and library.
2009-01-03
  Stay home.
2009-01-04
  Back to work.
    Remember to set an alarm.
then ogrep work input.txt will produce the following output:
2009-01-01
  New Year's Day!
    No work today.
2009-01-04
  Back to work.
You can download ogrep from the outline-grep repository on GitHub, or just read the literate Haskell file. The code is almost trivial (40 lines of code, plus imports and comments); I'm publishing it just in case anyone else has a use for it, and because some of my friends were curious about how I'm using Haskell. I've now written a few "real-world" Haskell programs (compleat was the first). I'm finding Haskell very well suited to such programs, though this particular one would be equally easy in a language like Perl, Python, or Ruby.
This is a one-off tool to fill a gap in my workflow; there are no configuration options or useful error messages. It would be fairly easy to extend it, though. For example, an option to include children (as well as parents) of matching lines might be handy. I recently realized that ogrep often works for searching through source code too, which may generate some more unexpected use cases.
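The matching rule described above (print each matching line, preceded by its "parent" lines) can be sketched in a few lines. This is a JavaScript illustration of the idea, not the Haskell code from the repository; the function name `outlineGrep` is made up, and it assumes indentation is measured by counting leading whitespace characters.

```javascript
// Return the matching lines of an indented outline, each preceded by any
// ancestor lines that have not been emitted yet.
function outlineGrep(pattern, text) {
  const output = [];
  const ancestors = []; // stack of { indent, line, printed }
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // blank lines carry no structure
    const indent = line.length - line.trimStart().length;
    // Drop stack entries that are not parents of this line
    // (anything indented as much as, or more than, the current line).
    while (ancestors.length > 0 &&
           ancestors[ancestors.length - 1].indent >= indent) {
      ancestors.pop();
    }
    const entry = { indent, line, printed: false };
    if (line.search(pattern) >= 0) {
      for (const parent of ancestors) {
        if (!parent.printed) {
          output.push(parent.line);
          parent.printed = true;
        }
      }
      output.push(line);
      entry.printed = true;
    }
    ancestors.push(entry);
  }
  return output;
}

const notes = [
  "2009-01-01",
  "  New Year's Day!",
  "    No work today.",
  "    Visit with family.",
  "2009-01-02",
  "  Grocery store and library.",
  "2009-01-03",
  "  Stay home.",
  "2009-01-04",
  "  Back to work.",
  "    Remember to set an alarm.",
].join("\n");

console.log(outlineGrep(/work/, notes).join("\n"));
```

Keeping a `printed` flag on each ancestor means a parent shared by several matches is shown only once.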
Hi. Don’t forget to check the return value of malloc(). You might also enjoy this:
[iq0.com]
You should name it “Gazelle Forever”.
I kid
I vote to stick with C for another reason, integration with other languages. If your project is a nice C library, it’s vastly easier for people using other languages, C, Python, whatever, to interface with your library rather than having to re-implement it.
C++ is imo an awful choice for a library because it ends up restricting this to other things written in C++ (unless of course you wrap it up in a C abi but that is messy).
I’d prefer a base in C with a C++ wrapper as required.
A Restful front controller built on Hack.
Generate command-line completions from simple usage descriptions.
GCC 4.5 should get link-time optimization which will hopefully help binary size a lot. ultimately though, i still find myself reaching for C, for the ease of binding to other languages and the smaller size. i still like to keep my code C++ compatible and run it through the C++ compiler occasionally though, just because it’s much stricter on typing.
I have a hard time accepting the comment that C++ compilers aren’t very good at keeping things small. Which compiler have you used? I would agree that C++ nuances are complicated and that you need to read volumes of books to better grasp the details of optimization.
I’ve also been lazily contemplating upb for use in my projects. Since I tend to write my code in strange languages like D and Haskell, I’ll second the others’ comments! I would not mind having a thin C++ wrapper though.
great choice!
I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C.
I’ve ported about one third of upb to C++, on a branch, to see how it would turn out. It was a ton of work. Here are my current observations:
I’m leaning towards sticking with C, for the following reasons:
I’ll try to take some of the lessons I learned from my partial C++ port to make the C more readable.
Have you checked with the security team about this yet? C++ might reduce the amount of potentially unsafe manual casting that goes on in the code, but it also introduces a whole host of (often very subtle) security risks of its own… (For example, see: [chargen.matasano.com] and some of the links there) Depending on how much you’re planning on changing, and depending on your security folks, it might well be faster to tighten up the current codebase than to rewrite it. (It’s been a little while since I looked at the code, and I certainly haven’t tried to do a formal review, but I didn’t notice too many scary things going on in there…)
On a more selfish note — I too was eyeing upb for use in a (pure C) project of mine, and would just as soon see it stay C… So take my comments with a grain of salt.
-sq
assuming the generated C code is good, why not?
thanks!
@Alejandro: Wow, I didn’t realize there was already someone wanting to use upb on an embedded platform! I definitely want to keep supporting your use case if I can. I think the best way forward will be for me to get a C++ to C translator like Comeau C++. Then I can keep writing in C++, but ship C files that you can build with a plain C compiler. Do you think that will work for you?
@Josh: Hi! You mentioned my reason… I work on a CXX-free embedded controller. I’m moving from ASN.1 to protocol buffers at the IPC (hate threads) level thanks to your awesome upb, and I was starting to write the Lua bindings to finish the migration, integrating the http front-end and some Lua based helpers running inside the controller. So porting upb to C++ will hit me… badly
… your implementation is great! I understand porting it to C++ will save you some lines… but maybe some refactoring can help to reduce the complexity without having to switch to C++? You can do all the bindings you aim for while keeping it in C… with all its current benefits
@Alejandro: If you want to influence my decision, you should tell me why you don’t want me to.
I definitely have resisted, but practical reasons are demanding that I do. But maybe you are thinking of practical reasons that I’m missing.
please don’t :’(
This summer my friends Ben and Mike gave me grief about never releasing anything. Their criticism is definitely valid to some degree. I’ve been working on Gazelle for about two years now, and upb for almost one. Gazelle has had four releases in that time, but they have mostly focused on moving Gazelle to where I think it ought to be, as opposed to releasing something hacky that people can actually use now. There is a class of problems that Gazelle is useful for now, but it is pretty small in comparison to the amount of work I’ve put in.
I haven’t released upb at all yet, and my last message indicating I’m thinking of porting it to C++ will probably make skeptical readers think I’m moving farther away from a release rather than closer to one.
Since I agree that my progress doesn’t look too promising to someone observing from the outside, let me say where I think these projects currently are, where they’re going, and when they’re likely to release.
First of all, Gazelle is currently pushed on the stack until I have upb released. The reason is that I realized that Protocol Buffers are the answer to two big problems I was facing with Gazelle:
Since Gazelle is gated on upb, the question then is: when will upb release? Why hasn’t it released already?
A few months ago I was working on upb for 100% of my time at work. I had banked 20% time for a while, and I was also a bit burned out on my 80% project, so my manager very graciously gave me the liberty to work on upb for all of my working hours.
During that time upb made progress in several areas. It got some better benchmarks and tests, and I fleshed out the upb compiler so that it wasn’t dependent on the official Protocol Buffers compiler for bootstrapping. Maybe most importantly, I worked a lot on the in-memory message format to figure out how to make it work well with dynamic languages.
My goal during that time was to write a Python extension that a few initial internal-to-Google customers could use. The value proposition is that it would be API-compatible with what they were already using, but many times faster. I wrote said extension, which was incomplete (supported decoding only, not encoding), but looked complete enough to use for this case.
By this time I was approaching the amount of time I could reasonably ask from my manager at work, so I had to tie up the loose ends and get it into my initial customer’s hands. I put all the pieces together and tried it out, but then ran into a problem; I hadn’t realized that this initial customer was using an old deprecated feature of Protocol Buffers called MessageSet. There was no way I could support MessageSet without significant changes. I was defeated for the moment. I had to take a break for a few months and re-devote my time to my 80% project.
I mention this all just to illustrate that I do have actual customers that I am targeting, and I have had aggressive pushes to deliver something to those customers, but unfortunately my work wasn’t complete enough for them yet.
This brings us up to now. In the last week or two, I have made several strides, including executing on part of a design that will get me MessageSet support. I have also developed an interface for a “pick parser”, which lets you pull only a small subset of fields out of a protobuf. This will be a big win for use cases that only need a few fields from a very large proto, and I have a customer internal-to-Google who is very interested in this interface.
Meanwhile I’m very interested in trying to get the upb Python extension into AppEngine, because I think it could be a huge win there since users aren’t allowed to load custom Python extensions. This means that currently, people trying to use protocol buffers on AppEngine are limited to pure-Python extensions that are much slower than a C extension can be. But to get into AppEngine I will need to get a security audit, which is part of the reason I am leaning towards C++ at this point. I think C++ will make the code shorter and less gnarly (fewer casts), which should lead to easier verifiability. I converted one header file so far, and it got 38% smaller and much easier to read.
I hesitate to make schedule estimates, but my main purpose is to impress on my possibly-impatient audience that:
Yours,
Josh
I am on the verge of trying something I never thought I’d do. I’m considering porting upb to C++.
My reasons aren’t ideological, they are highly practical. Basically I am realizing that while object-oriented C is OK for a while, it’s very weak at inheritance. Inheritance in C involves a lot of casting, duplicated code and/or macros, and careful discipline. The main problems with this are:
Both of these problems make the code ultimately more difficult to audit for security. And getting upb audited for security is something I plan to do very soon.
I am coming to believe that porting to C++ would make upb smaller (in lines of code) and easier to verify for security. However, there are a few major disadvantages that are giving me pause:
When I look at the downsides though, they don’t seem to pertain to my initial goals of making upb useful for Python, Lua, Ruby, etc. extensions, and for use inside Google. Being useful for really restricted embedded systems is a far-off use case. So it’s sounding like porting to C++ is the right thing to do.
I hope it significantly reduces the line count, as I expect it will. That will make me feel better about giving up the minimalism of C. I will definitely be compiling with -fno-exceptions -fno-rtti -fvisibility-inlines-hidden on gcc. I also won’t be using any of the C++ standard library (not even <string>).
As mentioned previously – you’ll get SSA ‘for free’. Write everything in load/store manner, then run opt -std-compile-opts and it will try to optimize your bytecode hard (e.g. convert to ssa form with phi’s, etc).
Hi Josh,
I used docbook for a couple of cafepy articles and even though it tends to be fairly verbose because of the XML tags, I would use it again. It has some built in features (such as callouts: [www.docbook.org], multiple output formats from same source, etc.) which make it attractive.
All the XML tags do slow you down so for shorter articles where I don’t need all these features, I stick with Markdown or reST based stuff. It should be possible to convert from any format to any other so I’m not worried about being locked-in. Also look at Sphinx ([sphinx.pocoo.org]) – it is reST based and was written for the Python language documentation. Looks like it can even do multiple output formats.
Sinatra-inspired JavaScript node.js web development framework -- insanely fast, insanely sexy
Markdown Vim Mode
It's been a while since I posted a personal update here. Sarah and Eleanor and I have been in a nice routine for a while now, so it usually feels like there's nothing new to report. Unlike last year, we haven't moved or changed jobs (or changed jobs again). Eleanor is now in the long steady climb through toddlerhood, so her milestones and breakthroughs are not as frequent as they were. But over a year, things do add up.
Eleanor turned three last month. This fall she started going to a nearby co-op preschool, four hours a week. She chatters constantly, and likes singing and rhyming. She has started drawing specific things, like people, flowers, and cookies. She likes taking baths but not having her hair washed. For Halloween Eleanor was a monkey and I was the man with the yellow hat. (Sarah was a firefighter, which also fit the Curious George theme.) She enjoys sharing her Halloween candy with us.
Eleanor still spends one day a week at my mom's house while Sarah and I are at work. When Sarah's school started this September, I moved to a four-day work week so I can stay home with Eleanor on Sarah's other work day. This continues a pattern: When Eleanor was born, I worked four-day weeks for a few months at Amazon after Sarah went back to work; before that at GoTech I worked four-day weeks to spend more time on side projects.
I'm still working at Kiha and we're still in stealth mode. The work itself has changed quite a bit, not surprisingly. I feel much more productive than just one or two years ago, thanks to improved sleep at home and a focus on habit- and skill-building at work. I've also started doing more studying and programming outside of work again. My recent side project Compleat got a nice reception on Hacker News and Reddit a couple of weeks ago. I did put a lot of work into that write-up, hoping for more people to read and share it.
That was also the first post at my new weblog. I'll post programming-related articles there instead of LiveJournal or my old Advogato diary, so please subscribe if you want to know what I'm working on. Or if you are subscribed to Planet Matt then you'll see my blog posts along with all my other feeds.
Google has not yet released most of the Android 2.0 source code, but they did publish source for a very small number of components, including a WebKit snapshot. I was excited to see that the snapshot includes Google's V8 virtual machine. (Previous Android releases used Safari's JavaScriptCore/"SquirrelFish Extreme" VM.) But without the rest of the source tree, there was no way to build and run this on a real Android phone. The SDK includes a binary image that runs only in the qemu-based emulator.
Today I got to try out a Motorola Droid. Here's how its browser compares to Android 1.6 on my HTC Dream (Android Dev Phone / T-Mobile G1) in the V8 Benchmark Suite:
| Test | Dream (1.6) | Droid (2.0) | Change |
|---|---|---|---|
| Richards | 13.5 | 15.6 | 16% |
| DeltaBlue | 5.23 | 12.9 | 147% |
| Crypto | 13.2 | 10.9 | -17% |
| RayTrace | 10.9 | 80.1 | 635% |
| EarleyBoyer | 23.5 | 74.7 | 218% |
| RegExp | did not complete | 16.5 | – |
| Splay | did not complete | did not complete | – |
Some tests (Richards, Crypto) see little or no improvement, while others (DeltaBlue, RayTrace, EarleyBoyer) are dramatically faster. Just for comparison, let's run the same benchmark on Safari 4 (JavaScriptCore) and a Chromium 4 nightly build (V8) on a Mac Pro:
| Test | Safari 4 | Chromium 4 | Change |
|---|---|---|---|
| Richards | 4103 | 4640 | 13% |
| DeltaBlue | 3171 | 4418 | 39% |
| Crypto | 3331 | 3643 | 9% |
| RayTrace | 3509 | 6662 | 90% |
| EarleyBoyer | 4737 | 7643 | 61% |
| RegExp | 1268 | 1187 | -6% |
| Splay | 1198 | 7290 | 509% |
The precise ratios are different, but the same tests that showed the most improvement from Android 1.6 to 2.0 also show the most improvement from Safari to Chrome. Based on this plus the source code snapshot, I'm pretty sure that Android 2.0 is indeed using V8.
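The Change columns in both tables are the relative difference between the two scores. A quick sketch of that calculation, with a few rows copied from the tables above (the function name `percentChange` is made up for this example):

```javascript
// Relative change from an old score to a new one, as a whole-number percentage.
function percentChange(oldScore, newScore) {
  return Math.round(((newScore - oldScore) / oldScore) * 100);
}

// A few rows copied from the tables above: [label, old score, new score].
const rows = [
  ["RayTrace, Dream to Droid", 10.9, 80.1],
  ["Crypto, Dream to Droid", 13.2, 10.9],
  ["Splay, Safari 4 to Chromium 4", 1198, 7290],
];

for (const [label, oldScore, newScore] of rows) {
  console.log(label + ": " + percentChange(oldScore, newScore) + "%");
}
```

These three rows come out to 635%, -17%, and 509%, matching the tables.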
This is exciting news. It makes Droid the first shipping product I know that uses V8 on an ARM processor, although V8 has included an ARM JIT compiler for some time now. [Correction: Palm Pre was first; see the comments below.] For mobile web developers like me, it means we're one step closer to having desktop-quality rich web applications on low-power handheld devices.
Android still lags behind the iPhone in at least one important way for web developers: CSS animation. The iPhone (and Safari on the desktop) provides hardware acceleration for CSS transforms, like this falling leaves demo. On Android, CSS animation is done in software, making it much, much slower. (Even outside the browser, Android's Skia 2D graphics API lacks hardware acceleration. OpenGL is the only way for Android developers to take advantage of the GPU.) Accelerated animation would really make it possible to write interactive web pages that match the smoothness and responsiveness of native apps.
Final thought: Although the Motorola Droid is still 100 times slower than Chromium on a Mac Pro, it's already faster at some benchmarks than IE8 or Firefox 2 on desktop hardware from just a few years ago.
Path completions for your shell that will let you navigate like lightning.
Compleat is an easy, declarative way to add smart tab completion for any command. It's written in Haskell but requires no programming knowledge. See the GitHub repository for a quick description, or read on for a complete explanation.
Background
I'm one of those programmers who loves to carefully tailor my development environment. I do nearly all of my work at the shell or in a text editor, and I've spent a dozen years learning and customizing them to work more quickly and easily.
Most experienced shell users know about programmable completion, which provides smart tab-completion for supported programs like ssh and git. You can also add your own completions for programs that aren't supported—but in my experience, most users never bother.
At Amazon, everyone used Zsh (which has a very powerful but especially baroque completion system) and shared the completion scripts they wrote for our myriad internal tools. Now that I'm in a startup with few other command line die-hards, I'm on my own when it comes to extending my shell.
So I read the fine manual and started writing completions. You can see the
script I made for three commands from the Google Android SDK. It's 200
lines of shell code, and fairly straightforward if you happen to be familiar
with the Bash completion API. But as I cranked out more and more case
statements, I felt there must be a better way...
It's not hard to describe the usage of a typical command-line program.
There's even a semi-standard format for it, used in man pages and generated by
libraries like AutoOpt. For example, here's the usage for android, one
of the SDK commands supported by my script:
android [--silent | --verbose]
  ( list [avd|target]
  | create avd ( --target <target> | --name <name> | --skin <name>
               | --path <file> | --sdcard <file> | --force ) ...
  | move avd (--name <avd> | --rename <new> | --path <file>) ...
  | (delete|update) avd --name <avd>
  | create project ( (--package|--name|--activity|--path) <val>
                   | --target <target> ) ...
  | update project ((--name|--path) <val> | --target <target>) ...
  | update adb )
My idea: What if you could teach the shell to complete a program's arguments just by writing a usage description like this one?
The Solution
With Compleat, you can add completion for any command just by writing a
usage description and saving it in a configuration folder. The ten-line
description of the android command above generates the same results as my
76-line bash function, and it's so much easier to write and understand!
The syntax should be familiar to long-time Unix users. Optional arguments are enclosed in square brackets; alternate choices are separated by vertical pipes. An ellipsis following an item means it may be repeated, and parentheses group several items into one. Words in angle brackets are parameters for the user to fill in.
Let's look at some more features of the usage format. For programs with complicated arguments, it can be useful to break them down further. You can place alternate usages on their own lines separated by semicolons, like this:
android <opts> list [avd|target];
android <opts> move avd (--name <avd>|--rename <new>|--path <file>)...;
android <opts> (delete|update) avd --name <avd>;
...and so on. Rather than repeat the common options on every line, I used a
parameter named "opts". I can define that parameter to be a sub-pattern,
which will be used wherever <opts> appears:
opts = [ --silent | --verbose ];
For parameters whose values are not fixed but can be computed by another
program, we use a ! symbol followed by a shell command to generate
completions, like this:
avd = ! android list avd | grep 'Name:' | cut -f2 -d: ;
target = ! android list target | grep '^id:'| cut -f2 -d' ' ;
Any parameter without a definition will use the shell's built-in completion rules, which suggest matching filenames by default.
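To make the usage syntax more concrete, here is one way its building blocks (literal words, choices, optional items, repetition, and parameters) could be modeled as small composable functions that yield completion candidates. This is a JavaScript sketch of the general idea only, not Compleat's actual Haskell code; every function name below is invented, the pattern covers just a subset of the android usage, and parameters simply accept any word without suggesting anything.

```javascript
// A pattern is a function from a token list to a list of outcomes.
// The last token is the partial word being completed (possibly "").
// An outcome is { rest: tokens } (matched, input remains) or
// { suggest: word } (a candidate completion for the partial word).

// A literal word such as "list" or "--verbose".
const lit = (word) => (tokens) => {
  if (tokens.length === 1) {
    return word.startsWith(tokens[0]) ? [{ suggest: word }] : [];
  }
  return tokens[0] === word ? [{ rest: tokens.slice(1) }] : [];
};

// <name> in angle brackets: accept any completed word, suggest nothing.
const param = (tokens) =>
  tokens.length > 1 ? [{ rest: tokens.slice(1) }] : [];

const empty = (tokens) => [{ rest: tokens }];

// Alternatives separated by "|": try every branch.
const choice = (...ps) => (tokens) => ps.flatMap((p) => p(tokens));

// Items written one after another: feed what one leaves to the next.
const seq = (...ps) =>
  ps.reduce((a, b) => (tokens) =>
    a(tokens).flatMap((o) => ("rest" in o ? b(o.rest) : [o])));

// Square brackets: the item may be skipped.
const optional = (p) => choice(p, empty);

// Trailing "...": the item may appear one or more times.
const repeat = (p) => (tokens) =>
  p(tokens).flatMap((o) => {
    if (!("rest" in o)) return [o];
    // Only recurse when input was consumed, so the loop always ends.
    return o.rest.length < tokens.length ? [o, ...repeat(p)(o.rest)] : [o];
  });

// Collect the distinct suggestions for the partial last token.
function complete(pattern, tokens) {
  const words = pattern(tokens)
    .filter((o) => "suggest" in o)
    .map((o) => o.suggest);
  return [...new Set(words)].sort();
}

// android [--silent | --verbose]
//   ( list [avd|target] | move avd (--name <avd> | --path <file>) ... | update adb )
const android = seq(
  lit("android"),
  optional(choice(lit("--silent"), lit("--verbose"))),
  choice(
    seq(lit("list"), optional(choice(lit("avd"), lit("target")))),
    seq(lit("move"), lit("avd"),
        repeat(choice(seq(lit("--name"), param), seq(lit("--path"), param)))),
    seq(lit("update"), lit("adb"))
  )
);

console.log(complete(android, ["android", ""]));
console.log(complete(android, ["android", "list", "t"]));
console.log(complete(android, ["android", "move", "avd", "--name", "x", ""]));
```

Returning a list of outcomes lets every alternative be explored at once, so the suggestions from all branches that fit the words typed so far are simply collected at the end.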
The source code is on GitHub. I've been using it for just a week and I'm now writing new usage files for myself almost every day. The README file has more details about the usage syntax, and instructions for installing the software. Give it a try, and please send in any usage files that you want to share! (Questions, bug reports, or patches are also welcome.)
Future Work
For the next release of Compleat, I would like to make installation easier by
providing better packaging and pre-compiled binaries; support zsh and other
non-bash shells; and write better documentation.
In the long term, I'm thinking about replacing the usage file interpreter with a compiler. The compiler would translate the usage file into shell code, or perhaps another language like C or Haskell. This would potentially improve performance (although speed isn't an issue right now on my development box), and make it easy for usage files to include logic written in the target language. Another idea for the future: What if option-parsing libraries like AutoOpt or the Ruby/Perl/Python equivalents generated completion scripts for every program you wrote?
Final Thoughts
I realized recently that some things I do are so specialized that my parents and non-programmer friends will probably never get them. For example, Compleat is a program to generate programs to help you… run programs? Sigh. Well, maybe someone out there will appreciate it.
Compleat was my weekends/evenings/bus-rides project for the last few weeks (as you can see in the GitHub punch card), and my most fun side project in quite a while. It's the first "real" program I've written in Haskell, though I've been experimenting with the language for a while. Now that I'm comfortable with it, I find that Haskell's particular combination of features works just right to enable quick exploratory programming, while giving a high level of confidence in the behavior of the resulting program. Compleat 1.0 is just 160 lines of Haskell, excluding comments and imports. Every module was completely rewritten at least once as I compared different approaches. (This is much less daunting when the code in question is only a couple dozen lines.) I don't think this particular program would have been quite as easy to write—at least for me—in any of the other platforms I know (including Ruby, Python, Scheme, and C).
I had the idea for Compleat more than a year ago, but at the time I did not know how to implement it easily. I quickly realized that what I wanted to write was a specialized parser generator, and a domain-specific language to go with it. Unfortunately I never took a compiler-design class in school, and had forgotten most of what I learned in my programming languages course. So I began studying parsing algorithms and language implementation, with Compleat as my ultimate goal.
My good friend Josh and his Gazelle parser generator helped inspire me and point me toward other existing work. Compleat actually contains three parsers. The usage file parser and the input line tokenizer are built on the excellent Parsec library. The usage file is then translated into a parser that's built with my own simple set of parser combinators, which were inspired both by Parsec and by the original Monadic Parser Combinators paper by Graham Hutton and Erik Meijer. The simple evaluator for the usage DSL applies what I learned from Jonathan Tang's Write Yourself a Scheme in 48 Hours. And of course Real World Haskell was an essential resource for both the nuts and bolts and the design philosophy of Haskell.
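For readers who have not met parser combinators, here is a minimal sketch of the idea in Python. It is not Compleat's code (which is Haskell) and the names `token`, `seq`, `alt`, and `many` are my own choices for illustration. It follows the "list of successes" style from the Hutton and Meijer paper: a parser is a function from a list of tokens to a list of (value, remaining tokens) pairs, and an empty list means failure.

```python
# A parser is a function: list of tokens -> list of (value, remaining tokens).
# An empty list means failure; several entries mean several possible parses.

def token(expected):
    """Parser that matches one specific token."""
    def parse(tokens):
        if tokens and tokens[0] == expected:
            return [(expected, tokens[1:])]
        return []
    return parse

def seq(first, second):
    """Run `first`, then `second` on what is left; pair up the results."""
    def parse(tokens):
        return [((a, b), rest2)
                for a, rest1 in first(tokens)
                for b, rest2 in second(rest1)]
    return parse

def alt(left, right):
    """Try both alternatives and keep every successful parse."""
    def parse(tokens):
        return left(tokens) + right(tokens)
    return parse

def many(parser):
    """Match `parser` zero or more times, greedily (longest match only)."""
    def parse(tokens):
        values, rest = [], tokens
        while True:
            results = parser(rest)
            if not results:
                return [(values, rest)]
            value, new_rest = results[0]
            if len(new_rest) == len(rest):  # no progress, so stop looping
                return [(values, rest)]
            values.append(value)
            rest = new_rest
    return parse

# Hypothetical grammar for illustration: git (add | commit) followed by any number of -v
git = seq(token("git"),
          seq(alt(token("add"), token("commit")), many(token("-v"))))

if __name__ == "__main__":
    print(git(["git", "commit", "-v", "-v"]))
    print(git(["git", "push"]))
```

The first call succeeds with one parse and no tokens left over; the second returns an empty list because `push` matches neither alternative. The appeal of the style is that a grammar is built by composing small functions, so translating a usage description into a parser is mostly a matter of mapping each construct to a combinator.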
So besides producing a tool that will be useful to me and hopefully others, I also filled in a gap in my CS education, learned some great new languages and tools, and kindled an interest in several new (to me) research areas. It has also renewed my belief in the importance of "academic" knowledge to real engineering problems. I've already come across at least one problem in my day job that I was able to solve faster by implementing a simple parser than I would have a year ago by fumbling with regexes. And I'll be even happier if this inspires some friends or strangers to take a closer look at Haskell, Parsec, or any problem they've thought about and didn't know enough to solve. Yet.
This site is powered by Jekyll and based on styles and markup by Tom Preston Werner. Comments are run by Disqus and hosting is by Dreamhost.
Before starting this site, I had an Advogato diary for writing about software. I also have a personal journal (mostly interesting to my friends and family).
Last year I wrote about my backup strategy. Here are some new developments.
I'm still using rsnapshot for local backups, and duplicity for encrypted remote backups. I switched to the S3 backend for duplicity. (My meager data costs less than $0.10 per month to upload and store there.)
Things got a little more complicated when Sarah and I each got our own laptops. (Before that we shared a desktop computer.) I still use a git repository for my own home directory, but I wanted something simpler for files we share. And I want those files backed up even when we're away from our home network and storage server. Dropbox is the perfect solution. It syncs folders over the network, integrates perfectly into the Mac/Windows/Gnome desktop, and can share files publicly over the web or privately between Dropbox users. Most importantly, it has a simple way to view and restore previous versions of files that change. And (for my meager needs again) it's free!
The one part of my system that wasn't automated before was backing up data from various online apps. I wrote, "There's a business opportunity here for someone who can make this easier." In fact, I was seriously thinking about creating that business myself, but Backupify did it for me. They back up my data from Flickr, Twitter, Google Docs, Delicious, and GMail to my own S3 account. (They'll handle the S3 stuff if you don't have an account, but you can save money by registering your own.)
The reason I'm writing this now is that it's the last day of their blogging contest, so I can get some free stuff for telling them the feature or service I wish they'd support. Well, my number one service is LiveJournal (where I'm publishing this post). But more importantly, my number one feature would potentially allow them to support many services, even ones they've never seen.
Right now, each major web app has a different custom API for getting access to data. This means that Backupify doesn't support brand new or obscure services, because they need to write different code for each one. What I'd like to see is a standard for how services can use standard protocols like OAuth and the Atom Publishing Protocol to support export and backup of users' data. Then, as a web developer, I could support that spec and know that my users could employ existing software like Backupify to keep their data safe. There's an obvious chicken-and-egg problem here, but if Backupify could partner with a group like Google's Data Liberation Front to champion a new standard, then they could get some real momentum. (Google is a good candidate since they already support OAuth, and their GData API is based on APP.)
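To make the idea concrete, here is a minimal sketch of the backup side, assuming a service exposed a user's data as a plain Atom feed. No such export standard exists yet, so the feed contents and the `backup_entries` function are hypothetical; only the Atom namespace URI is real. Fetching the feed and the OAuth handshake are left out entirely.

```python
import xml.etree.ElementTree as ET

# The Atom namespace, in the {uri}tag form that ElementTree expects.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample data standing in for what a service might return.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example journal</title>
  <entry>
    <id>urn:example:1</id>
    <title>First post</title>
    <updated>2009-10-01T12:00:00Z</updated>
    <content type="text">Hello, world.</content>
  </entry>
</feed>
"""

def backup_entries(feed_xml):
    """Extract the fields worth archiving from each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "updated": entry.findtext(ATOM + "updated"),
            "content": entry.findtext(ATOM + "content"),
        })
    return entries

if __name__ == "__main__":
    for item in backup_entries(SAMPLE_FEED):
        print(item)
```

The point is that the same few lines would work against any service that published its data this way, with no per-service code.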

This Thursday, Oct 15, Metrix Create: Space will open its doors in Seattle (at 623A Broadway East). It’s hackerspace meets an indie coffee house. They’ll have tools and equipment for building projects, 3D fabbing machines, classes on various types of high-tech makery, coffee and snacks. They even have a vending machine that’ll dispense Sun Chips, M&Ms, Clif Bars, and Arduinos, breadboards, jumper wires, etc.
I live in the wrong city.
I live in the right city. And only a few blocks from this spot.
Command-line completion for Ruby-related commands under Bash: rake, gem, rails, ruby, jruby
Vastly improved vim's javascript indentation.
Have you heard of easy-git? It’s a porcelain with a much simpler interface. ([www.gnome.org])
Mercurial has the same “You can’t merge upstream changes into your local, uncommitted modifications” issue, and it’s also annoying as hell. I have yet to find a workflow that doesn’t piss me off. If git offered one, I’d almost certainly switch.
Stashing is not a great solution, either – unless un-stashing lets you do a nice 3-way merge – with all the same flexibility as a normal merge. Mercurial’s equivalent (shelve) sucks, because you end up running some external tool like patch – which leaves little unapplied patches around when a patch fails, instead of the traditional merge – which lets you see both versions of the code.
oh my, more people applying their old habits to new tools, there’s reasons for all of these things — and good ones
get over it
It looks like someone implemented my exact suggestion from comment #3 above:
Like awk, but for JSON.
I will never apologize for the United States — I don't care what the facts are... I'm not an apologize-for-America kind of guy.
I'm coming before you tonight about the Korean airline massacre, the attack by the Soviet Union against 269 innocent men, women, and children aboard an unarmed Korean passenger plane. This crime against humanity must never be forgotten, here or throughout the world.
[...]
This was the Soviet Union against the world and the moral precepts which guide human relations among people everywhere. It was an act of barbarism, born of a society which wantonly disregards individual rights and the value of human life and seeks constantly to expand and dominate other nations. --Address to the Nation on the Soviet Attack on a Korean Civilian Airliner.