Feeds

100971 items (93180 unread) in 19 feeds

Friends Friends
Build Build
Heads Heads
News News
fun fun

Items by josh

2 Star

Comments for Josh the Outspoken

2 Star

Josh

2 Star

Josh

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

2 Star

Josh

2 Star

Comments for Josh the Outspoken

2 Star

Josh

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

2 Star

Comments for Josh the Outspoken

Josh

2 Star

Comments for Josh the Outspoken

2 Star

Josh

  • Permalink for 'Josh/2008/11/16/habes___2008_11_16T09_34_00'

    habes @ 2008-11-16T09:34:00

    Posted: November 16th, 2008, 10:45am MST by Josh
    I went to an anti-prop-8 rally and parade today, and it was so awesome. It was just full of love and positive energy. For a bunch of people who just got a royal "fuck you" from the state of California, it was a surprisingly upbeat event.

    The weather was beautiful, and from what I hear was even more stunning in San Francisco, which also had a rally. God clearly voted today, but unfortunately s/he is not a citizen of the state of California.

    I climbed many trees and other city habitat to get good photos (feeling a bit like Zaccheus in the process), temporarily putting myself in more danger of falling than I normally would. But it all turned out ok.

    From Marriage Equality March


    From Marriage Equality March


    From Marriage Equality March


    From Marriage Equality March


    My favorite sign:

    From Marriage Equality March


    The only people I could find who weren't having a good time:

    From Marriage Equality March
  • Permalink for 'Josh/2008/10/23/habes___2008_10_23T09_32_00'

    habes @ 2008-10-23T09:32:00

    Posted: October 23rd, 2008, 10:37am MDT by Josh
    Check out my pictures from Phil and Mary's wedding. They turned out awesome, thanks in no small part to this snazzy $1500 lens that I rented for $30. I never could have gotten those low-light portrait shots without it.
  • Permalink for 'Josh/2008/10/09/habes___2008_10_09T10_54_00'

    habes @ 2008-10-09T10:54:00

    Posted: October 9th, 2008, 12:25pm MDT by Josh
    Whoa this is getting ugly from the Republican side:
    At rallies this week in Florida, crowds jeered and taunted members of the news media. One man hurled a racial epithet at a black television crewman, telling him, "Sit down, boy".

    Yesterday, for the second time in three days, a speaker at a McCain rally in Pennsylvania referred to Obama's middle name, Hussein, in an effort to cast doubt in his religion and background. Obama is a Christian.

    At the same rally, shouts of "terrorist" and "liar" could be heard following references to the Democratic candidate. On Saturday, McCain running mate Sarah Palin sought to link Obama to Ayers. "Kill him!" one man in the crowd shouted, not specifying who.

    --The Guardian
    Holy crap, it's the two minutes hate! It is sad and truly frightening that this is what the Republican party has come to. They have turned themselves into the party of xenophobia and hate.

    They are spreading insinuation about Obama's link to a 1960s "domestic terrorist" without recognizing that they are breeding a new brand of extremism themselves! If guys at their rallies are shouting "terrorist," "liar," and "kill him," how far are they from actually doing something violent? The McCain campaign is taking a scorched earth strategy here, which is dishonorable and wrong.

    One lesson I take from this is that no matter how decent a guy is when he's nominated to the Republican ticket, he'll never survive the experience with his decency intact. Because winning on the Republican side requires "mobilizing the base," and the base demands red meat. The base thrives on culture wars. The base has this need to find an "other" and demonize them.

    It's a dark moment on the Republican side of the aisle.
  • Permalink for 'Josh/2008/10/05/habes___2008_10_05T12_56_00'

    habes @ 2008-10-05T12:56:00

    Posted: October 5th, 2008, 2:10pm MDT by Josh
    Dear Sarah Palin,

    You want to talk about questionable associations? How about addressing a political party whose founder openly hates America, and who died buying plastic explosives on the black market?

  • Permalink for 'Josh/2008/10/03/video_candy'

    video candy

    Posted: October 3rd, 2008, 1:07am MDT by Josh
    Stop winking at me Sarah!!



    Yep, you really said all those things:


    Why does Senator Obama believe that McCain and Bush agree on almost everything?


    Finally, FOXNews attacks Obama with the same attacks they used on Kerry:
  • Permalink for 'Josh/2008/10/01/habes___2008_10_01T10_26_00'

    habes @ 2008-10-01T10:26:00

    Posted: October 1st, 2008, 11:36am MDT by Josh
    Check out this really interesting video from the New York Times about the debating styles of Sarah Palin and Joe Biden. It shows some highlights of their previous debates (Palin for Governor of Alaska, Biden in the primaries).

    One thing to take away from this is that despite her embarrassing performance over the last week or so, Sarah Palin has in the past been a formidable opponent who has a maddening way of being being cutting and aggressive in a way that viewers love. The LA Times article Underestimate Palin at your own risk, former rivals say comes to the same conclusion. That article ends with:
    "When you try to prove she doesn't know anything, you lose, because audiences are enraptured by her," Halcro said. "And her biting comments give you a sense of how competitive she is. Anybody who doesn't take her seriously does so at their peril."
    So playing the expectations game that the campaigns love to play, I'd say that we should expect to see an aggressive and in-control Sarah Palin, judging by her previous performances.
  • Permalink for 'Josh/2008/09/24/habes___2008_09_24T09_32_00'

    habes @ 2008-09-24T09:32:00

    Posted: September 24th, 2008, 1:04pm MDT by Josh
    Not quite two weeks ago I stepped on an airplane that would eventually (after two layovers) take me to Montpellier, France, a town on France's Mediterranean coast halfway between Spain and Italy. The occasion was the Renaissance Consort Workshop, a week-long course of singing Renaissance Polyphony one on a part. The course is hosted in a small town in the south of France called Roujan, which is not particularly close to anything, but is full of charm and is not at all catered to tourists. This part of France is solidly wine country, which is abundantly clear not only from the rows of grapes you see everywhere, but also the tractors that are frequently driving through the streets pulling carts full of harvested grapes behind them.



    Though the picture doesn't show it, the building where this tractor is depositing its grapes is the same building where the finished wine is being sold. This single building quite literally takes grapes out of a tractor's cart on one end and sells them as bottles of wine on the other!

    The house in Roujan where the course is hosted is called La Maison Verte, and it is owned by Anne and Fran who use it to host courses like the one I attended. La Maison Verte doesn't particularly stand out from the rest of Roujan when viewed from the street, but once you get inside you're transported into a sort of fairy tale.








    So the course is essentially a chance to spend a week singing some of the world's most beautiful music in a fairy tale setting with world-class musicians as your tutors.

    The tutors this year were Robert Hollingworth, Erik Van Nevel, and Francis Steele. Robert Hollingworth looked terribly serious to me in his publicity photo that I saw before I arrived, but within two seconds of meeting him I realized he was a quite charismatic and personable chap with a boyish charm that instantly puts people at ease. In every situation he had something amusing or insightful to say (and often both). I didn't get to work with him in sessions as much as the other tutors, but I did sing a Monteverdi madrigal in one of his sessions, which inspired me to give them another look -- they had never resonated with me before. Robert also led warm-ups, and really focused on getting rid of tension in our tongue and throat.

    Erik Van Nevel was full of passion; I'll never forget his coaching on "Christe, adoramus te" by Monteverdi. It is a relatively short piece that on the page doesn't look particularly dramatic, but Erik's interpretation of the piece had an extreme amount of emotion: one moment humble, the next desperate, the next pained. I am quite confident that even in the end we gave him only a fraction of the emotion he was asking for. Erik also offered insight about how to make sure our interpretations were stylistically appropriate to the music. Erik pointed out times when we were doing things that were musical, but using "tools" that were Baroque rather than Renaissance.

    And finally Francis Steele, who brings a deep reverence to the music and never-ceasing attention to the text. I have to admit that I've never given the text the care or attention it deserves in this music. I shame myself to say so, but in the past I've put forth the opinion that Latin serves primarily as a way to put a variety of vowels to polyphony. But Fran shows us all first-hand how much the music can improve when when you are sensitive to the text. He also demonstrated that resisting the urge to let the music "take off" volume-wise can make the music clearer and ultimately more enjoyable to sing.

    We all arrived on Saturday afternoon, a beautiful sunny day as nearly every day was. Anne directed us to our rooms so we could deposit our luggage, and I spent a while just wandering around with my jaw on the floor. My room was quite dark when I walked in, giving me the opportunity to bring it to life by opening the doors onto my own private balcony that looked out over the town of Roujan.




    We had our first singing session that night. We started with introductions, and I was somewhat surprised that the vast majority of the 17 attendees had attended the course before. I could already see that this was going to be a great experience, but it was still surprising that so many people were motivated enough to make the time and financial investment year after year. Nearly everyone was English, though some had been living in France for extended amounts of time. I was one of three Americans, and the only one currently living in America -- my trip from Seattle was by far the longest anyone had come for the course. However, to my surprise, the cooks (the exquisite, AMAZING cooks -- I'll get to that later) were originally from Seattle also.

    In our first session Fran led us all in Diliges Dominum by Byrd, an eight part piece of tall chords that move in a slow but steady half-note (or "minim" as I would concede to the Brits) pulse. I almost immediately acted in an instinct to add intensity to the piece by really leaning into rising lines, but Fran stopped us and said "don't try to come out of the texture." He pointed us to the text for the piece, which is "Thou shalt love the Lord thy God from thy whole heart, and with thy whole soul, and with thy whole mind. Thou shalt love thy neighbour as thyself." He argued that it is a corporal, corporate text that demands total togetherness. I should mention that my recounting of Fran's remarks cannot do justice to the way that he delivers them. He speaks of the music and the texts to which they are set with such sensitivity and humility that it inspires us to sing with an entirely different attitude toward the music than we might have walked through the door with.

    For me that opening session set a tone for the week: this was not going to be a week of powering through big pieces, it was going to be about sensitivity and awareness. These were lessons that were probably overdue for me as a musician: I've always been primarily motivated by melody, harmony, sonority, and musical brilliance, and haven't given text or meaning the attention it deserves. So this week was going to expand my horizons.

    After that session we had our first supper, made by our resident cooks who were a couple originally from Seattle but now living in France. I have to say that I was totally unprepared for what an effect the food would have on me. I eat out a lot and I definitely enjoy food, but food has never been a motivating factor in my life. At La Maison Verte I found myself eagerly awaiting every course, reveling in every new flavor. The sorts of things that foodies always say suddenly started making sense. I ate things that on paper I never would have imagined liking, like tomato and peach soup. The food was exquisite.

    The next several days followed a predictable format. There were three sessions of singing per day: two hours in the morning, one and a half hours in the early afternoon, and another hour and a half in the early evening. In each session we were divided into three groups among three rooms and three tutors. Everything was one on a part. Fran would lay out the schedule for these group sessions every morning, attempting to mix us up among all the singers, all the tutors, and all the rooms.

    The rehearsal spaces were a sitting room (the room where we ate breakfast), a small art gallery, and La Maison Verte's main rehearsal room. I had gone with some worry that I would find the rehearsal spaces frustratingly dry in their acoustic. What I found is that though they were certainly not reverberant like a church is, I never found myself frustrated about them. They didn't force one to work disproportionately hard. And they made our opportunity to sing in the church at the end of the week a great reward.

    Each evening at 6pm we would have a brief "sharing" of three or so of the nine pieces that had been sung that day. This was a chance to see what everyone else had been up to. I also much appreciated the repeated opportunities to "perform," since I have performance-related tension problems that manifest themselves even in the most informal sort of "performance." All in all I found that the course's itinerary worked very well. I even conceded once our day off came that it was good and necessary to let our voices rest for a day, even though I recoiled in horror when I first learned that we would have a whole day in the middle of the week without any scheduled singing!

    There is one part of the course's format that doesn't make it onto the itinerary but is nevertheless an absolutely essential part of the experience. You see, the last thing on the itinerary every day is supper at 8pm. What is not captured are the hours of talking, drunken singing, and general silliness that extend well into the morning, for as long as anyone will stay up. And one thing you can be sure of is that Fran will be the last one to retire, and until the very end will share with you a mix of silly banter and profound moments that it's hard to describe unless you know the guy. One night when there were only four of us remaining he whipped out a set of part books written in old notation, and we attempted to read through them despite the strange notes and unfamiliar clefs that can change from line to line! The next morning Fran tacked an extra event on the day's schedule: "12am: Josh lectures on singing from sixteenth century part books (with full illustrations by Josh)," which was especially devious given that I was clearly the most clueless about anything having to do with part book singing. But naturally by 12am that night I was suitably drunk that I heartily delivered a pseudo-intellectual treatise on the subject.

    Wednesday was our day off, and I was lucky enough to be able to tag along with a fellow participant who was taking a drive through the French countryside. She had an interest in Geology, and had marked on her map several places that had some geological significance. We stopped in several beautiful villages (again, none of which had any tourist bent whatsoever) and walked a ways up the Gorges d'Héric. On one of the more rural roads our way was blocked at one point by a cow and later by an entire herd of sheep that were being moved from one field to another. As I mentioned, I was initially disappointed that we had a day off, but in the end I was quite happy for the opportunity to see some of the surrounding area.

    The week culminated in a concert on Friday night at the town church, which was only a short walk from La Maison Verte. Despite the fact that we had been sharing every day, the concert was a great opportunity to have a bit more formal and prepared presentation of our music, and to sing for other people. The acoustic also felt very rewarding after the week of singing in smaller spaces.

    My week at La Maison Verte was fantastic, and I hope I get a chance to go back. The people were lovely, the tutors fabulous, the food exquisite, and the music deeply satisfying.
  • Permalink for 'Josh/2008/09/02/habes___2008_09_02T02_37_00'

    habes @ 2008-09-02T02:37:00

    Posted: September 2nd, 2008, 4:25am MDT by Josh
    Any fans of The Usual Suspects? I love this movie, but I'm always confounded by trying to piece together the plot given the ending.

    Ok, so Verbal Kint is Keyzer Soze. But there are stil so many questions I have about what really happened.

    1. Why did Keyzer/Verbal even get caught after the ship blew? He's such a bad-ass that he killed everyone on the ship and then blew it up, but he couldn't figure out a way to escape before the authorities arrived? Seems implausible.

    2. So Arturro Marquez ("The one guy who can identify Keyzer Soze") was the target of the whole operation on the ship. The story is that he was a smuggler who knew all about Keyzer Soze. He was arrested for drug trafficking in New York, but escaped to California where he was captured again. In California they were setting up extradition. He didn't want to go to jail so he named 50 names (including Keyzer Soze), but then escaped and fell into the hands of his own people, who decided to sell him into the hands of some Hungarians who wanted to stick it to Keyzer Soze.

    But his story doesn't really make sense. If he was going to be extradited (presumably to Argentina) then why is he doing a deal with the American authorities? Doesn't seem like that deal would do him much good once the extradition went through.

    3. Who killed Edie and why? Kujan thinks it was Keaton, but Keaton pretty clearly died on the boat. Verbal/Keyzer doesn't seem to have any reason to do away with her -- everything she knows about Arturro would be public record.

    4. How did the Hungarian who survived with burns know that the person he saw was Keyzer Soze?

    5. Who brought Edie onto the extradition case? It must have been someone related to Keyzer Soze (even if "Kobayashi" was not his real name) -- it couldn't have just been a coincidence. Whoever it was, can't Kujan use that connection to go straight to Soze?

    I'm sure the real reason is "relax, it's a movie." But I want everything to make sense in the end!
  • Permalink for 'Josh/2008/08/15/habes___2008_08_15T16_17_00'

    habes @ 2008-08-15T16:17:00

    Posted: August 15th, 2008, 5:18pm MDT by Josh
    To add to the thoughts about the Georgia/Russia thing, here's a video on YouTube where FoxNews interviews South Ossetians who take over the time slot by condemning Georgia and thanking the Russians.

    12-year-old girl: "but before I say anything else I just want to say that I was running from Georgian troops bombing our city, not Russian troops. I want to say "thank you" to the Russian troops that were helping us out."

    Not saying this thing is simple or one-sided, but there definitely is another side to this.

    Sarah, did you get in touch with your Georgian friend?
  • Permalink for 'Josh/2008/08/13/habes___2008_08_13T13_02_00'

    habes @ 2008-08-13T13:02:00

    Posted: August 13th, 2008, 2:24pm MDT by Josh
    What the heck is going on with our American response to this Russia/Georgia thing? Now maybe I'm just getting bad/biased information from the Wikipedia article on the South Ossetia war, but to me the situation is:

    * South Ossetia is a part of Georgia that is ethnically different from the rest of Georgia.
    * South Ossetia has been de facto independent from Georgia since the early 90s (though not internationally recognized)
    * South Ossetia overwhelmingly favors independence from Georgia, as evidenced by a 2006 referendum in which 99% of registered voters voted for independence with voter turnout at 95.2%.
    * after escalating tensions and small-scale fighting, Georgia launches a full-scale attack against the South Ossetia capital in the middle of the night.
    * Russia retaliates by pushing back the Georgian invasion and attacking Georgia.

    This is a complicated situation, but all we as Americans are offering in our media and from our leadership is a simplistic "evil Russia invades democratic Georgia" story. Our leaders and our presidential candidates give tough talk against Russia, without acknowledging the fact that Russia was retaliating against a Georgian invasion and attack against a city.

    At first I thought I was just crazy and missing something major, but apparently apparently I'm not the only one who feels this way.

    Geez, the last thing we need is to introduce more tension in our relationship with Russia. How would we react if Cuba or Venezuela invaded the Bahamas? You'd better believe that we'd not only drive them out, we'd launch strikes against military targets in their home countries. And how cynical would we be at that point if Russia spun it as "US aggression"?

Comments for Josh the Outspoken

2 Star

  • Permalink for '2_Star/2008/08/11/A_syntactic_dilemma__and_an_intro_to_Gazelle%e2%80%99s_ambiguity_resolution_'

    A syntactic dilemma (and an intro to Gazelle’s ambiguity resolution)

    Posted: August 11th, 2008, 1:55am MDT by josh

    As I’m adding more capabilities to the Gazelle grammar description language, I’ve come up against a problem for which I can’t find any solution I like. So I’m putting it to you, my esteemed readers, to suggest a solution I will like more than the ones I’ve already brainstormed.

    Here’s the problem. I’m at a point where I want to support prioritized choice in Gazelle grammars. Prioritized choice is a way of letting grammar writers explicitly resolve ambiguities that have crept into their grammar. Take the following grammar rule:

    s -> "X" | "Y";

    The two alternatives in this rule (X and Y) have equal priority. That doesn’t actually matter in this case, since no text can match them both, but now consider changing the rule a bit:

    s -> "X" | "X";

    Now there is a problem, because an X matches both alternatives. There are two legal parse trees that are both correct according to the grammar. As a result this grammar is ambiguous. A parser doesn’t have any reason to choose one over the other, because they are of equal priority.

    The solution to the problem is to let the user specify which alternative should be taken when both match. If we allow users to do this by introducing prioritized choice, our solution looks very much like how Parsing Expression Grammars (also known as PEG’s) work. And the syntax that has become very standard in the PEG literature is to use “/” as the operator for prioritized choice. So our ambiguous grammar would be made unambiguous by using prioritized choice:

    s -> "X" / "X";

    Now the parser will always choose the first alternative, and the grammar is no longer ambiguous. You might wonder “what’s the point?”, since the second alternative will never be taken, but in many other cases of ambiguity, the choice of lesser priority will be taken. The classic example of ambiguity (which can be resolved in this way) is if-then-else:

    stmt -> "if" expr "then" stmt |
            "if" expr "then" stmt "else" stmt |
            expr;
    expr -> "true" | "false" | /[0-9]+/;

    This grammar is ambiguous, because the fragment:

    if true then if true then 1 else 2

    …can be parsed in two different ways. The “else” could be assigned to either “if” — both are valid according to the grammar.

    So I definitely know I will be introducing prioritized choice. The question is what syntax to use. Like I mentioned before, I would really really like to use “/” since there is a strong convention from the PEG world for it. But unfortunately this introduces a major ambiguity in Gazelle’s own grammar, because it conflicts with the syntax for regex literals. What does this mean?

    s -> a / b;
    a -> c / d;

    Is that two rules that both use prioritized choice, or is it one rule that has a giant regular expression in it?

    s -> a / b;
    a -> c / d;

    I really don’t want to give up using slashes to introduce regular expression literals, because that is a really strong convention that a lot of programmers are familiar with. But I also really don’t want to give up using the really strongly established convention of using slash for prioritized choice.

    Anyone have an idea for a compromise I might like? The only one I can think of is adding a prefix to the regex literals sort of like Perl’s m{} form for regular expressions. It could be something like m/regex/. But I hate to add that uglification to the grammar, especially since regular expressions are going to be SO much more common than prioritized choice.

    Any ideas, anyone?

    Bonus: the other form of ambiguity resolution

    Choice/alternation isn’t the only possible source of grammar ambiguity. Gazelle’s constructs for optional and/or repeating grammar fragments can also be ambiguous. The if-the-else example is perhaps more idiomatically written in Gazelle like so:

    s -> "if" expr "then" stmt ("else" stmt)? | expr;

    It’s written differently, but it’s just as ambiguous as before. So we need the same sort of tool to resolve this ambiguity — a way to say “prefer to match the optional part” or “prefer to NOT match the optional part.” This is equivalent to defining a “greedyness” to these operators like Perl lets you do in regexes. The syntax I am planning for these is:

    s -> a b?; // no preference to match the b or not
    s -> a b+; // no preference to keep matching b or not
    s -> a b*; // no preference to keep matching b or not

    non-greedy variants
    s -> a b?-; // prefer to NOT match b (non-greedy)
    s -> a b+-; // prefer to STOP matching b (non-greedy)
    s -> a b*-; // prefer to STOP matching b (non-greedy)

    greedy variants
    s -> a b?+; // prefer to KEEP matching b (greedy)
    s -> a b++; // prefer to KEEP matching b (greedy)
    s -> a b*+; // prefer to KEEP matching b (greedy)

    I’m not following the Perl convention here (Perl uses “?” instead of “-” to make the match non-greedy). But on the other hand:

    1. Perl doesn’t have a way to explicitly make the match greedy — matches are greedy by default. So it has no convention for a “make greedier” operator.
    2. not very many people know/use Perl’s greediness operator anyway

    So to write the if-then-else example with explicit ambiguity resolution, you would write:

    "if" expr "then" stmt ("else" stmt)?+ | expr;

    …because when there is an ambiguity, we want to bind the else to the most recent if.

    An aside: “great, so I can just start slapping greedy and non-greedy modifiers on everything, just to be safe, right?”

    Not so fast there. Ambiguity resolution isn’t something you should take lightly, because a grammar with an ambiguity in it presents a user-facing irregularity in your language. For example, the if-then-else example is a user-facing issue: either “else” pairing makes logical sense, and it’s totally arbitrary how you decide to resolve the ambiguity. You have to tell your users about it. You should strive to absolutely minimize the numbers of such ambiguities in your language whenever you can!

    The danger of using ambiguity resolution operators willy-nilly is that you don’t necessarily know if they’re actually resolving ambiguities. For example, if you wrote:

    s -> "X" / "Y";

    …the prioritized choice here is completely extraneous, since this grammar is unambiguous already. Because ambiguities are so important to keep track of and educate your users about, you don’t want to have to wonder whether a prioritized choice is resolving an ambiguity or not. You want to know that it is.

    To this end, Gazelle will keep track of all ambiguity-resolution operators and error if any of them are never actually used to resolve an ambiguity. So you can’t just sprinkle them around “just to be safe,” you must only use them where they are effective. And this will make Gazelle grammars much more useful, because you will know exactly where all your ambiguities are.

    By the way, Gazelle can’t detect any ambiguity in your grammar; calculating whether an arbitrary context-free grammar is ambiguous or not is undecidable. Basically, all grammars fall into one of several categories:

    1. Gazelle can handle the grammar (ie. the grammar is in the green region of the diagram in my previous entry).
    2. Gazelle can handle the grammar except for the fact that it’s ambiguous.
    3. Gazelle cannot handle the grammar, and would still not be able to even if you resolved the ambiguities.

    If your grammar falls into category 3, Gazelle doesn’t know whether the grammar is ambiguous or not. But for grammars in categories 1 or 2, Gazelle knows exactly where the ambiguities are.

    Bonus #2: a word about Parsing Expression Grammars

    Something you might say (especially if you’re a PEG fan) is “gosh Josh, if you’re going to all this work to introduce ambiguity resolution, why not just use PEG’s flat-out and abandon all this LL(*) nonsense?” Because essentially what I’m doing is supporting both PEG-based constructs (like ordered choice) and context-free grammar constructs like non-ordered choice. And PEG’s support a larger set of grammars than Gazelle ever will (I believe they support any non-left-recursive PEG, though I don’t have a reference for this).

    There are two really major reasons why I don’t think PEG’s are the way to go. The first one is related to the point I was making before. With PEG’s all choice is prioritized choice. However, with PEG-based tools you have no idea if there are real ambiguities in your grammar or not. PEG’s are defined such that ambiguity does not exist, but this does not make if-then-else style issues go away, it just sweeps them under the rug. Even if you don’t call it “ambiguity,” it is still a user-facing issue that you as a language designer definitely want to know about! If you have the following in a PEG:

    s -> a / b;

    …you have no way of knowing if this represents an ambiguity, which means that you don’t know if changing the grammar to:

    s -> b / a;

    …will make a substantive difference or not! You’re flying blind.

    The second reason is that Gazelle’s parsing algorithm will be far, far faster than packrat parsing (which is the algorithm used to parse PEG’s). Both algorithms will have linear asymptotic complexity (though as I briefly mentioned before, I believe that LL(*) grammars can degrade to n^2 in degenerate cases; though I will argue that this will never happen in any sane language). But though both are linear, packrat parsing’s constant factor is far higher, and packrat parsing uses O(n) space where n is the size of input.

    This might all sound like hot air, but check out these real, hard numbers. On the homepage of the Aurochs parser generator, which uses PEG’s and packrat parsing to parse JavaScript, they say:

    How much memory does it use?

    An awful lot.

    Grammar Input size Memory consumption CPU time Memory overhead
    Javascript 140kb 380Mb 1s 2700
    Javascript 71kb 180Mb 0.45s 2500
    Javascript 1717 5.6Mb 0.01s 3200

    I can’t vouch for whether this is a great implementation of packrat parsing, but the authors sure seem to know what they’re doing. I hope you’ll agree that 380Mb of memory just to parse 140kb of text is horrendous, and 1s is quite slow too. Gazelle can’t parse JavaScript yet, but here is a rough ballpark estimate of what the performance will be like to parse 140kb of text once it can:

    • memory: Gazelle uses 24 bytes per entry on its stack, plus 12 bytes for every token of lookahead. These are the only non-fixed costs. If we assume a parse tree that goes 100 deep, and an input that requires 100 tokens of lookahead (both ridiculously large assumptions), Gazelle’s memory footprint is 3.6kb. This is 0.0009% of the packrat parser’s memory requirement.
    • CPU: Gazelle currently parses JSON at 18.8Mb/s. JavaScript will be more complicated than JSON to parse, but Gazelle will also undergo a lot of optimization that hasn’t been done yet, so I’ll go with the 18.8Mb/s figure. That’s 134x faster than the 140kb/s Aurochs is quoting.

    Like I said, the Aurochs authors seem to be very smart — I’m not saying they’ve done a bad job. I’m just saying that if Aurochs is any indication of what we can expect from packrat parsing, it is orders of magnitude slower, and takes enormous amounts of memory even for modest amounts of input text. If anyone has better numbers to show for packrat parsing, I’m all ears.

    So even though it’s more work, I’m 110% convinced that using this combination of LL(*) with explicit ambiguity resolution is the way to go, and is the best way to achieve the goals that I have set out for this project.

  • Permalink for '2_Star/2008/08/07/Clarification_and_apology'

    Clarification and apology

    Posted: August 7th, 2008, 3:53am MDT by josh

    Terence Parr (author of ANTLR) pointed out a mistake I made when I was describing how ANTLR deals with non-LL(*) grammars. I said that ANTLR’s option for dealing with non-LL(*) grammars is to do a depth-first search which runs in exponential time. While I believe this is true if you enable “backtrack=true” alone, ANTLR also has a “memoize=true” option that you can use together with backtrack=true to reduce the time complexity to linear, at the expense of more space.

    I also claimed that ANTLR can handle all grammars when backtrack=true is on. Terence corrected me and said that even with backtrack=true, ANTLR does not handle all grammars — for example left-recursive ones. I mistakenly thought that depth-first search parsing would be capable of handling left-recursion, but after some more thought and some time with my trusty Grune and Jacobs I understood why this is not the case. There are other top-down algorithms that can handle this, but backtracking depth-first search (with or without memoization) is not one of them.

    Apologies to Terence for my misstatements about ANTLR!

  • Permalink for '2_Star/2008/08/06/The_grammars_Gazelle_can_handle'

    The grammars Gazelle can handle

    Posted: August 6th, 2008, 1:57pm MDT by josh

    I wanted to follow-up a bit more on my recent posts about the grammars Gazelle and competing parsing frameworks can handle. Here is an Euler diagram showing my current understanding of what grammars Gazelle can handle compared with the well-known Strong-LL language classes as well as competing parsing frameworks.

    GrammarClasses.png

    Gazelle can currently generate lookahead for any LL(k) grammar, all LL(*) grammars that ANTLR can handle, and a few LL(*) grammars (namely tail-recursive ones) that ANTLR cannot. ANTLR can handle all LL(k) grammars and many LL(*) ones. SLK can handle only LL(k).

    Since my last entry, I gave Gazelle the capability to correctly generate lookahead for all LL(k) languages, as long as the user specifies a maximum k. This is in-line with what SLK and ANTLR do, since the problem is undecidable in general. By default I still use my heuristic that I think will detect in most cases whether a grammar will be LL(*) or LL(k) without making the user specify a maximum k, but this heuristic will be foiled in some cases. So if the user really thinks their grammar is LL(k) even though it failed the heuristic, they can specify a maximum “k” and Gazelle will keep searching until it hits that “k”.

    Just in case I wasn’t clear enough in my last entry, I strongly dispute the claims on the SLK website that SLK is the only framework that does true Strong-LL(k) analysis. As you can see from the diagram, both ANTLR and Gazelle can handle not only LL(k) but a superset thereof. Current and potential SLK customers: don’t be fooled by SLK’s claims! They haven’t been true for years — ANTLR has been able to handle LL(k) for a long time!

    Here are some examples of grammars that fall within the different regions of that Euler diagram:

    Strong-LL(*) but can’t be handled by Gazelle or ANTLR:
    s: a | b;
    a -> "(" a ")" | "X";
    b -> "(" b ")" | "Y";

    Can be handled by Gazelle but not ANTLR:
    s -> a "X" | a "Y";
    a -> ("Z" a)?;

    Can be handled by ANTLR but not SLK (and is not Strong-LL(k)):
    s -> a "X" | a "Y";
    a -> "Z"*;

    A note about generalized parsing

    When I have talked about what grammars ANTLR can “handle,” I’m specifically talking about the grammars it can generate LL(*) lookahead for, without the help of syntactic or semantic predicates. Put more plainly, these are the grammars that ANTLR can parse efficiently and without any hints from the user.

    Why the caveats? Well you can put ANTLR and comparable tools in a mode where it parses by depth-first search, using backtracking to abandon search attempts that did not work out. This is equivalent to what most regexp implementations (Perl, Python, Ruby, etc) do, and as Russ Cox has demonstrated this technique has severe (exponential) worst case performance. If you want to trade space for time, you can memoize this depth-first search and get linear time at the cost of some space. While I think these techniques are totally legitimate as a way of expanding the grammars the tool can handle, I personally am singularly focused on parsing in linear time with very low space overhead. I am doing everything I possibly can to parse real languages (everything from JSON to C++) without going outside these bounds. That’s what’s going to make it practical to use Gazelle grammars for things like syntax highlighting in text editors.

    Secondly, ANTLR has features that let the user help the parser out with its decisions, either by writing Java code (semantic predicates) or by specifying syntax fragments that the user promises will properly disambiguate alternatives (syntactic predicates). It’s always useful to have more tools of this sort, and Gazelle will certainly include some with time, but again this is an apples to oranges comparison.

    Finally, Adrian Thurston (author of the popular Ragel state machine compiler) mentioned on one of my previous entries that there exist general (eg. can handle any grammar) algorithms for top-down parsing, which are automatically more powerful than Gazelle. While this is true, none of them are what I consider practical. Gazelle’s time/space complexity are:

    Time: linear
    (actually LL(*) could probably become quadratic in degenerate cases, but LL(k) is definitely linear and LL(*) is too, for all intents and practical purposes).

    Space: max stack depth + max lookahead depth
    Both can be capped. In a bit more detail:

    • The maximum stack depth represents the nesting level (for example, number of nested parentheses).
    • The maximum lookahead depth represents the number of tokens we have to look ahead in the worst case to disambiguate alternatives.
    • Input text that doesn’t respect the maximum depth limits will be rejected (even if it is valid according to the grammar), but that’s the only way to prevent malicious input text from consuming arbitrary amounts of memory. Clients that don’t mind getting DoS’d can just set these limits to unlimited.

    No general method can approach this space/time complexity. And I think it is for this reason that no tool (AFAIK) which is aimed at practical use implements a generalized top-down method.

    I’d like to post a slightly more detailed run-down of the generalized methods Adrian mentioned when I have the time to take a closer look at them.

    So to make a long story short, I stand by my previous claim that Gazelle is (at least briefly) the most powerful practical top-down parsing framework there is.

  • Permalink for '2_Star/2008/08/04/SVG_support_in_Firefox_3'

    SVG support in Firefox 3

    Posted: August 4th, 2008, 11:33am MDT by josh

    Whoa! Am I the only one who didn’t get the memo that suddenly Firefox has really impressive SVG support? Check out these demos and be amazed at what you didn’t think your browser (sans Flash) could do.

    Is there any more point to the HTML <canvas> element?

    Does anyone know if this notion of a “compound document” is standardized? Eg. where I can have an HTML page with an SVG snippit embedded, and I can manipulate it from JavaScript as one big DOM? I know this works in Firefox (as the above linked-to examples demonsrate), but will other browsers that add SVG support work the same way? If so, the standard HTML+CSS+JavaScript platform just got a *lot* more capable.

    There is the capability for doing animation (as this demo illustrates) by using JavaScript’s setTimeout or setInterval to run some JavaScript code periodically that updates the DOM. This seems to work decently, though you have to do all the math yourself and the animation can be a little choppy (since it’s all JavaScript). It appears that SVG also defines animations that you can specify declaratively which seems like it would both let you express animations at a higher level (”set this element into motion based on a spline with these key points”) and would probably get you smoother results since it the animation would be implemented at the browser level instead of JavaScript. But Firefox doesn’t currently support any of the animation module.

  • Permalink for '2_Star/2008/07/29/A_partial_retraction_to_my_last_entry'

    A partial retraction to my last entry

    Posted: July 29th, 2008, 2:25am MDT by josh

    From a theoretical standpoint, my last entry was incorrect when I said that my new algorithm in Gazelle can decide whether a grammar is strong-LL(k) or strong-LL(*). I had proofs against me — these problems have actually been proven undecidable.

    Once I heard this result, I spent hours trying to think of a grammar that is strong-LL(k), but that Gazelle will say is not. I am fairly certain that Gazelle’s algorithm will always terminate, so if the proof is to be believed, there must be a grammar that Gazelle falsely claims is not strong-LL(k) or strong-LL(*).

    After many hours, I came up with one. Here it is:

    s -> "X"* "Y" "Y" "Z"| "X" c;
    c -> "Y" c "Y" | "Q";

    This grammar is strong-LL(5), but Gazelle falsely claims it is not strong-LL(k) for any k. ANTLR can handle it (as long as the recursion depth is set high enough).

    I’m thinking, though, that this isn’t really a problem. My intuition says that it will be rare that anyone actually tries to write a grammar that fails in this way — after all, it took an evening of specifically trying to exercise this case before I could find a grammar that did. I don’t think real grammars will run into this problem.

    So even though I can’t handle all of strong-LL(*), I think I can handle the grammars that will actually come up in practical use, and I still have the extremely desirable property of being able to decide whether a grammar fits or not without forcing the user to tweak recursion depth constants. So although my result isn’t as theoretically significant as I thought, I think that from a practical standpoint I’m still in good shape: I have a terminating algorithm that can handle a very useful set of grammars.

Comments for Josh the Outspoken

  • Permalink for 'Comments_for_Josh_the_Outspoken/2008/07/28/Comment_on_Gazelle_is__at_least_briefly__the_most_powerful_top_down_parser_generator_there_is_by_josh'

    Comment on Gazelle is (at least briefly) the most powerful top-down parser generator there is by josh

    Posted: July 28th, 2008, 11:12am MDT by josh

    @maetl: thanks! It’s funny, when I started writing Gazelle, I believed pretty strongly in that mantra (”parsing and grammars are amongst the most studied and well understood areas of computer science”). I didn’t expect to invent any new algorithms, I mainly just wanted to take existing algorithms and give them different packaging.

    But before long I was trying to figure out if I could improve on the properties of those algorithms that made them less that ideal from a user perspective. I studied the documentation from existing tools and looked at the cases they said they couldn’t handle, and tried to figure out if I could handle them. And before I knew it, I had thought of a new algorithm!

    @Kay: thanks! I agree that having the tool be able to tell you definitively that it is not LL(k) or LL(*) will make things much more user-friendly. I could still use a bit of work on error messages that give you every bit of information about *why* it was not LL(k) or LL(*). In the future I should be able to draw pretty graphs that make the problem obvious. That will take a bit of work, but it’s totally doable, and I think will make for a compelling user experience.

2 Star

  • Permalink for '2_Star/2008/07/27/Gazelle_is__at_least_briefly__the_most_powerful_top_down_parser_generator_there_is'

    Gazelle is (at least briefly) the most powerful top-down parser generator there is

    Posted: July 27th, 2008, 7:27pm MDT by josh

    On Wednesday when I was sitting at home waiting for the cable guy, I thought very hard about top-down (LL) parsing and lookahead generation. After a few hours of thinking I had an epiphany where I very rapidly discovered several major results that I do not believe have ever been discovered before.

    I have implemented several (but not all) of these discoveries in the current Git version of Gazelle, and as far as I know, this makes Gazelle currently the most powerful top-down parser in existence.

    (One caveat for the above statement — when I say “powerful” I am referring to the set of grammars it can successfully generate lookahead for. Other parser generators still have features that Gazelle does not have, like syntactic and semantic predicates. And alas, Gazelle still lacks a way to ignore whitespace until the time that I redesign this capability in a way that I am happy with).

    As far as practical and powerful top-down parser generators go, I am only aware of two serious contenders. In this context, “serious contenders” means “can handle LL(k) for arbitrary k or better.” They are:

    • SLK, a commercial LL(k) parser generator. I dare not download or try it out, because I don’t want to be in any way entangled with their license agreement (which prevents reverse engineering) or tainted by what I learn about it.
    • ANTLR, which is a popular open-source parser generator written in Java. Terence Parr pioneered the LL(*) technique on which Gazelle builds, and Terence deserves a lot of credit for laying the groundwork in this field. It would have taken me a lot longer to get Gazelle where it is if not for his work.

      As a side note, Despite what the tough talk on the SLK homepage says, ANTLR has been LL(k) for years (LL(*) is a superset of LL(k)).

    Here are the key discoveries I made while the cable guy was assisting other customers. Apologies if some of this material is gibberish to people who aren’t deep in the field — I’ll attempt to summarize the high-level impacts of each discovery.

    Computing whether a grammar is LL(k) or LL(*) is decidable, without specifying a maximum k!

    What does this mean, in practical terms? Today, when you use either SLK or ANTLR, you have to specify how long they should keep searching for correct lookahead before they give up. For SLK (which supports only LL(k)) you specify a -k command-line argument which is the maximum number of lookahead tokens to consider before giving up. For ANTLR which supports both LL(k) and LL(*), you specify both a -k and a maximum recursion depth constant. (Correction from Terence Parr).

    Again, what does this mean? As a user of SLK or ANTLR, you have to choose an arbitrary number for these parameters. If you choose too low, you’ll make them give up before they find the answer. If you choose too high, you’ll have to wait longer before they fail in cases where the grammar is simply not LL(k) or LL(*). This is annoying and unintuitive for a user.

    What I discovered is that this problem is decidable: you can write an algorithm that will always terminate that either finds the lookahead successfully or tell the user that the grammar is not LL(*) (and why). This is much better for users, because you don’t have to worry about those constants and have to worry that you set them too low. Big win for users.

    Tail recursion is a-ok

    ANTLR’s algorithm currently cannot handle rules of this sort:

    s -> ("X" s)?

    This is a simple tail-recursive grammar that matches any number of X’s. ANTLR balks at this, but if you apply a tail-recursive optimization very much like the one implemented in imperative languages (don’t bother making a new stack frame when you re-enter the tail-recursive rule), a top-down parser can handle this just fine. Which is what Gazelle does.

    This isn’t a huge deal, since the grammar is better written:

    s -> "X"+;

    But it’s nice to be able to handle more cases.

    LL(*) can be extended to support full-LL(*) parsing

    This probably won’t make much sense to anyone who’s not deep in this field. At a high level, full-LL supports more grammars than strong-LL (which is what everyone means when they say “LL”), but full-LL parsing has not ever been considered practical. I haven’t tried it out yet, but I’m pretty sure I have a way of extending LL(*) to support full-LL(*) while still being practical and efficient. Bottom line: again, we could support more grammars. For example, here is a grammar that is full-LL(2) but not LL(2):

    s -> "A" a "A" "A" | "B" a "B" "A";
    a -> "B"?;

    Will full-LL be that important in practice? I’m guessing not. But from a theoretical perspective this result is important because I believe it will be the first practical, efficient method anyone has ever proposed for parsing full-LL.

    LL(*) could handle even more grammars by using regular supersets of nonregular lookahead

    LL(*) currently can only handle situations where the part of the grammar directly following a decision is regular. In some cases, that part of the grammar is not regular because it counts (for example, nested parentheses), but it doesn’t matter because all you’re really trying to do is skip past that part to a distinguishing token. For example, this grammar is from “The Definitive ANTLR Reference”:

    se -> e "%" | e "!";
    e -> "(" e ")" | "ID";

    In this grammar, “e” is nonregular because it matches parentheses. Regular expressions cannot count and therefore cannot match parentheses. However, if we’re trying to make a prediction at the beginning of “se”, all we’re really trying to do is skip past all that junk to get to either a “%” or a “!”. A regular expression would suffice — something like /\(*ID\)*/. The regex doesn’t count, but it skips past all those nested parentheses just fine. If the parentheses aren’t properly nested, we’ll figure that out later.

    What this means for Gazelle

    I made all these discoveries on Wednesday. On Saturday I had a day of mind-blowing productivity and implemented the first two of them, in addition to fixing all the previously-existing shortcomings that 0.2 had. So like I mentioned, Gazelle is currently (as far as I know) the most powerful top-down parser generator there is. Specifically, no other existing top-down parser generator can, at the moment:

    • handle tail-recursive grammars
    • tell you definitively when a grammar is not LL(k) or LL(*) (others make you tweak search depth constants)

    A few days ago I emailed Terence Parr (the author of ANTLR) telling him about these findings, and I expect that he will probably add these enhancements to ANTLR until he has feature parity with Gazelle. But I just wanted to take a moment to enjoy that I made a significant discovery in an important field. The second of the above points especially is just huge.

    As one final closing remark, I wanted to mention a grammar that the SLK documentation presents to boast its power. It is an LL(27) grammar with 26 terminal types. Here is what they have to say about it:

    The following grammar is LL(27), and has an alphabet of 26 terminals. So the possible number of lookahead strings is 2627, or about 160000000000000000000000000000000000000. SLK can analyze this grammar and construct a parser for it in about one second.

    [snip grammar]

    The -y option needs to be used because the grammar is in yacc syntax. And -k=27 or larger also needs to be specified. The output of SLK for this grammar is large, so it should be redirected to a file.

    Just to demonstrate that Gazelle can handle this, I checked this grammar in Gazelle format into sketches/ll27.gzl in the source tree. If you run gzlc on it with the --no-minimize-rtns option (minimization is a nice feature that is convenient for real grammars but too inefficient for this diabolical grammar), Gazelle generates correct lookahead in half a second (and this is totally unoptimized code written in Lua, a byte-code interpreted language), doesn’t require you to specify a -k=27 option, and the entire output is about 5k.

    I normally wouldn’t seek to put other parser writers down, but the SLK web page rubs me the wrong way when it falsely claims that:

    SLK is the only LL(k) parser generator available. All others that claim to be LL(k) are not really LL(k) by any of the classical definitions of LL(k) found in the literature. SLK does a full LL(k) analysis of the grammar. The proprietary SLK algorithm is the only known near-solution to this NP-complete problem. Other deterministic algorithms must either limit k to a very small value, usually two, or use approximate lookahead methods that are very weak imitations of LL(k).

    As I mentioned before, this has not been true for a while because of ANTLR, and it is even less true now with Gazelle, which is strictly more powerful than SLK.

    Update: I made a small correction that I received from Terence Parr. He also says he’s pretty sure that determining the k for an LL(k) language is undecideable, so we’ll see, maybe there’s something I’ve missed. I feel pretty confident about my solution, but I certainly could be wrong.

  • Permalink for '2_Star/2008/07/26/Good_experience_with_Comcast'

    Good experience with Comcast

    Posted: July 26th, 2008, 10:35am MDT by josh

    There’s a story on Slashdot right now about how Comcast is reading blogs that mention them as a way to try and improve their image among customers. I guess Comcast has a generally bad reputation when it comes to customer service.

    I just wanted to mention that I personally had a good experience with Comcast this week (not perfect, but quite good). I think that if a company is really trying to turn itself around and provide good service, they should be recognized for it, so this is my part in doing so.

    This week I needed to move my internet service from my old apartment into my new condo. I called Comcast customer service on Sunday at 6PM (+1 for having 24/7 customer service) to schedule an appointment. The 800 number made me choose 2 or 3 menu options (+/-0) after which I talked to a human being almost immediately (+1). The person I talked to was friendly and helpful (+1), though the first available appointment wasn’t until Wednesday 8-12 (+/-0). The cs rep gave me a discount on the visit which is apparently $100 and cut it down to $10 for me (+1!).

    On Wednesday I stayed home from work from 8-12 waiting for the rep. I didn’t wake up until about 8:30am and then I took a shower, so by nine I was slightly concerned by the possibility that I could have missed the field tech. I just didn’t want to find at noon that I had waited four hours in vain and would have to stay home from work for another four hours on a different day. So I called the 800 number again, again was able to talk to a rep almost immediately, who again was friendly and helpful and able to answer my question right away — they could confirm that the tech hadn’t visited me yet (+1). So I waited at home until noon.

    By noon the field tech still hadn’t arrived (-1). I called the 800 number, again almost immediately talked to someone friendly and helpful, and the rep said they’d contact the field agent and have him call me. 10 minutes later the field agent calls me and says he’ll be there in 5 minutes. 5 minutes later he’s at my doorstep.

    The field agent was also friendly and helpful (+1) — he was happy to explain a lot about what he was doing, he took the time to do the job right, and he even ran speakeasy’s bandwidth test on my computer once he was done. When he left he gave me his card and said I could contact him with any problems.

    So it looks like Comcast is really doing the right things to turn their image around. And I think that should be recognized (at least for as long as the level of service is like what I experienced this week).

  • Permalink for '2_Star/2008/07/17/416_Random_People_with_RoR_on_their_resume___Reply_All___Reverse_Flash_Mob'

    416 Random People with RoR on their resume + Reply All = Reverse Flash Mob

    Posted: July 17th, 2008, 11:22pm MDT by josh

    The awesomest thing ever happened to me today. I was apparently one of 416 lucky people who looked promising to a probably-clueless recruiter who’s looking to hire for a Ruby on Rails job.

    Date: Thu, 17 Jul 2008 21:57:59 -0500
    From: “Pradipta [lastname]”
    Subject: Ruby on Rails position…
    To: [416 people]

    I have a couple of Ruby on Rails position, wanted to know if you are interested?

    Max [lastname]
    Technical Recruiter

    Of course, as the number of people on a thread increases, the chances that one of them will hit “reply all” approaches 1 asymptotically:

    Date: Thu, 17 Jul 2008 20:10:37 -0700
    From: Zack [lastname]
    To: [416 people]
    Subject: Re: Ruby on Rails position…

    What are the positions?

    Which triggers the absolute inevitability that there will be follow-ups about how people should stop hitting “reply all.”

    Date: Thu, 17 Jul 2008 22:15:39 -0500
    From: “Mark [lastname]”
    To: “Zack [lastname]”
    Cc: [416 people]
    Subject: Re: Ruby on Rails position…

    this is fun. it’s like a message thread i did not subscribe to. please no more ‘reply all’s. thanks.

    Best Regards,

    Mark [lastname]

    This was amusing already. But then people started taking advantage of the fact that they had a list of 416 people who were ostensibly interested in Ruby on Rails, and started posting other job offers, conversing about who’s going to OSCON, and volunteering their efforts to remove bad email addresses from the list of 416 that the thread started with. It wasn’t long until people started suggesting that this be made into an actual email list.

    It is such a fascinating thing to watch. It’s like a flash mob, except the surprise is on us, the mob participants. It’s like we were all beamed into the same virtual room by one single person who chose the group of us, and left us to figure out what to make of the situation. I can’t wait to see where this ends up.

    Update (10:30pm, a scant 2.5 hours after the initial mail went out): there’s a Google Group now — they’re calling it Pradipta’s Rolodex, after the recruiter who emailed us all to begin with.

    Update (1am): Pradipta’s Rolodex now has a logo — pretty hilarious.

    Update (July 18, 10:40am): First off, sorry the blog was down for a while. I enabled caching (which isn’t enabled by default in Wordpress) so hopefully the site can handle the load now. Also, the real Pradipta has spoken up and apologized for the whole incident. Though I’d like to think that only the exceptionally grumpy among us are really all that upset.

Comments for Josh the Outspoken

2 Star

  • Permalink for '2_Star/2008/07/12/100_lines_of_C_that_can_parse_any_Protocol_Buffer'

    100 lines of C that can parse any Protocol Buffer

    Posted: July 12th, 2008, 4:44am MDT by josh

    There’s lots of misinformation flying around the blogosphere about Google’s Protocol Buffers. One common claim is that you can’t parse a protobuf without having the .proto file. This is false, as demonstrated by This 100-line C program that does just that. It can parse an arbitrary protobuf into its field numbers and wire types. This is pretty closely equivalent to what you get from a generic XML parser, except that with XML you get names for the keys (elements) instead of numbers and strings instead of the four or so wire types that are defined by protocol buffers.

    In both the XML and the Protocol Buffer case, you want to have more information if you’re going to actually write programs that consume application-domain data. You want documentation that specifies exactly what all the fields mean, and enough information to turn the on-the-wire values into actual numbers where appropriate. It just so happens that Protocol Buffers specify this information in a structured format called a .proto file.

    Update, 12:50PM July 12: apparently I wasn’t clear enough: my 100 lines of C does not use any Protocol Buffer library. It implements the decoding itself. My point is that the format is so simple that you can parse it generically in 100 lines of C. (If you’re wondering: you’d be lucky to get within a factor of 10 for a bare-bones XML parser).

Josh

  • Permalink for 'Josh/2008/07/11/habes___2008_07_11T11_34_00'

    habes @ 2008-07-11T11:34:00

    Posted: July 11th, 2008, 12:36pm MDT by Josh
    My phone is currently a brick. I made the mistake (in retrospect) of trying to update it this morning for the new iPhone App Store. It succeeded in installing the new software, but it appears that it is now needs to be reactivated or something. But unfortunately the iTunes store is completely h0rked and unavailable. So I now have no phone, on a day when I was going to try to sign papers on my condo. This sucks!

Comments for Josh the Outspoken

  • Permalink for 'Comments_for_Josh_the_Outspoken/2008/07/09/Comment_on_Why_Gazelle_Matters__part_2_by_josh'

    Comment on Why Gazelle Matters, part 2 by josh

    Posted: July 9th, 2008, 4:06pm MDT by josh

    Hi Don,

    I’m not sure that I agree that languages like Haskell are the best for compiler writing. But then again, my project (which I intend to be very very good at it) is still partially vapor, so I can’t make any grandiose claims about being better than Haskell in this problem domain. Yet. )

    But for your second point, you seem to be missing the N^2 vs. N+N argument I was trying to make. Sure, of course you can write bindings. I’ll ignore for a moment the practical matters of what it means to embed Haskell into your process and just assume that bindings are a practical and compelling solution.

    Even if I grant you that, writing bindings for each language pair that you want to parse from/to is madness! It takes N^2 work. If you do things my way, you have to write:

    - a parser for C
    - a parser for Haskell
    - a parser for JavaScript
    - Gazelle bindings for C
    - Gazelle bindings for Haskell
    - Gazelle bindings for JavaScript

    At that point, I can parse Haskell from JavaScript, or C from Haskell, or any of the N^2 combinations, but it only took me N+N work. If I do things your way, to get the same functionality, I have to write:

    - a parser for C
    - a parser for Haskell
    - a parser for JavaScript
    - C bindings for the C parser
    - C bindings for the Haskell parser
    - C bindings for the JavaScript parser
    - Haskell bindings for the C parser
    - Haskell bindings for the Haskell parser
    - Haskell bindings for the JavaScript parser
    - JavaScript bindings for the C parser
    - JavaScript bindings for the Haskell parser
    - JavaScript bindings for the JavaScript parser.

    Bindings aren’t trivial. So at the end of the day, since no one is actually going to do the N^2 work, the Language.C person will say “sure, you can use my library, all you have to do is write bindings!” and the person will decide that bindings are too much work for the return you get (because they are), or they’ll be partially written and sporadically maintained bindings that aren’t useful to anyone else, and we’ll keep having the fragmented parser landscape we have today.

    Or I’ll finish Gazelle and that pain will be a thing of the past.

2 Star

  • Permalink for '2_Star/2008/07/09/Why_Gazelle_Matters__part_2'

    Why Gazelle Matters, part 2

    Posted: July 9th, 2008, 9:56am MDT by josh

    Every day when I read the programming reddit, I see things that reaffirm to me why Gazelle matters.

    Yesterday it was an “ask reddit”: Need a C library for parsing C files, suggestions?. Responses include:

    • “How about gcc?”, clearly not realizing that gcc is (1) not a library and (2) ridiculously complicated.
    • “Here are an ANSI C grammar for lex and yacc. […] (NB: These were last updated in 1995, so I don’t know if you’ll need to tweak them any. But, at least they’ll get you close […])”. I am constantly surprised that people do not realize how utterly useless it is to have software of unknown quality. Can you imagine asking for an implementation of SHA1, and having someone hand you some code and saying “I’m not sure if it’s quite right, but it should at least get you part of the way there.” You don’t know where the possible problems are, you don’t know anything about its design process — you might as well start from scratch, that’s how useless it is to have half-written code.
    • “Language.C: manipulating and generating C abstract syntax from Haskell”. So here you have someone who’s doing the hard work of parsing C and making sure it’s correct, but he’s writing his library in Haskell so it only works for Haskell. Sadly useless to 99.9% of the programming world.

    Someone also did mention Elsa, which is probably the best solution to the original guy’s problem, but then someone else replied:

    elsa looks really good. I need code detail that the preprocessor doesn’t know about, but elsa can probably get me there. Any interest in a python wrapper?

    There you go again — even when someone has done a good job (like Elsa has), it’s still useless to people parsing from other languages unless you write specific “bindings” for each language you want to parse from. Madness, pure madness! The idea that it should take N^2 work to parse N languages from N languages is madness.

    The two most important design goals of Gazelle are:

    • reusable grammars: grammars that can be used by anyone without modification. Grammars that can have test suites, to ensure quality and give people confidence that they are correct.
    • grammars that you can use from any language, without bindings. Sure, you have to write bindings for Gazelle itself, but that only has to happen once, not once per language you want to parse. So parsing N languages from N languages takes N+N effort, not N^2.
  • Permalink for '2_Star/2008/07/07/Protocol_Buffers'

    Protocol Buffers

    Posted: July 7th, 2008, 8:59pm MDT by josh

    Today Google open-sourced a component we’ve used in-house for a long time called Protocol Buffers. It’s a binary format that we use for almost all of our on-the-wire messages and lots of disk-based long-term storage as well. For many (maybe most, though not all) uses, Protocol Buffers kick XML’s ass. In a big way. Seriously, if you’re XML this is the part where you sulk home with your tail between your legs.

    Why not just use XML?

    Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

    • are simpler
    • are 3 to 10 times smaller
    • are 20 to 100 times faster
    • are less ambiguous
    • generate data access classes that are easier to use programmatically

    Protobuf Developer Guide

    If you or your company needs a very compact, very fast, extensible format for structured data, you should give Protobufs a good look!

    (P.S. Of course I don’t speak for Google. The attitude is all me talking and my personal disdain for XML. Google’s official attitude is, of course, much more diplomatic).

  • Permalink for '2_Star/2008/06/29/Gazelle_v0.2_is_here_'

    Gazelle v0.2 is here!

    Posted: June 29th, 2008, 3:56am MDT by josh

    It’s been a long time coming, but Gazelle v0.2 is finally here!

    To me, Gazelle 0.2 represents a significant shift. With 0.2, Gazelle is finally in a place where I think it’s ready for people to tinker with. In 0.2 there is enough documentation to figure out what’s going on, and the command-line programs like gzlparse have reasonable –help messages and can do useful things. Starting with Gazelle 0.2, your problems are my problems: things that don’t work right should either be fixed or be written down as TODO items.

    Gazelle 0.1 to 0.2 was a major overhaul — implementing LL(k) lookahead took major surgery. Half the time the code didn’t actually work, because major rewrites were only partially done. I expect all that to change with Gazelle 0.2 — I want future releases to be far more incremental, and for every commit to leave the repository in a working state. I want a 0.2.1 and 0.2.2 that fix a lot of the edge cases that still aren’t right in 0.2 (you can read more about these shortcomings in the “Tour” section of the manual or in the TODO).

    There are still many major features to add to Gazelle in the future — you can see a list in the TODO. But again, I think these can be added without breaking the tree in the meantime.

    There’s one major bummer about 0.2. I had to completely remove the “@ignore” feature, which was Gazelle’s answer to letting you ignore whitespace/comments without having a separate lexer. I removed it because I realized that the abstraction I had invented for expressing this concept was not quite right, and that a more general-purpose abstraction was the right answer — the abstraction I have in mind will also handle things like languages embedded in other languages (like Ruby inside HTML: RHTML). But the bummer is that for the moment, Gazelle has no answer for how to ignore whitespace/comments. So it’s clearly not useful for real work yet.

    So try it out and send any feedback you have to gazelle-users. Thanks!

Josh

  • Permalink for 'Josh/2008/06/15/habes___2008_06_15T14_58_00'

    habes @ 2008-06-15T14:58:00

    Posted: June 15th, 2008, 3:59pm MDT by Josh
    Not sure exactly how this happened, but it's strangely appropriate. There's been a lot of parsing techniques, compilers, optimization, and unicode in my life -- not so much getting things done.

Comments for Josh the Outspoken

  • Permalink for 'Comments_for_Josh_the_Outspoken/2008/05/27/Comment_on_Defending_RPC_by_josh'

    Comment on Defending RPC by josh

    Posted: May 27th, 2008, 1:10am MDT by josh

    Matt,

    You have wonderfully substantive arguments, as always! To address them fully I’ll need to write a full blog entry. There’s just one thing I want to answer at the moment:

    I wasn’t trying to claim that there isn’t software that implements RESTful services, proxying, caching, wire formats, etc.; of course there is. I just wanted Steve to pick an actual implementation of one that he thinks fairly represents the REST ideal, so we could compare apples to apples (one request/reply implementation to another) and make objective comparisons like:

    - what does “Hello, world!” look like?
    - how do you document and support your APIs?
    - how does the system handle various failure modes?
    - how much work does it take for others to interoperate with you?
    - how well can the system support adjusting the caching strategy?

    What I wanted to avoid is going down a path of discussion where I would say “REST doesn’t support scenario X very well” and hear a retort like “the REST stack could support a feature that does X perfectly.” I want to talk about reality and not Plato’s world of forms. I want to compare actual implementations and not architectural ideals.

    Anyway, I’m on the fence about whether I want to take the time to really thoroughly explain my point of view right now. On one hand, this has been on my mind for a while. On the other hand, this discussion has started to wear on me a bit and I really desperately want to get the next version of Gazelle out the door.

  • Permalink for 'Comments_for_Josh_the_Outspoken/2008/05/24/Comment_on_Defending_RPC_by_josh'

    Comment on Defending RPC by josh

    Posted: May 24th, 2008, 4:28am MDT by josh

    @Rajith and @Matt: to add to my last comment, consider the information Google has published about BigTable and Chubby, major systems on which much of Google is built. The APIs to this systems are based on RPC. Could they be shoehorned into REST? Of course, as XML has so painfully taught us, anything can be shoehorned into anything. But is it a good fit? The caching and proxying surrounding these systems is far too system-specific for any standard HTTP functionality to get right.

    If what you’ve got sitting around is a bunch of libraries that do HTTP, I can see why REST looks appealing. But when you’ve got a mind-blowingly good RPC system sitting around, it seems pretty obviously the best answer to your request/reply needs. Would it be nice to have it more widely available and standardized? Yes. But all else being equal, on purely technical merits, I don’t think HTTP and REST can stand up as a competitor to a good RPC system for programmatic request/reply needs.

  • Permalink for 'Comments_for_Josh_the_Outspoken/2008/05/24/Comment_on_Defending_RPC_by_josh'

    Comment on Defending RPC by josh

    Posted: May 24th, 2008, 4:13am MDT by josh

    @Rajith: I may have only been at this for four years professionally, but in that time I’ve been deep into the guts of some of Amazon and Google’s most scalable systems. I’m not making stuff up based on theories, I’m observing what the brilliant people around me actually do when they are engineering scalable software.

    Since nothing you write is a substantive critique of my blog entry, there’s not really anything more to say. BTW, I’m definitely not defending SOAP, CORBA, or EJB: if the existing open-source RPC systems were any good, companies wouldn’t be reinventing this stuff all over again. XML-RPC is the only one of the open-source systems that I can stomach.

    @Matt: I was pretty open-minded about REST for a while, but I’ve come to believe that it’s a lot of hot air. Standard data formats are better than proprietary ones obviously, but I just don’t buy the “tunnels, proxies, and caches” argument. Nor do I buy that complex systems are naturally modeled as resources. But yes, we should talk more about this another time!

2 Star

  • Permalink for '2_Star/2008/05/23/Defending_RPC'

    Defending RPC

    Posted: May 23rd, 2008, 7:09pm MDT by josh

    Steve Vinoski has come out very vocally against RPC in the last few days: see this blog entry and this mailing list post. The blog entry (which I read first) made him sound like someone who just hasn’t been around large systems much, but then I was surprised to see that he’s a senior fellow or architect or something along those lines at a company that does distributed systems.

    His blog entry basically makes fun of Cisco for inventing/releasing another RPC system. It’s not clear exactly what he thinks they should have done instead. What is strange about this criticism is that tons of technology companies have developed their own RPC system — Facebook and Cisco publicly, and other technology companies I am familiar with in a not-so-public way. Guess what: large commercial distributed systems are built largely on RPC. Is he arguing that all of the engineers at these companies simultaneously got the bad idea of investing in something they don’t need? If RPC is such a bad idea, then why is everybody doing it?

    “Everybody’s doing it” obviously isn’t a justification alone, but it definitely puts the onus on the person making the critique to show why it’s a bad idea. I got a better idea where he was coming from when I read the mailing list post. Here’s the heart of his argument:

    the fundamental problem is that RPC tries to make a distributed invocation look like a local one.This can’t work because the failure modes in distributed systems are quite different from those in local systems, so you find yourself having to introduce more and more infrastructure that tries to hide all the hard details and problems that lurk beneath. That’s how we got Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and SOAP and JAX-RPC, to name a few off the top of my head, each better than what came before in some ways but worse in other ways, especially footprint and complexity. But it’s all for naught because no amount of infrastructure can ever hide those problems of distribution. Network partitions are real, timeouts are real, remote host and service crashes are real, the need for piecemeal system upgrade and handling version differences between systems is real, etc. The distributed systems programmer *must* deal with these and other issues because they affect different applications very differently; no amount of hiding or abstraction can make these problems disappear.

    Finally something we can agree on! Yes, on a network shit happens, and no sane RPC system will try to hide this from you.

    But then again, I don’t know of any RPC system that tries to hide this from you except possibly CORBA. Maybe there’s a horrible history here I don’t know about, but no RPC system I have ever encountered tries to hide from you the fact that on a network, shit happens.

    So what are his other criticisms?

    RPC systems in C++, Java, etc. also tend to introduce higher degrees of coupling than one would like in a distributed system. Typically you have some sort of IDL that’s used to generate stubs/proxies/skeletons — code that turns the local calls into remote ones, which nobody wants to write or maintain by hand. The IDL is often simple, but the generated code is usually not. That code is normally compiled into each app in the system. Change the IDL and you have to regenerate the code, recompile it, and then retest and redeploy your apps, and you typically have to do that atomically, either all apps or none, because versioning is not accounted for.

    Yay, we can agree again. RPC systems that make you do an “all at once” upgrade are a bad idea. But again, no RPC system I have encountered makes you do this. Does this mean that the RPC system guarantees for you that the old and new protocols are compatible? Of course not — you don’t want your framework to be some big “I know what’s best for you” mommy that does really expensive things to solve this problem, like loading both versions of your code at the same time. But any RPC framework worth its salt makes it possible to have different interface versions interoperate. Adding a new parameter? No problem, old servers simply won’t see it. Completely changing the semantics of your call? No problem — just give the new call a new name.

    Steve’s criticism amounts to “sucky RPC systems suck.” Yes Steve, yes they do. But a lot of the technology world is running on non-sucky RPC systems, and from time to time you get a glimpse of that when a company like Facebook or Cisco releases their internal RPC system to the outside world. Did Steve check to see if Cisco’s new RPC system is subject to any of his critiques? I haven’t, but I would suspect it isn’t.

Josh

  • Permalink for 'Josh/2008/05/16/Vocal_woes'

    Vocal woes

    Posted: May 16th, 2008, 10:29am MDT by Josh
    My voice isn't horrible right now, but it's definitely not great. My standards of "great" are higher now, because the day I got back from Iowa my voice felt way better than it ever had. It was amazing! It was so clean, so resonant, was much better in tune (from having much better control), and felt so good! I was determined to figure out what a week in Iowa had done to make it so much better. That evening, it still felt really good but not quite as much, and by the next day things were pretty much back to normal.

    I have to figure out what happened in Iowa!

    Whenever I wake up in the morning my voice and nose feel pretty dried out. At first I thought this was due to a lack of humidity, but I've been running a cool mist humidifier in my room for almost a year. I bought a hygrometer and it consistently tells me that I'm running 55-60% relative humidity, which is pretty high. I also run an air purifier constantly, so hopefully that means that I'm filtering out airborne allergens.

    I feel like the status quo for me is a less intense version of what happened two summers ago when I "lost my voice" for four months. The symptoms are similar, though less intense: my voice feels bulky and hard to control, I feel dried out in the morning, and hitting high notes is especially hard.

    It's time to take no prisoners and do absolutely everything "right" in an attempt to recreate my post-Iowa voice:
    • Drink water constantly. At least six glasses per day, every day
    • absolutely no eating within three hours of going to bed, to reduce the possibility of acid reflux. I can't decide about drinking -- I've heard it said no, but it seems like it would help me stay hydrated going into the most dehydrating part of my day (sleeping)
    • run the humidifier within a couple feet of my head (one doctor's website I saw recommended this)
    • use saline nasal spray before going to bed, to keep my sinuses lubricated
    • change all my bedding and wrap the mattress in plastic, in case there are allergens hanging out there. it recently occurred to me that I got that mattress from a cat owner, so there could totally be allergens in it that are irritating my sinuses.
    I have to recreate the day after Iowa!
  • Permalink for 'Josh/2008/04/30/Chanticleer'

    Chanticleer

    Posted: April 30th, 2008, 1:30am MDT by Josh
    I saw Chanticleer last night in Portland with E*beth. It was a really enjoyable concert, and an educational experience about how choir singing can me made accessible. And to top it off I got to talking with some of the singers afterwords, and they answered some burning questions I had about countertenor singing. But I'm getting ahead of myself.

    For people who don't know them, Chanticleer is one of the best-known choirs in America. Consisting of 12 men (they sing full SATB -- more about this later), they are one of the few American choirs that employs its singers full time, and the choir tours extensively nationally and internationally. Their repertoire is quite varied, ranging from early music to spirituals to pop to contemporary music by living composers. All that aside, they know how to please a crowd and I think this is a big element of their success.

    My first exposure to Chanticleer was when I was 12 years old and singing in the American Boychoir. In the spring of 1994 we went on a tour of the west coast and joined Chanticleer for a series of joint concerts in the bay area. Our joint concert in Grace Cathedral actually became two concerts in a single night when it was discovered that the cathedral had been oversold according to the fire marshall. But what truly made the experience memorable for me was singing for the first time much of the music that I would later focus my adult musical life around. It was my first time singing out of the Oxford Book of Tudor Anthems, it was my first time singing Herbert Howells, Thomas Weelkes, Robert White, and John Tavener. These concerts were a formative experience for me as a young singer. I still have my program from this concert, autographed by all the members of Chanticleer at the time.

    I risk diverging too far from the point of this post to mention it, but one other thing I'll never forget about those two Grace Cathedral concerts is that they are the precise night I first was able to use my falsetto. I remember straining to hit the high notes for the first concert of the night, but for the second concert I had this brand new voice that felt a little different but could hit those notes with ease. It seems somehow poetic that I first used my falsetto singing with a choir full of some of the country's best falsettists.

    But that concert was almost 15 years ago, and I hadn't heard Chanticleer since. My ear has heard many other things in the meantime -- lately it's been mostly early music choirs like The Tallis Scholars -- so I was really interested to hear how Chanticleer would sound to me now.

    What struck me most about the concert is how good they are at pleasing a crowd. I understand much better now how they manage to be popular enough to have all full-time singers: a feat other choirs would probably love to emulate. Their formula is a lot like the American Boychoir's: a rich, varied program, enough transitions and surprises that people have reason to keep paying attention, and topped of with some tunes people will recognize. They are very animated and engage the audience visually when they sing. They change formations for almost every piece. They charismatically talk to the audience every few songs. The audience was thoroughly won over by the end; it was one of the quickest standing ovations I'd ever seen.

    I don't want to paint them as an act of empty showmanship though; they were technically brilliant. Their ensemble was impeccable, their tuning nearly flawless, and their production at all times sounded so very easy. They had extremely good vocal control: one moment they were singing polyphony in straight tone, and the next they were singing juicy solos of popular tunes with lots of vibrato. Even very quiet entrances were very solid and perfectly together. Every voice of the ensemble could stand on its own as a soloist, and by the end nearly all of them had. And they were extremely well-rehearsed; every detail of every piece was deliberate and artistic.

    So at this point you may wonder if I have anything bad to say about Chanticleer, and if I'll now make it my life's goal to join them someday. As much as I enjoyed the concert, I find something about Chanticleer lacking, which prevents them from being my musical Mecca. What they seem to lack is an emotional intensity. This might sound unfair at first, because they where clearly emoting visually, and the concert had some emotional moments to be sure. But something that I can't quite put my finger on prevents Chanticleer from having the depth of emotion that I perceive from other choirs. It's almost that their production is too easy, their tone too light and airy for their phrases to feel real weight. It's a lot like listening to the countertenor Andreas Scholl, a famous and very successful soloist who nonetheless strikes me as a bit sucrose and flippant compared to someone like James Bowman.

    Speaking of countertenor singing, with Chanticleer I heard countertenors unlike anything I had heard before. I have sung with 5 or so countertenors in Seattle, and I have listened to recordings of many others, mainly in the context of early music. The countertenors I have heard have been almost exclusively in the alto range: A below middle C to soprano E or F. Not these guys. These guys laugh at your measly E or F. They blow right past it on their way up to As, Bs, and I-swear-I-heard-a-C above the treble clef. And it didn't sound hard or strained in the least -- it sounded easy.

    I really wanted to chat with a few of their countertenors afterwords and get some answers to some burning questions I'd had recently. I managed to start conversations with Dylan Hostetter and Michael Match who both turned out to be super nice guys. I asked them both if they considered it important that a potential voice teacher be a countertenor specialist, and both of them replied with a resounding "no," which was great to hear. I was also inspired by Michael's story, which was eerily similar to mine: in college he was a frustrated tenor, then someone suggested he try countertenor, and suddenly all the things his voice teacher had been trying to get him to do made sense. At that point he said he discovered "where his voice belonged."

    So all in all, it was a very enjoyable and educational experience. And while I have a great deal of respect for the group, I don't think it will be my long-term ambition. Everything else aside, I don't think I'm built (mentally or physically) to perform the same program night after night, year-round. I remember that touring with the boychoir was sometimes tiresome. I'm happy with my current mix, though I will consider starting voice lessons.

2 Star

  • Permalink for '2_Star/2008/04/10/Gazelle_Grammar_Visualization'

    Gazelle Grammar Visualization

    Posted: April 10th, 2008, 10:53am MDT by josh

    I’ve been quiet about Gazelle news lately, but since I wrote last I’ve hit 3 of my 6 goals for Gazelle 0.2, and one that I hadn’t thought to include. To review those goals and see which ones I’ve completed:

    • complete Strong-LL(k) lookahead support. (it’s not 100% complete yet, but it’s definitely solid enough for a 0.2 release)
    • a command-line compiler program (gzlc) that takes reasonable options and is simple enough to use by reading its –help
    • a “tour” section for the manual
    • a command-line program (gzlparse) that can output the parse tree in a useful format, so you can see how Gazelle parses your input text.
    • a test suite, so that when people report bugs I can add the bugs to the test suite and not regress.
    • (stretch): make Gazelle self-hosting, so that the parser is more robust and easier to understand than the hand-written recursive descent parser I’m currently using. I don’t want people to have to deal with corner-case parser bugs.
    • a way to visualize grammars, to spot-check them against your expectations

    It’s the grammar visualization that I forgot to include. I mentioned parse tree visualization a few blog posts ago, but this is different — one is visualizing how a bunch of text got parsed, the other is visualizing the grammar itself.

    It still has room for improvement, but here is what my grammar visualization currently looks like for JSON. You can see an NFA for each one of your rules, a DFA for each state of lookahead, and the DFAs that do the lexing.

    The latest code from Git (note that I recently moved from repo.or.cz to Github) can generate these grammar dumps — just pass ‘-d’ to gzlc.

Comments for Josh the Outspoken

2 Star

  • Permalink for '2_Star/2008/04/09/The_future_of_automatic_memory_management'

    The future of automatic memory management

    Posted: April 9th, 2008, 8:44pm MDT by josh

    Observation #1: stop-the-world garbage collection is a thorn in the side of latency-sensitive applications.

    Observation #2: we will very soon have more cores than we know what to do with.

    Prediction: fully concurrent garbage collection is the future of automatic memory management. I’m talking garbage collectors that run in other threads and clean up after me without ever stopping me in the middle of what I’m doing.

    It will almost certainly be more expensive in terms of total CPU time, and probably can’t be as aggressive in terms of what it can reclaim at any point in time, but for most applications the latency guarantees will far outweigh.

    Discuss.

Comments for Josh the Outspoken