Apparently it could be worse.
105363 items (97572 unread) in 19 feeds
Friends
(1021 unread)
Build
(68091 unread)
Heads
(716 unread)
News
(27537 unread)
fun
(207 unread)
Apparently it could be worse.
Hi Josh, you should have a look at the generalized top-down parsing algorithms. These algorithms handle any grammar except those with left recursion. Some of them also include additional hacks for dealing with left recursion. Algorithms/technologies to look at: GRDP, Unger’s Method, Combinatorial Parsing, Packrat Parsing, DCGs and TXL, Not all of these are practical, but since they are general (barring left recursion) and LL(*) is not, any tool that implements these parsing methods must be more powerful than Gazelle.
“So although my result isn’t as theoretically significant as I thought, I think that from a practical standpoint I’m still in good shape: I have a terminating algorithm that can handle a very useful set of grammars.”
If you’re after theoretically significant, all you need to do is come up with a name (”LL(haberman)”) and ideally some proofs or properties for the set of grammars your new algorithm does recognize.
@maetl: thanks! It’s funny, when I started writing Gazelle, I believed pretty strongly in that mantra (”parsing and grammars are amongst the most studied and well understood areas of computer science”). I didn’t expect to invent any new algorithms, I mainly just wanted to take existing algorithms and give them different packaging.
But before long I was trying to figure out if I could improve on the properties of those algorithms that made them less that ideal from a user perspective. I studied the documentation from existing tools and looked at the cases they said they couldn’t handle, and tried to figure out if I could handle them. And before I knew it, I had thought of a new algorithm!
@Kay: thanks! I agree that having the tool be able to tell you definitively that it is not LL(k) or LL(*) will make things much more user-friendly. I could still use a bit of work on error messages that give you every bit of information about *why* it was not LL(k) or LL(*). In the future I should be able to draw pretty graphs that make the problem obvious. That will take a bit of work, but it’s totally doable, and I think will make for a compelling user experience.
That’s great news!
I just love “tell you definitively when a grammar is not LL(k) or LL(*) (others make you tweak search depth constants)” which can be a hassle currently, especially if the cause is not immediately obvious.
All power to innovators! ![]()
Well done Josh…
Great to see that you’re not content to let the standard dogma that “parsing and grammars are amongst the most studied and well understood areas of computer science” get in the way of making new implementations in this field…
thats hillarious! It seems that it is a great thing that you guys all were able to get together.
You’re on StumbleUpon now…
Wow! I was on the list, but I didn’t realize until the blog posts came how many people I knew (or knew of) were on there too. For example, this is my second random encounter with Andres Conbere, who I met once on the bus when he spotted me using my XO. And I had just recently started reading the Enfranchised Mind blog by fellow 416er Robert Fischer.
I was on this failbot’s list too. As I was hitting reply all this morning to send him a “nice” message, I then saw the flood of emails from like minded developers.
Hilarious.
Another item on the long list of why I hate recruiters.
Thanks for blogging it. BTW, your ladybugs are hawt. Clicking them made me feel almost as dirty as hitting reply all.
truly epic. at first it read like a WoW forum thread but as people began to realize that it was mostly other coders they began to grasp the potential. Law of unintended consequences ftw!
Whoo, I was one of the elite, what, bazillion on that email list? I just joined the Google Group.
Actually, this was especially amusing for me as I just put a message out I was looking for a new gig.
I didn’t expect that big a response ![]()
Hadn’t other folks been spammed by prodigus tech before? I’ve gotten a couple of other emails like this, but never had the pleasure of meeting my compatriots.
[…] morning, my inbox melted into a thin runny gruel due to the amazing incompetence of Prodigus Tech. Also, to feed google a bit, Max Archie is a […]
Agreed - most fun I’ve had from a cascading spam.
My life was on a downward spiral. Then Pradipta’s Rolodex came along and changed everything! In just hours, I had 416 friends and millions in cash. Thanks, Pradipta!
Thank you so much for capturing the surprise and joy that this little gem was ![]()
Haven’t had this much fun online since the World Series playoff game when the IRC channel created the first “Screwing your team out of world series berth? Priceless” photo of that kid interfering with the last out for the Cubs, costing them the playoffs.
I was one of the 416 on that posting… It was pretty fun. When you don’t live in one of the big tech cities, it’s easy to feel isolated, not going to conferences, not having killer user groups, not having any clueful employers….
Then all of a sudden, THIS happens!
Just goes to show that even incompetence can be helpful sometimes!
See you on Pradipta"s Rolodex!
One wit among the recipients has created a Google Group called “Pradipta’s Rolodex,” to which all spammed are invited.
Yeah, this is pretty damn hilarious. Best spam I’ve had in a while. ![]()
I, too, was on the Pra-dex list, and what a cool thing to see! I’m glad I decided to check my email when I did.
I made this URL the homepage of the Google Group; hope that’s cool.
Haha. It was fun. We have a Google Group now. ![]()
This is awesome. Don’t miss the google group!
Hello there, fellow PR member! This is amazingly entertaining. I should have been asleep an hour ago! Can’t wait to see where it goes.
@Matthias,
Maybe I wasn’t clear enough: my 100 lines of C does not use any protobuf library. It actually implements the decoding of a protobuf from scratch, and has no external dependencies except for the standard C library.
This should be much shorter. Create a new FileDescriptorProto with one empty DescriptorProto. From that create a FileDescriptor, lookup the “empty message” descriptor, create a DynamicMessage from that and parse the input. The unknown fieldset contains the fields.
Actually, in Java you can directly parse an UnknownFieldSet.
Hi Don,
I’m not sure that I agree that languages like Haskell are the best for compiler writing. But then again, my project (which I intend to be very very good at it) is still partially vapor, so I can’t make any grandiose claims about being better than Haskell in this problem domain. Yet.
But for your second point, you seem to be missing the N^2 vs. N+N argument I was trying to make. Sure, of course you can write bindings. I’ll ignore for a moment the practical matters of what it means to embed Haskell into your process and just assume that bindings are a practical and compelling solution.
Even if I grant you that, writing bindings for each language pair that you want to parse from/to is madness! It takes N^2 work. If you do things my way, you have to write:
- a parser for C
- a parser for Haskell
- a parser for JavaScript
- Gazelle bindings for C
- Gazelle bindings for Haskell
- Gazelle bindings for JavaScript
At that point, I can parse Haskell from JavaScript, or C from Haskell, or any of the N^2 combinations, but it only took me N+N work. If I do things your way, to get the same functionality, I have to write:
- a parser for C
- a parser for Haskell
- a parser for JavaScript
- C bindings for the C parser
- C bindings for the Haskell parser
- C bindings for the JavaScript parser
- Haskell bindings for the C parser
- Haskell bindings for the Haskell parser
- Haskell bindings for the JavaScript parser
- JavaScript bindings for the C parser
- JavaScript bindings for the Haskell parser
- JavaScript bindings for the JavaScript parser.
Bindings aren’t trivial. So at the end of the day, since no one is actually going to do the N^2 work, the Language.C person will say “sure, you can use my library, all you have to do is write bindings!” and the person will decide that bindings are too much work for the return you get (because they are), or they’ll be partially written and sporadically maintained bindings that aren’t useful to anyone else, and we’ll keep having the fragmented parser landscape we have today.
Or I’ll finish Gazelle and that pain will be a thing of the past.
Of course, Language.C is done in Haskell because strongly typed,
functional languages with algebraic data types and pattern matching are
particularly well suited to the task of language processing. They’re are
a sweet point for compiler writing.
That said, Haskell’s perfectly interoperable with C, or Python, or
anything that understands the C calling convention, so you could just
bind to Language.C from other languages (much as others bind to CIL, yet
another C parser/analyser, this time in OCaml).