Tim Sweeney (Epic) says that type inference doesn’t scale (to many types) in reference to Haskell on the last slide:
[tinyurl.com]
Still, it seems like a nice idea for static typing.
[...] my first reaction be annoyance. After all, I’ve been using the “Gazelle” name for over a year. For the same reason I was a bit annoyed when a BitTorrent tracker named Gazelle was released last [...]
Hi, I agree totally with the original comments about the lack of Ruby documentation, specification, or reference documents. It has been so frustrating to learn this language.
When you go to the Ruby doc site to download the core reference, you literally have to look around to find a downloadable reference doc for offline perusal. Yes, there are still millions of people around the world who don’t have unlimited broadband connections to the internet!
Compare that with other languages like PHP, etc.
There are always extra tedious steps you have to go through to find out more about the language.
It really feels amateurish in terms of community or “existing infrastructure” when compared with the other “open source” languages.
My belief is that without a drastic change in the culture and philosophy of the Ruby community, Ruby will become a marginal language some years down the road.
Some time back, I posted some comments at the following site:
I have moved to PHP and found it a lot easier to do what I want to do with it, and also easier to find information about the language. That equates to less time wasted on the non-productive part and more time focusing on my goals.
I can appreciate your quest for a language that lies somewhere between Lua and Java. I was on a similar quest a few years ago, trying to embed a scripting language inside my app server. After much research I found Lua to be the easiest to embed, at a cost to programming style. However, after using Lua for a bit as an extension (an embedded script language), I found it to be just fine, since the function extensions I wrote almost forced structure: the data was already defined in C and merely exposed via Lua. Since then I have grown to really like Lua (but not as a standalone language).
Java, on the other hand, while not too difficult to embed, consumed almost 3x the entire memory footprint while executing a simple HelloWorld class. I come from an 8-bit world of 64KB with 8KB used for video, so that kind of bloat offends me (not to mention WebSphere using almost 1GB of RAM just to start empty).
Java had so much potential in the 90s (when it was introduced), but when all those large corps started weighing in with their JSRs and ‘enhancements’ it quickly became the poster child for ‘designed by committee’…bloatware.
Anyhow, good read, glad there are other people out there who take memory usage seriously.
You should definitely check out Haiku, an open source modern reimplementation of BeOS:
Or check it out on youtube:
If you liked BeOS, I am sure you will like Haiku too.
Hmm, how about Javascript? [Ducking...]
I wonder how much of your “static lua” could be implemented as a simple compiler or preprocessor that outputs Lua, sort of like Objective-J to JavaScript:
[cappuccino.org]
It wouldn’t help with optimization, but you could at least get some added compile-time safety without sacrificing run-time performance.
I second this. Explicit type declaration should be optional, not mandatory. Type inference isn’t exactly new technology.
I’d like to see more languages outside of the “purely” functional space (Haskell, OCaml) using type inference. Typed, but no types to type. Boo and haXe are the only imperative/OOP languages I know of that do this.
Thanks for chiming in Matt — actually part of the reason I wrote this entry was in hopes that you’d offer your insight about it. I think that with a combination of the strategies you mention as well as just a tiny bit of metaprogramming that only lets you assign to a predefined list of keys, I can get enough of what I need that I’ll be satisfied that the compiler can grow but stay understandable. Or maybe I’m just telling myself that because I have no other real choice at this point.
A statically-typed language that lets you load and run code on the fly as easily as today’s scripting languages seems totally possible, especially now that LLVM is getting so good. I think there’d be a market for this, especially if it was paired with a dynamic language that worked together with it (a la Java and Groovy, but without the enormous and high-overhead virtual machine).
I haven’t worked in Lua, but I’m writing hundreds of lines a week in JavaScript, which is essentially another “table-based” language.
To avoid the “prediction[1]” problem, I think the best discipline is: never use integer-indexed arrays in place of structs or tuples. You can still use a table, but always use meaningful names as keys:
prediction = {edge=predicted_edge, dest=predicted_dest_state}
If you start losing track of which keys are available for which object, simulating classical OOP can be useful. I see you’re already doing this in gzlc for some objects, but not for smaller structs like “prediction” above. It looks like Lua also supports Lisp-style closures-as-data-hiding, which I often use for small bits of data in JavaScript where setting up a “real” class or prototype isn’t worth it.
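A rough sketch of both ideas, written in Python rather than Lua or JavaScript, so it is only an analogy. The names `Prediction` and `make_counter` are hypothetical and do not come from gzlc; the field values are placeholder strings.

```python
from typing import Callable, NamedTuple


class Prediction(NamedTuple):
    """Named fields instead of anonymous positional slots (hypothetical type)."""
    edge: str
    dest: str


def make_counter() -> Callable[[], int]:
    """Closure-as-data-hiding: `count` is reachable only through the returned function."""
    count = 0

    def increment() -> int:
        nonlocal count
        count += 1
        return count

    return increment


# Placeholder values standing in for predicted_edge / predicted_dest_state.
prediction = Prediction(edge="predicted_edge", dest="predicted_dest_state")
next_id = make_counter()
```

Reading `prediction.dest` says what the value is, whereas `prediction[1]` only says where it is. The counter shows the closure pattern: there is no attribute through which outside code can reset `count`.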
One more thing I’ve done is use JSDoc (in your case, LuaDoc) to document function and variable types. It gives some of the readability benefits of static typing, but of course with no real guarantees since it’s not enforced by the compiler.
I’m definitely with you on static typing, even though I’m a long-time JS/Ruby/Python user. Especially after playing with Haskell recently, I’m pretty unsatisfied by the lack of static analysis in those languages.
@Antonio Cangiano: The benchmarks you reference are quite old. For a recent comparison between Ruby VMs, take a look at the shootout I ran in December.
The benchmarks Josh referenced are 6 months old.
For the recent (31 Jan 2009) comparison take a look at these Ruby 1.9
For a comprehensive comparison of Ruby implementations take a look at the Ruby shootout.
@thedarky: I would much rather see extensive Japanese documentation of Ruby, with volunteer translations in English, than no docs at all. Sure, documentation in any language is time-consuming, but it should be part of a “professional” release.
If the core Ruby dev team’s problem really is lack of English fluency (and I have no reason to doubt you), then I would prefer they just wrote their docs in Japanese.
I would have stuck with Python were it not for the hideous syntax… I mean, __len__? Really? :’s? Really? I just can’t stomach it, I feel like I’m writing BASIC or something.
Ruby is like grown up Python with a few issues to work through. It needs some work on the VM, primarily. Documentation I don’t find so useful, never really find the need to refer to it.
There’s a bit of equivocation here, unintentional I’m sure.
As other commenters have more or less pointed out, there are two encoding steps in modern character representation systems: the mapping from linguistic symbols to integers (Unicode “code-points”), and the mapping from integers to bitstrings. Let’s call these “logical encoding” and “physical encoding” respectively.
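The two steps can be seen separately in Python 3, used here only as an illustration. The choice of the character 日 and of the three byte encodings is an assumption made for the example.

```python
char = "日"

# "Logical encoding": the character maps to one integer, its Unicode code point.
code_point = ord(char)

# "Physical encoding": the same code point maps to different byte strings
# depending on which encoding scheme is chosen.
utf8_bytes = char.encode("utf-8")
utf16_bytes = char.encode("utf-16-be")
sjis_bytes = char.encode("shift_jis")

print(hex(code_point), utf8_bytes, utf16_bytes, sjis_bytes)
```

The code point stays the same in every case; only the bytes differ, and decoding any of the three byte strings with its own codec gives back the same character.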
In our increasingly Unicode-standardized world, logical encoding is rarely a question in most new software projects, while physical encoding frequently is. So I don’t blame your commenters for being confused, especially when your two examples included one related only to logical encoding (Han unification) and one related only to physical encoding (space efficiency of Asian scripts).
The difference between Ruby and Python seems to be that, while both support various physical encodings for I/O, Python 3’s string objects are always represented in memory as sequences of Unicode code-points, while Ruby 1.9 can use any supported physical encoding as the in-memory representation. Theoretically this would allow Ruby to support encoding schemes that are not Unicode-compatible, as suggested above. I’m not sure any have been implemented in practice, though. In my Ruby 1.9 installation, all 81 installed encodings are compatible (using Encoding.compatible?).
Regardless of the file encoding, “\uXXXX” in a Ruby 1.9 string literal refers to a Unicode code-point. So the Unicode logical encoding scheme does still have a special place in Ruby’s heart.
You may be interested to know that there *is* an official, comprehensive documentation project for Ruby in Japanese.
Project Home: [doc.loveruby.net]
Usable mirror: [doc.okkez.net]
1.8.x should be fully covered, though 1.9.1 is a work in progress. The project doesn’t yet have sufficient resources to do a ja -> en translation.
UTF-8 is the 8-bit Unicode Transformation Format, a variable-length character encoding.
My favorite character is ẽ. That is a small e with a tilde over it. Notice that it takes 3 bytes in a row to encode it in UTF-8:
0xE1 0xBA 0xBD
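A quick check of those bytes in Python 3, assuming the precomposed form of the character (U+1EBD) rather than an e followed by a combining tilde:

```python
e_tilde = "\u1ebd"  # ẽ as a single precomposed code point

utf8 = e_tilde.encode("utf-8")
hex_bytes = " ".join(f"0x{b:02X}" for b in utf8)

# One character, three bytes.
print(len(e_tilde), len(utf8), hex_bytes)  # prints: 1 3 0xE1 0xBA 0xBD
```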
I really don’t care much for Unicode, but it isn’t going away so I guess I will just have to embrace it.
Curious: suppose I have two strings, one UTF-8 and one Shift-JIS, and I ask Ruby whether they match.
Which string is re-encoded? It matters for performance reasons.
What if I do this matching over and over again in a loop with one of the strings varying? Does the unvarying one get re-encoded over and over again?
Is it consistent across operators which string will be re-encoded? Or should I just encode everything to a neutral encoding “to be safe”?
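For comparison, here is a sketch of the "neutral encoding" option in Python 3. It does not answer what Ruby 1.9 does internally; it only shows the cost being made explicit by decoding the unvarying string once, outside the loop. The sample text and the helper name `matches` are assumptions for illustration.

```python
text = "日本語"
utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")

# The raw byte strings differ even though they spell the same text.
raw_equal = utf8_bytes == sjis_bytes

# Decode the unvarying (Shift-JIS) side exactly once, before the loop.
needle = sjis_bytes.decode("shift_jis")


def matches(candidates):
    """Decode each varying UTF-8 candidate and compare it by code points."""
    return [candidate.decode("utf-8") == needle for candidate in candidates]


results = matches([utf8_bytes, "英語".encode("utf-8")])
```

Here the programmer decides which side pays the conversion cost and how often, which is exactly the information the questions above are asking Ruby to expose.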
Furthermore, let’s say that a new character is standardized in a Japanese-only standard and also in Unicode and in (let’s say) two more character sets.
How can I inform Ruby that they are really the same character? Do I have to modify the source code of four codec classes? And do I need to define 12 mappings?
It seems that the Unicode standard uses the word “encoding” to mean something different than Ruby and Python (and XML and Java and …) do:
Unicode: “Encoded Character. An association (or mapping) between an abstract character and a code point.”
XML: “The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode”
I’m not personally aware of how the language drifted apart between the Unicode standard and the others. Seems odd.
To determine whether Ruby _really_ supports non-Unicode “encodings” (according to the Unicode definition) better than Python does, I’d need to know what Ruby’s definition of “character” is. i.e. how are characters named, how is equivalence defined, and what properties are available on character objects?
In Python, Java, JavaScript, C#, etc., two character objects are equivalent if they have the same code point. Given the code point, you can infer the character’s name, classification and other properties based on the Unicode database.
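In Python 3 this looks roughly as follows, using the standard `unicodedata` module. The character chosen is an assumption for the example; the second half shows that code-point equality is strict, so a decomposed spelling compares unequal until it is normalized.

```python
import unicodedata

precomposed = "\u1ebd"   # ẽ as a single code point
decomposed = "e\u0303"   # e followed by COMBINING TILDE

# Properties are looked up from the code point via the Unicode database.
name = unicodedata.name(precomposed)
category = unicodedata.category(precomposed)

# Equality is by code points, so the two spellings differ...
same_code_points = precomposed == decomposed

# ...until both are brought to the same normalization form.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```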
Can somebody please explain Ruby’s definition of “character”?
@Josh: when I build ruby-1.9.1-p0, I get only three encodings:
ASCII-8BIT
UTF-8
US-ASCII
I would have expected at least the Unicode encodings to be built by default but it seems not.
With respect to programmer clarity: it is now the case that a Ruby programmer must know the details of Ruby’s implementation of a particular encoding to predict how their program will perform. In some cases, operations will be O(1) and in others O(N), depending on how clever the encoding implementer is. I know from experience that this is the kind of thing that Guido avoids like the plague. Python explicitly rejected some features of its unofficial predecessor language (ABC) because of issues like that.
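To make the O(1) versus O(N) point concrete, here is a sketch in Python that indexes into raw bytes by hand. It is not how Ruby or Python implement strings; the function names `char_at_utf8` and `char_at_utf32` are hypothetical, and the only claim is that a variable-width encoding forces a scan while a fixed-width one allows direct arithmetic.

```python
def char_at_utf8(data: bytes, index: int) -> str:
    """Find the index-th character in UTF-8 bytes by scanning from the start: O(N)."""
    if index < 0:
        raise IndexError(index)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a new character starts here
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")  # the requested character was the last one


def char_at_utf32(data: bytes, index: int) -> str:
    """Every character is exactly 4 bytes, so the offset is computed directly: O(1)."""
    if index < 0 or 4 * index + 4 > len(data):
        raise IndexError(index)
    return data[4 * index:4 * index + 4].decode("utf-32-le")


sample = "aẽ日z"
utf8_data = sample.encode("utf-8")
utf32_data = sample.encode("utf-32-le")
```

Calling `char_at_utf8(utf8_data, 2)` and `char_at_utf32(utf32_data, 2)` both yield 日, but the first has to walk past every earlier byte to get there.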
Whether or not you agree with his position, I think it is fair to say that the Ruby way is not clearly better. We’re talking about spending MULTIPLE MINUTES to process less than 100K of data. A programmer could EASILY embed that in a program and the first time it happens to load a non-ASCII file, the performance would grind to a halt. Like “tie up the Web server for an eternity” grind to a halt.
With respect to “lazy decoding”: decoding is seldom a bottleneck in a data processing program. The data usually has to come from somewhere before it is decoded and often has to go somewhere else after it is re-encoded.
In any case, I would not dispute (and have never disputed) that Ruby’s strategy is superior in some situations. I merely dispute that it is “better” in some platonic sense. It’s possible that a year from now we might have enough information to select one as “better in most circumstances”. Or, perhaps it will never be clear.
BTW: your blog’s “preview” does not seem to actually preview. The post looks different in the preview and in the blog.
[...] Josh Haberman, I was excited to see the changelog for Ruby 1.9, but immediately disappointed by its vagueness and [...]
I have dabbled with both Ruby and Python, and personally I really cannot see what Ruby has over Python.
The languages offer similar features (dynamic types, REPL, first-class functions and closures, OOP…), but…
Python is faster, has great documentation, and has a huge number of libraries.
The benchmarks you reference are quite old. For a recent comparison between Ruby VMs, take a look at the shootout I ran in December.
take a look at:
[eigenclass.org] in Ruby 1.9&key=Ruby
Very fascinating, Carl. This corroborates my real-world experience and explains a lot of things for me. Also thanks, Josh: I didn’t realize that Unicode includes both charset and encodings. It was a bit fuzzy in my mind.
I left Ruby and Rails for Python and Django about six months ago. I loved Ruby syntax but I had problems stemming from lack of documentation, weird defaults, and not so great libraries. I had problems getting things like timezones to work properly — stuff that I did in PHP with no problems.
I found that Python was made for human beings. I think it’s because of Guido’s emphasis on software engineering and making tools for programmers. Python is not difficult to understand, it’s very consistent and does what I think it will do. The quality of its libraries and documentation are very high. I’m not a Python fanboy or anything; I’d ditch it as soon as it started to suck. I’m not a genius programmer. I use Python because it works for me.
When I have time I work on a basic site in Japanese (http://momhawaii.org) and I have no problems with strange characters anywhere. UTF-8 might suck at presenting the Japanese language (I’ve had problems before), but with Python and Django I get the characters that I put in. It doesn’t seem to lose any bits of data.
Use whatever works for you. Everything has its pros and cons.
FYI Django Setup
Make sure your templates are saved in Shift-JIS encoding in your text editor (I use Komodo Edit). Add
FILE_CHARSET = 'shift_jis' and DEFAULT_CHARSET = 'shift_jis' to settings.py.
HTML META Tag
I don’t see why writing docs in Japanese is a problem. FLOSS has a huge volunteer translation effort, mostly from English into other languages, but the reverse does happen.
I think the encoding of a string is the wrong flag to store. You should be storing the language of the string instead, as that’s what matters. Conflating the encoding to mean the language is the problem here.
I’ll note the top two results for [han unification ruby] both reject the notion: [blade.nagaokaut.ac.jp] [www.orbeon.com]
You seem to think that nine years of development is too long, Josh. Tell us, how long should Parrot, the Perl 6 specification, and Rakudo have taken? On what do you base your estimates? How exactly is Perl 6 “too big and too complicated”? What is complicated? What should be removed to reduce complexity?