Wednesday, June 29, 2011

Z2: An introduction

This is going to be a fairly long off-topic post about programming languages research, and the introductory post for my new series. I understand that people come to this blog to read about DwarvesH, so I will be taking two measures to keep this new series from hogging the spotlight. First, I’ll prefix all posts from the series with “Z2” so people who are not interested can easily skip them. This series will be of very little interest to non-programmers. Second, development for this research project will have several constraints, the most important one being the length of the programming sessions. When I feel like it, I will enter a heightened mental mode and do a frenzied programming session 60 to 120 minutes long. On the clock. When the time runs out I’ll stop, and I will do this at most once a week. So plenty of time left for DwarvesH! Damn it, I really need a new name for the game ASAP!

But why programming languages research? The design of programming languages has been my passion for ages, long before I picked up the mantle of the dwarf. I made a previous attempt at fixing programming languages, more precisely C/C++, which I abandoned for good reasons a few years ago. That attempt was called Z, and I am reviving it as a research project and as a discussion and meditation on the art of programming language design.

Why now? Well, it all started a few months ago, when I read that some company had revealed its “Java killer” after two years of development. I don’t remember the name of the company or the language, and I am not going to Google it, since it is better if I don’t name them. Most of the ideas were good. God knows Java is far from perfect. Some ideas were really bad and, in my opinion, very narrow-minded. “Let’s make some drastic change that will probably not affect our workflow but will force it upon other people who might be profoundly affected by it” sums this up very well. My conclusion is that their language is a nice but niche step forward, which is not what you would expect after two years of development. And there is no way you can label such an attempt a “Java killer” after two years of development. Under a different label and with a shorter development time I would welcome their language. But under these conditions, all I can say is: way to completely miss the point, guys! Have you ever used Java? Do you understand the principles behind the problematic areas and why they should be fixed? Because it seems like you do not.

That is all I am going to say about that. I did not mean to insult their effort, and I apologize for any parts of the above that may seem insulting. I just disagree. I probably could have been more polite in expressing this. And I would find it extremely amusing, yet sad at the same time, if my own efforts proved that I, too, do not understand what is wrong with my language of choice and lack any kind of greater insight. Anyway, after reading their presentations, I started thinking about this series. I did not start it earlier because I was afraid of taking time away from DwarvesH. But by keeping the discussion largely theoretical and the development time strictly limited, this won’t happen.

But let us go back in time even more. DOODLY DOODLY DOODLY DOOP! My first programming language was Visual Basic. I started learning Visual Basic (VB from now on) one or two years before my formal training in CS started in high school. BASIC has a very poor reputation and it has been claimed that it “ruined generations”. I can’t say anything about that because I was not there during that period, but I can assure you: VB has nothing to do with the unstructured programming mess that the first versions of BASIC were. VB was a decent language and VB 4 was a great RAD environment, but in retrospect I do find it very verbose and its standard non-GUI library lacking. This was way before VB.NET. Back in the day VB and Delphi were, IMHO, the only true RAD tools out there, with Microsoft Visual C++ claiming to be one but failing. Sure, it may have been RAD when compared to the Windows API, but VB and Delphi were in a league of their own. And MFC has the honor of being the worst API I have ever worked with.

Then my formal training started with Turbo Pascal (TP from now on). It is a good didactic language, and Borland Pascal was actually a good choice for development, with its advanced tools and build modes that TP lacked. While they were teaching us the basics of programming, I was experimenting with DOS graphics drivers, VESA modes and GUIs. Especially Mode X. There was this great compilation of Pascal sources called SWAG. It contained very advanced stuff that I had trouble understanding at first, but it was very fun. I was forced to start learning assembly to truly benefit from all the interesting parts of SWAG, and I became obsessed with writing the fastest “putPixel” procedure out there. Everybody should read “The Art of Assembly Language” by Randall Hyde. Even if you are not interested in assembly, it has a few very important and very well handled chapters on data representation. Maybe even give HLA a try, but people experienced with assembly will hate the inverted operand order. I know that it is actually not inverted and that what is considered normal can be seen as inverted (see the binary code representation on x86), but still. OK, side note finished. Now where was I? Ah yes! I even experimented a little with C before they started teaching it at school.
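For readers who never met Mode X, here is a minimal sketch of why putPixel was such a puzzle, assuming the classic 320x240 unchained 256-color layout. Pixels are interleaved across four memory planes, so column x lives in plane x % 4 at byte offset y * 80 + x / 4, and on real hardware the plane is selected by writing 1 << plane to the VGA Sequencer Map Mask register (port 0x3C4, index 2). The video memory is simulated here with plain arrays, and ModeXScreen, planeOf, offsetOf and mapMaskFor are hypothetical names for illustration, not code from SWAG.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed mode: 320x240, 256 colors, unchained ("Mode X").
constexpr int kWidth = 320;
constexpr int kHeight = 240;
constexpr int kBytesPerRow = kWidth / 4;  // each plane holds every 4th pixel: 80 bytes per row

// Simulated video memory. On real hardware all four planes sit behind the
// same addresses at segment 0xA000; here each plane is a separate array.
struct ModeXScreen {
    std::array<std::array<std::uint8_t, kBytesPerRow * kHeight>, 4> planes{};
};

// Plane that holds pixel column x (pixels are interleaved: 0,1,2,3,0,1,...).
constexpr int planeOf(int x) { return x & 3; }

// Byte offset of pixel (x, y) inside its plane.
constexpr std::size_t offsetOf(int x, int y) {
    return static_cast<std::size_t>(y) * kBytesPerRow + static_cast<std::size_t>(x >> 2);
}

// Value a real routine would write to the Map Mask register to enable only that plane.
constexpr std::uint8_t mapMaskFor(int x) {
    return static_cast<std::uint8_t>(1u << planeOf(x));
}

void putPixel(ModeXScreen& screen, int x, int y, std::uint8_t color) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    screen.planes[planeOf(x)][offsetOf(x, y)] = color;
}

std::uint8_t getPixel(const ModeXScreen& screen, int x, int y) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    return screen.planes[planeOf(x)][offsetOf(x, y)];
}
```

Since changing planes costs a port write on real hardware, fast routines tried to batch pixels per plane; the sketch does not model port I/O at all.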

I kept playing around with Pascal and adopted a GUI convention from a Pascal library which I think was called R3. The names are starting to get fuzzy. This library colored all my future GUI API design and was quite compatible in principle with the Delphi GUI. I ported R3 to C++ and was doing a lot of STD/STL C++ and a lot more Delphi. Lol, he said STD.

Delphi became my primary development language, especially since it had those very powerful “enterprise” components and several third-party online component repositories. The language is not extremely expressive, but unlike C/C++, it had very few bad design decisions and very little to object to. A lot to nitpick, for sure. And please, do not even try to link me “Why Pascal is Not My Favorite Programming Language” by Brian W. Kernighan. That text is from 1981 and is so outdated that basically every single point he made can be rebutted by Delphi, which is actually not Pascal but a dialect of Object Pascal, Object Pascal being an OOP dialect of Pascal. So a dialect of a dialect. It may have been true when it was written, but after so many years, that paper is more outdated and irrelevant to the state of Delphi than a paper entitled “Why America will never have a black president” written by the fourth future black president of America. Sure, Delphi is not perfect and it has been having a hard time in the past few years, switching owners and all, but still I never understood why it did not become very popular.

On the other hand, I stopped using Delphi! I got bitten by a very nasty bug that caused a very serious sickness: Linux! Delphi was not cross-platform! Kylix was a mess with its Wine dependency, its difficult installation and the fact that it was abandoned by Borland, making it increasingly hard to install and run as the Linux platform moved ahead at a brisk pace. C++ was portable, but lacked a viable GUI library, and the STL was as pleasant to use as a bear for dental flossing purposes. I could write a book on the “failure” of the STL and I will probably write a post about it sometime. But it is not awful and not really a failure. Just very unpleasant. Please use it and do not roll your own general-purpose container library. I’ll go mental if I see another custom list implementation that is inferior to the STL. Every open source library seems to roll its own. And ranges fix most of the problems with iterators (except one that I do care about), so check out Andrei Alexandrescu’s papers on ranges. I have not followed the recent development of ranges, so please please please do not let it be a dead idea. With the underwhelming new C++0x standard, all that is missing is for ranges to get abandoned for some non-objective reason.
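To make the ranges remark a bit more concrete, here is a minimal sketch of the idea in C++, assuming a simplified input range with the empty/front/popFront primitives from Alexandrescu's proposal. The point is that one object carries both ends of the sequence, so an algorithm like findIf returns whatever is left of the range instead of an iterator that must be compared against a separate end. VectorRange and findIf are hypothetical names for illustration, not a standard library API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical input range over a vector's elements: one object replaces
// the usual begin/end iterator pair.
template <typename T>
class VectorRange {
public:
    explicit VectorRange(const std::vector<T>& v)
        : first_(v.data()), last_(v.data() + v.size()) {}

    bool empty() const { return first_ == last_; }

    const T& front() const {
        assert(!empty());
        return *first_;
    }

    void popFront() {
        assert(!empty());
        ++first_;
    }

private:
    const T* first_;
    const T* last_;
};

// A generic algorithm written only against the range primitives.
// Returns the remaining range starting at the first match (empty if none).
template <typename Range, typename Pred>
Range findIf(Range r, Pred pred) {
    while (!r.empty() && !pred(r.front())) {
        r.popFront();
    }
    return r;
}
```

The caller never juggles two iterators: testing the result with empty() answers "was it found?", and the same object can keep being consumed afterwards.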

So, without a main programming language, I was oscillating between C++, Delphi, Java and others on a case-by-case basis.

So in my free time I started to work on Z, symbolically named as the language that would end all other languages. Naïve, I know, but still a great programming and design exercise. But not long before it would have reached the critical feature set needed to take off, I discovered something on the Internet: the D language. I was awestruck and spent that day and the night, until around 5 o’clock, reading the design documents and the forums. Most of the time was spent nodding my head in approval, as around 80% of D’s design was identical to Z. Sometimes even the syntax and semantics were 100% the same. Now Walter Bright is a person who truly understands the problems with C++, an understanding derived (I wager) from experiencing its faults first hand and reacting to them, the same way I did. This is why our designs were so similar. But D was light years ahead of Z and had a lot more ideas. It did use garbage collection, a feature Z did not have and one that I was not particularly fond of, but it was still totally worth it. I have this old saying: “The world does not need yet another programming language”. So a few days after the discovery I killed Z and started using D. I do not regret the time spent on Z or its untimely death. This was way before D 1.0. Since then I have stopped using D. The long-awaited version 1.0 seemed lackluster, version 2.0 took the language down the path of concurrency, a path I am not that interested in personally (even though I use concurrency basically every day professionally), and most importantly, getting D to use other C and especially C++ libraries was extremely difficult. The main reason for this is that the object files were not compatible, so you needed to recompile those libraries with Digital Mars C++, probably a good compiler, but one that constantly choked on sources that GCC and MSC had no problems with. And Digital Mars C++ was not actively developed.

So I was out of a main language again, even before I had used D enough to call it “main”. So I did the only reasonable thing that would allow me to find a cross-platform language with great expressive power, great compatibility with other libraries and a fully featured GUI library: I tried them all. The big ones anyway. I think I went through 20-30 GUI library/language combos, researching their pros and cons. Another requirement was that the language/library combo needed to have support and a fair user base, so I unfortunately had to discard a few very promising candidates that were apparently used by only a handful of people. I am not going to list all the candidates and my conclusions now, maybe someday. As secondary languages, I settled on C# and .NET for when a virtual machine is desired, and on Ruby for playing around and some light web development. C++ with Qt was the strongest candidate for GUI work (even with the unfortunate use of MOC), but in the end I settled on U++. It is more modular and easier to tweak, while still having immense power. So in the end I still stuck with C++. But C++ in conjunction with U++ is a very clean C++, one that could be considered almost as clean syntax-wise as a Java with operator overloading (and slight operator overloading abuse).

When I say main programming language (and library), I mean the language that I use for all my personal projects and the one I find has the best principles put into practice. It is all about design, expressive power and, most importantly, elegance. If the language is not elegant or is not capable of providing an elegant subset, it is wasted potential. This is why C++ can still cover these requirements. It may be a horrible and ugly language in general, but one can use a subset of it, have a strong and consistent programming style and, if you squint your eyes a little, it may even seem the perfect programming language (alcohol helps during the squinting process to facilitate the creation of this illusion).

How egotistical of me, filling three pages about what an experienced and great programmer I am. And I assure you I censored myself and only included the most important events. This is not an autobiography after all. I just felt like giving some background. Sharing is caring. Or scaring.

 

“The world does not need yet another programming language” 

So why am I doing a programming language? Is this not hypocritical? Well, no! I am not so much creating a programming language as researching and holding theoretical discussions, not so much about the failings of some specific construct in some language, but about the process of finding such failings, determining why they are failings and solving them while keeping the solution general. Any coding and actual language implementation is a simple exercise in grounding the theory a little in reality. And while the principles of this theoretical language, Z reborn, called Z2 (I know, I am not very good at picking names), may not always express this, the central idea is that C is a horrible language not fit for any practical purpose and that its seed has corrupted C++. There are two good reasons to use C: either you need very high performance, low-impact, real-time code very close to the machine, or you are maintaining a C code base.

So I have already named the first principle above.

The second principle is also the second constraint on development: the code base will have limits put upon its maximum size. For starters, it is going to be a maximum of 3000 lines of code. I will only allow it to go over that when some huge new feature comes along, and then I’ll increase the limit to 5000. Some future increases after these are possible, but not too many. The source will always be short, very lightweight, easy to understand, didactic and very hackable. The Squirrel scripting language was at one point a fully featured language with only about 9000 lines of code. I don’t know how many lines a more recent version has, but I doubt it has blown up exponentially. Needless to say, standard library code for the language will not count toward this limit.

All code will also be open sourced eventually. Every few months when I have time I’ll drop the code somewhere. And maybe I can do some binary releases more regularly, but DwarvesH has the top priority.

The language will be influenced by scripting languages. These languages have made a deep impact on the programming languages ecosystem and ignoring their influence would be silly. So even though at its heart Z2 will be a very C++-styled language, its syntax will be more inspired by Ruby, Python and others. But this does not matter. Programmers generate a lot of near-religious buzz around conventions that are not important. One of the goals of Z2 will be to point a specific finger from its hand in the general direction of this nonsense. So no “indentation wars” or other minor yet blown-up controversies will be given any attention. Indentation wars are particularly silly and a pet peeve of mine. Are we programmers or mice? No self-respecting, professional programmer should ever have trouble following any kind of indentation, from his or her favorite style to a style where everything is on the same line, and he or she should never become profoundly disturbed by such styles or by any other kind of convention. Especially not at the level some people stick to their styles, turning them into crusades against everybody else. I can read any kind of indentation. I am not equally comfortable with all styles, but after a day or two I get over it and reach full performance. And I think no programmer with enough experience in the field has any excuse not to be fluent in all styles, or at least not to get angry about them. My personal pet peeve is prefixing member variables with “m_”. I hate this convention and I think that its proponents are completely missing the point of OOP. They are in one universe and the point is in a parallel universe, with the only thing making these two universes parallel being the fact that they exist at the same time, not that they have even the slightest thing in common. Yet I do not get angry when seeing this convention. I do avoid it like the plague, though :)

So when there are two takes on the same idea, and neither can objectively be declared the winner, Z2 will intentionally do both or neither, depending on the case, in order to make a point.

I will talk about one last topic and end this extremely long post right here, even though I have not systematically listed all the principles of Z2. The compiler will take Z2 code and output C/C++ code. This will make it very portable and keep the source code small. This solution is not optimal, but it is perfect for a toy language. And yes, Z2 is a toy language. I will not be using it for real development and will not port DwarvesH to Z2. It is just the practical result of the theoretical discussions related to programming languages and my two cents on fixing C. An ideal solution would be to use LLVM. I have very high hopes for LLVM, and I hope that one day it will be the messiah that unites all languages the way the JVM or the CLI never could. Muad'dib! Muad'dib! Muad'dib! But we are not there yet.
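To illustrate what outputting C/C++ code means in practice, here is a minimal sketch of source-to-source emission, assuming a tiny hypothetical AST. Expr, Function, num, var, bin, emitExpr and emitFunction are made up for this example; the post does not describe Z2's syntax or compiler internals. Each node is turned into C++ text, and binary expressions are fully parenthesized so the generated code never depends on C++ operator precedence matching the source language's.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature AST, not Z2's real one.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum class Kind { Number, Variable, Binary };
    Kind kind;
    std::string text;  // literal digits, variable name, or operator symbol
    ExprPtr lhs;       // only used by Binary
    ExprPtr rhs;       // only used by Binary
};

ExprPtr num(const std::string& digits) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Number, digits, nullptr, nullptr});
}

ExprPtr var(const std::string& name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Variable, name, nullptr, nullptr});
}

ExprPtr bin(const std::string& op, ExprPtr lhs, ExprPtr rhs) {
    assert(lhs && rhs);
    return std::make_shared<Expr>(Expr{Expr::Kind::Binary, op, lhs, rhs});
}

// Emits a fully parenthesized C++ expression.
std::string emitExpr(const Expr& e) {
    switch (e.kind) {
        case Expr::Kind::Number:
        case Expr::Kind::Variable:
            return e.text;
        case Expr::Kind::Binary:
            return "(" + emitExpr(*e.lhs) + " " + e.text + " " + emitExpr(*e.rhs) + ")";
    }
    return std::string();
}

// Hypothetical top-level declaration; every parameter and the return type
// are assumed to be int to keep the sketch short.
struct Function {
    std::string name;
    std::vector<std::string> params;
    ExprPtr body;
};

std::string emitFunction(const Function& f) {
    std::string out = "int " + f.name + "(";
    for (std::size_t i = 0; i < f.params.size(); ++i) {
        if (i > 0) out += ", ";
        out += "int " + f.params[i];
    }
    out += ") {\n    return " + emitExpr(*f.body) + ";\n}\n";
    return out;
}
```

The back end of such a compiler can stay this simple and leave optimization to the C++ compiler, which is a large part of why the approach keeps the source small.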
