Thursday, July 21, 2011

Z2 – 02 – How to start? (part 2)

In order to break the trend of huge Z2 posts, I’ll dive right into it, going straight for the benchmark explanation and skipping most of the theory and other things I wanted to mention in part two.

For this test, I’ll consider 26 master sets of classes, each contained in its own file. Each master set is named after a capital letter of the English alphabet: master set “A” lives in the file “A.zsr”, master set “B” in “B.zsr” and so on, up to “Z” in “Z.zsr”. Each master set file uses the next file in alphabetic order, so “D” uses “E”, except for “Z”, which doesn’t use any other file. Taking the master set’s name, we append three more characters, each from “a” to “z” (lower case), to get every possible combination. So we get the names “Aaaa”, “Aaab”, … “Aaaz”, “Aaba”, …, “Aabz”, …, “Zaaa”, …, “Zzzz”. All names that start with the capital letter of a given master set go into that master set’s file. Each name is the name of a class, and each class contains 26 constants, named from “A” to “Z”. The constants from a master set are initialized with the similarly named constants from the similarly named class in the very next master set, plus one, so Aaaa.A = Baaa.A + 1, Fghj.O = Gghj.O + 1 and so on, except for the constants in the “Z” master set, which are initialized with values from 0 to 25.

So we have 26 files, with 17,576 classes each (26 * 26 * 26). Every class has 26 constants, so every file has 456,976 constants. In order to initialize these constants, we need the 456,976 constants from the next master set/file. So the total number of constants is 11,881,376. Now clearly, this test is completely ludicrous and no compiler on Earth is expected to be able to compile it, and if one is somehow capable of actually doing so, it will take a lot of time and use an astronomical amount of resources.
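To make these numbers concrete, here is a tiny C++ sanity check of the arithmetic (just an illustration, not part of the test suite):

#include <cstdio>

int main() {
    // One master set: a capital letter followed by every lowercase
    // three-letter combination, e.g. "Aaaa" .. "Azzz".
    const long classesPerFile   = 26L * 26L * 26L;        // three lowercase letters
    const long constantsPerFile = classesPerFile * 26L;   // 26 constants per class
    const long totalConstants   = constantsPerFile * 26L; // 26 master sets
    printf("classes per file:   %ld\n", classesPerFile);   // 17576
    printf("constants per file: %ld\n", constantsPerFile); // 456976
    printf("total constants:    %ld\n", totalConstants);   // 11881376
}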

Each file corresponding to a master set has a size of 10.3 MiB, except for the last one, which is 6.6 MiB. The main program file, the one that prints a subset of these constants, is 7.4 MiB and tries to print only the “Z” subset. The total amount of disk space used by the test suite is 272.6 MiB. A ludicrous amount of constants. Let me add some screenshots, first from “A.zsr”:

[screenshot of “A.zsr”]

Then “Z.zsr”:

[screenshot of “Z.zsr”]

And finally “main.zsr”:

[screenshot of “main.zsr”]

But when firing up the Z2 compiler with a “main.zsr” that only uses the “Z” master set, we get very interesting results: the first time it takes roughly two seconds, and on successive tries it goes down to 1.5 seconds (caching must be kicking in). The execution time is very good, but we are not asking the compiler to do complicated stuff, only a lot of simple tasks. Memory-wise, during compilation it eats up between 150 and 180 MiB. When I set out to experiment with this project, I knew that I wanted to achieve the paradoxical goal of getting better compilation times with Z2 than with C++, and while I am not there yet, at this early stage at least the Z2 compiler is not hindering me. And Z2 compiles the constants without any requirement on their declaration order, while also checking for circular dependencies.

But the interesting part is related to the resulting C++ file. And here is where I encounter my first huge roadblock: the resulting C++ file is 23.1 MiB. Notepad needs a good few seconds to open it and my C++ IDE has some trouble editing the file with syntax highlighting enabled. And when I tried to compile it, the compiler gave me this message:

fatal error C1128: number of sections exceeded object file format limit : compile with /bigobj
test2: 1 file(s) built in (7:31.77), 451779 msecs / file, duration = 454057 msecs, parallelization 0%

After working for over 7 minutes, it gave up and said that it exceeds an internal limit on the number of sections in the object file format. This was in debug mode. Let me try in optimal mode:

fatal error C1128: number of sections exceeded object file format limit : compile with /bigobj
test2: 1 file(s) built in (5:22.68), 322688 msecs / file, duration = 322782 msecs, parallelization 0%

Now it takes less time, probably because it does not have to generate debug information, but it still fails. I am going to have to think seriously about this problem. Is it worth fixing? Can it be done? Does the “/bigobj” option help? I could try to insert the constants inline into the statements where they are needed, so the backend compiler does not need to parse the constant definitions, but the resulting C++ code would be less readable. Using header files goes against the principles of this project. I could also try to break up the resulting C++ file into a lot of small ones. I would probably need to use a combination of methods.
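For illustration, the generated file presumably contains something along these lines for each class (the names and the “static const int” representation below are just my guess at the shape, not the actual compiler output):

// Hypothetical shape of the generated constants; the real Z2 output may differ.
struct Zaaa { static const int A = 0, B = 1, Z = 25; };
struct Yaaa { static const int A = Zaaa::A + 1, B = Zaaa::B + 1, Z = Zaaa::Z + 1; };
// ... and so on, up to 26 constants per class and 17,576 classes per master set.

int main() { return Yaaa::A; } // Yaaa::A == 1

Whatever construct ends up blowing up the section count, the “/bigobj” switch is probably the cheapest thing to try first, since it raises the per-object section limit well beyond the default of roughly 65K sections.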

Anyway, I can’t continue the official testing phase until I can get the resulting C++ code to compile.

But I can test Z2 a little bit more. I’ll change “main.zsr” to use “Y.zsr” and print the constants from that file. And the results are quite predictable: double the number of constants means double the time and memory use in the worst case. Once caching kicks in, the time goes down to as little as 2.3 seconds. Memory consumption does not go down. It is important to note that the “Y” set uses more RAM than the “Z” set, because “Y” constants are “Z” constants plus 1, while “Z” constants are just plain integer literals.

Repeating the experiment with the “X” set is not possible on this computer. Either I do not have enough RAM or there is a problem with the memory allocator. Anyway, the compiler handled 913,952 constants (two master sets’ worth, 2 * 456,976) and choked before reaching the next milestone of 1,370,928 constants (three sets). Not that bad for some fresh code without any actual optimizations or a lot of development time sunk into it.

And such a short Z2 post! After more tests I’ll put up the test suite and the compiler somewhere on the Interwebz. See you next time!
