Different Versions of Yale Haskell Compared ------------------------------------------- There are currently three different platforms running Yale Haskell. Yale Haskell runs on Lucid Common Lisp, CMU Common Lisp, and AKCL. This document describes the differences between these systems. Differences in performance between the different versions of Yale Haskell reflect the underlying Lisp systems. The better the Lisp system, the better the Haskell system built on it. However, getting optimal performance from our Haskell system on top of a Common Lisp system requires careful attention to the underlying compiler. Small changes in the optimization settings or the addition of crucial declarations can make significant differences in performance. We have been doing most of our work using the Lucid system and have tuned it more than the others. These comparisons are greatly influenced by the amount of time we have spent tuning the system: the CMU version has been tuned only a little and the AKCL version hardly at all. Methodology The following timings are only approximate. They were obtained using the timing functions provided by the Common Lisp system. All timings were done on an unloaded Sparc 1. No attempt was made to account for garbage collection, differences in heap size, or similar factors. We don't intend these benchmark results to be taken as an exhaustive comparison of the different Lisp implementations involved. Portability We have had no trouble moving our system to different hardware platforms under the same Lisp system. Since the release is in source form, we expect that users will be able to build on any hardware platform supported by one the Lisps we have ported to. Probably the only real constraint on portability is the requirement for a large virtual memory space. From the comp.lang.lisp FAQ: Lucid Common Lisp runs on a variety of platforms, including PCs (AIX), Apollo, HP, Sun-3, Sparc, IBM RT, IBM RS/6000, Decstation 3100, Silicon Graphics, and Vax. CMU Common Lisp is free, and runs on Sparcs (Mach and SunOs), DecStation 3100 (Mach), IBM RT (Mach) and requires 16mb RAM, 25mb disk. Kyoto Common Lisp (KCL) is free, but requires a license. Conforms to CLtL1. It is available by anonymous ftp from rascal.ics.utexas.edu [128.83.138.20], cli.com [192.31.85.1], or [133.11.11.11] (a machine in Japan) in the directory /pub. AKCL is in the file akcl-xxx.tar.Z (take the highest value of xxx). To obtain KCL, one must first sign and mail a copy of the license agreement to: Special Interest Group in LISP, c/o Taiichi Yuasa, Department of Computer Science, Toyohashi University of Technology, Toyohashi 441, JAPAN. Runs on Sparc, IBM RT, RS/6000, DecStation 3100, hp300, hp800, Macintosh II (under AUX), mp386, IBM PS2, Silicon Graphics 4d, Sun3, Sun4, Sequent Symmetry, IBM 370, NeXT and Vax. A port to DOS is in beta test as math.utexas.edu:pub/beta2.zip. We have not yet completed ports of Yale Haskell to any other Lisp implementations, although we are likely to do so in the future. System Size The overall size of the Haskell system depends on the size of the underlying Lisp system and how much unnecessary Lisp overhead has been removed for the system. We have removed large Lisp packages (like CLOS or CLX), but have not attempted to do any tree shaking. The size of the saved images (including the Lisp system, the Haskell compiler, and the compiled prelude) is Image Size: Lucid 10 meg CMU 18 meg AKCL 11 meg The larger size of the CMU system is probably an artifact of their way of saving the system. Compilation Time There are three possible ways to compile a Haskell program. All Haskell programs must be translated into Lisp. The generated Lisp can then be interpreted, using no additional compilation time; compiled with a `fast' but nonoptimizing Lisp compiler; or compiled with the `slow' compiler that aggressively attempts to perform as many optimizations as possible. To time the `fast', nonoptimizing compiler, we have been using (PROCLAIM '(OPTIMIZE (SPEED 3) (SAFETY 0) (COMPILATION-SPEED 3))) and for the `slow', fully optimizing compiler, we have been using (PROCLAIM '(OPTIMIZE (SPEED 3) (SAFETY 0) (COMPILATION-SPEED 0))) so that the only difference is in the COMPILATION-SPEED quality. Lucid does, in fact, provide two completely different compilers that correspond to these optimize settings. For all three implementations, it appears that that the effect of a higher compilation speed setting is primarily in being less aggressive about inlining and making use of type declarations. The Haskell system itself (including the Prelude) is normally built with the fully optimizing compiler. To show just the Haskell to Lisp compilation time, here are the times needed to compile the Prelude (about 2500 lines of Haskell code). This does not include the time in the Lisp compiler or starting up the system. Time to compile the Prelude into Lisp: (CPU times) Lucid 111 sec CMU 87 sec AKCL 576 sec Running the Lisp compiler on the generated code takes far longer than running the Haskell compiler to produce the Lisp code. For example, the optimizing Lucid compiler takes 47 minutes to compile the Prelude (about x20 slower than Haskell -> Lisp). The nonoptimizing compiler is significantly faster but generates poorer code. The following times are the Lisp compilation time for the Prolog interpreter (found in the demo directory of our release): Lucid - interpreted 8.8 sec Haskell -> Lisp Lucid - nonopt 20.0 sec Lisp -> Machine code Lucid - optimizing 320.0 sec Lisp -> Machine code CMU - interpreted 12.4 sec Haskell -> Lisp CMU - nonopt 121.0 sec Lisp -> Machine code CMU - optimizing 152.8 sec Lisp -> Machine code AKCL - interpreted 47.8 sec Haskell -> Lisp AKCL - nonopt ~180 sec Lisp -> Machine code AKCL - optimizing ~360 sec Lisp -> Machine code The AKCL timings are only approximate, because the Lisp timing functions do not capture the time spent in the C compiler. Code Speed The speed of the Haskell program depends on whether the Lisp code has been compiled with the optimizing or nonoptimizing compiler, or is running interpretively. The first benchmark is nfib, which indicates the basic speed of function calling and Int arithmetic. module Main where nfib :: Int -> Int nfib 0 = 1 nfib 1 = 1 nfib n = nfib (n-1) + nfib (n-2) nfib 20 nfib 30 Lucid (Interpreted) 116 sec * Lucid (nonopt) 0.14 sec 9.4 sec Lucid (optimizing) 0.08 sec 4.8 sec CMU (Interpreted) 23.8 sec * CMU (nonopt) 0.24 sec 6.9 sec CMU (optimizing) 0.11 sec 7.0 sec AKCL (Interpreted) 141 sec * AKCL (nonopt) 0.20 sec 21.3 sec AKCL (optimizing) 0.15 sec 18.2 sec * Too slow to benchmark For other data types, there was no significant difference betwen optimizing and nonoptimizing compilation in any of the systems. Changing the signature of nfib to Integer -> Integer: nfib 20 nfib 30 Lucid (interpreted) 140 sec * Lucid (compiled) 0.18 sec 10.2 sec CMU (interpreted) 24.2 sec * CMU (compiled) 0.16 sec 10.5 sec AKCL (interpreted) 145 sec * AKCL (compiled) 1.07 sec 127 sec Nfib with signature Float -> Float: nfib 20 nfib 30 Lucid (interpreted) 222 sec * Lucid (compiled) 16.4 sec 2416 sec CMU (interpreted) 44.2 sec * CMU (compiled) 1.61 sec 352 sec AKCL (interpreted) 161 sec * AKCL (compiled) 103 sec * Overloaded functions run considerably slower than nonoverloaded functions. By allowing nfib to remain overloaded, Num a => a -> a, and using the Int overloading the same benchmarks run much slower. Again, there is no real difference between the different compiler optimization settings. nfib 15 nfib 20 Lucid (interpreted) 14.2 sec 156 sec Lucid (compiled) 0.97 sec 9.3 sec CMU (interpreted) 23.8 sec 155 sec CMU (compiled) 0.89 sec 15.6 sec AKCL (interpreted) 30.8 sec 387 sec AKCL (compiled) 10.3 sec 119 sec Basic Haskell data structuring operations (pattern matching and construction) can be tested using another version of nfib which uses natural numbers: data Nat = Z | S Nat The difference betwen CMU and Lucid here is consistent with other benchmarks that manipulate structures. nfib 10 nfib 15 Lucid (Interpreted) 1.39 sec 26.7 sec Lucid (compiled) 0.26 sec 2.28 sec CMU (interpreted) 3.1 sec CMU (compiled) 0.16 sec 0.54 sec AKCL (Interpreted) 4.25 sec AKCL (compiled) 0.21 sec 13.9 sec A Large Program For a final benchmark, we use the Prolog interpreter as a way of getting a feel for general performance of larger programs. This program is typical of symbolic (as opposed to numeric) computation. Time to solve append(X,Y,cons(a,cons(b,cons(c,nil)))): Lucid 12.2 sec CMU 12.0 sec AKCL 69.1 sec My interpretation of this result is that although Lucid is a bit slower on the previous small benchmarks, it makes up for this is larger programs where advantages like better instruction scheduling, register allocation, or memory usage may make a difference. In general, Lucid and CMU are very similar in performance for larger programs. Conclusions Briefly stated, the pluses and minuses of each system are as follows: Lucid (4.0.0): + Development (nonoptimizing) compiler is very fast + Fast Haskell -> Lisp compilation + Generates good code + Very robust - Costs money - Slow floating point code - Fairly slow interpreter - The production (optimizing) compiler is extremely slow. CMU (16e): + Free + As fast as Lucid for Haskell -> Lisp + Good floating point performance + Generated code is very fast + Fast interpreter - Slow Lisp -> machine code compilation - Doesn't run on many systems AKCL (1.615): + Free + Widely portable - Slow (generally 3 - 5 times slower, sometimes much worse) - Flakey (tends to core dump on errors, choke on large programs, etc.) Generally, using the fully optimizing compiler seems to be useful only in code involving Int arithmetic. The fast compiler for Lucid is a big advantage, delivering by far the fastest compilation to machine code with relatively little loss in speed compared to the optimizing compiler. Yale Haskell Group September 25, 1992