diff options
Diffstat (limited to 'notes.org')
-rw-r--r-- | notes.org | 233 |
1 files changed, 233 insertions, 0 deletions
diff --git a/notes.org b/notes.org new file mode 100644 index 0000000..e3ece06 --- /dev/null +++ b/notes.org @@ -0,0 +1,233 @@ +* Problems +- depending on binaries for building compilers and/or build systems is bad for trust +- the scale of the problem is getting larger as time passes: we started with just having GCC as a problem, but as time passed we got more and more new languages that are self-hosted (i.e. need an older version of themselves to be built). +- having a long chain of builds means that build systems have to be kept alive in modern environments. +- Go has its own linker, so to keep the bootstrap working patches have to be backported +- later versions of GCC are written in C++ +- if you’re going to maintain an old version GCC just for a bootstrap chain you’ll have to backport new architectures. This means that our bootstrap chain is forever tied to x86 (PowerPPC users cannot bootstrap on their own). +- generated C code is not source — it may make portability possible, but it’s not trustworthy +- bootstrapping is not a *functional* feature, so the value isn’t immediately obvious (very much like reproducible builds) +- the whole toolchain has bootstrapping issues (including linker and kernel…) + +* Ideas +** Consensus +- languages with multiple implementations are great, because diversity makes bootstrapping easier. For single-implementation languages we need alternative ways to get started (even if it’s inefficient) +** Rather than depend on more binary blobs, throw more CPU time at it, e.g. by emulating an x86 CPU with qemu and then work from there. +** Need to reach out to compiler developers: make sure that there’s a non-self-hosted path to build the first compiler — find cooperative people in compiler projects to “bootstrap” a bootstrapping project +** try to depend only on the smallest C compiler possible +- e.g. [[http://www.landley.net/code/tinycc/][tinycc]], [[http://pcc.ludd.ltu.se/][pcc]], [[http://www.landley.net/qcc/][qcc]] +- coreboot folks have a simple C compiler RAMCC(?) +** register http://bootstrappable.org, collect stories there! +*** motivation: collect examples of backdoored compilers +- toy example: https://manishearth.github.io/blog/2016/12/02/reflections-on-rusting-trust/ +- ken thompson: reflections on trusting trust +- "Defending Against Compiler-Based Backdoors" +http://blog.regehr.org/archives/1241 +- PoC||GTFO +https://www.alchemistowl.org/pocorgtfo/pocorgtfo08.pdf + +- TODO: need more! +*** Examples + - e.g. the bootstrap chain for GCC –> GCJ –> IcedTea 6 –> IcedTea7 in Guix + - GNU Make doesn’t need make but just a bash script + - Cook + - Guile Scheme: includes an interpreter written in C. + - Bazel needs Bazel to build itself, *but* you can build a minimal variant of Bazel with a shell script that runs =javac= on all Java sources, etc +- Ant, needs itself but can be build with plain Java +*** best practises +- don’t throw old code away (to allow for a bootstrapping chain) +- have an alternative implementation backend (e.g. written in C or in a language that traces back to C eventually) — simplifies porting +*** Call for Action! +- target it to different audiences: “if you’re a compiler writer, do this…”, “if you’re a free software dev, consider …” + + +* Notes for the manifesto +- Don't give "bad" examples, since we don't want to piss off upstreams. Only give "good" examples. + + +* Homepage / overview of problem + +(short summary from the problem - 2-3 concise sentences based on the Problems section?) +(try to clarify what bootstrapping actually is) + +To have trust in our computing platforms, we should be able to follow the +bootstrapping process - how each part was produced from source - to then feel +confident it is built on good foundations. + +(more detail on the intended outcomes and benefits) + +(1. trust/security - most powerful/appealing motivation, mention this one first) + +We want to draw attention to the need for an auditable, repeatable process for +bootstrapping programming languages, compilers, pieces of the toolchain and whole +distributions. + +(2. easier porting (new platforms? languages?) - secondary benefit, important but less people are interested) + +Another benefit would be that it becomes easier to port these things to new hardware +platforms. + +(Motivation / benefits could become a separate section of it gets too big) + +Compilers are often written in the language they are compiling. +This creates a chicken-and-egg problem that leads users and +distributors to rely on opaque, pre-built binaries of those +compilers that they use to build newer versions of the compiler. +We believe that those opaque binaries are a threat to user +security and user freedom since they are not auditable; we believe +the amount of bootstrap binaries should be minimized. + +* Best Practices (incl. examples, success stories?) + +** For compiler writers... + +If you're working on a compiler that is written in a language other than +the one it's compiling, you're all set! + +If your compiler is written in the language that it's compiling +(“self-hosted”), it probably falls in one of the following categories. + +If other implementations of this programming language exist, please make +sure your compiler can be built with one of these. Examples include: + + - The Go programming language has two implementations: [[https://golang.org/][the reference + implementation]] is self-hosted, and that in [[https://gcc.gnu.org][GCC]] is written in C++. + (TODO: check if we can build one with the other) Furthermore, + version 1.4 of the reference implementation was written in a + different language and can be used to build version 1.5. + - Common Lisp has several implementations. Notably [[http://www.clisp.org/][GNU clisp]] is + written and C and can be used to build self-hosted implementations + such as [[http://www.sbcl.org/][SBCL]]. + +If your compiler targets a language for which no other implementation +exists, then please consider maintaining a (minimal) implementation of +the language written in a different language. Most likely this +implementation exists, or existed at the point the programming language +was created. Maintaining this alternate implementation has a cost; +however, this cost should be minimal if this alternate implementation is +used routinely to build the compiler, and if this implementation is kept +simple—it does not need to be optimized. + +Examples include: + + - [[https://gnu.org/software/guile][GNU Guile]], a Scheme implementation with a self-hosted compiler, + relies on an [[http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/eval.c][Scheme interpreter written in C]] for bootstrapping + purposes. + +Please let us know if you’d like to add your compiler to this list! + +** For build systems writers... + +Build systems sometimes have chicken-and-egg problems: they may +need a version of themselves to get built. If you are developing +a build system, this can be avoided. We recommend that you +provide an alternative way to build your build system. + +Examples include: + + - [[https://gnu.org/software/make][GNU Make]] does not require a ‘make’ implementation. It can be built + using a [[http://git.savannah.gnu.org/cgit/make.git/tree/build.template][shell script]]. + - [[http://ant.apache.org/][Apache Ant]] can bootstrap with a [[https://git-wip-us.apache.org/repos/asf?p=ant.git;a=blob;f=bootstrap.sh;h=60b6ece03ce78716bc036a44226f4934b541f326;hb=HEAD][shell script]] + that only relies on the Java compiler. + - [[https://bazel.build/][Bazel]] does not require Bazel to build itself but + can be boostrapped with a [[https://github.com/bazelbuild/bazel/blob/master/compile.sh][shell script]]. + - [[https://buckbuild.com/][Buck]] does not require Buck to build itself. Instead, it can be + built using [[https://github.com/facebook/buck/blob/master/build.xml][Ant]]. + +Build system, compared to compiler, do not need to write a full +language compiler of its language to bootstrap. A really slow and +unefficient build written in shell script or another older +build system (Ant, GNU Make) can generate a minimal version of the +build system to bootstrap a complete version of it. + +** For distros + +It is unavoidable that distributions use some binaries as part of +their bootstrap chain. However, distributions should endeavour to +provide traceacibility and automated reproducibility for such +binaries. This means that: + +- It should be clear where the binary came from and how it was + produced. + +- Users can reproduce the binary to verify that it has not been + tampered with. + +For example, a distribution might use a binary package of GCC to build +GCC from source. This bootstrap binary is in most cases built from a +previous revision of the distribution's GCC package. Thus, the +distribution can label the binary with something like "this package +was built by running <command> on revision <hash> of the +distribution's package repository." A user can then easily reproduce +the binary by fetching the specified sources and running the specified +command. This build will in most cases depend on a previous generation +of bootstrap binaries. Thus, we get a chain of verifiable bootstrap +binaries stretching back in time. + +Bootstrap binaries may also come from upstream. This would typically +be the case when a language is first added to a distribution. In this +case, it may not be obvious how the binary can be reproduced, but the +distribution should at least clearly label the provenance of the +binary, e.g. "this binary was downloaded from +https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz". + +TODO: provide an example of how we do this / are going to do this in +Nixpkgs / Guix / ...? + +http://git.savannah.gnu.org/cgit/guix.git/commit/?id=062134985802d85066418f6ee2f327122166a567 + +* Collaboration projects + +** Continued maintenance of the GNU Compiler for Java (GCJ) + +Until recently the latest Java Development Kit (JDK) could be +bootstrapped in a chain starting with GCJ (the GNU Compiler for Java) +and the IcedTea build system. GCJ was deleted from the GNU Compiler +Collection in October 2016, so it is now unclear how to bootstrap the +JDK in future. To ensure that the JDK can be built from sources +without the need for an existing installation of the OpenJDK we +propose to continue maintaining GCJ. + +** Collectively maintaining GCC 4.7 + +The C and C++ compilers of the GNU Compiler Collection make up the +foundation of many free software distributions. Current versions of +GCC are written in C++, which means that a C++ compiler is needed to +build it from source. GCC 4.7 was the last version of the collection +that could be built with a plain C compiler, a much simpler task. We +propose to collectively maintain a subset of GCC 4.7 to ensure that we +can build the foundation of free software distributions starting with +a simple C compiler (such as tinyCC, pcc, etc). + +* Who are we? Which projects are participating +* Buy-in + +This is nice, but what are the actual benefits of “bootstrappable” +implementations? + +** For users + +As a user, bootstrappable implementations, together with [[https://reproducible-builds.org][reproducible +builds]], provide confidence that you are running the code you expect to +be running. Its source code is auditable by the developer community, +which in turns provides reassurance that the code you’re running does +not have backdoors. + +** For distributors + +Bootstrappable implementations provide clear provenance tracking: the +dependency graph of your distribution packages shows how each binary was +obtained. + +** For developers + +If you are a compiler writer, making your compiler bootstrappable from a +different language will simplify the development process (no need to +carry large pre-built binaries around). It will also make it easier to +port the compiler to a different platform for which no bootstrap +binaries exist yet. + +* Next Steps + +Try building gcc using gcc-4.7 <-- this already works (we used GCC 4.7 some months ago in Guix, but updated later for unrelated reasons) +Try building GCC 4.7 with TinyCC |