aboutsummaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
Diffstat (limited to 'src')
-rw-r--r--src/README6
-rw-r--r--src/boot/README18
-rw-r--r--src/comp/README118
3 files changed, 130 insertions, 12 deletions
diff --git a/src/README b/src/README
index 4d1b431a..3618ee18 100644
--- a/src/README
+++ b/src/README
@@ -1,4 +1,4 @@
-This is preliminary version of the Rust compiler.
+This is preliminary version of the Rust compiler(s).
Source layout:
@@ -11,8 +11,8 @@ boot/util - Ubiquitous helpers
boot/llvm - LLVM-based alternative back end
boot/driver - Compiler driver
-comp/ The self-hosted compiler (doesn't exist yet)
-comp/* - Same structure as in boot/
+comp/ The self-hosted compiler ("rustc": incomplete)
+comp/* - Similar structure as in boot/
rt/ The runtime system
rt/rust_*.cpp - The majority of the runtime services
diff --git a/src/boot/README b/src/boot/README
index 891e1336..e17bfd79 100644
--- a/src/boot/README
+++ b/src/boot/README
@@ -1,15 +1,15 @@
An informal guide to reading and working on the rustboot compiler.
==================================================================
-First off, my sincerest apologies for the lightly-commented nature of the
-compiler, as well as the general immaturity of the codebase; rustboot is
-intended to be discarded in the near future as we transition off it, to a
-rust-based, LLVM-backed compiler. It has taken longer than expected for "the
-near future" to arrive, and here we are published and attracting contributors
-without a good place for them to start. It will be a priority for the next
-little while to make new contributors feel welcome and oriented within the
-project; best I can do at this point. We were in a tremendous rush even to get
-everything organized to this minimal point.
+First off, know that our current state of development is "bootstrapping";
+this means we've got two compilers on the go and one of them is being used
+to develop the other. Rustboot is written in ocaml and rustc in rust. The
+one you *probably* ought to be working on at present is rustc. Rustboot is
+more for historical comparison and bug-fixing whenever necessary to un-block
+development of rustc.
+
+There's a document similar to this next door, then, in comp/README. The
+comp directory is where we do work on rustc.
If you wish to expand on this document, or have one of the
slightly-more-familiar authors add anything else to it, please get in touch or
diff --git a/src/comp/README b/src/comp/README
new file mode 100644
index 00000000..fad13ba6
--- /dev/null
+++ b/src/comp/README
@@ -0,0 +1,118 @@
+An informal guide to reading and working on the rustc compiler.
+==================================================================
+
+First off, know that our current state of development is "bootstrapping";
+this means we've got two compilers on the go and one of them is being used
+to develop the other. Rustboot is written in ocaml and rustc in rust. The
+one you *probably* ought to be working on at present is rustc. Rustboot is
+more for historical comparison and bug-fixing whenever necessary to un-block
+development of rustc.
+
+There's a document similar to this next door, then, in boot/README. The boot
+directory is where we do work on rustboot.
+
+If you wish to expand on this document, or have one of the
+slightly-more-familiar authors add anything else to it, please get in touch or
+file a bug. Your concerns are probably the same as someone else's.
+
+
+High-level concepts
+===================
+
+Rustc consists of the following subdirectories:
+
+front/ - front-end: lexer, parser, AST.
+middle/ - middle-end: resolving, typechecking, translating
+driver/ - command-line processing, main() entrypoint
+util/ - ubiquitous types and helper functions
+lib/ - bindings to LLVM
+
+The entry-point for the compiler is main() in driver/rustc.rs, and this file
+sequences the various parts together.
+
+
+The 3 central data structures:
+------------------------------
+
+#1: front/ast.rs defines the AST. The AST is treated as immutable after
+ parsing despite containing some mutable types (hashtables and such).
+ There are three interesting details to know about this structure:
+
+ - Many -- though not all -- nodes within this data structure are wrapped
+ in the type spanned[T], meaning that the front-end has marked the
+ input coordinates of that node. The member .node is the data itself,
+ the member .span is the input location (file, line, column; both low
+ and high).
+
+ - Many other nodes within this data structure carry a def_id. These
+ nodes represent the 'target' of some name reference elsewhere in the
+ tree. When the AST is resolved, by middle/resolve.rs, all names wind
+ up acquiring a def that they point to. So anything that can be
+ pointed-to by a name winds up with a def_id.
+
+ - Many nodes carry an additional type 'ann', for annotations. These
+ nodes are those that later stages of the middle-end add information
+ to, augmenting the basic structure of the tree. Currently that
+ includes the calculated type of any node that has a type; it will also
+ likely include typestates, layers and effects, when such things are
+ calculated.
+
+#2: middle/ty.rs defines the datatype ty.t, with its central member ty.struct.
+ This is the type that represents types after they have been resolved and
+ normalized by the middle-end. The typeck phase converts every ast type to
+ a ty.t, and the latter is used to drive later phases of compilation. Most
+ variants in the ast.ty tag have a corresponding variant in the ty.struct
+ tag.
+
+#3: lib/llvm.rs defines the exported types ValueRef, TypeRef, BasicBlockRef,
+ and several others. Each of these is an opaque pointer to an LLVM type,
+ manipulated through the lib.llvm interface.
+
+
+Control and information flow within the compiler:
+-------------------------------------------------
+
+- main() in driver/rustc.rs assumes control on startup. Options are parsed,
+ platform is detected, etc.
+
+- front/parser.rs is driven over the input files.
+
+- Multiple middle-end passes (middle/resolve.rs, middle/typeck.rs) are run
+ over the resulting AST. Each pass produces a new AST with some number of
+ annotations or modifications.
+
+- Finally middle/trans.rs is applied to the AST, which performs a
+ type-directed translation to LLVM-ese. When it's finished synthesizing LLVM
+ values, rustc asks LLVM to write them out as a bitcode file, on which you
+ can run the normal LLVM pipeline (opt, llc, as) to get an executable.
+
+
+Comparison with rustboot
+========================
+
+Rustc is written in a more "functional" style than rustboot; each rustc pass
+tends to depend only on the AST it's given as input, which it does not mutate.
+Calculations flow from one pase to another by repeatedly rebuilding the AST
+with additional annotations.
+
+Rustboot normalizes to a statement-centric AST. Rustc uses an
+expression-centric AST.
+
+Rustboot generates 3-address IL into imperative buffers of coded IL quads.
+Rustc generates LLVM, an SSA-based expression IL.
+
+Rustc, being attached to LLVM, generates much better code. Factor of 5
+smaller, usually. Sometimes much more.
+
+Rustc preserves more of the parsed input structure. Rustboot "desugars" most
+of the input, rendering round-trip pretty-printing impossible. Error reporting
+is also better in rustc, as type names (as denoted by the user) are preserved
+throughout typechecking.
+
+Rustc is not concerned with the PIC-ness of the resulting code, nor anything
+to do with encoding DWARF or x86 instructions. All this superfluous
+machine-level logic that seeped up to the translation layer in rustboot is
+pushed past LLVM into later stages of the toolchain in rustc.
+
+Numerous "bad idea" idiosyncracies of the rustboot AST have been eliminated in
+rustc. In general the code is much more obvious, minimal and straightforward.