| author | Graydon Hoare <[email protected]> | 2010-07-01 09:00:47 -0700 |
|---|---|---|
| committer | Graydon Hoare <[email protected]> | 2010-07-01 09:00:47 -0700 |
| commit | 3aaff59dba4b9fff598c49eeb579cb6c631dd4f4 (patch) | |
| tree | 57407e6bc053df681d8f83632d9541b39188d591 /doc/rust.texi | |
| parent | Modify manual to reflect new box/local terminology and new slot/type roles fo... (diff) | |
Describe numeric and textual literals better; clean up lexeme descriptions a bit.
Diffstat (limited to 'doc/rust.texi')
| -rw-r--r-- | doc/rust.texi | 107 |
1 files changed, 88 insertions, 19 deletions
diff --git a/doc/rust.texi b/doc/rust.texi
index 659f1389..f9d6e3e0 100644
--- a/doc/rust.texi
+++ b/doc/rust.texi
@@ -583,39 +583,42 @@ Unicode characters.
 * Ref.Lex.Sym:: Special symbol tokens.
 @end menu
 
-@page
+@node
+
 @node Ref.Lex.Ignore
 @subsection Ref.Lex.Ignore
 @c * Ref.Lex.Ignore:: Ignored tokens.
 
-The classes of @emph{whitespace} and @emph{comment} is ignored, and are not
-considered as tokens.
+Characters considered to be @emph{whitespace} or @emph{comment} are ignored,
+and are not considered as tokens. They serve only to delimit tokens. Rust is
+otherwise a free-form language.
 
 @dfn{Whitespace} is any of the following Unicode characters: U+0020 (space),
 U+0009 (tab, @code{'\t'}), U+000A (LF, @code{'\n'}), U+000D (CR, @code{'\r'}).
 
 @dfn{Comments} are any sequence of Unicode characters beginning with U+002F
-U+002F (@code{//}) and extending to the next U+000a character,
+U+002F (@code{"//"}) and extending to the next U+000A character,
 @emph{excluding} cases in which such a sequence occurs within a string literal
 token or a syntactic extension token.
 
-@page
 @node Ref.Lex.Ident
 @subsection Ref.Lex.Ident
 @c * Ref.Lex.Ident:: Identifier tokens.
 
 Identifiers follow the pattern of C identifiers: they begin with a
-@emph{letter} or underscore character @code{_} (Unicode character U+005f), and
-continue with any combination of @emph{letters}, @emph{digits} and
-underscores, and must not be equal to any keyword. @xref{Ref.Lex.Key}.
+@emph{letter} or @emph{underscore}, and continue with any combination of
+@emph{letters}, @emph{decimal digits} and underscores, and must not be equal
+to any keyword. @xref{Ref.Lex.Key}.
 
 A @emph{letter} is a Unicode character in the ranges U+0061-U+007A and
-U+0041-U+005A (@code{a-z} and @code{A-Z}).
+U+0041-U+005A (@code{'a'}-@code{'z'} and @code{'A'}-@code{'Z'}).
 
-A @emph{digit} is a Unicode character in the range U+0030-U0039 (@code{0-9}).
+An @dfn{underscore} is the character U+005F ('_').
+
+A @dfn{decimal digit} is a character in the range U+0030-U+0039
+(@code{'0'}-@code{'9'}).
 
-@page
 @node Ref.Lex.Key
 @subsection Ref.Lex.Key
 @c * Ref.Lex.Key:: Keyword tokens.
@@ -701,25 +704,91 @@ The keywords are:
 @subsection Ref.Lex.Num
 @c * Ref.Lex.Num:: Numeric tokens.
 
-@emph{TODO: describe numeric literals}.
+A @dfn{number literal} is either an @emph{integer literal} or a
+@emph{floating-point literal}.
+
+@sp 1
+An @dfn{integer literal} has one of three forms:
+@enumerate
+@item A @dfn{decimal literal} starts with a @emph{decimal digit} and continues
+with any mixture of @emph{decimal digits} and @emph{underscores}.
+
+@item A @dfn{hex literal} starts with the character sequence U+0030
+U+0078 (@code{"0x"}) and continues as any mixture @emph{hex digits}
+and @emph{underscores}.
+
+@item A @dfn{binary literal} starts with the character sequence U+0030
+U+0062 (@code{"0b"}) and continues as any mixture @emph{binary digits}
+and @emph{underscores}.
+
+@end enumerate
+
+@sp 1
+A @dfn{floating point literal} has one of two forms:
+@enumerate
+@item Two @emph{decimal literals} separated by a period
+character U+002E ('.'), with an optional @emph{exponent} trailing after the
+second @emph{decimal literal}.
+@item A single @emph{decimal literal} followed by an @emph{exponent}.
+@end enumerate
+
+@sp 1
+A @dfn{hex digit} is either a @emph{decimal digit} or else a character in the
+ranges U+0061-U+0066 and U+0041-U+0046 (@code{'a'}-@code{'f'},
+@code{'A'}-@code{'F'}).
+
+A @dfn{binary digit} is either the character U+0030 or U+0031 (@code{'0'} or
+@code{'1'}).
+
+An @dfn{exponent} begins with either of the characters U+0065 or U+0045
+(@code{'e'} or @code{'E'}), followed by an optional @emph{sign character},
+followed by a trailing @emph{decimal literal}.
+
+A @dfn{sign character} is either U+002B or U+002D (@code{'+'} or @code{'-'}).
 
-@page
 @node Ref.Lex.Text
 @subsection Ref.Lex.Text
 @c * Ref.Lex.Key:: String and character tokens.
 
-@emph{TODO: describe string and character literals}.
+A @dfn{character literal} is a single Unicode character enclosed within two
+U+0027 (single-quote) characters, with the exception of U+0027 itself, which
+must be @emph{escaped} by a preceding U+005C character ('\').
+
+A @dfn{string literal} is a sequence of any Unicode characters enclosed
+within two U+0022 (double-quote) characters, with the exception of U+0022
+itself, which must be @emph{escaped} by a preceding U+005C character
+('\').
+
+Some additional @emph{escapes} are available in either character or string
+literals. An escape starts with a U+005C ('\') and continues with one
+of the following forms:
+@itemize
+@item An @dfn{8-bit codepoint escape} escape starts with U+0078 ('x') and is
+followed by exactly two @dfn{hex digits}. It denotes the Unicode codepoint
+equal to the provided hex value.
+@item A @dfn{16-bit codepoint escape} starts with U+0075 ('u') and is followed
+ by exactly four @dfn{hex digits}. It denotes the Unicode codepoint equal to
+the provided hex value.
+@item A @dfn{32-bit codepoint escape} starts with U+0055 ('U') and is followed
+ by exactly eight @dfn{hex digits}. It denotes the Unicode codepoint equal to
+the provided hex value.
+@item A @dfn{whitespace escape} is one of the characters U+006E, U+0072, or
+U+0074, denoting the unicode values U+000A (LF), U+000D (CR) or U+0009 (HT)
+respectively.
+@item The @dfn{backslash escape} is the character U+005C ('\') which must be
+escaped in order to denote @emph{itself}.
+@end itemize
 
-@page
 @node Ref.Lex.Syntax
 @subsection Ref.Lex.Syntax
 @c * Ref.Lex.Syntax:: Syntactic extension tokens.
 
-Syntactic extensions are marked with the @emph{pound} sigil @code{#} (U+0023),
+Syntactic extensions are marked with the @emph{pound} sigil U+0023 (@code{#}),
 followed by a qualified name of a compile-time imported module item, an
-optional parenthesized list of @emph{tokens}, and an optional brace-enclosed
-region of free-form text (with brace-matching and brace-escaping used to
-determine the limit of the region). @xref{Ref.Comp.Syntax}.
+optional parenthesized list of @emph{parsed expressions}, and an optional
+brace-enclosed region of free-form text (with brace-matching and
+brace-escaping used to determine the limit of the
+region). @xref{Ref.Comp.Syntax}.
 
 @emph{TODO: formalize those terms more}.
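Reader's note, not part of the commit: the literal rules added to Ref.Lex.Num and Ref.Lex.Text can be sketched as a few small recognizers. The sketch below is written in present-day Rust, not the 2010 language this manual describes, and every name in it (`NumberLiteral`, `classify`, `decode_escape`, and the helpers) is hypothetical. It makes two assumptions the patch text does not settle: a `0x` or `0b` prefix must be followed by at least one character, and an escape is handed over as the text after the backslash. The quote escapes are taken from the character and string literal paragraphs.

```rust
/// Kinds of number literal named in the patch text (names are hypothetical).
#[derive(Debug, PartialEq)]
enum NumberLiteral {
    Decimal,
    Hex,
    Binary,
    Float,
}

/// Decimal literal: a decimal digit, then any mixture of digits and underscores.
fn is_decimal_literal(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => chars.all(|c| c.is_ascii_digit() || c == '_'),
        _ => false,
    }
}

/// Prefixed integer literal: `prefix`, then any mixture of `digit` characters
/// and underscores. Assumption: at least one character must follow the prefix.
fn is_prefixed(s: &str, prefix: &str, digit: fn(char) -> bool) -> bool {
    match s.strip_prefix(prefix) {
        Some(rest) => !rest.is_empty() && rest.chars().all(|c| digit(c) || c == '_'),
        None => false,
    }
}

/// Exponent: 'e' or 'E', an optional sign character, then a decimal literal.
fn is_exponent(s: &str) -> bool {
    let rest = match s.strip_prefix('e').or_else(|| s.strip_prefix('E')) {
        Some(r) => r,
        None => return false,
    };
    let rest = rest
        .strip_prefix('+')
        .or_else(|| rest.strip_prefix('-'))
        .unwrap_or(rest);
    is_decimal_literal(rest)
}

/// Float literal: `dec.dec` with an optional exponent, or `dec` plus an exponent.
fn is_float_literal(s: &str) -> bool {
    // Split at the first 'e' or 'E'; everything from there on is the exponent.
    let (mantissa, exponent) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], Some(&s[i..])),
        None => (s, None),
    };
    if let Some(e) = exponent {
        if !is_exponent(e) {
            return false;
        }
    }
    match mantissa.split_once('.') {
        Some((whole, frac)) => is_decimal_literal(whole) && is_decimal_literal(frac),
        None => exponent.is_some() && is_decimal_literal(mantissa),
    }
}

/// Classifies a complete token, or returns None if it matches no form.
fn classify(s: &str) -> Option<NumberLiteral> {
    if is_prefixed(s, "0x", |c: char| c.is_ascii_hexdigit()) {
        Some(NumberLiteral::Hex)
    } else if is_prefixed(s, "0b", |c: char| c == '0' || c == '1') {
        Some(NumberLiteral::Binary)
    } else if is_decimal_literal(s) {
        Some(NumberLiteral::Decimal)
    } else if is_float_literal(s) {
        Some(NumberLiteral::Float)
    } else {
        None
    }
}

/// Decodes the text that follows a backslash, e.g. "n", "x41" or "u00e9".
fn decode_escape(body: &str) -> Option<char> {
    match body {
        "n" => return Some('\n'),
        "r" => return Some('\r'),
        "t" => return Some('\t'),
        "\\" => return Some('\\'),
        "'" => return Some('\''),
        "\"" => return Some('"'),
        _ => {}
    }
    // Codepoint escapes: 'x', 'u' or 'U' followed by exactly 2, 4 or 8 hex digits.
    let (kind, hex) = body.split_at(body.chars().next()?.len_utf8());
    let digits = match kind {
        "x" => 2,
        "u" => 4,
        "U" => 8,
        _ => return None,
    };
    if hex.len() != digits || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    char::from_u32(u32::from_str_radix(hex, 16).ok()?)
}
```

Under these assumptions, `classify("0xFF_ff")` yields `Some(NumberLiteral::Hex)`, `classify("1.")` yields `None` because the rule asks for two decimal literals around the period, and `decode_escape("x41")` yields `Some('A')`. Checking the hex and binary prefixes first is a design choice of the sketch; the patch text does not prescribe an order.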