
Chapter 4: Basic syntax

4.1 Syntax components

4.2 Escape character

4.3 Keywords

4.4 Names: identifiers and operators

4.5 Comments and descriptions

4.6 Application syntax

4.7 Grouping

4.8 Piles



4.1 : Syntax components

An Aldor program consists of a series of lines of text. These lines of text may be stored in a single file, or gathered from several files, or typed in as interactive input.

Some lines are not part of the Aldor program proper, but instead control its composition and the environment in which it is handled. These lines are called system commands. A system command is a line which has a hash character "#" as its first character. (Note that no white space may precede the "#" on the line.)
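The first-column rule can be modelled outside Aldor. The following Python sketch is illustrative only and is not part of the Aldor system; the function name split_program is hypothetical.

```python
def split_program(lines):
    """Separate system commands from ordinary program lines.

    Models only the rule stated above: a system command has "#" as the
    very first character of the line, with no white space before it.
    """
    commands, program = [], []
    for line in lines:
        if line.startswith("#"):
            commands.append(line)
        else:
            program.append(line)
    return commands, program


commands, program = split_program(
    ['#include "filename.as"', "#pile", "x := 1", "  #endpile"]
)
print(commands)  # ['#include "filename.as"', '#pile']
print(program)   # ['x := 1', '  #endpile']
```

The indented "  #endpile" line is classified as program text, because white space precedes the "#".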

The example programs in this chapter use the following system commands:

The system command #include "filename.as" causes the lines of text from ``filename.as'' to be inserted into the Aldor program in place of the #include command.

The system commands #pile and #endpile are used to enclose lines of text in which indentation is used to determine the nesting of sequences of Aldor expressions. (See section 4.8.)

A complete list of system commands is given in section 24.1. System commands used in the interactive interpreter are described in section 17.2.

When the series of lines comprising an Aldor program has been collected together, these lines are interpreted as a series of words, or tokens. There are several classes of tokens, each of which has a different meaning:

Identifiers
such as "Fred" and "rgf32" are used as names for variables and constants.
Literals
such as 42, 1.414 and "Urania riphaeus", represent explicit values. Literals are described in section 5.2.
Keywords
such as if and == each have a special meaning in the language, and impose a special structure on neighboring expressions.
Operators
such as by and + have special syntactic properties, but are otherwise the same as identifiers (that is, they are used as names for variables and constants).
Comments
are used to insert free-form text into a program. A comment begins with the two characters "--" and continues until the end of the current line of text.
Descriptions
are used to provide user-level documentation for functions, domains and categories defined in the program. A description begins with the two characters "++" or the three characters "+++" and continues until the end of the current line of text.
White space
consists of spaces, tabs, and newlines. White space is used to determine source position (line and column) information for message reporting, and for piling (see section 4.8).

The exact rules for the syntax of each of these token classes are given in section 24.2.3.



4.2 : Escape character

The underscore character "_" is used as an escape character in Aldor to modify the interpretation of the characters which follow. For example, an escape character followed by any amount of white space (spaces, tabs, and newlines) causes the white space to be ignored, allowing the characters on either side of the white space to form a single token, such as a name or a literal.

Section 5.1 describes how the escape character can be used inside an identifier, and section 5.2 describes how the escape character can be used inside a literal.



4.3 : Keywords

The basic components of any Aldor program can be separated into two broad categories: those which are defined by the language, and those which may be defined or redefined by the program. For example, the meaning of the word "if" is defined by the language, and all "if" statements behave according to the same rules. On the other hand, the meaning of a name such as "a" or "0" or "+" is determined by the program in which it is used.

A keyword in Aldor is a word whose meaning is fixed by the definition of the language. The following words are keywords which may not be redefined:

Generally, language-defined aspects of keywords offer protocols which allow them to work with new types as well as with language-defined types. So, for example, the language-defined "if" provides a way for the condition to be an expression which evaluates to any type, provided that type has certain properties.

The following keywords are meaningless in the current language definition, but are reserved for future language extensions.



4.4 : Names: identifiers and operators

A name is an identifier used to denote a variable or a constant. Most names begin with a letter or the character "%" and are made up of letters, digits and the characters "%", "?" and "!". The words "0" and "1" are also treated as names in Aldor so that mathematical structures can export identity elements without having to support integer literals. (See section 5.2.)

Examples:

When used in an identifier, the escape character is not included in the name of the identifier. To include a single underscore character in the name of an identifier, the sequence "__" must be used. So the name of the identifier denoted by "mod_+" is "mod+", and the name of the identifier denoted by "My__Integer" is "My_Integer".

A sequence of letters which would otherwise be considered a keyword (such as "if") can be treated as a name by escaping one of its constituent letters (as in "_if").
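The effect of the escape character on a written identifier can be modelled with a short Python sketch. It covers only the two rules stated above (a single "_" is dropped and the character after it is taken literally, so "__" yields one underscore). It does not model escaped white space, and the function name identifier_name is hypothetical.

```python
def identifier_name(written):
    """Return the name denoted by an identifier as written in source.

    Assumption: every "_" escapes exactly the one character after it.
    A trailing "_" is outside the scope of this sketch and is rejected.
    """
    name = []
    i = 0
    while i < len(written):
        if written[i] == "_":
            if i + 1 == len(written):
                raise ValueError("trailing escape character is not modelled")
            name.append(written[i + 1])  # escaped character, taken literally
            i += 2
        else:
            name.append(written[i])
            i += 1
    return "".join(name)


print(identifier_name("mod_+"))        # mod+
print(identifier_name("My__Integer"))  # My_Integer
print(identifier_name("_if"))          # if
```

The last call shows why "_if" denotes a name spelled "if" even though the unescaped word is a keyword: the escape changes how the token is read, not the resulting name.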

Certain names are treated as having special syntax properties by the language. The following identifiers can be used as infix operators, prefix operators, or both:

Aside from their syntactic properties, these names behave just as other identifiers. See section 4.6 for examples of using infix operators in different contexts.

A few naming conventions are used in the standard libraries:

Note that these are only notational conventions and are not considered as part of the language.



4.5 : Comments and descriptions

Comments and description strings annotate a program to help other people and other programs understand it.

A comment begins with the two characters "--" and continues until the end of the current line of text. Comments can be used to describe how a program operates, including an explanation of special assumptions made by the program, or a step-by-step description of the implementation of the algorithms used by the program. Comments are not examined by the compiler, and do not affect the meaning of a program.

A description begins with two or three plus characters ("++" or "+++") and continues until the end of the current line of text. A description should be used to describe the external characteristics of a program, such as the parameters it will accept or the method used to compute the result.

Description strings are saved in the compiler output in a form accessible by other programs. If a description begins with three plus characters ("+++"), then the name it describes should appear immediately after the description. If a description begins with only two plus characters ("++"), then the name it describes should appear immediately before the description:

Both "+++" and "++" are used so that after a semicolon we can still associate a description with the previous declaration.
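The distinction between the three markers can be sketched in Python. The sketch is a simplified model, not the compiler's tokenizer: it assumes the markers never occur inside string literals, and the function name split_annotation and its result labels are hypothetical.

```python
def split_annotation(line):
    """Split a source line into (code, kind, text).

    kind is "comment" for "--", "describes-next" for "+++",
    "describes-previous" for "++", or None if no marker is present.
    Assumption: markers inside string literals are not handled.
    """
    positions = [p for p in (line.find("--"), line.find("++")) if p != -1]
    if not positions:
        return line.rstrip(), None, ""
    start = min(positions)
    code, rest = line[:start].rstrip(), line[start:]
    if rest.startswith("+++"):  # must be tested before "++"
        kind, text = "describes-next", rest[3:]
    elif rest.startswith("++"):
        kind, text = "describes-previous", rest[2:]
    else:
        kind, text = "comment", rest[2:]
    return code, kind, text.strip()


print(split_annotation("+++ Greatest common divisor"))
# ('', 'describes-next', 'Greatest common divisor')
print(split_annotation("x: Integer; ++ the count"))
# ('x: Integer;', 'describes-previous', 'the count')
print(split_annotation("n := n + 1 -- bump the counter"))
# ('n := n + 1', 'comment', 'bump the counter')
```

Testing for "+++" before "++" matters, since every "+++" marker also begins with "++".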

It is easy to remember the difference between comments and descriptions: the Aldor compiler keeps positive remarks, and throws away negative ones.

Example:



4.6 : Application syntax

Applications are typically used to denote function calls, array indexing, or element accessors for compound data types.

A prefix application typically has the following form:

There are two additional forms for specifying a prefix application to one argument: juxtaposition and an infix dot.

The second of these forms is completely equivalent to f(a); the first is equivalent in a free-standing occurrence but associates differently -- to the right, rather than the left:

Any application in which the argument is enclosed in parentheses ("( )") or square brackets ("[ ]") is treated as being of the ``typical'' form, and so associates to the left -- even if a space follows the applied object. Thus "first [1,2,3]" is treated as identical with "first([1,2,3])".
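The difference in association between juxtaposition and the parenthesized form can be made explicit with a small Python sketch that prints the grouping. Both function names are hypothetical, and the sketch handles only chains of a single form.

```python
def group_juxtaposition(words):
    """Group a juxtaposition chain to the right: f g x becomes f(g(x))."""
    grouped = words[-1]
    for word in reversed(words[:-1]):
        grouped = f"{word}({grouped})"
    return grouped


def group_parenthesized(function, arguments):
    """Group a parenthesized chain to the left: f(a)(b) becomes (f(a))(b)."""
    grouped = function
    for index, argument in enumerate(arguments):
        if index == 0:
            grouped = f"{grouped}({argument})"
        else:
            grouped = f"({grouped})({argument})"
    return grouped


print(group_juxtaposition(["f", "g", "x"]))   # f(g(x))
print(group_parenthesized("f", ["a", "b"]))   # (f(a))(b)
```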

The interpretation of mixed forms is determined by precedence rules: the precedence of juxtaposition is lower than that of the other forms, which are all equivalent, so an expression such as "f(a).2(b)(c).x y" is associated as "((((((f(a)).2)(b))(c)).x) y)". (A complete table of Aldor operator precedence appears in section 4.7.1.)

Infix operators are applied to a pair of arguments using infix notation for function application:

Infix operators are generic in that they can be given definitions in Aldor programs just as other identifiers. The typical form for an infix function definition is as follows:

where op is one of the infix operators listed in section 4.4.

An infix operator can be used in any context where other names can be used. However, in some contexts the infix operator must be enclosed in parentheses to suppress its normal syntactic properties:

An infix operator must be enclosed in parentheses to be used as a prefix operator. Also, an infix operator cannot appear as an argument of another infix operator unless it is enclosed in parentheses.

Alternatively, the same name may be given as an identifier, rather than an infixed operator, using the escape character to include special characters, for example: _*(a, b).



4.7 : Grouping

Complex expressions in Aldor are formed according to the precedence of the operators appearing in the expression. When an expression is formed, the operators with higher precedence form the subexpressions for the operators with lower precedence.

Parentheses ("( )") and braces ("{ }") can be used to override the natural precedence order defined by the language.

Because comma has a lower precedence than most other syntactic forms, it is often necessary to enclose comma-separated expressions in parentheses. We write "f(1, 2)", since "f 1, 2" would be associated the same way as "(f 1), 2".

Likewise, we write "(a, b) := (1, 2)" (see section 5.8 for an explanation of this notation), since "a, b := 1, 2" would be associated as "a, (b := 1), 2".

Similarly, the expression "(1 + 2) * 3" evaluates to 9, while the expression "1 + 2 * 3" evaluates to 7, since the "*" operator has a higher precedence than the "+" operator.

Braces are normally used to enclose sequences of expressions (see section 5.9):

The meaning of an expression is the same whether braces or parentheses are used. Braces are normally used to enclose a longer expression (especially sequences) split over several lines. Parentheses are normally used to enclose shorter expressions (especially multiple values; see section 5.8) as part of other expressions.

An implicit semicolon is assumed, if possible, after a closing brace. This is determined by whether the following token may start a new expression. For instance, in the construct

introduced in section 7.9, the "==" may not start an expression, so no semicolon is assumed.

To make the use of braces as natural as possible, an expression in braces may not be used as an argument to an infixed operator, e.g. "+", "-", "..". This is because many infixed operators may also be used in prefixed position. (Some, incidentally, may also be postfixed.) With infixed operators, parentheses may be used to achieve the desired effect -- for example:




4.7.1 : Precedence

Figure 4.1 provides a table of keywords and operators, given in order of syntactic precedence. Expressions are represented by ``e'' and keywords or operators by ``o''.

Each of the numbered entries in the table lists syntactic elements with the same precedence. The entries at the top of the table group most loosely, and those at the bottom most strongly. So, for example, since "+" is above "*", the expression

a*b + c*d

groups as "(a*b) + (c*d)". Entries with the same level number have the same precedence. For instance, ``and'' and ``/\'' have the same grouping strength.

Some operators have both unary and binary forms. Some of these operators have meanings defined in the standard base libraries (for example, infixed "+" and "*"), while others do not (for example, prefixed "=" and "+-"). A programmer may provide new meanings for operators, but not for keywords. Entries for operators are flagged with ``[+]''; keyword entries are unflagged. This table can serve as a convenient reference for determining relative strengths of keywords and operators. The full details of the language syntax are given in chapter 24.

Fig 4.1: Keyword and operator precedence

         Keyword/Operator                              Associativity    Unary
 1 .     ;                                             (e o e) o e      ---
 2 .     default   define   export   extend            ---              ---
         fluid     free     import   inline            ---              ---
         local     macro                               ---              ---
 3 .     ,                                             (e o e) o e      ---
 4 .     where                                         (e o e) o e      ---
 5 .     :=  ==  ==>  +->                              e o (e o e)      ---
 6 .     break    do        generate   goto            ---              ---
         if       iterate   never      repeat          ---              ---
         return   yield                                ---              ---
         =>                                            e o (e o e)      ---
 7 .     for while                                     ---              ---
 8 .     and or                                        (e o e) o e      ---
    [+]  /\  \/                                        (e o e) o e      ---
 9 .[+]  =   ~=   ^=   >=   >   >>   <=   <   <<       (e o e) o e      o e
    [+]  case   is   isnt                              (e o e) o e      o e
         has                                           (e o e) o e      ---
 10.[+]  ..  by                                        (e o e) o e      e o
 11.[+]  +   -   +-                                    (e o e) o e      o e
 12.[+]  mod quo rem                                   (e o e) o e      ---
 13.[+]  *   /   \                                     (e o e) o e      ---
 14.[+]  **  ^                                         e o (e o e)      ---
 15.     ::  @   pretend                               (e o e) o e      ---
 16.[+]  ->  <-                                        e o (e o e)      ---
 17.     $                                             e o (e o e)      ---
 18.     add    with                                   (e o e) o e      o e
 19.     per    rep    not   ~   #                     ---              o e
         A B    (juxtaposition)                        e o (e o e)      ---
 20.     A(B)   A[B]   A.B                             (e o e) o e      ---

[+] operators (may be overridden)
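How the level numbers and associativity columns determine grouping can be illustrated with a small precedence-climbing sketch in Python. It is a model of the table, not the Aldor parser: it covers only a handful of binary operators taken from levels 11, 13 and 14, requires tokens to be separated by spaces, and the names OPERATORS and group are hypothetical.

```python
# (level, associativity) for a few binary operators from Figure 4.1.
# A higher level number groups more strongly.
OPERATORS = {
    "+": (11, "left"), "-": (11, "left"),
    "*": (13, "left"), "/": (13, "left"),
    "**": (14, "right"), "^": (14, "right"),
}


def group(expression):
    """Return the expression with its grouping made explicit."""
    tokens = expression.split()
    position = 0

    def parse(min_level):
        nonlocal position
        left = tokens[position]
        position += 1
        while position < len(tokens):
            operator = tokens[position]
            if operator not in OPERATORS:
                raise ValueError(f"expected an operator, got {operator!r}")
            level, associativity = OPERATORS[operator]
            if level < min_level:
                break
            position += 1
            # A left-associative operator must not absorb another
            # operator of its own level on its right-hand side.
            next_level = level + 1 if associativity == "left" else level
            left = f"({left} {operator} {parse(next_level)})"
        return left

    return parse(0)


print(group("a * b + c * d"))  # ((a * b) + (c * d))
print(group("a - b - c"))      # ((a - b) - c)
print(group("a ** b ** c"))    # (a ** (b ** c))
```

The second and third calls show the two associativity patterns from the table: "(e o e) o e" for "-" and "e o (e o e)" for "**".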



4.8 : Piles

Programmers often use indentation to make the visual structure of a program conform to its syntactic structure, so that programs are easier to read. In Aldor, white space is usually ignored except to delimit tokens and to compute source position information. However, the compiler can be instructed to use indentation as part of the syntax of the language using a scheme known as piling.

Two system commands are used to instruct the compiler to enable and disable piling syntax, as desired, at various points in an Aldor program. The system command "#pile" instructs the compiler to use piling syntax for the source lines which follow, and the system command "#endpile" instructs the compiler to ignore initial white space on the source lines which follow.

Although the system commands "#pile" and "#endpile" are usually found in pairs, the "#endpile" system command can be omitted at the end of a file.

When piling syntax is being used, indentation is treated roughly as follows (see section 24.3 for full details; more examples of Aldor programs which use piling syntax can be found in figure 18.1):

Expressions which are indented by the same amount are grouped together as a sequence (see section 5.9) as though they were enclosed in braces and separated by semicolons. A sequence of expressions indented by the same amount is called a pile.

An expression which is too large to fit on one line at the current indentation level can be continued on another line by indenting the continuation line more than the initial line.

The indentation rules are applied first to the most indented lines, working outward to the lines which are indented the least.
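The three rules above can be approximated by a short Python sketch that rewrites indented lines into braces and semicolons. It is a rough model under stated assumptions, not the algorithm of section 24.3: indentation uses spaces only, the first line is the least indented, and keywords receive no special treatment. The function names depile and fold_pile, and the sample program, are hypothetical.

```python
def depile(lines):
    """Rewrite indented lines as a brace-and-semicolon expression."""
    rows = [(len(l) - len(l.lstrip(" ")), l.strip()) for l in lines if l.strip()]
    text, _ = fold_pile(rows, 0)
    return text


def fold_pile(rows, start):
    """Fold the pile beginning at rows[start]; return (text, next index)."""
    indent = rows[start][0]
    items = []
    i = start
    while i < len(rows) and rows[i][0] == indent:
        head = rows[i][1]
        i += 1
        # More deeply indented lines continue the current expression;
        # they are folded first, so the innermost lines are handled first.
        while i < len(rows) and rows[i][0] > indent:
            nested, i = fold_pile(rows, i)
            head = f"{head} {nested}"
        items.append(head)
    if len(items) == 1:
        return items[0], i
    return "{" + "; ".join(items) + "}", i


program = [
    "repeat",
    "    step()",
    "    log(x,",
    "        y)",
    "done()",
]
print(depile(program))  # {repeat {step(); log(x, y)}; done()}
```

A single more-indented line is joined to the line before it, while several lines at the same deeper indentation become a braced sequence before being joined.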

The following example shows the piling rules being applied to a program which uses piling syntax, to convert it to an equivalent program which does not use piling syntax:

Because the line "<< Eggs" is indented with respect to the previous line, the two are joined.

The "CookBreakfast" and "Eat..." lines form a pile, which can be rewritten as a semicolon separated sequence:

And finally the entire program is treated as a pile:

Readers wishing to experiment interactively with our examples (by using "aldor -gloop") should note that piling is the default in interactive use. The examples generally should still run correctly if the illustrated layouts are used -- a few may require the addition of braces ("{}"). See chapter 17 for further details.

