Chapter 2. Lexical Structure

WHAT'S IN THIS CHAPTER?

  • Understanding basic syntax

  • Defining values and identifiers

In any language, some basic lexical ideas have to be laid down before programmers can begin to understand the concepts behind the language. Questions such as "What makes a comment?" or "What is allowed in identifier names?", although intrinsically boring in many ways, have to be answered before any further progress can be made into the language's structure and form.

F#'s lexical structure derives strongly from its immediate ancestor, OCaml, which is itself a derivative of the functional language ML. This means that, although F# strives to be .NET-friendly in its syntax, the C# and Visual Basic developer will find a number of new and interesting syntactic ideas, some of which will be surprising. Fortunately, most of those will be pleasant surprises, because much of the syntax is less restrictive than that of the other .NET languages offered by Microsoft.

COMMENTS

The easiest place to begin with any language is a simple definition of what makes a comment (meaning, syntax that the compiler will ignore during processing). F# supports three different styles of comments:

  • Multi-line comments using the (* and *) delimiters, such as:

    (* This is
    a multi-line
    comment *)

    Note that multi-line comments nest, meaning that a multi-line comment is terminated only when the number of end-comment delimiters matches the number of begin-comment delimiters preceding it.

  • Single-line comments using the // delimiter, which signals a comment until the next end-of-line, such as:

    // This is a single-line comment

  • Documentation comments using the /// delimiter, which signals a special form of comment (similar to the C# comment of the same form) that can be used to extract documentation for the element that follows.

Of the three, the multi-line comment form isn't seen much in general F# usage, and for the most part is present solely to support cross-compiling OCaml code with F#.

The documentation comment supports much, if not all, of the same kinds of XML documentation "hints" that the C# documentation syntax supports, such as:

/// <summary>This is a cool function</summary>
/// <remarks>Use it wisely</remarks>
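As a minimal sketch of where such a comment goes, a documentation comment is placed immediately above the declaration it describes. The add function here is hypothetical and exists only to give the comment something to attach to:

```fsharp
/// <summary>Adds two integers and returns the sum.</summary>
/// <remarks>Hypothetical example function, not part of any library.</remarks>
let add x y = x + y

// The documentation comment has no effect on how the function behaves.
printfn "%d" (add 2 3)
```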

Note that if developers stick primarily to the single-line and documentation comment forms for regular use, swaths of code can be temporarily removed from use via the multi-line comment form, which is particularly useful because multi-line comments nest.
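The nesting behavior can be sketched as follows. The value names are hypothetical; the point is that the inner comment does not end the outer one, so everything between the outermost delimiters is ignored:

```fsharp
let visible = 1

(* Temporarily disabled:
let hidden = 2 (* an inner comment nests safely *)
let alsoHidden = 3
*)

// Only visible is defined at this point; hidden and alsoHidden were never compiled.
let total = visible + 10
printfn "%d" total
```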

IDENTIFIERS

Identifiers in F# generally follow the same rules as C# or C++, in that any combination of Unicode characters defined as letters (uppercase or lowercase), digits, and the underscore is allowed, provided that the first character is a letter or an underscore. F# additionally permits the apostrophe (') after the first character, as in x'. Otherwise, for the C# developer, this is how C# operates.

Like most languages, F# reserves certain character combinations for its own use, typically as keywords in the language. F# defines the following as unacceptable identifiers, either because they are keywords, or because the F# team wants to reserve them for future use (meaning they might become keywords in a future release of the F# language):

abstract and as asr assert atomic base begin break checked class component
const constraint constructor continue default delegate do done downcast
downto eager elif else end event exception extern external false finally
fixed for fun function functor global if in include inherit inline
interface internal land lazy let lor lsl lsr lxor match member method mixin
mod module mutable namespace new null object of open or override parallel
private process protected public pure rec return sealed sig static struct
tailcall then to trait true try type upcast use val virtual void volatile
when while with yield

Not all these identifiers are currently used, and some may end up never being used, depending on future directions the language takes. Even should the language later permit using them, none of the preceding words should ever be used as an identifier, for developer sanity if nothing else.

In addition, F# reserves for its own use any identifier that ends in ?, !, or #. The most obvious example of this is the let! syntax used for asynchronous workflows. Again, even should later versions of the language permit their use, avoid using them.

Some sample legal identifiers appear here:

let y = 1
let aReallyLongIdentifierName = 2
let _underscores_are_OK_too = 3
let soAreNumbers123AfterALetter = 4

Note that, as with all Microsoft Visual Studio-integrated languages, the IDE flags illegal identifiers with the ubiquitous "red-squiggly."
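For contrast, the following sketch pairs a few more legal identifiers with some illegal ones. The illegal forms are left commented out, because uncommenting any of them stops the file from compiling; all of the names are hypothetical:

```fsharp
let x' = 1              // legal: an apostrophe may follow the first character
let _scratch = 2        // legal: a leading underscore is allowed

// let 9lives = 3       // illegal: begins with a digit
// let my-value = 4     // illegal: the hyphen is not an identifier character
// let type = 5         // illegal: type is a keyword

printfn "%d" (x' + _scratch)
```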

Note

Like most languages, F# has its share of surprise moments, in which something that works generates unexpected results. One interesting edge case: if an identifier containing the & symbol is used, it appears to work. For example, consider let abc&foo = 5. However, what's happening here is not the creation of a single identifier, but of two identifiers (one on each side of the &), each bound to the same value.
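A minimal sketch of this edge case follows. The pattern is parenthesized here only to make the grouping explicit, on the assumption that it behaves the same as the unparenthesized form in the note:

```fsharp
// The & forms a pattern that binds both names to the same value.
let (abc & foo) = 5

// abc and foo are two separate identifiers, each holding 5.
printfn "abc = %d, foo = %d" abc foo
```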

For those situations in which F# has to consume an identifier (a class, method, field, property, or some other element) from an assembly that happens to have the same spelling as a reserved word, F# provides a double-backtick syntax that permits the "escaping" of the identifier: Simply wrap the otherwise prohibited term in double-backtick characters (as in ``assert``), and the F# parser will obligingly accept it. As a general rule, however, F# developers should avoid using this syntax to reuse existing F# keywords or reserved words, and, where it can be helped, should avoid creating identifiers that conflict with the reserved words of other languages likely to consume F# assemblies (such as C# and Visual Basic). This syntax is mostly intended to make it easier to consume assemblies written in non-F# languages, where other programmers accidentally created identifiers that conflict with F# reserved words.
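The escaping syntax can be sketched as follows. The names are hypothetical and are declared locally only to keep the example self-contained; in practice, as noted above, the escaped name would normally come from another assembly:

```fsharp
// class is a keyword, so it must be escaped to be used as a name.
let ``class`` = "economy"

// Double backticks also permit names that contain spaces.
let ``total price`` = 42

printfn "%s %d" ``class`` ``total price``
```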

PREPROCESSOR DIRECTIVES

F#, like C#, uses a leading hash character (#) to mark preprocessor directives, which are instructions to the compiler about how to process the source file. F# employs a small number of preprocessor directives, all of which use the traditional C-style hash syntax, such as #line or #light. These are processed by the compiler before it considers any other aspect of the language and, aside from whatever effect they have during compilation, add no runtime overhead. Two of the preprocessor directives recognized by the F# compiler are described here:

  • #line: Sets the line number for the source file immediately following this line. By default, the first line in an F# file is 1.

  • #if #else #endif: As the names imply, these directives evaluate at compile-time whether an identifier has been defined (typically using the --define flag given to the compiler), and take either the code between the #if and #else, or the code between the #else and #endif, for processing. Note that this means that any syntactic or semantic errors in the block of code not taken are never checked by the compiler.

Other preprocessor directives may be added to the language later, depending on future directions.
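A minimal sketch of conditional compilation follows. It assumes a symbol named DEBUG, which is a common convention rather than anything the language requires; the symbol is considered defined only when the compiler is invoked with --define:DEBUG:

```fsharp
// Exactly one of the two definitions below is compiled; the other is skipped.
#if DEBUG
let mode = "debug"
#else
let mode = "release"
#endif

printfn "Compiled in %s mode" mode
```

Whichever branch is not taken never reaches the type checker, so a mistake there goes unnoticed until the symbol's state changes.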

SIGNIFICANT WHITESPACE

Languages frequently need to find some way to "set off" a block of code from the code around it (the "true" branch of a decision statement or the body of a function or method, for example). Where some languages choose to use some kind of syntactic "pairing," such as C#'s "{"/"}" characters or Visual Basic's "If"/"End If" tokens, F# chooses instead to use significant whitespace, with block indentation as the means by which blocks of code are set off from the surrounding context.

Additionally, F# requires no explicit end-of-statement terminator,[1] which means overall that the language has far fewer syntactic "markers" in the code, relying instead on the indentation to indicate the structure of the code. Note that this means that developers must be careful to line up indentation levels consistently across the body of the program, because F# uses the indentation level of the "previous" lines to know precisely when a block has ended. For example, in the following:

let outer =
    let x = 1
    if x = 1 then
        System.Console.WriteLine("Hello, F#")
    else
        System.Console.WriteLine("Uh... how did this happen?")

the outer declaration creates one scope block; inside that block, a new value, x, is declared, followed by an if/then/else statement (discussed in more detail in Chapter 4) whose true and false branches each form a new block.

If, for some reason, the else block is off by one or more whitespace characters (indented either too far or too little), the language may not know which if block this else is paired with, and will flag the entire construct as an error. For this precise example, the compiler will actually be able to adjust, but the programmer may not always be so lucky, and other, less trivial examples will be flagged as errors:

let outer =
    let x = 2
    if x = 1 then
        System.Console.WriteLine("Hello, F#")
        if x = 1 then
            System.Console.WriteLine("Again!")
      else
        System.Console.WriteLine("Uh... how did this happen?")
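To make the pairing rule concrete, here is a sketch with consistent indentation in which only the column of the else differs. The two functions and their result strings are hypothetical, and a mutable local is used purely so that the branch taken can be observed:

```fsharp
// The else lines up with the outer if, so it runs when x is not 1.
let elseOnOuter x y =
    let mutable result = "nothing ran"
    if x = 1 then
        if y = 1 then
            result <- "inner true"
    else
        result <- "else ran"
    result

// The else lines up with the inner if, so it runs when x is 1 but y is not.
let elseOnInner x y =
    let mutable result = "nothing ran"
    if x = 1 then
        if y = 1 then
            result <- "inner true"
        else
            result <- "else ran"
    result

printfn "%s / %s" (elseOnOuter 2 1) (elseOnInner 2 1)
```

With the arguments 2 and 1, the first function takes its else branch and the second takes no branch at all, even though the two bodies differ only in indentation.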

Note that F# can be turned back into a whitespace-insignificant language by turning off #light mode (with the #light "off" directive), but this then forces the use of the begin, end, in, and ; tokens to denote blocks, according to the syntax of the OCaml language (with which F# was originally intended to be syntax-equivalent, back when it was "just" a research language). Most F# developers agree that #light mode is the superior mode to use, and it will be the assumed mode for the code samples for the rest of this book.
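To give a flavor of the difference, the sketch below writes the same computation twice without leaving #light mode. It relies on the explicit in token also being accepted in light mode; the value names are hypothetical:

```fsharp
// Light syntax: indentation alone delimits the scope of x and y.
let lightSum =
    let x = 1
    let y = 2
    x + y

// Verbose style: explicit in tokens mark where each binding's scope begins.
let verboseSum = let x = 1 in let y = 2 in x + y

printfn "%d %d" lightSum verboseSum
```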

SUMMARY

F# uses a number of lexical constructs similar to those of existing .NET languages, but also several lexical conventions that are brand-new to the platform. For the most part, experienced .NET developers can adjust to F#'s quirks without much work, but a few "gotchas" compared to C#/Visual Basic do exist. Fortunately, these disappear quickly as the new F# developer gains experience with the language.



[1] Except when writing statements in the interactive F# window inside Visual Studio or the fsi.exe F# interpreter window, when ";;" is used to indicate that the user is not continuing input onto a new line.
