Chapter 12. Language-Oriented Programming

In this chapter, you will begin by taking a look at what I mean by language-oriented programming, a term that has been used by many people to mean different things. I'll also briefly discuss its advantages and disadvantages. Next, you'll look at several different approaches to language-oriented programming in F#. These techniques include using F# literals to create little languages and using F# quotations. You'll spend the bulk of this chapter looking at examples where you create a language, then create an interpreter to execute that language. Finally, you'll take a more detailed look at how languages are executed, including a performance comparison of interpreted or compiled execution techniques.

What Is Language-Oriented Programming?

People use the term language-oriented programming to describe many different programming techniques, but the techniques they refer to tend to share a common theme. It's quite common for programmers to have to implement a predefined language; often this is because you need to extract structured data from information stored or received as string or XML data that conforms to this predefined language. The techniques introduced in this chapter will help you do this more reliably. Related to this is the idea of little languages, or domain-specific languages (DSLs); you might want to create a DSL when the best way to solve a problem is to create a custom language to describe the problem and then use this language to solve that problem. Functional programming has always had a strong relationship with language-oriented programming because functional programming languages generally have features that are well suited to creating parsers and compilers.

Data Structures as Little Languages

Taking advantage of language-oriented development doesn't necessarily mean writing your own parser or compiler. That said, you will learn how to create parsers in the next chapter; you can them combine such parsers with the techniques in this chapter to start building a simple compiler. You can accomplish a lot by creating data structures that describe what you want to do and then creating functions or modules that define how the structure should be interpreted.

You can create data structures that represent a program in just about any language, but F# lends itself well to this approach. F#'s literal lists and arrays are easy to define and require no bulky type annotations. Its union types allow you to create structures that express related concepts, but do not necessarily contain the same types of data. You can take advantage of this to build tree-like structures, which prove useful when creating languages. Finally, you can treat functions as values, so you can easily embed functions within data structures. This means F# expressions can become part of your language, usually as an action in response to some particular condition of the language.

Let's start by taking a look at a DSL predefined in the FSharp.PowerPack.dll. The Arg module allows users to build a command-line argument parser quickly. It does this by using F#'s union and list types to create a little language that is then interpreted by a number of functions provided in the Arg module.

The Arg module exposes a tuple type called argspec that consists of two strings and a union type called spec. The first string in the tuple specifies the name of the command-line argument. The second item in the tuple is the union-type spec, which specifies what the command-line argument is; for example, it specifies whether it is followed by a string value or just a flag? It also specifies what should be done if and when the command-line token is found. The final string in the tuple is a text description of what the flag does. This will be printed to the console in the case of a mistake in the command-line arguments. It also serves as a useful note to the programmer.

The Arg module exposes two functions for parsing arguments: parse, which parses the command passed in on the command line; and parse_argv, which requires that you pass the arguments directly to it. You should pass both functions a list of type argspec describing the command-line arguments expected, a function that will be passed all the command-line arguments not prefixed with -, and finally, a string to describe the usage.

The module also exposes a third function usage, which you can pass a list of type argspec and use to write out the usage directly.

The following example demonstrates how to build an argument parser in this manner. You store the parameters collected from the command line in identifiers for later use; in this case, you will write them to the console:

let myFlag = ref true
let myString = ref ""
let myInt = ref 0
let myFloat = ref 0.0
let (myStringList : string list ref) = ref []

let argList =
    [ "-set", Arg.Set myFlag, "Sets the value myFlag";
      "-clear", Arg.Clear myFlag, "Clears the value myFlag";
      "-str_val", Arg.String(fun x -> myString := x), "Sets the value myString";
      "-int_val", Arg.Int(fun x -> myInt := x), "Sets the value myInt";
      "-float_val", Arg.Float(fun x -> myFloat := x), "Sets the value myFloat"; ]

if System.Environment.GetCommandLineArgs().Length <> 1 then
    Arg.parse
        argList
        (fun x -> myStringList := x :: !myStringList)
        "Arg module demo"
else
    Arg.usage
        argList
        "Arg module demo"
    exit 1
printfn "myFlag: %b" !myFlag
printfn "myString: %s" !myString
printfn "myInt: %i" !myInt
printfn "myFloat: %f" !myFloat
printfn "myStringList: %A" !myStringList

When you run the preceding code with no command-line arguments or faulty command-line arguments, the program will output these results:

Arg module demo
        -set: Sets the value my_flag
        -clear: Clears the value my_flag
        -str_val <string>: Sets the value my_string
        -int_val <int>: Sets the value my_int
        -float_val <float>: Sets the value my_float
        --help: display this list of options
        -help: display this list of options

The preceding example will output the following when you run it with the command line, args.exe -clear -str_val "hello world" -int_val 10 -float_val 3.14 "file1" "file2" "file3"::

myFlag: false
myString: hello world
myInt: 10
myFloat: 3.140000
myStringList: ["file3"; "file2"; "file1"]

I am particularly fond of this kind of DSL because I think it makes it clear what arguments the program is expecting and what processing should take place if that argument is received. The fact that the help text is also stored in the structure serves a double purpose; it allows the function processing command-line arguments to print out a help automatically if anything goes wrong, and it also reminds you what the argument is in case you forget. I also like this method of creating a command-line interpreter because I have written several command-line interpreters in imperative languages, and it is not a satisfying experience—you end up having to write lots of code to detail how your command line should be broken up. If you write that code in .NET, then you usually spend way too much time calling the string type's IndexOf and Substring methods.

A Data Structure–Based Language Implementation

Creating any DSL should start with defining what problem you need to solve; in this case, you need to define a DSL library (sometimes called a combinators library) for drawing 2D images. This is something of an obvious choice. This example demonstrates how you can build up complicated structures out a number of simple primitives. An image on a computer screen is essentially just a collection of lines and polygons, although the image displayed might be extremely intricate. This example is presented in four modules: the first, Listing 12-1, provides the primitives for creating a picture; the second, Listing 12-2, shows you how to implement an interpreter for the picture; and Listings 12-3 and 12-4 provide examples that illustrate how to use the libraries. You'll need to use listings 12-1 and 12-2 together with either Listing 12-3 or 12-4 to see any results. You begin by walking through the main points of the design process and conclude by looking at the full listing.

Note

This example was largely inspired by working with the guys at A6 Systems (http://a6systems.com/), who have a similar but more complex library for rendering 3D animated scenes. They use this library for a wide range of industrial purposes.

You start by designing a set of types that that will describe your picture; these types form the primitives of your image:

// represents the basic shapes that will make up the scene
type Shape =
    | Line of Position * Position
    | Polygon of List<Position>
    | CompersiteShape of List<Shape>

This type is recursive, and the CompersiteShape union case contains a list of shapes that it will use to form a tree-like structure. In compiler development, this tree-like structure is referred to as the Abstract Syntax Tree (AST). You'll see another example of using an AST to represent a program at the end of the chapter.

So far, you have created your picture using three basic elements: lines, polygons and shapes. The fact that your type is made up of just three simple elements is an important design decision; making your primitives simple makes implementing the engine that will render the image much simpler. The fact that your primitives are so simple means you don't expect your user to spend time interacting with them directly; instead you'll provide a set of higher-level wrapper functions that return values of type shape. These are your combinators. The CompersiteShape case in your union is an important example of this; it allows you to build up more complicated shapes out of simpler elements. You expose this through the compose function:

// allows us to compose a list of elements into a
    // single shape
    let compose shapes = CompersiteShape shapes

You use this function to implement a number of higher-level functions; for example, the lines function, which takes a list of positions and returns a shape that is a path through those positions, takes advantage of the compose function to combine a number of individual lines into a single line:

// a line composed of two or more points
    let lines posList =
        // grab first value in the list
        let initVal =
            match posList with
            | first :: _ -> first
            | _ -> failwith "must give more than one point"
// creates a new link in the line
        let createList (prevVal, acc) item =
            let newVal = Line(prevVal, item)
            item, newVal :: acc
        // folds over the list accumlating all points into a
        // list of line shapes
        let _, lines = List.fold createList (initVal, []) posList
        // compose the list of lines into a single shape
        compose lines

Next, you use this lines function in the implementation of several high-level shapes, such as the square function:

let square filled (top, right) size =
        let pos1, pos2 = (top, right), (top, right + size)
        let pos3, pos4 = (top + size, right + size), (top + size, right)
        if filled then
            polygon [ pos1; pos2; pos3; pos4; pos1 ]
        else
            lines [ pos1; pos2; pos3; pos4; pos1 ]

The square function uses the lines function to plot the outline of a square with the calculated points. You can see the full module in Listing 12-1, although a more realistic implementation would probably contain more basic shapes for the users of the library to choose from. You will need to add references to System.Drawing.dll and System.Windows.Forms.dll to make the example compile.

Example 12.1. A Combinator Library for Creating Images

namespace Strangelights.GraphicDSL
open System.Drawing

// represents a point within the scene
type Position = int * int

// represents the basic shapes that will make up the scene
type Shape =
    | Line of Position * Position
    | Polygon of List<Position>
    | CompersiteShape of List<Shape>

// allows us to give a color to a shape
type Element = Shape * Color

module Combinators =
    // allows us to compose a list of elements into a
    // single shape
let compose shapes = CompersiteShape shapes

    // a simple line made from two points
    let line pos1 pos2 = Line (pos1, pos2)

    // a line composed of two or more points
    let lines posList =
        // grab first value in the list
        let initVal =
            match posList with
            | first :: _ -> first
            | _ -> failwith "must give more than one point"
        // creates a new link in the line
        let createList (prevVal, acc) item =
            let newVal = Line(prevVal, item)
            item, newVal :: acc
        // folds over the list accumlating all points into a
        // list of line shapes
        let _, lines = List.fold createList (initVal, []) posList
        // compose the list of lines into a single shape
        compose lines

    // a polygon defined by a set of points
    let polygon posList = Polygon posList

    // a triangle that can be either hollow or filled
    let triangle filled pos1 pos2 pos3 =
        if filled then
            polygon [ pos1; pos2; pos3; pos1 ]
        else
            lines [ pos1; pos2; pos3; pos1 ]

    // a square that can either be hollow or filled
    let square filled (top, right) size =
        let pos1, pos2 = (top, right), (top, right + size)
        let pos3, pos4 = (top + size, right + size), (top + size, right)
        if filled then
            polygon [ pos1; pos2; pos3; pos4; pos1 ]
        else
            lines [ pos1; pos2; pos3; pos4; pos1 ]

You now have the basic elements of your language; next you need to implement an interpreter to display the image. The interpreter described in this chapter is a WinForm. The advantage to this approach is that you might also implement an interpreter in WPF, Silverlight, or GTK#, which means that it is quite portable between GUI libraries and platforms. Implementing the interpreter is straightforward. You just need to implement each of your union cases. In the case of Line and Polygon, you draw these shapes using the GDI+ objects that WinForms are based on. Fortunately, GDI+ makes it straightforward to draw a line or polygon. The third CompositeShape case is also straightforward; you simply call your drawing function recursively. You can see the full source code for this in Listing 12-2. You will need to add references to System.Drawing.dll and System.Windows.Forms.dll to make it compile.

Example 12.2. An Interpreter to Render Images from Your Combinator Library

namespace Strangelights.GraphicDSL

open System.Drawing
open System.Drawing.Drawing2D
open System.Windows.Forms

// a form that can be used to display the scene
type EvalForm(items: List<Element>) as x =
    inherit Form()
    // handle the paint event to draw the scene
    do x.Paint.Add(fun ea ->
        let rec drawShape (shape, (color: Color)) =
            match shape with
            | Line ((x1, y1), (x2, y2)) ->
                // draw a line
                let pen = new Pen(color)
                ea.Graphics.DrawLine(pen, x1, y1, x2, y2)
            | Polygon points ->
                // draw a polygon
                let points =
                    points
                    |> List.map (fun (x,y) -> new Point(x, y))
                    |> Array.ofList
                let brush = new SolidBrush(color)
                ea.Graphics.FillPolygon(brush, points)
            | CompersiteShape shapes ->
                // recursively draw the other contained elements
                List.iter (fun shape -> drawShape(shape, color)) shapes
        // draw all the items we have been passed
        items |> List.iter drawShape)

Putting together a simple image composed of two squares and a triangle now becomes straightforward. You simply call the appropriate functions from your combinator library and then combine them with a color to make a full description of the scene. Listing 12-3 shows how to do this; you can see the resulting image in Figure 12-1.

Example 12.3. A Simple Example of Using Your Combinator Library

open System.Drawing
open System.Windows.Forms
open Strangelights.GraphicDSL

// two test squares
let square1 = Combinators.square true (100, 50) 50
let square2 = Combinators.square false (50, 100) 50

// a test triangle
let triangle1 =
    Combinators.triangle false
        (150, 200) (150, 150) (250, 200)

// compose the basic elements into a picture
let scence = Combinators.compose [square1; square2; triangle1]

// create the display form
let form = new EvalForm([scence, Color.Red])

// show the form
Application.Run form

The simple example given in Listing 12-3 probably doesn't represent how you would create images using the combinator library. You'll take look at a more realistic scenario in Listing 12-4. The best approach to using the combinator library would probably be to carry on programming in the style you wrote the original combinator library in; that is, you would build up simple elements that you can reuse in your image. Next, you'll look at how to create a scene composed of seven stars. The obvious place to start is with the creation of the star; you can see how to create the star function defined in Listing 12-4. This function creates triangles that are mirror images and combines them with a slight offset to form a six-sided star. This example might give you some idea of how to build up more complex shapes out of simpler ones. Once you have the definition of your star, you simply need a list of positions that tell where to print the stars. You can see this list of points in Listing 12-4. Once you have these two elements, you can combine them using the List.map function and the compose function to create your scene. Next, you can display your scene the same way you did in the previous listing.

Example 12.4. Creating a More Complex Image Using Your Combinator Library

open System.Drawing
open System.Windows.Forms
open Strangelights.GraphicDSL
// define a function that can draw a 6 sided star
let star (x, y) size =
    let offset = size / 2
    // calculate the first triangle
    let t1 =
        Combinators.triangle false
            (x, y - size - offset)
            (x - size, y + size - offset)
            (x + size, y + size - offset)
    // calculate another inverted triangle
    let t2 =
        Combinators.triangle false
            (x, y + size + offset)
            (x + size, y - size + offset)
            (x - size, y - size + offset)
    // compose the triangles
    Combinators.compose [ t1; t2 ]

// the points where stars should be plotted
let points = [ (10, 20); (200, 10);
               (30, 160); (100, 150); (190, 150);
               (20, 300); (200, 300);  ]

// compose the stars into a single scene
let scence =
    Combinators.compose
        (List.map (fun pos -> star pos 5) points)

// show the scene in red on the EvalForm
let form = new EvalForm([scence, Color.Red],
                        Width = 260, Height = 350)

// show the form
Application.Run form

Figure 12-1 shows the resulting image.

You've now seen two approaches for creating combinator libraries (libraries that create little languages though data structures). At this point, you're probably beginning to see how you can break a problem down into an abstract description of the problem based on a small set of primitives, possibly with aid of other libraries that you build on these primitives.

A scene rendered by your combinator library

Figure 12.1. A scene rendered by your combinator library

Note

If you're looking for a more in-depth view of a combinator library, take a look at the paper by Simon Peyton Jones, Jean-Marc Eber, and Julian Seward called "Composing contracts: an adventure in financial engineering.". The paper gives an in depth, yet understandable, study of a combinator library for describing derivatives contracts. The examples in the paper are in Haskell rather than F#, but you could translate them to F# with some effort. You can read this paper at http://research.microsoft.com/en-us/um/people/simonpj/papers/financial-contracts/contracts-icfp.htm.

Metaprogramming with Quotations

In Chapter 6, you used quotations; these are quoted sections of F# code where the quote operator instructs the compiler to generate data structures representing the code, rather than IL representing the code. This means you have a data structure that represents the code that was coded, rather than code you can execute, and you're free to do what you want with it. You can either interpret it, performing the actions you require as you go along, or you can compile it into another language. Or you can simply ignore it if you want. You could, for example, take a section of quoted code and compile it for another runtime, such as the Java virtual machine (JVM). Or, as in the LINQ example in Chapter 9, you could turn it into SQL and execute it against a database.

In the next example, you'll write an interpreter for integer-based arithmetic expressions in F#. This might be useful for learning how stack-based calculations work. Here, your language is already designed for you; it is the syntax available in F#. You'll work exclusively with arithmetic expressions of the form, <@ (2 * (2 - 1)) / 2 @>. This means you need to generate an error whenever you come across syntax that is neither an integer nor an operation. Quotations are based on discriminating union type. When working with quotations, you have to query the expression that you receive using F#'s pattern matching and active patterns. For example, here you query an expression using an active pattern and a when guard to see whether it is an integer; if it is, you push it onto the stack:

| Value (x,ty) when ty = typeof<int>  ->
                                                   let i = x :?> int
                                                   printfn "Push %i" i
                                                   operandsStack.Push(x :?> int)

If it isn't an integer, you could go on to check whether it is of several other types. There also several parameterized active patterns that you might find useful. For example, SpecificCall accepts a quotation that is a function expression and allows you to query whether the quotation being matched over is a call to that function. You use this to determine whether a call to an operator is made; for example, this example checks whether a call to the plus operator is made:

| SpecificCall <@ (+) @> (_,_, [l;r])  -> interpretInner l
                                                   interpretInner r
                                                   preformOp (+) "Add"

You can see the full listing in Listing 12-5.

Example 12.5. Stack-Based Evaluation of F# Quoted Arithmetic Expressions

open System.Collections.Generic
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns
open Microsoft.FSharp.Quotations.DerivedPatterns

let interpret exp =
    let operandsStack = new Stack<int>()
    let preformOp f name =
        let x, y = operandsStack.Pop(), operandsStack.Pop()
        printfn "%s %i, %i" name x y
        let result = f x y
        operandsStack.Push(result)
    let rec interpretInner exp =
        match exp with
        | SpecificCall <@ (*) @> (_,_, [l;r])  -> interpretInner l
                                                   interpretInner r
                                                   preformOp (*) "Mult"
        | SpecificCall <@ (+) @> (_,_, [l;r])  -> interpretInner l
                                                   interpretInner r
                                                   preformOp (+) "Add"
        | SpecificCall <@ (-) @> (_,_, [l;r])  -> interpretInner l
                                                   interpretInner r
                                                   preformOp (-) "Sub"
| SpecificCall <@ (/) @> (_,_, [l;r])  -> interpretInner l
                                                   interpretInner r
                                                   preformOp (/) "Div"
        | Value (x,ty) when ty = typeof<int>    ->
                                                   let i = x :?> int
                                                   printfn "Push: %i" i
                                                   operandsStack.Push(x :?> int)
        | _ -> failwith "not a valid op"
    interpretInner exp
    printfn "Result: %i" (operandsStack.Pop())

interpret <@ (2 * (2 - 1)) / 2 @>

Running this code produces the following results:

Push: 2
Push: 2
Push: 1
Sub 1, 2
Multi 1, 2
Push: 2
Div 2, 2
Result: 1

You are always working with F# syntax when you use quotations, which is both an advantage and a disadvantage. The advantage is that you can produce powerful libraries based on this technique that integrate well with F# code, but without having to create a parser. The disadvantage is that it is difficult to produce tools suitable for end users based on this technique; however, libraries that consume or transform F# quotations can still be used from other .NET languages because the F# libraries include functions and samples to convert between F# quotations and other common metaprogramming formats, such as LINQ quotations. For some interesting uses of quotations as little languages, you can see the F# DSL in "Microsoft Solver Foundation" at http://code.msdn.microsoft.com/solverfoundation. You can also see my discussion of it on my blog at http://strangelights.com/blog/archive/2008/09/21/1628.aspx.

This concludes your examination of DSLs; the rest of the chapter will dig a bit deeper into the implementation of an interpreter or compiler for your language.

Implementing a Compiler and an Interpreter for an Arithmetic-Language

So far, you've focused more on the design of the languages themselves, the frontend, rather than the implementation of the compiler or interpreter for the language, the backend. In this section, you'll focus on the implementation of a backend for a simple arithmetic-language defined by an AST. The AST syntax tree shown in the first section is based on a union type.

You will return to this example in the next chapter, "Parsing Text," to build a front end for this language. At this point, you have effectively built a small compiler, but that is not the focus of this chapter—right now the focus is exclusively on the backend.

You have two distinct modes of acting on the results of the parser: compiling the results and interpreting them. Compiling refers to changing the AST into some other format that is faster or easier for a machine to execute. Originally, this nearly always meant native code; these days, it's more likely to refer to something a little more abstract, such as IL, F#, or even C#. Interpreting the results means acting on the results straightaway, without any transformation of the AST. You'll look briefly at both of these topics in the sections "Interpreting the AST" and "Compiling the AST"; then you'll compare the two approaches to get some idea of when to use each in the section, "Compilation vs. Interpretation."

The Abstract Syntax Tree

An AST is a representation of the construct that makes up the program; it's intended to be easy for the programmer to use. One reason F# is good for this kind of development is its union type. This type is great for representing languages because you can use it to represent items that are related yet do not share the same structure. The following example shows how to use an AST:

type Ast =
  | Ident of string
  | Val of System.Double
  | Multi of Ast * Ast
  | Div of Ast * Ast
  | Plus of Ast * Ast
  | Minus of Ast * Ast

The tree consists of only one type because it is quite simple. A complicated tree would contain many more types, but it would still follow this basic pattern. Here you can see that the tree, which is of the type Expr, will consist of either identifiers (the Ident type); the names of the identifiers represented by a string; or values (the Val type), which will be values represented by a System.Double. The type Expr consists of four more types (Multi, Div, Plus, and Minus) that represent the arithmetic operations. They use recursion, so they are composed of other expressions.

Interpreting the AST

Once you create your AST, you have two choices: you can either interpret it or compile it. Interpreting it simply means walking the tree and performing actions as you go. Compiling it means changing it into some other form that is easier, or more typically faster, for the machine to execute. This section will examine how to interpret the results; the next section will look at the options for compiling them; finally, you will look at when you should use interpretation and when you should use compilation.

The following example shows a short interpreter for your program. The main work of interpreting the AST is done by the function interpret, which walks the tree, performing the necessary actions as it goes. The logic is quite simple. If you find a literal value or an identifier, you return the appropriate value:

| Ident (s) -> variableDict.[s]
| Val (v) -> v

If you find an operand, you recursively evaluate the expressions it contains to obtain its values and then perform the operation:

| Multi (e1, e2) -> (interpretInner e1) * (interpretInner e2)

You can see the complete interpreter in Listing 12-6.

Example 12.6. Interpreting an AST Generated from Command-Line Input

open System
open System.Collections.Generic
open Strangelights.Expression

// requesting a value for variable from the user
let getVariableValues e =
    let rec getVariableValuesInner input (variables : Map<string, float>) =
        match input with
        | Ident (s) ->
            match variables.TryFind(s) with
            | Some _ -> variables
            | None ->
                printf "%s: " s
                let v = float(Console.ReadLine())
                variables.Add(s,v)
        | Multi (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Div (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Plus (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Minus (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | _ -> variables
    getVariableValuesInner e (Map.empty)
// function to handle the interpretation
let interpret input (variableDict : Map<string,float>) =
    let rec interpretInner input =
        match input with
        | Ident (s) -> variableDict.[s]
        | Val (v) -> v
        | Multi (e1, e2) -> (interpretInner e1) * (interpretInner e2)
        | Div (e1, e2) -> (interpretInner e1) / (interpretInner e2)
        | Plus (e1, e2) -> (interpretInner e1) + (interpretInner e2)
        | Minus (e1, e2) -> (interpretInner e1) - (interpretInner e2)
    interpretInner input

// the expression to be interpreted
let e = Multi(Val 2., Plus(Val 2., Ident "a"))

// collect the arguments from the user
let args = getVariableValues e

// interpret the expression
let v = interpret e args

// print the results
printf "result: %f" v

Compiling and executing the preceding example produces the following results:

[a]: 12
result: 28.000000

Compiling the AST

To many developers, compilation means generating native code, so it has a reputation for being difficult. But it doesn't have to mean generating native code. For a DSL, you typically generate another, more general-purpose programming language. The .NET Framework provides several features for compiling an AST into a program.

Your choice of technology depends on several factors. For example, if you want to target your language at developers, it might be enough to generate a text file containing F#, some other language, or a compiled assembly that can then used within an application. However, if you want to target end users, you will almost certainly have to compile and then execute it on the fly. Table 12-1 summarizes the various options available.

Table 12.1. .NET Code-Generation Technologies

Technology

Description

Microsoft.CSharp.CSharpCodeProvider

This class supports compilation of a C# file that has been created on the fly, either by using simple string concatenation or by using the System.CodeDom namespace. Once the code has been compiled into an assembly, it can be loaded dynamically into memory and executed via reflection. This operation is relatively expensive because it requires writing to the disk and using reflection to execute methods.

System.CodeDom

This is a set of classes aimed at abstracting between operations available in different languages. The idea is to describe your operations using the classes available in this namespace, and then use a provider to compile them into the language of your choice. .NET ships with a provider for both C# and Visual Basic.

FSharp.Compiler.CodeDom.dll

This is the CodeDom provider that ships with F#. It can be used to compile F# on the fly in a similar way to the C# CodeDom provider.

System.Reflection.Emit

This namespace allows you to build up assemblies using IL. IL offers more features than either F#, C#, or System.CodeDom, so it provides more flexibility; however, it is lower level so it also requires more patience and will probably take more time to get right.

Mono.Cecil

This is a library extensively used in the Mono framework for both parsing assemblies and dynamically creating them.

You use the System.Reflection.Emit.DynamicMethod class, not because you need the flexibility of IL, but because IL includes built-in instructions for floating-point arithmetic, which makes it well-suited for implementing a little language. The DynamicMethod also provides a fast and easy way to let you call into the resulting program.

The method createDynamicMethod compiles the AST by walking the AST and generating code. It begins by creating an instance of the DynamicMethod class to hold the IL you define to represent the method:

let temp = new DynamicMethod("", (type float), paramsTypes, meth.Module)

Next, createDynamicMethod starts walking the tree. When you encounter an identifier, you emit some code to load an argument of your dynamic method:

| Ident name ->
    il.Emit(OpCodes.Ldarg, paramNames.IndexOf(name))

When you encounter a literal, you emit the IL code to load the literal value:

| Val x -> il.Emit(OpCodes.Ldc_R8, x)

When you encounter an operation, you must recursively evaluate both expressions and then emit the instruction that represents the required operation:

| Multi (e1 , e2) ->
    generateIlInner e1
    generateIlInner e2
    il.Emit(OpCodes.Mul)

Note how the operation is emitted last, after both expressions have been recursively evaluated. You do it this way because IL is stack-based, so data from the other operations must be pushed onto the stack before you evaluate the operator.

You can see the complete compiler in Listing 12-7.

Example 12.7. Compiling an AST Generated from Command-Line Input

open System
open System.Collections.Generic
open System.Reflection
open System.Reflection.Emit
open Strangelights.Expression

// get a list of all the parameter names
let rec getParamList e =
    let rec getParamListInner e names =
        match e with
        | Ident name ->
            if not (List.exists (fun s -> s = name) names) then
                name :: names
            else
                names
        | Multi (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | Div (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | Plus (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
| Minus (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | _ -> names
    getParamListInner e []

// create the dyncamic method
let createDynamicMethod e (paramNames: string list) =
    let generateIl e (il : ILGenerator) =
        let rec generateIlInner e  =
            match e with
            | Ident name ->
                let index = List.findIndex (fun s -> s = name) paramNames
                il.Emit(OpCodes.Ldarg, index)
            | Val x -> il.Emit(OpCodes.Ldc_R8, x)
            | Multi (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Mul)
            | Div (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Div)
            | Plus (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Add)
            | Minus (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Sub)
        generateIlInner e
        il.Emit(OpCodes.Ret)

    let paramsTypes = Array.create paramNames.Length (typeof<float>)
    let meth = MethodInfo.GetCurrentMethod()
    let temp = new DynamicMethod("", (typeof<float>), paramsTypes, meth.Module)
    let il = temp.GetILGenerator()
    generateIl e il
    temp
// function to read the arguments from the command line
let collectArgs (paramNames : string list) =
    paramNames
    |> Seq.map
        (fun n ->
            printf "%s: " n
            box (float(Console.ReadLine())))
    |> Array.ofSeq

// the expression to be interpreted
let e = Multi(Val 2., Plus(Val 2., Ident "a"))

// get a list of all the parameters from the expression
let paramNames = getParamList e

// compile the tree to a dynamic method
let dm = createDynamicMethod e paramNames

// print collect arguments from the user
let args = collectArgs paramNames

// execute and print out the final result
printfn "result: %O" (dm.Invoke(null, args))

Compiling the code in Listing 12-7 produces the following results:

a: 14
result: 32

Compilation vs. Interpretation

You might be wondering when should you use compilation and when should you use interpretation. The final result is basically the same, so the answer generally comes down to the raw speed of the final generated code, though memory usage and start-up times can also play a role in the decision. If you need your code to execute more quickly, then compilation will generally give you better results, with some activities.

The test harness in Listing 12-8 enables you to execute the interpret function results of createDynamicMethod repeatedly and time how long this takes. It also tests an important variation on dynamic methods; that is where you also generate a new .NET delegate value to act as the handle by which you invoke the generated code. As you can see, it turns out that this is by far the fastest technique. Remember, you're timing how long it takes to evaluate the AST either directly or in a compiled form; you're not measuring the parse time or compilation time.

Example 12.8. A Test Harness for Comparing Performance

open System
open System.Diagnostics
open Strangelights.Expression

// expression to process
let e = Multi(Val 2., Plus(Val 2., Val 2.))

// collect the inputs
printf "Interpret/Compile/Compile Through Delegate [i/c/cd]: "
let interpertFlag = Console.ReadLine()
printf "reps: "
let reps = int(Console.ReadLine())

type Df0 = delegate of unit -> float
type Df1 = delegate of float -> float
type Df2 = delegate of float * float -> float
type Df3 = delegate of float * float * float -> float
type Df4 = delegate of float * float * float * float -> float

// run the tests
match interpertFlag with
| "i" ->
    let args = Interpret.getVariableValues e
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        Interpret.interpret e args |> ignore
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| "c" ->
    let paramNames = Compile.getParamList e
    let dm = Compile.createDynamicMethod e paramNames
    let args = Compile.collectArgs paramNames
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        dm.Invoke(null, args) |> ignore
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| "cd" ->
    let paramNames = Compile.getParamList e
    let dm = Compile.createDynamicMethod e paramNames
    let args = Compile.collectArgs paramNames
    let args = args |> Array.map (fun f -> f :?> float)
    let d =
        match args.Length with
        | 0 -> dm.CreateDelegate(typeof<Df0>)
        | 1 -> dm.CreateDelegate(typeof<Df1>)
        | 2 -> dm.CreateDelegate(typeof<Df2>)
        | 3 -> dm.CreateDelegate(typeof<Df3>)
        | 4 -> dm.CreateDelegate(typeof<Df4>)
        | _ -> failwith "too many parameters"
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        match d with
        | :? Df0 as d -> d.Invoke() |> ignore
        | :? Df1 as d -> d.Invoke(args.[0]) |> ignore
        | :? Df2 as d -> d.Invoke(args.[0], args.[1]) |> ignore
        | :? Df3 as d -> d.Invoke(args.[0], args.[1], args.[2]) |> ignore
        | :? Df4 as d -> d.Invoke(args.[0], args.[1], args.[2], args.[4]) |> ignore
        | _ -> failwith "too many parameters"
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| _ -> failwith "not an option"

Table 12-2 summarizes the results of executing this program against this expression: Multi(Val 2., Plus(Val 2., Val 2.)).

Table 12.2. Summary of Processing the Expression Multi(Val 2., Plus(Val 2., Val 2.)) for Various Numbers of Repetitions (times in milliseconds)

Repetitions

1

10

100

1,000

10,000

100,000

1,000,000

Interpreted

6,890

6,979

6,932

7,608

14,835

84,823

799,788

Compiled via delegate

8,65

856

854

1,007

2,369

15,871

151,602

Compiled

1,112

1,409

2,463

16,895

151,135

1,500,437

14,869,692

Table 12-2 and Figure 12-2 indicate that Compiled and Compiled via delegate are much faster over a small number of repetitions. But notice that over 1, 10, and 100 repetitions, the amount of time required grows negligibly. This is because over these small numbers of repetitions, the time taken for each repetition is insignificant. It is only the time that the JIT compiler takes to compile the IL code into native code that is significant. This is why the Compiled and Compiled via delegate times are so close. They both have a similar amount of code to JIT compile. The Interpreted time takes longer because you must JIT compile more code; specifically, the interpreter. But JIT is a one-off cost because you need to JIT each method only once; therefore, as the number of repetitions goes up, this one-off cost is paid for, and you begin to see a truer picture of the relative performance cost.

The evaluation time in machine ticks of the expression 1 + 1 against the number of evaluations of the express

Figure 12.2. The evaluation time in machine ticks of the expression 1 + 1 against the number of evaluations of the express

You can see clearly from Figure 11-2 that, as the number of repetitions goes up, the cost of Compiled goes up steeply. This is because accessing the compiled DynamicMethod through its Invoke method is expensive, and you incur this cost on every repetition. This means that the time taken for a Compiled method increases at the same rate as the number of repetitions. However, the problem lies not with compilation, but with how you invoke the compiled code. It turns out that calling a DynamicMethod through a delegate rather than the Invoke member on the dynamic delegate allows you to pay only once for the cost of binding to the method, so executing a DynamicMethod this way is much more efficient if you intend to evaluate the expression multiple times. Based on these results, compilation with invocation via a delegate provides the best option in terms of speed.

This analysis shows the importance of measurement: don't assume that compilation has given you the expected performance gains until you actually see the benefits on realistic data sets and have used all the available techniques to ensure no unnecessary overhead is lurking in your code. However, many other factors can affect your performance in the real world. For example, if your expressions change often, your interpreter will need to be JIT compiled only once, but each compiled expression will need to be to JIT compiled, which means you'll need to run your compiled code many times if you want to see any performance gains. Given that interpreted code is usually easier to implement and that compiled code provides significant performance gains only in certain situations, interpreted code is often your best choice.

When dealing with situations that require code to perform as quickly as possible, it's generally best to try a few different approaches and then profile your application to see which approach gives the best results. You can find more information about performance profiling in Chapter 12.

Summary

In this chapter, you looked at the main features and techniques for language-oriented programming in F#. You have seen various techniques; some use data structures as little languages or work with quotations, which involve working with the existing F# syntax to change or extend it. Others, such as implementing a parser, enable you to work with just about any language that is text-based, whether this language is of your own design (or perhaps more commonly) a preexisting language. All these techniques can lead to big productivity gains if you use them correctly.

The next chapter will look at how to parse text using F#. This will allow you to create languages that are not embedded within F#, yet will allow you to work with the main text-based formats that are defined.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset