A Domain-Specific Language is a programming language that mimics the terms, idioms, and expressions used among experts in the targeted domain. Code written in a DSL reads like structured prose for the domain. Ideally, a domain expert with little experience in programming can read, understand, and validate this code. Sometimes, a domain expert might be able to write DSL code, even if he isn’t a professional programmer.
DSLs are a large topic. We’ll
only touch the surface of DSLs and Scala’s impressive support for them. For
more information on DSLs in general, see [Fowler2009], [Ford2009], and [Deursen]. The basic build tool we used for
the book’s examples, sake
, uses a DSL similar to the
venerable make
and its Ruby cousin
rake
. (See the README in the code download
archive for details.) For other examples of Scala “internal” and
“external” DSLs, see [Ghosh2008a] and [Ghosh2008b]. For some advanced work on
DSLs using Scala, [Hofer2008] explores polymorphic
substitution of alternative implementations for DSL abstractions, which is
useful for analysis, optimization, composition, etc.
Well-crafted DSLs offer several benefits:
A DSL hides implementation details and exposes only those abstractions relevant to the domain.
Because implementation details are encapsulated, a DSL optimizes the effort required to write or modify code for application features.
A DSL helps developers understand the domain and domain experts to verify that the implementation meets the requirements.
A DSL minimizes the “impedance mismatch” between feature requirements, as expressed by domain experts, and the implementing source code, thereby minimizing potential bugs.
However, DSLs also have several drawbacks:
Good DSLs are harder to design than traditional APIs. The latter tend to follow language idioms for API design, where uniformity is important. Even then, elegant, effective, and easy-to-use APIs are difficult to design. In contrast, each DSL should reflect the unique language idioms of its domain. The DSL designer has much greater latitude, which also means it is much harder to determine the “best” design choices.
DSLs can require more maintenance over the long term to factor in domain changes. Also, new developers will require more time to learn how to use and maintain a DSL.
However, when a DSL is appropriate for an application—e.g., when it would be used frequently to implement and change functionality—a well-designed DSL can be a powerful tool for building flexible and robust applications.
From the implementation point of view, DSLs are often classified as internal and external.
An internal (sometimes called embedded) DSL is an idiomatic way of writing code in a general-purpose programming language, like Scala. No special-purpose parser is necessary for internal DSLs. Instead, they are parsed just like any other code written in the language. In contrast, an external DSL is a custom language with its own custom grammar and parser.
Internal DSLs are easier to create because they don’t require a special-purpose parser. On the other hand, the constraints of the underlying language limit the options for expressing domain concepts. External DSLs remove this constraint. You can design the language any way you want, as long as you can write a reliable parser for it. The downside of external DSLs is the requirement to write and use a custom parser.
DSLs have been around a long time. For example, internal DSLs written in Lisp are as old as Lisp itself. Interest in DSLs has surged recently, driven in part by the Ruby community, because they are very easy to implement in Ruby. As we’ll see, Scala provides excellent support for the creation of internal and external DSLs.
Let’s create an internal DSL for a payroll application that computes an employee’s paycheck every pay period, which will be two weeks long. The paycheck will include the employee’s net salary, which is the gross salary minus the deductions for taxes, insurance premiums (at least in some countries), retirement fund contributions, etc.
To better understand the contrasts between code that makes use of DSLs and code that does not, let’s try both techniques on the same problem. Here’s how the paycheck might be calculated for two employees, without the help of a DSL:
// code-examples/DSLs/payroll/api/payroll-api-script.scala
import
payroll.api._import
payroll.api.DeductionsCalculator._import
payroll._import
payroll.Type2Money._val
buck =Employee
(Name
("Buck"
,"Trends"
),Money
(80000
))val
jane =Employee
(Name
("Jane"
,"Doe"
),Money
(90000
))List
(buck, jane).foreach { employee=>
// Assume annual is based on 52 weeks.
val
biweeklyGross = employee.annualGrossSalary /26.
val
deductions = federalIncomeTax(employee, biweeklyGross) + stateIncomeTax(employee, biweeklyGross) + insurancePremiums(employee, biweeklyGross) + retirementFundContributions(employee, biweeklyGross)val
check =Paycheck
(biweeklyGross, biweeklyGross - deductions, deductions) format("%s %s: %s
"
, employee.name.first, employee.name.last, check) }
For each employee, the
script calculates the gross pay for the pay period, the deductions, and
the resulting net. These values are placed into a
Paycheck
, which is printed out. Before we describe the
types we are using, notice a few things about the
foreach
loop that does the work.
First, it is noisy. For
example, it mentions employee
and
biweeklyGross
incessantly. A DSL will help us minimize
that “noise” and focus on what’s really going on.
Second, notice that the code is imperative. It says “divide this, add that,” and so forth. We’ll see that our DSLs look similar, but they are more declarative, hiding the work from the user.
Here is the simple
Paycheck
class used in the script:
// code-examples/DSLs/payroll/paycheck.scala
package
payroll/** We're ignoring invalid (?) cases like a negative net
* when deductions exceed the gross.
*/
case
class
Paycheck
(gross:Money
, net:Money
, deductions:Money
) {def
plusGross
(m:Money
) =Paycheck
(gross + m, net + m, deductions)def
plusDeductions
(m:Money
) =Paycheck
(gross, net - m, deductions + m) }
The
Employee
type uses a Name
type:
// code-examples/DSLs/payroll/employee.scala
package
payrollcase
class
Name
(first:String
, last:String
)case
class
Employee
(name:Name
, annualGrossSalary:Money
)
The Money
type handles arithmetic, rounding to four decimal places, etc. It ignores
currency, except for the toString
method. Proper
financial arithmetic is notoriously difficult to do correctly for
real-world transactions. This implementation is not perfectly accurate,
but it’s close enough for our purposes. [MoneyInJava] provides useful information
on doing real money calculations:
// code-examples/DSLs/payroll/money.scala
package
payrollimport
java.math.{BigDecimal => JBigDecimal, MathContext => JMathContext, RoundingMode => JRoundingMode}/** Most arithmetic is done using JBigDecimals for tighter control.
*/
class
Money
(val
amount:BigDecimal
) {def
+
(m:Money
) =Money
(amount.bigDecimal.add(m.amount.bigDecimal))def
-
(m:Money
) =Money
(amount.bigDecimal.subtract(m.amount.bigDecimal))def
*
(m:Money
) =Money
(amount.bigDecimal.multiply(m.amount.bigDecimal))def
/
(m:Money
) =Money
(amount.bigDecimal.divide(m.amount.bigDecimal, Money.scale, Money.jroundingMode))def
<
(m:Money
) = amount < m.amountdef
<
= (m:Money
) = amount <= m.amountdef
>
(m:Money
) = amount > m.amountdef
>
= (m:Money
) = amount >= m.amountoverride
def
equals
(o:Any
) = omatch
{case
m:Money => amount
equals m.amountcase
_
=>
false
}override
def
hashCode
= amount.hashCode *31
// Hack: Must explicitly call the correct conversion: double2Double
override
def
toString
= String.format("$%.2f"
, double2Double(amount.doubleValue)) }object
Money
{def
apply
(amount:BigDecimal
) =new
Money
(amount)def
apply
(amount:JBigDecimal
) =new
Money
(scaled(new
BigDecimal
(amount)))def
apply
(amount:Double
) =new
Money
(scaled(BigDecimal
(amount)))def
apply
(amount:Long
) =new
Money
(scaled(BigDecimal
(amount)))def
apply
(amount:Int
) =new
Money
(scaled(BigDecimal
(amount)))def
unapply
(m:Money
) =Some
(m.amount)protected
def
scaled
(d:BigDecimal
) = d.setScale(scale, roundingMode)val
scale =4
val
jroundingMode = JRoundingMode.HALF_UP
val
roundingMode = BigDecimal.RoundingMode.ROUND_HALF_UP
val
context =new
JMathContext
(scale, jroundingMode) }object
Type2Money
{implicit
def
bigDecimal2Money
(b:BigDecimal
) =Money
(b)implicit
def
jBigDecimal2Money
(b:JBigDecimal
) =Money
(b)implicit
def
double2Money
(d:Double
) =Money
(d)implicit
def
long2Money
(l:Long
) =Money
(l)implicit
def
int2Money
(i:Int
) =Money
(i) }
Note that we use
scala.BigDecimal
, which wraps
java.math.BigDecimal
, as the storage type for financial
figures.
Deductions are calculated
using four helper methods in
payroll.api.DeductionsCalculator
:
// code-examples/DSLs/payroll/api/deductions-calc.scala
package
payroll.apiimport
payroll.Type2Money._object
DeductionsCalculator
{def
federalIncomeTax
(empl:Employee
, gross:Money
) = gross *.25
def
stateIncomeTax
(empl:Employee
, gross:Money
) = gross *.05
def
insurancePremiums
(empl:Employee
, gross:Money
) =Money
(500
)def
retirementFundContributions
(empl:Employee
, gross:Money
) = gross *.10
}
Each method might use the employee information and the gross salary for the pay period. In this case, we use very simple algorithms based on just the gross salary, except for insurance premiums, which we treat as a fixed value.
Running the script for the payroll API produces the following output:
(665) $ scala -cp ... payroll-api-script.scala Buck Trends: Paycheck($3076.92,$1346.15,$1730.77) Jane Doe: Paycheck($3461.54,$1576.92,$1884.62)
The previous code works well enough, but suppose we wanted to show it to the Accounting Department to confirm that we’re calculating paychecks correctly. Most likely, they would get lost in the Scala idioms. Suppose further that we need the ability to customize this algorithm frequently—for example, because it needs to be customized for different employee types (salaried, hourly, etc.), or to modify the deduction calculations. Ideally, we would like to enable the accountants to do these customizations themselves, without our help.
We might achieve these goals if we can express the logic in a DSL that is sufficiently intuitive to an accountant. Can we morph our API example into such a DSL?
Returning to the script for the payroll API, what if we hide most of the explicit references to context information, like the employee, gross salary, and deduction values? Consider the following text:
Rules to calculate an employee's paycheck: employee's gross salary for 2 weeks minus deductions for federalIncomeTax, which is 25% of gross stateIncomeTax, which is 5% of gross insurancePremiums, which are 500. in gross's currency retirementFundContributions are 10% of gross
This reads like normal
English, not code. We have included some “bubble” words (see [Ford2009]) that aid
readability but don’t necessarily correspond to anything essential, such
as to
, an
, is
,
for
, of
, and
which
. We’ll eliminate some of these unnecessary
words and keep others in our Scala DSL.
Compared to the version in
the payroll API script, there’s a lot less clutter obscuring the
essentials of the algorithm. This is because we have minimized explicit
references to the contextual information. We only mention
employee
twice. We mention gross
five times, but hopefully in “intuitive” ways.
There are many possible internal Scala DSLs we could construct that resemble this ad hoc DSL. Here is one of them, again in a script, which produces the same output as before:
// code-examples/DSLs/payroll/dsl/payroll-dsl-script.scala
import
payroll._import
payroll.dsl._import
payroll.dsl.rules_val
payrollCalculator = rules { employee=>
employee salary_for2.
weeks minus_deductions_for { gross=>
federalIncomeTax is (25.
percent_of gross) stateIncomeTax is (5.
percent_of gross) insurancePremiums are (500.
in gross.currency) retirementFundContributions are (10.
percent_of gross) } }val
buck =Employee
(Name
("Buck"
,"Trends"
),Money
(80000
))val
jane =Employee
(Name
("Jane"
,"Doe"
),Money
(90000
))List
(buck, jane).foreach { employee=>
val
check = payrollCalculator(employee) format("%s %s: %s
"
, employee.name.first, employee.name.last, check) }
We’ll go through the implementation step by step, but first, let’s summarize the features of Scala that allow us to implement this DSL.
Consider this line in
the definition of payrollCalculator
:
employee salary_for2.
weeks minus_deductions_for { gross=>
This infix notation is equivalent to the following less-readable form:
employee.salary_for(2.
weeks).minus_deductions_for { gross=>
You can see why we wrote
2.weeks
earlier, because the result of this
expression is passed to salary_for
. Without the
period, the infix expression would be parsed as
employee.salary_for(2).weeks...
. There is no
weeks
method on Int
, of course.
We’ll revisit this expression in a moment.
Method chaining like this
is often implemented where each method returns this
so you can continue calling methods on the same instance. Note that
returning this
allows those method calls to occur in
any order. If you need to impose a specific ordering, then return an
instance of a different type. For example, if
minus_deductions_for
must be called after
salary_for
, then salary_for
should
return a new instance.
Because chaining is so
easy, we could have created separate methods for
salary
, for
,
minus
, and deductions
, allowing us
to write the following expression:
employee salaryfor
2.
weeks minus deductionsfor
{ gross=>
Note that calls to
for
are preceded by different calls with very
different meanings. So, if the same instance is used throughout, it
would have to track the “flow” internally. Chaining different instances
would eliminate this problem. However, since no computations are
actually needed between these words, we chose the simpler design where
words are joined together, separated by _
.
Returning to
2.weeks
, since Int
doesn’t have a
weeks
method, we use an implicit conversion to a
Duration
instance that wraps an
Int
specifying an amount:
// code-examples/DSLs/payroll/dsl/duration.scala
package
payroll.dslcase
class
Duration
(val
amount:Int
) {/**
@return
the number of work days in "amount" weeks. */
def
weeks
= amount *5
/**
@return
the number of work days in "amount" years. */
def
years
= amount *260
}
The
weeks
method multiples that amount by 5 to return the
corresponding amount of work days. Hence, we designed the payroll
calculator to work with days as the unit of time. This decision is
completely hidden behind the DSL. Should we later add support for work
hours, it would be easy to refactor the design to use hours
instead.
Duration
is
one of the ad hoc types that we designed to encapsulate the implicit
context, to implement helper methods for the DSL, etc. We’ll discuss the
implicit conversion method we need in a moment.
A number of the
implementation objects use apply
to invoke behavior.
The rules
object encapsulates the process of building
the rules for payroll calculation. Its apply
method
takes a function literal, Employee =>
Paycheck
.
Now let’s explore the
implementation, starting with the rules
object and
working our way down:
// code-examples/DSLs/payroll/dsl/payroll.scala
package
payroll.dslimport
payroll._object
rules
{def
apply
(rules:Employee => Paycheck
) =new
PayrollBuilderRules
(rules)implicit
def
int2Duration
(i:Int
) =Duration
(i)implicit
def
employee2GrossPayBuilder
(e:Employee
) =new
GrossPayBuilder
(e)implicit
def
grossPayBuilder2DeductionsBuilder
(b:GrossPayBuilder
) =new
DeductionsBuilder
(b)implicit
def
double2DeductionsBuilderDeductionHelper
(d:Double
) =new
DeductionsBuilderDeductionHelper
(d) }import
rules._ ...
The function literal
argument for rules.apply
is used to construct a
PayrollBuilderRules
that will process the specified
rules. It is used at the very beginning of the DSL.
val
payrollCalculator = rules { employee=>
...
The
rules
object also defines implicit conversions. The
first one is used by the 2.weeks
expression. It
converts 2
into a Duration
instance, which we discussed previously. The other conversions are used
later in the DSL to enable transparent conversion of
Doubles
, Employees
, etc. into
wrapper instances that we will describe shortly.
Note that the
rules
object is imported so these conversions are
visible in the rest of the current file. It will also need to be
imported in files that use the DSL.
The
PayrollBuilderRules
is our first wrapper instance. It
evaluates the function literal for the whole rule set, wrapped in a
try/catch
block:
// code-examples/DSLs/payroll/dsl/payroll.scala
...class
PayrollException
(message:String
, cause:Throwable
)extends
RuntimeException
(message, cause)protected
[dsl]class
PayrollBuilderRules
(rules:Employee => Paycheck
) {def
apply
(employee:Employee
) = {try
{ rules(employee) }catch
{case
th:Throwable => new
PayrollException
("Failed to process payroll for employee: "
+ employee, th) } } } ...
Note that we protect
access to PayrollBuilderRules
, because we don’t want
clients using it directly. However, we left the exception public for use
in catch
clauses. (You can decide whether or not you
like wrapping a thrown exception in a “domain-specific” exception, as
shown.)
Note that we have to pass
the employee as a “context” instance in the function literal. We said
that it is desirable to make the context as implicit as possible. A
common theme in our implementation classes, like
PayrollBuilderRules
, is to hold context information
in wrapper instances and to minimize their visibility in the DSL. An
alternative approach would be to store context in singleton objects so
other instances can get to them. This approach raises thread safety
issues, unfortunately.
To see what we mean concerning the context, consider the part of our script that uses the payroll DSL, where the deductions are specified:
... { gross=>
federalIncomeTax is (25.
percent_of gross) stateIncomeTax is (5.
percent_of gross) insurancePremiums are (500.
in gross.currency) retirementFundContributions are (10.
percent_of gross) }
Consider the insurance
premiums, for which a flat Money(500)
is deducted.
Why didn’t we just write insurancePremiums are 500.
,
instead? It turns out we have to “sneak” the gross
instance into the expression somehow. The name gross
implies that it is a Money
representing the
employee’s salary for the pay period. Tricksey
DSLses!! It is actually another helper instance,
DeductionsBuilder
, which holds the whole paycheck,
including the gross pay, and the employee instance. The name
gross
is used merely because it reads well in the
places where it is used.
This block is calculating
the deductions and deducting them from the gross pay to determine the
net pay. The gross
instance handles this process.
There is no “communication” between the four lines of the function
literal. Furthermore, federalIncomeTax
,
insurancePremiums
, etc. are objects with no
connection to DeductionsBuilder
(as we’ll see
shortly). It would be great if they could be members of
DeductionsBuilder
or perhaps some other wrapper
instance enclosing this scope. Then each line would be a method call on
one or the other wrapper. Unfortunately, this doesn’t work. Hence, each
line must specify the gross
instance to maintain
continuity. We jump through various hoops to support the syntax, yet
allow gross
to be available, as needed.
So, we contrived the
convention that “raw” numbers, like the insurance deduction, have to be
qualified by the particular currency used for the gross pay. We’ll see
how the expression 500. in gross.currency
works in a
moment. It is something of a hack, but it reads well and it solves our
design problem.
Here is a possible alternative design that would have avoided the problem:
... { builder=>
builder federalIncomeTax (25.
percent_of gross) builder stateIncomeTax (5.
percent_of gross) builder insurancePremiums500.
builder retirementFundContributions (10.
percent_of gross) }
Now the fact that a
builder
is being used is more explicit, and
federalIncomeTax
, insurancePremiums
, etc. are methods on
the builder. We opted for a more readable style, with the penalty of a
harder implementation. You’ll sometimes hear the phrase fluent
interface used to refer to DSLs that emphasize
readability.
// code-examples/DSLs/payroll/dsl/payroll.scala
...import
payroll.Type2Money._protected
[dsl]class
GrossPayBuilder
(val
employee:Employee
) {var
gross:Money
=0
def
salary_for
(days:Int
) = { gross += dailyGrossSalary(employee.annualGrossSalary) * daysthis
}// Assume 260 working days: 52 weeks (including vacation) * 5 days/week.
def
weeklyGrossSalary
(annual:Money
) = annual /52.0
def
dailyGrossSalary
(annual:Money
) = annual /260.0
} ...
Recall that
rules
defines an implicit conversion from
Employee
to this type. It is invoked by the
expression employee salary_for
, so the
GrossPayBuilder.salary_for
method can be called.
GrossPayBuilder
initializes the
gross
and appends new values to it whenever
salary_for
is called, which assumes we’re adding
gross pay in increments of days. Finally, salary_for
returns this
to support chaining.
Deduction calculation is
the most complex part. When minus_deductions_for
is
used in the DSL, it triggers the implicit conversion defined in
rules
from the GrossPayBuilder
to
a DeductionsBuilder
:
// code-examples/DSLs/payroll/dsl/payroll.scala
...protected
[dsl]class
DeductionsBuilder
(gpb:GrossPayBuilder
) {val
employee = gpb.employeevar
paycheck:Paycheck
=new
Paycheck
(gpb.gross, gpb.gross,0
)def
currency
=this
def
minus_deductions_for
(deductionRules:DeductionsBuilder => Unit
) = { deductionRules(this
) paycheck }def
addDeductions
(amount:Money
) = paycheck = paycheck plusDeductions amountdef
addDeductionsPercentageOfGross
(percentage:Double
) = {val
amount = paycheck.gross * (percentage/100.
) addDeductions(amount) } } ...
DeductionsBuilder
saves the employee
from the passed-in
GrossPayBuilder
, which it doesn’t save as a field. It
also initializes the paycheck
using the calculated
gross pay.
Note that the
currency
method simply returns
this
. We don’t need to do anything with the actual
currency when this method is invoked. Instead, it is used to support a
design idiom that we’ll discuss shortly.
The
minus_deductions_for
does the important work. It
invokes the function literal with the individual rules and then returns
the completed Paycheck
instance, which is ultimately
what rules.apply
returns.
Our remaining two methods
are used to calculate individual deductions. They are called from
DeductionsBuilderDeductionHelper
, which we show
now:
// code-examples/DSLs/payroll/dsl/payroll.scala
...class
DeductionCalculator
{def
is
(builder:DeductionsBuilder
) = apply(builder)def
are
(builder:DeductionsBuilder
) = apply(builder)def
apply
(builder:DeductionsBuilder
) = {} }object
federalIncomeTax
extends
DeductionCalculator
object
stateIncomeTax
extends
DeductionCalculator
object
insurancePremiums
extends
DeductionCalculator
object
retirementFundContributions
extends
DeductionCalculator
protected
[dsl]class
DeductionsBuilderDeductionHelper
(val
factor:Double
) {def
in
(builder:DeductionsBuilder
) = { builder addDeductionsMoney
(factor) builder }def
percent_of
(builder:DeductionsBuilder
) = { builder addDeductionsPercentageOfGross factor builder } }
Now we see that
federalIncomeTax
, etc. are singleton objects. Note
the “synonym” methods is
and are
.
We used are
for the objects with plural names, like
insurancePremiums
, and is
for the
singular objects, like federalIncomeTax
. In fact,
since both methods delegate to apply
, they are
effectively bubble words that the user could omit. That is, the
following two DSL lines are equivalent:
federalIncomeTax is (25.
percent_of gross) federalIncomeTax (25.
percent_of gross)
The apply
method takes DeductionsBuilder
and does nothing with
it! In fact, by the time apply
is called, the
deduction has already been calculated and factored into the paycheck. By
implication, the presence of expressions like federalIncomeTax
is
are effectively syntactic sugar (at least as this DSL is
currently implemented). They are a fancy form of comments, but at least
they have the virtue of type checking the “kinds” of deductions that are
allowed. Of course, as the implementation evolves, these instances might
do real work.
To see why
DeductionCalculator.apply
is empty, let’s discuss
DeductionsBuilderDeductionHelper
. Recall that the
rules
object has an implicit conversion method to
convert a Double
to a
DeductionsBuilderDeductionHelper
. Once we have a
helper instance, we can call either the in
method or
the percent_of
method. Every line in the deductions
function literal exploits this instance.
For example, (25.
percent_of gross)
is roughly equivalent to the following
steps:
Call to
rules.double2DeductionsBuilderDeductionHelper(25.)
to create a new
DeductionsBuilderDeductionHelper(25.)
Call to the helper’s percent_of(gross)
method, where gross
is a
DeductionsBuilder
gross.addDeductionsPercentageOfGross(factor)
In other words, we used
DeductionsBuilderDeductionHelper
to convert an
expression of the form Double method
DeductionsBuilder
into an expression of the form DeductionsBuilder method2 Double
.
DeductionsBuilder
accumulates each deduction into the
paycheck we’re building.
The expression
500. in gross.currency
works almost identically.
DeductionsBuilder.currency
is effectively another
bubble word; it simply returns this
, but gives a
readable idiom for the DSL. The in
method simply
converts the Double
to a Money
and
passes it to DeductionsBuilder.addDeductions
.
So
DeductionCalculator.apply
does nothing, because all
the work is already done by the time apply
is
called.
So which is better, the original API implementation or the DSL implementation? The DSL implementation is complex. Like any language, testing its robustness can be a challenge. Users will try many combinations of expressions. They will probably not understand the compiler error messages that refer to the internals we’ve hidden behind the DSL.
Designing a quality DSL is difficult. With an API, you can follow the Scala library conventions for types, method names, etc. However, with a DSL, you’re trying to imitate the language of a new domain. It’s hard to get it right.
It’s worth the effort, though. A well-designed DSL minimizes the translation effort between requirements and code, thereby improving communications with stakeholders about requirements. DSLs also facilitate rapid feature change and hide distracting implementation details. As always, there is a cost/benefit analysis you should make when deciding whether to use a DSL.
Assuming you’ve made the “go” decision, a common problem in DSL design is the finishing problem (see [Ford2009]). How do you know when you’ve finished building up the state of an instance and it’s ready to use?
We solved this problem in
two ways. First, we nested the calculation steps in a function literal.
As soon as rules(employee)
was invoked, the paycheck
was built to completion. Also, all the steps were evaluated “eagerly.”
We didn’t need to put in all the rules, then run them at the end. Our
only ordering requirement was the need to calculate the gross pay first,
since the deductions are based on it. We enforced the correct order of
invocation using instances of different types.
There are cases in which you can’t evaluate the build steps eagerly. For example, a DSL that builds up a SQL query string can’t run a query after each step of the build process. In this case, evaluation has to wait until the query string is completely built.
By contrast, if your DSL
steps are stateless, chained method invocation works just fine. In this
case, it doesn’t matter when you stop calling chained methods. If you
chain methods that build up state, you’ll have to add some sort of
done
method and trust the users to always use it at
the end.
When you write a parser for an external DSL, you can use a parser generator tool like Antlr (see [Antlr]). However, the Scala library includes a powerful parser combinator library that can be used for parsing most external DSLs that have a context-free grammar. An attractive feature of this library is the way it defines an internal DSL that makes parser definitions look very similar to familiar grammar notations, like EBNF (Extended Backus-Naur Form—see [BNF]).
Parser combinators are building blocks for parsers. Parsers that handle specific kinds of input—floating-point numbers, integers, etc.—can be combined together to form other parser combinators for larger expressions. A combinator framework makes it easy to combine parsers to handle sequential and alternative cases, repetition, optional terms, etc.
We’ll learn more about parsing techniques and terminology as we proceed. A complete exposition of parsing techniques is beyond our scope, but our example should get you started. You can find additional examples of parsers written using Scala’s parser combinator library in [Spiewak2009b], [Ghosh2008a], and [Odersky2008].
For our parser combinator example, we’ll reuse the example we just discussed for internal DSLs. We’ll modify the grammar slightly, since our external DSL does not have to be valid Scala syntax. Other changes will make parser construction easier. Here’s an example written in the external DSL:
paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for { federal income tax is 25. percent of gross, state income tax is 5. percent of gross, insurance premiums are 500. in gross currency, retirement fund contributions are 10. percent of gross }
Compare this example to the internal DSL we defined in A Payroll Internal DSL:
... = rules { employee=>
employee salary_for2.
weeks minus_deductions_for { gross=>
federalIncomeTax is (25.
percent_of gross) stateIncomeTax is (5.
percent_of gross) insurancePremiums are (500.
in gross.currency) retirementFundContributions are (10.
percent_of gross) } } ...
In our new DSL, we insert a specific employee in the script. We wouldn’t expect a user to copy and paste this script for every employee. A natural extension that we won’t pursue would allow the user to loop over all salaried employees in a database, for example.
Some of the differences are “gratuitous”; we could have used the same syntax we used previously. These changes include removing underscores between words in some expressions and expanding camel-case words into space-separated words. That is, we turned some single words into multi-word expressions. We made these changes because they will be easy to implement using parser combinators, but using the same multi-word expressions would have added a lot of complexity to the internal DSL’s implementation.
We no longer need “local
variables” like employee
and
gross
. Those words still appear in the DSL, but our
parser will keep track of the corresponding instances internally.
The remaining changes are punctuation. It is still convenient to surround the list of deductions with curly braces. We now use a comma to separate the individual deductions, as that will make the parser’s job easier. We can also drop the parentheses we used earlier.
To see how closely the internal DSL for Scala’s parser combinator library resembles the context-free grammar, let’s start with the grammar itself, written in a variation of EBNF. We’ll omit commas to separate sequences, for clarity:
paycheck = empl gross deduct; empl = "paycheck" "for" "employee" employeeName; gross = "is" "salary" "for" duration; deduct = "minus" "deductions" "for" "{" deductItems "}"; employeeName = """ name " " name """; name = ... duration = decimalNumber weeksDays; weeksDays = "week" | "weeks" | "day" | "days"; deductItems = Ɛ | deductItem { "," deductItem }; deductItem = deductKind deductAmount; deductKind = tax | insurance | retirement; tax = fedState "income" "tax"; fedState = "federal" | "state"; insurance = "insurance" "premiums"; retirement = "retirement" "fund" "contributions"; deductAmount = percentage | amount; percentage = toBe doubleNumber "percent" "of" "gross"; amount = toBe doubleNumber "in" "gross" "currency"; toBe = "is" | "are"; decimalNumber = ... doubleNumber = ...
We can see that most of
the terminals (the literal strings
paycheck
, for
,
employee
, the characters {
and
}
, etc.) will be bubble words, as defined in the
previous section. We’ll ignore these after parsing. The Ɛ is used to
indicate an empty production for deductItems
,
although there will rarely be no deductions!
We didn’t spell out the details for decimal numbers, double numbers, and allowed letters in the employee names. We simply elided those definitions. We’ll handle the details later.
Each line in the grammar defines a production rule. The end of the definition is marked with a semicolon. A nonterminal appears on the lefthand side of the equals sign. The righthand side consists of terminals (e.g., the literal strings and characters we just mentioned) that require no further parsing, other nonterminals (including possibly a recursive reference to the lefthand side nonterminal), and operators that express relationships between the items. Notice that the grammar forms have a hierarchical decomposition, although not a directed acyclic graph, as generally speaking these grammars can have cycles.
We have a context-free grammar because every production rule has a single nonterminal on the lefthand side of the equals sign, i.e., without any additional context information required to specify the production’s applicability and meaning.
Production rules like
toBe = "is" | "are"
mean the is
production (a terminal in this case) or the
are
production will match. This is an example of an
alternative composition.
When productions are
separated by white space on the righthand side of another production,
e.g., prod1 prod2
, both productions are required to
appear sequentially for a match. (Most EBNF formats actually require a
comma to separate these items.) Hence, these expressions are more like
“and” expressions, but sequential composition is so
common that no &
operator is used, the analog of
|
for alternative composition.
The production rule with
"{" deductItem { "," deductItem } "}"
demonstrates
how to specify optional (zero or more) repetitions. This expression
matches a literal {
character, followed by a
deductItem
(another production), followed by zero or
more expressions consisting of a literal comma ,
and
another deductItem
, and finally ending with a literal
}
character. Sometimes an asterisk is used to
indicate repetition zero or more times, e.g., prod *
.
For repetition at least once, prod +
is sometimes
used.
Finally, if we had
optional items in our grammar, we would enclose them in square brackets,
[...]
. There are other kinds of composition operators
possible (and supported in the
Scala library), a few of which we’ll discuss. See the [ScalaAPI2008] entry for
Parsers
for more details.
Here is the parser
written using Scala’s parser combinators. At this point, we won’t do
anything to actually calculate an employee’s paycheck, so we’ll append
V1
to the class name:
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1.scala
package
payroll.pcdslimport
scala.util.parsing.combinator._import
org.specs._import
payroll._import
payroll.Type2Money._class
PayrollParserCombinatorsV1
extends
JavaTokenParsers
{def
paycheck
= empl ~ gross ~ deductdef
empl
="paycheck"
~>"for"
~>"employee"
~> employeeNamedef
gross
="is"
~>"salary"
~>"for"
~> durationdef
deduct
="minus"
~>"deductions"
~>"for"
~>"{"
~> deductItems <~"}"
// stringLiteral provided by JavaTokenParsers
def
employeeName
= stringLiteral// decimalNumber provided by JavaTokenParsers
def
duration
= decimalNumber ~ weeksDaysdef
weeksDays
="weeks"
|"week"
|"days"
|"day"
def
deductItems
= repsep(deductItem,","
)def
deductItem
= deductKind ~> deductAmountdef
deductKind
= tax | insurance | retirementdef
tax
= fedState <~"income"
<~"tax"
def
fedState
="federal"
|"state"
def
insurance
="insurance"
~>"premiums"
def
retirement
="retirement"
~>"fund"
~>"contributions"
def
deductAmount
= percentage | amountdef
percentage
= toBe ~> doubleNumber <~"percent"
<~"of"
<~"gross"
def
amount
= toBe ~> doubleNumber <~"in"
<~"gross"
<~"currency"
def
toBe
="is"
|"are"
// floatingPointNumber provided by JavaTokenParsers
def
doubleNumber
= floatingPointNumber }
The body of
PayrollParserCombinatorsV1
looks very similar to the
grammar we defined for the DSL. Each production rule becomes a method.
The terminating semicolon is dropped, but since the production is a
method, it would be valid Scala to leave it in.
Where we had whitespace
between each production on the righthand side, we now use a combinator
operator, either ∼
, ∼>
, or
<∼
. The combinator for sequential composition is
∼
, used when we want to retain for further processing
the results produced by both productions on the left and right sides of
the ∼
. For example, when processing the
paycheck
production, we want to keep all three
results from empl
, gross
, and
deduct
. Hence we use two ∼
operators:
def
paycheck
= empl ~ gross ~ deduct
We use another sequential
composition combinator ∼>
when we no longer need
the result of the production to the left. For
example, when processing the empl
production, we only
want to keep the parse result for the last production,
employeeName
:
def
empl
="paycheck"
~>"for"
~>"employee"
~> employeeName
Similarly, we use
<∼
when we no longer need the result for the
production to the right. For example, when
processing the tax
production, we only want to keep
the result of the first production, fedState
:
def
tax
= fedState <~"income"
<~"tax"
Our heavy use of the
<∼
sequential combinator in the various
productions related to deductions indicates that we aren’t keeping track
of the source of each deduction, just the amount of the deduction. A
real paycheck application would print this information, of course. Our
aim is for simplicity. As an exercise, consider how
PayrollParserCombinatorsV1
and the subsequent
refinements below would change if we tracked this information. Would you
necessarily keep the parsed strings or track the information some other
way?
The “or” case is expressed
with the |
method, just as in the grammar:
def
weeksDays
="weeks"
|"week"
|"days"
|"day"
The rep
method can be used for zero or more repetitions. We actually use a
similar method, repsep
, which lets us specify a
separator, in our case a comma (,
):
def
deduct
= ... ~>"{"
~> repsep(deductItem,","
) <~"}"
Note that
deduct
combines several features we have just
described.
Like repetition, there is
an opt
method for optional terms, which we aren’t
using.
PayrollParserCombinatorsV1
inherits from JavaTokenParsers
, which inherits from
RegexParsers
, which inherits from the root parser
trait Parsers
. It’s well known that parsing
non-trivial grammars with just regular expressions tends to break down
pretty quickly. However, using regular expressions to parse individual
terms inside a parsing framework can be very effective. In our example,
we exploit the productions in JavaTokenParsers
to parse quoted strings
(for the employee’s name), decimal literals, and floating-point
literals.
Let’s try it out! Here is a specification that exercises the parser for two cases, without and with deductions:
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1-spec.scala
package
payroll.pcdslimport
scala.util.parsing.combinator._import
org.specs._import
payroll._import
payroll.Type2Money._object
PayrollParserCombinatorsV1Spec
extends
Specification
("PayrollParserCombinatorsV1"
) {"PayrollParserCombinatorsV1"
should {"parse rules when there are no deductions"
in {val
input ="""paycheck for employee "Buck Trends"
is salary for 2 weeks minus deductions for {}"""
val
p =new
PayrollParserCombinatorsV1
p.parseAll(p.paycheck, input)match
{case
p.Success
(r,_
)=>
r.toString mustEqual"""(("Buck Trends"~(2~weeks))~List())"""
case
x=>
fail(x.toString) } }"calculate the gross, net, and deductions for the pay period"
in {val
input ="""paycheck for employee "Buck Trends"
is salary for 2 weeks minus deductions for {
federal income tax is 25. percent of gross,
state income tax is 5. percent of gross,
insurance premiums are 500. in gross currency,
retirement fund contributions are 10. percent of gross
}"""
val
p =new
PayrollParserCombinatorsV1
p.parseAll(p.paycheck, input)match
{case
p.Success
(r,_
)=>
r.toString mustEqual"""(("Buck Trends"~(2~weeks))~List(25., 5., 500., 10.))"""
case
x=>
fail(x.toString) } } } }
This part of the specification shows us how to instantiate and use the parser:
val
p =new
PayrollParserCombinatorsV1
p.parseAll(p.paycheck, input)match
{case
p.Success
(r,_
)=>
r.toString mustEqual"..."
case
x=>
fail(x.toString) }
The
parseAll
method is defined in a parent class. We
invoke the top-level production method, paycheck
, and
pass its return value as the first argument to
parseAll
and pass the string to parse as the second
argument.
If the parsing process is
successful, the result of the parse is returned as an instance of type
p.Success[+T]
, a case class declared in the
Parsers
trait. Why is there a p.
prefix? It indicates that p.Success
is a
path-dependent type, which we will discuss in Path-Dependent Types. For now, just know that even though
Success
is defined in the Parsers
trait, the actual type of the instance is dependent on the
PayrollParserCombinatorsV1
instance we created. In
other words, if we had another parser, say
p2
of type MyOtherParser
,
then p2.Success[String]
would be different from
p.Success[String]
and one could
not be substituted for the other.
The
Success
instance contains two fields. The first is
the result of the parse, an instance of type T
(assigned to r
in the case
clause). The second is the remaining input string to parse, which will
be empty after a successful parse (we will have parsed the whole string
at this point). This string is assigned to _
.
If the parse fails, the
returned instance is either a p.Failure
or
p.Error
, which our example handles with a generic
case
clause. Both are derived from
p.NoSuccess
, which contains fields for an error
message and the unconsumed input at the point of failure. A
p.Failure
in a parser will trigger backtracking so
that a retry with a different parser can be invoked by the parser
framework, if possible. An Error
result does not
trigger backtracking and is used to signal more serious problems.
For completeness, both
p.Success
and p.NoSuccess
derive
from p.ParseResult
.
We have two big
unanswered questions: what do the production methods actually return,
and what is the type of the result instance returned in the
p.Success
?
The production methods
themselves return parsers. Most of them in our example return
p.Parser[String]
(again, a path-dependent type).
However, because the deduct
method handles repetition
(it invokes the repsep
method), it actually returns a
p.Parser[List[String]]
. When this parser is used, it
will return a List[String]
, with one string
corresponding to each match in the repetition.
So, our call to
p.parseAll(p.paycheck, input)
earlier parses the
input
string using the parser returned by
p.paycheck
. That brings us to the second question:
what is the result of a successful parse?
To see what’s returned,
compile the PayrollParserCombinatorsV1 file listed
at the beginning of this section and invoke the scala
interpreter with the -cp
option to include the
directory where the class files were written (it will be
build if you used the build process for the code
example distribution).
Once in the interpreter,
enter the following expressions after the scala>
prompt. (You can also find this input the
payroll-parser-comb-script.scala file in the code
example distribution.)
scala> import scala.util.parsing.combinator._ import scala.util.parsing.combinator._ scala> import payroll.pcdsl._ import payroll.pcdsl._ scala> val p = new PayrollParserCombinatorsV1 p: payroll.pcdsl.PayrollParserCombinatorsV1 = payroll.pcdsl.PayrollParserCombinatorsV1@79e84310 scala> p.empl res0: p.Parser[String] = Parser (~>) scala> p.weeksDays res2: p.Parser[String] = Parser (|) scala> p.doubleNumber res3: p.Parser[String] = Parser () scala> p.deduct res1: p.Parser[List[String]] = Parser (<~) scala> p.paycheck res4: p.Parser[p.~[p.~[String,p.~[String,String]],List[String]]] = Parser (~) scala> p.parseAll(p.weeksDays, "weeks") res5: p.ParseResult[String] = [1.6] parsed: weeks scala> val input = """paycheck for employee "Buck Trends" | is salary for 2 weeks minus deductions for {}""" input: java.lang.String = paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for {} scala> p.parseAll(p.paycheck, input) res6: p.ParseResult[p.~[p.~[String,p.~[String,String]],List[String]]] = [2.53] parsed: (("Buck Trends"~(2~weeks))~List()) scala>
We import the necessary
types and create a PayrollParserCombinatorsV1
instance. Then we call several of the production methods to see what
kind of Parser
each returns. The first
three—empl
, weeksDays
, and
doubleNumber
—return
p.Parser[String]
.
Note what’s written on
the righthand side in the output for the first three parsers:
empl
, weeksDays
, and
doubleNumber
. We see Parser
(∼>)
, Parser (|)
, and Parser
()
, respectively. The parsers returned reflect the definitions
of the production rules, where empl
ends with a
combinator of the form prod1 ∼> prod2
,
weeksDays
returns a combinator of the form
prod1 | prod2
, and doubleNumber
returns a parser for a single production.
Because
deduct
consists of combinators that handle
repetition, the parser returned by deduct
is of type
p.Parser[List[String]]
, as we stated previously. The
righthand side of the output is Parser (<∼)
,
because the definition of deduct
ends with
prod1 <∼ prod2
.
Things get more
interesting when we look at the top-level production,
paycheck
. What is
p.Parser[p.∼[p.∼[String,p.∼[String,String]],List[String]]] =
Parser (∼)
supposed to
mean? Well, the righthand side should be easy to understand now; the
definition of paycheck
ends in prod1 ∼
prod2
. What is the type parameter for
p.Parser
on the lefthand side of the equals
sign?
The
Parsers
trait also defines a case class named
∼
that represents a pair of sequential
rules:
case
class
~
[+a, +b]
(_1:a
, _2:b
) {override
def
toString
="("
+ _1 +"~"
+ _2 +")"
}
The actual path-dependent
type in our example is p.∼[+a,+b]
. Hence, the type
parameter T
in p.Parser[T]
is
p.∼[p.∼[String,p.∼[String,String]],List[String]]
,
which is a hierarchical tree of types.
Let’s break it down,
working our way inside out. Note that there are three
p.∼
. We’ll start with the innermost type,
p.∼[String,String]
, and map the type declaration to
the output we saw in the scala
session "Buck
Trends"∼(2∼weeks∼List())
.
The
p.∼[String,String]
corresponds to the parser that
handles expressions like 2
weeks
. Hence, the instance created when we parsed our example
string was the instance p.∼("2", "weeks")
. Calling
the p.∼.toString
method produces the output
(2~weeks)
.
Working out one level, we
have p.∼[String,p.∼[String,String]]
. This combination
parses paycheck for employee "Buck Trends" is salary for 2
weeks
. Recall that we discard paycheck for
employee
and is salary for
, keeping only
the Buck Trends
and 2 weeks
. So we
create an instance p.∼("Buck Trends", p.∼("2",
"weeks"))
. Calling toString
again results
in the string ("Buck Trends"∼(2∼weeks))
.
Finally, at the outermost
level, you can see we have the following:
p.∼[p.∼[String,p.∼[String,String]],List[String]]
.
We’ve already discussed everything up to the last
List[String]
, which comes from the
deduct
production:
def
deduct
="minus"
~>"deductions"
~>"for"
~>"{"
~> repsep(deductItem,","
) <~"}"
We discard everything
except for the list of zero or more deductItems
.
There are none in our example, so we get an empty list for which
toString
returns List()
.
Therefore, calling p.∼.toString
on our outermost
type, the one that parameterizes p.Parser
, returns the string "Buck
Trends"∼(2∼weeks∼List())
. We’re done!
Well, not quite. We’re still not calculating an actual paycheck for ol’ Buck. Let’s complete our implementation.
As we parse the DSL, we
want to look up the employee by name, fetch his or her gross salary for
the specified pay period, and then calculate the deductions as we go.
When the parser returned by paycheck
finishes, we
want to return a Pair
with the
Employee
instance and the completed
Paycheck
.
We will reuse “domain”
classes like Employee
, Money
,
Paycheck
, etc. from earlier in the chapter. To do the
calculations on demand, we will create a second iteration of PayrollParserCombinatorsV1
that we’ll
call PayrollParserCombinators
. We’ll modify the
parsers returned by some of the production methods to return new kinds
of parsers. We’ll also do administrative work like storing running
context data, as needed. Our implementation won’t be thread-safe. You’ll
want to ensure that only one thread uses a given
PayrollParserCombinators
. We could make it more
robust, but doing so isn’t the goal of this exercise.
Here is our final
PayrollParserCombinators
:
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb.scala
package
payroll.pcdslimport
scala.util.parsing.combinator._import
org.specs._import
payroll._import
payroll.Type2Money._class
UnknownEmployee
(name:Name
)extends
RuntimeException
(name.toString)class
PayrollParserCombinators
(val
employees:Map[Name, Employee]
)extends
JavaTokenParsers
{var
currentEmployee:Employee
=null
var
grossAmount:Money
=Money
(0
)/**
@return
Parser[(Employee, Paycheck)] */
def
paycheck
= empl ~ gross ~ deduct ^^ {case
e ~ g ~ d=>
(e,Paycheck
(g, g-d, d)) }/**
@return
Parser[Employee] */
def
empl
="paycheck"
~>"for"
~>"employee"
~> employeeName ^^ { name=>
val
names = name.substring(1
, name.length-1
).split(" "
)// remove ""
val
n =Name
(names(0
), names(1
));if
(! employees.contains(n))throw
new
UnknownEmployee
(n) currentEmployee = employees(n) currentEmployee }/**
@return
Parser[Money] */
def
gross
="is"
~>"salary"
~>"for"
~> duration ^^ { dur=>
grossAmount = salaryForDays(dur) grossAmount }def
deduct
="minus"
~>"deductions"
~>"for"
~>"{"
~> deductItems <~"}"
/**
* "stringLiteral" provided by JavaTokenParsers
*
@return
Parser[String]
*/
def
employeeName
= stringLiteral/**
* "decimalNumber" provided by JavaTokenParsers
*
@return
Parser[Int]
*/
def
duration
= decimalNumber ~ weeksDays ^^ {case
n ~ factor=>
n.toInt * factor }def
weeksDays
= weeks | daysdef
weeks
="weeks?"
.r ^^ {_
=>
5
}def
days
="days?"
.r ^^ {_
=>
1
}/**
@return
Parser[Money] */
def
deductItems
= repsep(deductItem,","
) ^^ { items=>
items.foldLeft(Money
(0
)) {_
+_
} }/**
@return
Parser[Money] */
def
deductItem
= deductKind ~> deductAmountdef
deductKind
= tax | insurance | retirementdef
tax
= fedState <~"income"
<~"tax"
def
fedState
="federal"
|"state"
def
insurance
="insurance"
~>"premiums"
def
retirement
="retirement"
~>"fund"
~>"contributions"
def
deductAmount
= percentage | amount/**
@return
Parser[Money] */
def
percentage
= toBe ~> doubleNumber <~"percent"
<~"of"
<~"gross"
^^ { percentage=>
grossAmount * (percentage /100.
) }def
amount
= toBe ~> doubleNumber <~"in"
<~"gross"
<~"currency"
^^ {Money
(_
) }def
toBe
="is"
|"are"
def
doubleNumber
= floatingPointNumber ^^ {_
.toDouble }// Support method. Assume 260 (52 * 5) paid work days/year
def
salaryForDays
(days:Int
) = (currentEmployee.annualGrossSalary /260.0
) * days }
For simplicity, we’ll use
a map of “known” employees, keyed by Name
instances,
that we save as a field in PayrollParserCombinators
.
A real implementation would probably use a data store of some
kind.
There are two other fields:
currentEmployee
, which remembers which employee we
are processing, and grossAmount
, which remembers the
gross amount of pay for the employee for the pay period. Both fields
have a slight design smell. They are mutable. They
are set only once per parse, but not when they are declared, only when
we parse the input that allows us to calculate them. You might have also
noticed that if the same PayrollParserCombinators
instance is used more than once, we don’t reset these fields to their
default values. No doubt it would be possible to write scripts in the
DSL that exploit this bug.
These weaknesses are not inherent to parser combinators. They reflect simplifications we used for our purposes. As an exercise, you might try improving the implementation to eliminate these weaknesses.
We have added Javadoc-style
@return
annotations for most of the productions to
make it clear what they are now returning. In some cases, the
productions are unchanged, as the original parser instances are fine as
is. Most of the changes reflect our desire to calculate the paycheck as
we go.
Consider the new
paycheck
production:
/**
@return
Parser[(Employee, Paycheck)] */
def
paycheck
= empl ~ gross ~ deduct ^^ {case
e ~ g ~ d=>
(e,Paycheck
(g, g-d, d)) }
Now, we return a
Pair
with the Employee
and the
computed Paycheck
. The empl ∼ gross ∼
deduct
combination would still return
Parser[String]
(we’ll drop the path-dependent prefix for now). We have added a
new combinator ^^
, e.g., prod1 ^^
func1
, where func1
is a function. If
prod1
succeeds, then the result of applying
func1
to the result of prod1
is
returned. That is, we return func1(prod1)
.
For
paycheck
, we give it a function literal that does a
pattern match to extract the three results from empl
,
gross
, and deduct
, respectively.
We create a 2-tuple (Pair
) with e
,
the Employee
, and a Paycheck
calculated from the gross salary for the pay period (in
g
) and the sum of all the deductions (in
d
).
It’s important to keep
clear that the anonymous function passed as an argument to
^^
returns a tuple (Employee,
Paycheck)
, but the production paycheck
method itself returns a Parser[(Employee, Paycheck)]
.
This pattern has been true from the beginning, actually, where
Strings
were always involved in our first version. It
will remain true for all the production rules in
PayrollParserCombinators
.
The empl
production assumes the employee’s first name and last name are given.
(Obviously, this would be inadequate in a real application.)
/**
@return
Parser[Employee] */
def
empl
="paycheck"
~>"for"
~>"employee"
~> employeeName ^^ { name=>
val
names = name.substring(1
, name.length-1
).split(" "
)// remove ""
val
n =Name
(names(0
), names(1
));if
(! employees.contains(n))throw
new
UnknownEmployee
(n) currentEmployee = employees(n) currentEmployee }
To construct the name, the
embedded double quotes have to be removed, which is why we start by
extracting the substring that tosses the first and last characters. The
name is used to look up the Employee
instance in the
map, saving the value in the currentEmployee
field.
In general, there is not a lot of “graceful” error handling in
PayrollParserCombinators
. However, the
empl
method handles the case where no employee is
found with the specified name, throwing an
UnknownEmployee
exception when this occurs.
The rest of the
productions work similarly. Sometimes, a parser converts an input string
to an Int
(e.g., duration
) or a
Money
(e.g., gross
). An
interesting case is deduct
. It folds the list of
deductions into a single deduction amount, using addition. The
foldLeft
method takes two argument lists. The first
has a single argument that specifies the initial value, in this case,
zero Money
. The second argument list has a single
function literal argument that takes two arguments: the accumulated
value of the folding operation, and an item from the list. In this case,
we return the sum of the arguments. So, foldLeft
iterates over the
items
collection, adding them together. See Traversing, Mapping, Filtering, Folding, and Reducing for more information on
foldLeft
and related operations.
The weeks
and days
productions remind us that we are using
parser combinators based on regular-expressions. (We’re also using
stringLiteral
, decimalNumber
, and
floatingPointNumber
provided by
JavaTokenParsers
.) Note that weeks
and days
ignore the parsed string. They just return a
multiplication factor used to determine total days in the pay period in
the duration
production rule.
There are other
combinator methods for applying functions to parser results in different
ways. See the Parsers
Scaladoc page for
details.
The following (somewhat incomplete) specification shows the calculation of paychecks when there are no deductions and when there are several deductions:
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-spec.scala
package
payroll.pcdslimport
scala.util.parsing.combinator._import
org.specs._import
payroll._import
payroll.Type2Money._// Doesn't test "sad path" scenarios...
object
PayrollParserCombinatorsSpec
extends
Specification
("PayrollParserCombinators"
) {val
salary =Money
(100000.1
)// for a full year
val
gross = salary /26.
// for two weeks
val
buck =Employee
(Name
("Buck"
,"Trends"
), salary)val
employees =Map
(buck.name -> buck)implicit
def
money2double
(m:Money
) = m.amount.doubleValue"PayrollParserCombinators"
should {"calculate the gross == net when there are no deductions"
in {val
input ="""paycheck for employee "Buck Trends"
is salary for 2 weeks minus deductions for {}"""
val
p =new
PayrollParserCombinators
(employees) p.parseAll(p.paycheck, input)match
{case
p.Success
(Pair
(employee, paycheck),_
)=>
employee mustEqual buck paycheck.gross must beCloseTo(gross,Money
(.001
)) paycheck.net must beCloseTo(gross,Money
(.001
))// zero deductions?
paycheck.deductions must beCloseTo(Money
(0.
),Money
(.001
))case
x=>
fail(x.toString) } }"calculate the gross, net, and deductions for the pay period"
in {val
input ="""paycheck for employee "Buck Trends"
is salary for 2 weeks minus deductions for {
federal income tax is 25. percent of gross,
state income tax is 5. percent of gross,
insurance premiums are 500. in gross currency,
retirement fund contributions are 10. percent of gross
}"""
val
p =new
PayrollParserCombinators
(employees) p.parseAll(p.paycheck, input)match
{case
p.Success
(Pair
(employee, paycheck),_
)=>
employee mustEqual buckval
deductions = (gross *.4
) +Money
(500
)val
net = gross - deductions paycheck.gross must beCloseTo(gross,Money
(.001
)) paycheck.net must beCloseTo(net,Money
(.001
)) paycheck.deductions must beCloseTo(deductions,Money
(.001
))case
x=>
fail(x.toString) } } } }
If you work out what the results should be from the input strings, you’ll see that the implementation correctly calculates the paycheck.
Besides the many small details that differ between this implementation of the external DSL and the previous implementation of the internal DSL, there is one big conceptual difference from the two implementations. Here we are computing the paycheck as we parse code written in the external DSL. In the internal DSL case, we generated a paycheck calculator when we parsed the DSL. Afterward, we used that calculator to compute paychecks for one employee at a time. We could have generated a paycheck calculator like we did before, but we chose a simpler approach to focus on the construction of the parser itself. Also, as we discussed earlier, we weren’t as careful about thread safety and other issues in the implementation.
Scala provides rich support for creating your own internal and external DSLs. However, a non-trivial DSL can be a challenge to implement and debug. For the examples in this chapter, the parser combinators implementation was easier to design and write than the implementation for the internal DSL. However, we found that debugging the internal DSL was easier.
You must also consider how robust the parser must be when handling invalid input. Depending on the level of sophistication of the users of the DSL, you may need to provide very good feedback when errors occur, especially when your users are non-programmers. The parser combinator library in Scala version 2.8 will provide improved support for error recovery and reporting, compared to the version 2.7.X library.
The version 2.8 library will also provide support for writing packrat parsers that can implement unambiguous parsing expression grammars (PEGs). The 2.8 implementation of packrat parsers also supports memoization, which helps improve performance, among other benefits. If you need a fast parser, a packrat parser will take you further before you need to consider more specialized tools, like parser generators.
It’s tempting to create DSLs with abandon. DSLs in Scala can be quite fun to work with, but don’t underestimate the effort required to create robust DSLs that meet your clients usability needs, nor long-term maintenance and support issues.
If you choose to write a DSL, you have rich options in Scala. The syntax is flexible yet powerful enough that an internal DSL may be sufficient. A internal DSL is an excellent starting point, especially if other programmers will be the primary writers of code in the DSL.
If you expect your non-programming stakeholders to read or even write code written in the DSL, it might be worth the extra effort to create an external DSL that eliminates as many of the programming-language idioms as possible. Consider whether the code written in the DSL will need to be processed for other purposes, like generating documentation, spreadsheets, etc. Since you will have to write a parser for the DSL anyway, it might be straightforward to write others to handle these different purposes.
In the next chapter, we’ll explore the richness of Scala’s type system. We’ve learned many of its features already. Now, we’ll explore the type system in full detail.