Chapter 20. Parsing

Topics in This Chapter

20.1 Grammars

20.2 Combining Parser Operations

20.3 Transforming Parser Results

20.4 Discarding Tokens

20.5 Generating Parse Trees

20.6 Avoiding Left Recursion

20.7 More Combinators

20.8 Avoiding Backtracking

20.9 Packrat Parsers

20.10 What Exactly Are Parsers?

20.11 Regex Parsers

20.12 Token-Based Parsers

20.13 Error Handling

Exercises

In this chapter, you will see how to use the “parser combinators” library to analyze data with fixed structure. Examples of such data are programs in a programming language or data in formats such as HTTP or JSON. Not everyone needs to write parsers for these languages, so you may not find this chapter useful for your work. If you are familiar with the basic concepts of grammars and parsers, glance through the chapter anyway because the Scala parser library is a good example of a sophisticated domain-specific language embedded in the Scala language.

Note

The API documentation for Scala parser combinators is at www.scala-lang.org/api/current/scala-parser-combinators.

The key points of this chapter are:

• Alternatives, concatenation, options, and repetitions in a grammar turn into |, ~, opt, and rep in Scala combinator parsers.

• With RegexParsers, literal strings and regular expressions match tokens.

• Use ^^ to process parse results.

• Use pattern matching in a function supplied to ^^ to take apart ~ results.

• Use ~> and <~ to discard tokens that are no longer needed after matching.

• The repsep combinator handles the common case of repeated items with a separator.

• A token-based parser is useful for parsing languages with reserved words and operators. Be prepared to define your own lexer.

• Parsers are functions that consume a reader and yield a parse result: success, failure, or error.

• For a practical parser, you need to implement robust error reporting.

• Thanks to operator symbols, implicit conversions, and pattern matching, the parser combinator library makes parser writing easy for anyone who understands context-free grammars. Even if you don’t feel the urge to write your own parsers, you may find this an interesting case study for an effective domain-specific language.
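To make the key points concrete before the chapter develops them one at a time, here is a minimal sketch that evaluates the arithmetic expressions described in the next section. It assumes the scala-parser-combinators module is on the classpath, because it has shipped as a separate dependency since Scala 2.11. The class name ExprParser and the rule names number, expr, term, factor, numberList, and signedNumber are illustrative and do not come from the book. The grammar is split into term and factor levels so that * binds tighter than + and -, and so that no rule is left-recursive. Later sections of this chapter cover that technique.

```scala
import scala.util.parsing.combinator.RegexParsers

// Sketch only: ExprParser and its rule names are hypothetical illustrations.
// Requires the scala-parser-combinators module on the classpath.
class ExprParser extends RegexParsers {
  // A regular expression matches a whole-number token; ^^ converts the text to Int.
  def number: Parser[Int] = "[0-9]+".r ^^ { _.toInt }

  // expr ::= term { ("+" | "-") term }
  // | chooses an alternative, ~ concatenates, rep repeats zero or more times.
  def expr: Parser[Int] = term ~ rep(("+" | "-") ~ term) ^^ {
    // Pattern matching on ~ takes the result apart; folding from the left
    // makes + and - left-associative, so 2 - 3 - 4 evaluates to -5.
    case first ~ rest =>
      rest.foldLeft(first) { case (acc, op ~ value) =>
        if (op == "+") acc + value else acc - value
      }
  }

  // term ::= factor { "*" factor }
  // ~> discards the "*" token and keeps only the factor's value.
  def term: Parser[Int] = factor ~ rep("*" ~> factor) ^^ {
    case first ~ rest => first * rest.product
  }

  // factor ::= number | "(" expr ")"
  // ~> and <~ discard the parentheses around the nested expression.
  def factor: Parser[Int] = number | "(" ~> expr <~ ")"

  // repsep handles repeated items with a separator, as in "[1, 2, 3]".
  def numberList: Parser[List[Int]] = "[" ~> repsep(number, ",") <~ "]"

  // opt makes a part optional; here, a leading minus sign.
  def signedNumber: Parser[Int] = opt("-") ~ number ^^ {
    case sign ~ n => if (sign.isDefined) -n else n
  }
}
```

With `val p = new ExprParser`, the call `p.parseAll(p.expr, "(3+4)*5")` should succeed with the value 35, while `p.parseAll(p.expr, "3+)")` returns a failure result. Because RegexParsers skips whitespace before each token, inputs such as `"2 - 3 - 4"` are accepted as well.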

20.1 Grammars

To understand the Scala parsing library, you need to know a few concepts from the theory of formal languages. A grammar is a set of rules for producing all strings that follow a particular format. For example, we can say that an arithmetic expression is given by the following rules:

• Each whole number is an arithmetic expression.

• The symbols +, -, and * are operators.

• If left and right are arithmetic expressions and op is an operator, then left op right is an arithmetic expression.

• If expr is an arithmetic expression, then ( expr ) is an arithmetic expression.

According to these rules, 3+4 and (3+4)*5 are arithmetic expressions, but 3+), 3^4, and 3+x are not.
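The rules can be checked mechanically. The following sketch is a hand-written recognizer in plain Scala, using no parsing library, that decides whether a string belongs to the expression language. Coding the rule "left op right" literally would make the recognizer call itself on the left operand forever, so the sketch assumes the equivalent form in which an expression is an operand followed by zero or more "op operand" pairs, where an operand is a whole number or a parenthesized expression. It also assumes, for simplicity, that whitespace is not allowed. The names ExprRecognizer and isExpression are hypothetical.

```scala
// Hypothetical recognizer for the arithmetic-expression rules.
// Assumed grammar (same language as the rules, without left recursion):
//   expr    ::= operand { op operand }
//   operand ::= number | "(" expr ")"
// Whitespace is not accepted (a simplifying assumption).
object ExprRecognizer {
  private val operators = Set('+', '-', '*')

  def isExpression(s: String): Boolean = {
    // Returns the index just past an expression starting at i, or -1 on failure.
    def expr(i: Int): Int = {
      var pos = operand(i)
      // Keep consuming "op operand" pairs while an operator follows.
      while (pos >= 0 && pos < s.length && operators.contains(s(pos)))
        pos = operand(pos + 1)
      pos
    }

    // Returns the index just past an operand starting at i, or -1 on failure.
    def operand(i: Int): Int =
      if (i < s.length && s(i).isDigit) {
        var j = i
        while (j < s.length && s(j).isDigit) j += 1 // a whole number
        j
      } else if (i < s.length && s(i) == '(') {
        val j = expr(i + 1) // a nested expression must be followed by ")"
        if (j >= 0 && j < s.length && s(j) == ')') j + 1 else -1
      } else -1

    // The whole string must be consumed by a single expression.
    expr(0) == s.length
  }
}
```

For the examples in the text, `isExpression` accepts "3+4" and "(3+4)*5" and rejects "3+)", "3^4", and "3+x".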

A grammar is usually written in a notation called Backus-Naur Form (BNF). Here is the BNF for our expression language:
