In this recipe, we present another DSL example: a simple configuration language for analyzing logfiles and generating reports based on their content. The technique is similar to the one used in the DSL for executing commands over SSH recipe.
Let's consider having the following performance log data:
    execution of getCustomerName took 244ms
    execution of getCustomerName took 144ms
    execution of getAccountNumber took 44ms
    execution of getCustomerName took 244ms
    execution of getCustomerName took 24ms
    execution of getAccountNumber took 112ms
    execution of getCustomerName took 200ms
    execution of getCustomerName took 22ms
    ...
The goal is to calculate the average and total time spent in each method. Of course, a very simple script could produce the same result, but our purpose is to create a DSL that can parse an arbitrary logfile format and extract both grouped and aggregated numeric information from it. A reasonable DSL may look like the following code snippet:
    format '^execution of (\\w+) took (\\d+)ms$'

    column 1, 'methodName'
    column 2, 'duration'

    source('PerformanceData2012') {
        localFile 'log1.log'
        localFile 'log2.log'
    }

    report('Duration') {
        avg 'duration'
        sum 'duration'
        groupBy 'methodName'
    }
We will try to define the language exactly like this example.
The first expression defines a log line format; then we define a regular expression group mapping to column names, which are used later to refer to log data inside the report definition. The report definition contains a list of calculated values (average of duration and sum of duration) and a column to group report data by. Another important component of the DSL is the definition of the data source.
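As a rough illustration of the column mapping described above, the following stand-alone Groovy sketch applies the format expression to a single log line and names the captured groups. The names lineFormat, columnNames, and parseLine are assumptions made for this sketch only; they are not part of the recipe's classes.

```groovy
// Illustrative sketch only: lineFormat, columnNames and parseLine are hypothetical names.
def lineFormat = '^execution of (\\w+) took (\\d+)ms$'
def columnNames = [1: 'methodName', 2: 'duration']

def parseLine = { String line ->
    def matcher = line =~ lineFormat
    if (!matcher.matches()) {
        return null // the line does not follow the declared format
    }
    // Group 0 is the whole match; groups 1..n hold the captured columns.
    columnNames.collectEntries { index, name -> [(name): matcher.group(index)] }
}

def row = parseLine('execution of getCustomerName took 244ms')
println row // prints [methodName:getCustomerName, duration:244]
```

Note that captured values are strings, so a numeric column such as duration still has to be converted before it can be summed or averaged.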
To define our internal DSL, we first need to define its building blocks, that is, the data structures that compose our mini language:
    class Report {

        def name

        def sumColumns = [] as Set
        def avgColumns = [] as Set
        def groupByColumns = [] as Set

        Report(String name) {
            this.name = name
        }

        void sum(String columnName) {
            sumColumns << columnName
        }

        void avg(String columnName) {
            avgColumns << columnName
        }

        void groupBy(String columnName) {
            groupByColumns << columnName
        }

    }
    class Source {

        def name
        def files = [] as Set

        Source(String name) {
            this.name = name
        }

        void localFile(File file) {
            if (file) {
                files << file.absoluteFile.canonicalFile
            }
        }

        void localFile(String file) {
            localFile(new File(file))
        }

    }
    class Configuration {

        def format

        private final columnNames = [:]
        private final columnIndexes = [:]
        private final sources = [:]
        private final reports = [:]

        private static int sourceCounter = 0
        private static int reportCounter = 0

        void format(String format) {
            this.format = format
        }

        void column(int group, String name) {
            columnNames[group] = name
            columnIndexes[name] = group
        }

        void source(Closure cl) {
            def generatedName = "source${sourceCounter++}"
            source(generatedName, cl)
        }

        void source(String name, Closure cl) {
            Source source = new Source(name)
            cl.delegate = source
            cl.resolveStrategy = Closure.DELEGATE_FIRST
            cl()
            sources[name] = source
        }

        void report(Closure cl) {
            def generatedName = "report${reportCounter++}"
            report(generatedName, cl)
        }

        void report(String name, Closure cl) {
            Report report = new Report(name)
            cl.delegate = report
            cl.resolveStrategy = Closure.DELEGATE_FIRST
            cl()
            reports[name] = report
        }

    }
Next, we define the engine class that will glue together the configuration creation and actual report generation:

    class LogReportDslEngine {

        void process(Closure cl) {

            Configuration config = new Configuration()
            cl.delegate = config
            cl.resolveStrategy = Closure.DELEGATE_FIRST
            cl()

            config.sources.values().each { Source source ->
                config.reports.values().each { Report report ->

                    // Collect report data.
                    def reportData = [:]
                    source.files.each { File sourceFile ->
                        sourceFile.eachLine { String line ->

                            // Match the data line.
                            if (line =~ config.format) {
                                def fields = (line =~ config.format)[0]

                                // Generate group key, for which
                                // to aggregate the data.
                                def group = report.groupByColumns
                                    .collect { fields[config.columnIndexes[it]] }
                                    .join(', ')

                                // Create empty group record
                                // if it does not exist.
                                reportData[group] = reportData[group] ?: emptyRecord

                                // Calculate report values for given key.
                                def g = reportData[group]
                                report.avgColumns.each { String column ->
                                    def fieldIndex = config.columnIndexes[column]
                                    g['avg'][column] = g['avg'][column] ?: 0
                                    g['avg'][column] += fields[fieldIndex].toDouble()
                                }
                                report.sumColumns.each { String column ->
                                    def fieldIndex = config.columnIndexes[column]
                                    g['sum'][column] = g['sum'][column] ?: 0
                                    g['sum'][column] += fields[fieldIndex].toDouble()
                                }
                                g['count'] += 1
                            }
                        }
                    }

                    // Produce report output.
                    def reportName = "${source.name}_${report.name}"
                    def reportFile = new File("${reportName}.report")
                    reportFile.text = ''
                    reportData.each { key, data ->
                        reportFile << "Report for $key\n"
                        reportFile << "  Total records: ${data['count']}\n"
                        data['avg'].each { column, value ->
                            reportFile << "  Average of ${column} is " +
                                          "${value / data['count']}\n"
                        }
                        data['sum'].each { column, value ->
                            reportFile << "  Sum of ${column} is ${value}\n"
                        }
                    }
                }
            }
        }

        def getEmptyRecord() {
            [count: 0, avg: [:], sum: [:]]
        }

    }
The engine can then be used as follows:

    def engine = new LogReportDslEngine()
    engine.process {

        format '^execution of (\\w+) took (\\d+)ms$'

        column 1, 'methodName'
        column 2, 'duration'

        source('PerformanceData2012') {
            localFile 'log1.log'
            localFile 'log2.log'
        }

        source('PerformanceData2013') {
            localFile 'log3.log'
            localFile 'log4.log'
        }

        report('Duration') {
            avg 'duration'
            sum 'duration'
            groupBy 'methodName'
        }

    }
Running the script produces two files, PerformanceData2012_Duration.report and PerformanceData2013_Duration.report. The report will look approximately like the following example:

    Report for getCustomerName
      Total records: 12
      Average of duration is 179.0
      Sum of duration is 2148.0
    Report for getAccountNumber
      Total records: 4
      Average of duration is 64.0
      Sum of duration is 256.0
The Source and Report classes defined previously are simple structures holding the information needed to build the reports; therefore, we will not spend any time on them.
The Configuration class is a bit more involved because it makes use of closure delegates (similar to the DSL for executing commands over SSH recipe).
The Configuration object is also constructed through a closure delegate inside the process method of the LogReportDslEngine class. After the configuration closure is executed, we get back a fully constructed data structure, which we are ready to use for further processing.
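The closure delegate mechanism can be seen in isolation in the following minimal sketch. The Greeting class and the build method are hypothetical and exist only for this illustration; the pattern of setting delegate and resolveStrategy before calling the closure is the same one used by Configuration and LogReportDslEngine.

```groovy
// Hypothetical target object: it plays the role that Source or Report play in the recipe.
class Greeting {
    def words = []
    void say(String word) { words << word }
}

// Hypothetical entry point: runs the closure with a fresh Greeting as its delegate.
def build(Closure cl) {
    def target = new Greeting()
    cl.delegate = target
    // Resolve unqualified method calls against the delegate before the enclosing scope.
    cl.resolveStrategy = Closure.DELEGATE_FIRST
    cl()
    target
}

def greeting = build {
    say 'hello' // resolved as target.say('hello')
    say 'world'
}
println greeting.words // prints [hello, world]
```

With DELEGATE_FIRST, a method that exists on the delegate wins over a method of the same name in the surrounding script, which is usually what a DSL wants.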
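The code executed after we have a configuration object iterates over every combination of source and report, reads each file of the source line by line, matches every line against the configured format, builds a group key from the groupBy columns, accumulates the sums and the record count for that key, and finally writes the aggregated values to a file named after the source and the report.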
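The grouping and aggregation step can be sketched on its own, without files or configuration objects. The following simplified example assumes a fixed format and a single group column; the names lines, pattern, and totals are illustrative and do not appear in the recipe.

```groovy
// Simplified, in-memory sketch of the aggregation step (names are hypothetical).
def lines = [
    'execution of getCustomerName took 244ms',
    'execution of getCustomerName took 144ms',
    'execution of getAccountNumber took 44ms'
]
def pattern = '^execution of (\\w+) took (\\d+)ms$'

def totals = [:]
lines.each { String line ->
    def matcher = line =~ pattern
    if (matcher.matches()) {
        def key = matcher.group(1)
        // Map.get(key, default) stores the default when the key is missing.
        def record = totals.get(key, [count: 0, sum: 0.0d])
        record.count += 1
        record.sum += matcher.group(2).toDouble()
    }
}

totals.each { key, record ->
    println "$key: count=${record.count}, sum=${record.sum}, avg=${record.sum / record.count}"
}
```

As in the engine, the average is not stored; it is derived from the sum and the record count when the output is produced.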
Obviously, this DSL implementation is rather primitive and can be extended with many more features such as:
Configuration validation (for example, groupBy columns cannot appear in an aggregated function)
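One possible way to check the groupBy constraint is sketched below. The findConflictingColumns helper is hypothetical and not part of the recipe; it assumes it would be given the groupByColumns, sumColumns, and avgColumns collections of a Report.

```groovy
// Hypothetical helper: returns the group-by columns that are also aggregated.
List<String> findConflictingColumns(Collection groupBy, Collection sum, Collection avg) {
    def aggregated = (sum + avg) as Set
    groupBy.findAll { it in aggregated } as List
}

// A valid report definition yields no conflicts.
println findConflictingColumns(['methodName'], ['duration'], ['duration']) // prints []

// Grouping by a column that is also summed is reported as a conflict.
println findConflictingColumns(['duration'], ['duration'], []) // prints [duration]
```

A check like this could run once the report closure has been executed, so that an invalid definition is rejected before any logfile is read.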