A Recursive Algorithm

Now let's talk about a more radical alternative approach to designing this utility. We have outlined in this program the basic strategy for walking through the tree that represents a DOM Document. For each selected Node we get a list of the Node's children. Then for each of the children we get a list of children and process them however we want. Keeping with my “clarity over cleverness” orientation, I've designed specialized routines that deal specifically with the expected structure of the input XML document. However, it isn't too hard to see how this approach can be generalized into a recursive algorithm that walks the complete tree and takes actions that are appropriate to each type or name of Node encountered. Listed below is the pseudocode for a recursive approach that processes Document, Row, Column, and Text Nodes.

Logic for the Main Routine: Recursive Implementation
Parse input arguments from command line
IF help option is specified
  display help message and exit
ENDIF
Set up DOM XML environment (dependent on implementation)
Load input XML Document (dependent on implementation)
Open output file
Initialize CSVRowWriter object
Call processNode, passing Document object and null pointers for
  Column Array and Column Text
Close input and output files

The main routine is quite similar to the original main routine. We see the differences primarily in the processNode method.

Logic for the CSVRowWriter.processNode Routine: Recursive Implementation
NodeList <- Get passed Node's childNodes attributes
DO CASE of Node's nodeName
  Document:
    // Process the Document's Row children
    DO for each of the Node's children
      Call processNode, passing child Node and null pointers
        for Column Array and Column Text
    ENDDO
  Row:
    // Process the Row's ColumnXX children
    Initialize Column Array
    DO for each of the Node's children
    Call processNode, passing child Node and pointer
      to Column Array, null pointer to Column Text
    ENDDO
    Output Buffer <- Call formatRow, passing Column Array
    Write Output Buffer
  ColumnXX:
    // Process the Column's #Text children
    Column Number <- Derive from Column Name
    DO for each of the Node's children
      Call processNode, passing child Node and null pointer
        to Column Array, pointer to Column Text
    ENDDO
    Column Array[Column Number] <- Column Text
  #Text:
    // Get the text
    Column Text <- get nodeValue
  Default:
    No operation
ENDDO

If you understand how recursive programming works, this routine should be easy enough to figure out. However, we need to ask this question: Just because we can solve this problem recursively, should we? If all we wanted to do was to walk the document tree and print out Node names and values, a recursive approach might be very appropriate. However, in this case we have a different kind of problem to solve. We do different types of processing for each type of Node we encounter. The different types of processing require that we pass arguments to the routine that are used only in specific cases and not in others. Overall, we need to ask whether or not this approach contributes to our goals of simplicity, understandability, and maintainability. You may run into some kinds of XML processing problems where a recursive algorithm contributes to those goals. In this case I don't think it does.

The one advantage to this type of recursive approach is that we have a Default case to handle unexpected Nodes. However, if we validate the input XML document, this advantage is moot. Even though I don't think that a recursive algorithm has any advantages for this particular utility, I did want to point it out to you in case you might find it applicable to a problem you need to solve. In Chapters 8 and 9 I present two cases where a recursive approach does make sense.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset