EDIRecordReader Class (Extends RecordReader)

Overview

The EDIRecordReader is the generalized base class for several derived classes that deal with specific EDI syntaxes. In this chapter we develop an X12RecordReader derived class. However, as you'll see, the attributes and methods we develop here are applicable to other syntaxes, so we move them to a common base class.

Attributes:

  • Character Element Separator

  • Character Component Separator

  • Character Repetition Separator

  • Character Release Character

Methods:

  • Constructor

  • getRecordType

  • parseRecord

  • setFileDescriptionDocument

  • writeRecord

Methods

Constructor

The EDIRecordReader constructor method does very little processing. It calls the base RecordReader constructor method, passing along the null value for the file description document, and initializes the class member attributes.

getRecordType

This method extracts the segment identifier from the input record buffer.

Logic for the EDIRecordReader getRecordType Method
Arguments:
  None

Returns:
  Record ID tag value; throws exception or returns status

Record Tag Value <- Get first characters of input record up to
    the Element Separator
Return Record Tag Value or status
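
As a minimal Java sketch of this logic, assuming the record buffer is available as a String and the element separator as a char (the class name and the choice of exception are illustrative, not part of the design above):

```java
// Hypothetical helper; in the real class the buffer and separator would be
// member attributes rather than parameters.
final class SegmentIdSketch {
    /**
     * Returns the characters that precede the first element separator,
     * which is the segment identifier.
     */
    static String getRecordType(String recordBuffer, char elementSeparator) {
        int end = recordBuffer.indexOf(elementSeparator);
        if (end <= 0) {
            // No separator at all, or the record starts with one:
            // either way there is no usable record tag.
            throw new IllegalArgumentException(
                "No segment identifier in record: " + recordBuffer);
        }
        return recordBuffer.substring(0, end);
    }
}
```

For example, SegmentIdSketch.getRecordType("N1*ST*ACME", '*') yields "N1".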

parseRecord

This is perhaps the most interesting method developed in this chapter. To develop it we draw on the BNF grammar analysis presented at the start of the High-Level Design Considerations section. We'll start with some general observations that lead us to the characteristics and outlines of a processing algorithm. The state transition diagram will follow that. We'll finish off with the full algorithm.

As I said in the grammar analysis, the grammar of EDI segments is fairly complex, if only because of the number of productions. It would take a fairly complex state machine and associated algorithm to implement that grammar directly. Fortunately, we don't have to take that approach. We can simplify things by observing that we mainly want to extract data from simple data elements. From that perspective alone, we don't much care whether an element is a stand-alone simple data element, a component data element within a composite data structure, an instance of a repeating simple element, or a component data element within an instance of a repeating composite. In the big picture we do, of course, care about these distinctions, but not while parsing the input segment. We care about them when we actually save the data. The good news is that we have a way to make these distinctions at that point: we have the segment grammar, and if we walk the grammar as we parse the segment, we can make the necessary distinctions.

So, here is the general outline of a processing algorithm. We maintain pointers to the current DataCell and the current grammar Element. We also maintain a pointer to the grammar for the current composite data structure.

  • We start processing by parsing past the segment identifier to the first element separator.

  • If we encounter a delimiter such as an element separator, a component data element separator, or a repetition separator, we point the current grammar Element to the appropriate element within the segment or composite grammar. We also clear the pointer to the current DataCell.

  • If we encounter a normal character in the input stream, we save it. If the current DataCell pointer is null, we first create a new DataCell, passing the current grammar Element.

  • If we encounter the release character, we treat the next character as a normal character and save it.

  • If we encounter a delimiter that should not immediately follow a different delimiter, we move to an error state. Examples of this type of error would be the cases of a repetition separator or a data element separator immediately following a component data element separator. These cases indicate an invalid segment because trailing component data element separators within a composite data structure are not allowed.

  • The last character in the segment immediately preceding the segment terminator should be a normal character and not a delimiter. We can check for this condition by testing whether or not the current DataCell pointer is null. If it is null, the last character processed was a delimiter and we exit the routine with an error. If it isn't null, the last character processed was a normal character and we exit the parsing routine normally.

With this outline we can construct the state transition diagram shown in Figure 9.1. Four states correspond to each of our four delimiters, one state corresponds to a data element, and one is an error state. To make the state machine more closely reflect the sense of the algorithm, I named the states for the actions taken rather than for the symbol from the segment grammar.

  • Next Element, entered after parsing the data element separator: If the element position within the segment is greater than the Field Number of the current Grammar Element, advance the current Grammar Element until the Field Number is equal to or greater than the Element Position. If greater, set the current Grammar Element to null. Otherwise, if the current Grammar Element describes the grammar of a composite data structure, point the current composite grammar pointer to it and set the current Grammar Element to point to its first child Element. Otherwise, clear the current composite grammar pointer. Clear the DataCell pointer.

  • Next Component, entered after parsing the component data element separator: Point the current Grammar Element to the next child Element of the current composite Grammar Element. Clear the DataCell pointer.

  • Next Repeat, entered after parsing the repetition separator: If the composite grammar pointer is null, we don't change the current Grammar Element. If it is not null, we reset the current Grammar Element to point to the first child of the current composite Grammar Element. Clear the DataCell pointer.

  • Escape, entered after parsing the Release character: Set the current state to Save Character so that the next character in the segment is saved.

  • Save Character, entered after parsing a normal character or an escaped delimiter: If the DataCell pointer is null, create a new DataCell. Save the current character to the DataCell.

  • Error, entered after encountering a parsing error in one of the other states: Terminate processing and exit the routine.

Figure 9.1. State Transition Diagram for Parsing an EDI Segment


In Figure 9.1 the double circle on the Save Character state marks it as the accepting state, and the double circle on the Error state marks it as a termination state.
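
The six states and their transitions can be sketched in Java as an enum and a next-state function. This is only an illustration: the delimiter values ('*', ':', '^', and '?') are assumptions chosen for the example, and the input passed to accepts is assumed to start at the first element separator, after the segment identifier.

```java
final class SegmentStateSketch {
    enum State { NEXT_ELEMENT, NEXT_COMPONENT, NEXT_REPEAT, ESCAPE, SAVE_CHARACTER, ERROR }

    // Illustrative delimiter values only; real values come from the interchange.
    static final char ELEMENT = '*', COMPONENT = ':', REPEAT = '^', RELEASE = '?';

    /** Returns the state entered after reading c while in the given state. */
    static State next(State current, char c) {
        if (current == State.ERROR) return State.ERROR;
        // Whatever follows the release character is saved as data.
        if (current == State.ESCAPE) return State.SAVE_CHARACTER;
        if (c == ELEMENT) {
            // An element separator may not directly follow a component
            // or repetition separator.
            boolean misplaced = current == State.NEXT_COMPONENT
                    || current == State.NEXT_REPEAT;
            return misplaced ? State.ERROR : State.NEXT_ELEMENT;
        }
        if (c == COMPONENT) return State.NEXT_COMPONENT;
        if (c == REPEAT) {
            // A repetition separator may not directly follow a component separator.
            return current == State.NEXT_COMPONENT ? State.ERROR : State.NEXT_REPEAT;
        }
        if (c == RELEASE) return State.ESCAPE;
        return State.SAVE_CHARACTER;
    }

    /** True if the input ends in the accepting Save Character state. */
    static boolean accepts(String fromFirstSeparator) {
        State state = State.NEXT_ELEMENT;
        for (int i = 0; i < fromFirstSeparator.length() && state != State.ERROR; i++) {
            state = next(state, fromFirstSeparator.charAt(i));
        }
        return state == State.SAVE_CHARACTER;
    }
}
```

With these assumed delimiters, "*ST*ACME" is accepted, while "*A:*B" (an element separator right after a component separator) and "*A*" (a trailing delimiter) are rejected.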

From the perspective of more formal analysis, this outline makes a lot of sense. If we were to successively replace each of the nonterminal symbols in the productions of the segment grammar with terminal symbols (with a bit of leeway for the char and special char productions), we could eventually reduce the grammar to a single production. It would not be very readable or useful, but it would be a single production. Most importantly, it would be a single production that could be converted to a regular expression. This is significant because it proves that we can process a segment grammar with a finite state automaton and the type of state transition diagram shown in Figure 9.1. I leave to you the exercise of actually performing the reduction and creating the regular expression. The diagram is convincing enough for me.

Finally, we combine the outline of the algorithm and the information from the state transition diagram, yielding the following processing algorithm.

When we enter the method there's a segment loaded in the Record Buffer and we have identified the segment type. The method has also been passed the Grammar Element for the segment. When reviewing the algorithm, bear in mind that if a data element isn't used in a segment or composite data structure, we don't require that it be defined in the grammar. So, we advance through the grammar as we advance through the appropriate delimiters, but we match data element position within segment against the Grammar Element's FieldNumber Attribute and component data element position within composite data structure against the Grammar Element's SubFieldNumber Attribute.
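
To make the matching of position against FieldNumber concrete, here is a hedged Java sketch using the standard org.w3c.dom interfaces. The class and method names are hypothetical, and unlike the algorithm that follows, this sketch returns null rather than an error when the grammar runs out.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class GrammarWalkSketch {
    /** Skips text and comment nodes; returns the first Element at or after node, or null. */
    static Element firstElementFrom(Node node) {
        while (node != null && node.getNodeType() != Node.ELEMENT_NODE) {
            node = node.getNextSibling();
        }
        return (Element) node;
    }

    /**
     * Advances the grammar cursor until its FieldNumber is at least
     * elementPosition. Returns null when the grammar runs out.
     */
    static Element advanceTo(Element cursor, int elementPosition) {
        while (cursor != null && fieldNumber(cursor) < elementPosition) {
            cursor = firstElementFrom(cursor.getNextSibling());
        }
        return cursor;
    }

    /** True only when the cursor describes exactly this element position. */
    static boolean describes(Element cursor, int elementPosition) {
        return cursor != null && fieldNumber(cursor) == elementPosition;
    }

    private static int fieldNumber(Element grammar) {
        return Integer.parseInt(grammar.getAttribute("FieldNumber"));
    }
}
```

If the grammar defines only field numbers 1 and 3, advancing to position 2 leaves the cursor on the grammar for field 3, and describes reports no match. That is the case in which the current Grammar Element is set to null.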

The Save Character state is the only accepting state or successful completion state. If we exit the main loop in any other state, the segment is invalid because it ends with a delimiter. In such cases we move to the Error state.

Logic for the EDIRecordReader parseRecord Method
Arguments:
  DOM Element Segment Grammar

Returns:
  Error status or throw exception

Segment Child <- get Segment Grammar's firstChild
DO until Segment Child nodeType is Element
 Segment Child <- Segment Child nextSibling
ENDDO
Current Grammar <- Segment Child
Grammar Position <- Current Grammar getAttribute for
    Field Number
Composite Grammar <- null
Element Position <- 0
Component Position <- 0
Parsing State <- Next Element
Advance Pos to point to the Record Buffer's first
    Element Separator
DO while Pos < Buffer Length and
    Parsing State != Parsing Error
  CurChar = Record Buffer[Pos]
  //  Advance to the next state
  DO CASE of Parsing State
    Next Element:
      IF CurChar = Element Separator
        Parsing State = Next Element
      ELSE
        IF CurChar = Component Separator
          Parsing State = Next Component
        ELSE
          IF CurChar = Repetition Separator
            Parsing State = Next Repeat
          ELSE
            IF CurChar = Release Character
              Parsing State = Escape
            ELSE
              Parsing State = Save Character
            ENDIF
          ENDIF
        ENDIF
      ENDIF
      BREAK
    Next Component:
      IF CurChar = Element Separator
        Parsing State = Parsing Error
      ELSE
        IF CurChar = Component Separator
          Parsing State = Next Component
        ELSE
          IF CurChar = Repetition Separator
            Parsing State = Parsing Error
          ELSE
            IF CurChar = Release Character
              Parsing State = Escape
            ELSE
              Parsing State = Save Character
            ENDIF
          ENDIF
        ENDIF
      ENDIF
      BREAK
    Next Repeat:
      IF CurChar = Element Separator
        Parsing State = Parsing Error
      ELSE
        IF CurChar = Component Separator
          Parsing State = Next Component
        ELSE
          IF CurChar = Repetition Separator
            Parsing State = Next Repeat
          ELSE
            IF CurChar = Release Character
              Parsing State = Escape
            ELSE
              Parsing State = Save Character
            ENDIF
          ENDIF
        ENDIF
      ENDIF
      BREAK
    Escape:
      Parsing State = Save Character
      BREAK
    Save Character:
      IF CurChar = Element Separator
        Parsing State = Next Element
      ELSE
        IF CurChar = Component Separator
          Parsing State = Next Component
        ELSE
          IF CurChar = Repetition Separator
            Parsing State = Next Repeat
          ELSE
            IF CurChar = Release Character
              Parsing State = Escape
            ELSE
              Parsing State = Save Character
            ENDIF
          ENDIF
        ENDIF
      ENDIF
      BREAK
  ENDDO
  //  Take action appropriate to state
  DO CASE of Parsing State
    Next Element:
      Current Cell = null
      Increment Element Position
      // Get the Grammar Element for the next data
      //   element in the segment
      DO while Element Position > Grammar Position
        Segment Child <- Segment Child nextSibling
        IF Segment Child = null
          Return error for no Grammar Element
        ENDIF
        IF Segment Child nodeType != Element
          CONTINUE
        ENDIF
        Grammar Position <- Segment Child getAttribute for
            "FieldNumber"
      ENDDO
      //  If we find a match then do the setup, otherwise set
      //    the current Grammar Element to null
      IF Element Position = Grammar Position
        Node Name <- Segment Child nodeName
        IF Node Name = "CompositeStructureDescription"
          Composite Grammar <- Segment Child
          Composite Child <- Composite Grammar firstChild
          DO until Composite Child nodeType is Element
            Composite Child <- Composite Child nextSibling
          ENDDO
          Current Grammar <- Composite Child
          Component Position <- 1
        ELSE
          Composite Grammar <- null
          Composite Child <- null
          Component Position <- 0
          Current Grammar <- Segment Child
        ENDIF
      ELSE
        Current Grammar <- null
      ENDIF
      BREAK
    Next Component:
      Current Cell = null
      Increment Component Position
      Grammar SubPosition <- Composite Child getAttribute for
          "SubFieldNumber"
      // Get the next Grammar Element for the next data
      //   element in the Composite
      DO while Component Position > Grammar SubPosition
        Composite Child <- Composite Child nextSibling
        IF Composite Child = null
          Return error for no Grammar Element
        ENDIF
        IF Composite Child nodeType != Element
          CONTINUE
        ENDIF
        Grammar SubPosition <- Composite Child getAttribute for
            "SubFieldNumber"
      ENDDO
      //  If we find a match then do the setup, otherwise set
      //    the current Grammar Element to null
      IF Component Position = Grammar SubPosition
        Current Grammar <- Composite Child
      ELSE
        Current Grammar <- null
      ENDIF
      BREAK
    Next Repeat:
      // If we're processing a composite, reset the grammar and
      //   index back to the start of the composite. Otherwise
      //   do nothing and stay with the current grammar.
      Current Cell = null
      IF Composite Grammar != null
        Composite Child <- Composite Grammar firstChild
        DO until Composite Child nodeType is Element
          Composite Child <- Composite Child nextSibling
        ENDDO
        Current Grammar <- Composite Child
        Component Position <- 1
      ENDIF
      BREAK
    Escape:
      //  Take no action
      BREAK
    Save Character:
      IF Current Cell = null
        IF Current Grammar = null
          Return error for no Grammar Element
        ENDIF
        Current Cell = call createDataCell, passing Element
            Position and Current Grammar
        IF Component Position != 0
          Call Current Cell's setSubFieldNumber, passing
            Component Position
        ENDIF
      ENDIF
      Call Current Cell putByte, passing CurChar
      BREAK
    Parsing Error:
      //  Take no action
      BREAK
  ENDDO
  Increment Pos
ENDDO
//  Check if we are finishing with a delimiter
IF Parsing State != Save Character
  Parsing State = Parsing Error
ENDIF
IF Parsing State = Parsing Error
  Display parsing error message with Element Position,
    Component Position, Pos, and segment contents
  Return error
ENDIF
Return success
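
The following Java sketch is a pared-down version of this algorithm, not the full method. It makes several assumptions that are not part of the design: fixed delimiter values, a Set of composite field numbers standing in for the segment grammar, a hypothetical Cell class in place of DataCell, and exceptions in place of status returns. It also rejects a component separator in a simple element, which is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SegmentParseSketch {
    // Illustrative delimiter values only.
    static final char ELEMENT = '*', COMPONENT = ':', REPEAT = '^', RELEASE = '?';

    /** Hypothetical stand-in for DataCell: positions plus accumulated text. */
    static final class Cell {
        final int fieldNumber;
        final int subFieldNumber;          // 0 for a simple element
        final StringBuilder text = new StringBuilder();
        Cell(int fieldNumber, int subFieldNumber) {
            this.fieldNumber = fieldNumber;
            this.subFieldNumber = subFieldNumber;
        }
        @Override public String toString() {
            return fieldNumber + "." + subFieldNumber + "=" + text;
        }
    }

    static List<Cell> parse(String segment, Set<Integer> compositeFields) {
        List<Cell> cells = new ArrayList<>();
        Cell current = null;
        int elementPos = 0, componentPos = 0;
        char previousDelimiter = 0;        // 0 means the previous character was data
        boolean escaped = false;
        int start = segment.indexOf(ELEMENT);
        if (start < 0) throw new IllegalArgumentException("No element separator: " + segment);
        for (int pos = start; pos < segment.length(); pos++) {
            char c = segment.charAt(pos);
            if (!escaped && c == RELEASE) {          // Escape state
                escaped = true;
                continue;
            }
            if (!escaped && (c == ELEMENT || c == COMPONENT || c == REPEAT)) {
                boolean afterComponent = previousDelimiter == COMPONENT;
                boolean afterRepeat = previousDelimiter == REPEAT;
                if ((c == ELEMENT && (afterComponent || afterRepeat))
                        || (c == REPEAT && afterComponent)
                        || (c == COMPONENT && componentPos == 0)) {
                    throw new IllegalArgumentException("Misplaced delimiter at offset " + pos);
                }
                if (c == ELEMENT) {                  // Next Element
                    elementPos++;
                    componentPos = compositeFields.contains(elementPos) ? 1 : 0;
                } else if (c == COMPONENT) {         // Next Component
                    componentPos++;
                } else if (componentPos != 0) {      // Next Repeat within a composite
                    componentPos = 1;
                }
                current = null;
                previousDelimiter = c;
                continue;
            }
            if (current == null) {                   // Save Character
                current = new Cell(elementPos, componentPos);
                cells.add(current);
            }
            current.text.append(c);
            previousDelimiter = 0;
            escaped = false;
        }
        if (current == null || escaped) {
            throw new IllegalArgumentException("Segment ends with a delimiter: " + segment);
        }
        return cells;
    }
}
```

Parsing "CLM*A1*B:C*D" with field 2 marked as composite produces four cells: field 1 with "A1", field 2 with components 1 ("B") and 2 ("C"), and field 3 with "D".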

One restriction needs to be noted regarding the algorithm. While it parses input correctly, the actions taken in the Next Repeat state may not meet all requirements. The X12 Design Rules imply that the position of a particular instance of a repeating data element may have semantic significance. I've not investigated the UN/EDIFACT rules, but I wouldn't be surprised if they said something similar. But regardless of what the standards say, there may certainly be implementations where position within a repeating data element is significant. For example, there might be a usage in which the first occurrence of an element would be loaded into the first column of a spreadsheet row while the fifth occurrence (or fourth repetition) would be loaded into the fifth column. In this parsing algorithm positional information is lost if there are empty data elements within a repeat since we don't write empty XML Elements from them. If the second repetition were missing in our example, repeats three and later would be shifted one column to the left. I'm not sure how significant this deficiency is, but investigating and perhaps providing better repeat support is among the version 1.0 requirements for the Babel Blaster project. Again, repeating data elements were not supported at all in X12 004010, which is the initial focus of these utilities.

setFileDescriptionDocument

The constructors in the CSV and flat file record readers set the file description document pointer for us. However, when processing EDI we need to read from the interchange before the SourceConverter loads the file description document. In addition, we may use several such documents while processing an interchange. This method gives us a way to set and change the file description document.

Logic for the EDIRecordReader setFileDescriptionDocument Method
Arguments:
  DOM Document File Description Document

Returns:
  Nothing

Base RecordHandler File Description Document <- From passed
   DOM Document File Description Document
Return
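
In Java this amounts to a one-line setter. The sketch below uses hypothetical class names as stand-ins for RecordHandler and EDIRecordReader, and assumes the base class holds the document in a protected member:

```java
import org.w3c.dom.Document;

// Hypothetical stand-in for the base RecordHandler class.
class RecordHandlerSketch {
    protected Document fileDescriptionDocument;   // null until a document is set
}

// Hypothetical stand-in for EDIRecordReader.
class EdiReaderSketch extends RecordHandlerSketch {
    EdiReaderSketch() {
        // The EDI reader starts without a file description document.
        this.fileDescriptionDocument = null;
    }

    /** Sets or replaces the file description document used for later parsing. */
    void setFileDescriptionDocument(Document fileDescriptionDocument) {
        this.fileDescriptionDocument = fileDescriptionDocument;
    }
}
```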

Note: I contemplated moving this method to the base RecordHandler class but didn't because I couldn't think of any class other than the EDIRecordReader that needed it.

writeRecord

This method is very similar to the one in the base RecordReader class. It creates an XML Element that represents a segment (or record) and writes the contents of the DataCell Array to child Elements of the record Element. We could use that base class method if it weren't for the fact that for EDI formats we also write Elements that correspond to composite data structures. These have child Elements that correspond to the component data elements within a composite.

So, we are faced with the classic conundrum of object-oriented analysis and design: What do we put in the generalized base class and what do we put in the specialized derived class? This is a particularly thorny problem when it comes to methods that are very similar in the base and derived classes. My gut feeling tells me not to unnecessarily risk breaking something that already works. So, rather than including logic in the base class method to handle the special case of composite data structures, I'm going to develop a specialized method for this derived class. If it appears similar enough and doesn't break any of the functionality in the base class method, for Babel Blaster version 1.0 I may promote it to the base class.

The main difference between this and the base RecordReader's writeRecord method is that we get a NodeList of the segment Grammar Element's composite data structure grammars. If a DataCell Array entry has a subField number, we find the grammar for the appropriate composite and make the new Element a child of the composite.

Logic for the EDIRecordReader writeRecord Method
Arguments:
  DOM Element Output Document Parent Element
  DOM Element Record Grammar Element

Returns:
  Status or throws exception

Element Name <- Call Grammar Element's getAttribute for
    "ElementName"
Record Element <- call Output Document's createElement, passing
    Element Name
Call Parent Element's appendChild to append
    Record Element
Current Parent <- Record Element
Composite Parent <- null
Composite Position <- 0
Composite Grammars <- call Record Grammar's getElementsByTagName
    for "CompositeStructureDescription"
DO for all DataCells in array up through Highest Cell
  // Test if this is a component within a composite
  Component Position <- call DataCell's getSubFieldNumber
  IF Component Position != 0
    //  See if we are starting a new composite
    Cell Field Number <- Call DataCell's getFieldNumber
    IF Cell Field Number != Composite Position
      // Find the grammar and create the Element
      DO for all Elements in Composite Grammars NodeList until
          Cell Field Number = Composite Position
        Composite Position <- Call Composite Grammars item's
           getAttribute for "FieldNumber"
      ENDDO
      Composite Name <- Call Composite Grammars current item
          Element's getAttribute for "ElementName"
      Composite Parent <- call Output Document's createElement,
          passing Composite Name
      Call Record Element's appendChild to append
          Composite Parent
      Current Parent <- Composite Parent
    ENDIF
  ELSE
    // simple element
    Current Parent <- Record Element
    Composite Parent <- null
    Composite Position <- 0
  ENDIF
  Call toElement on Cell Array entry, passing Current Parent
  Clear Cell Array Entry
ENDDO
Highest Cell <- -1
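
Here is a hedged Java sketch of the same flow using the standard org.w3c.dom interfaces. It simplifies in ways that are assumptions for the example: a Map from field number to composite Element name replaces the NodeList of composite grammars, a small hypothetical Cell class replaces DataCell and its toElement method, and the record Element is returned instead of a status.

```java
import java.util.List;
import java.util.Map;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class WriteRecordSketch {
    /** Hypothetical stand-in for a filled DataCell. */
    static final class Cell {
        final int fieldNumber;
        final int subFieldNumber;   // 0 for a simple element
        final String name;
        final String value;
        Cell(int fieldNumber, int subFieldNumber, String name, String value) {
            this.fieldNumber = fieldNumber;
            this.subFieldNumber = subFieldNumber;
            this.name = name;
            this.value = value;
        }
    }

    static Element writeRecord(Element parent, String recordName,
                               Map<Integer, String> compositeNames, List<Cell> cells) {
        Document doc = parent.getOwnerDocument();
        Element record = doc.createElement(recordName);
        parent.appendChild(record);
        Element compositeParent = null;
        int compositePosition = 0;
        for (Cell cell : cells) {
            Element currentParent = record;
            if (cell.subFieldNumber != 0) {
                // Component within a composite: start a new composite
                // parent when the field number changes.
                if (cell.fieldNumber != compositePosition) {
                    String compositeName = compositeNames.get(cell.fieldNumber);
                    if (compositeName == null) {
                        throw new IllegalStateException(
                            "No composite grammar for field " + cell.fieldNumber);
                    }
                    compositeParent = doc.createElement(compositeName);
                    record.appendChild(compositeParent);
                    compositePosition = cell.fieldNumber;
                }
                currentParent = compositeParent;
            } else {
                // Simple element: it hangs directly off the record.
                compositeParent = null;
                compositePosition = 0;
            }
            Element child = doc.createElement(cell.name);
            child.setTextContent(cell.value);
            currentParent.appendChild(child);
        }
        return record;
    }
}
```

Note that the record Element itself is never reassigned inside the loop, so simple elements that follow a composite are still written as children of the record.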

I should note that this algorithm does not properly support all aspects of repeating data elements. It will work fine if the repeated data elements are simple data elements. However, it will not work correctly if the repeating unit is a composite data structure. All the child component data elements for all repetitions would be written under one composite parent Element in the output XML document. This is yet another area where current support is limited to version 004010 of X12 and where we have another requirement for Babel Blaster version 1.0. My impression is that we might fix this by adding a RepeatPosition member attribute to the DataCell class. We would set it in the parseRecord method and in this method check for a break in RepeatPosition to see if we need to write a new composite parent Element. We could also use RepeatPosition to write a RepeatPosition Attribute to the output XML Element for a repeating Element.
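
To illustrate the idea, here is a speculative Java sketch of the break test. RepeatPosition is not part of the current DataCell, so the Cell class and the method name are hypothetical:

```java
final class RepeatBreakSketch {
    /** Hypothetical DataCell positions, including the proposed RepeatPosition. */
    static final class Cell {
        final int fieldNumber;
        final int repeatPosition;   // 1 for the first occurrence
        final int subFieldNumber;   // 0 for a simple element
        Cell(int fieldNumber, int repeatPosition, int subFieldNumber) {
            this.fieldNumber = fieldNumber;
            this.repeatPosition = repeatPosition;
            this.subFieldNumber = subFieldNumber;
        }
    }

    /**
     * True if current is a component that must go under a new composite
     * parent Element: it is the first cell, it follows a simple element,
     * or the field number or repeat position has changed.
     */
    static boolean startsNewComposite(Cell previous, Cell current) {
        if (current.subFieldNumber == 0) {
            return false;           // simple elements never need a composite parent
        }
        return previous == null
            || previous.subFieldNumber == 0
            || previous.fieldNumber != current.fieldNumber
            || previous.repeatPosition != current.repeatPosition;
    }
}
```

With this test, the components of the second repetition of a composite in field 2 would get their own composite parent Element instead of being appended to the parent written for the first repetition.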
