Flat File Grammar

The grammar of a flat file is described in the Grammar Element. Although the XML representation of groups of records in flat files may be fairly intuitive, a few diagrams might help make it clearer.

Figure 8.3 shows a typical stream of records in a flat file, using our cocoa invoice as an example. For brevity, only the record tags appear in the figure.

Figure 8.3. Record Stream in the Invoice File


If we look only at the records, we can't deduce much about the logical structure of a document with certainty. We would probably suspect that the HDR record starts a new document and that the LIN and DSC records form a repeating group. However, we can't know this just by looking at the document; we must verify our suspicions by consulting the file specification or the application designer. For our purposes, we use Table 8.1 as our specification. This allows us to interpret the stream as shown in Figure 8.4.

Figure 8.4. Record Stream in the Invoice File, with Groups Added


Figure 8.4, in essence, shows what is known as a syntax tree. Figure 8.5 converts the brackets into nodes in the tree. I show siblings at the same level in the diagram to make relationships more obvious.

Figure 8.5. Syntax Tree for the Invoice File
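
The step from the flat stream of Figure 8.3 to the tree of Figure 8.5 can be sketched in a few lines of code. This is only a rough illustration in Python, not the utilities' actual code. It assumes a hypothetical grouping rule in which every LIN record opens a new line item group and DSC records belong to the most recent group; the real rules come from Table 8.1. The root name Invoice and the function name build_tree are made up for the example.

```python
# Hypothetical grouping rule (an assumption, not taken from Table 8.1):
# a LIN record starts a new group, and DSC records join the current group.
GROUP_START_TAG = "LIN"
GROUP_MEMBER_TAGS = {"LIN", "DSC"}

def build_tree(tags):
    """Group a flat list of record tags into a nested (name, children) tree."""
    root = ("Invoice", [])
    current_group = None
    for tag in tags:
        if tag == GROUP_START_TAG:
            # Open a new group node and hang it off the root.
            current_group = ("LineItemGroup", [tag])
            root[1].append(current_group)
        elif tag in GROUP_MEMBER_TAGS and current_group is not None:
            current_group[1].append(tag)
        else:
            # Any other record closes the group and sits directly under the root.
            current_group = None
            root[1].append(tag)
    return root

tree = build_tree(["HDR", "LIN", "DSC", "LIN", "DSC", "DSC"])
print(tree)
```

Each tuple in the result corresponds to one node in the syntax tree: the root holds the HDR record and two group nodes, and each group node holds its own LIN and DSC records.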


The logical structure in Figure 8.5 now finally starts to look like something we might see in XML. All we have to do to make the transformation complete is to change the text from record identifiers and descriptions to XML Element names (Figure 8.6).

Figure 8.6. Invoice Document in XML


Table 8.3. Child Elements of the PhysicalCharacteristics Element

| Child Element | Nested Child Element | Attribute | Schema Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|---|
| RecordFormat | | | | Specifies the physical format of the record | Only one of Fixed or Variable is allowed. |
| | Fixed | Length | positiveInteger | Specifies the physical record length | Maximum value reflects restriction on record length as noted in restrictions list in text. |
| | Variable | RecordTerminator | union of U, W, and hexBinary | Designates a UNIX-style line feed, Windows-style carriage return and line feed pair, or a hexadecimal value | U, W, or a two-character hexadecimal number from 00 through FF representing a single byte. |
| TagInfo | | | | Specifies the location of the record identifier within the record | The tag contents will be interpreted as an alphanumeric string, with leading and trailing white-space removed. Must be the same offset and length for every record type. |
| | | Offset | nonNegativeInteger | Specifies the offset from zero in bytes for the first position of the tag | Maximum value reflects restriction on record length as noted in restrictions list in text. |
| | | Length | positiveInteger | Specifies the length of the tag in bytes | Maximum value reflects restriction on field length as noted in restrictions list in text. |
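
To make the physical characteristics concrete, here is a minimal Python sketch of how a reader might split a byte stream into records and pull out each record identifier. It is an illustration under stated assumptions, not the utilities' implementation: the function names are hypothetical, the sample data is invented, and the tag is assumed to be ASCII.

```python
def terminator_bytes(code: str) -> bytes:
    """Map a RecordTerminator value (U, W, or two hex digits) to raw bytes."""
    if code == "U":
        return b"\n"          # UNIX-style line feed
    if code == "W":
        return b"\r\n"        # Windows-style carriage return and line feed
    return bytes.fromhex(code)  # two-character hexadecimal value, one byte

def split_fixed(data: bytes, length: int):
    """Split a byte stream into fixed-length physical records."""
    return [data[i:i + length] for i in range(0, len(data), length)]

def record_tag(record: bytes, offset: int, length: int) -> str:
    """Read the record identifier at the TagInfo Offset and Length."""
    # Leading and trailing white-space is removed, as Table 8.3 describes.
    return record[offset:offset + length].decode("ascii").strip()

# Invented sample: three 8-byte records with a 4-byte tag at offset 0.
data = b"HDR 0001" + b"LIN 0002" + b"DSC 0003"
tags = [record_tag(r, offset=0, length=4) for r in split_fixed(data, 8)]
print(tags)
```

For variable-length records, the same record_tag function would apply after splitting the stream on terminator_bytes(...) instead of on a fixed length.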

Table 8.4. Child Elements of the XMLOutputCharacteristics Element

| Child Element | Attribute | Schema Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|
| SchemaLocationURL | value | anyURI | URL of the schema file for the output document. Will be written as the value of the root Element's noNamespaceSchemaLocation Attribute. | Optional. If not specified the noNamespaceSchemaLocation Attribute will not be written. An error will occur if output validation is requested and this Element is not present. |
| PartnerBreak | | | Information about a field that dictates a different trading partner when its content changes (for example, a customer number in the first field of the invoice). | Optional. Field contents are interpreted as an alphanumeric string and must be valid as a directory name for the operating system. If not specified, all output documents will be created in the output directory instead of creating a separate subdirectory for each trading partner. |
| | Offset | nonNegativeInteger | Offset from zero in bytes for the first position of the field. | Maximum value reflects restriction on record length as noted in the restrictions list in the text. |
| | Length | positiveInteger | Length of the field in bytes. | Maximum value reflects restriction on field length as noted in the restrictions list in the text. |
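
The PartnerBreak behavior can be pictured with a small Python sketch. It assumes the partner field is ASCII and that the subdirectory is named directly after the trimmed field contents; the function name partner_dir and the sample record are hypothetical.

```python
from pathlib import Path

def partner_dir(output_dir: str, record: bytes, offset: int, length: int) -> Path:
    """Return the per-partner output subdirectory for a record."""
    # The field is read as an alphanumeric string, so white-space is trimmed.
    partner = record[offset:offset + length].decode("ascii").strip()
    return Path(output_dir) / partner

# Invented sample: a customer number in bytes 3 through 11 of the record.
record = b"HDRCUST01   INV0001"
print(partner_dir("out", record, offset=3, length=9))
```

A converter built this way would start writing into a new subdirectory each time the returned path differs from the previous one.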

Now the transformation is complete. However, one other diagram may be helpful in fully understanding the file description documents and how the utilities use them. The logical structure of the grammar of our invoice file exactly matches the structure of the XML representation of the invoice document (Figure 8.7). The Element names in the file description document are shown in boldface type, while the invoice Elements they specify are shown in italics. Note that we define each Element in the invoice document only once and don't repeat the GroupDescription for each occurrence of the LineItemGroup Element.

Figure 8.7. Grammar Description of the Invoice Document
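
To show the parallel between the grammar description and the invoice document in text form, here is a minimal Python sketch that parses a cut-down Grammar Element and prints its outline. The Element and Attribute names Grammar, GroupDescription, RecordDescription, ElementName, and TagValue come from Table 8.5. The invoice names Invoice, Header, LineItem, and Description are assumptions made for the example, and the FieldDescription children are left out for brevity.

```python
import xml.etree.ElementTree as ET

# A cut-down grammar; FieldDescription Elements are omitted on purpose.
GRAMMAR_XML = """
<Grammar ElementName="Invoice" TagValue="HDR">
  <RecordDescription ElementName="Header" TagValue="HDR"/>
  <GroupDescription ElementName="LineItemGroup" TagValue="LIN">
    <RecordDescription ElementName="LineItem" TagValue="LIN"/>
    <RecordDescription ElementName="Description" TagValue="DSC"/>
  </GroupDescription>
</Grammar>
"""

def outline(node, depth=0):
    """Return one indented line per grammar node: kind, Element name, tag."""
    line = "  " * depth + f"{node.tag} {node.get('ElementName')} [{node.get('TagValue')}]"
    lines = [line]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

grammar = ET.fromstring(GRAMMAR_XML)
lines = outline(grammar)
print("\n".join(lines))
```

The outline has one line per Element defined in the grammar, however many times the corresponding group repeats in an actual invoice.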


For a more detailed discussion of the analysis of flat file grammars, refer to the High-Level Design Considerations section. Table 8.5 shows the details of the Grammar Element and its child Nodes. All are required unless noted. The indentation in the Element column shows the approximate hierarchical relationships. The Allowable Child Elements column lists the specific details of the hierarchy.

Table 8.6 shows the data types supported for the flat file format. To those we developed for the CSV file format in Chapter 7 we add a new numeric and a new date data type.

For all types, a runtime error occurs if Truncatable is false and the length of the XML Element contents exceeds the field length.
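
As an illustration of this rule for the alphanumeric type, here is a minimal Python sketch. It assumes single-byte characters, so that string length equals field length in bytes, and the function name format_alphanumeric is hypothetical.

```python
def format_alphanumeric(value: str, length: int,
                        truncatable: bool = False, fill: str = " ") -> str:
    """Fit XML Element contents into a fixed-length alphanumeric field."""
    if len(value) > length:
        if not truncatable:
            # Mirrors the runtime error described in the text.
            raise ValueError(
                f"contents of length {len(value)} exceed field length {length}")
        return value[:length]           # right-truncate to the field length
    return value.ljust(length, fill)    # left-justify and fill to the right

print(repr(format_alphanumeric("COCOA", 8)))
print(repr(format_alphanumeric("COCOA POWDER", 5, truncatable=True)))
```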

I should make a note here about truncating versus rounding fractional digits. In these utilities I always truncate and never round. I've had enough bad experiences with floating point arithmetic that I'm taking the easy way out and just truncating. If you need to round fractional digits, you can use an XSLT transformation or whatever means you use to put the data into the proper XML source format. Or, if you want to modify the source code, you can take an approach similar to the one I discuss in the Enhancements and Alternatives section at the end of the chapter.
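
The difference between the two behaviors is easy to see with decimal arithmetic. The following Python sketch uses the standard decimal module and is not how the utilities themselves truncate; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def truncate_fraction(value: str, places: int) -> Decimal:
    """Drop fractional digits beyond `places` without rounding."""
    quantum = Decimal(1).scaleb(-places)   # for example 0.01 when places is 2
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)

def round_fraction(value: str, places: int) -> Decimal:
    """Round half up to `places` fractional digits, for comparison."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(truncate_fraction("12.349", 2))   # truncation keeps 12.34
print(round_fraction("12.345", 2))      # rounding gives 12.35
```

Working from the decimal string rather than a binary floating point value sidesteps the representation problems that make rounding error-prone.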

Table 8.5. Flat File Grammar Characteristics in the Grammar Element

| Element | Allowable Child Elements | Attribute | Schema Language Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|---|
| Grammar | RecordDescription, GroupDescription | | | Describes the grammar of both the flat file and the corresponding XML representation. | The first child Element of the Grammar Element must be a RecordDescription Element. It may be followed by any combination of RecordDescription or GroupDescription Elements. |
| | | ElementName | NMTOKEN | Specifies the name of the document's root Element. | When creating XML documents, the specified name is assigned to the document's root Element. When creating a flat file, the input XML document's root Element must match this name. Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the Header record's record identifier field. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| GroupDescription | RecordDescription, GroupDescription | | | Describes the grammar of a group of records. | Any combination of RecordDescription or GroupDescription Elements can follow the first RecordDescription Element. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing the group. | Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the record identifier field described for the first record in the group. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| RecordDescription | FieldDescription | | | Describes the grammar of an individual record and the corresponding XML representation. | A RecordDescription Element is required for each unique record type in the file. If a record type may appear at different positions, a RecordDescription is required for each position. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing a row. | Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the record identifier field described by the TagInfo Element above. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| FieldDescription | None | | | Describes the characteristics of a field in the flat file and the corresponding XML representation. | One FieldDescription Element is required for each field in the flat file record. If a range of characters within the record is not covered by a field description, it will be ignored for flat file source conversions and space filled for flat file target conversions. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing the field. | Maximum length reflects restriction on length of Element names. |
| | | FieldNumber | positiveInteger | Specifies the number of the field, starting at one. | Maximum value reflects restriction on the number of fields per record. |
| | | DataType | token | Specifies the data type of the field in the flat file. | The supported data types developed in this chapter are shown in Table 8.6. The Grammar data type code values are used. |
| | | Offset | nonNegativeInteger | Specifies the offset from zero in bytes for the first position of the field. | Maximum value reflects restriction on record length. |
| | | Length | positiveInteger | Specifies the length of the field in bytes. | Maximum value reflects restriction on field length. |
| | | Truncatable | boolean | Indicates whether or not truncation is permitted. See comments regarding truncation in Table 8.6. | Optional, defaults to false. |
| | | FillCharacter | union of single character string and hexBinary | When converting to flat files as the target, the field will be padded with this character if the source XML Element content is missing or shorter than the field length. | Optional, defaults to an ASCII space character. A single literal character or a two-character hexadecimal number from 00 through FF representing a single byte may be specified. |
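
The Offset and Length Attributes of FieldDescription map directly onto string slicing. The Python sketch below reads fields from a record and writes them back, ignoring uncovered ranges on the way in and space filling them on the way out. The field names, the sample record, and the function names are all hypothetical, single-byte characters are assumed, and data type handling is left out. The record tag is not one of the sample fields, so the written record leaves those positions blank.

```python
# Hypothetical field layout: (ElementName, Offset, Length).
FIELDS = [("InvoiceNumber", 3, 6), ("CustomerName", 12, 10)]

def read_fields(record: str, fields):
    """Slice each described field out of a record; other ranges are ignored."""
    return {name: record[off:off + length].strip()
            for name, off, length in fields}

def write_record(values, fields, record_length: int) -> str:
    """Place each field at its offset; ranges not covered stay space filled."""
    buf = [" "] * record_length
    for name, off, length in fields:
        text = values.get(name, "")
        if len(text) > length:
            raise ValueError(f"{name} exceeds field length {length}")
        buf[off:off + length] = text.ljust(length)
    return "".join(buf)

record = "HDR000123   ACME COCOA"
values = read_fields(record, FIELDS)
print(values)
print(repr(write_record(values, FIELDS, len(record))))
```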

Table 8.6. Flat File Data Types

| Flat File Data Type | Grammar Data Type Code | Schema Data Type | Actions with Flat File as Source | Actions with Flat File as Target | Actions with Flat File as Target if Truncatable Is True |
|---|---|---|---|---|---|
| Alphanumeric | AN | string | Leading and trailing white-space (any character with an integer value less than or equal to a space character) is trimmed. All other white-space within the string is preserved. | If the source is shorter than the field length, the field is left-justified and filled to the right with the fill character. | The string is right-truncated to the field length. |
| Real number | R | decimal | Leading zeroes and leading plus signs are removed. All white-space is trimmed. | The number is right-justified within the field. Leading characters are set according to the fill character. If the fill character is a zero, the minus sign, if present, is placed in the left-most position. For all other fill characters the minus sign immediately precedes the most significant digit. | Fractional digits to the right of the decimal point are truncated until the length of the Element contents equals the field length. An error occurs if the digits to the left of the decimal point exceed the field length. |
| Implied decimal number | Nx, where x represents the number of implied decimal places | decimal | Leading zeroes and leading plus signs are removed. All white-space is trimmed. | The number is right-justified within the field. If the number of fractional digits in the source number exceeds x, the number is right-truncated to x fractional digits. Zeroes are added as fractional digits if the source number has fewer than x fractional digits. Leading characters are set according to the fill character. If the fill character is a zero, the sign character is placed in the left-most position. For all other fill characters the sign character immediately precedes the most significant digit. | Ignored, not truncatable. |
| Date in YYYYMMDD format | DYYYYMMDD | date | N/A | The date is left-justified within the field and filled with the specified fill character if the field is longer than 8 characters. | Ignored, not truncatable. |
| Date in MM/DD/YYYY format | DMMsDDsYYYY | date | Month and day may be either one or two digits each. | The date is left-justified within the field and filled with the specified fill character if the field is longer than 10 characters. | Ignored, not truncatable. |
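
The numeric rules for a flat file target are easier to follow in code. The Python sketch below covers sign placement for right-justified numbers and the fractional digit handling for the Nx type. It makes two assumptions beyond what the table says: that the decimal point itself is dropped from an implied decimal field, and that only a minus sign is ever written. The function names are hypothetical.

```python
def to_implied_decimal(value: str, places: int) -> str:
    """Convert a decimal string to Nx digits with `places` implied decimals."""
    sign = "-" if value.startswith("-") else ""
    whole, _, frac = value.lstrip("+-").partition(".")
    # Truncate extra fractional digits, then pad with zeroes up to `places`.
    frac = frac[:places].ljust(places, "0")
    return sign + whole + frac   # assumption: the decimal point is not stored

def justify_number(digits: str, length: int, fill: str = " ") -> str:
    """Right-justify a numeric string, placing the minus sign per Table 8.6."""
    if len(digits) > length:
        raise ValueError(f"number of length {len(digits)} exceeds field length {length}")
    negative = digits.startswith("-")
    body = digits[1:] if negative else digits
    if fill == "0" and negative:
        # Zero fill: the sign goes in the left-most position of the field.
        return "-" + body.rjust(length - 1, "0")
    # Other fills: the sign stays immediately before the first digit.
    return digits.rjust(length, fill)

print(repr(justify_number("-12.5", 8, "0")))
print(repr(justify_number("-12.5", 8)))
print(repr(justify_number(to_implied_decimal("12.3456", 2), 6, "0")))
```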
