Chapter 19 Text file processing

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 19
Text file processing

In this chapter, you will:

Import from text files
Write text files

VBA simplifies both reading and writing from text files. This chapter covers importing from a text file and writing to a text file. Being able to write to a text file is useful when you need to write out data for another system to read or even when you need to produce HTML files.

Importing from text files

There are two basic scenarios when reading from text files. If a file contains fewer than 1,048,576 records, it is not difficult to import the file using the Workbooks.OpenText method. If the file contains more than 1,048,576 records, you have to read the file one record at a time.

Importing text files with fewer than 1,048,576 rows

Text files typically come in one of two formats. In one format, the fields in each record are separated by some delimiter, such as a comma, pipe, or tab. In the second format, each field takes a particular number of character positions. This is called a fixed-width file, and this format was very popular in the days of COBOL.

Excel can import either type of file. You can also open both types by using the OpenText method. In both cases, it is best to record the process of opening the file and then use the recorded snippet of code.

Opening a fixed-width file

Figure 19-1 shows a text file in which each field takes up a certain amount of space in the record. Writing the code to open this type of file is slightly arduous because you need to specify the length of each field. In my collection of antiques, I still have a metal ruler used by COBOL programmers to measure the number of characters in a field printed on a green-bar printer. In theory, you could change the font of your file to a monospace font and use this same method. However, using the macro recorder is a slightly more up-to-date method.

This figure shows a text file. There are eight columns of data. The columns are fixed width, so they all line up nicely in the Notepad window. — **FIGURE 19-1**This file is fixed width. Because you must specify the exact length of each field in the file, opening this file is quite involved.

Turn on the macro recorder by selecting Record Macro from the Developer tab. Use the default macro name. From the File menu, select Open. Change the Files Of Type to All Files and find your text file.

In the Text Import Wizard’s step 1, specify that the data is Fixed Width and click Next. Excel looks at your data and attempts to figure out where each field begins and ends. Figure 19-2 shows Excel’s guess on this particular file. Because the Date field is too close to the Customer field, Excel missed drawing that line.

This figure shows the Text Import Wizard step 2 of 3. In this view, lines appear between most columns. Excel has detected where the lines should be drawn. There is a problemâ€”no line was drawn between the COGS and Profit columns. — **FIGURE 19-2** Excel guesses at where each field starts and ends. In this case, it guessed incorrectly for two of the fields.

To add a new field indicator in step 2 of the wizard, click in the appropriate place in the Data Preview window. If you click in the wrong column, click the line and drag it to the right place. If Excel inadvertently put in an extra field line, double-click the line to remove it. Figure 19-3 shows the Data Preview window after the appropriate changes have been made. Note the little ruler above the data. When you click to add a field marker, Excel is actually handling the tedious work of figuring out that the Customer field starts in position 25 and has a length of 11.

Continuing from Figure 19-2, this figure is still in Step 2 of the Text to Columns wizard. Using your mouse, click to add a new line between COGS and Profit. — **FIGURE 19-3** After you add a new field marker and adjust the marker between Customer and Quantity to the right place, Excel can build the code that gives you an idea of the start position and length of each field.

In step 3 of the wizard, Excel assumes that every field is in General format. Change the format of any fields that require special handling. Click the third column and choose the appropriate format from the Column Data Format section of the dialog box. Figure 19-4 shows the selections for this file.

This figure shows Step 3 of 3 of the Text Import Wizard. The third column is marked as a date in MDY format. The COGS and Profit columns are marked as Skip Column. A button named Advanced is pointed out, but it has not been clicked yet. — **FIGURE 19-4** The third column is a date, and you do not want to import the Cost and Profit columns.

If you have date fields, click the heading above that column and change the column data format to a Date format. If you have a file with dates in year-month-day format or day-month-year format, select the drop-down menu next to Date and choose the appropriate date sequence.

If you prefer to skip some fields, click those columns and select Do Not Import Column (Skip) from the Column Data Format section. This is useful in a couple of instances. If the file includes sensitive data that you do not want to show to a client, you can leave it out of the import. For example, perhaps this report is for a customer to whom you do not want to show the cost of goods sold or profit. In this case, you can choose to skip these fields in the import. In addition, occasionally you will encounter a text file that is both fixed width and delimited by a character such as the pipe character. Setting the one-character-wide pipe columns as “do not import” is a great way to get rid of the pipe characters.

If you have text fields that contain alphabetic characters, you can choose the General format. The only time you should choose the Text format is if you have a numeric field that you explicitly need imported as text. One example of this is an account number with leading zeros or a column of ZIP Codes. In this case, change the field to Text format to ensure that ZIP Code 01234 does not lose the leading zero.

Note After you import a text file and specify that one field is text, that field exhibits seemingly bizarre behavior. Try inserting a new row and entering a formula in the middle of a column imported as text. Instead of getting the results of the formula, Excel enters the formula as text. The solution is to delete the formula, format the entire column as General, and then enter the formula again.

After opening the file, turn off the macro recorder and examine the recorded code, which should look like this:

Click here to view code image

Workbooks.OpenText Filename:="C:sales.prn", Origin:=437, StartRow:=1, _

DataType:=xlFixedWidth, FieldInfo:=Array(Array(0, 1), Array(8, 1), _

Array(17, 3), Array(27, 1), Array(54, 1), Array(62, 1), Array(71, 9), _

Array(79, 9)), TrailingMinusNumbers:=True

The most confusing part of this code is the FieldInfo parameter. You are supposed to code an array of two-element arrays. Each field in the file gets a two-element array to identify both where the field starts and what type of field it is.

The field start position is zero based. Because the Region field is in the first character position, its start position is listed as zero.

The field type is a numeric code. If you were coding this by hand, you would use the xlColumnDataType constant names; but for some reason, the macro recorder uses the harder-to-understand numeric equivalents.

By using Table 19-1, you can decode the meaning of the individual arrays in the FieldInfo array. Array(0, 1) means that this field starts zero characters from the left edge of the file and is a General format. Array(8, 1) indicates that the next field starts eight characters from the left edge of the file and is General format. Array(17, 3) indicates that the next field starts 17 characters from the left edge of the file and is a Date format in month-day-year sequence.

TABLE 19-1 xlColumnDataType values

Value	Constant	Used For
1	`xlGeneralFormat`	General
2	`xlTextFormat`	Text
3	`xlMDYFormat`	MDY date
4	`xlDMYFormat`	DMY date
5	`xlYMDFormat`	YMD date
6	`xlMYDFormat`	MYD date
7	`xlDYMFormat`	DYM date
8	`xlYDMFormat`	YDM date
9	`xlSkipColumn`	Skip Column
10	`xlEMDFormat`	EMD date (for use in Taiwan)

As you can see, the FieldInfo parameter for fixed-width files is arduous to code and confusing to look at. This is one situation in which it is easier to record the macro and copy the code snippet.

Opening a delimited file

Figure 19-5 shows a text file in which the fields are comma separated. The main task in opening such a file is to tell Excel that the delimiter in the file is a comma and then identify any special processing for each field. In this case, you definitely want to identify the third column as being a date in MDY format.

This figure shows another text file; this one has a comma between each column. The columns are not lined up nicely, but the code to import will be simpler. — **FIGURE 19-5** This file is comma delimited. Opening this file involves telling Excel to look for a comma as the delimiter and then identifying any special handling, such as treating the third column as a date. This is much easier than handling fixed-width files.

Note If you try to record the process of opening a comma-delimited file whose filename ends in .csv, Excel records the Workbooks.Open method rather than Workbooks.OpenText. If you need to control the formatting of certain columns, rename the file to have a .txt extension before recording the macro. You can then edit the recorded macro to change the filename back to a .csv extension.

Turn on the macro recorder and record the process of opening the text file. In step 1 of the wizard, specify that the file is delimited.

In step 2 of the Text Import Wizard, the Data Preview window might initially look horrible. This is because Excel defaults to assuming that the fields are separated by tab characters (see Figure 19-6 ).

Initially, the Text Import Wizard Step 2 predicts a tab as the delimiter. The data preview shows a single long field. — **FIGURE 19-6** Before you import a delimited text file, the initial Data Preview window is a confusing mess of data because Excel is looking for tab characters between fields when a comma is actually the delimiter in this file.

After you’ve cleared the Tab check box and selected the proper delimiter choice, which in this case is a comma, the Data Preview window in step 2 looks perfect, as shown in Figure 19-7.

Step 3 of the wizard is identical to step 3 for a fixed-width file. In this case, specify that the third column has a date format. Click Finish, and you have this code in the macro recorder:

Click here to view code image

Workbooks.OpenText Filename:="C:sales.txt", Origin:=437, _

StartRow:=1, DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, _

ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False, _

Comma:=True, Space:=False, Other:=False, _

FieldInfo:=Array(Array(1, 1), Array(2, 1), _

Array(3, 3), Array(4, 1), Array(5, 1), Array(6, 1), _

Array(7, 1), Array(8, 1)), TrailingMinusNumbers:=True

Although this code appears longer than the earlier code, it is actually simpler. In the FieldInfo parameter, the two element arrays consist of a sequence number, starting at 1 for the first field, and then an xlColumnDataType from Table 19-1. In this example, Array(2, 1) is saying “the second field is of general type.” Array(3, 3) is saying “the third field is a date in MDY format.” The code is longer because it explicitly specifies that each possible delimiter is set to False. Because False is the default for all delimiters, you really need only the one you will use. The following code is equivalent:

Click here to view code image

Workbooks.OpenText Filename:= "C:sales.txt", _

DataType:=xlDelimited, Comma:=True, _

FieldInfo:=Array(Array(1, 1), Array(2, 1), Array(3, 3), _

Array(4, 1), Array(5, 1), Array(6, 1), _

Array(7, 1), Array(8, 1))

After changing the delimiter to a comma in Step 2 of the Text Import Wizard, the data preview shows the data split into eight columns. — **FIGURE 19-7** After the delimiter field has been changed from a tab to a comma, the Data Preview window looks perfect. This is certainly easier than the cumbersome process in step 2 for a fixed-width file. Note that Excel ignores the commas in the Customer field when there are quotation marks around the customer.

Finally, to make the code more readable, you can use the constant names rather than the code numbers:

Click here to view code image

Workbooks.OpenText Filename:="C:sales.txt", _

DataType:=xlDelimited, _Comma:=True, _

FieldInfo:=Array(Array(1, xlGeneralFormat), _

Array(2, xlGeneralFormat), _

Array(3, xlMDYFormat), Array(4, xlGeneralFormat), _

Array(5, xlGeneralFormat), Array(6, xlGeneralFormat), _

Array(7, xlGeneralFormat), Array(8, xlGeneralFormat))

Excel has built-in options to read files in which fields are delimited by tabs, semicolons, commas, or spaces. Excel can actually handle anything as a delimiter. If someone sends pipe-delimited text, you set the Other parameter to True and specify an OtherChar parameter:

Click here to view code image

Workbooks.OpenText Filename:= "C:sales.txt", Origin:=437, _

DataType:=xlDelimited, Other:=True, OtherChar:= "|", FieldInfo:=...

Dealing with text files with more than 1,048,576 rows

If you use the Text Import Wizard to read a file that has more than 1,048,576 rows of data, you get this error: “File not loaded completely.” The first 1,048,576 rows of the file load correctly.

If you use Workbooks.OpenText to open a file that has more than 1,048,576 rows of data, you are given no indication that the file did not load completely. Excel 2019 loads the first 1,048,576 rows and allows macro execution to continue. Your only indication that there is a problem is if someone notices that the reports are not reporting all the sales. If you think that your files will ever get this large, it would be good to check whether cell A1048576 is nonblank after an import. If it is, the odds are that the entire file was not loaded.

Reading text files one row at a time

You might run into a text file that has more than 1,048,576 rows. When this happens, you have to read the text file one row at a time.

You need to open the file for INPUT as #1. You use #1 to indicate that this is the first file you are opening. If you had to open two files, you could open the second file as #2. You can then use the Line Input #1 statement to read a line of the file into a variable. The following code opens sales.txt, reads 10 lines of the file into the first 10 cells of the worksheet, and closes the file:

Click here to view code image

Sub Import10()

ThisFile = "Csales.txt"

Open ThisFile For Input As #1

For i = 1 To 10

Line Input #1, Data

Cells(i, 1).Value = Data

Next i

Close #1

End Sub

Rather than read only 10 records, you want to read until you get to the end of the file. Excel automatically updates a variable called EOF. If you open a file for input as #1, checking EOF(1) tells you whether you have read the last record.

Use a Do...While loop to keep reading records until you have reached the end of the file:

Click here to view code image

Sub ImportAll()

ThisFile = "C:sales.txt"

Open ThisFile For Input As #1

Ctr = 0

Line Input #1, Data

Ctr = Ctr + 1

Cells(Ctr, 1).Value = Data

Loop While EOF(1) = False

Close #1

End Sub

After reading records with code such as this, note in Figure 19-8 that the data is not parsed into columns. All the fields are in column A of the file.

Use the TextToColumns method to parse the records into columns. The parameters for TextToColumns are nearly identical to those for the OpenText method:

Click here to view code image

Cells(1, 1).Resize(Ctr, 1).TextToColumns Destination:=Range("A1"), _

DataType:=xlDelimited, Comma:=True, FieldInfo:=Array(Array(1, _

xlGeneralFormat), Array(2, xlMDYFormat), Array(3, xlGeneralFormat), _

Array(4, xlGeneralFormat), Array(5, xlGeneralFormat), Array(6, _

xlGeneralFormat), Array(7,xlGeneralFormat), Array(8, xlGeneralFormat), _

Array(9, xlGeneralFormat), Array(10,xlGeneralFormat), Array(11, _

xlGeneralFormat))

This figure shows an Excel file. There are 10 rows of data. All of the data for each row is in Column A. — **FIGURE 19-8** When you are reading a text file one row at a time, all the data fields end up in one long entry in column A.

Note For the remainder of your Excel session, Excel remembers the delimiter settings. There is an annoying bug (feature?) in Excel. After Excel remembers that you are using a comma or a tab as a delimiter, any time that you attempt to paste data from the Clipboard to Excel, the data is parsed automatically by the delimiters specified in the OpenText method. Therefore, if you attempt to paste some text that includes the customer ABC, Inc., the text is parsed automatically into two columns, with text up to ABC in one column and Inc. in the next column.

Rather than hard-code that you are using the #1 designator to open the text file, it is safer to use the FreeFile function. This returns an integer representing the next file number available for use by the Open statement. The complete code to read a text file smaller than 1,048,576 rows is as follows:

Click here to view code image

Sub ImportAll()

ThisFile = "C:sales.txt"

FileNumber = FreeFile

Open ThisFile For Input As #FileNumber

Ctr = 0

Line Input #FileNumber, Data

Ctr = Ctr + 1

Cells(Ctr, 1).Value = Data

Loop While EOF(FileNumber) = False

Close #FileNumber

Cells(1, 1).Resize(Ctr, 1).TextToColumns Destination:=Range("A1"), _

DataType:=xlDelimited, Comma:=True, _

FieldInfo:=Array(Array(1, xlGeneralFormat), _

Array(2, xlMDYFormat), Array(3, xlGeneralFormat), _

Array(4, xlGeneralFormat), Array(5, xlGeneralFormat), _

Array(5, xlGeneralFormat), Array(6, xlGeneralFormat), _

Array(7, xlGeneralFormat), Array(8, xlGeneralFormat), _

Array(9, xlGeneralFormat), Array(10, xlGeneralFormat), _

Array(10, xlGeneralFormat), Array(11, xlGeneralFormat))

End Sub

Reading text files with more than 1,048,576 rows

You can use the Line Input method to read a large text file. A good strategy is to read rows into cells A1:A1048575 and then begin reading additional rows into cell AA2. You can start in row 2 on the second set so that the headings can be copied from row 1 of the first data set. If the file is large enough that it fills up column AA, move to BA2, CA2, and so on.

Also, you should stop writing columns when you get to row 1048574 and leave two blank rows at the bottom. This ensures that the code Cells(Rows.Count, 1).End(xlup).Row finds the final row. The following code reads a large text file into several sets of columns:

Click here to view code image

Sub ReadLargeFile()

ThisFile = "C:sales.txt"

FileNumber = FreeFile

Open ThisFile For Input As #FileNumber

NextRow = 1

NextCol = 1

Do While Not EOF(1)

Line Input #FileNumber, Data

Cells(NextRow, NextCol).Value = Data

NextRow = NextRow + 1

If NextRow = (Rows.Count -2) Then

' Parse these records

Range(Cells(1, NextCol), Cells(Rows.Count, NextCol)) _

.TextToColumns _

Destination:=Cells(1, NextCol), DataType:=xlDelimited, _

Comma:=True, FieldInfo:=Array(Array(1, xlGeneralFormat), _

Array(2, xlMDYFormat), Array(3, xlGeneralFormat), _

Array(4, xlGeneralFormat), Array(5, xlGeneralFormat), _

Array(6, xlGeneralFormat), Array(7, xlGeneralFormat), _

Array(8, xlGeneralFormat), Array(9, xlGeneralFormat), _

Array(10, xlGeneralFormat), Array(11, xlGeneralFormat))

' Copy the headings from section 1

If NextCol > 1 Then

Range("A1:K1").Copy Destination:=Cells(1, NextCol)

End If

' Set up the next section

NextCol = NextCol + 26

NextRow = 2

End If

Loop

Close #FileNumber

' Parse the final section of records

FinalRow = NextRow - 1

If FinalRow = 1 Then

' Handle if the file coincidentally had 1048574 rows exactly

NextCol = NextCol - 26

Else

Range(Cells(2, NextCol), Cells(FinalRow, NextCol)).TextToColumns _

Destination:=Cells(1, NextCol), DataType:=xlDelimited, _

Comma:=True, FieldInfo:=Array(Array(1, xlGeneralFormat), _

Click here to view code image

Array(2, xlMDYFormat), Array(3, xlGeneralFormat), _

Array(4, xlGeneralFormat), Array(5, xlGeneralFormat), _

Array(6, xlGeneralFormat), Array(7, xlGeneralFormat), _

Array(8, xlGeneralFormat), Array(9, xlGeneralFormat), _

Array(10, xlGeneralFormat), Array(11, xlGeneralFormat))

If NextCol > 1 Then

Range("A1:K1").Copy Destination:=Cells(1, NextCol)

End If

DataSets = (NextCol - 1) / 26 + 1

End Sub

Usually you should write the DataSets variable to a named cell somewhere in the workbook so that later you know how many data sets you have in the worksheet.

As you can imagine, using this method, it is possible to read 660,601,620 rows of data into a single worksheet. The code you formerly used to filter and report the data now becomes more complex. You might find yourself creating pivot tables from each set of columns to create a data set summary and then summarizing all the summary tables with a final pivot table. At some point, you need to consider whether the application really belongs in Access. You can also consider whether the data should be stored in Access with an Excel front end, which is discussed in Chapter 21, “Using Access as a back end to enhance multiuser access to data.”

Using Power Query to load large files to the Data Model

If your goal is to create a pivot table from the text file, you can bypass the worksheet grid and load millions of rows directly into the Data Model. Now that Power Query is built in to Excel 2019, the macro recorder will record the process of importing data to the Data Model with Power Query. Use the following steps:

On the Data tab, in the Power Query group, select New Query, From File, From Text File.
Browse to the text file.
In the Power Query Home tab, open the Close And Load drop-down menu and choose Close And Load To.
In the Load To dialog box, choose Only Create Connection And Add This Data To The Data Model. Click OK. The data is loaded to the Power Pivot engine.

If you use the macro recorder during this process, your recorded code includes the M language statements required to define the query:

Click here to view code image

Sub ImportToDataModel()

' ImportToDataModel Macro

ActiveWorkbook.Queries.Add Name:="demo", Formula:= _

"let" & Chr(13) & "" & Chr(10) & _

" Source = Csv.Document(File.Contents(""C:demo.txt""), " & _

"[Delimiter="","",Encoding=1252])," & Chr(13) & "" & Chr(10) & _

" #""First Row as Header"" = Table.PromoteHeaders(Source)," & _

Chr(13) & "" & Chr(10) & _

" #""Changed Type"" = Table.TransformColumnTypes(" & _

"#""First Row as Header""," & _

"{{""StoreID"", Int64.Type}, {""Date"", type date}," & _

"{""Division"", type text}, {""Units"", Int64.Type}," & _

"{""Revenue"", Int64.Type}})" & Chr(13) & "" & Chr(10) & "i" & _

"""Changed Type"""

Workbooks("Book4").Connections.Add2 "Power Query - demo", _

"Connection to the 'demo' query in the workbook.", _

"OLEDB;Provider=Microsoft.Mashup.OleDb.1;" & _

"Data Source=$Workbook$;Location=demo", _

"""demo""", 6, True, False

End Sub

You can now use Insert, Pivot Table and specify This Workbook Data Model as the source for the pivot table.

Writing Text Files

The code for writing text files is similar to the code for reading text files. You need to open a specific file for output as #1. Then, as you loop through various records, you write them to the file by using the Print #1 statement.

Before you open a file for output, make sure that any prior examples of the file have been deleted. You can use the Kill statement to delete a file. Kill returns an error if the file was not there in the first place. In this case, you use On Error Resume Next to prevent an error.

The following code writes out a text file for use by another application:

Click here to view code image

Sub WriteFile()

ThisFile = "C:Results.txt"

' Delete yesterday's copy of the file

On Error Resume Next

Kill ThisFile

On Error GoTo 0

' Open the file

Open ThisFile For Output As #1

FinalRow = Cells(Rows.Count, 1).End(xlUp).Row

' Write out the file

For j = 1 To FinalRow

Print #1, Cells(j, 1).Value

Next j

End Sub

This is a somewhat trivial example. You can use this method to write out any type of text-based file. The code at the end of Chapter 18, “Reading from and writing to the web,” uses the same concept to write out HTML files.

Next steps

The next chapter steps outside the world of Excel and talks about how to transfer Excel data into Microsoft Word documents. Chapter 20, “Automating Word,” looks at using Excel VBA to automate and control Microsoft Word.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 19 Text file processing

Create new playlist

Sign In

Sign Up

CHAPTER 19Text file processing

Importing from text files

Importing text files with fewer than 1,048,576 rows

Opening a fixed-width file

Opening a delimited file

Dealing with text files with more than 1,048,576 rows

Reading text files one row at a time

Reading text files with more than 1,048,576 rows

Using Power Query to load large files to the Data Model

Writing Text Files

Next steps

Table of Contents for
Chapter 19 Text file processing

CHAPTER 19
Text file processing