Data loading tools

Force.com provides two tools that help us with data migration. The cloud-based import wizard does not require any download; it is easy to use and quickly lets us upload a few standard objects and all custom objects. The Apex data loader (API based) is a downloadable utility that has many advantages over the cloud-based tool. Let us look at both tools in detail:

Cloud-based data import wizard

The cloud-based data import wizard is a cloud-based solution to import data into the organization quickly. It is an easy-to-use tool to load accounts, contacts, leads, solutions, and custom objects. We can load up to 50,000 records at a time. It is accessible only to system administrators or profiles with administrative access.

To use the data import wizard, perform the following steps:

  1. Navigate to User Name | Setup | Administration Setup | Data Management as shown in the following screenshot:
    Cloud-based data import wizard
  2. We can import data into the supported standard objects using the cloud-based data wizard; Force.com provides us with the option of choosing from the following wizards:
    • Import Accounts/Contacts
    • Import Leads
    • Import Solutions

    Similarly, Force.com also provides a separate Import Custom Objects wizard that helps us load data into custom objects. Select the Import Custom Objects wizard for this exercise.

  3. On the next page, we see basic instructions for using the wizard and, at the end, the link to start it. Click on Start the Import Wizard! as shown in the following screenshot:
    Cloud-based data import wizard
  4. A new wizard will pop up as shown in the next screenshot; it contains the list of all the custom objects available for import. Select the Media object and click on Next.
    Cloud-based data import wizard
  5. The next step asks whether we want to enable a de-dupe check before inserting the records. Select Yes to enable the de-dupe check, as shown in the following screenshot:

    Note

    If the Media Number field is created as an External ID, it will be reflected on this page. If it is not, and there are no other external IDs, the radio button will be disabled.

    Cloud-based data import wizard
  6. The data import wizard lets us populate relationship fields using a name, record ID, or any external ID. On the next screen, we can choose the owner field for the records:
    Cloud-based data import wizard
  7. As the Media object has record types, the next screen allows us to choose the Record Type of the data to be inserted. We can insert only one record type at a time.
    Cloud-based data import wizard
  8. Select the .csv file to upload. The file should have the column headings as the first row; this helps identify the column types and makes mapping easier (a sample file is shown after these steps). On the next page, we can map the fields to the Salesforce columns as shown in the next screenshot. Map the columns from the CSV file to the Salesforce fields and click on Next.
    Cloud-based data import wizard
  9. The next screen shows errors, if any, and general warning messages regarding updates. All fields that are universally required must be mapped to some value before uploading.

    Note

    If we do not specify the OwnerID while uploading, the user who uploads the file is assigned as the owner of all the records.

  10. Finally, click on Import Now. We will receive an e-mail when the import is complete.
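
For reference, here is a minimal sketch of what the import file for the Media object might look like. The column names (Media Name, Media Number, Type) are illustrative assumptions, not the exact fields of your object:

    Media Name,Media Number,Type
    Annual Report,MN-0001,Print
    Product Brochure,MN-0002,Digital
    Launch Video,MN-0003,Video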

Apex data loader

The Apex data loader is a Java-based utility used to upload data to the organization. We can load any object that has API access. Unlike the import wizard, the data loader is not a cloud-based tool, but a desktop utility built for systems running on Windows. Some salient features of the data loader are as follows:

  • It is an API-based tool, available in Unlimited Edition, Enterprise Edition, and Developer Edition orgs
  • It supports the CSV (Comma-Separated Values) format to load and export data
  • It can load more than 50,000 records; the import wizard's 50,000-record limit does not apply
  • It is a useful tool to back up and archive your organization's data, and it also supports mass delete
  • Data loads can be scheduled at regular intervals
  • Data can be imported and exported using CSV files or a database connection (JDBC)
  • It can also run from the command line

Downloading the data loader

Unlike other features of Force.com, the data loader is not completely on the cloud; we need to download and install it on the machine before we can use it. To obtain and install the data loader, perform the following steps:

  1. Navigate to User name | Setup | Data Management | Data Loader.
  2. Download the Data Loader.

    Note

    An unofficial Mac version of the data loader is also available at http://www.pocketsoap.com/osx/lexiloader/

  3. Launch the InstallShield wizard and follow the instructions.

Install the data loader on your machine before proceeding to the next section, where we will load data.

Using the data loader

The data loader helps us export data for backup, insert data, and delete and hard delete data, as shown in the following screenshot:

Using the data loader

Let us export some opportunities as an example:

  1. Click on Export on the Apex Data Loader.
  2. If you are not logged in, the tool will prompt you to log in. Ensure you append the security token to the password, as shown in the following screenshot, and click on Next.
    Using the data loader
  3. Select the object to extract data from and the folder in which the data should be extracted. Ensure you give the filename a .csv extension, as shown in the following screenshot, and click on Next.
    Using the data loader
  4. Prepare the query on the next screen, as shown in the next screenshot. Select the fields to download from the field chooser on the left. We can add conditions to the query using the condition box on the right.
  5. Don't forget to click on the Add condition button to add the condition to the query.
  6. The final query is formed in the wide input textbox, combining the selected fields and the conditions (a sample query is shown after these steps).
    Using the data loader
  7. Click on Finish and the extraction will start. If there are any errors, the final report will show two files: success.csv and error.csv.
  8. The error file has the reason for each error in its last column.
  9. The extracted data will be written to the specified file.
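
For reference, the query builder simply assembles a standard SOQL statement in the text box. A minimal sketch of what a generated opportunity export query might look like; the field list and the condition are illustrative:

    SELECT Id, Name, StageName, Amount, CloseDate
    FROM Opportunity
    WHERE StageName = 'Closed Won'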

Upserting data

Now that we have exported the data, let us try upserting it. For this example, we will use the exported file as a template. It is always advisable to first extract the fields that we intend to update, insert, or upsert, so that we get a template. Perform the following steps to upsert the data (a sample upsert file is shown after the steps):

  1. Click on the Upsert button on the main screen of Data Loader, select the object for upserting and click on Next:
    Upserting data
  2. In the next screen, choose the matching ID. If an external ID field is defined on the object, the drop-down offers the option to select it. This ID is used to detect duplicates before loading.
    Upserting data
  3. On the next page, we prepare the mapping that maps the fields in the CSV file to the fields on the object. As shown in the next screenshot, click on Create or Edit a map. We can also save this map for future use.
    Upserting data
  4. As shown in the following screenshot, we can map the Salesforce fields displayed at the top to the CSV columns displayed at the bottom.
    Upserting data
  5. Drag the respective Salesforce fields over the fields from the CSV file. If the column headers in the CSV file match the field names, we can simply click on Auto-Match Fields to Columns.
  6. We can also save the mappings in an external file for future use. Finish the mappings and click on Next.
  7. The wizard asks you to choose a directory in the next screen. This is where the success and error files are created. If new records are inserted, the success file contains the IDs of these new records, while the error file comes back with the error code for each failed row.
    Upserting data
  8. Select the directory and click on Finish.
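
For reference, a minimal sketch of an upsert file built from the exported template. When matching on Id, rows carrying an existing record ID are updated and rows with a blank ID are inserted as new records; the record ID, names, and values below are illustrative placeholders:

    Id,Name,StageName,Amount
    0065000000XXXXXAAA,Existing Deal - Renewal,Negotiation,120000
    ,Brand New Deal,Prospecting,15000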

Thus we have seen the export and upsert operations using the data loader. The insert and update operations are very similar to upsert; however, they do not match records using an external ID. The delete and update operations require a Salesforce record ID.

Setting up the data loader

By default, the data loader is configured with a set of standard parameters. We can configure it further as per our requirements to improve performance.

The data loader by default works only with the production organization. To make it operable with the sandbox, we need to modify some parameters. Let us configure the data loader for sandbox operation.

Go to Settings | Settings in the Data Loader. A new window will open as shown in the following screenshot:

Setting up the data loader

There are five major configuration changes that we need to keep in mind while setting up the Data Loader. Let us look at each one:

  1. Server host (clear for default): The server host is the endpoint URL to which the data loader connects. By default, the server host is https://login.salesforce.com. This endpoint URL is used when we are uploading data to a production organization or a developer organization. When we are uploading data to a sandbox, however, we need to change it to https://test.salesforce.com.
  2. Use Bulk API: By default, the maximum batch size (see item 5) is 200 while uploading using the data loader; however, if the data volume is large, we can use the Bulk API, which increases the maximum batch size to 2,000.
  3. Start at row: If our batch operations are suddenly interrupted at any point, the record number is indicated on the label of this field. We can choose to resume the operation from this number.
  4. Insert null values: When we are uploading data using the data loader, we can specify whether blank values should be treated as nulls. This is especially helpful while inserting Date and Date/Time fields. When using the Bulk API (item 2), selecting this checkbox causes an error; in that case, replace the blanks in the file with #N/A to insert null values.
  5. Batch Size: The data loader loads data in batches of a fixed size. The maximum batch size of the normal data loader is 200, while the maximum batch size when loading with the Bulk API is 2,000. With batches, multiple records are inserted in a single request; for example, if we are loading 1,000 records, the normal data loader will send five requests to Force.com, whereas the Bulk API can do the same operation in a single request. We can reduce the batch size, but we cannot increase it beyond the maximum.
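
For the command-line data loader (covered in the next section), the same settings live as keys in the config.properties file. A minimal sketch, assuming the documented key names; the values are illustrative:

    # sandbox endpoint; production and developer orgs use https://login.salesforce.com
    sfdc.endpoint=https://test.salesforce.com
    # use the Bulk API for large volumes (raises the maximum batch size to 2,000)
    sfdc.useBulkApi=true
    # number of records sent per request
    sfdc.loadBatchSize=2000
    # treat blank values in the file as nulls (not supported with the Bulk API)
    sfdc.insertNulls=true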

Using data loader through the command line

The data loader is a very easy and intuitive tool for loading data. As easy as it is to use, however, it requires manual intervention and user input to operate. Many times we require an automated process to upload large chunks of data. For these cases we use the data loader through the command-line interface.

The command-line data loader performs the same operations as the normal data loader, but without the GUI and its easy-to-use interface.

Configuring the command-line data loader

Before we fire up the data loader with the command line interface we need to configure it so that it can perform the desired operations.

The main configuration file is the config.properties file, which is located in the Data Loader folder. The default location of the data loader is C:\Program Files\Salesforce.com\Apex Data Loader 22.0.

For help using the command-line data loader, run it with the -help option at the command line.

As the command-line data loader runs at the command prompt, we have very few options to customize and format the commands during the operation. To overcome this, we prepare files that store the basic commands and configuration needed to run the data loader from the command line. Let us look at some of the files used in the operation.

Preparing the process-conf.xml file

The process-conf.xml file contains a description of every potential process that can be called from the command line. Each of these processes is referred to as a process bean. A sample process-conf.xml file is shown in the following screenshot:

Preparing the process-conf.xml file

The section marked in the outer rectangle is a single bean, which holds the data operation. The following are the main properties:

  • Name: This is used to identify the bean in the config.properties file and when we call it from the command-line interface, for example, accountMasterProcess.
  • sfdc.entity: The Salesforce object that is the target of the actions. The value of this property is case sensitive, for example, Account (note the capital A).
  • process.operation: This attribute identifies the type of operation, for example, insert, upsert, update, and so on.
  • process.mappingFile: This attribute identifies the mapping between the fields in the source file (dataAccess.name) and the Salesforce object. The mapping file is a .sdl file, which can be easily created from the GUI of the data loader (a sample mapping file is shown after this list).
  • dataAccess.type: This attribute identifies the operation to perform on the source file. For example, if we are reading from a database, the property will be databaseRead; if we are reading data from a CSV file, the property will be csvRead.
  • sfdc.username: This stores the Force.com login name for the command-line run. If no owner is specified in the records, this user becomes the owner of the new records.
  • sfdc.password: The configuration file is stored as plain text on the machine, so storing the password in it unprotected is not secure. The sfdc.password and process.encryptionKeyFile properties serve the purpose of encrypting the password for added security. To generate the key and the encrypted password, the data loader installation provides the encrypt.bat utility, which we will look at in the next section.
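
To illustrate, here is a minimal sketch of a mapping (.sdl) file; it uses the standard properties format, mapping CSV column headers on the left to Salesforce field names on the right. The column and field names are illustrative:

    # accountMap.sdl
    ACCOUNT_NAME=Name
    ACCOUNT_NUMBER=Account_Number__c
    BILLING_CITY=BillingCity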

These are the main entries in the process-conf.xml file. This file has to be prepared for every operation we need to perform. If we need to perform multiple operations in a single run, the entire <bean></bean> element should be repeated, with all the parameters inside it.
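
The following is a minimal sketch of one such bean, based on the Spring-bean format that process-conf.xml uses; the username, file paths, and external ID field are illustrative placeholders:

    <bean id="accountMasterProcess"
          class="com.salesforce.dataloader.process.ProcessRunner"
          singleton="false">
      <description>Upserts accounts from a CSV file.</description>
      <property name="name" value="accountMasterProcess"/>
      <property name="configOverrideMap">
        <map>
          <!-- connection settings -->
          <entry key="sfdc.endpoint" value="https://login.salesforce.com"/>
          <entry key="sfdc.username" value="user@example.com"/>
          <!-- encrypted with encrypt.bat; see the next section -->
          <entry key="sfdc.password" value="encrypted_password_here"/>
          <entry key="process.encryptionKeyFile" value="c:\dataloader\key.txt"/>
          <!-- what to do, and to which object -->
          <entry key="sfdc.entity" value="Account"/>
          <entry key="process.operation" value="upsert"/>
          <entry key="sfdc.externalIdField" value="Account_Number__c"/>
          <!-- where the data comes from and how it maps -->
          <entry key="dataAccess.type" value="csvRead"/>
          <entry key="dataAccess.name" value="c:\dataloader\accounts.csv"/>
          <entry key="process.mappingFile" value="c:\dataloader\accountMap.sdl"/>
        </map>
      </property>
    </bean>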

Once we configure the desired files, we are ready to run the operation. To run the command-line job, the data loader comes with a process.bat file kept in the bin folder of the Data Loader installation.

To run the batch process, we run the process command with the name of the process bean as input:

    process ../accountMasterProcess

In this case, process is the command given to run the data loader and accountMasterProcess is the name we gave in the bean's id attribute.

Encrypting a password using Encrypt.bat

To log in to the Salesforce.com organization, we need to specify the password in the config.properties file. For security purposes, the password should be stored encrypted so that no unauthorized person can read it.

Salesforce provides us with the encrypt.bat utility, which encrypts the password from plain text. The utility is available in the bin folder of the Data Loader installation (bin\encrypt.bat).

encrypt.bat runs at the command line and supports the following operations:

  • Generate a key: This command generates an encryption key from the text we provide. Usually we should provide the password with the security token appended, in plain text, to generate an encrypted key.
  • Encrypt text: This performs the same operation as generating a key, only it can be used to encrypt any plain text provided.
  • Verify encrypted text: This matches a plain text value against the encrypted password and prints a success or failure message.
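
A minimal sketch of the three operations as they are run from the command prompt; the seed text, password, and key file path are placeholders:

    rem generate a key file from seed text
    encrypt.bat -g seedtext
    rem encrypt the password (with security token appended) using the key file
    encrypt.bat -e myPasswordMYTOKEN c:\dataloader\key.txt
    rem verify that an encrypted value matches the plain text
    encrypt.bat -v <encrypted_value> myPasswordMYTOKEN c:\dataloader\key.txt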

Troubleshooting the data loader

If there is an error in the operation of the data loader, we can consult its log files. The two log files are:

  • sdl.log: It contains a detailed chronological list of data loader log entries. Log entries marked INFO are procedural items, such as logging in to Salesforce. Log entries marked ERROR are problems, such as a submitted record missing a required field.
  • sdl_out.log: A supplemental log that contains additional information not captured in sdl.log. For example, it includes log entries for the creation of proxy server network connections.

These files are present in the temp folder on systems running Microsoft Windows and can be accessed by entering %TEMP%\sdl.log and %TEMP%\sdl_out.log in the Run dialog.

Apex data loader versus cloud-based import wizard

The data loader and the web-based tool perform similar operations. Here is a quick comparison of which tool to use in which cases:

Apex data loader                                                                     | Cloud-based import wizard
-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------
Supports more than 50,000 records.                                                   | Good for importing fewer than 50,000 records.
Supports all the objects available for data import.                                  | Supports only a few standard objects for data import.
Use it when we want to schedule data imports, for example, daily or nightly imports. | Use it when we need an automatic de-dupe check based on account name and site, contact e-mail address, or lead e-mail address.
