Data Management

The basics of data operations

When we migrate an application onto Salesforce.com, an important part of the process is extracting the data from the existing legacy system or spreadsheets and loading it into the cloud application. To improve data quality we also need to ensure that there are no duplicate entries and that the data is clean.

Some basic data management operations are:

  • Data export:
    • Data can be exported to make periodic backups or to download the entire data set
    • It can also be used to obtain a template for inserting data
  • Inserting data:
    • We can insert data into existing standard and custom objects
    • It is also helpful in migrating users
  • Updating data:
    • Updating is needed after a de-dupe check to remove duplicates
    • It is also used for enhancing or cleaning the data
  • Deleting data:
    • The deleting operation is used to free up legacy data and to remove erroneous data

We will perform the above operations with the data loader, using both the wizard and the command-line interface.

The importance of record IDs

Why do we have a separate section on record IDs? Once we master them, most data operations are a piece of cake. The record ID is the unique identifier of a record, similar to the primary key and foreign key in a database table.

The record IDs are not just a bunch of random numbers, but have valuable information that can help us.

18-character and 15-character IDs

Salesforce uses two types of IDs: 18-character and 15-character. When we take the ID from the URL it is 15 characters long; when we use the data loader we get 18-character IDs.

The 15-character ID is case sensitive while the 18 character ID is not case sensitive. The 18-character ID is used to migrate data from legacy systems or spread sheets, which do not recognize case-sensitive IDs. Both the IDs work with and point to the same record. However, if we are to migrate data from the spread sheets or export data to the spread sheets we have to use 18-character IDs.

There are three ways to obtain a record ID:

  1. URL of the record detail page.
  2. By running a report on the object.
  3. Data loader (API access).

The URL and reports will always return 15-character IDs, while the data loader and the API return 18-character IDs. The API, however, accepts both 15-character and 18-character IDs.
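The three-character suffix of an 18-character ID encodes which positions of the 15-character ID are uppercase, which is why the longer form is safe for case-insensitive tools. The conversion can be sketched in Python (a sketch of the widely published algorithm, not an official Salesforce utility):

```python
# Convert a case-sensitive 15-character Salesforce ID into the
# case-insensitive 18-character form by appending a 3-character
# checksum that records which positions are uppercase.
SUFFIX_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"

def to_18_char(record_id: str) -> str:
    if len(record_id) != 15:
        raise ValueError("expected a 15-character ID")
    suffix = ""
    for chunk_start in (0, 5, 10):              # three 5-character blocks
        chunk = record_id[chunk_start:chunk_start + 5]
        bits = 0
        for i, ch in enumerate(chunk):
            if ch.isupper():
                bits |= 1 << i                  # bit i set if char i is uppercase
        suffix += SUFFIX_CHARS[bits]            # 0-31 indexes A-Z then 0-5
    return record_id + suffix

print(to_18_char("00690000003zRfq"))            # the opportunity ID from the URL example
```

Because the suffix captures the capitalization, two 18-character IDs can be compared case-insensitively without losing information.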

Let us now explore what we can gain from IDs in the URL.

Exploring the URL format

The easiest way to study record IDs is through the standard Salesforce objects. Their prefixes are fixed and identical across all Salesforce organizations. Every custom object also has a fixed prefix, but it is unique to an individual organization.

For the purpose of an example we will study the record ID of the Opportunity object. Open an opportunity record as shown in the following screenshot:


If you do not find it on display, please select the Standard Sales Application using the drop-down in the corner.

The record ID is seen in the address bar of the browser as shown in the following screenshot:

Let's observe the format of the URL in the address bar carefully; it is made up of different parts, as shown in the following screenshot:

Let us now look at the different parts of the URL separated in the preceding screenshot.

  • Instance name: The first part of the URL is the instance name. Salesforce.com provides multiple server instances, and the instance name varies depending on the time and location where you create the ID. For the full list of instance names and their status, please visit the Salesforce trust site (trust.salesforce.com).
  • Salesforce Server URL: This is common for all the record IDs.
  • Record ID: The third part of the URL as seen in the preceding screenshot is the Record ID. This URL format is common for all record detail pages across organizations. We will look at the Record ID in greater detail in the next section.

Record ID

One important thing to note is that the 15-character record ID is case sensitive, so 00690000003zRfq is not equal to 00690000003ZRFQ. Each ID is unique within an organization; no two records share the same ID. When we migrate data from one organization to another, the IDs change. The record ID is further split into two parts as shown in the following screenshot:

The first three characters of the record ID are the object identifiers, which help us to identify the object of the record.

The remaining characters uniquely identify the record. Every object thus has a three-character code that appears as the prefix of its record IDs.

The following table shows the three-character prefixes and the objects they are associated with:


Prefix    Object Name
001       Account
003       Contact
005       User
006       Opportunity
00Q       Lead
500       Case
701       Campaign
These prefixes for the standard objects are the same across all Salesforce organizations. Now let's play with the URL a bit. If we wish to land on the list view of a standard object, all we need to do is append /o to the object prefix; for example, for the Account object we use 001/o.

The final Salesforce URL in this case would be https://na1.salesforce.com/001/o (with na1 as an example instance name).

The instance name for every organization is different; it depends on the geography and the time when the Salesforce account was created, so please check the instance name for your organization.

This will open the list view, or tab view, of the object. Similarly, we can change the prefix to that of any standard object.

When we change the /o to /e, the URL opens the edit page for a new account:

This will open the edit page for a new account directly. If we append /e to a record ID instead, we get the edit page for that record. For example, let's say we want to open the opportunity edit page from our original example.

Now if we append /e to the record URL, we get https://na1.salesforce.com/00690000003zRfq/e (with na1 as an example instance name).

This will automatically open the opportunity for editing; append /e to any record URL and we will get the edit page for that record with all the existing data pre-filled.
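The URL patterns above can be summarized in a small sketch (the instance name na1 and the helper names are illustrative, not a Salesforce API):

```python
# Compose Salesforce classic URLs from an instance name plus an
# object prefix or a record ID, following the patterns shown above.

def list_view_url(instance: str, prefix: str) -> str:
    """/o after an object prefix opens the object's list (tab) view."""
    return f"https://{instance}.salesforce.com/{prefix}/o"

def new_record_url(instance: str, prefix: str) -> str:
    """/e after an object prefix opens the edit page for a new record."""
    return f"https://{instance}.salesforce.com/{prefix}/e"

def edit_record_url(instance: str, record_id: str) -> str:
    """/e after a record ID opens that record's edit page, pre-filled."""
    return f"https://{instance}.salesforce.com/{record_id}/e"

print(list_view_url("na1", "001"))                # account list view
print(edit_record_url("na1", "00690000003zRfq"))  # opportunity edit page
```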

Production and sandbox share the same IDs only once: immediately after we create a full-copy sandbox of the production organization.

The record ID field is not editable, even in code. We cannot set it in a DML insert, but it is required for DML update and delete.

Summarizing record IDs

Let us recap some important points we learned about record IDs:

  • A record ID is similar to the primary or foreign key in a database
  • 15-character record IDs are case sensitive
  • 18-character IDs are case insensitive
  • Every record ID starts with a three-character object prefix
  • Records in different organizations, even similar ones, have different IDs
  • IDs match only for the first time when we create a full-copy sandbox of a production organization
  • 18-character IDs are used for migrating data from legacy systems that are case insensitive

Relationships — dependents first

In the general library system we have the following relationships:

The Customer object is related to Media through a junction object, CustomerMedia. The Media object is a master for the Book and Video objects, as shown in the following diagram:

As shown in the following diagram, Fine has a lookup on Card:

Finally, as shown in the following diagram, Card has a master-detail relationship with Customer:

When we load data we have to load in the order of dependencies. For example, in the library system the CustomerMedia object depends on the Customer and Media objects.

Hence, to load the CustomerMedia object we use the following sequence:

  1. Load Customer object data.
  2. Load Media object data.
  3. Finally load the CustomerMedia object data.

The rule of thumb when uploading data is that the objects a record depends on must be loaded first. As the CustomerMedia object has two dependencies, one on the Customer object and another on the Media object, we loaded both of those first.
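The dependents-first rule amounts to a topological ordering of the object graph. A sketch using the library objects (the dependency map below is hand-written from the diagrams above):

```python
# Map each object to the objects it depends on (its masters/lookups).
DEPENDENCIES = {
    "Customer": [],
    "Media": [],
    "Card": ["Customer"],                    # master-detail on Customer
    "Fine": ["Card"],                        # lookup on Card
    "CustomerMedia": ["Customer", "Media"],  # junction object
}

def load_order(deps):
    """Return a load order in which every object comes after its dependencies."""
    order, seen = [], set()

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for parent in deps[obj]:
            visit(parent)                    # load the parent first
        order.append(obj)

    for obj in sorted(deps):
        visit(obj)
    return order

print(load_order(DEPENDENCIES))              # Customer and Media precede CustomerMedia
```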

Modifying system fields

When we are loading legacy data into the Salesforce system during migration, it is sometimes essential to set the Created Date, Last Modified Date, Last Modified By, and Created By fields. Normally, when we load data into the system, these fields are not modifiable and are stamped by the system.

However, Salesforce provides the facility to modify these audit fields, once only, during the initial insert. This is helpful for migrating legacy data while preserving history.

To enable this feature we need to contact Salesforce customer support.

Features of modifiable system fields

Some of the important features of modifiable system fields are as follows:

  • The fields can be modified only once, during the initial insert
  • They are accessible through the API, that is, via the data loader
  • All custom objects can have modifiable system fields; however, not all standard objects can
  • Account, Opportunity, Contact, Lead, Case, Task, and Event can have modifiable fields
  • The fields are read-only for existing records

Connecting to Salesforce server through API

When loading data from the data loader, we make an API call to the Salesforce server. The login credentials are passed as the username plus the password with the security token appended.

We can bypass the security token if we whitelist the IP address from which the data is loaded.

The CRUD commands

CRUD stands for create, read, update, and delete. Salesforce provides all four operations through the API: records can be inserted, updated, deleted, and extracted from the server.

The insert process requires data without an ID, because Salesforce generates the record ID when the record is created. Data is uploaded in the form of a .csv file, typically created using a spreadsheet.

We can map individual fields manually; however, to save time we can first extract a single record with the selected fields and use it as a template to quickly map the other fields.

The update process is the same as insert, but since we are updating an existing record we need an ID or an External ID during update. The delete command only needs the ID of the record to be deleted.

Apart from the general CRUD commands, Salesforce provides a special upsert command. Upsert is a combination of insert and update in a single call.

Upsert uses the ID or an external ID to match incoming records with existing records; if no match is found or the ID is missing, it inserts the record. The upsert command is helpful in avoiding duplicates based on the ID or external ID. If more than one record matches, an error is reported.
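The matching rules can be sketched as follows (a simplified in-memory illustration of the server-side behavior, not the actual Salesforce API; the Media_Number key is a hypothetical external ID field):

```python
def upsert(table, records, ext_id_field):
    """table: list of dicts (existing rows); records: incoming dicts.
    Match on ext_id_field: update on exactly one match, insert on none,
    report an error on more than one match -- mirroring upsert semantics."""
    results = []
    for rec in records:
        key = rec.get(ext_id_field)
        matches = [row for row in table
                   if key is not None and row.get(ext_id_field) == key]
        if len(matches) > 1:
            results.append(("error", key))       # ambiguous match is rejected
        elif len(matches) == 1:
            matches[0].update(rec)               # exactly one match: update it
            results.append(("updated", key))
        else:
            table.append(dict(rec))              # no match: insert a new row
            results.append(("inserted", key))
    return results

existing = [{"Media_Number": "M1", "Name": "Harry Potter"}]
incoming = [{"Media_Number": "M1", "Name": "Harry Potter and the chamber of secret"},
            {"Media_Number": "M2", "Name": "Mission Impossible"}]
print(upsert(existing, incoming, "Media_Number"))  # one update, one insert
```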

External IDs

When we are migrating data from other systems, it is useful to have a foreign key to link data between the two systems. The external ID helps to create that link.

A custom field can be marked as an external ID to identify the primary keys of the legacy system (outside Salesforce). Text, number, e-mail, and auto-number fields can be flagged as external IDs.

Fields marked as external IDs are indexed and are available on all the objects that can have custom fields. These fields are searchable and appear in search queries.

An object can have up to three external ID fields, which become cross-reference fields to the legacy systems. For example, if the following data is in the legacy spreadsheets for the library system, we can migrate it to Salesforce by mapping the Media Number to a Media Number external ID field.

The original data from the legacy system are given as follows:

Media Number    Media Name                                Media Type
…               Harry Potter and the chamber of secret    …
…               Mission Impossible                        …
We can map this legacy data to the fields using the following mapping:


Media Number (External ID)    Media Name                                Media Type
…                             Harry Potter and the chamber of secret    …
…                             Mission Impossible                        …
When we are migrating data from the legacy system into Salesforce, we can upsert the data using the external ID field. This way there is no need to know the Salesforce record IDs, and we can prevent duplication of data using the external ID.

We can also load data into related objects without the Salesforce ID. Ideally, when loading data into related objects, we include the Salesforce ID as the field name in the data files, but we can instead include an external ID and perform an upsert. Using an external ID while loading relationships is permitted only during the upsert operation.

Exercise – migrating data from legacy system

The general library wishes to migrate the media information from the spreadsheets they are using. They have a field called Media Number in the spreadsheet, which is a unique identifier (primary key) in the current system. Migrate the data into the Media object and avoid duplicates.

To migrate the data from the spreadsheet, we first need to create an external ID field on the Media object.

Create a Media_Number field on the Media object in Salesforce; the field can be a text field. Check the External ID flag on the field. While loading the data, use the Media_Number field and use upsert to load the data.

Data loading tools

Salesforce provides two tools that help us with data migration. The Cloud Import Wizard is a cloud-based tool and does not require any download; it is easy to use and quickly lets us upload a few standard objects and all custom objects. The Apex Data Loader (API based) is a downloadable utility with many more capabilities than the cloud-based tool. Let us look at both tools in detail.

Cloud-based data import wizard

The cloud-based data import wizard is a cloud solution to import data into the organization quickly. It is an easy-to-use tool to load accounts, contacts, leads, solutions, and custom objects. We can load up to 50,000 records at a time. It is accessible only to system administrators or profiles with administrative access.

To use the import wizard, perform the following steps:

  1. Navigate to User Name | Setup | Administration Setup | Data Management as shown in the following screenshot:
  2. We can import data into the standard objects using the cloud-based data wizard, hence Salesforce provides the option of choosing from the following wizards:
    • Import Accounts/Contacts
    • Import Leads
    • Import Solutions
    Similarly, Salesforce also provides a separate Import Custom Objects wizard that helps us load data into custom objects. Select the Import Custom Objects wizard for this exercise.
  3. On the next page we see basic instructions for using the wizard and on the final line we see the link to start the wizard. Click on Start the Import Wizard! as shown in the following screenshot:
  4. A new wizard will pop up as shown in the next screenshot; it contains the list of all the custom objects available for import. Select the Media object and click on Next.
  5. The next step asks whether we want to enable the de-dupe check before inserting the records. Select Yes to enable the de-dupe check as shown in the following screenshot:
  6. If the Media Number field was created as an external ID, it will be reflected on this page. If it was not, and there are no external IDs, the radio button will be disabled.

  7. The Data Import wizard lets us import special relationships using name, record ID, or any external ID. On the next screen, we can choose the owner field for the records:
  8. As the media object has record types, the next screen allows us to choose the Record Type of data to be inserted. We can insert only one record type at a time.
  9. Select the .CSV file to upload. The file should have the column headings as the first row; this helps identify the columns and makes mapping easier. On the next page, we can map the fields to the Salesforce columns as shown in the next screenshot. Map the columns from Excel to the Salesforce fields and click on Next.
  10. On the next screen it will show errors if any and give general warning messages regarding updates. All the fields that are universally required should be mapped with some value to upload.
  11. If we do not specify the OwnerID while uploading, the user who uploads the file becomes the owner of all the records.

  12. Finally, click on Import Now, we will receive an e-mail when the import is completed.

Apex data loader

The Apex data loader is a Java-based utility used to upload data to the organization. We can load any object that has API access. Unlike the import wizard, the data loader is not a cloud-based tool, but a desktop utility built for systems running Windows. Some features of the data loader are:

  • The Apex data loader is an API-based tool, available in Unlimited Edition, Enterprise Edition, and Developer Edition orgs
  • It supports the CSV (Comma Separated Value) format to load and export data, and can also read from databases through JDBC
  • It can load more than 50,000 records at a time, and data loads can be scheduled at regular intervals
  • Data can be exported for backup and archiving, and records can be mass deleted
  • The data loader can also run from the command line

Downloading the data loader

Unlike most features of Salesforce, the data loader is not completely on the cloud. We need to download and install it on a machine to use it. To obtain and install the data loader, perform the following steps:

  1. Navigate to User name | Setup | Data Management | Data Loader.
  2. Download the Data Loader.
  3. An unofficial Mac version of the data loader is also available online.

  4. Launch the InstallShield Wizard and follow the instructions.

Install the data loader on the machine before proceeding to the next section, where we will load data.

Using the data loader

The data loader helps us in exporting data for backup, inserting data, deleting and hard deleting the data as shown in the following screenshot:

Let us export some opportunities for the purpose of an example:

  1. Click on Export on the Apex Data Loader.
  2. If you are not logged in it will ask for a login. Please ensure you append the security token to the password as shown in the following screenshot and click on Next.
  3. Select the object to extract data from and the folder in which the data should be saved. Ensure you give the file a .csv extension as shown in the following screenshot and click on Next.
  4. Prepare the query in the next screen as shown in the next screenshot. Select the fields to download from the field chooser, situated on the left. We can add conditions to the query using the condition box on the right.
  5. Don't forget to click on the Add condition button to add the condition in the query.
  6. The final query is formed in the wide input text box, which includes the fields and the conditions.
  7. Click on Finish and the extraction will start. If there are errors, the final report will point to two files, success.csv and error.csv.
  8. The error file has the reason for each error in its last column.
  9. The extracted data is written to the chosen file.

Upserting data

Now that we have exported the data, let us try upserting it. For this example, we will upsert the data using the template obtained from the export. It is always advisable to first extract the fields we intend to update, insert, or upsert so that we get a template. Perform the following steps to upsert the data:

  1. Click on the Upsert button on the main screen of Data Loader, select the object for upserting and click on Next:
  2. On the next screen, choose the matching ID. If an external ID field is defined on the object, the drop-down offers the option to select it. This ID is used to detect duplicates before loading.
  3. In the next page we prepare the mapping to map the fields in the CSV file to the fields on the object. As shown in the next screenshot, click on Create or Edit a map. We can also save this map for future use.
  4. As shown in the following screenshot, we can map the Salesforce fields displayed on the top to the fields in Excel displayed at the bottom.
  5. Drag the respective Salesforce fields over the fields from the CSV file. If the column headers in Excel match the field names, we can directly click on Auto-Match Fields to Columns.
  6. We can also save the mappings in an external file for future use. Finish the mappings and click on Next.
  7. The wizard asks you to choose a directory in the next screen. This directory is where the success and error files are created. If some new records are inserted, the success file comes with the ID of these new records while the error file comes back with the error code.
  8. Select the directory and click on Finish.

Thus we have seen the export and upsert commands of the data loader. The insert and update commands are very similar to upsert; however, they cannot match on an external ID, and the delete and update operations require a Salesforce record ID.

Setting up the data loader

By default the data loader is configured to operate using some default parameters. We can further configure it as per our requirements to improve performance.

The data loader by default works only with the production organization. To make it operable with the sandbox, we need to modify some parameters. Let us configure the data loader for sandbox operation.

Go to Settings | Settings in the Data Loader. A new window will open as shown in the following screenshot:


There are five major configuration changes that we need to keep in mind while setting up the Data Loader. Let us look at each one:

  1. Server host (clear for default): The server host is the endpoint URL the data loader connects to. By default the server host is https://login.salesforce.com; this endpoint is used when we are uploading data to the production organization or a developer organization. When we are uploading data to a sandbox, however, we need to change it to https://test.salesforce.com.
  2. Use Bulk API: By default the maximum batch size (see point 5) is 200 when uploading using the data loader; however, if the data set is large we can use the Bulk API, which increases the maximum batch size to 2,000.
  3. Start at row: If a batch operation is interrupted, the record number reached is indicated on the label of this field. We can choose to resume the operation from this row.
  4. Insert null values: When uploading data with the data loader, we can specify whether blank values should be treated as nulls. This is especially helpful when inserting Date and Date/Time fields. When using the Bulk API (see point 2), this checkbox cannot be used; to insert null values, replace the blanks with #N/A.
  5. Batch size: The data loader loads data in batches of a fixed size. The maximum batch size of the normal data loader is 200, while the maximum batch size when loading via the Bulk API is 2,000. With batches, multiple records are inserted in a single request; for example, if we are loading 1,000 records, the normal data loader sends five requests of 200 records each, while the Bulk API can do the same operation in a single request. We can reduce the batch size, but we cannot increase it beyond the maximum.
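The batch arithmetic in point 5 can be sketched as follows (an illustration of client-side chunking, not the data loader's actual code):

```python
def batches(records, batch_size=200):
    """Split records into data loader style batches of at most batch_size."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

rows = list(range(1000))                 # 1,000 records to load
print(len(batches(rows, 200)))           # 5 requests at the default batch size
print(len(batches(rows, 2000)))          # 1 request at the Bulk API batch size
```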

Using data loader through the command line

The data loader is a very easy and intuitive tool for loading data. As easy as it is to use, however, it requires manual intervention and user input to operate. Many times we require an automated process to upload large chunks of data; for these cases we use the data loader through the command-line interface.

Command line data loader performs the same operations as the normal data loader without the GUI and the easy-to-use interface.

Configuring the command-line data loader

Before we fire up the data loader with the command line interface we need to configure it so that it can perform the desired operations.

The main configuration file is the process-conf.xml file, which is located in the Data Loader folder. The default location of the data loader is c:\Program Files\salesforce.com\Apex Data Loader 22.0.

For help using the command-line data loader, pass -help at the command line.

As the command-line data loader runs at the command prompt, we have very few options to customize and format commands during the operation. To overcome this, we prepare some files that store the basic commands and configuration needed to run the data loader from the command line. Let us look at some of the files used in the operation.

Preparing the process-conf.xml file

The process-conf.xml file contains a description of every process that can be called from the command line. Each of these processes is referred to as a process bean.
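A minimal sample bean might look like the following sketch (the bean class is the data loader's ProcessRunner; the file paths, username, and encrypted password value are placeholders):

```xml
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
  "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="accountMasterProcess"
        class="com.salesforce.dataloader.process.ProcessRunner"
        singleton="false">
    <property name="name" value="accountMasterProcess"/>
    <property name="configOverrideMap">
      <map>
        <entry key="sfdc.endpoint" value="https://login.salesforce.com"/>
        <entry key="sfdc.username" value="user@example.com"/>
        <!-- encrypted with Encrypt.bat; placeholder value -->
        <entry key="sfdc.password" value="ENCRYPTED_PASSWORD"/>
        <entry key="process.encryptionKeyFile" value="c:\dataloader\key.txt"/>
        <entry key="sfdc.entity" value="Account"/>
        <entry key="process.operation" value="insert"/>
        <entry key="process.mappingFile" value="c:\dataloader\accountMap.sdl"/>
        <entry key="dataAccess.type" value="csvRead"/>
        <entry key="dataAccess.name" value="c:\dataloader\accounts.csv"/>
      </map>
    </property>
  </bean>
</beans>
```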

A single bean holds one data operation. The following are its main properties:

  • name: Identifies the bean in the file and is the name we call from the command-line interface, for example, accountMasterProcess.
  • sfdc.entity: The Salesforce object that is the target of the action. The value of this property is case sensitive, for example, Account (note the capital A).
  • process.operation: Identifies the type of operation, for example insert, upsert, or update.
  • process.mappingFile: Identifies the mapping between the fields in the source file and the Salesforce object. The mapping file is a .sdl file, which can easily be created from the GUI of the data loader.
  • dataAccess.type: Identifies the operation to perform on the source file. For example, if we are using an ODBC data source the property reads databaseRead; if we are reading data from a CSV file, the property is csvRead.
  • sfdc.username: Stores the login name for the command-line run. If no owner is specified on a record, this user becomes the owner.
  • sfdc.password: The configuration file is stored as plain text on the machine, and storing the password unencrypted in the file is not secure. The sfdc.password and process.encryptionKeyFile properties serve the purpose of encrypting the password for added security. To generate the key and the encrypted password, the data loader installation provides the Encrypt.bat utility, which we will look at in the next section.

These are the main entities in the process-conf.xml file. The file has to be prepared every time we need to perform an operation. If we need to perform multiple operations in a single run, the entire <bean>…</bean> element should be repeated with all the parameters inside it.

Once we configure the desired files we are now ready to run the operation. To run the command line job, the data loader comes with a process.bat file kept in the Data loader folder\bin\ folder.

To run the batch process, we run the process command with the name of the process bean as input:

process ../accountMasterProcess

In this case, process is the command that runs the data loader and accountMasterProcess is the name we gave in the bean's id attribute.

Encrypting a password using Encrypt.bat

To log in to the organization, we need to specify the password in the configuration file. For security purposes the password should be stored encrypted so that no unauthorized person can read it.

Salesforce provides us with the encrypt.bat utility that encrypts the password from plain text. The utility is available at the default location Data loader folder\bin\ encrypt.bat.

encrypt.bat runs in the command line and supports the following commands:

  • Generate a key: This command generates a key from the text we provide. Usually we should provide the password with the security token in plain text format to generate an encrypted key.
  • Encrypt text: It performs the same operation as generating a key, only it can be used to encrypt the general text provided.
  • Verify encrypted text: It matches a plain text and the encrypted password and prints a success or failure message.

Troubleshooting the data loader

If there is some error in the operation of the data loader we can access the log files of the data loader. The two log files are:

  • sdl.log: It contains a detailed chronological list of data loader log entries. Log entries marked - INFO are procedural items, such as logging in to Salesforce. Log entries marked - ERROR are problems, such as a submitted record missing a required field.
  • sdl_out.log: A supplemental log that contains additional information not captured in sdl.log. For example, it includes log entries for the creation of proxy server network connections.

These files are present in the temp folder of a system running Microsoft Windows and can be accessed by entering %TEMP%\sdl.log and %TEMP%\sdl_out.log in the Run dialog.

Apex data loader versus cloud-based import wizard

The data loader and the web-based tool perform similar operations. The following is a quick comparison of which tool to use in which cases.

Apex data loader:

  • Supports more than 50,000 records
  • Supports all the objects available for data import
  • Use it when we want to schedule data imports, for example daily or nightly imports

Cloud-based import wizard:

  • Good for importing fewer than 50,000 records
  • Supports only a few of the important objects for data import
  • Use it when we need an automatic de-dupe check based on account name and site, contact e-mail address, or lead e-mail address
In this article, we have seen the crucial data operations on Salesforce.com and studied the two ways Salesforce provides to perform massive data operations. The data loader is a client application for the bulk import or export of data; use it to insert, update, delete, or extract Salesforce records. Similarly, we have the web-based import wizards, which are available under the Data Management menu in the setup.

You've been reading an excerpt of: Developer Certification Handbook (DEV401)