More on ODI


Here, we are either using metadata differently than would be expected or we are digging a little deeper into concepts that we have already touched on. Truth be told, this is a catch-all article that addresses questions we've often had to answer, including misconceptions about how the product should be used, and little-known secrets that we didn't want to leave aside.

To integrate ODI with external tools, we will look into how to invoke an external program that requires a password or other connectivity parameters, and how to connect to an Excel spreadsheet using ODBC. To help with performance, we will look into how to tune a standalone ODI agent and how to impact the data flow by changing the staging area location.

To load large data, we will work on an example of a KM that can load files containing LOBs. And finally, to help with lifecycle management, we will look into the use of ODI versioning as well as how to use Smart Export/Smart Import.

This article is by Christophe Dupupet, Peter C. Boyd-Bowman, Denis Gray, and Julien Testut, authors of Oracle Data Integrator 11g Cookbook.


Invoking an external program that requires a password

There are many instances of external programs that will require a password to be used. For instance, you may be required to establish a VPN connection before you can even connect to your databases.

The example we will use here, because it is easier to demonstrate, will be to decrypt an encrypted file using an external program.

Getting ready

The external program that we will use in this recipe is AES Crypt, an open source encryption application that can be downloaded at http://www.aescrypt.com/download.html. After downloading the console version of the program for your operating system, unzip the executable and put it in a folder next to the file that you will want to encrypt and decrypt. The steps of this recipe assume that we have copied the executable into the c:\temp directory. If you are using another operating system, or if you decide to put the program in a different folder, you will have to adapt the instructions of this recipe accordingly.

How to do it...

  1. Open a command line console and go to the c:\temp directory. Add a text file of your choice in this directory. Here we are using a file called foo.txt:

  2. Run the following from the command line:

    c:\temp> aescrypt -e -p oracledi foo.txt

    This will generate an encrypted file named foo.txt.aes that is protected with the password oracledi. If you try to edit the content of this new file, you will see that it has indeed been encrypted:

    Rename the original foo.txt file to foo.txt.ori. We will now use ODI to decrypt the file and restore the original content.

  3. In ODI Topology, create a new Data Server in the File technology called AESCRYPT. We will use this data server as a placeholder for the password that we need for this operation. We can also use this to locate the file that we will work on. You can leave the User name empty (we do not need it here), but do set the Password value to oracledi.

    In the JDBC tab, select the ODI File JDBC Driver and use the default JDBC URL:

    jdbc:snps:dbfile.

  4. Create a physical schema under that server. Set both the Schema name and Work Schema name to c:\temp.

    Create a logical schema to point to this physical schema for the default context:

    FILE_AESCRYPT

  5. Switch to the Designer navigator, and in the project of your choice, create a procedure called Ch11_Password_Decrypt_File.

    Add a new step to this procedure, and call this Decrypt.

    In the Command on Target tab, set the Technology to OS Command (leave all other fields to their default values) and type the following command (this is all one line of code, so please ignore the carriage returns that are only due to formatting):

    <%=odiRef.getInfo("SRC_SCHEMA")%>\aescrypt -d -p <@=odiRef.getInfo("SRC_PASS")@> <%=odiRef.getInfo("SRC_SCHEMA")%>\foo.txt.aes

    The technology OS Command can be used to execute any type of command at the OS level. One note of caution though: if the script or program that you execute returns anything other than 0, ODI will consider that the program has failed. Typically, there are two techniques that can be combined to solve this problem:

    Redirect any output or errors to other files. For instance, dir 1>dir.out 2>dir.err, where 1 represents the standard output and 2 represents the errors output.

    Use Jython code to execute the command and retrieve the return code in a variable. Then evaluate this variable to see whether the returned value indicates an error or is simply informational (see the Jython sketch after these steps).

    In the Command on Source tab, set the Technology to File, set the logical Schema name to FILE_AESCRYPT, and leave all other fields to their default values. There will be no code on the source side.

  6. Save the procedure and run it.

  7. Now, if you go to the Operator window to look into the details of the code that was generated, you can see that the password is not revealed in the operator logs. We are looking in the following screenshot at the code generated for the task:

  8. Now back to the console, we can see that the original un-encrypted file, foo.txt, was restored along with its original content. You can choose to keep or delete the encrypted file foo.txt.aes.
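To illustrate the second technique mentioned in step 5, here is a minimal Jython sketch of a procedure step (with the step technology set to Jython) that runs the same decryption command and evaluates the return code itself. This is a sketch under assumptions: the command string, the path, and the decision to fail on any non-zero code are illustrative and should be adapted to your own program.

    from java.lang import Runtime

    # Same command as in the OS Command step; ODI substitutes the <@...@> tag at runtime.
    cmd = 'c:\\temp\\aescrypt -d -p <@=odiRef.getInfo("SRC_PASS")@> c:\\temp\\foo.txt.aes'

    process = Runtime.getRuntime().exec(cmd)
    rc = process.waitFor()          # retrieve the program's return code

    if rc != 0:
        # An unhandled exception marks the task as failed in the Operator
        raise Exception('aescrypt returned ' + str(rc))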

How it works...

It is very common to have code written on both the source and target commands of a KM (LKMs and multi-technology IKMs) or in a procedure step. One technique that is not necessarily well known is to use the source technology as a placeholder for connectivity information that can be used in the code on the target side. In our example, the technology on the target side is OS Command, which does not give us much flexibility in terms of configuration and connectivity parameters. So, we use an artificial source connection to point to the data server that contains the information we need, and we extract the required parameters with the odiRef.getInfo() substitution method. In addition, ODI makes sure that the password that we retrieve and pass to the external tool is never revealed in the operator logs, as it is encapsulated with the <@ @> syntax.

There's more...

The selection of logical schemas in the source and target connections allows us to leverage any of the parameters defined in either connection. If we were to establish a VPN connection, for instance, we could leverage the Host (Data Server) entry to retrieve the name of the remote server, then retrieve the username and password to authenticate against this server. When dealing with external components that require this type of information, think of leveraging Topology to securely store the connectivity information.

The additional benefit is that if you leverage ODI contexts, the same logic will work across all environments, from development to production.

Tuning a standalone ODI agent

An ODI agent orchestrates all ODI executions. As such, it is a central piece of the infrastructure. Properly tuning the agent and understanding what requires tuning will help you better manage your infrastructure.

Getting ready

All we need for this recipe is to have a standalone agent already installed on one of your systems. If you do not have an agent available, you can run the ODI installer and select the ODI Standalone Agent option. For the recipe that follows, we assume that the agent was installed on a Windows machine, in the folder c:\oracledi\products\11.1.1.6. If your agent is installed in a different directory or on a different operating system, keep this in mind as you follow these steps.

How to do it...

  1. Go to the bin directory of the agent:

    c:\oracledi\products\11.1.1.6\oracledi\agent\bin

  2. Edit the file odiparams.bat (you will have to edit the odiparams.sh file on Linux or Unix systems).

  3. In the file, identify the parameters ODI_INIT_HEAP and ODI_MAX_HEAP. You will notice that the default values are 32 and 256 respectively. Double these values to 64 and 512, then save the file (see the excerpt after these steps).

  4. If your agent is already running, you will have to stop and restart the agent to take these new values into consideration.
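For reference, the relevant lines of odiparams.bat look roughly like the following before and after the change; the exact layout varies slightly between ODI 11g versions, so treat this as an illustrative excerpt rather than a verbatim copy of your file:

    rem Before the change
    set ODI_INIT_HEAP=32m
    set ODI_MAX_HEAP=256m

    rem After the change
    set ODI_INIT_HEAP=64m
    set ODI_MAX_HEAP=512m

On Linux or Unix systems, the same two variables are set in odiparams.sh.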

How it works...

The parameters we have modified control how much memory is initially allocated to the ODI agent (ODI_INIT_HEAP) and the maximum amount of memory that the agent can request (ODI_MAX_HEAP). These parameters are very important because they control the amount of memory available to the agent. Most actions performed by the agent run in memory:

  • When data is fetched with a JDBC connection, this data will be transferred through the agent memory space

  • If you use the ODI memory engine, this in-memory database will actually use the memory space of the agent

  • The default behavior of the ODI JDBC driver for XML is to load data into the in-memory database, and as such, it uses more of the agent's allocated memory

These operations are in addition to the orchestrations that the agent always performs:

  • Reading the scenarios from the repository

  • Completing the scenario code according to the context selected for the execution of that scenario

  • Updating the logs with execution results and statistics

The more operations you ask for the agent to run in memory, the more memory you will need to allocate to the agent.

Keep in mind that if you are running several agents on the same server, each agent will have its own memory space. This can be useful to segregate memory intensive operations, but this will also use more of the available memory on that server.

The same will be true with JEE agents, but in this case, the memory parameters will be part of the configuration of the server itself. Refer to your WebLogic Server documentation for more details.

When we are using JDBC to connect to the source and target servers, we can influence how much memory will be used to transfer data from source to target. If you edit any of the data servers in the Topology navigator, you will see two parameters at the bottom of the window: Array Fetch Size and Batch Update Size:

The JDBC protocol is defined so that records are processed in limited batches to make sure that memory is not exhausted immediately with very large queries. By default, these two parameters have a value of 30 when ODI ships, which means that JDBC will process the records 30 at a time. By increasing the value of these parameters, you can improve performance by retrieving more records each time. Just keep in mind that by doing so, you are using more of the agent's memory.

When changing the values of the Array Fetch Size and Batch Update Size parameters, it is recommended to have the same value for the Array Fetch Size on the source data server and the Batch Update Size on the target data server. Different values can result in buffering at the agent level, which can be counter-productive in terms of performance.

There's more...

Increasing the agent's memory parameters will only work as long as there is enough memory available on the server hosting the agent. Before looking into increasing the value of these parameters, we should always try to use less memory. Techniques that do not leverage the memory space of the agent usually have better performance, if only because the agent does not have to handle the data anymore and simply behaves as an orchestrator:

  • Instead of using JDBC, try and use database loading utilities: external tables, sqlldr, and data pump are some examples available on Oracle databases. Similar utilities are available on other databases: ODI ships out of the box with KMs that support most of these utilities.

  • The in-memory engine has one large drawback: it runs in-memory, and as such is more limited than actual databases. Let's be clear: it will perform well only as long as there is enough physical memory available. After that, we are talking about degrading performance, as memory blocks are swapped to disk to leverage virtual memory. You are usually better off using an actual database, and databases today do cache data in memory when it makes sense to do so.

  • If you are handling very large XML files that cannot fit easily in memory, use the driver's property db_props and point to a .properties file that contains all the necessary parameters to connect to an external database. The benefit of this approach is that it allows you to process a lot more files in parallel (files processed in parallel all share the same agent memory space), and also much bigger files. You can look back to Chapter 9, XML and Web Services, where this topic is discussed in detail.

Loading a file containing a LOB

Loading large objects always requires special considerations. Here, we will create a multi-technology IKM (that is, an IKM that connects to a remote source data server) that loads CLOBs and BLOBs using an external table definition.

Getting ready

For this recipe, we will need to create three files on disk:

  • CLOB.TXT: use Notepad and write This is a CLOB in this file. Save it on disk in your c:\temp directory.

  • BLOB.DOC: use a word processor program and create this file. Write This is a BLOB in the file and save it in your c:\temp directory.

  • Use notepad and create the file DATA.TXT with the following record:

    "Sample record with CLOB and BLOB", "CLOB.TXT", "BLOB.DOC"

  • Save this file in your c:\temp directory.

  • Create a table to load the LOBs in your database:

    Create table LOB_SAMPLE(
    Description VARCHAR2(100),
    CLOB_Data CLOB,
    BLOB_Data BLOB
    );

  • You will have to reverse engineer the file DATA.TXT in a file model. Define the file with no header, use the comma as the field separator, and use the following names for the three columns: Description, Clob_File, and Blob_File. Use the double quote character (") for the text delimiter.

  • You will have to reverse engineer the LOB_SAMPLE table.

  • You will need a project where the KM IKM SQL Control Append has been imported.

How to do it...

  1. Make a copy of IKM SQL Control Append and rename the new KM IKM File SQL Append (LOB).

  2. Expand the KM in the tree and remove all the options except for INSERT, COMMIT, and DELETE_TEMPORARY_OBJECTS.

  3. Edit the IKM. In the Definition tab, select the option Multi-Connections. Then, set the source Technology to File and the target Technology to Oracle.

  4. We will simplify the IKM, so delete all the steps except for the following ones:

    • Drop flow table

    • Create flow table I$

    • Insert new rows

    • Commit Transaction

    • Drop flow table

  5. Add a new step, name this step Create Oracle Directory, and copy this code:

    create or replace directory dat_dir AS '<%=snpRef.getSrcTablesList("", "[SCHEMA]", "", "")%>'

    Move this step up to make it the second step in the list after the first Drop flow table.

  6. Edit the two steps named Drop flow table. At the very bottom of the steps details, expand the Option entry and select Always Execute: in the original IKM, these steps were conditioned by the FLOW_CONTROL option, which we have removed.

  7. Edit the step Create Flow table I$. At the very bottom of the step details, expand the Option entry and select Always Execute. Then replace the original code with this:

    create table <%=odiRef.getTable("L", "INT_NAME", "A")%>
    (
    <%=snpRef.getColList("", "[COL_NAME]\t[DEST_WRI_DT]", ",\n\t", "", "")%>
    )
    ORGANIZATION EXTERNAL
    (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY dat_dir
      ACCESS PARAMETERS
      (
        RECORDS DELIMITED BY NEWLINE
        CHARACTERSET "WE8ISO8859P1"
        BADFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.bad'
        LOGFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.log'
        DISCARDFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.dsc'
        SKIP <%=snpRef.getSrcTablesList("", "[FILE_FIRST_ROW]", "", "")%>
        FIELDS TERMINATED BY '<%=snpRef.getSrcTablesList("", "[SFILE_SEP_FIELD]", "", "")%>'
        <% if(snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "").equals("")){%>
        <%} else {%>OPTIONALLY ENCLOSED BY '<%=snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "")%>' AND '<%=snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "")%>' <%}%>
        MISSING FIELD VALUES ARE NULL
        (
        <%=snpRef.getColList("", " " + "<? if (\u0022[DEST_WRI_DT]\u0022.equals(\u0022CLOB\u0022)) {?> [EXPRESSION] <?}else if (\u0022[DEST_WRI_DT]\u0022.equals(\u0022BLOB\u0022)) {?> [EXPRESSION] <?}else{?>[COL_NAME] <?}?>" + "CHAR([LONGC])", ",\n\t\t\t", "", "")%>
        )
        COLUMN TRANSFORMS (<%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",\n", "", "UD1")%>
        <%=odiRef.getColList(",", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) BLOB", ",\n", "", "UD2")%>)
      )
      LOCATION (<%=snpRef.getSrcTablesList("", "'[RES_NAME]'", "", "")%>)
    )
    PARALLEL 2
    REJECT LIMIT UNLIMITED

  8. Simplify the code of the step Insert new rows to only keep the following:

    insert into <%=odiRef.getTable("L","TARG_NAME","A")%>
    (
    <%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%>
    <%=odiRef.getColList(",", "[COL_NAME]", ",\n\t", "", "((INS and TRG) and REW)")%>
    )
    select <%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%>
    <%=odiRef.getColList(",", "[EXPRESSION]", ",\n\t", "", "((INS and TRG) and REW)")%>
    From <%=odiRef.getTable("L","INT_NAME","A")%>

  9. Now save the IKM and create a new interface called Load LOBs where you are using the file DATA.TXT as a source and the table LOB_SAMPLE as a target. In the Overview tab, select Staging Area Different From Target and select the File logical schema that points to the DATA.TXT file in the Schema drop-down (no worries, we will not actually create anything in the staging area). Then, map the columns as follows:

    Source columns    Target Columns    Mappings
    Description       Description       DAT.Description
    Clob_File         CLOB_Data         Clob_File
    Blob_File         BLOB_Data         Blob_File

    We have removed the alias name from the mapping of both LOBs, using Clob_File and Blob_File instead of DAT.Clob_File and DAT.Blob_File; otherwise the external table definition generated by the KM would contain invalid code.

  10. In the QuickEdit tab, select the option UD1 for the CLOB column, and the option UD2 for the BLOB column.

  11. In the Flow tab, click on the target data server to select the IKM File SQL Append (LOB).

  12. Save and run the interface. You can check for successful execution in the operator navigator. If you run a select statement against the table LOB_SAMPLE, you can confirm that you have just loaded a CLOB and a BLOB.

How it works...

In this recipe, we are taking advantage of multiple elements that are at our disposal when writing KMs.

Multi-technology IKMs can only be used when the staging area is not on the target server, hence their name, since they can connect to a different technology to access the staging area.

By forcing the staging area on the source data server, we eliminate the need for an LKM; source and staging are now on the same server.

As a result, by using this multi-technology IKM, we bypass the creation of both C$ and I$ staging tables. First, since we do not have an LKM, there is no C$ table. Second, since we use an external table to map the file into the database, the data will go straight from the file into the target table, removing the need for an I$ table. One thing to remember with such IKMs though: you must use the source technology as your staging area as we did in step 9 of this recipe. This can be quite counter-intuitive when you are using a flat file as your source, but since we are not creating any staging table, we are safe doing so.

Next, because of the specific nature of CLOBs and BLOBs, we need to use the name of the columns of both the target table and source table to generate the proper code that will create the external table (this is the reason why we are removing the alias name from the mappings in step 9). This allows us to leverage the tags [COL_NAME] and [EXPRESSION] to retrieve the target and source column names respectively, as we did in step 8:

COLUMN TRANSFORMS (<%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",\n", "", "UD1")%>

Finally, we take advantage of the flags available in the QuickEdit view to explicitly mark the CLOBs and BLOBs columns as UD1 and UD2, so that the proper code from the KM can be applied to these columns specifically.

There's more...

We have over-simplified the KM to focus on the load of the LOBs, but the techniques used here can be expanded upon. By relying on other techniques described in this book, you can avoid the manipulation of the aliases in the mappings, for instance. This could have been done by using the .replace() function available in Java, as sketched below.
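For illustration only, and assuming the source alias is DAT as in the interface of this recipe, the calls to odiRef.getColList() in the KM could strip the alias from the generated expressions directly:

    <%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",\n", "", "UD1").replace("DAT.", "")%>

With that in place, the default DAT.Clob_File and DAT.Blob_File mappings could be kept in the interface.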

Likewise, listing the CLOBs and BLOBs in the COLUMN TRANSFORMS section of the external table definition could have been handled with a for loop.

Several KMs have been written by the ODI community. A Google search on ODI and LOB returns too many results for us to list them all here, but they are worth a look if you are interested in this subject. A good starting point is http://java.net/projects/oracledi, where KMs are shared by the ODI developers community.


Using ODI versioning

Versioning source code is a key element of a successful development lifecycle. All ODI objects can be exported as XML files if you want to store them in an external source control system, but one limitation of this approach is that comparing different versions of the same object is not trivial in XML form.

To make version comparisons easier, you can leverage versioning of objects directly within ODI. This recipe will show you how to take advantage of this feature.

Getting ready

To play with versioning, we will need some objects. You can use any objects of your choice, but for our step-by-step instructions, we will first create an interface based on the SRC_EMP and TRG_EMP tables.

To follow the steps described here, do the following:

  • If you have not done so earlier, create a model where you reverse engineer the SRC_EMP table from our source sample.

  • If you have not done so earlier, create a model where you reverse engineer the TRG_EMP table from our target sample.

  • Make sure that you have a project where the IKM SQL Control Append KM has been imported. If both source and target schemas are defined under the same data server in Topology, you will not need an LKM. If they are defined on separate data servers, you can use LKM SQL to Oracle, as we are dealing with very low volumes of data here.

How to do it...

  1. Create a new interface called Load Employees with SRC_EMP as a source and TRG_EMP as a target. All the mappings are straight mappings (no transforms). Use the IKM SQL Control Append in the Flow tab and set the FLOW_CONTROL option to False. Set the option TRUNCATE to True because there is most likely data left over from earlier recipes in your target table. Save the interface. Run the interface to make sure that it runs successfully.

  2. In the Overview tab of the interface, select the Versions panel on the left. Click on the green + sign to add a new version; keep the default version number (1.0.0.0) and add this Description: Straight mapping from SRC_EMP to TRG_EMP. Then click on OK to create the version. You should now have a version listed as follows:

  3. Now go back into the interface Mapping tab and replace the current mapping for HIREDATE with sysdate. In the Flow tab, set the option FLOW_CONTROL to True. Save, close, and re-open the interface. If you go to the Versions panel in the Overview tab, you can see that the object has been modified since a version was created:

  4. Now click on the green + sign to create a new version. Change the Version number to 2.0.0.0. In the Description field, enter this: Changed HIREDATE to sysdate. Click on OK to create the version.

  5. With several versions available, it is now possible to compare the multiple iterations of the object. You can perform a comparison by using the Versions browser available from the main menu, ODI/Version Browser…, or you can compare the different versions directly from this screen. If you press the Ctrl key, you can select both versions with your mouse cursor. Once both are highlighted, you can right-click on the selection and select Compare….

  6. Both versions of the object will be presented side by side, with a highlight on the differences between the objects:

  7. If you scroll down the description of the objects, you see more of the highlighted differences:

    To get more details on the changes from version to version, click on the + sign in front of the highlighted elements (we have removed some entries in the following image to highlight our focus):

    We can see here both the old and current version of the mapping that we have changed, along with the change in the option of our KM.

How it works...

When a user creates a version in ODI, a copy of that object is saved in the Master repository, along with the version number and the description associated with that version. When different versions of the same object are being compared, it is the objects from the Master repository that are compared. Versioning is not limited to interfaces: it is also possible to version other individual objects, such as Packages and Load Plans, as well as objects that are collections of objects, such as Folders, Models, Model Folders, and Projects.

The version browser allows you to filter the list of objects by object types (Interfaces, folders, models, and so on) and to further limit your list by selecting the name of the objects that match that type.

There's more...

At any point in time, there is only one current version of any object in the Work repository. All previous versions are saved in the Master repository. Any version can be restored in the Work repository in order to replace the current one. The restore function is available from the Versions tab of the objects (Interfaces, folders, packages, and so on) as well as from the Versions browser.

Performing a Smart Export/Smart Import

Up until version 11.1.1.6.0 of ODI, exporting and importing objects presented a particular challenge: the management of dependencies. When importing an object, the developers had to make sure that all dependent objects were in place before the object itself was imported. For instance, before importing an interface, models and KMs referenced by that interface had to already be in place. But what if they had not been exported along with the interface? With version 11.1.1.6.0, the notion of Smart Export/Smart Import solves the dependency problem. In addition, it will give you the ability to merge the new imported objects with existing objects; you can always select whether you want to keep the existing objects or overwrite them with the imported ones, one object at a time.

Getting ready

We will reuse the Load Employees interface created in the previous recipe for illustration purposes. However, you can also follow this recipe by using another object of your choosing.

How to do it...

  1. In the Designer navigator, click on the Designer drop-down icon to display the options menu and select Export…, as depicted in the following screenshot:

    From the window that pops up, select Smart Export and click on OK.

  2. Drag-and-drop the interface Load Employees into the Selected objects panel:

  3. ODI will compute the dependencies and list all the objects that are needed if you want to re-import this interface later:

    We can see here that the KM and the models that we have used to build the interface are properly listed. Topology references are also present, assuming the user has enough privileges to access that information.

    You may have more than one project listed here if your models reference KMs (RKMs, CKMs, and JKMs) that are in these other projects. Best practice would be to use Global Knowledge Modules rather than having cross-project dependencies.

  4. Now that we have all the elements we need, select the option Export as a ZIP file at the top of the window and click on Export. After reading the Export Report, press Close.

  5. To import a file generated with Smart Export, go back to the Designer drop-down icon to display the options menu and select Import… where we selected Export… in Step 1 of this recipe. Select Smart Import and click on OK. A wizard will help you select the SmartExport.zip file created earlier. The response file would only be needed if we were replaying an earlier import, so we can leave this blank.

  6. The wizard will compare incoming objects with the content of the repository and will offer actions (merge, overwrite, create a copy, ignore, reuse).

    The lowest elements of the tree (like the interface here) cannot be merged; you have to choose which version of the interface to use.

  7. Once you have reviewed the options for the import (we will keep the default here to Overwrite the existing objects with the ones that we are importing) you can click Next to see if there are any potential issues identified by the wizard, and if none are listed, you can click Finish to proceed with the import.

How it works...

The Smart Export and Smart Import features are based on new SDK APIs available as of ODI 11.1.1.6.0. They validate dependencies before the export takes place and before the import takes place. If missing dependencies are detected, the import can still be done, but imported objects will be highlighted and missing objects will be listed so that developers can later import the missing objects to fix this inconsistency. This whole concept is extremely important, as it relieves developers of the burden of keeping track of these dependencies.

At import time, when the object that is being imported already exists in the repository, the user has the ability to choose among the following options:

  • Overwrite: Overwrite an existing object when importing a new version of that object

  • Create Copy: Create a copy of the object that is being imported so that both the existing copy and the imported copy coexist after import

  • Reuse: Reuse the existing object and discard the imported object, but objects linked to the discarded object can still be imported

  • Ignore: Ignore the imported object and its dependencies

There's more...

The Smart Export and Smart Import features are not limited to the Designer navigator; they can be found in the Topology navigator as well.

The same features can also be leveraged from the ODI SDK for a programmatic approach to object exports and imports. Some customers have chosen this approach to automate source code management and code promotion.

Accessing an Excel spreadsheet

As much as we might want all data to reside in databases, a large amount of data still resides in other formats. In this recipe, we will look into how to read data from an Excel spreadsheet, as this can present some unique challenges.

Getting ready

To connect to an Excel spreadsheet, ODI requires two components: the ODBC driver provided by Microsoft and the ODI ODBC/JDBC bridge that ships with ODI.

If you are using Excel on a Windows 64-bit platform, you will have to make sure that you have a 64-bit version of the ODBC driver. You can download the proper drivers from the Microsoft support website. The 64-bit ODBC driver for Office 2010 is available at the Microsoft download center at this URL: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=13255

Before getting started, we will need an Excel spreadsheet. To make things simple, you can create a spreadsheet with the following data:

EMPNO    ENAME          JOB       HIREDATE    DEPTNO
10       J Tanake       Intern    1/1/2013    10
11       I Shawnee      Intern    1/1/2013    10
12       W Welkert      Intern    1/1/2013    20
13       K Thulsberg    Intern    1/1/2013    20

Save the spreadsheet in the c:\temp directory and call it interns.xlsx.

How to do it...

  1. The first thing we have to do is to expose the data through the ODBC driver. With the spreadsheet open in Excel, select all the data (including the header row) and name the area Interns (you can type that name in the top left corner).

    Save and close the spreadsheet.

  2. Open the Microsoft Data Sources (ODBC) tool:

    In the ODBC Data Source Administrator, you can add either a User DSN or a System DSN; the process is the same, but a User DSN will only be visible to the user currently logged in, whereas a System DSN will be available to all users. Click on Add… to create a new DSN and select the Microsoft Excel Driver option:

    Click on Finish and you will be prompted to name the connection and select the file; here we type Interns for the Data Source Name, and you have to click on Select Workbook to select the file c:\temp\interns.xlsx. When you click on OK, the new data source will be listed in the original window.

  3. In ODI, select the Topology navigator to add a new data server under the technology Microsoft Excel. Use the name XL_Interns. Neither a username nor a password is required. In the JDBC tab, select the Sun JDBC-ODBC driver sun.jdbc.odbc.JdbcOdbcDriver and update the JDBC URL with the name of the data source: jdbc:odbc:Interns. Under this data server, create a physical schema (you can keep the default name of XL_Interns_Default). Then, create a logical schema called XL_Interns to point to this physical schema.

  4. In the Designer navigator, create a model called EXCEL_Interns. Set the Technology to Microsoft Excel, and select the XL_Interns logical schema:

    Then click on the Selective Reverse-Engineering tab and select Selective Reverse-Engineering, New Datastores, and Objects to Reverse Engineer. You should see one table listed: Interns. This matches the named area that we created in the spreadsheet. Make sure that the table is selected and click on Reverse Engineer:

  5. You can now use your Excel spreadsheet as any other table in your interfaces. To confirm that the data is fully available, you can right-click on the table name under the newly created model in the object tree and select the data entry in the menu: you will see the data from the spreadsheet:

How it works...

The behavior is actually quite simple: the ODBC driver exposes all the named areas as tables. ODI uses a JDBC driver that in turn invokes the ODBC driver. The tables become visible to ODI, which can reverse engineer the metadata. Once again, the metadata is served up by the ODBC driver, which chooses the data types for the different columns.
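If you want to verify the bridge outside of ODI, a small Jython sketch can query the named area through the same components this recipe relies on (the Interns DSN created in step 2 and the sun.jdbc.odbc.JdbcOdbcDriver bridge selected in step 3). This is only a connectivity test under those assumptions, not a step of the recipe:

    from java.lang import Class
    from java.sql import DriverManager

    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver")             # load the JDBC-ODBC bridge
    conn = DriverManager.getConnection("jdbc:odbc:Interns")   # the DSN created in step 2
    stmt = conn.createStatement()
    rs = stmt.executeQuery("select EMPNO, ENAME from Interns")  # the named area is exposed as a table
    while rs.next():
        print rs.getString("EMPNO"), rs.getString("ENAME")
    rs.close()
    conn.close()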

There is an alternative syntax for the ODBC driver that combines the previous steps 2 and 3 in the sense that you can skip step 2, and enter all the necessary information when specifying the JDBC URL. Note that this syntax will vary depending on the version of Microsoft Excel that you are using.

For versions of Microsoft Excel older than 2007, you have to use the following syntax for the JDBC URL:

jdbc:odbc:Excel Files;DBQ=<file_path>

For versions of Microsoft Excel including 2007 and later, you have to use the following syntax for the JDBC URL:

jdbc:odbc:Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=<file_path>;DriverID=22;READONLY=true;

These syntaxes still leverage the Microsoft ODBC driver, and as such require the same naming of the tables within the spreadsheet.

You will also want to double-check the data types returned by ODBC, as the driver can be picky; a number with an empty cell could be interpreted as a string, for instance. Microsoft recommends not mixing text and numbers in the same column, or setting the format of the cells in the spreadsheet to force the format returned by the driver. More details are available on the Microsoft support website: http://support.microsoft.com/kb/141284

There's more...

There are a number of situations that can be problematic with Excel spreadsheets. The first one is that you may not be running your ODI agent on a Windows machine. There are two possible solutions here:

  • Install a separate ODI agent on the Windows machine such that the ODBC driver, data source definition, and the file become local resources to that agent.

  • Instead of using an ODBC approach, you can try to connect directly with third-party JDBC drivers. JDBC drivers for Excel are available from http://xlsql.SourceForge.net, http://odiexperts.com, and http://www.hxtt.com, among others. You will want to check ownership and possible restrictions around the use of these drivers with their respective owners.

The next challenge is the fact that we have to create a named area in the spreadsheet. This is a limitation of the ODBC driver, which can be annoying if you receive spreadsheets that do not have named areas. One thing to keep in mind is that the naming is only needed for the reverse-engineering process. Once ODI knows the structure of the spreadsheet, you can modify the definition of the ODI datastore (aka table) to point to the sheet itself. So if you kept the default name of Sheet1 in your spreadsheet, you can replace the resource name of the datastore with [SHEET1$] (brackets and uppercase are both required). Try it out and view the data: the result will be the same!

Summary

This article addressed questions we've often had to answer, including misconceptions about how the product should be used, and little-known secrets that we didn't want to leave aside.


About the Authors


Christophe Dupupet

Christophe Dupupet is a Director in the Fusion Middleware Architects Team, where he leads the expertise on ODI. The team works closely with strategic customers that implement ODI, and helps define best practices on the product in terms of architecture, implementation, and operations.

Prior to Oracle, Christophe was part of the team that started the operations for Sunopsis in the US, where he led the technical team (presales, support, and training). Sunopsis was acquired by Oracle in 2006.

Christophe holds an Operations Research degree from EISTI in France, a Masters Degree in Operations Research from Florida Tech, and a certificate in Management from Harvard University.

Denis Gray

Denis Gray is a Director of Product Management for Data Integration at Oracle. Denis has over 15 years of experience in the data-integration field. For the past seven years, Denis has been an integral part of Oracle's development organization as a Product Manager within Fusion Middleware, delivering data integration solutions. Prior to this, Denis was a data integration consultant for Hyperion Solutions (Oracle). Here, Denis worked at many of the largest Fortune 100 companies, building data warehouses and implementing business intelligence solutions. Denis has a Bachelor's Degree in Computer Science from the University of Missouri and currently resides in St. Louis, MO. Denis also co-authored the Packt book Getting Started with Oracle Data Integrator 11g: A Hands-on Tutorial.

Julien Testut

Julien Testut is a Product Manager in the Oracle Data Integration group focusing on Oracle Data Integrator. Julien has an extensive background in Data Integration and Data Quality solutions and is a co-author of Getting Started with Oracle Data Integrator 11g: A Hands-on Tutorial. Prior to joining Oracle, he was an Applications Engineer at Sunopsis, which was then acquired by Oracle. Julien holds a Masters degree in Software Engineering.

Peter C. Boyd-Bowman

Peter C. Boyd-Bowman is a Technical Director and Consultant with the Oracle Corporation. He has over 30 years of software engineering and database management experience, including 12 years of focused interest in data warehousing and business intelligence. Capitalizing on his extensive background in Oracle database technologies dating back to 1985, he has spent recent years specializing in data migration. After many successful project implementations using Oracle Warehouse Builder, and shortly after Oracle's acquisition of the Sunopsis Corporation, he switched his area of focus over to Oracle's flagship ETL product: Oracle Data Integrator. Peter holds a BS degree in Industrial Management and Computer Science from Purdue University and currently resides in North Carolina.

Peter is a co-author of the book Getting Started with Oracle Data Integrator 11g: A Hands-on Tutorial.
