This chapter describes how to create a model, how to reverse-engineer
this model to populate it with datastores and how to create manually
datastores of a model. This chapter also explains how to use
partitioning and check the quality of the data in a model.
Models as well as all their components are based on the relational paradigm (table, columns, keys, etc.). Models in Data Integrator only contain Metadata, that is the description of the data structures. They do not contain a copy of the actual data.
Models can be organized into model folders and the datastores of a model can be organized into sub-models. Section 18.2, "Organizing Models with Folders" describes how to create and organize model folders and sub-models.
A datastore describes data in a tabular structure. Datastores are composed of columns.
As datastores are based on the relational paradigm, it is also possible to associate the following elements to a datastore:
This framework allows to perform:
Oracle Data Integrator is able to reverse-engineer models containing datastore shortcuts. For more information, see Chapter 17, "Working with Shortcuts".
Journalizing can be applied to models, sub-models or datastores based on certain type of technologies.
For information about setting up Changed Data Capture, see Chapter 7, "Working with Changed Data Capture".
To create a Model:
To perform a Standard Reverse- Engineering:
For example, for the Oracle technology, the RKM Oracle accesses the database dictionary tables to retrieve the definition of tables, columns, keys, etc, that are created in the model.
To perform a Customized Reverse-Engineering using a RKM:
Filters are not constraints but are defined similarly to Conditions. A Filter is not used to enforce a data quality rule on a datastore, but is used to automatically filter this datastore when using it as a source.
To edit a datastore's data:
It is possible to edit a datastore's data if the connectivity used and the data server user's privileges allow it, and if the datastore structure enables to identify each row of the datastore (PK, etc.).
The partitioning methods supported depend on the technology of the datastore. For example, for the Oracle technology the following partitioning methods are supported: Range, Hash, List.
Once defined on a datastore, partitions can be selected when this datastore is used as a source or a target of an interface. Refer to Chapter 11, "Working with Integration Interfaces" for information.
If using the common format designer, you can also create partitions when performing the Generate DDL operation.
To define manually partitions and sub-partitions for a datastore:
Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, etc. In addition, such incorrect data may propagate via integration flows.
There are two different main types of controls: Static Control and Flow Control. We will examine the differences between the two.
With Oracle Data Integrator, you can refine the validation of your data by defining additional constraints, without implementing them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data. Note that the active database constraints (these are those that have Defined in the Database and Active selected on the Controls tab) need no additional control from Oracle Data Integrator since they are already controlled by the database.
The result of a synchronous control is not persistent. This type of control is used to quickly evaluate the validity of a constraint definition.
5.1 Introduction to Models
A Model is the description of a set of datastores. It corresponds to a group of tabular data structures stored in a data server. A model is based on a Logical Schema defined in the topology. In a given Context, this Logical Schema is mapped to a Physical Schema. The Data Schema of this Physical Schema contains physical data structure: tables, files, JMS messages, elements from an XML file, that are represented as datastores.Models as well as all their components are based on the relational paradigm (table, columns, keys, etc.). Models in Data Integrator only contain Metadata, that is the description of the data structures. They do not contain a copy of the actual data.
Note:
Frequently used technologies have their reverse and model creation methods detailed in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator.5.1.1 Datastores
A datastore represents a data structure. It can be a table, a flat file, a message queue or any other data structure accessible by Oracle Data Integrator.A datastore describes data in a tabular structure. Datastores are composed of columns.
As datastores are based on the relational paradigm, it is also possible to associate the following elements to a datastore:
-
Keys
A Key is a set of columns with a specific role in the relational paradigm. Primary and Alternate Keys identify each record uniquely. Non-Unique Indexes enable optimized record access.
-
References
A Reference is a functional link between two datastores. It corresponds to a Foreign Key in a relational model. For example: The INVOICE datastore references the CUSTOMER datastore through the customer number.
-
Conditions and Filters
Conditions and Filters are a WHERE-type SQL expressions attached to a datastore. They are used to validate or filter the data in this datastore.
5.1.2 Data Integrity
A model contains constraints such as Keys, References or Conditions, but also non-null flags on columns. Oracle Data Integrator includes a data integrity framework for ensuring the quality of a data model.This framework allows to perform:
-
Static Checks to verify the integrity of
the data contained in a data model. This operation is performed to
assess the quality of the data in a model when constraints do not
physically exist in the data server but are defined in Data Integrator
only.
-
Flow Check to verify the integrity of a
data flow before it is integrated into a given datastore. The data flow
is checked against the constraints defined in Oracle Data Integrator for
the datastore that is the target of the data flow.
5.1.3 Reverse-engineering
A new model is created with no datastores. Reverse-engineering is the process that populates the model in Oracle Data Integrator by retrieving metadata from the data server containing the data structures. There are two different types of reverse-engineering:-
Standard reverse-engineering uses standard
JDBC driver features to retrieve the metadata. Note that unique keys are
not reverse-engineered when using a standard reverse-engineering.
-
Customized reverse-engineering uses a
technology-specific Reverse Knowledge Module (RKM) to retrieve the
metadata, using a method specific to the given technology. This method
is recommended if a technology specific RKM exists because it usually
retrieves more information than the Standard reverse-engineering method.
See the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator for a list of available RKMs.
Oracle Data Integrator is able to reverse-engineer models containing datastore shortcuts. For more information, see Chapter 17, "Working with Shortcuts".
5.1.4 Changed Data Capture
Change Data Capture (CDC), also referred to as Journalizing, allows to trap changes occurring on the data. CDC is used in Oracle Data Integrator to eliminate the transfer of unchanged data. This feature can be used for example for data synchronization and replication.Journalizing can be applied to models, sub-models or datastores based on certain type of technologies.
For information about setting up Changed Data Capture, see Chapter 7, "Working with Changed Data Capture".
5.2 Creating and Reverse-Engineering a Model
Now that the key components of an ODI model have been described, an overview is provided on how to create and reverse-engineer a model.5.2.1 Creating a Model
A Model is a set of datastores corresponding to data structures contained in a Physical Schema.To create a Model:
-
In Designer Navigator expand the Models panel.
-
Right-click then select New Model.
-
Fill in the following fields in the Definition tab:
-
Name: Name of the model used in the user interface.
-
Technology: Select the model's technology.
-
Logical Schema: Select the Logical Schema on which your model will be based.
-
Name: Name of the model used in the user interface.
-
Go to the Reverse tab, and select a Context which will be used for the model's reverse-engineering.
Note that if there is only one context that maps the logical schema, this context will be set automatically.
-
Select Save from the File main menu.
5.2.2 Reverse-engineering a Model
To automatically populate datastores into the model you need to perform a reverse-engineering for this model.
Standard Reverse-Engineering
A Standard Reverse-Engineering uses the capacities of the JDBC driver
used to connect the data server to retrieve the model metadata.To perform a Standard Reverse- Engineering:
-
In the Reverse tab of your Model:
-
Select Standard.
-
Select the Context used for the reverse-engineering
-
Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
-
Enter in the Mask field the mask of tables
to reverse engineer. The mask selects the objects to reverse. This mask
uses the SQL LIKE syntax. The percent (%) symbol means zero or more
characters, and the underscore (_) symbol means one character.
-
Optionally, you can specify the characters to remove for the table alias.
These are the characters to delete in order to derive the alias. Note
that if the datastores already exist, the characters specified here will
not be removed from the table alias. Updating a datastore is not
applied to the table alias.
-
Select Standard.
-
In the Selective Reverse tab select Selective Reverse, New Datastores, Existing Datastores and Objects to Reverse.
-
A list of datastores to be reverse-engineered appears. Leave those you wish to reverse-engineer checked.
-
Select Save from the File main menu.
-
Click Reverse Engineer in the Model toolbar menu.
-
Oracle Data Integrator launches a reverse-engineering process for the
selected datastores. A progress bar indicates the progress of the
reverse-engineering process.
Customized Reverse-Engineering
A Customized Reverse-Engineering uses a Reverse-engineering Knowledge
Module (RKM), to retrieve metadata for a specific type of technology
and create the corresponding datastore definition in the data model.For example, for the Oracle technology, the RKM Oracle accesses the database dictionary tables to retrieve the definition of tables, columns, keys, etc, that are created in the model.
Note:
The RKM must be available as a global RKM or imported into the project. Refer to Chapter 9, "Creating an Integration Project" for more information on KM import.-
In the Reverse tab of your Model:
-
Select Customized.
-
Select the Context used for the reverse-engineering
-
Select the Types of objects to reverse-engineer. Only object of these types will be taken into account by the reverse-engineering process.
-
Enter in the Mask the mask of tables to reverse engineer.
-
Select the KM that you want to use for performing the reverse-engineering process. This KM is typically called
RKM <technology>.<name of the project>
.
-
Optionally, you can specify the characters to remove for the table alias.
These are the characters to delete in order to derive the alias. Note
that if the datastores already exist, the characters specified here will
not be removed from the table alias. Updating a datastore is not
applied to the table alias.
-
Select Customized.
-
Click Reverse Engineer in the Model toolbar menu, then Yes to validate the changes.
-
Click OK.
-
The Session Started Window appears.
-
Click OK.
5.3 Creating and Reverse-Engineering a Datastore
Although the recommended method for creating datastores in a model is reverse-engineering, it is possible to manually define datastores in a blank model. It is the recommended path for creating flat file datastores.5.3.1 Creating a Datastore
To create a datastore:-
From the Models tree in Designer Navigator, select a Model or a Sub-Model.
-
Right-click and select New Datastore.
-
In the Definition tab, fill in the following fields:
-
Name of the Datastore: This is the name that appears in the trees and that is used to reference the datastore from a project
-
Resource Name: Name of the object in the
form recognized by the data server which stores it. This may be a table
name, a file name, the name of a JMS Queue, etc.
-
Alias: This is a default alias used to prefix this datastore's columns names in expressions.
-
Name of the Datastore: This is the name that appears in the trees and that is used to reference the datastore from a project
-
If the datastore represents a flat file (delimited or fixed), in the File tab, fill in the following fields:
-
File Format: Select the type of your flat file, fixed or delimited.
-
Header: Number of header lines for the flat file.
-
Record Separator and Field Separator define the characters used to separate records (lines) in the file, and fields within one record.
Record Separator: One or several characters separating lines (or records) in the file:
-
MS-DOS: DOS carriage return
-
Unix: UNIX carriage return
-
Other: Free text you can input as characters or hexadecimal codes
-
Tabulation
-
Space
-
Other: Free text you can input as characters or hexadecimal code
-
-
File Format: Select the type of your flat file, fixed or delimited.
-
Select Save from the File main menu.
5.3.2 Reverse-Engineering File Datastores
Oracle Data Integrator provides specific methods for reverse-engineering flat files. The methods for reversing flat files are described below.5.3.2.1 Reverse-Engineering Fixed Files
Fixed files can be reversed engineered using a wizard into which the boundaries of the fixed columns and their parameters can be defined.-
Go to the Columns tab the file datastore that has a fixed format.
-
Click the Reverse button. A window opens displaying the first records of your file.
-
Click on the ruler (above the file contents) to create markers
delimiting the columns. Right-click in the ruler to delete a marker.
-
Columns are created with pre-generated names (C1, C2, and so on). You
can edit the column name by clicking in the column header line (below
the ruler).
-
In the properties panel (on the right), you can edit the parameters of the selected column.
-
You must set at least the Column Name, Datatype and Length for each column. Note that column names of File datastores cannot contain spaces.
-
Click OK when the columns definition is complete to close the wizard.
-
Select Save from the File main menu.
5.3.2.2 Reverse-Engineering Delimited Files
Delimited files can be reversed engineered using a a built-in JDBC which analyzes the file to detect the columns and reads the column names from the file header.-
Go to the Columns tab the file datastore that has a delimited format.
-
Click the Reverse button.
-
Oracle Data Integrator creates the list of columns according to your
file content. The column type and length are set to default values.
Column names are pre-generated names (C1, C2, and so on) or names taken
from the first Header line declared for this file.
-
Review and if needed modify the Column Name, Datatype and Length for each column. Note that column names of File datastores cannot contain spaces.
-
Select Save from the File main menu.
5.3.2.3 Reverse-Engineering COBOL Files
Fixed COBOL files structures are frequently described in Copybook files. Oracle Data Integrator can reverse-engineer the Copybook file structure into a datastore structure.-
Go to the Columns tab the file datastore that has a fixed format.
-
Click the Reverse COBOL Copybook button.
-
Fill in the following fields:
-
File: Location of the Copybook file.
-
Character Set: Copybook file character set.
-
Description format (EBCDIC or ASCII): Copybook file format
-
Data format (EBCDIC or ASCII): Data file format.
-
-
Click OK. The columns described in the Copybook are reverse-engineered and appear in the column list.
-
Select Save from the File main menu.
5.3.3 Adding and Deleting Datastore Columns
To add columns to a datastore:-
In the Columns tab of the datastore, click Add Column in the toolbar menu.
-
An empty line appears. Fill in the information about the new column. You should at least fill in the Name, Datatype and Length fields.
-
Repeat steps 1 and 2 for each column you want to add to the datastore.
-
Select Save from the File main menu.
-
In the Columns tab of the datastore, select the column to delete.
-
Click the Delete Column button. The column disappears from the list.
5.3.4 Adding and Deleting Constraints and Filters
Oracle Data Integrator manages constraints on data model including Keys, References, Conditions and Mandatory Columns. It includes a data integrity framework for ensuring the quality of a data model based on these constraints.Filters are not constraints but are defined similarly to Conditions. A Filter is not used to enforce a data quality rule on a datastore, but is used to automatically filter this datastore when using it as a source.
5.3.4.1 Keys
To create a key for a datastore:-
In the Designer Navigator, expand in the Model tree the model and then the datastore into which you want to add the key.
-
Select the Constraints node, right-click and select New Key.
-
Enter the Name for the constraint, and then select the Key or Index Type.
Primary Keys and Alternate Keys can be checked and can act as an update
key in an interface. Non-Unique Index are used mainly for performance
reasons.
-
In the Columns tab, select the list of columns that belong to this key.
-
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
-
By clicking the Check button, you can retrieve the number of records that do not respect this constraint.
-
Select Save from the File main menu.
5.3.4.2 References
To create a reference between two datastores:-
In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the reference.
-
Select the Constraints node, right-click and select New Reference.
-
Enter the Name for the constraint, and then select the Type for the reference. In a User Reference the two datastores are linked based on column equality. In a Complex User Reference any expression can be used to link the two datastores. A Database Reference is a reference based on column equality that has been reverse-engineered from a database engine.
-
If you want to reference a datastore that exists in a model, select the Model and the Table that you want to link to the current datastore.
-
If you want to link a table that does not exist in a model, leave the Model and Table fields undefined, and set the Catalog, Schema and Table names to identify your datastore.
-
If you are defining a User or Database reference, in the Columns tab, define the matching columns from the two linked datastores.
-
If you are defining a Complex User reference, enter in the Expression tab the expression that relates columns from the two linked datastores.
-
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
-
By clicking the Check button, you can retrieve the number of records that respect or do not respect this constraint.
-
Select Save from the File main menu.
5.3.4.3 Conditions
To create a condition for a datastore:-
In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the condition.
-
Select the Constraints node, right-click and select New Condition.
-
Enter the Name for the constraint, and then select the Type for the condition. An Oracle Data Integrator Condition is a condition that exists only in the model and does not exist in the database. A Database Condition is a condition that is defined in the database and has been reverse-engineered.
-
In the Where field enter the expression
that implements the condition. This expression is a SQL WHERE expression
that valid records should respect.
-
Type in the Message field the error message for this constraint.
-
In the Control tab, select whether this constraint should be checked by default in a Static or Flow check.
-
By clicking the Check button, you can retrieve the number of records that do not respect this constraint.
-
Select Save from the File main menu.
5.3.4.4 Mandatory Columns
To define mandatory columns for a datastore:-
In the Designer Navigator, expand in the Model tree the model containing the datastores.
-
Double-click the datastore containing the column that must be set as mandatory. The Datastore Editor appears.
-
In the Columns tab, check the Not Null field for each column that is mandatory.
-
Select Save from the File main menu.
5.3.4.5 Filter
To add a filter to a datastore:-
In the Designer Navigator, expand in the Model tree the model and then one of the datastores into which you want to add the filter.
-
Select the Filter node, right-click and select New Condition.
-
Enter the Name for the filter.
-
In the Where field enter the expression that implements the filter. This expression is a SQL WHERE expression used to filter source records.
-
In the Control tab, check Filter Active for Static Control if you want data from this table to be filtered prior to checking it a static control.
-
Select Save from the File main menu.
5.4 Editing and Viewing a Datastore's Data
To view a datastore's data:-
Select the datastore from the model in the Designer Navigator.
-
Right-click and select View Data.
To edit a datastore's data:
-
Select the datastore from the model in the Designer Navigator.
-
Right-click and select Data...
It is possible to edit a datastore's data if the connectivity used and the data server user's privileges allow it, and if the datastore structure enables to identify each row of the datastore (PK, etc.).
Note:
The data displayed is the data stored in the physical schema
corresponding to the model's logical schema, in the current working
context.5.5 Using Partitioning
Oracle Data Integrator is able to use database-defined partitions when processing data in partitioned tables used as source or targets of integration interfaces. These partitions are created in the datastore corresponding to the table, either through the reverse-engineering process or manually. For example with the Oracle technology, partitions are reverse-engineered using the RKM Oracle.The partitioning methods supported depend on the technology of the datastore. For example, for the Oracle technology the following partitioning methods are supported: Range, Hash, List.
Once defined on a datastore, partitions can be selected when this datastore is used as a source or a target of an interface. Refer to Chapter 11, "Working with Integration Interfaces" for information.
If using the common format designer, you can also create partitions when performing the Generate DDL operation.
5.5.1 Defining Manually Partitions and Sub-Partitions of Model Datastores
Partition information can be reverse-engineered along with the datastore structures or defined manually.
Note:
Standard reverse-engineering does not support the revers-engineering of
partitions. To reverse-engineer partitions and sub-partitions, you have
to use customized reverse-engineering.-
In the Models accordion, double-click the datastore for which you
want to define the partition or sub-partition. The Datastore Editor
opens.
-
In the Partitions tab, enter the following details to define the partition and sub-partition:
-
Partition by
Select the partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
-
Sub-Partition by
If you want to define sub-partitions in addition to partitions, select the sub-partitioning method. This list displays the partitioning methods supported by the technology on which the model relies.
-
Partition by
-
Click Add Partition.
-
In the Name field, enter a name for the partition, for example:
FY08
.
-
In the Description field, enter a description for the partition, for example:
Operations for Fiscal Year 08
.
-
If you want to add:
-
From the File menu, select Save.
5.6 Checking Data Quality in a Model
Data Quality control is essential in ensuring the overall consistency of the data in your information system's applications.Application data is not always valid for the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, etc. In addition, such incorrect data may propagate via integration flows.
5.6.1 Introduction to Data Integrity
Oracle Data Integrator provides a working environment to detect these constraint violation and store them for recycling or reporting purposes.There are two different main types of controls: Static Control and Flow Control. We will examine the differences between the two.
Static Control
Static Control implies the existence of rules that are used to verify
the integrity of your application data. Some of these rules (referred
to as constraints) may already be implemented in your data servers
(using primary keys, reference constraints, etc.)With Oracle Data Integrator, you can refine the validation of your data by defining additional constraints, without implementing them directly in your servers. This procedure is called Static Control since it allows you to perform checks directly on existing - or static - data. Note that the active database constraints (these are those that have Defined in the Database and Active selected on the Controls tab) need no additional control from Oracle Data Integrator since they are already controlled by the database.
Flow Control
The information systems targeted by transformation and integration processes often implement their own declarative rules. The Flow Control
function is used to verify an application's incoming data according to
these constraints before loading the data into these targets. Setting up
flow control is detailed in to Chapter 11, "Working with Integration Interfaces".5.6.2 Checking a Constraint
While creating a constraint in Oracle Data Integrator, it is possible to retrieve the number of lines violating this constraint. This action, referred as Synchronous Control is performed from the Control tab of the given constraint Editor by clicking the Check button.The result of a synchronous control is not persistent. This type of control is used to quickly evaluate the validity of a constraint definition.
5.6.3 Perform a Static Check on a Model, Sub-Model or Datastore
To perform a Static Check on a Model, Sub-Model or Datastore:-
In the Models tree in the Designer Navigator, select the model that you want to check.
-
Double-click this model to edit it.
-
In the Control tab of the model Editor, select the Check Knowledge Module (CKM) used in the static check.
-
From the File menu, select Save All.
-
Right-click the model, sub-model or datastore that you want to check in the Model tree in the Designer Navigator and select Control > Check.
-
In the Execution window, select the execution parameters:
-
Select the Context into which the data must be checked.
-
Select the Logical Agent that will run the check tasks.
-
Check the Delete Errors from the Checked Tables option if you want rows detected as erroneous to be removed from the checked tables.
-
Select the Context into which the data must be checked.
-
Click OK.
-
The Session Started Window appears.
-
Click OK.
5.6.4 Reviewing Erroneous Records
To view a datastore's errors:-
Select the datastore from the model in the Designer Navigator.
-
Right-click and select Control > Errors....
No comments:
Post a Comment
Note: only a member of this blog may post a comment.