Karim Khalifa: 2012

Thursday, July 12, 2012

Matching/Joining Data Approaches in IBM Datastage

Joining/Matching data between different data source is considered one of the most performance challenging task while implementing an ETL using IBM DataStage, The purpose of this post is to provide guidance for choosing the suitable joining/matching approach according to the source data size

Assume we have the following two data source tables that are related to each other

In order to consider the suitable joining matching approach we might have the following cases:

1) “The data Size of child table is >= Parent table” And “loading Parent table could fit the memory”

Solution approach: Use lookup stage

2) “The data Size of child table is >= Parent table” And “loading Parent table could not fit the memory”

Solution approach: Use Join stage

3) “The data Size of child table is <= Parent table” And “Child table is relatively Large compared with parent table” E.g. child table is 50 million records and parent table is 30 million records

Solution approach: Use Join stage

4) “The data Size of child table is <= Parent table” And “Child table is relatively small compared with parent table” E.g. child table is 1,000 records and parent table is 30 million records

Solution approach: Use database inner join

SELECT *

FROM CHILD_TABLE INNER JOIN PARENT_TABLE
ON CHILD_TABLE.COLUMN_NAME=PARENT_TABLE.COLUMN_NAME

Enjoy :)

Adding Automatic filters to Excel output in Cognos reports

One of the challenging report requirement is to have automatic filters in Cognos report for an excel output report ,The purpose of this post is to illustrate the trick of adding automatic filters in an Excel output report

The following are the required steps

1) Build a simple list report

2) Add two HTML Item …one before the list report component and the other after list report component

3) Inside the first HTML item add the following Html Code

<table>
<td x:autofilterrange="$A$1:$D$1" /td>

4) Inside the second HTML item add the following HTML code

</table>

Enjoy :)

Sunday, June 10, 2012

Setting DataStage Job Parameters through ETL configuration files

Setting the DataStage Parameters through easily accessible/manageable text files has been nontrivial task as a result the purpose of this post is to provide a simple guidance to set the Datastage project variables in an easy and simple way

Setting the user define variable and environment variable Text file has many advantages

During the deployment process,Manually copy/past the user defined variables between the infosphere Datastage administrator accross different servers is a time consuming task
using an accessible/manageable text files rather than infosphere Datastage administrator could be an excellent work around (infosphere Datastage administrator requires root user and on the other hand accessing the DSParms file requires only DS administrator)

Datastage Environment and User Defined variables file info

Scope: Project
File name: DSParams
Location (Unix server): #DataStageInstalledDirectory/InformationServer/Server/Projects/Project_Name

DSParams file structure :the DSParms file is composed of many sections but we are concerned only with 2 sections

1. [EnvVarDefns] Section: refers to the variables specification

Format is: <EnvVarName>\<Category>\<JobType>\<Type>[+]\<Default>\<SetAction>\<Scope>\<PromptText>\<HelpText>

e.g. Parameter_name\User Defined\-1\String\\0\Project\This is my Parm\

Item	Description
EnvVarName	is the name of the enviroment variable that will be set in OSH
Category	Is the category name where the environment variable will appear. In format <cat1>/<subcat1>/<subcat2>/... etc
JobType	is the job type number (0 = Server, 3 = Parallel, -1 = all)
Type	is one of: Number, String, FilePath, DirPath, List, Boolean, UserDef If List, then format of Type is futher divided: ...<EnvVarName>\List/<Item1>/<ItemDisplay1>/<Item2>/<ItemDisplay2>\<Scope>... (ItemDisplay values must be left blank here, and added as localised strings in envvar.cls If Boolean, then the value the envvar is set to should be irrelevant If proceeded by a '+' character, then the value set here is appended to any existing value already set in the shell, separated by a ':' character
Default	The default value
SetAction	What action should be taken when setting the environment variable at job run time: 0 = Always set if the environment variable has been overriden. 1 = Only set the environment variable if its value is different to its default 2 = Explicitly unset the environment variable if the value is set to the default, otherwise same as 1 3 = Always set the environment variable 4 = Osh Boolean. Set if true, no action if false
Scope	is one of: Project, Design, RunTime
PromptText	is the text displayed to prompt the user for the env var. If "" then <EnvVarName> will be used
HelpText	is a longer description of the env var.

2. [EnvVarValues] Section: refers to set the value of the variables
e.g. "Parmeter_name"\1\" Parmeter_value"