Services To Generate ETL Load Code

how would you like to take: a) a source model, b) target model, c) cross-reference (source-to-target) spread-sheet, push it in to a service, and automatically generate target etl loading code?   i’ve got a service i’m working on that does just that.  i hope it’s available soon.  for now, here’s some preliminary information & an example that generates owb / omb script code from the inputs mentioned above.

for now, i generate only oracle warehouse builder code.  shortly (within the next few weeks) i’ll be generating informatica v9 loading code.

required inputs:

  • source data model ddl
  • target data model ddl
  • xlsx or xls (excel spreadsheet) source to target cross-reference

right now, i generate code to load data from a staging area to a hub and a link target, this afternoon i’ll be generating code to load and end-date the satellites.

example source data model ddl:

create table "stage"."ahltat_diagnosis"
(
   "doc_ref"             number   constraint  "sys_c0011445" not null ,
   "createtime"          date  null ,
   "createuserncid"      number(20)  null ,
   "modifyuserncid"      number(20)  null ,
   "modifytime"          date  null ,
   "priority"            varchar2(50 byte)  null ,
   "diagnosescomment"    clob  null ,
   "extendedicd9"        varchar2(50 byte)  null ,
   "sensitivity"         varchar2(50 byte)  null ,
   "updatedon"           date  null ,
   "updatedby"           number(20)  null ,
   "tmip_status"         number  null ,
   "updateflag"          varchar2(1 byte)  null ,
   "cdr_dataid"          number(20)  null ,
   "createdon"           date  null ,
   "createdby"           number(20)  null ,
   "prefix"              varchar2(5 byte)  null ,
   "clinicianncid"       number(20)  null ,
   "encounternumber"     number(20)  null ,
   "unit_number"         number(10)  null ,
   "medcinid"            number(20)  null ,
   "dataid"              number  null ,
   "diagnosisncid"       number(20)  null ,
   "facilityncid"        number(20)  null ,
   "billingcodencid"     number(20)  null ,
   "icd9ncid"            number  null ,
   "sourcencid"          number(20)  null ,
   "newfollowup"         varchar2(50 byte)  null ,
   "pointofcarefacilityncid"  number(20)  null ,
   "statusncid"          number(20)  null ,
   "chronicityncid"      number(20)  null ,
   "noteddate"           date  null ,
   "onsetdate"           date  null ,
   "rs_forcenull"        varchar2(15 byte)  null
);
 
comment on table "stage"."ahltat_diagnosis" is 'records diagnosis information provided by the clinician.';
comment on column "stage"."ahltat_diagnosis"."createtime" is 'createtime is the time created the diagnoses.';
comment on column "stage"."ahltat_diagnosis"."createuserncid" is 'createuserncid is the create user numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."modifyuserncid" is 'modifyuserncid is the modify user numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."modifytime" is 'time modified.';
comment on column "stage"."ahltat_diagnosis"."priority" is 'indicates the order, whether the primary diagnosis or secondary, etc., diagnosis.';
comment on column "stage"."ahltat_diagnosis"."diagnosescomment" is 'text field for amplification of the diagnosis.';
comment on column "stage"."ahltat_diagnosis"."sensitivity" is 'sensitivity indicates the sensitivity of the diagnoses.';
comment on column "stage"."ahltat_diagnosis"."updatedon" is 'updatedon is the date who updated.';
comment on column "stage"."ahltat_diagnosis"."updatedby" is 'updatedby is who updated.';
comment on column "stage"."ahltat_diagnosis"."tmip_status" is 'the tmip_status is for data manager transfer from lddb to tmip.';
comment on column "stage"."ahltat_diagnosis"."updateflag" is 'updateflag is the flag to indicate if update.';
comment on column "stage"."ahltat_diagnosis"."cdr_dataid" is 'crd_dataid is the crd data identification number.';
comment on column "stage"."ahltat_diagnosis"."createdon" is 'createdon is the created date.';
comment on column "stage"."ahltat_diagnosis"."createdby" is 'createdby is who created.';
comment on column "stage"."ahltat_diagnosis"."clinicianncid" is 'clinicncid is the clinic numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."encounternumber" is 'sequential number automatically assigned to an encounter for a selected patient.';
comment on column "stage"."ahltat_diagnosis"."unit_number" is 'unit_number is number for the unit.';
comment on column "stage"."ahltat_diagnosis"."medcinid" is 'medcinid is the mediine identification number.';
comment on column "stage"."ahltat_diagnosis"."dataid" is 'dataid is the unique data id for the patient.';
comment on column "stage"."ahltat_diagnosis"."diagnosisncid" is 'diagnosysncid is the diagnosys numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."facilityncid" is 'facilityncid is the facility unique numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."billingcodencid" is 'billingcodencid is the billiing code numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."icd9ncid" is 'icd9ncid is the icd9 numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."sourcencid" is 'sourcencid is the source numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."newfollowup" is 'newfollowup is the follow up.';
comment on column "stage"."ahltat_diagnosis"."pointofcarefacilityncid" is 'point of care facility unique id. it is unique for the machine/server on which the clinical note was created.';
comment on column "stage"."ahltat_diagnosis"."statusncid" is 'statusncid is the status numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."chronicityncid" is 'chronicityncid is the chronicity numerical concept identification number.';
comment on column "stage"."ahltat_diagnosis"."noteddate" is 'notedate is the date noted.';
comment on column "stage"."ahltat_diagnosis"."onsetdate" is 'onsetdate the date on set.';

target ddl:

create table vault.hub_facility (
   fac_sqn number (12) not null,
   fac_ldts date not null,
   fac_rsrc varchar2 (20 byte) not null,
   fac_id varchar2 (100 byte) not null
) ;
comment on table vault.hub_facility is '<business_name>facilities</business_name>';
comment on column vault.hub_facility.fac_id is 'facilityncid is the facility unique numerical concept identification number.';

cross-reference example:

source table source column group target table target column
ahltat_diagnosis doc_ref 1 sat_ahltat_diagnosis doc_ref
  dataid 1 hub_diagnosis diagnosis_dataid
  facilityncid 1 hub_facility fac_id
  diagnosisncid 1 sat_ahltat_diagnosis diagnosisncid
  encounternumber 1 hub_event evnt_id
  clinicianncid 1 hub_clinician clinician_ncid
  unit_number 1 hub_unit unit_id
  medcinid 1 hub_medcin medcin_id
  createtime 1 sat_ahltat_diagnosis createtime
  createuserncid 1 sat_ahltat_diagnosis createuserncid

omb source script:

ombcreate table 'ahltat_diagnosis' 
  set properties (description) values ('records diagnosis information provided by the clinician.') 
  add column 'doc_ref' 
  set properties (datatype,not_null,description) values ('number',0,'') 
  add column 'createtime' 
  set properties (datatype,not_null,description) values ('date',0,'createtime is the time created the diagnoses.') 
  add column 'createuserncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'createuserncid is the create user numerical concept identification number.') 
  add column 'modifyuserncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'modifyuserncid is the modify user numerical concept identification number.') 
  add column 'modifytime' 
  set properties (datatype,not_null,description) values ('date',0,'time modified.') 
  add column 'priority' 
  set properties (datatype,length,not_null,description) values ('varchar2',50,0,'indicates the order, whether the primary diagnosis or secondary, etc., diagnosis.') 
  add column 'diagnosescomment' 
  set properties (datatype,not_null,description) values ('clob',0,'text field for amplification of the diagnosis.') 
  add column 'extendedicd9' 
  set properties (datatype,length,not_null,description) values ('varchar2',50,0,'') 
  add column 'sensitivity' 
  set properties (datatype,length,not_null,description) values ('varchar2',50,0,'sensitivity indicates the sensitivity of the diagnoses.') 
  add column 'updatedon' 
  set properties (datatype,not_null,description) values ('date',0,'updatedon is the date who updated.') 
  add column 'updatedby' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'updatedby is who updated.') 
  add column 'tmip_status' 
  set properties (datatype,not_null,description) values ('number',0,'the tmip_status is for data manager transfer from lddb to tmip.') 
  add column 'updateflag' 
  set properties (datatype,length,not_null,description) values ('varchar2',1,0,'updateflag is the flag to indicate if update.') 
  add column 'cdr_dataid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'crd_dataid is the crd data identification number.') 
  add column 'createdon' 
  set properties (datatype,not_null,description) values ('date',0,'createdon is the created date.') 
  add column 'createdby' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'createdby is who created.') 
  add column 'prefix' 
  set properties (datatype,length,not_null,description) values ('varchar2',5,0,'') 
  add column 'clinicianncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'clinicncid is the clinic numerical concept identification number.') 
  add column 'encounternumber' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'sequential number automatically assigned to an encounter for a selected patient.') 
  add column 'unit_number' 
  set properties (datatype,precision,scale,not_null,description) values ('number',10,0,0,'unit_number is number for the unit.') 
  add column 'medcinid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'medcinid is the mediine identification number.') 
  add column 'dataid' 
  set properties (datatype,not_null,description) values ('number',0,'dataid is the unique data id for the patient.') 
  add column 'diagnosisncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'diagnosysncid is the diagnosys numerical concept identification number.') 
  add column 'facilityncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'facilityncid is the facility unique numerical concept identification number.') 
  add column 'billingcodencid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'billingcodencid is the billiing code numerical concept identification number.') 
  add column 'icd9ncid' 
  set properties (datatype,not_null,description) values ('number',0,'icd9ncid is the icd9 numerical concept identification number.') 
  add column 'sourcencid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'sourcencid is the source numerical concept identification number.') 
  add column 'newfollowup' 
  set properties (datatype,length,not_null,description) values ('varchar2',50,0,'newfollowup is the follow up.') 
  add column 'pointofcarefacilityncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'point of care facility unique id. it is unique for the machine/server on which the clinical note was created.') 
  add column 'statusncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'statusncid is the status numerical concept identification number.') 
  add column 'chronicityncid' 
  set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'chronicityncid is the chronicity numerical concept identification number.') 
  add column 'noteddate' 
  set properties (datatype,not_null,description) values ('date',0,'notedate is the date noted.') 
  add column 'onsetdate' 
  set properties (datatype,not_null,description) values ('date',0,'onsetdate the date on set.') 
  add column 'rs_forcenull' 
  set properties (datatype,length,not_null,description) values ('varchar2',15,0,'')
ombcommit

omb target script:

ombcreate table 'hub_facility' 
  set properties (description) values ('<business_name>facilities</business_name>') 
  add column 'fac_sqn' 
  set properties (datatype,precision,scale,not_null,description) values ('number',12,0,0,'') 
  add column 'fac_ldts' 
  set properties (datatype,not_null,description) values ('date',0,'') 
  add column 'fac_rsrc' 
  set properties (datatype,length,not_null,description) values ('varchar2',20,0,'') 
  add column 'fac_id' 
  set properties (datatype,length,not_null,description) values ('varchar2',100,0,'facilityncid is the facility unique numerical concept identification number.')
ombcommit

omb sequence script:

ombcreate sequence 'sqn_hub_facility' 
  set properties(deployable,increment_by,start_with) 
  values ('false',1,0)
ombcommit

omb mapping script:

ombcreate mapping 'm_hub_diagnosis_ahltat_diagnos' 
 set properties (business_name, description) 
 values ('load hub_diagnosis from ahltat_diagnosis', 'rapidace generated mapping') 
 add table operator 'hub_diagnosis' bound to table 'hub_diagnosis'
 add table operator 'ahltat_diagnosis' bound to table '../stage/ahltat_diagnosis'
 add deduplicator operator 'dd_unique'
 add expression operator 'exp_chg_datatype'
 add lookup operator 'lkp_target_key' bound to table 'hub_diagnosis'
 add filter operator 'fil_existing_recs'
 add attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique'
   set properties (datatype,description) values ('number','dataid is the unique data id for the patient.')
 add attribute 'dataid' of group 'ingrp1' of operator 'exp_chg_datatype'
   set properties (datatype,description) values ('number','dataid is the unique data id for the patient.')
 add attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype'
   set properties (datatype,description,expression) values ('number','dataid is the unique data id for the patient.','ingrp1.dataid')
 add attribute 'odiagnosis_dataid' of group 'ingrp1' of operator 'lkp_target_key'
   set properties (datatype,description) values ('number','dataid is the unique data id for the patient.')
 add attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs'
   set properties (datatype,description) values ('number','dataid is the unique data id for the patient.')
 add connection from attribute 'dataid' of group 'inoutgrp1' of operator 'ahltat_diagnosis'
   to attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique'
 add connection from attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique'
   to attribute 'dataid' of group 'ingrp1' of operator 'exp_chg_datatype'
 add connection from attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype'
   to attribute 'odiagnosis_dataid' of group 'ingrp1' of operator 'lkp_target_key'
 add connection from attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype'
   to attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs'
 add connection from attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs'
   to attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'hub_diagnosis'
 add attribute 'ldiagnosis_sqn' of group 'inoutgrp1' of operator 'fil_existing_recs'
   set properties (datatype,precision,scale,description) values ('number',12,0,'')
 add connection from attribute 'diagnosis_sqn' of group 'outgrp1' of operator 'lkp_target_key'
   to attribute 'ldiagnosis_sqn' of group 'inoutgrp1' of operator 'fil_existing_recs'
 add constant operator 'cnst_ldts_rsrc'
 add attribute 'diagnosis_ldts' of group 'outgrp1' of operator 'cnst_ldts_rsrc'
   set properties (datatype,description,expression) values ('date','the date and time the record was loaded.','sysdate()')
 add attribute 'diagnosis_rsrc' of group 'outgrp1' of operator 'cnst_ldts_rsrc'
   set properties (datatype,length,description,expression) values ('varchar2',20,'the record source of the data.','[ombtosettablestring 'ahltat_diagnosis']')
 add connection from attribute 'diagnosis_ldts' of group 'outgrp1' of operator 'cnst_ldts_rsrc'
   to attribute 'diagnosis_ldts' of group 'inoutgrp1' of operator 'hub_diagnosis'
 add connection from attribute 'diagnosis_rsrc' of group 'outgrp1' of operator 'cnst_ldts_rsrc'
   to attribute 'diagnosis_rsrc' of group 'inoutgrp1' of operator 'hub_diagnosis'
 add sequence operator 'sqn_hub_diagnosis' bound to sequence 'sqn_hub_diagnosis'
 add connection from attribute 'nextval' of group 'outgrp1' of operator 'sqn_hub_diagnosis'
   to attribute 'diagnosis_sqn' of group 'inoutgrp1' of operator 'hub_diagnosis'
ombcommit
ombalter mapping 'm_hub_diagnosis_ahltat_diagnos'
 modify operator 'fil_existing_recs' set properties (filter_condition) values ('inoutgrp1.ldiagnosis_sqn is null')
 modify operator 'lkp_target_key' set properties (lookup_condition) values ('ingrp1.odiagnosis_dataid = outgrp1.diagnosis_dataid')

this is all pattern based generation.  if you change the cross-reference, it will change the generation.  if you change the source or target model, it will change the generation… 

i can produce 100% error free etl mappings for multiple etl tools this way.  would you be interested in a service like this?  

you can contact me directly for my technology (which is rapidace based, or you can contact my friends)  at: http://www.analytixids.com – they’ve got some really cool technology.

cheers,
dan linstedt

Tags: , , , , ,

No comments yet.

Leave a Reply

*