Services To Generate ETL Load Code
how would you like to take: a) a source model, b) target model, c) cross-reference (source-to-target) spread-sheet, push it in to a service, and automatically generate target etl loading code? i’ve got a service i’m working on that does just that. i hope it’s available soon. for now, here’s some preliminary information & an example that generates owb / omb script code from the inputs mentioned above.
for now, i generate only oracle warehouse builder code. shortly (within the next few weeks) i’ll be generating informatica v9 loading code.
required inputs:
- source data model ddl
- target data model ddl
- xlsx or xls (excel spreadsheet) source to target cross-reference
right now, i generate code to load data from a staging area to a hub and a link target, this afternoon i’ll be generating code to load and end-date the satellites.
example source data model ddl:
create table "stage"."ahltat_diagnosis" ( "doc_ref" number constraint "sys_c0011445" not null , "createtime" date null , "createuserncid" number(20) null , "modifyuserncid" number(20) null , "modifytime" date null , "priority" varchar2(50 byte) null , "diagnosescomment" clob null , "extendedicd9" varchar2(50 byte) null , "sensitivity" varchar2(50 byte) null , "updatedon" date null , "updatedby" number(20) null , "tmip_status" number null , "updateflag" varchar2(1 byte) null , "cdr_dataid" number(20) null , "createdon" date null , "createdby" number(20) null , "prefix" varchar2(5 byte) null , "clinicianncid" number(20) null , "encounternumber" number(20) null , "unit_number" number(10) null , "medcinid" number(20) null , "dataid" number null , "diagnosisncid" number(20) null , "facilityncid" number(20) null , "billingcodencid" number(20) null , "icd9ncid" number null , "sourcencid" number(20) null , "newfollowup" varchar2(50 byte) null , "pointofcarefacilityncid" number(20) null , "statusncid" number(20) null , "chronicityncid" number(20) null , "noteddate" date null , "onsetdate" date null , "rs_forcenull" varchar2(15 byte) null );
comment on table "stage"."ahltat_diagnosis" is 'records diagnosis information provided by the clinician.'; comment on column "stage"."ahltat_diagnosis"."createtime" is 'createtime is the time created the diagnoses.'; comment on column "stage"."ahltat_diagnosis"."createuserncid" is 'createuserncid is the create user numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."modifyuserncid" is 'modifyuserncid is the modify user numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."modifytime" is 'time modified.'; comment on column "stage"."ahltat_diagnosis"."priority" is 'indicates the order, whether the primary diagnosis or secondary, etc., diagnosis.'; comment on column "stage"."ahltat_diagnosis"."diagnosescomment" is 'text field for amplification of the diagnosis.'; comment on column "stage"."ahltat_diagnosis"."sensitivity" is 'sensitivity indicates the sensitivity of the diagnoses.'; comment on column "stage"."ahltat_diagnosis"."updatedon" is 'updatedon is the date who updated.'; comment on column "stage"."ahltat_diagnosis"."updatedby" is 'updatedby is who updated.'; comment on column "stage"."ahltat_diagnosis"."tmip_status" is 'the tmip_status is for data manager transfer from lddb to tmip.'; comment on column "stage"."ahltat_diagnosis"."updateflag" is 'updateflag is the flag to indicate if update.'; comment on column "stage"."ahltat_diagnosis"."cdr_dataid" is 'crd_dataid is the crd data identification number.'; comment on column "stage"."ahltat_diagnosis"."createdon" is 'createdon is the created date.'; comment on column "stage"."ahltat_diagnosis"."createdby" is 'createdby is who created.'; comment on column "stage"."ahltat_diagnosis"."clinicianncid" is 'clinicncid is the clinic numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."encounternumber" is 'sequential number automatically assigned to an encounter for a selected patient.'; comment on column "stage"."ahltat_diagnosis"."unit_number" is 'unit_number is number for the unit.'; comment on column "stage"."ahltat_diagnosis"."medcinid" is 'medcinid is the mediine identification number.'; comment on column "stage"."ahltat_diagnosis"."dataid" is 'dataid is the unique data id for the patient.'; comment on column "stage"."ahltat_diagnosis"."diagnosisncid" is 'diagnosysncid is the diagnosys numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."facilityncid" is 'facilityncid is the facility unique numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."billingcodencid" is 'billingcodencid is the billiing code numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."icd9ncid" is 'icd9ncid is the icd9 numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."sourcencid" is 'sourcencid is the source numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."newfollowup" is 'newfollowup is the follow up.'; comment on column "stage"."ahltat_diagnosis"."pointofcarefacilityncid" is 'point of care facility unique id. it is unique for the machine/server on which the clinical note was created.'; comment on column "stage"."ahltat_diagnosis"."statusncid" is 'statusncid is the status numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."chronicityncid" is 'chronicityncid is the chronicity numerical concept identification number.'; comment on column "stage"."ahltat_diagnosis"."noteddate" is 'notedate is the date noted.'; comment on column "stage"."ahltat_diagnosis"."onsetdate" is 'onsetdate the date on set.';
target ddl:
create table vault.hub_facility ( fac_sqn number (12) not null, fac_ldts date not null, fac_rsrc varchar2 (20 byte) not null, fac_id varchar2 (100 byte) not null ) ;
comment on table vault.hub_facility is '<business_name>facilities</business_name>'; comment on column vault.hub_facility.fac_id is 'facilityncid is the facility unique numerical concept identification number.';
cross-reference example:
source table | source column | group | target table | target column |
ahltat_diagnosis | doc_ref | 1 | sat_ahltat_diagnosis | doc_ref |
dataid | 1 | hub_diagnosis | diagnosis_dataid | |
facilityncid | 1 | hub_facility | fac_id | |
diagnosisncid | 1 | sat_ahltat_diagnosis | diagnosisncid | |
encounternumber | 1 | hub_event | evnt_id | |
clinicianncid | 1 | hub_clinician | clinician_ncid | |
unit_number | 1 | hub_unit | unit_id | |
medcinid | 1 | hub_medcin | medcin_id | |
createtime | 1 | sat_ahltat_diagnosis | createtime | |
createuserncid | 1 | sat_ahltat_diagnosis | createuserncid |
omb source script:
ombcreate table 'ahltat_diagnosis' set properties (description) values ('records diagnosis information provided by the clinician.') add column 'doc_ref' set properties (datatype,not_null,description) values ('number',0,'') add column 'createtime' set properties (datatype,not_null,description) values ('date',0,'createtime is the time created the diagnoses.') add column 'createuserncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'createuserncid is the create user numerical concept identification number.') add column 'modifyuserncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'modifyuserncid is the modify user numerical concept identification number.') add column 'modifytime' set properties (datatype,not_null,description) values ('date',0,'time modified.') add column 'priority' set properties (datatype,length,not_null,description) values ('varchar2',50,0,'indicates the order, whether the primary diagnosis or secondary, etc., diagnosis.') add column 'diagnosescomment' set properties (datatype,not_null,description) values ('clob',0,'text field for amplification of the diagnosis.') add column 'extendedicd9' set properties (datatype,length,not_null,description) values ('varchar2',50,0,'') add column 'sensitivity' set properties (datatype,length,not_null,description) values ('varchar2',50,0,'sensitivity indicates the sensitivity of the diagnoses.') add column 'updatedon' set properties (datatype,not_null,description) values ('date',0,'updatedon is the date who updated.') add column 'updatedby' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'updatedby is who updated.') add column 'tmip_status' set properties (datatype,not_null,description) values ('number',0,'the tmip_status is for data manager transfer from lddb to tmip.') add column 'updateflag' set properties (datatype,length,not_null,description) values ('varchar2',1,0,'updateflag is the flag to indicate if update.') add column 'cdr_dataid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'crd_dataid is the crd data identification number.') add column 'createdon' set properties (datatype,not_null,description) values ('date',0,'createdon is the created date.') add column 'createdby' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'createdby is who created.') add column 'prefix' set properties (datatype,length,not_null,description) values ('varchar2',5,0,'') add column 'clinicianncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'clinicncid is the clinic numerical concept identification number.') add column 'encounternumber' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'sequential number automatically assigned to an encounter for a selected patient.') add column 'unit_number' set properties (datatype,precision,scale,not_null,description) values ('number',10,0,0,'unit_number is number for the unit.') add column 'medcinid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'medcinid is the mediine identification number.') add column 'dataid' set properties (datatype,not_null,description) values ('number',0,'dataid is the unique data id for the patient.') add column 'diagnosisncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'diagnosysncid is the diagnosys numerical concept identification number.') add column 'facilityncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'facilityncid is the facility unique numerical concept identification number.') add column 'billingcodencid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'billingcodencid is the billiing code numerical concept identification number.') add column 'icd9ncid' set properties (datatype,not_null,description) values ('number',0,'icd9ncid is the icd9 numerical concept identification number.') add column 'sourcencid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'sourcencid is the source numerical concept identification number.') add column 'newfollowup' set properties (datatype,length,not_null,description) values ('varchar2',50,0,'newfollowup is the follow up.') add column 'pointofcarefacilityncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'point of care facility unique id. it is unique for the machine/server on which the clinical note was created.') add column 'statusncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'statusncid is the status numerical concept identification number.') add column 'chronicityncid' set properties (datatype,precision,scale,not_null,description) values ('number',20,0,0,'chronicityncid is the chronicity numerical concept identification number.') add column 'noteddate' set properties (datatype,not_null,description) values ('date',0,'notedate is the date noted.') add column 'onsetdate' set properties (datatype,not_null,description) values ('date',0,'onsetdate the date on set.') add column 'rs_forcenull' set properties (datatype,length,not_null,description) values ('varchar2',15,0,'') ombcommit
omb target script:
ombcreate table 'hub_facility' set properties (description) values ('<business_name>facilities</business_name>') add column 'fac_sqn' set properties (datatype,precision,scale,not_null,description) values ('number',12,0,0,'') add column 'fac_ldts' set properties (datatype,not_null,description) values ('date',0,'') add column 'fac_rsrc' set properties (datatype,length,not_null,description) values ('varchar2',20,0,'') add column 'fac_id' set properties (datatype,length,not_null,description) values ('varchar2',100,0,'facilityncid is the facility unique numerical concept identification number.') ombcommit
omb sequence script:
ombcreate sequence 'sqn_hub_facility' set properties(deployable,increment_by,start_with) values ('false',1,0) ombcommit
omb mapping script:
ombcreate mapping 'm_hub_diagnosis_ahltat_diagnos' set properties (business_name, description) values ('load hub_diagnosis from ahltat_diagnosis', 'rapidace generated mapping') add table operator 'hub_diagnosis' bound to table 'hub_diagnosis' add table operator 'ahltat_diagnosis' bound to table '../stage/ahltat_diagnosis' add deduplicator operator 'dd_unique' add expression operator 'exp_chg_datatype' add lookup operator 'lkp_target_key' bound to table 'hub_diagnosis' add filter operator 'fil_existing_recs' add attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique' set properties (datatype,description) values ('number','dataid is the unique data id for the patient.') add attribute 'dataid' of group 'ingrp1' of operator 'exp_chg_datatype' set properties (datatype,description) values ('number','dataid is the unique data id for the patient.') add attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype' set properties (datatype,description,expression) values ('number','dataid is the unique data id for the patient.','ingrp1.dataid') add attribute 'odiagnosis_dataid' of group 'ingrp1' of operator 'lkp_target_key' set properties (datatype,description) values ('number','dataid is the unique data id for the patient.') add attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs' set properties (datatype,description) values ('number','dataid is the unique data id for the patient.') add connection from attribute 'dataid' of group 'inoutgrp1' of operator 'ahltat_diagnosis' to attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique' add connection from attribute 'dataid' of group 'inoutgrp1' of operator 'dd_unique' to attribute 'dataid' of group 'ingrp1' of operator 'exp_chg_datatype' add connection from attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype' to attribute 'odiagnosis_dataid' of group 'ingrp1' of operator 'lkp_target_key' add connection from attribute 'odiagnosis_dataid' of group 'outgrp1' of operator 'exp_chg_datatype' to attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs' add connection from attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'fil_existing_recs' to attribute 'diagnosis_dataid' of group 'inoutgrp1' of operator 'hub_diagnosis' add attribute 'ldiagnosis_sqn' of group 'inoutgrp1' of operator 'fil_existing_recs' set properties (datatype,precision,scale,description) values ('number',12,0,'') add connection from attribute 'diagnosis_sqn' of group 'outgrp1' of operator 'lkp_target_key' to attribute 'ldiagnosis_sqn' of group 'inoutgrp1' of operator 'fil_existing_recs' add constant operator 'cnst_ldts_rsrc' add attribute 'diagnosis_ldts' of group 'outgrp1' of operator 'cnst_ldts_rsrc' set properties (datatype,description,expression) values ('date','the date and time the record was loaded.','sysdate()') add attribute 'diagnosis_rsrc' of group 'outgrp1' of operator 'cnst_ldts_rsrc' set properties (datatype,length,description,expression) values ('varchar2',20,'the record source of the data.','[ombtosettablestring 'ahltat_diagnosis']') add connection from attribute 'diagnosis_ldts' of group 'outgrp1' of operator 'cnst_ldts_rsrc' to attribute 'diagnosis_ldts' of group 'inoutgrp1' of operator 'hub_diagnosis' add connection from attribute 'diagnosis_rsrc' of group 'outgrp1' of operator 'cnst_ldts_rsrc' to attribute 'diagnosis_rsrc' of group 'inoutgrp1' of operator 'hub_diagnosis' add sequence operator 'sqn_hub_diagnosis' bound to sequence 'sqn_hub_diagnosis' add connection from attribute 'nextval' of group 'outgrp1' of operator 'sqn_hub_diagnosis' to attribute 'diagnosis_sqn' of group 'inoutgrp1' of operator 'hub_diagnosis' ombcommit ombalter mapping 'm_hub_diagnosis_ahltat_diagnos' modify operator 'fil_existing_recs' set properties (filter_condition) values ('inoutgrp1.ldiagnosis_sqn is null') modify operator 'lkp_target_key' set properties (lookup_condition) values ('ingrp1.odiagnosis_dataid = outgrp1.diagnosis_dataid')
this is all pattern based generation. if you change the cross-reference, it will change the generation. if you change the source or target model, it will change the generation…
i can produce 100% error free etl mappings for multiple etl tools this way. would you be interested in a service like this?
you can contact me directly for my technology (which is rapidace based, or you can contact my friends) at: http://www.analytixids.com – they’ve got some really cool technology.
cheers,
dan linstedt
No comments yet.