Environment: Windows 7, SQL Server 2008 R2
Application: Microsoft SQL Server Management Studio 2008 R2, Business Intelligence Development Studio 2008 (SSIS)
SSIS competency level: Novice
ETL process (works fine, no issues): The following flowcharts illustrate the basic ETL process, in which the data is transformed and loaded into a staging table (the destination table). The staging table consists of the following fields:
(id, ssn, Fname, Lname, Subject_cd, Test_dt, Score, comments, ind_response)
[Images: SSIS package flowcharts, not included]
After running the ETL package in SSIS, the data was loaded into the staging table (destination table). The following code recreates the staging table and its data, which are used later for another data transaction.
Code:
CREATE TABLE Staging_Table (
id CHAR(9)
,ssn CHAR(9) NOT NULL
,Fname VARCHAR(50) NOT NULL
,Lname VARCHAR(50) NOT NULL
,Subject_cd CHAR(2)
,Test_dt DATETIME
,Score CHAR(2)
,comments NVARCHAR(250)
,ind_response NVARCHAR(250)
);
INSERT INTO Staging_Table (
id
,ssn
,Fname
,Lname
,Test_dt
,Score
,Subject_cd, comments ,ind_response
)
VALUES (
'123456781','123549874'
,'Sally', 'Johnson','20111125',
'3','QB', 'N/A', '1243212221144121321411123332411121'
);
INSERT INTO Staging_Table (
id
,ssn
,Fname
,Lname
,Test_dt
,Subject_cd
,Score,comments ,ind_response
)
VALUES (
'123456792','003549874'
,'Will', 'Smith','20101025',
'AD','3','Test was good','1231121223334121334121412'
);
INSERT INTO Staging_Table (
id
,ssn
,Fname
,Lname
,Test_dt
,Score
,Subject_cd, comments ,ind_response
)
VALUES (
'120056783','993549800'
,'William', 'Wahab','20090110',
'1', 'FR','no comments', '111111111111222224121312144412'
);
INSERT INTO Staging_Table (
id
,ssn
,Fname
,Lname
,Test_dt
,Score
,Subject_cd
)
VALUES (
'213450081','128749890'
,'Douglas', 'Mike','20140214',
'+2','CH'
);
Requirement: Automate the ETL process (data transaction) and schedule it to load the data on a monthly basis. Are there any prerequisites for this task, such as enabling certain system stored procedures? I want to eliminate manual work.
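For reference, a monthly schedule of this kind is usually set up as a SQL Server Agent job, so the main prerequisites are that the SQL Server Agent service is running and that the edition includes it (Express does not). The following is only a minimal sketch of such a job: the job name, step name, schedule name, and the package path C:\ETL\LoadStaging.dtsx are hypothetical placeholders, not values from my setup.

```sql
-- Sketch only: all names and the .dtsx path below are hypothetical placeholders.
USE msdb;
GO

-- Create an empty job.
EXEC dbo.sp_add_job
    @job_name = N'Monthly staging load';

-- Add one step that runs the SSIS package from the file system.
-- For the SSIS subsystem, @command takes dtexec-style arguments.
EXEC dbo.sp_add_jobstep
    @job_name  = N'Monthly staging load',
    @step_name = N'Run SSIS package',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\ETL\LoadStaging.dtsx"';

-- Monthly schedule: @freq_type 16 = monthly, @freq_interval = day of the month,
-- @freq_recurrence_factor = every N months, @active_start_time = HHMMSS.
EXEC dbo.sp_add_schedule
    @schedule_name          = N'First day of month at 02:00',
    @freq_type              = 16,
    @freq_interval          = 1,
    @freq_recurrence_factor = 1,
    @active_start_time      = 020000;

-- Attach the schedule to the job and target the local server.
EXEC dbo.sp_attach_schedule
    @job_name      = N'Monthly staging load',
    @schedule_name = N'First day of month at 02:00';

EXEC dbo.sp_add_jobserver
    @job_name = N'Monthly staging load';
```

The same job can be created through the SQL Server Agent node in Management Studio; the script form is shown only because it is easier to post.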
Problem: How can I load the Staging_Table data into three entity tables when referential integrity constraints exist between them?
- If matched (the SSN already exists in the SSN table): insert the record (id, Subject_cd, Score, Test_dt, comments, ind_response) into [ind_subject_scores].
- If not matched: insert the record into the following tables: SSN, individual, then [ind_subject_scores].
- If the record already exists in [ind_subject_scores]: update the additional elements (comments, ind_response) in that table.
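To make the three rules concrete, here is a rough T-SQL sketch of what I imagine the load could look like, for example inside an Execute SQL Task. It rests on several assumptions that are not settled above: the Table #3.b columns already exist, ssn is unique in the SSN table, a score row is identified by (id, subject_cd, test_dt), and for a matched SSN the id already stored in the SSN table is used rather than the staging id. Because the foreign key sits on SSN and references individual, the sketch inserts into individual before SSN.

```sql
-- Sketch only, under the assumptions stated above (match keys are guesses).
BEGIN TRY
    BEGIN TRANSACTION;

    -- 1. Not matched: create the person first, because SSN.id references individual.id.
    INSERT INTO individual (id, Fname, Lname)
    SELECT DISTINCT s.id, s.Fname, s.Lname
    FROM Staging_Table AS s
    WHERE NOT EXISTS (SELECT 1 FROM SSN AS n WHERE n.ssn = s.ssn)
      AND NOT EXISTS (SELECT 1 FROM individual AS i WHERE i.id = s.id);

    INSERT INTO SSN (id, ssn)
    SELECT DISTINCT s.id, s.ssn
    FROM Staging_Table AS s
    WHERE NOT EXISTS (SELECT 1 FROM SSN AS n WHERE n.ssn = s.ssn);

    -- 2. Score row already present: refresh only the additional columns.
    UPDATE t
    SET t.comments     = s.comments,
        t.ind_response = s.ind_response
    FROM dbo.ind_subject_scores AS t
    JOIN SSN AS n           ON n.id = t.id
    JOIN Staging_Table AS s ON s.ssn = n.ssn
                           AND s.Subject_cd = t.subject_cd
                           AND s.Test_dt = t.test_dt;

    -- 3. Score row not present yet: insert it, resolving id through the SSN table.
    INSERT INTO dbo.ind_subject_scores
        (id, subject_cd, score, test_dt, comments, ind_response)
    SELECT n.id, s.Subject_cd, s.Score, s.Test_dt, s.comments, s.ind_response
    FROM Staging_Table AS s
    JOIN SSN AS n ON n.ssn = s.ssn
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.ind_subject_scores AS t
                      WHERE t.id = n.id
                        AND t.subject_cd = s.Subject_cd
                        AND t.test_dt = s.Test_dt);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- THROW is not available in 2008 R2, so re-raise with RAISERROR.
    DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @msg);
END CATCH;
```

Rows with a NULL Subject_cd or Test_dt never compare as equal, so they would be inserted again on every run; that case would need separate handling.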
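Table #1: Parent
CREATE TABLE SSN (
id CHAR(9)
,ssn CHAR(9) NOT NULL
-- Table-level foreign key; [individual] must already exist with a PRIMARY KEY or UNIQUE constraint on [id]
,CONSTRAINT [FK_individual] FOREIGN KEY([id])
REFERENCES [individual] ([id])
);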
INSERT INTO ssn (
id
,ssn
)
VALUES (
'12001212','993549800'
);
Table #2: child
CREATE TABLE individual
( id CHAR(9) NOT NULL PRIMARY KEY -- key added so that [FK_individual] on SSN can reference this column
,Fname VARCHAR(50) NOT NULL
,Lname VARCHAR(50) NOT NULL
,email VARCHAR(50) NULL );
INSERT INTO individual
( id ,Fname ,Lname, email )
VALUES ( '12001212','William', 'Wahab', 'fake@yahoo.com' );
Table #3.a
CREATE TABLE [dbo].[ind_subject_scores](
[ind_scr_id] [int] IDENTITY(1,1) NOT NULL,
[id] [char](9) NULL,
[subject_cd] [char](2) NULL,
[score] [varchar](2) NULL,
[test_dt] datetime
)
INSERT INTO [dbo].[ind_subject_scores] (
id
,Test_dt
,Score
,Subject_cd
)
VALUES (
'897841239', '20110101'
,'2'
,'FR'
);
INSERT INTO [dbo].[ind_subject_scores] (
id
,Test_dt
,Score
,Subject_cd
)
VALUES (
'80041239', '20110115'
,'2'
,'CH'
);
Table #3.b - Additional elements that were requested later. These columns can be updated from the staging table when a match occurs.
ALTER Table [dbo].[ind_subject_scores]
add [comments] [nvarchar](250) NULL
ALTER Table dbo.ind_subject_scores
add [ind_response] [nvarchar](250) NULL