I recently started working on SSIS.
I have a SSIS package which loads around a Million records from various Excels files. This is pretty fast like a minute or 2 .
After this is done cleansing activity starts and this is where my problem starts. I have like 20 domains in my DQS in which some of them do basic checks like length of data in column is not more than a certain value(like Name cant be more than 50) and some
validate the data based on regular expression like phone number matches a pattern etc.
I also have a composite domain on top of these which internally have 6-7 rules.
like rule 1 if Domain 1 is a particular value Domain 2 cant be empty
When i cleanse 1M records directly from DQS it takes around 40 min to cleanse. When i do the same task on same data through SSIS it takes more than 8 hours and aborts some times
I have tried to increase the DefaultBufferMaxRows to 100000,
DefaultBufferSize to 104857600 for the project.
I also tried to update DCChunkSize to 100000 from 10000 in [A_CONFIGURATION] table.
I even used Balanced Data Load Distributor to split the load into 4 cleansing tasks but none or all of them couldn't make a significant improvement. These slightly slightly reduced time by an hour or so, but sometimes it fails due to
memory constraints.
Is this the max performance we could get in SSIS?
Also takes uses 100% CPU every time and doesn't free most of it even after finishing or stopping Visual studio. I have to end SQLServer From task manager.
I am running this on 16GB machine
rp