Hi,
I'd like to understand exactly how SQL Server handles the data coming from an SSIS package.
I have a package that loads 120 million rows.
I do multiple lookups and transforms before inserting into my DW. In this case I'm loading 60 columns.
The data flow loads the first 80 million rows correctly at around 18,000 rows/sec, but after that performance drops to 3,000 rows/sec or less...
So it takes 1h30 for the first 80 million rows, then 4h for the remaining 40 million...
The DW destination table has a clustered index matching the partition we need, plus a simple nonclustered index on one column (the same column used in the first data flow).
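For reference, the structure looks roughly like this (all names here are made up; only the shape matches what I described):

    -- Hypothetical partition function/scheme; the real one partitions on our load key
    CREATE PARTITION FUNCTION pf_Fact (INT)
        AS RANGE RIGHT FOR VALUES (20240101, 20240201, 20240301);

    CREATE PARTITION SCHEME ps_Fact
        AS PARTITION pf_Fact ALL TO ([PRIMARY]);

    CREATE TABLE dbo.FactTable (
        PartitionKey INT           NOT NULL,  -- partitioning column, also the clustered key
        LookupKey    INT           NOT NULL,  -- the one extra indexed column, used by the first data flow
        Col01        NVARCHAR(100) NULL       -- ...plus the rest of the ~60 columns
    ) ON ps_Fact (PartitionKey);

    CREATE CLUSTERED INDEX cix_Fact
        ON dbo.FactTable (PartitionKey)
        ON ps_Fact (PartitionKey);

    CREATE NONCLUSTERED INDEX ix_Fact_Lookup
        ON dbo.FactTable (LookupKey);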
So, I know the clustered index impacts the load, because SQL Server may have to move pages (page splits), which decreases performance. OK, fair enough.
But why does this happen so suddenly?
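To check whether page splits and fragmentation are really the problem, something like this should show it (database and table names below are placeholders for my real ones):

    SELECT index_id,
           partition_number,
           avg_fragmentation_in_percent,
           page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(N'MyDW'),                  -- placeholder database name
             OBJECT_ID(N'dbo.FactTable'),     -- placeholder destination table
             NULL, NULL, 'LIMITED')           -- all indexes, all partitions, fast scan
    ORDER BY avg_fragmentation_in_percent DESC;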
And more interesting...
If I stop my package after the first 80 million rows and start it again to load the other 40 million, performance does not degrade at all while loading those additional 40 million rows!
So if the problem were only index reorganization, it should also occur while loading the missing 40 million rows, not only when I try to load everything in one run!
To me it looks as if SQL Server can no longer handle the load after a certain amount of data has been processed; past some threshold it just can't keep up.
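To figure out what SQL Server is actually waiting on once the load slows down, I suppose the wait-stats DMV could be snapshotted during the fast phase and again during the slow phase, and the two compared:

    -- Top waits accumulated since the last clear/restart; run once during the
    -- fast phase and once during the slow phase, then compare the deltas.
    SELECT TOP (10)
           wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE N'%SLEEP%'   -- filter out idle waits
    ORDER BY wait_time_ms DESC;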
Does anybody have an explanation for this behavior?
If I load only a few columns instead of all the required ones, the process takes 30 minutes for the 120 million rows without any performance issue, even though the same indexes and partitions are used. So it's really related to the amount of data in GB, not the number of rows.
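To put a number on that, the average row size of the destination table can be read from the partition stats (dbo.FactTable is again a placeholder):

    -- Used space and average bytes per row for the destination table
    SELECT SUM(used_page_count) * 8 / 1024     AS used_mb,
           SUM(row_count)                      AS total_rows,
           SUM(used_page_count) * 8192.0
               / NULLIF(SUM(row_count), 0)     AS avg_bytes_per_row
    FROM sys.dm_db_partition_stats
    WHERE object_id = OBJECT_ID(N'dbo.FactTable')
      AND index_id IN (0, 1);                  -- heap or clustered index only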