Hi everyone, I'm looking for some advice on the best method for joining two data sets. Each data set is on a different database platform, so I was looking to do this within SSIS.

In my situation I am using a source component in a data flow to get a list of prospective customers, around 80,000 rows with each execution. I then need to determine whether each prospective customer became a customer by searching a corporate data warehouse containing millions of customer rows, and return the customer id.

My first attempt was to use the Lookup component against the data warehouse, but I ran into memory limitations, of course. I'm contemplating using a Merge Join with an ORDER BY in each db query to sort the inputs, but I'm concerned about how performance will be when the millions of data warehouse rows move through the buffer. Obviously the row-by-row processing of a Script Component or OLE DB Command should be avoided.

Is there any other option I'm not thinking of? My preferred method would be to avoid returning millions of rows from the data warehouse and instead return only the 80,000 rows needed, but I can't figure out a way to incorporate that logic into the data warehouse query within SSIS.
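
To make the Merge Join idea concrete, here is a minimal sketch in Python (not SSIS) of a left merge join over two inputs that are already sorted on the join key. The function name `merge_join_left`, the tuple layouts, and the use of an email address as the join key are all hypothetical, chosen only for illustration. It assumes both sides are sorted with the same ordering rules and that keys on the warehouse side are unique. It shows why the approach needs only one warehouse row in memory at a time, and also why every warehouse row up to the last prospect key still has to stream through, which is the performance concern described above.

```python
from typing import Iterable, Iterator, Optional, Tuple


def merge_join_left(
    prospects: Iterable[Tuple[str, str]],
    customers: Iterable[Tuple[str, int]],
) -> Iterator[Tuple[str, str, Optional[int]]]:
    """Left merge join of two inputs already sorted ascending by key.

    prospects: (key, name) rows, the small side (about 80,000 rows).
    customers: (key, customer_id) rows, the large side; keys assumed unique.
    Yields (key, name, customer_id), with customer_id None when unmatched.
    """
    customer_iter = iter(customers)
    current = next(customer_iter, None)  # the only warehouse row held in memory
    for key, name in prospects:
        # Advance the large side until its key catches up to the prospect key.
        while current is not None and current[0] < key:
            current = next(customer_iter, None)
        if current is not None and current[0] == key:
            yield key, name, current[1]
        else:
            yield key, name, None


# Hypothetical sample data, both lists sorted by the join key.
prospects = [("a@x.com", "Ann"), ("c@x.com", "Cy"), ("d@x.com", "Dee")]
customers = [("a@x.com", 101), ("b@x.com", 102), ("d@x.com", 104)]
result = list(merge_join_left(prospects, customers))
```

In this sketch the warehouse rows are read once and discarded as soon as they fall behind the current prospect key, so memory stays flat, but the cost of reading and transferring them remains.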