Hi,
I am working on the ETL design for a CRM system and need to devise matching logic that compares person information in the CRM database (Person table) with the person information in the staging (stg) table.
The match is based on the following fields:
1. First Name
2. Last Name
3. Email Address
4. Phone Number
5. Address Line 1
6. Address Line 2
7. City
8. State
9. Postal Code
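Because the data comes from different sources, matching on these fields usually goes better if each value is normalized first (case, accents, punctuation, phone formatting), whether that happens in an SSIS Derived Column or in T-SQL. A minimal sketch of the idea, assuming hypothetical field names (`first_name`, `postal_code`, etc.) and assuming the last 10 digits of a phone number identify it:

```python
import re
import unicodedata
from typing import Dict, Optional

# Hypothetical field names for the match columns.
MATCH_FIELDS = ["first_name", "last_name", "email", "phone",
                "address_line1", "address_line2", "city", "state", "postal_code"]


def normalize(value: Optional[str]) -> str:
    """Lower-case, strip accents, replace punctuation (except @ and .) with
    spaces, and collapse runs of whitespace."""
    if value is None:
        return ""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s@.]", " ", text.lower())
    return " ".join(text.split())


def normalize_phone(value: Optional[str]) -> str:
    """Keep digits only; assumption: the last 10 digits identify the number."""
    digits = re.sub(r"\D", "", value or "")
    return digits[-10:]


def normalize_record(record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Apply the normalizers to every match field (missing fields become "")."""
    return {f: normalize_phone(record.get(f)) if f == "phone" else normalize(record.get(f))
            for f in MATCH_FIELDS}
```

Normalizing both sides the same way means exact comparisons (and indexes) can catch many matches before any fuzzy scoring is needed.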
If a match is found, no new record is created in the CRM Person table; if no match is found, a new record is created. Due to the nature of the data coming in from different sources, I tried using the Fuzzy Lookup transformation in SSIS. Based on the _Similarity score and the defined threshold (>= 0.85), the record is either ignored or created in the CRM database.
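The ignore-or-create decision can be sketched outside SSIS as a weighted similarity compared against a threshold. This is only an illustration: Fuzzy Lookup computes its own token-based `_Similarity`, whereas this sketch substitutes Python's `difflib.SequenceMatcher` ratio. The field names, equal weights, and the 0.85 threshold (on a 0-1 scale, matching the range of Fuzzy Lookup similarity scores) are assumptions:

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional

# Hypothetical field names for the columns being matched.
MATCH_FIELDS = ["first_name", "last_name", "email", "phone",
                "address_line1", "address_line2", "city", "state", "postal_code"]

THRESHOLD = 0.85  # assumed threshold on a 0-1 similarity scale


def field_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Similarity of two field values in [0, 1] (two empty values score 1.0)."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(stg: dict, person: dict,
                      weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-field similarities (equal weights by default)."""
    weights = weights or {f: 1.0 for f in MATCH_FIELDS}
    total = sum(weights.values())
    score = sum(w * field_similarity(stg.get(f), person.get(f))
                for f, w in weights.items())
    return score / total


def should_create(stg: dict, candidates: Iterable[dict],
                  threshold: float = THRESHOLD) -> bool:
    """True if no candidate reaches the threshold, i.e. insert a new Person row."""
    best = max((record_similarity(stg, c) for c in candidates), default=0.0)
    return best < threshold
```

Note that fields empty on both sides count as full matches here, which can inflate scores for sparse records; lowering the weight of optional fields such as Address Line 2 is one way to compensate.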
Once the database has been backfilled, the Person table will have 10-12 million rows. I have read that it is best practice to use the SSIS Fuzzy Lookup transformation only with small datasets. The staging dataset will be approximately 4,000 records daily, but the reference dataset will be 10-12 million records.
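One common way to keep a large reference set tractable is blocking: each staging row is compared only with reference rows that share a cheap exact key, so a daily batch of about 4,000 rows never has to be scored against all 10-12 million. In SQL Server the key could be a persisted computed column with a nonclustered index on the Person table (an assumption, not something the current design has). The in-memory sketch below uses a hypothetical key of the first five characters of the postal code plus the last-name initial:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def block_key(rec: dict) -> str:
    """Hypothetical blocking key: 5-char postal prefix + last-name initial."""
    postal = (rec.get("postal_code") or "").replace(" ", "").upper()[:5]
    last = (rec.get("last_name") or "").strip().upper()[:1]
    return f"{postal}|{last}"


def build_index(reference: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group reference (Person) rows by blocking key, like an index on the key."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for rec in reference:
        index[block_key(rec)].append(rec)
    return index


def candidates_for(stg: dict, index: Dict[str, List[dict]]) -> List[dict]:
    """Return only the reference rows sharing the staging row's key."""
    return index.get(block_key(stg), [])
```

The trade-off is recall: a person whose postal code or last-name initial was mistyped is never compared. Running a second pass with a different key (for example, a normalized email address) reduces such misses at the cost of more comparisons.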
What is the best way to implement this kind of matching logic in SSIS? Would it be a good approach to use stored procedures and functions (T-SQL) instead? Which approach would be faster and more efficient, considering that we have indexes on the Person table?
Please advise.
Thanks
EVA05