Channel: SQL Server Integration Services forum

Fuzzy Matching Transformation


Hi, 

I am working on the ETL design for a CRM system and need to devise matching logic that compares person information in the CRM database (person table) with the information for the same person in the staging (stg) table.

The match is based on:

1. First Name
2. Last Name
3. Email Address
4. Email Address
5. Phone Number
6. Address Line1
7. Address Line2
8. City
9. State
10. Postal code

If a match is found, no new record is created in the CRM person table. If no match is found, a new record is created. Because of the nature of the data coming in from different sources, I tried using the Fuzzy Lookup Transformation in SSIS. Based on the _Similarity score and the defined threshold (>= 8.85), the record is either ignored or created in the CRM database.
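To make the rule concrete, here is a rough Python sketch of the ignore-or-create decision. It is not the SSIS Fuzzy Lookup algorithm: `difflib.SequenceMatcher` stands in for the similarity score, the field names are hypothetical, the score is a plain average over the fields on a 0 to 1 scale, and the threshold is passed in rather than fixed.

```python
from difflib import SequenceMatcher

# Hypothetical column names mirroring the match fields listed in the post.
MATCH_FIELDS = [
    "first_name", "last_name", "email", "phone", "address_line1",
    "address_line2", "city", "state", "postal_code",
]


def similarity(staged: dict, person: dict) -> float:
    """Average per-field similarity in [0, 1].

    Assumption: an unweighted mean of difflib ratios is only a stand-in
    for the _Similarity column produced by SSIS Fuzzy Lookup.
    """
    scores = []
    for field in MATCH_FIELDS:
        a = (staged.get(field) or "").strip().lower()
        b = (person.get(field) or "").strip().lower()
        # Two empty strings compare as 1.0, so a field missing on
        # both sides counts as equal.
        scores.append(SequenceMatcher(None, a, b).ratio())
    return sum(scores) / len(scores)


def should_insert(staged: dict, reference: list, threshold: float) -> bool:
    """True when no reference row reaches the threshold (create a new person)."""
    return all(similarity(staged, p) < threshold for p in reference)
```

Note that this sketch compares a staged row against every reference row, which is exactly the cost that becomes a concern once the reference table holds millions of rows.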

Once the database is backfilled, the person table will have 10-12 million rows. I have read that it is best practice to use the Fuzzy Lookup Transformation in SSIS only with small data sets. The staging data set would be approximately 4,000 records daily, but the reference data set would be 10-12 million records.

What is the best way to implement this kind of matching logic in SSIS? Would it be a good approach to use a stored procedure and function (T-SQL) instead? Which approach would be faster, considering that we have indexes set up on the person table?

Please advise.

Thanks


EVA05

