I am not sure if I am using this correctly, but maybe someone can suggest a better solution for my question.
I work in the healthcare area, and am handling the survey data ETL's. There are around 8 different survey areas and based on information received from them for the visit they reference, I want to pull in more info from our invoicing database. My idea is this:
1.) Pull in the flat file to an ODBC staging table
2.) Cache all invoice records that fall between the MIN(Date of Service) and MAX(Date of Service) from the staging table.
3.) First lookup the information needed on patientID, providerID, date of service, and billing location.
4.) For the surveys that didn't match on those 4 columns, try looking up based on patientID, date of service, and billing location (since I could be 99% sure this would still return the record I need).
5.) For the remaining surveys, lookup based just on patientID and date of service. These records will be flagged for manual review because clearly, if a patient has multiple appointments in the same day, this will be prone to error.
However, in trying to use only 3 of the columns in the lookup, I get the error saying basically that I need to utilize all 4. Is there a way around this, or is there an entirely different way I should be approaching this? The reason I thought cache transform was the answer is because I will need to run a different package for each lookup, as the data and logic between each survey will vary, but the invoice data "pool" will stay the same regardless.
Thank you in advance.