Integrating with a system that provides a web service API can be a painful exercise, because you're often forced to call the web service once for every row in the pipeline. This is incredibly slow.
I'm upgrading our SSIS integration with Dynamics CRM to start using the CRM ExecuteMultipleRequest. This lets you put multiple request objects into a collection and then call the CRM service once for the whole collection, significantly reducing the number of web service calls.
In pseudocode, the basic algorithm to perform the batching is pretty simple:
```csharp
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    batch.Add(Row);
    if (batch.Count >= 1000)
    {
        ProcessBatch(); // call the batch web service
        batch.Clear();
    }
}
```

But what about dealing with the results? The ProcessBatch method is going to receive a response collection object containing one result for each row in the batch. If an error occurred, you're going to want to know which request caused the error, and what the problem was.
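For context, here is a sketch of what ProcessBatch might look like against the CRM SDK. It assumes an `IOrganizationService` named `service`, that `batch` holds one `OrganizationRequest` per buffered row, and a hypothetical `HandleFault` helper; each `ExecuteMultipleResponseItem` carries a `RequestIndex` and a `Fault`, which is the only correlation back to the original requests that the SDK gives you:

```csharp
// Sketch only: requires Microsoft.Xrm.Sdk; `service`, `batch`, and
// HandleFault are assumptions, not part of the original post.
private void ProcessBatch()
{
    var request = new ExecuteMultipleRequest
    {
        // ContinueOnError lets the rest of the batch run when one request
        // fails; ReturnResponses yields one response item per request.
        Settings = new ExecuteMultipleSettings
        {
            ContinueOnError = true,
            ReturnResponses = true
        },
        Requests = new OrganizationRequestCollection()
    };
    request.Requests.AddRange(batch);

    var response = (ExecuteMultipleResponse)service.Execute(request);

    foreach (var item in response.Responses)
    {
        // item.RequestIndex maps this result back to batch[item.RequestIndex];
        // item.Fault is non-null when that individual request failed.
        if (item.Fault != null)
        {
            HandleFault(batch[item.RequestIndex], item.Fault);
        }
    }
}
```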
But you're not going to be able to call DirectRowToErrorOutput(), because that row context no longer makes sense. By the time the batch executes, the current row is the thousandth one (the one that filled the batch), but you want the error information that relates to all of the rows in the batch.
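One partial mitigation is to buffer a key column alongside each request, so that when the batched response comes back, RequestIndex can at least be mapped to the source row's key. This doesn't restore the error output, but it does make the faults attributable. A sketch, where `BuildRequest` and the `AccountId` input column are hypothetical:

```csharp
// Sketch: keep a parallel list of row keys next to the request batch so
// faults can be correlated by index after the batched call returns.
private readonly List<OrganizationRequest> batch = new List<OrganizationRequest>();
private readonly List<Guid> batchRowKeys = new List<Guid>(); // assumed key column

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    batch.Add(BuildRequest(Row));    // hypothetical row-to-request mapper
    batchRowKeys.Add(Row.AccountId); // assumed key column on the input buffer
    if (batch.Count >= 1000)
    {
        ProcessBatch();
        batch.Clear();
        batchRowKeys.Clear();
    }
}
```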
You could create a connection to a database from within your script task, iterate over your response collection and use your connection to write each error response into a table. But it seems "wrong" to me to be creating a connection to a database and writing rows from *within* the script task. Burying that kind of code in there seems to go against the overall design intent behind SSIS.
Another possibility might be to iterate over the response collection, call FireError for any response containing a fault, and add an SSIS event handler which parses the error info and writes it to an error logging table. This "feels" slightly better to me, but it's still a little bit ugly, because getting useful information into the error handler (e.g. the ID of the entity that caused the error, the exception information, and so on) has to be done through globals. Yuck! Furthermore, the event handler can't be tied to the script task itself; it can only be tied to the whole data flow, meaning that you're going to have to decide whether the error being handled is actually one of the errors you want to be handling (i.e. just the errors you are raising via FireError). It's all rather unwieldy.
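For reference, the FireError half of that approach might look like the sketch below, where `faultedItems` is a hypothetical collection pairing each fault with the entity id it relates to. `ComponentMetaData.FireError` is the standard call available inside a script component:

```csharp
// Sketch: raise one component-level error per faulted response.
// `faultedItems`, EntityId, and Fault pairing are assumptions.
foreach (var faulted in faultedItems)
{
    bool cancel = false;
    ComponentMetaData.FireError(
        0,                             // error code
        "CRM batch submit",            // sub-component name
        string.Format("Entity {0}: {1}", faulted.EntityId, faulted.Fault.Message),
        string.Empty,                  // help file
        0,                             // help context
        out cancel);
}
```

The event handler then still has to filter on that error code or sub-component name to pick out just these errors, which is the unwieldiness described above.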
Really, the ideal solution would be to be able to direct all of the rows in the response batch into the SSIS component error output, adding on the exception information as additional derived columns for the row, and then process them like you would process any other error rows. But I don't see any way of doing that given that the row context is gone.
Does anyone have a solution to this that isn't a nasty, obfuscated hack?