Channel: SQL Server Integration Services forum

FlatFileSource CSV: What options for text with embedded commas?


Is there a simple transform when a csv field is sometimes enclosed in double quotes, and sometimes not?

I am processing an inbound csv file.  For the most part, it is well defined and consistent.  The exception occurs when a text field has an embedded comma.  When it does, the field is enclosed in double quotes. When it does not, the field is not enclosed in double quotes.
I am using a Flat File Connection Manager with a Delimited format, using the comma as the Column delimiter. The Text qualifier in the General properties sheet is set to <none>.

Under these conditions, when a text field enclosed in quotes with an embedded comma is encountered, the quote character is treated as a regular character and the embedded comma is treated as a column delimiter, so the row ends up with an extra column.
If I set the Text qualifier to the double quote character, the Data Flow task fails when it encounters a field that is not enclosed in double quotes.
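
To make the first failure mode concrete, here is a minimal sketch in Python (not the C# of my script, and not SSIS itself). The sample rows are made up for illustration. Splitting on every comma reproduces what happens with the Text qualifier set to <none>, while a quote-aware parser such as Python's csv module accepts quoted and unquoted fields in the same file.

```python
import csv
import io

# Hypothetical sample rows: only the field with an embedded comma is quoted.
quoted_line = '1001,"Smith, John",Chicago'
plain_line = '1002,Jane Doe,Boston'

# No text qualifier: every comma is a delimiter and the quotes are kept as data.
naive_fields = quoted_line.split(",")

# Quote-aware parsing: the quoted span is one field, quotes are removed.
aware_fields = next(csv.reader(io.StringIO(quoted_line)))

# The same parser also accepts a row with no quotes at all.
plain_fields = next(csv.reader(io.StringIO(plain_line)))

print(naive_fields)   # four pieces instead of three columns
print(aware_fields)
print(plain_fields)
```

The naive split yields four pieces for a three-column row, which is the column shift described above.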

My current workaround is to pre-process the file in a C# script that uses a regular expression to find a field enclosed in quotes with an embedded comma, strip the quotes, and replace the comma with a tilde. After the load, I update the target table by replacing each tilde character with a comma.  I believe this is sufficient for my needs, but it will not handle the following scenarios:
The field is the first or last field in the row
The field has more than one embedded comma
This is my "find" regular expression: ,"([^"]*),([^"]*)",
This is my "replace" regular expression: ,$1~$2,
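
The two gaps listed above can be reproduced with a short sketch. It is written in Python rather than C#, so the replacement uses `\1` and `\2` where .NET uses `$1` and `$2`; the function name and the sample rows are made up for illustration.

```python
import re

# Same "find" pattern as above; it requires a comma on both sides of the quoted field.
FIND = re.compile(r',"([^"]*),([^"]*)",')
# Python spelling of the ",$1~$2," replacement.
REPLACE = r',\1~\2,'


def strip_and_tilde(line: str) -> str:
    """Apply the find/replace pair to one line of the file."""
    return FIND.sub(REPLACE, line)


middle = strip_and_tilde('a,"x,y",b')        # handled: quotes gone, comma replaced
first = strip_and_tilde('"x,y",b')           # no leading comma, so no match
last = strip_and_tilde('a,"x,y"')            # no trailing comma, so no match
two_commas = strip_and_tilde('a,"x,y,z",b')  # only the last embedded comma is replaced

for result in (middle, first, last, two_commas):
    print(result)
```

In the last case the quotes are stripped but one embedded comma survives, so the row still gains a column.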

Can anyone suggest an alternative approach, or a more robust regular expression pattern?
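
One direction that might cover both gaps is to match each quoted field on its own, without requiring surrounding commas, and replace every comma inside it with a callback. The sketch below is in Python and is only an illustration under two assumptions: the data never contains a tilde, and a quoted field never contains an escaped quote (""). The name `protect_commas` is hypothetical. In .NET, `Regex.Replace` has an overload that takes a `MatchEvaluator`, which should allow the same idea in the C# script.

```python
import re

# Any run of non-quote characters between two double quotes.
# Assumption: quoted fields never contain an escaped quote ("").
QUOTED = re.compile(r'"([^"]*)"')


def protect_commas(line: str, placeholder: str = "~") -> str:
    """Strip the quotes from each quoted field and swap its commas for a placeholder."""
    return QUOTED.sub(lambda m: m.group(1).replace(",", placeholder), line)


first = protect_commas('"x,y",b')           # quoted field at the start of the row
last = protect_commas('a,"x,y"')            # quoted field at the end of the row
several = protect_commas('a,"x,y,z",b')     # more than one embedded comma
untouched = protect_commas('a,b,c')         # rows without quotes pass through

for result in (first, last, several, untouched):
    print(result)
```

The tilde-to-comma update on the target table would stay the same as in the current workaround.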

Thanks,
Ed

