Wednesday, August 26, 2009

Re: Correcting malformed csv

> I have a .csv file created by a semi-brain dead application that didn't
> properly escape quote characters within the fields. I've been trying to come
> up with a way to replace one quote character with the proper two quote
> characters within the quoted fields, but haven't yet struck upon the magic
> sequence.
> As an example, the line
>
> "ID",""Brief" Description","State","Assigned User",""Detailed" Description",
>
> (Note the quotes around "Brief" and "Detailed".)
>
> needs to be changed into
>
> "ID","""Brief"" Description","State","Assigned User","""Detailed"" Description",

Ugh...horrible application. I haven't banged against the edge
cases on this, but for fairly "clean" source data, you might try
the following:

:%s/\%(^\|,\)"\zs.\{-}\ze"\%($\|,\)/\=substitute(submatch(0),'"','""','g')/g

It may fail on pathological cases where you have a string like

...,"She said "hello", to me",...

because of the comma-after-the-quote (the regexp doesn't check
for internal quote-parity) but otherwise, it should knock out the
worst offenders so you can focus on the pathological cases that may remain.
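
If you'd rather sanity-check the idea outside Vim first, here is a rough
Python sketch of the same approach (not the Vim command itself). The names
FIELD and fix_line are made up for illustration, and lookarounds stand in
for \zs and \ze, so treat it as an approximation rather than an exact
port. It shows both the clean case from the original question and the
pathological one above.

```python
import re

# Rough equivalent of the Vim pattern: an opening quote at start-of-line or
# right after a comma, the shortest possible field body, then a closing quote
# that is followed by a comma or end-of-line.
FIELD = re.compile(r'(?:^|(?<=,))"(.*?)"(?=,|$)')


def fix_line(line):
    """Double every quote found inside each matched field body (hypothetical helper)."""
    def repl(match):
        body = match.group(1)
        return '"' + body.replace('"', '""') + '"'
    return FIELD.sub(repl, line)


# Clean case: the example line from the original question.
clean = '"ID",""Brief" Description","State","Assigned User",""Detailed" Description",'
print(fix_line(clean))

# Pathological case: an inner quote followed by a comma ends the match early,
# so only part of the field is repaired.
bad = '"a","She said "hello", to me","b"'
print(fix_line(bad))
```

On the pathological line, the quote before hello gets doubled but the one
after it is mistaken for the end of the field, so that row still needs
fixing by hand.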

-tim

