Saturday, January 23, 2016

Re: problem: editing a _corrupted_ CP1252 file

On 22.01.16 17:45, Kenneth Reid Beesley wrote:
>
> I have a number of 8-bit text files that _should_ be in CP1252, but they may
> contain byte values that are undefined for CP1252, e.g. \x81, \x8D, \x8F, \x90 and \x9d.
> I.e. these are potentially corrupted files that are mostly legal CP1252, should be legal
> CP1252, and I have to make them legal CP1252.

Have you considered using tr to translate everything in one go?
For example:

$ tr '\201\215\217\220\235' 'ABCDE' < filename

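In that line, \201 is octal for \x81, and so on; each byte in the first
set is replaced by the character at the same position in the second set.
The replacement characters could also be specified in octal, if they're
sufficiently weird. tr writes the result to standard output, so redirect
it to a new file to keep it. It works byte by byte and won't handle
multibyte Unicode, but that's not required here.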
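
A minimal sketch of the whole round trip, in case it helps. The file
names sample.txt and fixed.txt are made up for illustration, and the
replacement letters ABCDE are just the placeholders from the command
above, not a recommendation for what the bytes should become. LC_ALL=C
is set on the assumption that you want tr to treat the data strictly as
single bytes, whatever your normal locale is.

```shell
# Hypothetical file names; adjust to your own files.
# Build a small test file containing the five undefined CP1252 bytes.
printf 'a\201b\215c\217d\220e\235f\n' > sample.txt

# Translate each undefined byte to its placeholder.
# tr reads stdin and writes stdout, so write to a new file
# instead of redirecting onto the input file.
LC_ALL=C tr '\201\215\217\220\235' 'ABCDE' < sample.txt > fixed.txt

# Check: delete everything except the five bytes and count what is left.
# A count of 0 means none of them remain in fixed.txt.
LC_ALL=C tr -cd '\201\215\217\220\235' < fixed.txt | wc -c
```

The same tr -cd ... | wc -c check can be run on an original file first,
to see whether it contains any of the undefined bytes at all.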

The job could also be done by sed or awk. Doing it by hand seems rather
laborious.

Erik
