Saturday, August 30, 2014

Re: an update for the Romanian spell checker

On Saturday, 30 August 2014, 19:10:59 UTC+3, Bram Moolenaar wrote:
> Vanilla Ice wrote:
>
> > > > On Wednesday, 13 August 2014, 15:51:25 UTC+3, Bram Moolenaar wrote:
> > > > > Vanilla Ice wrote:
> > > > > >
> > > > > >> <snip>
> > > > > >
> > > > > > Thanks for figuring this out. Can you please send me the .dic and .aff
> > > > > > files you used to generate this .spl file? Or better: the URL of the
> > > > > > files to be downloaded and a diff on top of that.
> > > > > >
> > > >
> > > > By all means. I hope that there aren't other mistakes (I loaded a
> > > > text-formatted translation of "Caves of Steel" by Asimov and it
> > > > looked ok). In any case, it should be better than the old one,
> > > > which is quite unusable for correct fonts/new grammar rules.
> > > >
> > > > 1. The URL is: http://sourceforge.net/projects/rospell/files/Romanian%20dictionaries/dict-3.3.10/ro_RO.3.3.10.zip
> > > >
> > > > and
> > > > 2. The diff file is attached to this post.
> > >
> > > I tried using that zip file with the diff, but I get lots of errors.
> > >
> > > The first one is easy to fix: Comment-out the KEY line.
> > >
> > > I get lots of "Trailing text" errors.
> > > I get a few "Duplicate character in MAP" errors.
> > >
> > > For the cp1250 encoded file I get lots of "Conversion failure" errors.
> > >
> > > Did you not see these errors or did you just ignore them?
> >
> > The "trailing text" errors (a lot of them!) are caused by the fact
> > that the authors of the spell files didn't put a comment start after
> > the 5th element. From what i've seen in the vim spell.c source, the
> > error messages are actually warnings, i.e. they don't affect the
> > parsing.
>
> I think these are grammar annotations. The problem is that they could
> be mistakes. I think we need to add a flag that tells the parser that
> the fifth item is to be ignored. We could edit the file and put the
> changes in the diff, but that means the diff gets outdated very quickly.

Indeed. Perhaps the 'mkspell' code should be exactly as strict, or as lenient, as the parser that the authors of the spell files use day to day when testing their work (hunspell?).
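
To see how many lines are affected before deciding between a parser flag and a diff, the affix file can be scanned outside Vim. The following is only a sketch in Python. It assumes, going by the description above, that a PFX/SFX line carries at most five whitespace-separated fields and that anything after them which does not start with '#' is what gets reported as "Trailing text". The helper name find_trailing_text and the sample lines are made up for illustration; they are not taken from ro_RO.aff.

```python
def find_trailing_text(lines):
    """Return (line_number, extra_fields) for PFX/SFX lines that carry
    uncommented text after the fifth field (assumed warning condition)."""
    hits = []
    for lnum, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] not in ("PFX", "SFX"):
            continue  # only affix lines are of interest here
        extra = fields[5:]
        if extra and not extra[0].startswith("#"):
            hits.append((lnum, extra))
    return hits


# Hypothetical sample, NOT copied from the real Romanian affix file.
sample = [
    "SFX A Y 2",                            # header line, four fields
    "SFX A 0 ul [^u] # masculine article",  # extra text is a comment
    "SFX A 0 l u is:def",                   # uncommented sixth field
]

for lnum, extra in find_trailing_text(sample):
    print(f"line {lnum}: trailing {' '.join(extra)}")
```

Run against the real file (for example with open("ro_RO.aff", encoding="utf-8").read().splitlines()), this would give a count of lines that a flag would have to tolerate or a diff would have to touch.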

> > I don't know what the effect of the "Duplicate character in MAP" error
> > is, but i've seen no invalid spell behaviour with the new .spl.
> >
> > Also, i just generated the ro.cp1250.spl file in gvim/Windows using
> >
> > :set encoding=cp1250
> > :mkspell ro ro_RO
> >
> > in the right folder, and had no other errors than the above ones (no "Conversion failure" errors)
>
> I'm doing this on Ubuntu. The locale handling is a bit complicated and
> iconv is quite strict. I hope someone can make it work on Ubuntu
> without errors. The errors might be harmless, but they will cause other
> errors to go unnoticed.

I did a bit of reading, and one of the issues is that the "right" diacritics in Romanian are *not* in the cp1250 standard. The wrong ones (the cedilla forms) were nonetheless used in Windows versions up to and including XP. The right forms are ș and ț (comma below, no cedilla). Also, the new Romanian writing rules have replaced î inside words with â. The right diacritics, I read, are in the ISO8859-16 (Latin10) standard. Ideally, for spelling there should be some standard that allows both forms and adapts to the user's preferences ... So: one language, two sets of rules.
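
The claim about the two code pages is easy to check outside Vim. A small sketch in Python, using the codec names cp1250 and iso8859_16 from Python's standard library; the helper can_encode is just an illustrative name:

```python
def can_encode(text, encoding):
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


COMMA_BELOW = "\u0219\u021b"  # ș ț, the correct Romanian letters
CEDILLA = "\u015f\u0163"      # ş ţ, the legacy cedilla forms

for codec in ("cp1250", "iso8859_16"):
    print(codec,
          "comma-below:", can_encode(COMMA_BELOW, codec),
          "cedilla:", can_encode(CEDILLA, codec))
```

This only tests what the Python codecs accept; iconv on a given system may behave differently.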

But on my Arch Linux machine, I can't get Vim to save/convert ro_RO.aff from utf-8 to iso8859-16 without errors, or to set the encoding to this format and run mkspell without conversion errors.
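
One way to narrow this down would be to list which characters in the file have no counterpart in the target encoding. This is a sketch only, assuming the file really is valid utf-8; it does not reproduce what Vim or iconv do internally, and unencodable_chars is a hypothetical helper name. The sample MAP lines are invented, not copied from ro_RO.aff.

```python
def unencodable_chars(text, encoding):
    """Return the sorted distinct characters of text that the
    target encoding cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


# Invented sample mixing comma-below and cedilla letters.
sample = "MAP 2\nMAP s\u0219\u015f\nMAP t\u021b\u0163\n"

for ch in unencodable_chars(sample, "iso8859_16"):
    print(f"U+{ord(ch):04X} {ch!r} has no iso8859-16 equivalent")
```

For the real file, the text would come from open("ro_RO.aff", encoding="utf-8").read(). If the list is non-empty, those characters are the first candidates for the conversion errors.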

Regards

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php