Friday, September 17, 2010

Re: Myspell -> Hunspell plan?

Dominique Pelle wrote:

> >> Vim-7.3 currently creates spelling dictionaries from Myspell dictionaries.
> >> I am wondering whether there is any plan to support Hunspell dictionaries.
> >>
> >> The French dictionary at http://www.dicollecte.org/download.php?prj=fr states:
> >>
> >> === [ fr] ===
> >> Ces dictionnaires pour Myspell ne seront plus mis à jour, Myspell ayant
> >> été remplacé par Hunspell dans la plupart des applications.
> >> =========
> >>
> >> Which means in English:
> >> =========
> >> These dictionaries for Myspell won't be kept up-to-date, Myspell
> >> being replaced by Hunspell in most applications.
> >> =========
> >>
> >> It's a pity if we can't use the latest dictionaries in Vim anymore.
> >> I have no idea how much work is involved in supporting Hunspell.
> >>
> >> When trying to run :mkspell on the Hunspell French dictionary,
> >> available at...
> >>
> >> http://www.dicollecte.org/download/fr/hunspell-fr-moderne-v3.8.zip
> >>
> >> ... Vim reports the following messages:
> >>
> >> Unrecognized or duplicate item in fr-moderne.aff line 10: WORDCHARS
> >> Unrecognized or duplicate item in fr-moderne.aff line 98: KEY
> >> Unrecognized or duplicate item in fr-moderne.aff line 100: ICONV
> >> ...snip...
> >> Unrecognized or duplicate item in fr-moderne.aff line 135: OCONV
> >> Unrecognized or duplicate item in fr-moderne.aff line 154: BREAK
> >> Unrecognized or duplicate item in fr-moderne.aff line 155: BREAK
> >> Reading dictionary file fr-moderne.dic ...
> >> First duplicate word in fr-moderne.dic line 3815: V
> >> 392 duplicate word(s) in fr-moderne.dic
> >> Compressing word tree...
> >> Compressed 4390813 of 4735831 nodes; 345018 (7%) remaining
> >> Compressed 313845 of 391932 nodes; 78087 (19%) remaining
> >> Writing spell file fr.utf-8.spl ...
> >> Done!
> >> Estimated runtime memory use: 2116435 bytes
> >>
> >> It creates a dictionary for Vim, but when doing :spelldump to see
> >> words in the created dictionary, I see a lot of junk (words beginning
> >> with 0, words with /= at the end for example) so Vim does not
> >> understand Hunspell files.
> >>
> >> =====================
> >> # file: /home/pel/.vim/spell/fr-moderne.utf-8.spl
> >> 0ampère
> >> 0becquerel
> >> 0calorie
> >> ...snip...
> >> µm/=
> >> µmol/=
> >> µs/=
> >> µvar/=
> >> µΩ/=
> >> Å/=
> >> Épinay-sur-Seine
> >> États-Unis
> >> Île-de-France
> >> Île-du-Prince-Édouard
> >> Îles-de-la-Madeleine
> >> Ω/=
> >> =====================
> >>
> >> The help file spell.txt has notes about WORDCHARS, KEY, BREAK
> >> which don't seem essential, but there is no note about ICONV
> >> and OCONV in Vim's help. I see some doc here:
> >> http://manpages.ubuntu.com/manpages/lucid/man4/hunspell.4.html
> >
> > Hunspell uses the same kind of files, but adds more options. Vim should
> > be able to use most of the Hunspell files, with some modifications.
> >
> > I don't know what the ICONV and OCONV items mean.
> > The page you refer to simply says input and output conversion, without
> > explaining what that means. It's a common problem for Hunspell that
> > it's largely undefined how it works. You may need to look at the source
> > code...
> >
> > For the dictionaries, it's usually best to get them from the OpenOffice
> > site, as that's what is downloaded automatically, thus should be kept
> > up-to-date.
>
>
> Warning: this message uses Unicode characters.
>
>
> Yes, the Hunspell documentation is not very clear. From looking
> at the dictionary "fr-modern.aff", it looks like ICONV and OCONV
> define aliases for Unicode characters that are equivalent or similar
> enough to be equivalent. File "fr-modern.aff" contains:
>
> ICONV 32
> ICONV à à
> ICONV â â
> ICONV ä ä
> ICONV é é
> ICONV è è
> ICONV ê ê
> ICONV ë ë
> ICONV î î
> ICONV ï ï
> ICONV ô ô
> ICONV ö ö
> ICONV ù ù
> ICONV û û
> ICONV ü ü
> ICONV ÿ ÿ
> ICONV ç ç
> ICONV À À
> ICONV Â Â
> ICONV Ä Ä
> ICONV É É
> ICONV È È
> ICONV Ê Ê
> ICONV Ë Ë
> ICONV Î Î
> ICONV Ï Ï
> ICONV Ô Ô
> ICONV Ö Ö
> ICONV Ù Ù
> ICONV Û Û
> ICONV Ü Ü
> ICONV Ÿ Ÿ
> ICONV Ç Ç
>
> OCONV 1
> OCONV ' ’
>
> The first line with ICONV (resp. OCONV) is followed by a number
> indicating the number of ICONV entries (resp. OCONV).
>
> Not sure how essential it is to support. I don't think
> it explains the odd words I see with ":spelldump".
>
> The first few incorrect words given by ":spelldump" are units:
>
> 0ampère
> 0becquerel
> 0calorie
> (etc)
>
> They appear like this in the "fr-modern.dic" file
> (http://www.dicollecte.org/download.php?prj=fr):
>
> ampère/Um()
> becquerel/Um()
> calorie/Um()
>
> And in fr-modern.aff file, I see:
>
> NEEDAFFIX ()
>
>
> PFX Um Y 29
> PFX Um 0 0/S. .
> PFX Um 0 l' [aàâeèéêiîoôuyœæ]
> PFX Um 0 d'/S. [aàâeèéêiîoôuyœæ]
> PFX Um 0 yotta/S. .
> PFX Um 0 zetta/S. .
> PFX Um 0 exa/S. .
> PFX Um 0 l'exa .
> PFX Um 0 d'exa/S. .
> PFX Um 0 peta/S. .
> PFX Um 0 téra/S. .
> PFX Um 0 giga/S. .
> PFX Um 0 méga/S. .
> PFX Um 0 kilo/S. .
> PFX Um 0 hecto/S. .
> PFX Um 0 l'hecto .
> PFX Um 0 d'hecto/S. .
> PFX Um 0 déca/S. .
> PFX Um 0 déci/S. .
> PFX Um 0 centi/S. .
> PFX Um 0 milli/S. .
> PFX Um 0 micro/S. .
> PFX Um 0 nano/S. .
> PFX Um 0 pico/S. .
> PFX Um 0 femto/S. .
> PFX Um 0 atto/S. .
> PFX Um 0 l'atto .
> PFX Um 0 d'atto/S. .
> PFX Um 0 zepto/S. .
> PFX Um 0 yocto/S. .
>
>
> I wonder why there is an entry "PFX Um 0 0/S. ."
>
> This is causing the weird words "0ampère", "0becquerel",
> "0calorie" (etc. for many other units).
>
> I see that the word "ampère" does not exist in ":spelldump" without prefix
> (it should be there).
>
> The entry "PFX Um 0 0/S. ." must have a special meaning (such as:
> empty prefix) which is misinterpreted by Vim. But the doc is certainly
> quite unclear to me:
>
> http://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/

You can probably fix it by changing:

calorie/Um()

to:

calorie/S.
calorie/Um()

And removing this prefix:

PFX Um 0 0/S. .

Hopefully a :s command can do the change to the .dic file.
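
If a :s command turns out to be awkward, the same change could be scripted.
Below is a minimal Python sketch, not a tested conversion tool. It assumes
the affected entries look exactly like "word/Um()" as in the examples above;
entries with other flag combinations are left alone. The function names
expand_dic_lines and drop_zero_prefix are made up for illustration. The
"PFX Um Y 29" header declares the number of rules and is not adjusted here,
so it may need to be lowered by hand.

```python
import re

# Assumption: affected .dic entries are exactly "word/Um()".
UM_ENTRY = re.compile(r"^(?P<word>[^/\s]+)/Um\(\)$")

# The zero-prefix rule "PFX Um 0 0/S. ." that is to be removed from the .aff.
ZERO_PREFIX = re.compile(r"^PFX\s+Um\s+0\s+0/S\.\s+\.\s*$")


def expand_dic_lines(lines):
    """Insert 'word/S.' before every 'word/Um()' entry; keep other lines."""
    out = []
    for line in lines:
        m = UM_ENTRY.match(line)
        if m:
            out.append(m.group("word") + "/S.")
        out.append(line)
    return out


def drop_zero_prefix(lines):
    """Return the .aff lines without the 'PFX Um 0 0/S. .' rule."""
    return [line for line in lines if not ZERO_PREFIX.match(line)]
```

To use it, read fr-moderne.dic and fr-moderne.aff as UTF-8, split them into
lines, pass the lines through the two functions, write the results back, and
run :mkspell again. The first line of the .dic file (the word count) is
passed through unchanged.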

--
Although the scythe isn't pre-eminent among the weapons of war, anyone who
has been on the wrong end of, say, a peasants' revolt will know that in
skilled hands it is fearsome.
-- (Terry Pratchett, Mort)

/// Bram Moolenaar -- Bram@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

