Thursday, September 16, 2010

Re: Myspell -> Hunspell plan?

Bram Moolenaar <Bram@moolenaar.net>:

> Dominique Pelle wrote:
>
>> Vim-7.3 currently creates spelling dictionaries from Myspell dictionaries.
>> I am wondering whether there is any plan to support Hunspell dictionaries.
>>
>> The French dictionary at http://www.dicollecte.org/download.php?prj=fr states:
>>
>> === [ fr] ===
>> Ces dictionnaires pour Myspell ne seront plus mis à jour, Myspell ayant
>> été remplacé par Hunspell dans la plupart des applications.
>> =========
>>
>> Which means in English:
>> =========
>> These dictionaries for Myspell will no longer be updated, Myspell
>> having been replaced by Hunspell in most applications.
>> =========
>>
>> It's a pity if we can't use the latest dictionaries in Vim anymore.
>> I have no idea how much work is involved in supporting Hunspell.
>>
>> When trying to run :mkspell on the Hunspell French dictionary,
>> available at...
>>
>> http://www.dicollecte.org/download/fr/hunspell-fr-moderne-v3.8.zip
>>
>> ... Vim reports the following messages:
>>
>> Unrecognized or duplicate item in fr-moderne.aff line 10: WORDCHARS
>> Unrecognized or duplicate item in fr-moderne.aff line 98: KEY
>> Unrecognized or duplicate item in fr-moderne.aff line 100: ICONV
>> ...snip...
>> Unrecognized or duplicate item in fr-moderne.aff line 135: OCONV
>> Unrecognized or duplicate item in fr-moderne.aff line 154: BREAK
>> Unrecognized or duplicate item in fr-moderne.aff line 155: BREAK
>> Reading dictionary file fr-moderne.dic ...
>> First duplicate word in fr-moderne.dic line 3815: V
>> 392 duplicate word(s) in fr-moderne.dic
>> Compressing word tree...
>> Compressed 4390813 of 4735831 nodes; 345018 (7%) remaining
>> Compressed 313845 of 391932 nodes; 78087 (19%) remaining
>> Writing spell file fr.utf-8.spl ...
>> Done!
>> Estimated runtime memory use: 2116435 bytes
>>
>> It creates a dictionary for Vim, but when doing :spelldump to see
>> words in the created dictionary, I see a lot of junk (for example,
>> words beginning with 0 and words with /= at the end), so Vim does not
>> understand Hunspell files.
>>
>> =====================
>> # file: /home/pel/.vim/spell/fr-moderne.utf-8.spl
>> 0ampère
>> 0becquerel
>> 0calorie
>> ...snip...
>> µm/=
>> µmol/=
>> µs/=
>> µvar/=
>> µΩ/=
>> Å/=
>> Épinay-sur-Seine
>> États-Unis
>> Île-de-France
>> Île-du-Prince-Édouard
>> Îles-de-la-Madeleine
>> Ω/=
>> =====================
>>
>> The help file spell.txt has notes about WORDCHARS, KEY and BREAK,
>> which don't seem essential, but there is no note about ICONV
>> and OCONV in Vim's help.  I see some doc here:
>> http://manpages.ubuntu.com/manpages/lucid/man4/hunspell.4.html
>
> Hunspell uses the same kind of files, but adds more options.  Vim should
> be able to use most of the Hunspell files, with some modifications.
>
> I don't know what the ICONV and OCONV items mean.
> The page you refer to simply says input and output conversion, without
> explaining what that means.  It's a common problem for Hunspell that
> it's largely undefined how it works.  You may need to look at the source
> code...
>
> For the dictionaries, it's usually best to get them from the OpenOffice
> site, as that's what is downloaded automatically, thus should be kept
> up-to-date.


Warning: this message uses Unicode characters.


Yes, the Hunspell documentation is not very clear. From looking
at the dictionary "fr-moderne.aff", it looks like ICONV and OCONV
define aliases for Unicode characters that are equivalent or similar
enough to be treated as equivalent. File "fr-moderne.aff" contains:

ICONV 32
ICONV à à
ICONV â â
ICONV ä ä
ICONV é é
ICONV è è
ICONV ê ê
ICONV ë ë
ICONV î î
ICONV ï ï
ICONV ô ô
ICONV ö ö
ICONV ù ù
ICONV û û
ICONV ü ü
ICONV ÿ ÿ
ICONV ç ç
ICONV À À
ICONV Â Â
ICONV Ä Ä
ICONV É É
ICONV È È
ICONV Ê Ê
ICONV Ë Ë
ICONV Î Î
ICONV Ï Ï
ICONV Ô Ô
ICONV Ö Ö
ICONV Ù Ù
ICONV Û Û
ICONV Ü Ü
ICONV Ÿ Ÿ
ICONV Ç Ç

OCONV 1
OCONV ' '

The first line with ICONV (resp. OCONV) is followed by a number
indicating the number of ICONV entries (resp. OCONV).
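
The header-plus-entries layout described above can be sketched as a small parser. This is only an illustration, not Hunspell's or Vim's code: the function names are hypothetical, the sample entries assume that each ICONV line maps a decomposed sequence (letter + combining accent) to its precomposed character, and the sequential replacement is a simplification of whatever matching Hunspell really does.

```python
def parse_conv_table(aff_lines, keyword):
    """Collect (pattern, replacement) pairs for `keyword` (ICONV or OCONV).

    The first line with the keyword is taken as the header carrying the
    entry count; later lines with the keyword are the entries themselves.
    """
    pairs = []
    expected = None
    for line in aff_lines:
        fields = line.split()
        if not fields or fields[0] != keyword:
            continue
        if expected is None:
            expected = int(fields[1])  # header: number of entries that follow
        elif len(fields) == 3:
            pairs.append((fields[1], fields[2]))
    if expected is not None and len(pairs) != expected:
        raise ValueError(
            f"{keyword}: header announces {expected} entries, found {len(pairs)}"
        )
    return pairs


def apply_conv(word, pairs):
    """Replace every pattern by its replacement (naive, order-dependent)."""
    for pattern, replacement in pairs:
        word = word.replace(pattern, replacement)
    return word


# Hypothetical sample: the escapes make the assumed decomposed -> precomposed
# mapping explicit, since both forms look the same on screen.
SAMPLE_AFF = [
    "ICONV 2",
    "ICONV e\u0301 \u00e9",   # e + combining acute  -> precomposed e-acute
    "ICONV a\u0300 \u00e0",   # a + combining grave  -> precomposed a-grave
    "OCONV 1",
    "OCONV \u2019 '",         # typographic apostrophe -> ASCII apostrophe
]

iconv_pairs = parse_conv_table(SAMPLE_AFF, "ICONV")
normalized = apply_conv("e\u0301te\u0301", iconv_pairs)
```

Under these assumptions, ICONV would normalize the input word before lookup and OCONV would convert suggestions on the way out, which matches the "input and output conversion" wording of the man page.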

I am not sure how essential these items are to support. I don't think
they explain the odd words I see with ":spelldump".

The first few incorrect words given by ":spelldump" are units:

0ampère
0becquerel
0calorie
(etc)

They appear like this in the "fr-moderne.dic" file
(http://www.dicollecte.org/download.php?prj=fr):

ampère/Um()
becquerel/Um()
calorie/Um()

And in the "fr-moderne.aff" file, I see:

NEEDAFFIX ()


PFX Um Y 29
PFX Um 0 0/S. .
PFX Um 0 l' [aàâeèéêiîoôuyœæ]
PFX Um 0 d'/S. [aàâeèéêiîoôuyœæ]
PFX Um 0 yotta/S. .
PFX Um 0 zetta/S. .
PFX Um 0 exa/S. .
PFX Um 0 l'exa .
PFX Um 0 d'exa/S. .
PFX Um 0 peta/S. .
PFX Um 0 téra/S. .
PFX Um 0 giga/S. .
PFX Um 0 méga/S. .
PFX Um 0 kilo/S. .
PFX Um 0 hecto/S. .
PFX Um 0 l'hecto .
PFX Um 0 d'hecto/S. .
PFX Um 0 déca/S. .
PFX Um 0 déci/S. .
PFX Um 0 centi/S. .
PFX Um 0 milli/S. .
PFX Um 0 micro/S. .
PFX Um 0 nano/S. .
PFX Um 0 pico/S. .
PFX Um 0 femto/S. .
PFX Um 0 atto/S. .
PFX Um 0 l'atto .
PFX Um 0 d'atto/S. .
PFX Um 0 zepto/S. .
PFX Um 0 yocto/S. .


I wonder why there is an entry "PFX Um 0 0/S. ."

This is causing the weird words "0ampère", "0becquerel",
"0calorie" (etc. for many other units).

I see that the word "ampère" does not exist in ":spelldump" without prefix
(it should be there).

The entry "PFX Um 0 0/S. ." must have a special meaning (such as
an empty prefix) which is misinterpreted by Vim, but the documentation
is quite unclear to me:

http://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/
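
The two readings of that entry can be compared with a minimal sketch. It is not Vim's or Hunspell's implementation: the function name and the `zero_add_is_empty` switch are hypothetical, only three of the 29 rules are reproduced, and the idea that "0" in the affix field stands for an empty affix is the assumption being tested here (the hunspell man page does appear to say that zero stripping or affix is indicated by zero).

```python
import re


def apply_prefix_rules(stem, rules, zero_add_is_empty):
    """Expand `stem` with each (strip, add, condition) prefix rule.

    zero_add_is_empty=False models the literal reading, where an affix
    field of "0" is prepended as the character "0".
    zero_add_is_empty=True models the assumed Hunspell reading, where
    "0" means "add nothing".
    """
    words = []
    for strip, add, condition in rules:
        add = add.split("/", 1)[0]   # drop continuation flags such as "/S."
        if strip == "0":
            strip = ""               # nothing is stripped in any of these rules
        if zero_add_is_empty and add == "0":
            add = ""
        if not stem.startswith(strip):
            continue
        if not re.match(condition, stem):  # condition tests the start of the stem
            continue
        words.append(add + stem[len(strip):])
    return words


# Three of the "PFX Um" rules quoted above, as (strip, add, condition).
UM_RULES = [
    ("0", "0/S.", "."),
    ("0", "l'", "[aàâeèéêiîoôuyœæ]"),
    ("0", "kilo/S.", "."),
]

literal = apply_prefix_rules("ampère", UM_RULES, zero_add_is_empty=False)
assumed = apply_prefix_rules("ampère", UM_RULES, zero_add_is_empty=True)
```

With the literal reading the first rule yields "0ampère"; with the empty-prefix reading it yields plain "ampère", which is exactly the word missing from the ":spelldump" output. Together with "NEEDAFFIX ()" on the stem, that would explain why the bare unit is expected to come from the affix rule rather than from the .dic entry itself.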

Regards
-- Dominique
