> Mon, 29 Mar 2010, Tony Mechelynck wrote:
>> On 29/03/10 02:31, Tim Chase wrote:
>>> Tony Mechelynck wrote:
>>>> Matt gave you a solution in several steps, but here's a single-step
>>>> one, taking advantage of both case-matching and case-insensitive
>>>> operators, of |sub-replace-expression| and of the ternary operator ?:
>>>> as in (condition ? result_if_true : result_if_false) |expr1| :
>>>>
>>>> command -nargs=0 -range=% -bar Xeo
>>>> \<line1>,<line2>s/\c[scujgh]x/\=(
>>>> \ submatch(0) ==# 'sx' ? 'ŝ' :
>>>> \ submatch(0) ==# 'cx' ? 'ĉ' :
>>>> \ submatch(0) ==# 'ux' ? 'ŭ' :
>>>> \ submatch(0) ==# 'jx' ? 'ĵ' :
>>>> \ submatch(0) ==# 'gx' ? 'ĝ' :
>>>> \ submatch(0) ==# 'hx' ? 'ĥ' :
>>>> \ submatch(0) ==? 'SX' ? 'Ŝ' :
>>>> \ submatch(0) ==? 'CX' ? 'Ĉ' :
>>>> \ submatch(0) ==? 'UX' ? 'Ŭ' :
>>>> \ submatch(0) ==? 'JX' ? 'Ĵ' :
>>>> \ submatch(0) ==? 'GX' ? 'Ĝ' : 'Ĥ' )/g
>>>>
>>>> For maximum efficiency, the most frequent cases should be tested
>>>> first, but the use of ==# and ==? to enable (for instance) both Cx and
>>>> CX for Ĉ but only cx for ĉ requires lowercase to come first. (This
>>>> will identify cX as Ĉ but I think it can be tolerated.)
>>>
>>> In Vim7+, I'd be tempted to tweak Tony's solution so it uses a
>>> literal/in-line dict for the conversions, something like (broken into
>>> multiple lines without the requisite "\" characters but could just as
>>> easily be one line):
>>>
>>> s/\c[scujgh]x/\=get({
>>> 'sx':'ŝ',
>>> 'cx':'ĉ',
>>> 'ux':'ŭ',
>>> 'jx':'ĵ',
>>> 'gx':'ĝ',
>>> 'hx':'ĥ',
>>> 'SX':'Ŝ',
>>> 'CX':'Ĉ',
>>> 'UX':'Ŭ',
>>> 'JX':'Ĵ',
>>> 'GX':'Ĝ',
>>> 'HX':'Ĥ'
>>> }, submatch(0), '??default??')/g
>>>
>>> (which should have the benefit of a roughly constant lookup time, and
>>> is a lot less hassle to maintain, IMHO)
>>>
>>> You might have to do some case-folding with tolower()/toupper() on the
>>> submatch(0), or include additional entries for other case-combinations.
>>>
>>> -tim
>>>
>>>
>>>
>>
>> IIUC, the usual practice (beyond lowercase) is to use CX, GX, HX etc.
>> in all-caps titles, and Cx, Gx, Hx, etc. for the initial capital of
>> words (proper names, first-in-sentence, etc.) where the other letters
>> are in lowercase. (ŭ is extremely rare at the start of a word; it
>> mostly occurs after a vowel. ĥ is rather infrequent in any position.)
>>
>> For the default ("not found in table"), I'd just use submatch(0)
>> again, i.e., "don't change".
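A minimal sketch of how the two suggestions above might be combined: Tim's dictionary lookup, with submatch(0) itself as the "not found" default. The names g:xeo_table and XeoLookup() are hypothetical, and the extra Cx-style entries are an assumption based on the capitalization practice described above. It assumes Vim 7 or later and a UTF-8 'encoding'.

```vim
" Hypothetical lookup table: the twelve entries quoted above, plus
" Cx-style keys (assumed) for an initial capital in a lowercase word.
let g:xeo_table = {
      \ 'sx': 'ŝ', 'cx': 'ĉ', 'ux': 'ŭ', 'jx': 'ĵ', 'gx': 'ĝ', 'hx': 'ĥ',
      \ 'SX': 'Ŝ', 'CX': 'Ĉ', 'UX': 'Ŭ', 'JX': 'Ĵ', 'GX': 'Ĝ', 'HX': 'Ĥ',
      \ 'Sx': 'Ŝ', 'Cx': 'Ĉ', 'Ux': 'Ŭ', 'Jx': 'Ĵ', 'Gx': 'Ĝ', 'Hx': 'Ĥ'
      \ }

" Hypothetical helper: dictionary keys are case-sensitive, so a key
" that is not in the table (e.g. 'cX') is returned unchanged.
function! XeoLookup(digraph)
  return get(g:xeo_table, a:digraph, a:digraph)
endfunction

" Same command name, range and pattern as the ternary version.
command! -nargs=0 -range=% -bar Xeo
      \ <line1>,<line2>s/\c[scujgh]x/\=XeoLookup(submatch(0))/g
```

With this table, unlike the ternary version, cX is left alone rather than turned into Ĉ; adding a 'cX' key would restore the earlier behaviour.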
>>
>> Oh, and for logarithmic time my solution could use dichotomic
>> searching (taking advantage of the fact that both the result_if_true
>> and the result_if_false of a ?: construct can in turn be ?:
>> expressions, each of which can, etc.) but for such a small set of
>> possibilities I don't think there would be a very big performance
>> gain.
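For illustration only, the dichotomic idea could be sketched as below: nested ?: expressions bisect the sorted keys cx, gx, hx, jx, sx, ux, so each lookup needs at most three comparisons. The function name XeoBisect() and the two list variables are hypothetical. The sketch assumes its argument has already matched \c[scujgh]x, and it keeps the earlier rule that only an all-lowercase digraph yields a lowercase letter.

```vim
" Hypothetical result lists, in the sorted order of the keys:
" cx, gx, hx, jx, sx, ux.
let g:xeo_lower = ['ĉ', 'ĝ', 'ĥ', 'ĵ', 'ŝ', 'ŭ']
let g:xeo_upper = ['Ĉ', 'Ĝ', 'Ĥ', 'Ĵ', 'Ŝ', 'Ŭ']

function! XeoBisect(digraph)
  " The digraph is plain ASCII here, so tolower() is safe.
  let l:k = tolower(a:digraph)
  " Nested ?: expressions: first split at 'jx', then within each half.
  let l:i = l:k <# 'jx'
        \ ? (l:k <# 'gx' ? 0 : (l:k ==# 'gx' ? 1 : 2))
        \ : (l:k <# 'sx' ? 3 : (l:k ==# 'sx' ? 4 : 5))
  " All-lowercase input gives the lowercase letter; Cx, CX and cX
  " all give the capital, as in the ternary version.
  return (a:digraph ==# l:k ? g:xeo_lower : g:xeo_upper)[l:i]
endfunction
```

It would be called from the substitution as \=XeoBisect(submatch(0)). With only six keys the saving over the straight chain of tests is, as said, unlikely to be noticeable.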
>
> I guess ĥ was once more common but has since been replaced by k or ĉ
> in common words such as ĥemio or ĥino. ŭ should occur a little more
> often than ĥ.
ŭ after a vowel is quite frequent: laŭ (according to), preskaŭ (almost),
hodiaŭ (today), eŭfonio (euphony), poŭpo (poop [of a ship]), etc. The
Academy has condoned the replacement of -rĥ- by -rk- (Appendix 8 to the
list of official radicals); in other words usage may vary (e.g. ĥoro vs.
koruso for English "choir" [singing company]; for the choir of a church
[part of the building] I suppose ĥorejo or maybe korusejo would be used).
>
> At the risk of drifting far off-topic: how does Esperanto handle
> foreign proper names in the accusative case? E.g.,
>
> Einstein estimas Bohnn
>
> How do you tell who is the agent and who is the direct object?
>
Either Esperantize the whole word according to established forms or to
Rule 15 of the Fundamenta Gramatiko ("En Nederlando mi vizitis
Mastriĥton, Hagon, Roterdamon, Amsterdamon, sed ne Groningon") or add
-on (for a substantive) to the object ("Einstein estimas Bohn-on", if it
is about someone named Bohn), if there is a risk of ambiguity. If the
presence of an adjective, or of several names in apposition, avoids
ambiguity, then the un-Esperantized name may remain unchanged: Johanon
Sebastianon Bach; Bruegel la Maljunan.
Best regards,
Tony.
--
Eight Megabytes And Continually Swapping.