On Friday, November 21, 2014 1:31:20 AM UTC-8, Erik Christiansen wrote:
> On 20.11.14 11:54, porphyry5 wrote:
> > Annoyingly, I cannot see any clear way to use an associative array in
> > this case, because of that pesky word suffix list. I believe 'if
> > (word in wd-list)' must either return "no match" or "exact match". In
> > the binary search I check for exact matches, on failure immediately
> > followed by a partial match test, 'if (word ~ wd-list[index])' to
> > indicate the need to append suffixes to the partial matching wd-list
> > entry.
>
> Ah ... perhaps the easiest way is to detect recognisable suffixes on
> input words, and strip them for the initial match attempt, i.e. only the
> part of the word you want is hashed. A flag, or non-null
> "found_this_suffix" string variable, retained from the partial-match
> generating pre-stripping, then guides any additional actions, if
> required.
>
> To cover the case where a word with a recognisable suffix is in the list
> with suffix, rather than without, a check-on-match-failure for the
> unstripped word could be performed. It would only occur when a suffix is
> detected, and would also be faster than a search.
>
> That essentially reverses the order of match vs suffix handling.
> Speed-wise, I'd expect the pre-strip to be quite a bit faster than the
> partial match, since the suffix list is unlikely to number 200,000
> entries.
>
> Erik
>
> --
> Britain had first obtained a commercial Enigma machine back in 1927, by simply
> purchasing one in the open in Germany. The machine was analysed and a diagnostic
> report written on how it worked. - http://www.bbc.co.uk/news/magazine-17486464
This is becoming a most interesting project. Checking further in my error file of 2000-odd entries, I began discovering words that the binary search should have found but didn't. The cause was that the search did not necessarily land on the root word when that root was followed by variants of it.
That redefined the problem: if you can't find the word itself, you have to find its root. The way to guarantee that was to use an associative array for the word list, which also allowed a second associative array for the suffixes.
This cuts out many lines of code; the entire identification process is now just
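# v = word under test, b = word-list array, s = suffix array (all global)
function bs() {
    r = ""
    if (v in b) return r              # exact match
    j = 1
    n = length(v)
    while (j < n) {
        # split v into a root of length j and the remaining suffix
        if ((substr(v, 1, j) in b) && (substr(v, j+1) in s)) return r
        j++
    }
    r = "@@"                          # nothing matched
    return r
}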
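For anyone wanting to try this outside the full script, here is a minimal sketch of how the function might be driven. The shell wrapper name `check_words` and the sample roots and suffixes are made up for illustration; the real lists are far larger and would be read from files. The `j` and `n` counters are declared as awk locals here, which the original does not do.

```sh
#!/bin/sh
# Sketch only: sample data, not the real word list or suffix list.
# check_words prints each argument, with " @@" appended when bs() rejects it.
check_words() {
    printf '%s\n' "$@" | awk '
    BEGIN {
        # b holds roots, s holds suffixes (hypothetical sample entries)
        split("walk talk jump", roots, " "); for (i in roots) b[roots[i]] = 1
        split("s ed ing", sufs, " ");        for (i in sufs)  s[sufs[i]] = 1
    }
    # v and r stay global, as in the original function
    function bs(   j, n) {
        r = ""
        if (v in b) return r
        n = length(v)
        for (j = 1; j < n; j++)
            if ((substr(v, 1, j) in b) && (substr(v, j + 1) in s)) return r
        r = "@@"
        return r
    }
    {
        v = $1
        if (bs() == "") print v
        else            print v " @@"
    }
    '
}

# Of these four, only "wlaking" is flagged.
check_words walk walked jumping wlaking
```

Each `in` test is a hash lookup, so a word of length n costs at most about 2n lookups, however large the word list is.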
However, I smell a rat: it's so astonishingly fast, less than 2 seconds vs ~20 seconds for the previous version, and it reports only 1050 total errors against more than 3000 before, though that earlier figure did include good words reported as errors.
I can't see that it should make any difference, but I note you did recommend working from the back end and isolating the suffix first. That's very easily done, so I think I'll try it as an easy check on the reliability of the result. If it produces exactly the same errors, that would be most encouraging.
Oh, happy day, it produces exactly the same output with
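# same test, scanning from the back: shortest suffix first
function bs() {
    r = ""
    if (v in b) return r              # exact match
    n = 1
    j = length(v)
    while (j > n) {
        # suffix starts at position j; the root is everything before it
        if ((substr(v, j) in s) && (substr(v, 1, j-1) in b)) return r
        j--
    }
    r = "@@"                          # nothing matched
    return r
}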
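The comparison of the two versions can also be run inside a single awk program rather than by diffing two output files. This is a rough sketch under assumptions: the names `compare_scans`, `fwd` and `bwd` are hypothetical, the word is passed as a parameter instead of through the global `v`, and the lists are small made-up samples.

```sh
#!/bin/sh
# Sketch only: runs the forward and backward scans on the same words
# and reports any word on which they disagree.
compare_scans() {
    printf '%s\n' "$@" | awk '
    BEGIN {
        # hypothetical sample data standing in for the real lists
        split("walk talk jump walks", roots, " "); for (i in roots) b[roots[i]] = 1
        split("s ed ing", sufs, " ");              for (i in sufs)  s[sufs[i]] = 1
    }
    # forward scan: shortest root first
    function fwd(w,   j, n) {
        if (w in b) return ""
        n = length(w)
        for (j = 1; j < n; j++)
            if ((substr(w, 1, j) in b) && (substr(w, j + 1) in s)) return ""
        return "@@"
    }
    # backward scan: shortest suffix first
    function bwd(w,   j) {
        if (w in b) return ""
        for (j = length(w); j > 1; j--)
            if ((substr(w, j) in s) && (substr(w, 1, j - 1) in b)) return ""
        return "@@"
    }
    { if (fwd($1) != bwd($1)) { print "MISMATCH " $1; bad++ } }
    END { print (bad ? bad " mismatches" : "scans agree") }
    '
}

compare_scans walk walked jumping wlaking walkss x
```

Agreement is to be expected, since both loops test the same split points in opposite order and the result does not depend on which split matched. It shows the two versions are consistent with each other, not that every root-plus-suffix combination they accept is a real word.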
Now just so long as my enthusiasm for quick and easy answers isn't blinding me to some lurking gotcha...
Sunday, November 23, 2014