Rebbe Malachi: Re: :%s//\=@o/gce ignores c flag in key mapping

Thursday, November 20, 2014

Re: :%s//\=@o/gce ignores c flag in key mapping

On Wednesday, November 19, 2014 8:04:59 PM UTC-8, Erik Christiansen wrote:
> On 19.11.14 12:21, porphyry5 wrote:
> > So if I read this hash table stuff correctly any data item that one
> > wants to look up in an associative array generates its own address in
> > memory, by using (a constant number of bytes of?) its value as a
> > single binary number and modifying that with some arithmetic algorithm
> > to produce an address in the available range. That is a really sweet
> > concept, but it must be rather wasteful of memory, lots of unused
> > spaces within its range.
>
> In awk, at least (and probably most later adopters, such as perl), the
> memory is malloc-ed, i.e. builds as needed, and array entries can be of
> any size within available memory. The fact that an array need not be
> dimensioned, that array entries are created when referenced, and can be
> deleted, means that the implementation is not actually a simple array.
>
> Hash buckets, in other areas I've encountered, have usually been
> implemented as linked lists, but that is not cast in stone. If there is
> only one element at the hashed address, as is needed in encryption
> hashes, then there's no list there to walk. I have not examined the code
> to check how the hash table is implemented.
>
> > I use ultra-cheap, under-powered laptops (currently acer c720, new
> > laptop for < $200) which has only 2gb memory, but for 200,000 odd
> > words that's 10K available per word, lots of available waste space.
>
> Associative arrays will certainly conserve memory. The cost of this is
> that hashing a word takes one iteration of a small loop per character,
> before using that to address the hash table. (of pointers to the array
> elements) They are still more efficient than most other methods.
>
> For examples on the use of them, googling for the "Effective AWK
> Programming" manual should allow you to download a copy. The book
> The AWK Programming Language" by the eponymous authors of the language
> is published by Addison Wesley. Its concise presentation of illuminating
> examples, and permuted index, facilitate rapid extraction of the
> knowledge desired, without faffing around in overly elaborate verbosity.
>
> Any good reference will warn that "if (word in array) ... " syntax is
> needed for testing membership, since "if (array[word] != 0) ..." will
> create the element before testing it. (Please don't ask me how I know
> that.)
>
> > I really appreciate you hammering home this point about associative
> > arrays, I always assumed, their being so convenient to use, that they
> > had to be a very performance-costly luxury. Who knew.
>
> You are welcome. The language came out in 1977, and I've found it very
> useful and quick to code, in three decades in IT. Its C-like syntax
> makes it less of a write-only language than some others, I find.
>
> It is though, interpreted, not compiled, so manually looping over long
> lists is slow. Hashing should sort that out.
>
> > But I will forthwith rewrite the awk program with associative arrays
> > and compare the timing for the two methods. Thank you very much for
> > the trouble you have gone to on this.
>
> You are most welcome. All I did was point out the fruit hanging on the
> tree. Seeing someone new productively making jam with it is thanks enough.
>
> Erik
>
> --
> Unix isn't hard, it's just a lot. (Ascribed to one of its originators)

Annoyingly, I cannot see any clear way to use an associative array in this case, because of that pesky word suffix list. I believe 'if (word in wd-list)' must either return "no match" or "exact match". In the binary search I check for exact matches, on failure immediately followed by a partial match test, 'if (word ~ wd-list[index])' to indicate the need to append suffixes to the partial matching wd-list entry.

Pity really, but what I have can be lived with, it takes about 20 seconds to process a 700K text.

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rebbe Malachi

Thursday, November 20, 2014

Re: :%s//\=@o/gce ignores c flag in key mapping

No comments:

Blog Archive

About Me