Wednesday, November 19, 2014

Re: :%s//\=@o/gce ignores c flag in key mapping

On 19.11.14 12:21, porphyry5 wrote:
> So if I read this hash table stuff correctly any data item that one
> wants to look up in an associative array generates its own address in
> memory, by using (a constant number of bytes of?) its value as a
> single binary number and modifying that with some arithmetic algorithm
> to produce an address in the available range. That is a really sweet
> concept, but it must be rather wasteful of memory, lots of unused
> spaces within its range.

In awk, at least (and probably most later adopters, such as perl), the
memory is malloc-ed, i.e. builds as needed, and array entries can be of
any size within available memory. The fact that an array need not be
dimensioned, that array entries are created when referenced, and can be
deleted, means that the implementation is not actually a simple array.

Hash buckets, in other areas I've encountered, have usually been
implemented as linked lists, but that is not cast in stone. If there is
only one element at the hashed address, as is needed in encryption
hashes, then there's no list there to walk. I have not examined the code
to check how the hash table is implemented.

> I use ultra-cheap, under-powered laptops (currently acer c720, new
> laptop for < $200) which has only 2gb memory, but for 200,000 odd
> words that's 10K available per word, lots of available waste space.

Associative arrays will certainly conserve memory. The cost of this is
that hashing a word takes one iteration of a small loop per character,
before using that to address the hash table. (of pointers to the array
elements) They are still more efficient than most other methods.

For examples on the use of them, googling for the "Effective AWK
Programming" manual should allow you to download a copy. The book
The AWK Programming Language" by the eponymous authors of the language
is published by Addison Wesley. Its concise presentation of illuminating
examples, and permuted index, facilitate rapid extraction of the
knowledge desired, without faffing around in overly elaborate verbosity.

Any good reference will warn that "if (word in array) ... " syntax is
needed for testing membership, since "if (array[word] != 0) ..." will
create the element before testing it. (Please don't ask me how I know
that.)

> I really appreciate you hammering home this point about associative
> arrays, I always assumed, their being so convenient to use, that they
> had to be a very performance-costly luxury. Who knew.

You are welcome. The language came out in 1977, and I've found it very
useful and quick to code, in three decades in IT. Its C-like syntax
makes it less of a write-only language than some others, I find.

It is though, interpreted, not compiled, so manually looping over long
lists is slow. Hashing should sort that out.

> But I will forthwith rewrite the awk program with associative arrays
> and compare the timing for the two methods. Thank you very much for
> the trouble you have gone to on this.

You are most welcome. All I did was point out the fruit hanging on the
tree. Seeing someone new productively making jam with it is thanks enough.

Erik

--
Unix isn't hard, it's just a lot. (Ascribed to one of its originators)

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments: