Sunday, September 30, 2012

Re: Matching/Sorting line terminations

Thanks -- the lines are in English 400 years old, hence the eccentric spelling.

The sorting you suggest is just what produced the first list, the raw
file. In it, lines with repetitions are mixed up with lines without,
and the repetitions that do occur are not ordered: in the whole file you
might find two repetitions here and three there, with a six between
them, and odd single lines getting in the way. Over 33,000 lines, this
becomes impossible to manage by hand.
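
A rough sketch of that sort, in Python rather than Vim script, in case it
helps to see it written out. It assumes the first field of each line is the
reference number and not one of the terms; the names reversed_terms and
sort_by_ending are invented for the illustration. It only brings lines with
the same ending next to each other and does not separate them into groups:

```python
def reversed_terms(line):
    """Sort key: the words of the line, last word first.

    Assumption: the first whitespace-separated field is the reference
    number (e.g. 3.9.14.4) and is left out of the key.
    """
    return line.split()[1:][::-1]


def sort_by_ending(lines):
    """Decorate each line with its key, sort, then strip the key again."""
    decorated = [(reversed_terms(ln), ln) for ln in lines]   # decorate
    decorated.sort()                                         # sort
    return [ln for _key, ln in decorated]                    # undecorate
```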

'his', 'her' etc are not related or linked: they are different terms
and do not count as repetitions.

In the first group (repeating two terms) there are three separate
groups of terminations: 'to abate', 'gan abate' and 'by might'.

In the next group (repeating three terms) I accidentally left out the
second line in each case -- again there are supposed to be three
different pairs. And in the six repeating terms group, I missed out
the first line of the group.
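
One way the grouping itself might be automated, again only as a Python
sketch under the same assumptions: the first field is the reference number,
and terms are compared exactly, so 'his' and 'her' differ and 'powre' is not
'power'. The names shared_ending and group_by_shared_ending are hypothetical.
Each line is filed under the largest number of terminal terms it shares with
any other line. After sorting on the reversed terms, that best match is
always an adjacent line, so only the two neighbours need checking:

```python
from collections import defaultdict


def terms(line):
    """Words of the line without the leading reference number (assumed)."""
    return line.split()[1:]


def shared_ending(a, b):
    """Number of terminal terms two word lists have in common."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


def group_by_shared_ending(lines, max_terms=7):
    """Map n -> lines whose longest ending shared with another line is n terms."""
    ordered = sorted(lines, key=lambda ln: terms(ln)[::-1])
    groups = defaultdict(list)
    for i, line in enumerate(ordered):
        best = 0
        for j in (i - 1, i + 1):          # neighbours in sorted order
            if 0 <= j < len(ordered):
                best = max(best, shared_ending(terms(line), terms(ordered[j])))
        groups[min(best, max_terms)].append(line)
    return dict(groups)
```

Where present, groups[0] then holds the unique lines, groups[1] those
repeating the final term only, groups[2] those repeating the final two
terms, and so on up to seven.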

Sorry for the missing lines, and thanks for your comments,

Julian


On Sun, Sep 30, 2012 at 7:20 PM, Tony Mechelynck
<antoine.mechelynck@gmail.com> wrote:
> On 30/09/12 18:14, jbl wrote:
>>
>> Hi: The first difficulty with the problem I describe below is that I
>> don't know what the key terms would be to search Google accurately. I
>> have searched for a long time already. So if anyone could even tell me
>> what it is I am looking for I'd be very grateful.
>>
>> The problem is this: I have a large file of poetry in alphabetical
>> order sorted on the last term in each line; I post an excerpt in
>> sample1 below. I want to sort it so that lines that share, say, the
>> last two terms (on the right) with the last two terms of any other
>> line are in one group, those lines that share the last three terms in
>> another and so on up to seven places -- as in sample2 below.
>>
>> The first difficulty I have is getting the search terms into an :ex
>> command -- I need to find for each line whether there are any others
>> that match it to seven terminal places, then to six and so on. I could
>> do the simple locating with something cumbersome like this:
>>
>> map ö $BB2yW: p0ig/ A$/m0
>> map ä $ByW: p0ig/ A$/m0
>>
>> and so on up to seven places. But it must be possible to generalize
>> that somehow. What would be the general form of an expression for
>> finding the last 'x' words of a line in the same position somewhere
>> else in the file?
>>
>> Apart from the crudeness of the operation, the trouble would be
>> exporting (redirecting?) the results automatically and keeping the
>> exported results in order (as in sample2 below). And also how to
>> iterate it usefully through the whole file.
>>
>> If I started at the top of the raw file, iterating something like
>> these commands, checking each line and exporting the results to a
>> single file, the resulting file would be identical to the original
>> file. I need, I think, to be able to eliminate those lines which do
>> not share any terminations with any other lines. I think starting
>> (somehow) with seven places then six and down to two, would leave me
>> the non-sharing lines by themselves in the original file(?).
>>
>> But I'm not even sure what the strategic logic should be: exactly what
>> tasks should I be trying to get the program to perform? The process
>> needs to be automated because the file is 33,000 lines long. As I say,
>> if someone could tell me what key terms, what types of operations, I
>> should be looking for on Google, it would help a great deal.
>>
>> Many thanks for any help, JBL
>> Vim 7.x Debian/Win7
>>
>> Here are the samples, one before (from the raw file) and one after (as
>> I'd like the whole thing organized).
>>
>> Raw Lines
>> 6.4.30.7 All these our ioyes and all our blisse abate
>> 2.12.15.9 And after them did driue with all her power and might
>> 3.9.14.4 And both full liefe his boasting to abate
>> 6.6.27.9 And layd at him amaine with all his will and might
>> 6.1.38.2 At once did heaue with all their powre and might
>> 6.1.12.7 But through misfortune which did me abase
>> 5.11.57.9 Did set vpon those troupes with all his powre and might
>> 6.2.26.5 For deare affection and vnfayned zeale
>> 3.2.13.6 For hardy thing it is to weene by might
>> 4.9.6.9 He her vnwares attacht and captiue held by might
>> 6.1.32.9 He spide come pricking on with al his powre and might
>> 6.6.31.9 He stayd his second strooke and did his hand abase
>> 3.8.51.6 Mote not mislike you also to abate
>> 3.8.28.7 Ne ought your burning fury mote abate
>> 1.7.35.1 No magicke arts hereof had any might
>> 5.8.46.8 She at her ran with all her force and might
>> 1.10.2.8 She cast to bring him where he chearen might
>> 3.7.35.3 That at the last his fiercenesse gan abate
>> 4.8.17.8 That her inburning wrath she gan abate
>> 1.10.47.7 That hill they scale with all their powre and might
>> 4.6.3.4 The armes he bore his speare he gan abase
>> 5.9.39.4 To all assayes; his name was called Zele
>> 2.9.7.4 To serue that Queene with all my powre and might
>> 2.1.26.7 When suddenly that warriour gan abace
>> 6.12.23.9 Where he him found despoyling all with maine and might
>> 1.5.1.8 With greatest honour he atchieuen might
>> 4.8.1.7 With sufferaunce soft which rigour can abate
>> 5.5.30.1 With that she turn'd her head as halfe abashed
>>
>> Sorted lines
>> ---Lines not repeating final term (=Unique lines):
>> FQ 2.1.26.7 When suddenly that warriour gan abace
>> FQ 5.5.30.1 With that she turn'd her head as halfe abashed
>> FQ 6.2.26.5 For deare affection and vnfayned zeale
>> FQ 5.9.39.4 To all assayes; his name was called Zele
>>
>> ---Lines repeating final term only:
>> FQ 6.1.12.7 But through misfortune which did me abase
>> FQ 6.6.31.9 He stayd his second strooke and did his hand abase
>> FQ 4.6.3.4 The armes he bore his speare he gan abase
>> FQ 3.8.28.7 Ne ought your burning fury mote abate
>> FQ 4.8.1.7 With sufferaunce soft which rigour can abate
>> FQ 6.4.30.7 All these our ioyes and all our blisse abate
>> FQ 1.7.35.1 No magicke arts hereof had any might
>> FQ 1.10.2.8 She cast to bring him where he chearen might
>> FQ 1.5.1.8 With greatest honour he atchieuen might
>> FQ 6.6.27.9 And layd at him amaine with all his will and might
>
> abase == abate == might? I guess I'm too stupid.
>
>>
>> ---Lines repeating final two terms:
>> FQ 3.9.14.4 And both full liefe his boasting to abate
>> FQ 3.8.51.6 Mote not mislike you also to abate
>> FQ 4.8.17.8 That her inburning wrath she gan abate
>> FQ 3.7.35.3 That at the last his fiercenesse gan abate
>> FQ 3.2.13.6 For hardy thing it is to weene by might
>> FQ 4.9.6.9 He her vnwares attacht and captiue held by might
>>
>> ---Lines repeating final three terms:
>> FQ 5.8.46.8 She at her ran with all her force and might
>> FQ 6.12.23.9 Where he him found despoyling all with maine and might
>> FQ 2.9.7.4 To serue that Queene with all my powre and might
>
> force == maine == powre (sic)? You will have to explain that to me.
>
>
>> ........
>>
>> ---Lines repeating final six terms:
>> FQ 2.12.15.9 And after them did driue with all her power and might
>> FQ 5.11.57.9 Did set vpon those troupes with all his powre and might
>> FQ 6.1.32.9 He spide come pricking on with all his powre and might
>> FQ 1.10.47.7 That hill they scale with all their powre and might
>> FQ 6.1.38.2 At once did heaue with all their powre and might
>
> I suppose "powre" is four times a typo.
> her == his == their? Or are there three different sets of lines, one of them
> a singleton?
>
>>
>
> This sounds like a "decorate - sort - undecorate" problem:
> 1. Put each line into "sortable" order (in this case, reverse the order of
> the terms, so that the last term comes at the start of the line, then one
> space, then the last but one, then one space, etc.);
> 2. Sort
> 3. Put the lines back like they used to be (i.e., reverse the order of the
> terms again).
>
> Note that no "dumb" computer will be able to find out that "his", "her" and
> "their" are to be sorted together, unless you somehow program it into the
> logic of your steps 1 and 3.
>
>
> Best regards,
> Tony.
> --
> "I'd love to go out with you, but I'm taking punk totem pole carving."
>
> --
> You received this message from the "vim_use" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php



--
J.B. Lethbridge
(Gen. Ed. The Manchester Spenser)
English Seminar
Tuebingen University
Wilhelmstrasse 50
Tuebingen
72074 Germany

