Friday, October 23, 2015

Re: How to extract information from large file based on search pattern

Hi pra007!

On Thu, 22 Oct 2015, pra007 wrote:

> I am using vim for windows
>
> I have a large (800 MB plus) file with the following format
>
> the file is space separated
>
> 8232394 06774483 N 19850910 19870818 19910818 EXP.
> 8309716 06774483 N 19850910 19870818 19910319 REM.
> 4687262 06908244 N 19860917 19870818 19990815 EXP.
> 4687262 06908244 N 19860917 19870818 19990309 REM.
> 4687262 06908244 N 19860917 19870818 19950221 M184
> 4687262 06908244 N 19860917 19870818 19910108 M173
> 4687262 06908244 N 19860917 19870818 19880802 ASPN
> 4687263 06868897 N 19860527 19870818 19990128 M185
> 4687263 06868897 N 19860527 19870818 19950509 RMPN
> 4687263 06868897 N 19860527 19870818 19950509 ASPN
> 4687263 06868897 N 19860527 19870818 19950119 M184
> 4687263 06868897 N 19860527 19870818 19910311 ASPN
> 4687263 06868897 N 19860527 19870818 19910124 M173
> 4687264 06882047 N 19860703 19870818 19990815 EXP.
> 4687264 06882047 N 19860703 19870818 19990309 REM.
> 4687264 06882047 N 19860703 19870818 19950503 RMPN
> 4687264 06882047 N 19860703 19870818 19950503 ASPN
> 4687264 06882047 N 19860703 19870818 19950119 M184
> 4687264 06882047 N 19860703 19870818 19910311 ASPN
> RE45781 14176526 N 20140210 20151027 20150929 ASPN
> RE45786 14260890 N 20140424 20151027 20150929 ASPN
> RE45790 14454285 Y 20140807 20151103 20151008 ASPN
> RE45793 13445791 N 20120412 20151103 20151006 ASPN
>
> I have another .txt file (small) with the following format
> 4687264
> 4687264
> 4687264
> RE45781
> RE45786
> RE45790
> RE45793
>
> Now I want to extract the lines from the big file whose column 1 matches
> an entry in the small file, so that the result contains only lines whose
> ids are present in the small txt file
>
> The result file should look like this
>
> 4687264 06882047 N 19860703 19870818 19990815 EXP.
> 4687264 06882047 N 19860703 19870818 19990309 REM.
> 4687264 06882047 N 19860703 19870818 19950503 RMPN
> 4687264 06882047 N 19860703 19870818 19950503 ASPN
> 4687264 06882047 N 19860703 19870818 19950119 M184
> 4687264 06882047 N 19860703 19870818 19910311 ASPN
> RE45781 14176526 N 20140210 20151027 20150929 ASPN
> RE45786 14260890 N 20140424 20151027 20150929 ASPN
> RE45790 14454285 Y 20140807 20151103 20151008 ASPN
> RE45793 13445791 N 20120412 20151103 20151006 ASPN
>
> Is there any way?

If you have awk available it should be trivial and fast:
#v+
0 14908 chrisbra@debian /tmp % awk 'NR==FNR {a[$1]}
NR!=FNR && $1 in a' ids.txt large_file.txt
4687264 06882047 N 19860703 19870818 19990815 EXP.
4687264 06882047 N 19860703 19870818 19990309 REM.
4687264 06882047 N 19860703 19870818 19950503 RMPN
4687264 06882047 N 19860703 19870818 19950503 ASPN
4687264 06882047 N 19860703 19870818 19950119 M184
4687264 06882047 N 19860703 19870818 19910311 ASPN
RE45781 14176526 N 20140210 20151027 20150929 ASPN
RE45786 14260890 N 20140424 20151027 20150929 ASPN
RE45790 14454285 Y 20140807 20151103 20151008 ASPN
RE45793 13445791 N 20120412 20151103 20151006 ASPN
#v-
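
For readers who have not seen the NR==FNR idiom before, the one-liner above can be spelled out with comments. This is only a sketch under a few assumptions: the file names ids.txt and large_file.txt follow the command above, the sample data is a small excerpt written to a temporary directory, the extract function name is made up for illustration, and `next` replaces the explicit NR!=FNR test.

```shell
#!/bin/sh
# Demo data in a temporary directory; contents are a small excerpt.
tmp=$(mktemp -d)
printf '%s\n' 4687264 RE45781 > "$tmp/ids.txt"
cat > "$tmp/large_file.txt" <<'EOF'
4687263 06868897 N 19860527 19870818 19910124 M173
4687264 06882047 N 19860703 19870818 19990815 EXP.
RE45781 14176526 N 20140210 20151027 20150929 ASPN
RE45786 14260890 N 20140424 20151027 20150929 ASPN
EOF

# extract IDFILE BIGFILE (hypothetical helper name)
extract() {
    awk '
        # While reading the first file, the global line counter NR equals
        # the per-file counter FNR. Store each id as an array key and
        # skip the remaining rules for this line.
        NR == FNR { ids[$1]; next }
        # Second file: a bare condition prints the line when it is true,
        # here when column 1 is one of the stored ids.
        $1 in ids
    ' "$1" "$2"
}

result=$(extract "$tmp/ids.txt" "$tmp/large_file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the ids are held in memory, so the size of the big file should not matter much. One caveat: if the id file is empty, NR==FNR is also true for the big file and nothing is printed.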

It can also be done with VimL, but this will most likely be slower.

Something like this should do it:

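1) Open your file with the ids, then run:
:let ids=getline(1,'$')
:let @/='^\V\%('.join(ids, '\|').'\) '
:e logfile
:v//d

(The \%( \) group makes the ^ anchor apply to every id, not just the first
one, and the trailing space makes sure the complete first column matches.
This assumes the id file has no blank lines.)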

(Note that this changes your logfile. Use 'u' to undo the modification.)
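
The idea behind the Vim commands, joining all ids into one alternation anchored at the start of the line, can also be tried outside Vim. The following is a rough sketch using grep -E, not a replacement for the awk solution: the file names and sample data are assumptions, and for a very long id list the pattern may become too large for the command line.

```shell
#!/bin/sh
# Demo data; note that the first line has an id as a substring of column 2.
tmp=$(mktemp -d)
printf '%s\n' 4687264 4687264 RE45781 > "$tmp/ids.txt"
cat > "$tmp/large_file.txt" <<'EOF'
4687263 04687264 N 19860527 19870818 19910124 M173
4687264 06882047 N 19860703 19870818 19990815 EXP.
RE45781 14176526 N 20140210 20151027 20150929 ASPN
EOF

# Drop duplicate ids and join the rest with "|" into a single alternation.
alternation=$(sort -u "$tmp/ids.txt" | paste -s -d '|' -)
# Group the alternation so that "^" applies to every id, and require the
# separating space so that only a complete first column can match.
pattern="^(${alternation}) "
matches=$(grep -E "$pattern" "$tmp/large_file.txt")
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Without the parentheses the anchor would bind only to the first id, and the remaining ids could match anywhere in a line.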


Best,
Christian
--
Boss: "We need an SQL database too!"
Employee thinks: "Does he know what he is talking about, or did he just
pick that up somewhere again?"
says: "OK, what color should it be?"
Boss: "Well, I think lilac has the most RAM!"
