Friday, October 23, 2015

Re: How to extract information from large file based on search pattern

On 23 October 2015, Christian Brabandt <cblists@256bit.org> wrote:
> Hi pra007!
>
> On Thu, 22 Oct 2015, pra007 wrote:
>
> > I am using Vim for Windows.
> >
> > I have a large (800 MB plus) file in the following format.
> >
> > The file is space-separated:
> >
> > 8232394 06774483 N 19850910 19870818 19910818 EXP.
> > 8309716 06774483 N 19850910 19870818 19910319 REM.
> > 4687262 06908244 N 19860917 19870818 19990815 EXP.
> > 4687262 06908244 N 19860917 19870818 19990309 REM.
> > 4687262 06908244 N 19860917 19870818 19950221 M184
> > 4687262 06908244 N 19860917 19870818 19910108 M173
> > 4687262 06908244 N 19860917 19870818 19880802 ASPN
> > 4687263 06868897 N 19860527 19870818 19990128 M185
> > 4687263 06868897 N 19860527 19870818 19950509 RMPN
> > 4687263 06868897 N 19860527 19870818 19950509 ASPN
> > 4687263 06868897 N 19860527 19870818 19950119 M184
> > 4687263 06868897 N 19860527 19870818 19910311 ASPN
> > 4687263 06868897 N 19860527 19870818 19910124 M173
> > 4687264 06882047 N 19860703 19870818 19990815 EXP.
> > 4687264 06882047 N 19860703 19870818 19990309 REM.
> > 4687264 06882047 N 19860703 19870818 19950503 RMPN
> > 4687264 06882047 N 19860703 19870818 19950503 ASPN
> > 4687264 06882047 N 19860703 19870818 19950119 M184
> > 4687264 06882047 N 19860703 19870818 19910311 ASPN
> > RE45781 14176526 N 20140210 20151027 20150929 ASPN
> > RE45786 14260890 N 20140424 20151027 20150929 ASPN
> > RE45790 14454285 Y 20140807 20151103 20151008 ASPN
> > RE45793 13445791 N 20120412 20151103 20151006 ASPN
> >
> > I have another, small .txt file in the following format:
> > 4687264
> > 4687264
> > 4687264
> > RE45781
> > RE45786
> > RE45790
> > RE45793
> >
> > Now I want to extract the lines from the big file whose first column
> > matches an entry in the small file, so that the result contains only
> > lines whose key is present in the small .txt file.
> >
> > The result file should look like this:
> >
> > 4687264 06882047 N 19860703 19870818 19990815 EXP.
> > 4687264 06882047 N 19860703 19870818 19990309 REM.
> > 4687264 06882047 N 19860703 19870818 19950503 RMPN
> > 4687264 06882047 N 19860703 19870818 19950503 ASPN
> > 4687264 06882047 N 19860703 19870818 19950119 M184
> > 4687264 06882047 N 19860703 19870818 19910311 ASPN
> > RE45781 14176526 N 20140210 20151027 20150929 ASPN
> > RE45786 14260890 N 20140424 20151027 20150929 ASPN
> > RE45790 14454285 Y 20140807 20151103 20151008 ASPN
> > RE45793 13445791 N 20120412 20151103 20151006 ASPN
> >
> > Is there any way?

The easy way:

fgrep -wf small_file.txt big_file.txt
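
A small sketch of what that does, using grep -F (the POSIX spelling of
fgrep) on made-up sample files. The temporary directory and the last
line of the sample data are assumptions for illustration only. Note
that -w (an extension supported by GNU and BSD grep) matches whole
words anywhere on the line, not just in column 1, so the hypothetical
last line is printed too:

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 4687264 RE45781 > "$tmpdir/small_file.txt"
cat > "$tmpdir/big_file.txt" <<'EOF'
4687263 06868897 N 19860527 19870818 19910124 M173
4687264 06882047 N 19860703 19870818 19990815 EXP.
RE45781 14176526 N 20140210 20151027 20150929 ASPN
9999999 4687264 N 19860703 19870818 19990815 EXP.
EOF

# -F: fixed strings, -w: whole words only, -f: read patterns from a file.
matched=$(grep -F -w -f "$tmpdir/small_file.txt" "$tmpdir/big_file.txt")
printf '%s\n' "$matched"

rm -rf "$tmpdir"
```

With the data shown in the original question this should not matter,
since the other columns have a different width than the keys.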

> If you have awk available it should be trivial and fast:
> #v+
> 0 14908 chrisbra@debian /tmp % awk 'NR==FNR {a[$1]}
[...]

This reads the big file into memory. Possibly not the best approach
with 800 MB of data...
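
If memory is the concern, something along these lines should keep only
the keys from the small file in an array and stream the big file line
by line. This is a sketch, not the command quoted above: it assumes
the small file is passed first, and the sample files, the temporary
directory and the array name "keys" are made up for illustration:

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/small_file.txt" <<'EOF'
4687264
RE45781
EOF
cat > "$tmpdir/big_file.txt" <<'EOF'
4687263 06868897 N 19860527 19870818 19910124 M173
4687264 06882047 N 19860703 19870818 19990815 EXP.
4687264 06882047 N 19860703 19870818 19990309 REM.
RE45781 14176526 N 20140210 20151027 20150929 ASPN
RE45786 14260890 N 20140424 20151027 20150929 ASPN
EOF

# NR==FNR is true only while awk reads the first file (the keys), so
# only the small file is stored. Lines of the big file are tested
# against column 1 and printed or dropped as they are read.
result=$(awk 'NR==FNR { keys[$1]; next } $1 in keys' \
    "$tmpdir/small_file.txt" "$tmpdir/big_file.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Unlike the fgrep variant, this only compares the first column.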

/lcd

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
