Monday, March 7, 2011

Re: How to process 1000 files xml to 1 file?

On 8/03/11 6:04 AM, AMDx64BT wrote:
> -- DESCRIPTION -----------------------
>
> I would like to write a script with awk or vim to process Lab Blood Tests in xml
> format to import with SPSS.
>
> Each blood test is an xml file.
>
> If I have (for example):
> 1000 Lab Blood Tests (1000 xml files)
> 250 patients
> 4 blood tests/patient
>
> Each Blood Test file has the name format: rapport_33405954.xml
>
> Each blood test has a variable number of components, but I am interested in analyzing
> only 3 elements: K, Na and Ca. (These elements are not included in all the Blood
> Tests.)
>
>
> -- PATIENT -----------------------
>
> -- SOURCE:
>
> <Patient>
> <lbpa_Npa>1234</lbpa_Npa>
> <lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
> <Entree>15-Oct-1582 01:00:00</Entree>
> <Pid>0</Pid>
> <Ncas>0</Ncas>
> <pre10 />
> <lbpa_Pre>Peter</lbpa_Pre>
> <lbpa_Num_Npat>1234567</lbpa_Num_Npat>
> <lbrq_Nom1 />
> <lbpa_Adr2>Paris</lbpa_Adr2>
> <lbrq_Nom2 />
> <lbpa_Sexe>M</lbpa_Sexe>
> <nom10 />
> <lbrq_Rid>0</lbrq_Rid>
> <Actif />
> <lbpa_Adr />
> <lbpa_Nom>Smith</lbpa_Nom>
> <Adm />
> </Patient>
>
> -- RESULT:
>
> (last_name first_name, date_of_birth)
> lbpa_Nom lbpa_Pre, lbpa_Nai
> Smith Peter, 1923.10.02
>
>
> -- DATE TAKEN BLOOD -----------------------
>
> -- SOURCE:
>
> <Demande>
> <Entree>15-Oct-1582 01:00:00</Entree>
> <lbde_Rid>12345</lbde_Rid>
> <lbde_Nlab>12345</lbde_Nlab>
> <Sortie>15-Oct-1582 01:00:00</Sortie>
> <NarunaFile />
> <Ncas>0</Ncas>
> <Etabl />
> <lbde_Num_Npat>12345</lbde_Num_Npat>
> <Naruna />
> <Date_Mod>01-Jan-1900 00:00:00</Date_Mod>
> <Taille>0</Taille>
> <lbde_pid>12345/111</lbde_pid>
> <TCollection>0</TCollection>
> <Semgr>0</Semgr>
> <lbrq_nom1 />
> <lbde_Dtprv>02-Mar-2011 06:00:00</lbde_Dtprv>
> <Pathologique>FALSE</Pathologique>
> <lbrq_nom2 />
> <Bacterio>FALSE</Bacterio>
> <Volume>0</Volume>
> <Type_www />
> <Poids>0</Poids>
> <lbde_Dtdem>02-Mar-2011 07:18:32</lbde_Dtdem>
> <PasVue>FALSE</PasVue>
> <par />
> <Domaine />
> </Demande>
>
> -- RESULT:
>
> (Date_taken_blood)
> lbde_Dtprv
> 2011.03.02
>
>
> -- ELEMENT -----------------------
>
> -- SOURCE:
>
> <Analyse>
> <OrdreImpression>12345</OrdreImpression>
> <CodeMateriel />
> <TypeLigne>0</TypeLigne>
> <Formulaire>21</Formulaire>
> <Norme>136 - 145 mmol/l</Norme>
> <Code>2039</Code>
> <Commentaire />
> <Anterieur />
> <TypeResultat>0</TypeResultat>
> <Resultat>136</Resultat>
> <Unite>mmol/l</Unite>
> <Remarque />
> <Clos>O</Clos>
> <Libelle>Sodium</Libelle>
> </Analyse>
>
> -- RESULT:
>
> (Element number)
> Libelle Resultat
> Sodium 136
>
>
> -- SORT ELEMENT BY DATE -----------------------
>
> Sodium 05.01.2011 --> Na1
> Sodium 08.01.2011 --> Na3
> Sodium 06.01.2011 --> Na2
>
>
> -- FINAL RESULT -----------------------
>
> From the 1000 files I want to obtain a single file in this format, so that I can
> import it into SPSS:
>
>                         Na1  Na2  K1   K2
> Smith Peter   19231002  136  133  4    3.5
> Gates Edward  19801204  145  166  3.1  3.4
>
> (In this case the dates of Na1 for Smith and Gates could be different, but the
> variable Na1 is the same.)
>
> Any advice is appreciated

I personally would not attempt this kind of thing with Vim, but would
use a scripting language. My language of choice would be PHP, as that is
what I'm most familiar with, but many languages would suffice. Since
your XML files seem nicely structured, I would use PHP's SimpleXML,
rather than more complicated DOM functions. It would probably only take
a screenful or two of code.
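
To give a rough idea of the shape of such a script, here is a minimal
sketch. It is written in Python rather than PHP, with the standard
xml.etree.ElementTree module standing in for SimpleXML. The tag names
come from your samples; everything else is an assumption: that each file
holds one Patient, one Demande and any number of Analyse elements under
some root, that potassium and calcium are labelled "Potassium" and
"Calcium" (only "Sodium" appears in your post), and that name plus date
of birth is enough to identify a patient. The function names are made up
for illustration.

```python
import csv
import glob
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date

# Assumption: only "Sodium" appears in the post; the other two labels are guesses.
WANTED = {"Sodium": "Na", "Potassium": "K", "Calcium": "Ca"}
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_date(text):
    """Turn '02-Mar-2011 06:00:00' into date(2011, 3, 2), ignoring the time."""
    day, month, year = text.split()[0].split("-")
    return date(int(year), MONTHS[month], int(day))


def read_report(root):
    """Return (patient_key, date_taken, {short_name: result}) for one report."""
    patient = next(root.iter("Patient"))
    demande = next(root.iter("Demande"))
    key = (patient.findtext("lbpa_Nom", ""),
           patient.findtext("lbpa_Pre", ""),
           parse_date(patient.findtext("lbpa_Nai")).strftime("%Y%m%d"))
    taken = parse_date(demande.findtext("lbde_Dtprv"))
    results = {}
    for analyse in root.iter("Analyse"):
        short = WANTED.get(analyse.findtext("Libelle", "").strip())
        if short:  # skip every component other than Na, K and Ca
            results[short] = analyse.findtext("Resultat", "").strip()
    return key, taken, results


def build_rows(reports):
    """Number each patient's values in date order: Na1, Na2, K1, ..."""
    by_patient = defaultdict(list)
    for key, taken, results in reports:
        by_patient[key].append((taken, results))
    rows = {}
    for key, tests in by_patient.items():
        counts = defaultdict(int)
        row = {}
        for _, results in sorted(tests, key=lambda test: test[0]):
            for short, value in results.items():
                counts[short] += 1
                row[(short, counts[short])] = value
        rows[key] = row
    return rows


def write_table(rows, out):
    """Write one tab-separated line per patient; missing values stay empty."""
    columns = sorted({column for row in rows.values() for column in row})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["lbpa_Nom", "lbpa_Pre", "lbpa_Nai"]
                    + [f"{short}{n}" for short, n in columns])
    for key in sorted(rows):
        writer.writerow(list(key) + [rows[key].get(c, "") for c in columns])


def main(pattern="rapport_*.xml"):
    reports = [read_report(ET.parse(path).getroot())
               for path in glob.glob(pattern)]
    write_table(build_rows(reports), sys.stdout)


if __name__ == "__main__":
    main()
```

Run from the directory holding the rapport_*.xml files with the output
redirected to a file, this should produce a tab-separated table that
SPSS can read as delimited text. If two patients can share a name and a
date of birth, lbpa_Num_Npat would probably make a safer key.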

You could do it with Vim, too, but it would be stretching Vimscript
quite a lot, and probably would involve a bunch of tricky hacks and not
be particularly readable or easy to modify/fix/extend.

So... is using a scripting language an option? What platform are you on?
What do you have available/installed? Are you willing to investigate
installing something different to perform this task?

BTW, what's your name?

Ben.

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
