I googled for a yacc XML grammar. This URL has one for download:
http://www.w3.org/XML/9707/XML-in-C
I downloaded the tar file. It has 3 files: scanner.l, parser.y, main.c
There is no makefile, but compilation is simple:
> yacc -d parser.y
> flex scanner.l
> gcc y.tab.c lex.yy.c main.c
I created a file named "input":
<Patient>
<lbpa_Npa>1234</lbpa_Npa>
<lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
<Entree>15-Oct-1582 01:00:00</Entree>
<Pid>0</Pid>
<Ncas>0</Ncas>
<pre10 />
<lbpa_Pre>Peter</lbpa_Pre>
<lbpa_Num_Npat>1234567</lbpa_Num_Npat>
<lbrq_Nom1 />
<lbpa_Adr2>Paris</lbpa_Adr2>
<lbrq_Nom2 />
<lbpa_Sexe>M</lbpa_Sexe>
<nom10 />
<lbrq_Rid>0</lbrq_Rid>
<Actif />
<lbpa_Adr />
<lbpa_Nom>Smith</lbpa_Nom>
<Adm />
</Patient>
Then I ran ./a.out < input
and the parser split it into tokens:
<Patient>
<lbpa_Npa>1234</lbpa_Npa>
<lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
<Entree>15-Oct-1582 01:00:00</Entree>
<Pid>0</Pid>
<Ncas>0</Ncas>
<pre10 />
<lbpa_Pre>Peter</lbpa_Pre>
<lbpa_Num_Npat>1234567</lbpa_Num_Npat>
<lbrq_Nom1 />
<lbpa_Adr2>Paris</lbpa_Adr2>
<lbrq_Nom2 />
<lbpa_Sexe>M</lbpa_Sexe>
<nom10 />
<lbrq_Rid>0</lbrq_Rid>
<Actif />
<lbpa_Adr />
<lbpa_Nom>Smith</lbpa_Nom>
<Adm />
</Patient>
<Demande>
<Entree>15-Oct-1582 01:00:00</Entree>
<lbde_Rid>12345</lbde_Rid>
<lbde_Nlab>12345</lbde_Nlab>
<Sortie>15-Oct-1582 01:00:00</Sortie>
<NarunaFile />
<Ncas>0</Ncas>
<Etabl />
<lbde_Num_Npat>12345</lbde_Num_Npat>
<Naruna />
<Date_Mod>01-Jan-1900 00:00:00</Date_Mod>
<Taille>0</Taille>
<lbde_pid>12345/111</lbde_pid>
<TCollection>0</TCollection>
<Semgr>0</Semgr>
<lbrq_nom1 />
<lbde_Dtprv>02-Mar-2011 06:00:00</lbde_Dtprv>
<Pathologique>FALSE</Pathologique>
<lbrq_nom2 />
<Bacterio>FALSE</Bacterio>
<Volume>0</Volume>
<Type_www />
<Poids>0</Poids>
<lbde_Dtdem>02-Mar-2011 07:18:32</lbde_Dtdem>
<PasVue>FALSE</PasVue>
<par />
<Domaine />
</Demande>
So maybe you need to put the "Demande" / "Analyse" data into separate files (using a vim macro, perhaps?) and associate them with the original, like a join on some key (the patient id?).
For example: rapport_33405954_Patient.xml, rapport_33405954_Demande.xml, rapport_33405954_Analyse.xml. Then call the yacc parser three times, once on each file, and modify the parser to generate a data file with SPSS labels.
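If a vim macro turns out to be awkward, the splitting could also be done in code. The following is only a sketch: extract_sections is a hypothetical helper name, and it assumes the section tags carry no attributes and are never nested inside a section of the same name. It returns every block for a given tag, since one report may hold several "Analyse" blocks.

```cpp
#include <string>
#include <vector>

// Collect every "<tag> ... </tag>" block found in xml, in document order.
// Assumption: tags have no attributes and same-named sections do not nest.
std::vector<std::string> extract_sections(const std::string &xml,
                                          const std::string &tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> blocks;
    std::string::size_type pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::string::size_type end = xml.find(close, pos + open.size());
        if (end == std::string::npos) break;  // unterminated section: stop
        blocks.push_back(xml.substr(pos, end + close.size() - pos));
        pos = end + close.size();             // continue after this block
    }
    return blocks;
}
```

Each returned block could then be written to its own file (rapport_33405954_Analyse.xml and so on) before the parser is run on it.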
Code skeleton ...
// Provided by the flex/yacc generated code (may need extern "C" when built as C).
extern FILE *yyin;
extern int yyparse();
extern void yyrestart(FILE *);

int patient_key = 0; // set by the modified parser while reading the _Patient file

// Parse one section file. The modified parser dumps the required data
// to some safe place, using patient_key.
bool parse_section(const string &base, const string &section) {
    string name = base + "_" + section + ".xml"; // e.g. rapport_33405954_Patient.xml
    FILE *f = fopen(name.c_str(), "rb");
    if (!f) return false; // section file is missing
    yyin = f;
    yyrestart(yyin);
    yyparse();
    fclose(f);
    return true;
}

for (string s = xml files in dir /* but don't match the _Patient, _Demande, _Analyse xml files - just the originals */ ) {
    string base = s.substr(0, s.rfind(".xml")); // "rapport_33405954.xml" -> "rapport_33405954"
    parse_section(base, "Patient"); // at this point patient_key has been set by the modified parser
    parse_section(base, "Demande");
    parse_section(base, "Analyse");
    // you have everything that you need in the safe place - output SPSS labels
    // output SPSS data
}
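The skeleton needs the section name inserted before the ".xml" extension rather than appended after it. A minimal sketch of that step follows; section_file_name is a hypothetical helper, not part of the downloaded parser.

```cpp
#include <string>

// Build "<base>_<section>.xml" from an original report name.
// If the name does not end in ".xml", the section and the extension
// are appended to the whole name instead.
std::string section_file_name(const std::string &original,
                              const std::string &section) {
    const std::string ext = ".xml";
    std::string base = original;
    if (base.size() >= ext.size() &&
        base.compare(base.size() - ext.size(), ext.size(), ext) == 0) {
        base.erase(base.size() - ext.size());  // drop the trailing ".xml"
    }
    return base + "_" + section + ext;
}
```

With this, section_file_name("rapport_33405954.xml", "Patient") gives rapport_33405954_Patient.xml, which is the name used in the example above.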
> Each blood test has a variable number of components
Bests,
Neil
-- DESCRIPTION -----------------------
I would like to write a script with awk or vim to process Lab Blood Tests in xml format to import with SPSS.
Each blood test is an xml file.
If I have (for example):
1000 Lab Blood Tests (1000 xml files)
250 patients
4 blood tests/patient
Each Blood Test file has the name format: rapport_33405954.xml
Each blood test has a variable number of components, but I am interested in analyzing only 3 elements: K, Na and Ca. (These elements are not included in all the Blood Tests.)
-- PATIENT -------------------------
SOURCE:
<Patient>
<lbpa_Npa>1234</lbpa_Npa>
<lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
<Entree>15-Oct-1582 01:00:00</Entree>
<Pid>0</Pid>
<Ncas>0</Ncas>
<pre10 />
<lbpa_Pre>Peter</lbpa_Pre>
<lbpa_Num_Npat>1234567</lbpa_Num_Npat>
<lbrq_Nom1 />
<lbpa_Adr2>Paris</lbpa_Adr2>
<lbrq_Nom2 />
<lbpa_Sexe>M</lbpa_Sexe>
<nom10 />
<lbrq_Rid>0</lbrq_Rid>
<Actif />
<lbpa_Adr />
<lbpa_Nom>Smith</lbpa_Nom>
<Adm />
</Patient>
-- RESULT:
(last_name first_name, date_born)
lbpa_Nom lbpa_Pre, lbpa_Nai
Smith Peter, 1923.10.02
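The date conversion shown in this result (02-Oct-1923 00:00:00 becoming 1923.10.02) could be sketched as below. to_sortable_date is a hypothetical name, and the code assumes every file uses English three-letter month abbreviations in the dd-Mon-yyyy layout seen in the samples.

```cpp
#include <cstdio>
#include <string>

// Convert "02-Oct-1923 00:00:00" to "1923.10.02".
// Returns an empty string when the input does not match dd-Mon-yyyy.
std::string to_sortable_date(const std::string &stamp) {
    static const char *const months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    int day = 0, year = 0;
    char mon[4] = {0};
    if (std::sscanf(stamp.c_str(), "%2d-%3[A-Za-z]-%4d", &day, mon, &year) != 3)
        return "";
    for (int m = 0; m < 12; ++m) {
        if (std::string(mon) == months[m]) {
            char out[40];
            std::snprintf(out, sizeof out, "%04d.%02d.%02d", year, m + 1, day);
            return out;
        }
    }
    return "";  // unknown month name
}
```

The time of day is ignored, so two samples taken on the same day map to the same string.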
-- DATE TAKEN BLOOD -------------------------
SOURCE:
<Demande>
<Entree>15-Oct-1582 01:00:00</Entree>
<lbde_Rid>12345</lbde_Rid>
<lbde_Nlab>12345</lbde_Nlab>
<Sortie>15-Oct-1582 01:00:00</Sortie>
<NarunaFile />
<Ncas>0</Ncas>
<Etabl />
<lbde_Num_Npat>12345</lbde_Num_Npat>
<Naruna />
<Date_Mod>01-Jan-1900 00:00:00</Date_Mod>
<Taille>0</Taille>
<lbde_pid>12345/111</lbde_pid>
<TCollection>0</TCollection>
<Semgr>0</Semgr>
<lbrq_nom1 />
<lbde_Dtprv>02-Mar-2011 06:00:00</lbde_Dtprv>
<Pathologique>FALSE</Pathologique>
<lbrq_nom2 />
<Bacterio>FALSE</Bacterio>
<Volume>0</Volume>
<Type_www />
<Poids>0</Poids>
<lbde_Dtdem>02-Mar-2011 07:18:32</lbde_Dtdem>
<PasVue>FALSE</PasVue>
<par />
<Domaine />
</Demande>
-- RESULT:
(Date_taken_blood)
lbde_Dtprv
2011.03.02
-- ELEMENT -------------------------
SOURCE:
<Analyse>
<OrdreImpression>12345</OrdreImpression>
<CodeMateriel />
<TypeLigne>0</TypeLigne>
<Formulaire>21</Formulaire>
<Norme>136 - 145 mmol/l</Norme>
<Code>2039</Code>
<Commentaire />
<Anterieur />
<TypeResultat>0</TypeResultat>
<Resultat>136</Resultat>
<Unite>mmol/l</Unite>
<Remarque />
<Clos>O</Clos>
<Libelle>Sodium</Libelle>
</Analyse>
-- RESULT:
(Element number)
Libelle Resultat
Sodium 136
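Pulling Libelle and Resultat out of one Analyse block needs only a lookup of the text between an opening and a closing tag. The sketch below uses plain string searching instead of a real XML parser; tag_value is a hypothetical helper, and it assumes the tags have no attributes and that the first match inside the block is the wanted one.

```cpp
#include <string>

// Return the text between "<tag>" and "</tag>", or an empty string when
// the element is missing or written in the empty form "<tag />".
std::string tag_value(const std::string &xml, const std::string &tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = xml.find(close, from);
    if (end == std::string::npos) return "";
    return xml.substr(from, end - from);
}
```

For the block above, tag_value(block, "Libelle") yields Sodium and tag_value(block, "Resultat") yields 136. Blocks whose Libelle is not one of the wanted elements can simply be skipped.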
-- SORT ELEMENT BY DATE -----------------------
Sodium 05.01.2011 --> Na1
Sodium 08.01.2011 --> Na3
Sodium 06.01.2011 --> Na2
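The numbering above amounts to sorting one patient's results for an element by date and labelling them by rank. A possible sketch follows, assuming the dates are in dd.mm.yyyy form as in the example. date_key and number_by_date are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Turn "05.01.2011" (dd.mm.yyyy) into "20110105" so that plain string
// comparison orders dates chronologically. Malformed input maps to "".
std::string date_key(const std::string &ddmmyyyy) {
    if (ddmmyyyy.size() != 10) return "";
    return ddmmyyyy.substr(6, 4) + ddmmyyyy.substr(3, 2) + ddmmyyyy.substr(0, 2);
}

// Sort the dates of one element chronologically and pair each date with
// prefix + rank, e.g. Na1, Na2, Na3.
std::vector<std::pair<std::string, std::string>>
number_by_date(std::vector<std::string> dates, const std::string &prefix) {
    std::stable_sort(dates.begin(), dates.end(),
                     [](const std::string &a, const std::string &b) {
                         return date_key(a) < date_key(b);
                     });
    std::vector<std::pair<std::string, std::string>> labelled;
    for (std::size_t i = 0; i < dates.size(); ++i)
        labelled.emplace_back(dates[i], prefix + std::to_string(i + 1));
    return labelled;
}
```

Called with the three Sodium dates above and the prefix "Na", this pairs 05.01.2011 with Na1, 06.01.2011 with Na2 and 08.01.2011 with Na3.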
-- FINAL RESULT -----------------------
From the 1000 files I want to obtain a file in this format, so that I can import it into SPSS:
                        Na1  Na2  K1   K2
Smith Peter   19231002  136  133  4    3.5
Gates Edward  19801204  145  166  3.1  3.4
(In this case the dates of Na1 for Smith and for Gates could be different, but the variable Na1 is the same.)
--
Any advice is appreciated
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php