Parsing XML Nodes w/Same Tag

Mark Talluto via use-livecode
My longest-running in-house production app is an audio transcriber: a very successful little gadget, running in xTalk since 2001.

We have over 1,000 XML files from an audio archive of transcripts.

Now I'm digging in and getting the data out.

I'm not facile with the XML routines, but I did my best with the help of Bernd's new, actually usable, dictionary.

But I ran into a bug in 9 DP5 (I think…), or I am doing something wrong.

Given transcripts formatted with nodes like this:

<?xml version="1.0" encoding="UTF-8"?>
<audio_transcript>
  <header>
    <subject>Three Words of Existence</subject>
    <category>God and Lords of Dharma</category>
    <duration>18 min, 36 secs</duration>
    <given_location>San Francisco</given_location>
    Subtopic: three worlds: 0:3:56
    Subtopic: temple: 0:4:7
  </header>
  <transcript_text>
    <p>[Radio Announcer: Ravi Peruman introduces Gurudeva]</p>
    <p>Gurudeva says ......</p>
    <p>More content here</p>
    <p>Subtopic: three worlds: 0:3:56</p>
    <p>All about temple</p>
    <p>Subtopic: temple: 0:4:7</p>
  </transcript_text>
</audio_transcript>

My script looks like this:

put revXMLChildContents(pTree, "/audio_transcript/header",tab,return,false,4) into fld "productionNotes"  # this works… I get all the contents
put revXMLNodeContents(pTree,"/audio_transcript/transcript_text/p") into tText # this works but we only get the first <p> content

# so I presume (like I said… parsing xml is new to me) we need to loop/iterate over the sibling <p> tags..
put revXMLNumberOfChildren(pTree,"/audio_transcript/transcript_text/","p",4) # returns 6

# the following line should provide us what we need, I think, to set up a repeat loop  using the indexed node function
# and this is a) according to the dictionary b) and the script will compile:

put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)

I get a "green" OK in the script editor, and when I run it we get this output, which is expected:


Presumably I can now use that list to fetch the contents of all those nodes (I haven't figured that out yet).

But the engine fires an error message (even though the script compiled without complaining) when we run it:

button "Load Transcript": execution error at line 22 (Handler: can't find handler) near "", char 89

It is breaking at the end of this line:

put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)

Even though the script compiles… isn't this a bug? If it a) is what the dictionary says it should be and b) compiles, why the error?

if not, what am I doing wrong?
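
On the "compiles but fails" point: a clean compile only shows that the syntax parses. LiveCode looks up function handlers when the line actually executes, so a call can compile and still raise "can't find handler" at run time. That alone does not settle whether this is an engine bug. One way to narrow it down is to run the call inside try/catch and look at what comes back. This is a minimal sketch; the handler name probeChildNames is made up for illustration, and the parent path is written without the trailing slash.

```livecode
-- Hypothetical helper: run the failing call inside try/catch so the
-- runtime error can be inspected instead of halting the handler.
command probeChildNames pTree
   local tNames
   try
      put revXMLChildNames(pTree, "/audio_transcript/transcript_text", return, "p", true) into tNames
      answer "Child names:" & return & tNames
   catch tError
      -- tError holds the engine's execution-error details for the failed line
      answer "revXMLChildNames failed:" & return & tError
   end try
end probeChildNames
```

Calling probeChildNames pTree right after revXMLCreateTree would show whether the function itself is reachable in this build.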

The full button script is below… and you can see my "fumbling" to fetch the content of all the "p" nodes. There seems to be some oddity relating to multiple nodes all having the same tag.
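
On the same-tag question: siblings that share a tag can be addressed one at a time by putting an index in square brackets on the node path, for example ".../transcript_text/p[2]". The sketch below is one possible way to collect every <p>, not a confirmed fix for the error above. It reuses the revXMLNumberOfChildren call from the post (which reported 6), drops the trailing slash from the parent path, and assumes that a missing node comes back as text beginning with "xmlerr". The handler name transcriptParagraphs is made up for illustration.

```livecode
-- Hypothetical helper: return the text of every <p> under transcript_text,
-- separated by returns. pTree is the ID returned by revXMLCreateTree.
function transcriptParagraphs pTree
   local tParent, tCount, tIndex, tContents, tText
   put "/audio_transcript/transcript_text" into tParent
   -- Same arguments as the call in the post
   put revXMLNumberOfChildren(pTree, tParent, "p", 4) into tCount
   -- Assumption: a failure is reported as text, not as a number
   if tCount is not an integer then return tCount
   repeat with tIndex = 1 to tCount
      -- "p[3]" addresses the third <p> child of tParent
      put revXMLNodeContents(pTree, tParent & "/p[" & tIndex & "]") into tContents
      if tContents begins with "xmlerr" then exit repeat
      put tContents & return after tText
   end repeat
   delete the last char of tText -- drop the trailing return
   return tText
end transcriptParagraphs
```

It could then be used in place of the single revXMLNodeContents call, for example: put transcriptParagraphs(pTree) into tText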

global theTape
on mouseUp
   put theTape into tTranscript
   set the itemdel to "."
   put "xml" into item -1 of tTranscript
   if there is a file tTranscript then
      put url ("file:/" & tTranscript) into tTranscriptXML
   else
      answer "Sorry, there is no transcript in the same folder as the audio" with "OK"
      exit to top
   end if
   put revXMLCreateTree(tTranscriptXML,false, true,true) into pTree
   if pTree is not an integer then
      answer "Problem with the XML. Open in a text editor" with "OK"
      exit to top
   end if
   put revXMLChildContents(pTree, "/audio_transcript/header",tab,return,false,4) into fld "productionNotes"
   put revXMLNodeContents(pTree,"/audio_transcript/transcript_text/p") into tText
   put revXMLNumberOfChildren(pTree,"/audio_transcript/transcript_text/","p",4)
   put revXMLChildNames(pTree,"/audio_transcript/transcript_text/", return,"p",true)

   # this script compiles, but breaks on the above line when run
   --put revXMLNextSibling(pTree,"/audio_transcript/transcript_text/p") into nextSibling
   --put revXMLNodeContents(pTree,nextSibling) after tText # feeble attempt fails, need to do some loop but don't know how.
   # no robust examples to follow, any help appreciated!
   --put revXMLNodeContents(pTree, "audio_transcript/header/duration") into tTranscriptHTML # works for single node (of course)
   --set the htmltext of fld "transcript" of stack "Audio_transcriber" to tTranscriptHTML

end mouseUp
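
The commented-out revXMLNextSibling attempt in the script above could be turned into a loop along these lines. This is only a sketch under stated assumptions: that revXMLFirstChild and revXMLNextSibling return full node paths, that revXMLNextSibling returns empty after the last sibling, and that errors come back as text beginning with "xmlerr". The handler name walkParagraphs and the tGuard safety cap are made up for illustration.

```livecode
-- Hypothetical helper: walk the children of transcript_text with
-- revXMLFirstChild / revXMLNextSibling and collect the <p> contents.
function walkParagraphs pTree
   local tNode, tName, tText, tGuard
   put 0 into tGuard
   put revXMLFirstChild(pTree, "/audio_transcript/transcript_text") into tNode
   set the itemDelimiter to "/"
   -- Stop on empty (assumed to mean no more siblings) or on an "xmlerr" message;
   -- tGuard is only a cap against an unexpected endless loop.
   repeat while tNode is not empty and not (tNode begins with "xmlerr") and tGuard < 10000
      add 1 to tGuard
      put item -1 of tNode into tName -- last path segment, e.g. "p" or "p[2]"
      if tName is "p" or tName begins with "p[" then
         put revXMLNodeContents(pTree, tNode) & return after tText
      end if
      put revXMLNextSibling(pTree, tNode) into tNode
   end repeat
   delete the last char of tText -- drop the trailing return
   return tText
end walkParagraphs
```

Compared with building indexed paths from a count, this walk does not depend on knowing the number of children up front, at the cost of having to filter the node names by hand.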
