Read data from an XML file
Problem: You want to import a dataset stored in the XML file format by manually coding how to extract the relevant information.
Solution: XML (eXtensible Markup Language) was designed to transport and store data, and it has seen widespread use for interchanging data over the Internet.
An XML file consists of a series of elements which form a document tree. The tree starts at the root and branches to the lowest level of the tree. XML documents must contain a root node (or element) which is "the parent" of all other nodes, and all nodes can have their own sub nodes ("child elements").
The XML package provides numerous tools for parsing
and generating XML in R. Since XML is such a flexible format, the
XML package primarily consists of functions that must
be combined to parse and extract information from a specific type of
XML file. The xmlTreeParse function is the workhorse for
importing general XML documents.
xmlTreeParse parses an
XML file and stores the tree in an R structure. We subsequently
traverse the tree and extract data from the relevant nodes.
xmlTreeParse requires a file name or location as
input for where to find the XML file, and it returns an R XML object
with the parsed XML file. The
useInternalNodes option can
be set to TRUE to increase parsing speed.
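
The parsing step can be sketched as follows. This is a minimal example under stated assumptions: the file contents and the element names (patients, patient, name, age) are hypothetical sample data written to a temporary file, not part of the original tip.

```r
library(XML)

# Hypothetical sample data, written to a temporary file so that
# xmlTreeParse can be given a file name as described above.
xml_text <- paste0(
  '<patients>',
  '<patient id="1"><name>Anna</name><age>34</age></patient>',
  '<patient id="2"><name>Bo</name><age>51</age></patient>',
  '</patients>'
)
xml_file <- tempfile(fileext = ".xml")
writeLines(xml_text, xml_file)

# Default behaviour: the parsed tree is stored in an R structure.
doc <- xmlTreeParse(xml_file)

# With useInternalNodes = TRUE the tree is kept as internal nodes instead.
doc_internal <- xmlTreeParse(xml_file, useInternalNodes = TRUE)

class(doc)
class(doc_internal)
```

Both calls parse the same file; only the representation of the resulting tree differs.
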
xmlRoot should be called to get a pointer to the
top-level node or parent of the XML tree. The
skip argument of xmlRoot can be set to
FALSE to prevent R from skipping over
document type definitions in the XML file if those are present. The
XML tree structure works like a recursive list-like object and the
individual nodes in the tree are accessed using named or numbered
indices with []. The XML tree can be traversed with the
proper indices and for each node we can get the parent and list of
child nodes using the xmlParent and
xmlChildren functions, respectively.
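
Traversing the tree might look like the sketch below. The sample file and its element names are hypothetical, and the example assumes the tree is parsed with useInternalNodes = TRUE so that each node can report its parent.

```r
library(XML)

# Hypothetical sample data (not from the original tip).
xml_text <- paste0(
  '<patients>',
  '<patient id="1"><name>Anna</name><age>34</age></patient>',
  '<patient id="2"><name>Bo</name><age>51</age></patient>',
  '</patients>'
)
xml_file <- tempfile(fileext = ".xml")
writeLines(xml_text, xml_file)

doc  <- xmlTreeParse(xml_file, useInternalNodes = TRUE)
root <- xmlRoot(doc)            # top-level node of the tree

first_patient <- root[[1]]                 # child selected by position
name_node     <- first_patient[["name"]]   # child selected by name

kids   <- xmlChildren(root)         # list of child nodes of the root
parent <- xmlParent(first_patient)  # parent of the first patient node

length(kids)
xmlName(parent)  # node name of the parent, used here only as a check
```
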
Information can be extracted from a node using one of the
xmlName, xmlValue, xmlGetAttr and xmlAttrs
functions, which return the node name, node contents, a named
attribute and all attributes, respectively.
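
As an illustration of the extraction functions, the following sketch pulls the values out of each node and collects them in a data frame. The sample data, the element and attribute names, and the helper get_field are hypothetical and chosen only for this example.

```r
library(XML)

# Hypothetical sample data (not from the original tip).
xml_text <- paste0(
  '<patients>',
  '<patient id="1"><name>Anna</name><age>34</age></patient>',
  '<patient id="2"><name>Bo</name><age>51</age></patient>',
  '</patients>'
)
xml_file <- tempfile(fileext = ".xml")
writeLines(xml_text, xml_file)

root <- xmlRoot(xmlTreeParse(xml_file))
node <- root[[1]]

xmlName(node)             # node name
xmlGetAttr(node, "id")    # a single named attribute
xmlAttrs(node)            # all attributes as a named character vector
xmlValue(node[["name"]])  # text content of the name child

# Hypothetical helper: text content of one named child for every patient.
get_field <- function(nodes, field) {
  unname(sapply(nodes, function(p) xmlValue(p[[field]])))
}

nodes <- xmlChildren(root)
patients <- data.frame(
  id   = unname(sapply(nodes, xmlGetAttr, "id")),
  name = get_field(nodes, "name"),
  age  = as.numeric(get_field(nodes, "age")),
  stringsAsFactors = FALSE
)
patients
```

Values come back as character strings, so numeric fields such as age need an explicit conversion.
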
See rule 1.3 in The R Primer for a worked example which also shows the use of XPath.