Format XML with XML::LibXML
2022-12-09 perl XML::LibXML format XPath
Here are a few notes on formatting XML files with XML::LibXML. A simple example of the different formatting styles offered by the toString serializing function:
use v5.16;
use XML::LibXML;
my $data = '<tool><name>TPP1</name><version><major>1</major><minor>0</minor><revision>0</revision></version><installer>TPP1_setup.exe</installer><support_files/></tool>';
my $xml = XML::LibXML->new->load_xml(string => $data);
say $xml->toString(0);
If format is 0, the document is dumped as it was originally parsed:
<?xml version="1.0"?>
<tool><name>TPP1</name><version><major>1</major><minor>0</minor><revision>0</revision></version><installer>TPP1_setup.exe</installer><support_files/></tool>
If format is 1, libxml2 will add ignorable whitespace so the node content is easier to read. Existing text nodes will not be altered.
say $xml->toString(1);
Generates
<?xml version="1.0"?>
<tool>
  <name>TPP1</name>
  <version>
    <major>1</major>
    <minor>0</minor>
    <revision>0</revision>
  </version>
  <installer>TPP1_setup.exe</installer>
  <support_files/>
</tool>
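A minimal sketch of what "ignorable whitespace" means in practice, assuming a shortened, hypothetical sample document: the indentation is added only to the serialized text, the parsed tree itself stays as it was. Once the indented text is parsed again with default options, that whitespace becomes part of the tree as ordinary text nodes.

```perl
use v5.16;
use XML::LibXML;

# Hypothetical sample input, written without whitespace between elements.
my $compact = '<tool><name>TPP1</name><support_files/></tool>';
my $doc     = XML::LibXML->load_xml(string => $compact);

my $before = $doc->toString(0);    # compact, as parsed
my $pretty = $doc->toString(1);    # indented serialization
my $after  = $doc->toString(0);    # serialized again without formatting

# Formatting happens during serialization only, so the tree is untouched.
say $before eq $after ? 'tree unchanged' : 'tree changed';

# Parsing the indented text with default options keeps the indentation
# as whitespace-only text nodes, so format 0 now reproduces it.
my $reparsed = XML::LibXML->load_xml(string => $pretty);
say $reparsed->toString(0);
```

This is also why input that already carries its own whitespace behaves differently from the compact string used so far.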
But this still does not give optimal results if the XML already contains whitespace between or inside the nodes (if it is not significant, we would like to have it removed). When the whitespace is only between the nodes, it helps to pass the no_blanks parser option to the load_xml method.
use v5.16;
use XML::LibXML;
my $data = <<END_XML;
<nested_nodes>
<nested_node>
<configuration>A</configuration>
<model>45</model>
<added_node>
<ID>
<type>D</type>
<serial>3</serial>
<kVal>3</kVal>
</ID>
</added_node>
</nested_node>
</nested_nodes>
END_XML
say XML::LibXML->load_xml(string => $data)->toString(1);
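Prints the input as it was, because the existing whitespace-only text nodes are not altered and keep libxml2 from re-indenting: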
<?xml version="1.0"?>
<nested_nodes>
<nested_node>
<configuration>A</configuration>
<model>45</model>
<added_node>
<ID>
<type>D</type>
<serial>3</serial>
<kVal>3</kVal>
</ID>
</added_node>
</nested_node>
</nested_nodes>
Whereas loading it like this
say XML::LibXML->load_xml(string => $data, { no_blanks => 1 })->toString(1);
Prints
<?xml version="1.0"?>
<nested_nodes>
  <nested_node>
    <configuration>A</configuration>
    <model>45</model>
    <added_node>
      <ID>
        <type>D</type>
        <serial>3</serial>
        <kVal>3</kVal>
      </ID>
    </added_node>
  </nested_node>
</nested_nodes>
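To see what no_blanks actually changes, one option is to count the text nodes in the parsed tree. The following is only a sketch: the sample data and the count_text_nodes helper are made up for illustration, and the option is passed as a plain key/value pair.

```perl
use v5.16;
use XML::LibXML;

# Hypothetical sample: elements separated only by newlines and spaces.
my $data = <<'END_XML';
<ID>
  <type>D</type>
  <serial>3</serial>
</ID>
END_XML

# Hypothetical helper: number of text nodes anywhere in the document.
sub count_text_nodes {
    my ($doc) = @_;
    my @nodes = $doc->findnodes('//text()');
    return scalar @nodes;
}

my $with_blanks    = XML::LibXML->load_xml(string => $data);
my $without_blanks = XML::LibXML->load_xml(string => $data, no_blanks => 1);

# With default options the whitespace between elements is kept as text nodes;
# with no_blanks only the text nodes holding real values should remain.
say 'default:   ', count_text_nodes($with_blanks);
say 'no_blanks: ', count_text_nodes($without_blanks);
```

With the blank text nodes gone, every element has either element children or a single value, which is the shape toString(1) can indent.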
The last case is when whitespace is mixed in with the actual values. If the whitespace next to the tags is not significant, it can be trimmed like this
use v5.16;
use XML::LibXML;
use Text::Trim qw(trim);
my $root = XML::LibXML->load_xml(location => 'input.xml', { no_blanks => 1 })->documentElement;
for my $node ($root->findnodes('//text()')) {
    $node->setData(trim($node->getValue()));
}
say $root->toString(1);
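If Text::Trim is not installed, the same idea can be sketched with a plain regex substitution. This variant is an illustration rather than the approach above: it reads from an inline, made-up string instead of input.xml and uses the data accessor of the text node.

```perl
use v5.16;
use XML::LibXML;

# Hypothetical sample with padding around the actual values.
my $data = "<ID>\n  <type>  D </type>\n  <serial>\n    3\n  </serial>\n</ID>";

my $root = XML::LibXML->load_xml(string => $data, no_blanks => 1)->documentElement;

for my $node ($root->findnodes('//text()')) {
    my $text = $node->data;
    $text =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace only
    $node->setData($text);
}

say $root->toString(1);
```

Whitespace inside a value (between words, for example) is left alone; only the edges touching the tags are stripped.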
That’s about it for the formatting. I like to keep my XMLs clean.