Summary of Subversion logs

2022-12-23 perl svn Subversion XML Moose XML::Twig

I needed to build a summary of our team’s Subversion (SVN) history to get better insight into last year’s activity.

Obtaining whole log is rather simple with SVN command-line interface

svn log repo_url -v --xml

yields whole history in handy XML format, something like this:

<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry
   revision="14445">
<author>Author 1</author>
<date>2022-12-13T14:52:57.563692Z</date>
<paths>
<path
   action="M"
   kind="file">/Tool/ReadMe.md</path>
</paths>
<msg>Added troubleshooting info</msg>
</logentry>
<logentry
   revision="14444">
<author>Author 2</author>
<date>2022-12-09T09:40:08.113558Z</date>
<paths>
<path
   action="M"
   kind="file">/Tool2/trunk/Support/ProductSettings.xml</path>
</paths>
<msg>Refs #3872. Prefixes changed</msg>
</logentry>
...
</log>

With this knowledge, let’s build classes for Revision and Log Entry with Moose:

package Revision;
use Moose;

has ['num', 'author', 'date', 'msg'] => (is => 'ro');
has entries => (
    is      => 'ro',
    traits  => ['Array'],
    isa     => 'ArrayRef[LogEntry]',
    default => sub { [] },
    handles => {
        list_entries => 'elements',
        add_entry    => 'push',
    },
);

__PACKAGE__->meta->make_immutable();

A little large declaration of entries property will initialize it to empty ArrayRef and defines two methods to add items into it and list everything from it. The isa portion also provides type constraint that would complain if the item placed into the ArrayRef is something other than LogEntry. Which in turn is very simple, just two properties:

package LogEntry;
use Moose;

has ['action', 'path'] => (is => 'ro');

__PACKAGE__->meta->make_immutable();

Then we need a class to load the log of one Subversion repository. It essentially encapsulates the piping svn log command above to parsing of the XML into Revision and LogEntry classes.

I am using XML::Twig’s chunk by chunk processing to handle large XML files without loading it completely into memory. It will invoke defined callback for each logentry element.

package Subversion;
use Moose;

use Method::Signatures::Simple;
use Storable;
use XML::Twig;
use Try::Tiny;

has url          => (is => 'ro', required => 1);
has force_update => (is => 'ro', default  => 0);
has silent       => (is => 'ro', default  => 0);

has revisions => (
    is      => 'ro',
    traits  => ['Array'],
    isa     => 'ArrayRef[Revision]',
    builder => '_build_revisions',
    lazy    => 1,
    handles => { list_revisions => 'elements' },
);

method name {
    return $1 if $self->url =~ m/([^\/]*)$/;
    return "Unknown";
}

method _build_revisions {
    my @revisions;
    my $url = $self->url;

    warn "Extracting log from ".$self->url." ...\n"
        unless $self->silent;
    open(my $in, "-|", "svn log ".$self->url." -v --xml")
        or die "Extraction failed";

    warn "Parsing revisions ...\n"
        unless $self->silent;
    XML::Twig->new(twig_handlers => {
        discard_spaces => 1,
        logentry       => sub {
            my ($t,$e) = @_;

            push @revisions, Revision->new(
                num => $_->att('revision'),
                (map { $_ => (try { $e->first_child($_)->text }) // '' }
                    qw(author date msg)),
                entries => [ map {
                        LogEntry->new(
                            action => $_->att('action'),
                            path   => $_->text
                        )
                    } $e->descendants('path')
                ],
            );

            $e->purge;
        },
    })->parse($in);

    return [ @revisions ];
}

__PACKAGE__->meta->make_immutable();

Usage is pretty simple:

use Subversion;

my $repo = Subversion->new(url => 'svn://svn-server/repo');
for my $rev ($repo->list_revisions) {
    printf "[%d] %s %s: %s\n", $rev->num, $rev->date, $rev->author, $rev->msg;
    for my $entry ($rev->list_entries) {
        print "  %s %s\n", $entry->action, $entry->path;
    }
}

In real use, I am filtering the data, placing them into monthly buckets and sending it into template system to produce nice HTML overview of our year activity.