Download Japanese Podcast

2024-09-06 perl podcast Mojo::UserAgent Mojolicious japanese

I was looking for podcasts to help me learn Japanese. Based on recommendations from Reddit, I found the Nihongo Con Teppei podcast. The idea is to listen to a native Japanese speaker discuss various topics, which helps you get a feel for the language and, eventually, understand it.

Since I often study offline, I wanted to download the podcast episodes as .mp3 files. The pages are currently static, because the author has moved to Spotify for newer episodes, so I don't need to worry much about future changes. Looking at the site, I noticed that the episodes are spread across numbered index pages:

  • index-70.html - first batch of episodes
  • index-69.html - second batch
  • and so on
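A minimal sketch of how the list of page addresses could be generated, using the site address that also appears in the script further down. The numbering range is an assumption: only index-70.html and index-69.html are confirmed here, so the lower bound of 1 may need adjusting.

```perl
use 5.016;
use warnings;

# Assumption: pages run from index-70.html (first batch) down to
# index-1.html. Adjust $last if the site's numbering stops elsewhere.
my ($first, $last) = (70, 1);

# Build the addresses in listening order: 70, 69, 68, ...
my @pages = map { "http://teppeisensei.com/index-$_.html" } reverse $last .. $first;

say for @pages[0 .. 2];
```

Each of these addresses could then be fed to the same link extraction that the script applies to a single page.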

Each page contains links to a generic download page. Each of these links carries a query parameter u whose value is the address of the .mp3 file. With that, it's relatively straightforward to build a simple script using the Mojolicious framework and its web client, Mojo::UserAgent.
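To make the u parameter concrete, here is a small offline sketch with Mojo::URL. The href is made up (example.com, download.php, and the file name are placeholders, not the site's real values); it only mimics the shape of the links described above.

```perl
use 5.016;
use warnings;
use Mojo::URL;

# Hypothetical link to a download page; the real host, path, and file
# name on the site may differ.
my $href = 'http://example.com/download.php?u=http%3A%2F%2Fexample.com%2Faudio%2Fepisode-001.mp3';

# The query parameter u holds the percent-encoded address of the file;
# Mojo::URL decodes it when the parameter is read.
my $mp3 = Mojo::URL->new(Mojo::URL->new($href)->query->param('u'));

# The last path segment serves as the local file name.
my $filename = $mp3->path->parts->[-1];

say "$mp3 -> $filename";
```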

use 5.16.3;
use warnings;
use Mojo::URL;
use Mojo::UserAgent;

my $ua   = Mojo::UserAgent->new;
my $page = 'http://teppeisensei.com/index-66.html';

my $dom = $ua->get($page)->result->dom;
my @download_urls = $dom->find('a')
    # keep links whose href attribute mentions an .mp3 file
    ->grep(sub { defined $_->attr('href') && $_->attr('href') =~ /\.mp3/ })
    # parse the href, then pull the actual file URL out of its u parameter
    ->map(sub { Mojo::URL->new($_->attr('href')) })
    ->map(sub { Mojo::URL->new($_->query->param('u')) })
    ->each;

for my $url (@download_urls) {
    my $filename = $url->path->parts->[-1];
    next if -e $filename;    # skip episodes that are already on disk

    warn "$url -> $filename\n";
    my $res = $ua->get($url)->result;
    unless ($res->is_success) {
        # do not save an error page under an .mp3 name
        warn "skipping $url: HTTP " . $res->code . "\n";
        next;
    }
    $res->save_to($filename);
}

Here are a few notes on the script above:

  • Mojolicious provides well-structured objects, making it easy to manipulate data. For example, to build the @download_urls list, you parse the user agent's response into a Mojo::DOM object, find all links (returned as a Mojo::Collection), keep those whose href contains .mp3, and extract the u parameter from each as a Mojo::URL object.
  • The download section extracts the filename from the URL, fetches the .mp3 file, and saves it locally.
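The Mojo::DOM and Mojo::Collection chain from the first note can be tried without any network access. The sketch below runs the same steps against a few lines of made-up markup; the hrefs and file names are placeholders, and the real pages certainly contain more than this.

```perl
use 5.016;
use warnings;
use Mojo::DOM;
use Mojo::URL;

# Made-up markup standing in for one index page (hypothetical links).
my $html = <<'HTML';
<a name="top">Top</a>
<a href="about.html">About</a>
<a href="http://example.com/dl.php?u=http%3A%2F%2Fexample.com%2Fa%2Fep-001.mp3">Episode 1</a>
<a href="http://example.com/dl.php?u=http%3A%2F%2Fexample.com%2Fa%2Fep-002.mp3">Episode 2</a>
HTML

my @download_urls = Mojo::DOM->new($html)->find('a')
    # drops the anchor without an href and the link to about.html
    ->grep(sub { defined $_->attr('href') && $_->attr('href') =~ /\.mp3/ })
    # href string -> Mojo::URL of the download page
    ->map(sub { Mojo::URL->new($_->attr('href')) })
    # download page URL -> Mojo::URL of the file named by the u parameter
    ->map(sub { Mojo::URL->new($_->query->param('u')) })
    ->each;

say $_->path->parts->[-1] for @download_urls;
```

Because each step returns a Mojo::Collection, a stage can be commented out or replaced on its own, which is handy when the markup of a page turns out to differ from what was expected.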

I can highly recommend the podcast, and this quick script lets me enjoy it offline.