fileinput

2023-04-21 python fileinput

There is a little helper in python to iterate number of files - fileinput. By default it takes files specified on command-line or stdin. Basic usage is something like this

import fileinput

for line in fileinput.input():
    print(fileinput.filename() + "[" + str(fileinput.filelineno()) + "]:" + line.rstrip())

When run as

python t1.py a.txt b.txt

It prints out each line of those files as sequence, each line preceded by current filename and line number

a.txt[1]:Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid nam nobis itaque ab voluptatem consequuntur at commodi
a.txt[2]:atque consectetur. Sapiente sit, expedita doloribus non dolores sed? Et assumenda aspernatur delectus.
a.txt[3]:Placeat delectus dolore exercitationem, vero adipisci officia excepturi cumque beatae fugiat enim odit eveniet nesciunt
a.txt[4]:numquam aliquam iusto perspiciatis sit at tenetur eos provident facilis quisquam? Ducimus earum totam et.
b.txt[1]:Adipisicing elit. Corrupti dolorem, repellendus natus impedit iure nulla
b.txt[2]:molestias. Quod vel ratione earum? Nesciunt corrupti, veritatis reiciendis suscipit nam enim inventore quaerat deleniti?
b.txt[3]:Magni dolores minima vel voluptas non dolore harum eveniet quia maxime illum velit voluptatibus, quo nam, deleniti iure

The module can be also very useful for quickly loading the file into list comprehension and into other structure

import fileinput

sizes = dict([line.strip().split("\t") for line in fileinput.input()])
print(sizes)

Code like this loads file with tab-separated names and sizes into a dictionary. Consider the data

Filename Size
args.py 670
czech-stemmer.pl 1734
firefox.pl 838
iterator.pl 323
porter.pl 7311
re-debug.pl 727
repeated-regex.pl 155
section-numbering.tt 1053
skyline.pl 3327

The sizes dictionary will look like this

{'porter.pl': '7311', 'firefox.pl': '838', 're-debug.pl': '727', 
'czech-stemmer.pl': '1734', 'iterator.pl': '323', 'args.py': '670', 
'section-numbering.tt': '1053', 'skyline.pl': '3327', 'repeated-regex.pl': '155'}

Building on previous post it can be also combined with argparse

import fileinput
import argparse

cli = argparse.ArgumentParser(prog = 'generator', description = 'Builds summary spreadsheet')
cli.add_argument('--safety',   help = 'safety requirements file. Default: %(default)s', default = 'safety.xls')
cli.add_argument('-v', '--verbose', help = 'verbose output', action='store_true')
cli.add_argument('files', metavar='FILE', nargs='*', help='files to read, if empty, stdin is used')

args = cli.parse_args()

for line in fileinput.input(files=args.files):
    print(fileinput.filename() + "[" + str(fileinput.filelineno()) + "]:" + line.rstrip())