Python: Use CSV Reader With Single File Extracted From Tarfile

I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library. I have this: tarFile = tarfile.open(name=tarFileName,

Solution 1:

TarFile.extractfile returns an io.BufferedReader object, which is a bytes stream, but csv.reader expects a text stream (an iterable that yields str, not bytes). You can use io.TextIOWrapper to turn the bytes stream into a text stream:

import io

...

# newline='' lets the csv module handle line endings itself, as the csv docs recommend
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8', newline=''))
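A minimal, self-contained sketch of this fix follows. The question's archive isn't available, so it assumes a hypothetical .tar.gz built in memory (by a helper called build_archive) with a single member named data.csv. It first shows that csv.reader rejects the raw bytes stream, then reads the same member through io.TextIOWrapper:

```python
import csv
import io
import tarfile


def build_archive():
    """Build a small .tar.gz in memory (hypothetical helper and sample data)."""
    payload = b"name,qty\napple,3\npear,5\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="data.csv")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_archive(), mode="r:gz") as tar:
    # Passing the bytes stream straight to csv.reader fails on the first row.
    try:
        next(csv.reader(tar.extractfile("data.csv")))
        bytes_rejected = False
    except csv.Error:
        bytes_rejected = True

    # Wrapping the same member in a text stream works.
    tarredCSV = tar.extractfile("data.csv")
    reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding="utf-8", newline=""))
    rows = list(reader)

print(bytes_rejected)  # → True
print(rows)            # → [['name', 'qty'], ['apple', '3'], ['pear', '5']]
```

Since extractfile also accepts a member name, this is a convenient way to read just one file out of the archive without iterating over all members.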

Solution 2:

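csv.reader needs an iterable of strings, such as a file object opened in text mode, so the bytes stream returned by extractfile has to be converted first.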

Probably the best solution, because it avoids reading a complete file into memory at once, is this approach (thanks to blhsing and damon for suggesting it):

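import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for member in tarFile.getmembers():
    # Skip directories etc.: extractfile() returns None for members that are not files or links
    if not member.isfile():
        continue
    # Wrap the bytes stream in a text stream; newline="" as the csv docs recommend
    csv_file = io.TextIOWrapper(tarFile.extractfile(member), encoding="utf-8", newline="")

    reader = csv.reader(csv_file)
    next(reader, None)    # skip header (returns None instead of raising on an empty file)
    for row in reader:
        print(row)
tarFile.close()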
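A loop over getmembers() also visits directory entries, and extractfile() returns None for those, so handing its result straight to io.TextIOWrapper would fail. The sketch below illustrates this on a hypothetical in-memory archive (the reports/ directory, the file names and the add_bytes helper are all made up for the example). It assumes only members ending in .csv should be parsed; adjust that filter to your archive's layout:

```python
import csv
import io
import tarfile


def add_bytes(tar, name, data):
    """Add an in-memory regular file to an open TarFile (hypothetical helper)."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Hypothetical archive: a directory entry, two CSV files and a README.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    folder = tarfile.TarInfo(name="reports")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)
    add_bytes(tar, "reports/a.csv", b"id,value\n1,10\n2,20\n")
    add_bytes(tar, "reports/b.csv", b"id,value\n3,30\n")
    add_bytes(tar, "reports/README.txt", b"not a csv\n")
buf.seek(0)

rows_by_member = {}
with tarfile.open(fileobj=buf, mode="r") as tar:
    # A directory has no data stream, so extractfile() returns None for it.
    dir_stream = tar.extractfile("reports")
    for member in tar.getmembers():
        # Only regular files whose names end in .csv are parsed.
        if not member.isfile() or not member.name.endswith(".csv"):
            continue
        with io.TextIOWrapper(tar.extractfile(member), encoding="utf-8", newline="") as csv_file:
            reader = csv.reader(csv_file)
            next(reader, None)  # skip the header row
            rows_by_member[member.name] = list(reader)

print(dir_stream)      # → None
print(rows_by_member)  # → {'reports/a.csv': [['1', '10'], ['2', '20']], 'reports/b.csv': [['3', '30']]}
```

Using the TextIOWrapper as a context manager closes each member's stream as soon as its rows have been read, and the with block around tarfile.open closes the archive itself.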

Alternatively, a possible solution adapted from the question "Python3 working with csv files in tar files" would be:

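import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for member in tarFile.getmembers():
    if not member.isfile():  # skip directories etc., for which extractfile() returns None
        continue
    # Read and decode the whole member at once, then wrap the text in a StringIO
    csv_file = io.StringIO(tarFile.extractfile(member).read().decode('utf-8'))

    reader = csv.reader(csv_file)
    next(reader, None)    # skip header
    for row in reader:
        print(row)
tarFile.close()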

Here an io.StringIO object supplies the text stream that csv.reader needs. However, this might not scale well for large files in the archive, because each member is read into memory completely in a single step.
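To check that the two wrappers differ only in memory use and not in the rows they produce, the following sketch reads one member both ways. It assumes a hypothetical in-memory archive with a single member, big.csv, holding a header and 1000 data rows:

```python
import csv
import io
import tarfile

# Hypothetical sample: a header plus 1000 data rows.
lines = ["id,square"] + [f"{i},{i * i}" for i in range(1000)]
payload = ("\n".join(lines) + "\n").encode("utf-8")

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="big.csv")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    member = tar.getmember("big.csv")

    # In-memory approach: the whole member is read and decoded in one step.
    whole_text = tar.extractfile(member).read().decode("utf-8")
    in_memory_rows = list(csv.reader(io.StringIO(whole_text)))

    # Streaming approach: text is decoded incrementally from the bytes stream.
    # (Rows are collected into a list here only so the results can be compared.)
    stream = io.TextIOWrapper(tar.extractfile(member), encoding="utf-8", newline="")
    streamed_rows = list(csv.reader(stream))

print(len(in_memory_rows), in_memory_rows == streamed_rows)  # → 1001 True
```

In real code, the streaming variant pays off only when rows are processed one at a time inside the loop instead of being collected into a list as done here.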


Post a Comment for "Python: Use CSV Reader With Single File Extracted From Tarfile"