Python: Use CSV Reader With Single File Extracted From Tarfile
I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library. I have this: tarFile = tarfile.open(name=tarFileName,
Solution 1:
tarfile.extractfile returns an io.BufferedReader object, which is a bytes stream, whereas csv.reader expects a text stream. You can wrap the bytes stream in io.TextIOWrapper to get a text stream:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
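To try this end to end without an archive on disk, here is a minimal sketch that builds a small .tar.gz in memory and reads one CSV member back through io.TextIOWrapper. The member name data.csv, the sample rows, and the helper names build_sample_archive and read_csv_member are made up for illustration; they are not part of the question.

```python
import csv
import io
import tarfile


def build_sample_archive() -> bytes:
    """Return the bytes of a .tar.gz holding one hypothetical member, data.csv."""
    payload = "name,qty\napple,3\npear,5\n".encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="data.csv")
        info.size = len(payload)  # addfile() reads exactly this many bytes
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_csv_member(archive_bytes: bytes, member_name: str) -> list:
    """Read one CSV member from an in-memory tar archive and return its rows."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        raw = archive.extractfile(member_name)  # bytes stream, or None
        if raw is None:
            raise ValueError(f"{member_name} is not a regular file")
        # newline="" leaves line-ending handling to the csv module
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.reader(text))


rows = read_csv_member(build_sample_archive(), "data.csv")
print(rows)
```

Passing the member name straight to extractfile avoids looping over the archive when only a single file is wanted; it raises KeyError if no member has that name.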
Solution 2:
You need to provide a text-mode file-like object to csv.reader.
Probably the best solution, because it does not have to consume a complete file at once, is this approach (thanks to blhsing and damon for suggesting it):
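import csv
import io
import tarfile

# tarFileName is the path to the archive, as in the question
with tarfile.open(name=tarFileName, mode="r") as tarFile:
    for member in tarFile.getmembers():
        if not member.isfile():
            continue  # extractfile() returns None for directories
        # Wrap the bytes stream so csv.reader gets text, decoded as it is read
        csv_file = io.TextIOWrapper(tarFile.extractfile(member), encoding="utf-8")
        reader = csv.reader(csv_file)
        next(reader, None)  # skip header
        for row in reader:
            print(row)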
Alternatively, a possible solution taken from the related question "Python3 working with csv files in tar files" would be:
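import csv
import io
import tarfile

# tarFileName is the path to the archive, as in the question
with tarfile.open(name=tarFileName, mode="r") as tarFile:
    for member in tarFile.getmembers():
        if not member.isfile():
            continue  # extractfile() returns None for directories
        # Read and decode the whole member at once, then expose it as text
        content = tarFile.extractfile(member).read().decode("utf-8")
        reader = csv.reader(io.StringIO(content))
        next(reader, None)  # skip header
        for row in reader:
            print(row)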
Here an io.StringIO object is used to make csv.reader happy. However, this might not scale well for larger files contained in the tar, as each file is read and decoded into memory in a single step.
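If memory is the concern, the streaming variant can be packaged as a generator so that only one row is held at a time. The following is a rough sketch under assumptions: the function name iter_csv_rows, the .csv suffix filter, and the in-memory sample archive with members data/a.csv and data/b.csv are all hypothetical and only serve to make the example runnable.

```python
import csv
import io
import tarfile
from typing import Iterator, List, Tuple


def iter_csv_rows(archive: tarfile.TarFile) -> Iterator[Tuple[str, List[str]]]:
    """Yield (member name, row) pairs lazily for every regular .csv member."""
    for member in archive:  # iterating a TarFile yields TarInfo objects
        if not (member.isfile() and member.name.endswith(".csv")):
            continue  # skip directories and non-CSV members
        raw = archive.extractfile(member)
        with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            for row in csv.reader(text):
                yield member.name, row


# Hypothetical in-memory archive: one directory entry and two small CSV files.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as out:
    folder = tarfile.TarInfo(name="data")
    folder.type = tarfile.DIRTYPE
    out.addfile(folder)
    for name, body in [("data/a.csv", b"x,y\n1,2\n"), ("data/b.csv", b"x,y\n3,4\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(body)
        out.addfile(info, io.BytesIO(body))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    streamed = list(iter_csv_rows(archive))
print(streamed)
```

The list(...) call is only there to show the result; in real use you would loop over the generator directly so the rows are never all in memory together.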