Xml Files Not Being Parsed And Appended To List
Solution 1:
The cause of your problem is here:
    path = os.listdir(directory)
    for filename in path:
        tree = ET.parse(filename)
os.listdir() returns a list of bare names, not full paths. So ET.parse() tries to open a file by that name in the current working directory, not in the one named by directory.
You want:
    filenames = os.listdir(directory)
    for filename in filenames:
        filepath = os.path.join(directory, filename)
        tree = ET.parse(filepath)
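To see the difference for yourself, here is a minimal sketch that uses a temporary directory. The file name sample_listdir_demo.xml is made up for illustration, and it assumes no file of that name exists in your current working directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # create one XML file inside the temporary directory
    with open(os.path.join(directory, "sample_listdir_demo.xml"), "w") as f:
        f.write("<doc/>")

    # os.listdir() yields bare names, without the directory part
    names = os.listdir(directory)

    # joining each name with the directory gives a path that can be opened
    paths = [os.path.join(directory, name) for name in names]

    # a bare name is resolved against the current working directory
    bare_exists = [os.path.exists(name) for name in names]
    joined_exists = [os.path.exists(path) for path in paths]

print(names, bare_exists, joined_exists)
```

The bare name only "exists" if the current working directory happens to contain a file with the same name, which is why the original loop failed.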
Also, this:
    try:
        tree = ET.parse(filename)
        root = tree.getroot()
        doc_parser(root)
    except:
        print("ERROR ON FILE: {}".format(filename))
is the worst thing you could do. This will actually prevent you from knowing what went wrong and where, so you cannot debug your code at all.
Proper exception handling guidelines:
1/ NEVER EVER use a "bare" except clause; always specify the exact exception(s) you are expecting at this point. For a top-level "catch all" handler, at least restrict your except clause to Exception, so you don't catch SystemExit.
2/ Keep the try block as narrow as possible (put as little code as possible in it). This makes sure you know where the exception you are handling was actually raised, so if two statements raise the same exception type for unrelated reasons, you only catch the one you expected.
3/ Only catch exceptions you can actually and effectively handle at this point of the code. If you cannot handle the exception here, just let it propagate (or report it with additional information and re-raise it).
4/ Never assume anything about what really happened. Use the exception message and the traceback when reporting the exception. The stdlib's logging module makes it a breeze (well, once you've learned to properly configure your logger, which can be a bit of a PITA xD).
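As a rough illustration of points 2/ and 4/, here is a sketch with a narrow try block that reports through the logging module. The helper name parse_or_none and the basicConfig() setup are assumptions made for this example, not something from your code.

```python
import logging
import xml.etree.ElementTree as ET

# simplest possible configuration; a real application would likely do more
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def parse_or_none(xml_text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        # only the call we expect to fail lives inside the try block
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # logger.exception() logs at ERROR level and appends the traceback
        logger.exception("could not parse XML input")
        return None
    return root

good = parse_or_none("<doc/>")
bad = parse_or_none("<doc>")  # unclosed element, raises ET.ParseError
```

Anything other than ET.ParseError still propagates, so an unexpected failure is not silently swallowed.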
Here what you want is something like:
    try:
        tree = ET.parse(filepath)
    except ET.ParseError as e:
        # using `logging.exception()` would be better,
        # but we don't really need the whole traceback here
        # as the error is specific enough and we already
        # know where it happens
        print("{} is not valid XML: {}".format(filepath, e))
        continue
    root = tree.getroot()
    doc_parser(root)
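Putting the pieces together, the whole loop could look roughly like the sketch below. The function name load_roots, the skipped list and the sample files are hypothetical; in your code you would call doc_parser(root) where the root is appended. It also assumes the directory contains only regular files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def load_roots(directory):
    """Return (roots, skipped): parsed root elements and paths that failed to parse."""
    roots, skipped = [], []
    for filename in sorted(os.listdir(directory)):
        filepath = os.path.join(directory, filename)
        try:
            tree = ET.parse(filepath)
        except ET.ParseError as e:
            # report the specific problem, then move on to the next file
            print("{} is not valid XML: {}".format(filepath, e))
            skipped.append(filepath)
            continue
        roots.append(tree.getroot())
    return roots, skipped

with tempfile.TemporaryDirectory() as tmp:
    # one well-formed file and one broken file, for demonstration only
    for name, text in [("a.xml", "<doc>ok</doc>"), ("b.xml", "<doc>")]:
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    roots, skipped = load_roots(tmp)
```

Here the broken file is reported and skipped while the valid one still ends up in the list.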