
Xml Files Not Being Parsed And Appended To List

I'm trying to parse all XML files in a given directory using Python. I am able to parse one file at a time, but that would be 'impossible' for me to do due to the roughly 19k differ

Solution 1:

The cause of your problem is here:

path = os.listdir(directory)
for filename in path:
    tree = ET.parse(filename)

os.listdir() returns a list of bare file names, not full paths. So ET.parse() tries to open a file with that name in the current working directory, not in directory.

You want:

filenames = os.listdir(directory)
for filename in filenames:
    filepath = os.path.join(directory, filename) 
    tree = ET.parse(filepath)
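
If the directory may also contain non-XML files or subdirectories, the same fix can be extended with a filter. This is only a sketch: the helper name `xml_paths` and the assumption that the files you care about end in `.xml` are mine, not part of the original question.

```python
import os


def xml_paths(directory):
    """Return the full paths of the .xml files directly inside `directory`.

    Hypothetical helper: assumes the files of interest have a .xml
    extension (case-insensitive). Names are sorted for a stable order.
    """
    paths = []
    for filename in sorted(os.listdir(directory)):
        # os.listdir() yields bare names, so join them with the directory
        filepath = os.path.join(directory, filename)
        if filename.lower().endswith(".xml") and os.path.isfile(filepath):
            paths.append(filepath)
    return paths
```

You would then loop over `xml_paths(directory)` and pass each path straight to `ET.parse()`.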

Also, this:

    try:
        tree = ET.parse(filename)
        root = tree.getroot()
        doc_parser(root)
    except:
        print("ERROR ON FILE: {}".format(filename))

is the worst thing you could do. This will actually prevent you from knowing what went wrong and where, so you cannot debug your code at all.

Proper exception handling guidelines:

1/ NEVER EVER use a "bare" except clause. Always specify the exact exception(s) you are expecting at this point. For a top-level "catch all" handler, at least restrict your except clause to Exception, so you don't catch SystemExit or KeyboardInterrupt.

2/ Keep the try block as narrow as possible (put as little code as possible in it). This makes sure you know where the exception you are handling was actually raised, so if two statements raise the same exception type for unrelated reasons, you only catch the one you expected.

3/ Only catch exceptions you can actually and effectively handle at this point of the code. If you cannot handle the exception at this point, just let it propagate (or report it with additional information and re-raise it).

4/ Never assume anything about what really happened. Use the exception message and the traceback when reporting the exception. The stdlib's logging module makes it a breeze (well, once you've learned to properly configure your logger which can be a bit of a PITA xD).
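
As a rough illustration of guidelines 3/ and 4/, here is how "report it and re-raise it" could look with the logging module. The helper name `load_root` and the logger setup are assumptions made for this sketch; `logger.exception()` is the stdlib call that logs at ERROR level and appends the current traceback.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)


def load_root(filepath):
    """Parse one XML file and return its root element.

    Hypothetical helper: it cannot repair a broken file, so it only
    reports the failure (message plus traceback) and re-raises.
    """
    try:
        # narrow try block: only the call we expect to fail
        tree = ET.parse(filepath)
    except ET.ParseError:
        # logs the message, the exception text and the traceback
        logger.exception("could not parse %s", filepath)
        raise
    return tree.getroot()
```

The caller still sees the original ET.ParseError and decides what to do with it; the function only adds context to the report.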

Here, what you want inside the loop is something like:

    try:
        tree = ET.parse(filepath)
    except ET.ParseError as e:
        # using `logging.exception()` would be better,
        # but we don't really need the whole traceback here
        # as the error is specific enough and we already
        # know where it happens
        print("{} is not valid XML: {}".format(filepath, e))
        continue

    root = tree.getroot()
    doc_parser(root)
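
Put together, the whole loop might look like the sketch below. It rests on assumptions the question does not confirm: that `doc_parser` returns one value per document which you want appended to a list, and that the directory contains only regular files. The function name `parse_directory` and the `failed` list are hypothetical additions.

```python
import os
import xml.etree.ElementTree as ET


def parse_directory(directory, doc_parser):
    """Apply `doc_parser` to the root of every file in `directory`.

    Returns (results, failed): the doc_parser return values, and the
    paths of the files that were not valid XML.
    """
    results, failed = [], []
    for filename in sorted(os.listdir(directory)):
        filepath = os.path.join(directory, filename)
        try:
            tree = ET.parse(filepath)
        except ET.ParseError as e:
            # report the broken file and move on to the next one
            print("{} is not valid XML: {}".format(filepath, e))
            failed.append(filepath)
            continue
        # outside the try block: errors raised by doc_parser propagate
        results.append(doc_parser(tree.getroot()))
    return results, failed
```

Keeping the `doc_parser` call outside the try block means a bug in your own parsing code surfaces with its full traceback instead of being reported as a bad file.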
