
Extracting Specific Values For A Header In Different Lines Using Regex

I have a text string that spans multiple lines, and each line contains a mix of characters, numbers, spaces, etc. Here is what a couple of lines look like: WEIGHT VOLUME

Solution 1:

You can use a regex to easily split the text into a list containing all the fields:

import re

a = "WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES\n                                                                         398.000 KG                     4.999 M3                  833.500 KG                12 PLT\n                                                                                         MAWB                                    HAWB\n    / MH616 /                                                                                           8947806753                             ABC20018830\n"

# Split on 4 (or more) whitespace characters (leaves the units with the numbers)
data = re.split(r'\s{4,}', a)
print(data)

['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES', '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT', 'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830\n']
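Note that the last field above keeps its trailing newline, because a single "\n" is shorter than the four-character threshold. A minimal sketch of one way to avoid that, assuming you want clean fields: strip the string before splitting. The `sample` string here is a shortened, hypothetical stand-in for the full text, not the original data.

```python
import re

# Shortened stand-in for the full string (spacing is illustrative only)
sample = "WEIGHT      VOLUME\n      398.000 KG      4.999 M3\n"

# strip() drops leading/trailing whitespace, so no field keeps a stray "\n"
fields = re.split(r"\s{4,}", sample.strip())
print(fields)
# ['WEIGHT', 'VOLUME', '398.000 KG', '4.999 M3']
```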

Since the keys and values are mixed, there probably isn't an easy way to automatically determine which is which. However, if they are always in the same position, you can pick them out manually, e.g.:

b = {
    # WEIGHT
    data[0]: data[4],
    # VOLUME
    data[1]: data[5]
}
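If you need more than a couple of fields, the same manual idea can be kept in one place. The following is only a sketch: `VALUE_INDEX` and `lookup` are hypothetical names, and the index table assumes the layout never changes, so each header's value always sits at the same position in the split list. It also splits each value into a number and its unit.

```python
# Fields as printed above, i.e. the result of re.split(r'\s{4,}', a)
data = ['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES',
        '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT',
        'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830\n']

# Hypothetical header -> value-index table (assumes a fixed layout)
VALUE_INDEX = {"WEIGHT": 4, "VOLUME": 5, "CHARGEABLE": 6, "PACKAGES": 7}

def lookup(fields, header):
    """Return (number, unit) for a header, e.g. '398.000 KG' -> (398.0, 'KG')."""
    number, unit = fields[VALUE_INDEX[header]].split()
    return float(number), unit

print(lookup(data, "WEIGHT"))    # (398.0, 'KG')
print(lookup(data, "PACKAGES"))  # (12.0, 'PLT')
```

If the layout does change, the lookup will raise an error or return the wrong field, so it is worth checking that `fields[0:4]` still holds the expected header names before trusting the result.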
