Extracting Specific Values For A Header In Different Lines Using Regex
I have a text string which has multiple lines, and each line has a mix of characters, numbers, spaces, etc. Here is what a couple of lines look like: WEIGHT VOLUME
Solution 1:
You can use a regex to easily split the text into a list containing all the fields:
import re
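# Sample input. The fields are assumed to be separated by runs of 4 or more
# whitespace characters, which is what the split below relies on.
a = "WEIGHT    VOLUME    CHARGEABLE    PACKAGES\n    398.000 KG    4.999 M3    833.500 KG    12 PLT\n    MAWB    HAWB\n    / MH616 /    8947806753    ABC20018830\n"
# Split on 4 (or more) whitespace (leaves the units with the numbers)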
data = re.split(r'\s{4,}', a)
print(data)
['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES', '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT', 'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830\n']
Since the keys and values are mixed, there probably isn't an easy way to automatically determine which is which. However, if they are always in the same position, you can pick them out manually by index, e.g.:
b = {
    # WEIGHT
    data[0]: data[4],
    # VOLUME
    data[1]: data[5],
}
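If the layout is regular enough, the manual index lookup above can be avoided by splitting line by line and pairing each header line with the line that follows it. The following is only a sketch under stated assumptions: the sample text, the helper name pair_headers_with_values, and the idea that header lines and value lines strictly alternate are all hypothetical and are not confirmed by the question.

```python
import re

# Hypothetical sample; assumes fields are separated by 4+ whitespace
# characters and that header lines alternate with value lines.
text = (
    "WEIGHT    VOLUME    CHARGEABLE    PACKAGES\n"
    "    398.000 KG    4.999 M3    833.500 KG    12 PLT\n"
    "    MAWB    HAWB\n"
    "    / MH616 /    8947806753    ABC20018830\n"
)

def pair_headers_with_values(text):
    """Map each header to the value below it, one line pair at a time."""
    # Drop blank lines and surrounding whitespace before splitting.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    result = {}
    # Walk the lines two at a time: (header line, value line).
    for header_line, value_line in zip(lines[0::2], lines[1::2]):
        headers = re.split(r'\s{4,}', header_line)
        values = re.split(r'\s{4,}', value_line)
        # zip stops at the shorter list, so surplus values are dropped.
        result.update(zip(headers, values))
    return result

fields = pair_headers_with_values(text)
print(fields["WEIGHT"])
print(fields["VOLUME"])
```

This works cleanly for the first pair of lines, where there are four headers and four values. For the MAWB/HAWB line the counts differ (two headers, three values), so the positional pairing there is ambiguous and would need a rule specific to that line.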