
Python Regex Extract Width X Depth X Height

I am trying to extract the physical dimensions of items from a column 'Description' in a df to create a new column with it. Dimensions usually appear in this format (120x80x100) in

Solution 1:

You can use the regex, \d+\s*x\s*\d+(?:\s*x\s*\d+)?

Explanation:

  • \d+: One or more digits
  • \s*: Zero or more whitespace characters
  • x: Literal, x
  • (?:\s*x\s*\d+)?: Optional non-capturing group that matches a third dimension, if present
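
To see the pattern in action, here is a minimal sketch using Python's built-in re module. The sample description string and the variable names are made up for illustration and do not come from the question.

```python
import re

# Pattern from the answer: two or three numbers separated by "x",
# with optional whitespace around each "x".
DIMENSIONS = re.compile(r'\d+\s*x\s*\d+(?:\s*x\s*\d+)?')

# Hypothetical description text, not taken from the original question.
description = 'Oak cabinet, size 120 x 80x100 cm, delivered flat-packed'

match = DIMENSIONS.search(description)
found = match.group(0) if match else None
print(found)  # 120 x 80x100
```

Because search returns None when nothing matches, the conditional keeps the code from failing on descriptions that contain no dimensions.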

If you want the numbers to be of one to three digits, replace \d+ with \d{1,3} as shown in the regex, \d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?.
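
One thing to be aware of: limiting the digit count does not stop the pattern from matching inside a longer number. The sketch below shows this on a made-up string, together with one possible workaround using negative lookarounds. The lookarounds are an assumption added here for illustration and are not part of the original answer.

```python
import re

# Digit-limited pattern from the answer.
bounded = re.compile(r'\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?')

# Assumption: (?<!\d) and (?!\d) are added here so a match cannot start
# or end in the middle of a longer run of digits.
strict = re.compile(r'(?<!\d)\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?(?!\d)')

text = 'panel 1200x80'  # made-up sample with a four-digit number

loose_hit = bounded.search(text)
strict_hit = strict.search(text)

print(loose_hit.group(0))  # 200x80
print(strict_hit)          # None
```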

If your code requires a capturing group (pandas str.extract, for example, needs at least one), wrap the whole pattern in parentheses:

(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)
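
Since the question is about building a new DataFrame column, the grouped pattern could be applied roughly as follows. This is a sketch under assumptions: the sample rows and the new column name 'Dimensions' are invented here, and only the column name 'Description' comes from the question.

```python
import pandas as pd

# Hypothetical data; only the column name 'Description' is from the question.
df = pd.DataFrame({
    'Description': [
        'Oak table 120x80x100 cm',
        'Shelf, 60 x 30',
        'Gift card, no physical size',
    ]
})

# str.extract needs at least one capturing group.
# expand=False returns a Series instead of a one-column DataFrame.
pattern = r'(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)'
df['Dimensions'] = df['Description'].str.extract(pattern, expand=False)

print(df['Dimensions'].tolist())
```

Rows without a match get a missing value (NaN) in the new column, so the third row stays empty.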

Solution 2:

We can try using a re.findall approach with a regex pattern covering all possible dimension formats:

import re

inp = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit 120x80x100 sed do 120 x 80 x 100 eiusmod 120x80 tempor...'
# One number followed by one or two "x <number>" parts, whitespace optional
dims = re.findall(r'\d+(?:\s*x\s*\d+){1,2}', inp)
print(dims)  # ['120x80x100', '120 x 80 x 100', '120x80']
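
If the individual numbers are needed rather than the matched strings, each match can be split on the "x" separator. The helper name parse_dimensions and the sample text are hypothetical; this is one possible follow-up step, not something the answer itself proposes.

```python
import re

def parse_dimensions(text):
    """Return each dimension string found in text as a tuple of ints."""
    matches = re.findall(r'\d+(?:\s*x\s*\d+){1,2}', text)
    # Split every match on "x" (with optional spaces) and convert the parts.
    return [tuple(int(part) for part in re.split(r'\s*x\s*', m)) for m in matches]

sizes = parse_dimensions('crate 120 x 80 x 100 and lid 120x80')
print(sizes)  # [(120, 80, 100), (120, 80)]
```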

Solution 3:

Something like this should work:

\d+(\s?x\s?\d+){1,2}
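
Note that this pattern uses a capturing group, which changes what re.findall returns. The short sketch below, on a made-up string, compares it with a non-capturing variant of the same pattern.

```python
import re

text = 'box 120x80x100'  # made-up sample

# With a capturing group, findall returns only the group's last repetition.
captured = re.findall(r'\d+(\s?x\s?\d+){1,2}', text)

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r'\d+(?:\s?x\s?\d+){1,2}', text)

print(captured)  # ['x100']
print(whole)     # ['120x80x100']
```

The capturing version still works with re.search or re.finditer, where group(0) gives the full match regardless of any groups inside the pattern.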
