Python Regex Extract Width X Depth X Height
I am trying to extract the physical dimensions of items from the 'Description' column of a DataFrame and store them in a new column. Dimensions usually appear in this format (120x80x100) in
Solution 1:
You can use the regex, \d+\s*x\s*\d+(?:\s*x\s*\d+)?
Explanation:
\d+ : One or more digits
\s* : Zero or more whitespace characters
x : Literal x
(?:\s*x\s*\d+)? : Optional non-capturing group
If you want the numbers to be of one to three digits, replace \d+ with \d{1,3}, as shown in the regex \d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?
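One caveat worth checking against your data: on its own, \d{1,3} does not stop the pattern from matching inside a longer run of digits. The sketch below compares the pattern as given with a stricter variant; the lookarounds (?<!\d) and (?!\d) and the sample string are assumptions added for illustration, not part of the original answer.

```python
import re

# Pattern from the answer above.
loose = r'\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?'
# Assumption: lookarounds added so a match cannot start or end mid-number.
strict = r'(?<!\d)' + loose + r'(?!\d)'

# Hypothetical sample text.
sample = 'part 1234x5678, crate 120x80x100'

loose_hits = re.findall(loose, sample)
strict_hits = re.findall(strict, sample)

print(loose_hits)   # ['234x567', '120x80x100']
print(strict_hits)  # ['120x80x100']
```

Whether the stricter form is needed depends on whether longer numbers such as part codes can appear next to an x in the descriptions.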
If your code requires you to use a group, do it as follows:
(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)
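As a minimal sketch of how the grouped pattern might be applied row by row, the snippet below uses only the standard re module. The helper name extract_dims and the sample descriptions are hypothetical.

```python
import re

# Grouped pattern from above; the outer parentheses are capture group 1.
DIM_RE = re.compile(r'(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)')

def extract_dims(description):
    """Return the first dimension string found in description, or None."""
    match = DIM_RE.search(description)
    return match.group(1) if match else None

# Hypothetical sample descriptions, not taken from the question's data.
descriptions = [
    'Oak table (120x80x100) with drawer',
    'Shelf 60 x 30',
    'No dimensions listed',
]

results = [extract_dims(d) for d in descriptions]
print(results)  # ['120x80x100', '60 x 30', None]
```

In pandas, the same grouped pattern can be passed to Series.str.extract, which requires at least one capture group. For example, df['Dimensions'] = df['Description'].str.extract(pattern, expand=False), where 'Dimensions' is an assumed name for the new column.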
Solution 2:
We can try using a re.findall approach with a regex pattern covering both two-part and three-part dimensions, with or without spaces around the x:
import re
inp = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit 120x80x100 sed do 120 x 80 x 100 eiusmod 120x80 tempor...'
dims = re.findall(r'\d+(?:\s*x\s*\d+){1,2}', inp)
print(dims) # ['120x80x100', '120 x 80 x 100', '120x80']
Solution 3:
Something like this should work:
\d+(\s?x\s?\d+){1,2}
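Note that this pattern uses a capturing group, which changes what re.findall returns: it gives back the group's content rather than the whole match. The sketch below shows the difference on a hypothetical sample string and two possible ways around it.

```python
import re

# Hypothetical sample text.
text = 'box 120x80x100 and lid 120 x 80'

# Capturing group: findall returns only the last repetition of the group.
captured = re.findall(r'\d+(\s?x\s?\d+){1,2}', text)
print(captured)  # ['x100', ' x 80']

# Option 1: make the group non-capturing so findall returns whole matches.
whole = re.findall(r'\d+(?:\s?x\s?\d+){1,2}', text)
print(whole)  # ['120x80x100', '120 x 80']

# Option 2: keep the pattern as written and read group(0) from finditer.
via_finditer = [m.group(0) for m in re.finditer(r'\d+(\s?x\s?\d+){1,2}', text)]
print(via_finditer)  # ['120x80x100', '120 x 80']
```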