Python - Extract Text From String
Solution 1:
import re
str = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'print'Target IDs = ' + ','.join( re.findall(r'ID=(\d+)', str) )
print'Target ScreenNames = ' + ','.join( re.findall(r' ScreenName=(\w+)', str) )
Output : Target IDs = 1234567890,233323490,4459284 Target ScreenNames = RandomNameHere,AnotherRandomName,YetAnotherName
Solution 2:
It depends. Assuming that all your text comes in the form of
TagName = TagValue1, TagValue2, ...
You need just two calls to split.
tag, value_string = string.split('=')
values = value_string.split(',')
Remove the excess space (probably a couple of rstrip()
/lstrip()
calls will suffice) and you are done. Or you can take regex. They are slightly more powerful, but in this case I think it's a matter of personal taste.
If you want more complex syntax with nonterminals, terminals and all that, you'll need lex/yacc, which will require some background in parsers. A rather interesting thing to play with, but not something you'll want to use for storing program options and such.
Solution 3:
The regex I'd use would be:
(?:ID=|ScreenName=)+(\d+|[\w\d]+)
However, this assumes that ID is only digits (\d
) and usernames are only letters or numbers ([\w\d]
).
This regex (when combined with re.findall
) would return a list of matches that could be iterated through and sorted in some fashion like so:
import re
s = "[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]"
pattern = re.compile(r'(?:ID=|ScreenName=)+(\d+|[\w\d]+)');
ids = []
names = []
for p in re.findall(pattern, s):
if p.isnumeric():
ids.append(p)
else:
names.append(p)
print(ids, names)
Post a Comment for "Python - Extract Text From String"