How To Get The Offset Of A Matched N-gram In Text
I would like to match a string (an n-gram) in a text and get the offsets of the match: string_to_match = 'many workers are very underpaid' text = 'The new york times claimed in
Solution 1:
You can use re.finditer() and call the span() method on each match object to get the start and end indices of the matched substring:
import re

def m():
    string_to_match = "many workers are very underpaid"
    text = "The new york times claimed in a report that many workers are very underpaid in some africans countries."
    # re.escape() makes the n-gram match literally, even if it contains regex metacharacters
    for x in re.finditer(re.escape(string_to_match), text):
        # x.span() returns the (start, end) indices of the matched substring as a tuple
        print(x.group(0), x.span())
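If the offsets need to be returned rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_ngram_offsets and the ignore_case parameter are made up for illustration and are not part of the original answer. It relies only on re.escape(), re.compile() and finditer() from the standard library.

```python
import re

def find_ngram_offsets(ngram, text, ignore_case=False):
    """Return a list of (start, end) offsets, one per occurrence of ngram in text.

    Hypothetical helper for illustration; the end offset is exclusive,
    so text[start:end] gives back the matched substring.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # Escape the n-gram so characters like '.' or '(' are matched literally.
    pattern = re.compile(re.escape(ngram), flags)
    return [match.span() for match in pattern.finditer(text)]

string_to_match = "many workers are very underpaid"
text = "The new york times claimed in a report that many workers are very underpaid in some africans countries."

offsets = find_ngram_offsets(string_to_match, text)
for start, end in offsets:
    # Slicing with the returned offsets recovers the matched n-gram.
    print(start, end, text[start:end])
```

finditer() only reports non-overlapping matches, which is usually what is wanted for word n-grams. If an n-gram is not found, the helper returns an empty list.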