
How To Get the Offset of a Matched N-gram in Text

I would like to match a string (an n-gram) in a text and also get the offsets of the match: string_to_match = 'many workers are very underpaid' text = 'The new york times claimed in

Solution 1:

You can use re.finditer() and call the span() method on each match object to get the start and end indices of the matched substring:

import re

def m():
    string_to_match = "many workers are very underpaid"
    text = "The new york times claimed in a report that many workers are very underpaid in some africans countries."
    # re.escape() makes any regex metacharacters in the n-gram match literally
    matches = re.finditer(re.escape(string_to_match), text)
    for x in matches:
        # x.span() returns the start and end indices of the matched substring as a tuple
        print(x.group(0), x.span())

m()
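If the n-gram should only match as whole words (so that "are" is not found inside "aware"), the same finditer()/span() idea can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name find_ngram_offsets is hypothetical, and it assumes "whole word" means the match is not directly preceded or followed by a word character. Lookarounds are used instead of \b so that n-grams starting or ending with punctuation (e.g. "c++") still match.

```python
import re
from typing import List, Tuple


def find_ngram_offsets(ngram: str, text: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) character offsets of whole-word matches of ngram."""
    # re.escape() treats the n-gram literally; the lookarounds reject matches
    # that are glued to a neighbouring word character (e.g. "are" inside "aware").
    pattern = r"(?<!\w)" + re.escape(ngram) + r"(?!\w)"
    return [match.span() for match in re.finditer(pattern, text)]


text = ("The new york times claimed in a report that many workers are very "
        "underpaid in some africans countries.")
offsets = find_ngram_offsets("many workers are very underpaid", text)
print(offsets)  # [(44, 75)]
for start, end in offsets:
    # Slicing with the span recovers exactly the matched text.
    print(text[start:end])
```

Note that re.finditer() returns non-overlapping matches only, so overlapping occurrences of a repeated n-gram are not all reported.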
