How To Get The Offset Of A Matched N-gram In Text
I would like to match a string (an n-gram) in a text and get the offsets of the match: string_to_match = 'many workers are very underpaid'   text = 'The new york times claimed in
Solution 1:
You can use re.finditer() and call the span() method on each match object to get the start and end indices of the matched substring:
import re

def m():
    string_to_match = "many workers are very underpaid"
    text = "The new york times claimed in a report that many workers are very underpaid in some africans countries."
    # re.escape() keeps any regex metacharacters in the n-gram from being interpreted as a pattern
    matches = re.finditer(re.escape(string_to_match), text)
    for x in matches:
        # x.span() returns the (start, end) indices of the matched substring as a tuple
        print(x.group(0), x.span())
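If you need the offsets as data rather than printed output, the same idea can be wrapped in a small helper. The following is only a sketch: the function name find_ngram_offsets is hypothetical (it is not part of the re module), and it assumes the n-gram's tokens may be separated by any run of whitespace in the text, which goes slightly beyond the exact-string match above.

```python
import re

def find_ngram_offsets(text, ngram):
    """Return a list of (start, end) character offsets for each occurrence of ngram in text.

    Hypothetical helper for illustration; assumes tokens are whitespace-separated.
    """
    # Escape each token so regex metacharacters are matched literally,
    # then allow one or more whitespace characters between tokens.
    pattern = r"\s+".join(re.escape(tok) for tok in ngram.split())
    return [match.span() for match in re.finditer(pattern, text)]

text = "The new york times claimed in a report that many workers are very underpaid in some africans countries."
spans = find_ngram_offsets(text, "many workers are very underpaid")
for start, end in spans:
    # Slicing with the returned offsets recovers the matched substring
    print(text[start:end], (start, end))
```

The end offset is exclusive, so text[start:end] gives back the matched n-gram. If overlapping or case-insensitive matches are needed, the pattern would have to be adjusted (for example with the re.IGNORECASE flag).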