
Extract Dict From String

I'm calling a function that returns a string containing a dict. How can I extract this dict, keeping in mind that the first and last lines could also contain '{' and '}'? This is a {t

Solution 1:

Updated Answer


Taking on board comments from @martineau and @ekhumoro, the following edited code contains a function that searches through the string and extracts all valid dicts. This approach is more robust than my previous answer, because the contents of the real-world dict may vary and this logic (hopefully) accounts for that.

Sample Code:

import json
import re

def extract_dict(s) -> list:
    """Extract all valid dicts from a string.
    
    Args:
        s (str): A string possibly containing dicts.
    
    Returns:
        A list containing all valid dicts.
    
    """
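    results = []
    # Flatten the string so the pattern does not have to deal with newlines.
    s_ = ' '.join(s.split('\n')).strip()
    # Non-greedy: each match runs from a '{' to the first '}' after it,
    # so nested dicts (or a '}' inside a string value) are not matched whole.
    exp = re.compile(r'(\{.*?\})')
    for i in exp.findall(s_):
        try:
            results.append(json.loads(i))
        except json.JSONDecodeError:
            # Not valid JSON, e.g. "{testing string}"; skip it.
            pass
    return results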

Test String:

The OP's original string has been updated to add multiple dicts, a numeric value as the last field, and a list value.

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": 5
}
{"website": "stackoverflow",
"type": "question",
"date": "2020-09-11"
}
{"website": "stackoverflow",
"type": "question",
"dates": ["2020-09-11", "2020-09-12"]
}
This is a {testing string} example
This {is} a testing {string} example
"""

Output:

As the OP states, there is generally only one dict in the string, so it would be accessed with results[0].

>>> results = extract_dict(s)
>>> results
[{'website': 'stackoverflow', 'type': 'question', 'date': 5},
 {'website': 'stackoverflow', 'type': 'question', 'date': '2020-09-11'},
 {'website': 'stackoverflow', 'type': 'question', 'dates': ['2020-09-11', '2020-09-12']}]
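One limitation of the non-greedy pattern is that each match stops at the first '}', so a nested dict such as {"a": {"b": 1}} is cut short, fails json.loads, and is silently skipped. A minimal sketch of an alternative, assuming the dicts are valid JSON, is to let the standard library's json.JSONDecoder.raw_decode work out where each object ends. The function name extract_dicts_raw is hypothetical and not part of the original answer.

```python
import json


def extract_dicts_raw(s: str) -> list:
    """Return every JSON object that decodes starting at a '{' in s.

    Hypothetical helper: instead of a regex, json.JSONDecoder.raw_decode
    decides where each object ends, so nested objects and braces inside
    string values are handled by the JSON parser itself.
    """
    decoder = json.JSONDecoder()
    results = []
    pos = s.find('{')
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at index `pos` and
            # returns (value, index just past the end of that value).
            obj, end = decoder.raw_decode(s, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace (e.g. "{testing string}"),
            # so move on to the next '{'.
            pos = s.find('{', pos + 1)
            continue
        results.append(obj)
        # Resume searching after the object that was just decoded.
        pos = s.find('{', end)
    return results


sample = 'text {"a": {"b": 1}, "c": [1, 2]} more {not json}'
print(extract_dicts_raw(sample))
# [{'a': {'b': 1}, 'c': [1, 2]}]
```

Because raw_decode only needs a starting index and treats newlines as ordinary JSON whitespace, no flattening step is needed. For the sample string in this sketch, the regex-based extract_dict returns an empty list, since its first match {"a": {"b": 1} is truncated.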

Original Answer:


Ignore this section. Although the code works, it is tailored to the OP's specific request and is not robust for other uses.

This sample uses a regex to identify the dict start {" and the dict end "}, extracts the text in between, and then converts that string to a proper dict. Because newlines get in the way and complicate the regex, I've simply flattened the string first.

Per a comment from @jizhihaoSAMA, I've updated the code to use json.loads to convert the string to a dict, as it's cleaner. If you don't want the additional import, eval will also work, but it is not recommended because it executes arbitrary code.
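Because eval evaluates arbitrary Python expressions, it should not be run on text you do not control. If json.loads is too strict for your input (for example, the text uses single quotes), a safer standard-library option is ast.literal_eval, which only accepts Python literals and never executes code. The sketch below compares the two on the dict text from this answer; the variable names are illustrative only. Note that the two are not interchangeable: JSON's true/false/null are not Python literals, so ast.literal_eval rejects them, while json.loads rejects single-quoted strings.

```python
import ast
import json

# The dict text as the regex would extract it (already flattened to one line).
text = '{"website": "stackoverflow", "type": "question", "date": "10-09-2020" }'

# json.loads accepts strict JSON only (double quotes, true/false/null).
d_json = json.loads(text)

# ast.literal_eval accepts Python literals only and does not run code,
# so it is a safer stand-in for eval on Python-style dict text.
d_ast = ast.literal_eval(text)

print(d_json == d_ast)  # True

# Each one handles the syntax the other rejects:
print(json.loads('{"ok": true}'))        # {'ok': True}
print(ast.literal_eval("{'ok': True}"))  # {'ok': True}
```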

Sample Code:

import json
import re

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""

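# Flatten the newlines, then greedily match from the first {" to the
# last " that is followed by an optional space and a closing brace.
s_ = ' '.join(s.split('\n')).strip()
d = json.loads(re.findall(r'(\{\".*\"\s?\})', s_)[0])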

>>> d
{'website': 'stackoverflow', 'type': 'question', 'date': '10-09-2020'}
>>> d['website']
'stackoverflow'
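The flattening step can also be skipped by passing re.DOTALL, which makes . match newlines as well. The sketch below assumes, like the original, that the string holds exactly one JSON dict; \s* (rather than \s?) allows any amount of whitespace, including the newline, before the closing brace. It is an alternative to the author's approach, not a replacement, and shares its limitation of taking everything from the first {" to the last "}.

```python
import json
import re

s = """
This is a {testing string} example
This {is} a testing {string} example
{"website": "stackoverflow",
"type": "question",
"date": "10-09-2020"
}
This is a {testing string} example
This {is} a testing {string} example
"""

# re.DOTALL lets "." match newlines too, so the string is not flattened.
# The greedy ".*" runs to the last double quote followed by whitespace and "}".
match = re.search(r'\{".*"\s*\}', s, re.DOTALL)
d = json.loads(match.group(0)) if match else None
print(d['website'])  # stackoverflow
```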
