How To Condense Script When Scraping Different Locations For One Element

I have two working scripts that each do their job. I want to combine them to improve efficiency and reduce redundancy. I am using Python 3.7, BeautifulSoup 4.7.1, re, and requests. script 1 sear

Solution 1:

You can combine the two scripts into one, since most of the code is shared. Simply make the field names the same across both. :contains will still match on the shortened field name Best Sellers Rank, and the CSS Or syntax (a comma-separated selector list) handles tr versus li.
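
As a rough offline illustration of the Or selector, the sketch below runs the same idea against two made-up HTML fragments, one per layout. The markup, the helper names find_field and field_value, and the use of html.parser are assumptions for illustration only. It also uses the newer :-soup-contains spelling, which recent Soup Sieve releases prefer; on the BeautifulSoup 4.7.1 setup from the question, :contains is the equivalent.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments standing in for the two page layouts.
LIST_LAYOUT = '<ul><li><b>Shipping Weight:</b> 1.6 ounces (<a href="#">View rates</a>)</li></ul>'
TABLE_LAYOUT = '<table><tr><th>Shipping Weight</th><td>2.4 ounces</td></tr></table>'


def find_field(html, field):
    """Return the first li or tr whose text contains `field`, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # The comma acts as "or": an element matching either selector is accepted.
    selector = 'li:-soup-contains("{0}"), tr:-soup-contains("{0}")'.format(field)
    return soup.select_one(selector)


def field_value(html, field):
    """Return the text that follows the field label, or 'N/A' if absent."""
    element = find_field(html, field)
    if element is None:
        return 'N/A'
    # The first stripped string is the label, the second is the value.
    return list(element.stripped_strings)[1].replace('(', '').strip()
```

Calling field_value(LIST_LAYOUT, 'Shipping Weight') and field_value(TABLE_LAYOUT, 'Shipping Weight') goes through the same code path for both layouts, which is the point of merging the two scripts.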

import requests
from bs4 import BeautifulSoup as bs
import re

links = ['','']
map_dict = {'Product Dimensions': 'dimensions', 'Shipping Weight': 'weight', 'Item model number': 'Item_No', 'Best Sellers Rank': ['R1_NO','R1_CAT']}

p = re.compile(r'#([0-9][0-9,]*)+[\n\s]+in[\n\s]+([A-Za-z&\s]+)')

with requests.Session() as s:
    for link in links:
        r = s.get(link, headers = {'User-Agent': 'Mozilla/5.0'})
        soup = bs(r.content, 'lxml')
        fields = ['Product Dimensions', 'Shipping Weight', 'Item model number', 'Best Sellers Rank']
        final_dict = {}

        for field in fields:
            element = soup.select_one('li:contains("' + field + '"), tr:contains("' + field + '")')
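            if element is None:
                # Field missing from this page: fill in placeholders.
                if field == 'Best Sellers Rank':
                    item = dict(zip(map_dict[field], ['N/A','N/A']))
                    final_dict = {**final_dict, **item}
                else:
                    final_dict[map_dict[field]] = 'N/A'
            else:
                if field == 'Best Sellers Rank':
                    # A page can list several ranks: R1_NO/R1_CAT, R2_NO/R2_CAT, ...
                    text = element.text
                    i = 1
                    for x,y in p.findall(text):
                        prefix = 'R' + str(i) + '_'
                        final_dict[prefix + 'NO'] = x
                        final_dict[prefix + 'CAT'] = y.strip()
                        i += 1
                else:
                    # The first stripped string is the label, the second is the value.
                    item = [string for string in element.stripped_strings][1]
                    final_dict[map_dict[field]] = item.replace('(', '').strip()

        # Show the combined record for this link.
        print(final_dict)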
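
The rank handling can be tried without any network access. The sketch below reuses the same regular expression on a made-up rank string; the sample text and the helper name parse_ranks are assumptions for illustration, not taken from a real page.

```python
import re

# Same pattern as in the script: "#<number> in <category>".
RANK_RE = re.compile(r'#([0-9][0-9,]*)+[\n\s]+in[\n\s]+([A-Za-z&\s]+)')


def parse_ranks(text):
    """Map every '#N in Category' match to R<i>_NO / R<i>_CAT keys."""
    ranks = {}
    for i, (number, category) in enumerate(RANK_RE.findall(text), start=1):
        prefix = 'R{}_'.format(i)
        ranks[prefix + 'NO'] = number
        ranks[prefix + 'CAT'] = category.strip()
    return ranks


# Hypothetical text, shaped like a Best Sellers Rank entry.
sample = 'Best Sellers Rank: #3,012 in Health & Household (See Top 100) #45 in Teeth Whitening'
print(parse_ranks(sample))
```

Using enumerate with start=1 replaces the manual counter, so each additional rank gets its own R2_, R3_, ... keys rather than overwriting R1_.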
