How To Condense Script When Scraping Different Locations For One Element
I have 2 working scripts that do their job. I want to combine them for efficiency and reduce redundancy. I am using Python 3.7, Beautifulsoup 4.7.1, re, and requests. script 1 sear
Solution 1:
You could combine the two into one, since most of the code is the same. Simply make the field names the same across both scripts. :contains will still match on the shortened field name of Best Sellers Rank, and you can then use the CSS Or syntax (a comma-separated selector list) to handle tr versus li.
import re

import requests
from bs4 import BeautifulSoup as bs

links = ['https://www.amazon.com/dp/B00FSCBQV2', 'https://www.amazon.com/dp/B00Q2XLI0U']
# Maps each field label on the page to the output key(s)
map_dict = {'Product Dimensions': 'dimensions', 'Shipping Weight': 'weight', 'Item model number': 'Item_No', 'Best Sellers Rank': ['R1_NO', 'R1_CAT']}
fields = ['Product Dimensions', 'Shipping Weight', 'Item model number', 'Best Sellers Rank']
# Captures "#<rank> in <category>"
p = re.compile(r'#([0-9][0-9,]*)+[\n\s]+in[\n\s]+([A-Za-z&\s]+)')

with requests.Session() as s:
    for link in links:
        r = s.get(link, headers={'User-Agent': 'Mozilla/5.0'})
        soup = bs(r.content, 'lxml')
        final_dict = {}
        for field in fields:
            # The comma is the CSS Or: match the field in either an li or a tr.
            # (Newer Soup Sieve releases deprecate :contains in favour of :-soup-contains.)
            element = soup.select_one('li:contains("' + field + '"), tr:contains("' + field + '")')
            if element is None:
                # Field not on the page: fill with placeholders
                if field == 'Best Sellers Rank':
                    item = dict(zip(map_dict[field], ['N/A', 'N/A']))
                    final_dict = {**final_dict, **item}
                else:
                    final_dict[map_dict[field]] = 'N/A'
            else:
                if field == 'Best Sellers Rank':
                    # One R<i>_NO / R<i>_CAT pair per rank found
                    text = element.text
                    i = 1
                    for x, y in p.findall(text):
                        prefix = 'R' + str(i) + '_'
                        final_dict[prefix + 'NO'] = x
                        final_dict[prefix + 'CAT'] = y.strip()
                        i += 1
                else:
                    # The second stripped string is the value after the label
                    item = [string for string in element.stripped_strings][1]
                    final_dict[map_dict[field]] = item.replace('(', '').strip()
        print(final_dict)
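The rank-parsing step can be tried on its own, without hitting the site. The following is only a sketch: it reuses the regex from the answer, but the helper name parse_ranks, the sample text, and the N/A fallback when nothing matches are assumptions added for illustration, not part of the original scripts.

```python
import re

# Same pattern as in the answer: "#<rank> in <category>"
RANK_PATTERN = re.compile(r'#([0-9][0-9,]*)+[\n\s]+in[\n\s]+([A-Za-z&\s]+)')


def parse_ranks(text):
    """Hypothetical helper: build R<i>_NO / R<i>_CAT keys from rank text."""
    ranks = {}
    for i, (number, category) in enumerate(RANK_PATTERN.findall(text), start=1):
        prefix = 'R' + str(i) + '_'
        ranks[prefix + 'NO'] = number
        ranks[prefix + 'CAT'] = category.strip()
    # Assumed fallback: use placeholders when no rank is found
    return ranks or {'R1_NO': 'N/A', 'R1_CAT': 'N/A'}


# Made-up sample text, not taken from a real product page
sample = 'Best Sellers Rank #2,154 in Toys & Games (See Top 100 in Toys & Games)\n#3 in Board Games'
print(parse_ranks(sample))
print(parse_ranks('no rank here'))
```

Keeping the parsing in a small function like this makes it easy to check the regex against saved page text before running the full scrape.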