
Converting A Web Scrape Into Excel?

UPDATE: I tried to install the pandas module in PyCharm and got an error (IndexError: list index out of range). I also tried installing it from the command prompt window.
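Before debugging the install itself, it can help to confirm which interpreter is running and which packages it can see, since PyCharm and the command prompt may point at different Python installations. The following is only a minimal sketch: the helper name `module_available` is hypothetical, and the package list is an assumption based on the imports in the solution (plus `openpyxl`, which pandas uses to write `.xlsx` files).

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


# Show which Python executable is actually running this script.
print("Interpreter:", sys.executable)

# Assumed package list: the solution's imports, plus openpyxl for .xlsx output.
for name in ("requests", "bs4", "pandas", "openpyxl"):
    print(name, "installed" if module_available(name) else "missing")
```

If a package shows as missing, installing it with that same interpreter (for example via `python -m pip install pandas`) avoids mixing up environments.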

Solution 1:

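import requests
from bs4 import BeautifulSoup
import pandas as pd  # to_excel() also needs openpyxl installed for .xlsx files

# Download the article and parse its HTML.
r = requests.get('https://cumberlink.com/sports/high-school/football/pa-football-writers-all-state-team-class-a-a-and/article_4d286757-a501-5b5b-b3be-cfebc06ef455.html')
r.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(r.content, 'html.parser')

new = []
# Treat each <div class="subscriber-only"> as one candidate entry.
for item in soup.find_all('div', {"class": "subscriber-only"}):
    # Keep only text containing a hyphen; everything else is skipped.
    if '-' in item.text:
        # Turn the en dash into a comma, then split the line into fields.
        data = [s.strip() for s in item.text.replace('–', ',').split(',')]
        # Split the last field on whitespace (weight and class, per the columns below).
        data[-1:] = data[-1].split()
        new.append(data)

# Build the table, add a constant Year column, and write the Excel file.
df = pd.DataFrame(new, columns=['Name', 'School', 'Height', 'Weight', 'Class'])
df['Year'] = '2018'
df.to_excel('output.xlsx')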
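The parsing step can be tried without touching the network. This is only a sketch under assumptions: the function name `parse_player_line`, the five-field check, and the sample line are all hypothetical, and the sample is not taken from the article. It assumes a line looks like "name, school – height, weight class", with the hyphen test matching heights such as "6-1".

```python
EN_DASH = "\u2013"  # the "–" character used as a separator in the solution
EXPECTED_FIELDS = 5  # Name, School, Height, Weight, Class


def parse_player_line(text):
    """Split one line of player text into five fields, or return None."""
    # Same filter as the solution: skip text that has no plain hyphen.
    if "-" not in text:
        return None
    fields = [s.strip() for s in text.replace(EN_DASH, ",").split(",")]
    # Split the last comma-separated piece on whitespace (weight and class).
    fields[-1:] = fields[-1].split()
    # Extra guard (not in the original): reject lines that do not give five fields.
    return fields if len(fields) == EXPECTED_FIELDS else None


# Hypothetical sample line, not taken from the article.
sample = "Jane Doe, Example High \u2013 6-1, 185 senior"
print(parse_player_line(sample))     # ['Jane Doe', 'Example High', '6-1', '185', 'senior']
print(parse_player_line("Offense"))  # None
```

Checking the field count before building the DataFrame is a design choice here: a row with the wrong number of fields would otherwise make `pd.DataFrame(new, columns=[...])` raise an error.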


