My Script Parses All The Links Again And Again From An Infinite Scrolling Webpage
Solution 1:
I don't know Python, but I do know what you are doing wrong. Hopefully you'll be able to figure out the code for yourself ;)
Every time you scroll down, 50 links are added to the page until there are 1000 links. Well, almost: it starts with 20 links, then adds 30, and then adds 50 each time until there are 1000.
The way your code is written now, you are printing:
The 1st 20 links.
The 1st 20 again + the next 30.
The 1st 50 + the next 50.
And so on...
What you actually want to do is just scroll down the page until you have all the links on the page and then print them. Hope that helps.
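To make the duplication concrete, here is a small simulation with no browser involved. It assumes the loading pattern described above (20 links, then 50, then 50 more per scroll up to 1000); `page_link_counts` is a hypothetical helper name used only for this illustration.

```python
def page_link_counts(total=1000):
    """Yield how many links are on the page after each load step."""
    count = 20          # links present before any scrolling
    yield count
    count = 50          # the first scroll adds 30
    yield count
    while count < total:
        count = min(count + 50, total)  # every later scroll adds 50
        yield count

counts = list(page_link_counts())

# Printing every link after each scroll prints the early links many times over.
printed_every_scroll = sum(counts)

# Scrolling to the end first and printing once prints each link exactly once.
printed_once = counts[-1]

print(printed_every_scroll, printed_once)  # 10520 1000
```

Under these assumptions, the print-inside-the-loop approach emits more than ten times as many lines as there are links.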
Here's the updated Python code (I've checked it and it works):
    from selenium import webdriver
    import time

    driver = webdriver.Chrome()
    driver.get('http://fortune.com/fortune500/list/')

    # Keep scrolling until all 1000 links have been loaded.
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5)  # give the next batch of links time to load
        listElements = driver.find_elements_by_xpath("//li[contains(concat(' ', @class, ' '), ' small-12 ')]//a")
        print(len(listElements))
        if len(listElements) == 1000:
            break

    # Print each link once, after the whole list is on the page.
    for item in listElements:
        print(item.get_attribute("href"))

    driver.close()
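If you want it to work a bit faster, you could swap out the "time.sleep(5)" for Anderson's wait statement (the WebDriverWait call in Solution 2), which continues as soon as new links appear instead of always pausing for the full five seconds.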
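As a rough sketch of that idea, the function below polls the link count after each scroll instead of sleeping for a fixed time. It is a hand-rolled stand-in for an explicit wait, not Selenium's own WebDriverWait, so it needs no Selenium import. The names `collect_links`, `expected`, `timeout` and `poll` are hypothetical, and `driver.find_elements("xpath", ...)` is used as the generic form of the XPath lookup above.

```python
import time

XPATH = "//li[contains(concat(' ', @class, ' '), ' small-12 ')]//a"

def collect_links(driver, expected=1000, timeout=10.0, poll=0.5):
    """Scroll until `expected` links are present, or until a scroll
    brings in no new links within `timeout` seconds."""
    seen = len(driver.find_elements("xpath", XPATH))
    while seen < expected:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.monotonic() + timeout
        count = seen
        # Poll until the link count grows or the deadline passes.
        while count <= seen and time.monotonic() < deadline:
            time.sleep(poll)
            count = len(driver.find_elements("xpath", XPATH))
        if count <= seen:
            break  # nothing new loaded in time; assume the list has ended
        seen = count
    # Read the hrefs once, after scrolling has finished.
    return [a.get_attribute("href") for a in driver.find_elements("xpath", XPATH)]
```

With a real browser this would be called as `collect_links(driver)` after `driver.get(...)`. The timeout branch also keeps the loop from running forever if the page ends up with fewer links than expected.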
Solution 2:
You can try the code below:
    from selenium.webdriver.support.ui import WebDriverWait as wait
    from selenium.common.exceptions import TimeoutException

    # Assumes `driver` is a WebDriver instance with the page already open.
    xpath = "//li[contains(concat(' ', @class, ' '), ' small-12 ')]//a"
    my_links = []
    while True:
        try:
            current_length = len(my_links)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # until() passes the driver to the callable, so the lambda must accept it
            wait(driver, 10).until(lambda d: len(d.find_elements_by_xpath(xpath)) > current_length)
            # Re-read all links so len(my_links) keeps matching the count on the page
            my_links = [a.get_attribute("href") for a in driver.find_elements_by_xpath(xpath)]
        except TimeoutException:
            # No new links appeared within 10 seconds, so the end of the list was reached
            break
    my_links = set(my_links)
This should allow you to scroll down and collect new links for as long as more of them load. Finally, with set() you can keep only the unique values.
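One caveat: a set does not keep the order in which the links appeared on the page. If the order matters, a possible alternative is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each value in place. The URLs below are placeholders, not links from the actual page.

```python
# Placeholder data standing in for the collected hrefs.
my_links = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",  # duplicate of the first entry
    "http://example.com/c",
]

# Dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving the page order.
unique_links = list(dict.fromkeys(my_links))

print(unique_links)
# ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
```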