How To Scrape A Website Which Redirects For Some Time
Solution 1:
The site uses JavaScript to generate a value which is sent to https://koinex.in/cdn-cgi/l/chk_jschl to get the cookie cf_clearance, which the page checks in order to skip the DDoS protection page.
Because that value can be generated with different parameters and different methods on every request, it is easier to use Selenium to get the data:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('https://koinex.in/')
time.sleep(8)  # wait for the JavaScript challenge to finish and the page to load
tables = driver.find_elements_by_tag_name('table')
for item in tables:
    print(item.text)
    #print(item.get_attribute("value"))
Result
VOLUME PRICE/ETH
5.2310 64,300.00
0.0930 64,100.00
10.7670 64,025.01
0.0840 64,000.00
0.3300 63,800.00
0.2800 63,701.00
0.4880 63,700.00
0.7060 63,511.00
0.5020 63,501.00
0.1010 63,500.01
1.4850 63,500.00
1.0000 63,254.00
0.0300 63,253.00

VOLUME PRICE/ETH
1.0000 64,379.00
0.0940 64,380.00
0.9710 64,398.00
0.0350 64,399.00
0.7170 64,400.00
0.3000 64,479.00
5.1650 64,480.35
0.0020 64,495.00
0.2000 64,496.00
9.5630 64,500.00
0.4000 64,501.01
0.0400 64,550.00
0.5220 64,600.00

DATE VOLUME PRICE/ETH
31/12/2017, 12:19:29 0.2770 64,300.00
31/12/2017, 12:19:11 0.5000 64,300.00
31/12/2017, 12:18:28 0.3440 64,025.01
31/12/2017, 12:18:28 0.0750 64,026.00
31/12/2017, 12:17:50 0.0010 64,300.00
31/12/2017, 12:17:47 0.0150 64,300.00
31/12/2017, 12:15:45 0.6720 64,385.00
31/12/2017, 12:15:45 0.2000 64,300.00
31/12/2017, 12:15:45 0.0620 64,300.00
31/12/2017, 12:15:45 0.0650 64,199.97
31/12/2017, 12:15:45 0.0010 64,190.00
31/12/2017, 12:15:45 0.0030 64,190.00
31/12/2017, 12:15:25 0.0010 64,190.00
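Instead of a fixed time.sleep(8) you could poll for Cloudflare's cf_clearance cookie and continue as soon as it appears. A rough sketch, assuming the challenge still sets that cookie; wait_for_clearance and the 15-second timeout are just illustrative:
import time
from selenium import webdriver

def wait_for_clearance(driver, timeout=15):
    # poll for the cf_clearance cookie set after the JavaScript challenge
    end = time.time() + timeout
    while time.time() < end:
        if driver.get_cookie('cf_clearance'):  # returns None until the cookie exists
            return True
        time.sleep(0.5)
    return False

driver = webdriver.Firefox()
driver.get('https://koinex.in/')

if wait_for_clearance(driver):
    for item in driver.find_elements_by_tag_name('table'):
        print(item.text)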
You can also get the HTML from Selenium and use it with BeautifulSoup:
soup = BeautifulSoup(driver.page_source, 'html.parser')
but Selenium can get data using xpath, css selectors and other methods, so mostly there is no need to use BeautifulSoup.
See documentation: 4. Locating Elements
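For example, with the same older Selenium API as above, you could grab the tables by CSS selector or XPath directly. A small sketch; the selectors are generic, not tuned to koinex.in:
# all <table> elements, located by CSS selector
tables = driver.find_elements_by_css_selector('table')

# rows of the first table, located by XPath
rows = driver.find_elements_by_xpath('(//table)[1]//tr')
for row in rows:
    print(row.text)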
EDIT: the code below uses cookies from Selenium to load the page with requests, and it has no problem with the DDoS page. The problem is that the page uses JavaScript to display the tables, so you can't get them with requests + BeautifulSoup. But maybe you will find the URLs used by JavaScript to get the data for the tables, and then requests can be useful (see the sketch after the code).
from selenium import webdriver
import time
# --- Selenium ---
url = 'https://koinex.in/'
driver = webdriver.Firefox()
driver.get(url)
time.sleep(8)
#tables = driver.find_elements_by_tag_name('table')
#for item in tables:
#    print(item.text)

# --- convert cookies/headers from Selenium to Requests ---
cookies = driver.get_cookies()
for item in cookies:
    print('name:', item['name'])
    print('value:', item['value'])
    print('path:', item['path'])
    print('domain:', item['domain'])
    print('expiry:', item['expiry'])
    print('secure:', item['secure'])
    print('httpOnly:', item['httpOnly'])
    print('----')
# convert list of dictionaries into dictionary
cookies = {c['name']: c['value'] for c in cookies}
# it has to be full `User-Agent` used in Browser/Selenium (it can't be short 'Mozilla/5.0')
headers = {'User-Agent': driver.execute_script('return navigator.userAgent')}
# --- requests + BeautifulSoup ---

import requests
from bs4 import BeautifulSoup
s = requests.Session()
s.headers.update(headers)
s.cookies.update(cookies)
r = s.get(url)
print(r.text)
soup = BeautifulSoup(r.text, 'html.parser')
tables = soup.find_all('table')
print('tables:', len(tables))
for item in tables:
    print(item.get_text())
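If you do find those URLs (for example in the browser's network tab), the same requests session s can fetch them directly and you get JSON instead of scraping HTML. A rough sketch; the /api/... URL below is made up, not a real koinex.in endpoint:
# hypothetical endpoint - replace with a URL found in the network tab
api_url = 'https://koinex.in/api/orders'

r = s.get(api_url)
if r.ok:
    data = r.json()  # JSON responses are easier to work with than HTML tables
    print(data)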