Python web scraping using Selenium - iterate through href links
I am trying to write a script that uses Selenium to download many files containing different NHL players' information (game logs). I want to download a file for each player in the following table: https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single
Once on that website, I want to click on each player's name in the table. When a player's name is clicked through the href link, a new window opens. There are a few drop-down menus at the top. I want to select "Rate" instead of "Counts" and "Game Log" instead of "Player Summary", then click "Submit". Finally, I want to click on CSV(All) at the bottom to download a CSV file.
Here is my current code:
from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

table = driver.find_element_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']")
for row in table.find_elements_by_xpath("//tr[@role='row']"):
    links = driver.find_element_by_xpath('//a[@href]')
    links.click()
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    select2 = Select(driver.find_element_by_type('submit'))
    select2.select_by_value("submit")
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//div[@class="dt-button button-csv button-htm15"]')))
    CSVall = driver.find_element_by_xpath('//div[@class="dt-button button-csv button-htm15"]')
    CSVall.click()
driver.close()
I have tried changing different things, but I always get an error. Where is the problem?
Moreover, I think I should probably add a line after driver.get to wait for the website to load, because it takes a few seconds. I do not know what the expected condition should be to end the wait in this case.
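On the waiting question: WebDriverWait is essentially a polling loop that repeatedly evaluates an expected condition until it returns something truthy or the timeout elapses. A minimal pure-Python sketch of that loop (a hypothetical wait_until helper, not part of Selenium) shows the mechanism:

```python
import time

def wait_until(condition, timeout=5.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Mirrors the shape of Selenium's WebDriverWait(driver, timeout).until(...):
    the condition is called repeatedly, the first truthy result is returned,
    and a TimeoutError is raised if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll)

# In Selenium terms, a plausible condition for this page would be presence
# of the stats table (locator is an assumption, not verified against the site):
#   WebDriverWait(driver, 10).until(
#       EC.presence_of_element_located(
#           (By.XPATH, "//table[contains(@class, 'indreg')]")))
```

The key point is that the condition should describe the element you need next (the table, or the player links), not a fixed sleep.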
Thanks
python selenium web-scraping webdriverwait
asked Nov 15 at 1:54
Jagr
2 Answers
Accepted answer, by QHarr (answered Nov 15 at 7:25, edited Nov 16 at 7:29)
Rather than keep clicking through selections, you could grab the playerIds from the first page and concatenate them, along with the strings representing the Rate and Game Log selections, into the query string of a new URL. You can surely tidy up the following.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def getPlayerId(url):
    id = url.split('playerid=')[1]
    id = id.split('&')[0]
    return id

def makeNewURL(playerId):
    return 'https://www.naturalstattrick.com/playerreport.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&stdoi=oi&rate=y&v=g&playerid=' + playerId

#chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome()
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

links = driver.find_elements_by_css_selector('table.indreg.dataTable.no-footer.DTFC_Cloned [href*=playerid]')
newLinks = []
for link in links:
    newLinks.append(link.get_attribute('href'))

for link in newLinks:
    playerId = getPlayerId(link)
    link = makeNewURL(playerId)
    driver.get(link)
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()
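As an aside, the string-splitting in getPlayerId can be done more robustly with the standard library's URL parsing, which keeps working if playerid is not the first parameter or the URL is encoded. This is an alternative sketch, not what the answer above used; the helper names are mine:

```python
from urllib.parse import urlparse, parse_qs

def get_player_id(url):
    """Extract the playerid query parameter from a player link.

    parse_qs handles parameter order and URL encoding, unlike a raw
    split on 'playerid='.
    """
    params = parse_qs(urlparse(url).query)
    return params["playerid"][0]

def make_report_url(player_id):
    # Same report-URL template as the answer above, with rate=y and v=g
    # (Rate and Game Log) baked into the query string.
    return ("https://www.naturalstattrick.com/playerreport.php"
            "?fromseason=20142015&thruseason=20162017&stype=2&sit=all"
            "&stdoi=oi&rate=y&v=g&playerid=" + player_id)
```

These two functions are drop-in replacements for getPlayerId and makeNewURL in the loop above.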
The loop never stops. I added driver.close(), but that did not stop it either; it continues unless I stop it manually.
– Jagr
Nov 16 at 3:28
I tried to change the dates in both URLs from 2017 to 2019, such as [naturalstattrick.com/… for [naturalstattrick.com/…, but some players are missing. I get around 600 players of the 961.
– Jagr
Nov 16 at 3:50
I have updated the css selector and it now matches the number of players shown on the page.
– QHarr
Nov 16 at 7:29
Try the updated script
– QHarr
Nov 16 at 7:46
add a comment |
Answer by ewwink (answered Nov 15 at 5:10, edited Nov 16 at 4:35)
You don't need to click each player link; save the URLs as a list instead. There are also several errors in your code. You can see working code below.
from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

playerLinks = driver.find_elements_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']//a")
playerLinks = [p.get_attribute('href') for p in playerLinks]
print(len(playerLinks))

for url in playerLinks:
    driver.get(url)
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    driver.find_element_by_css_selector('input[type="submit"]').click()
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()

driver.close()
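The comment thread below turns on playerLinks coming back empty, which usually means the hrefs were collected before the table finished rendering. Independent of the wait, it also helps to filter and de-duplicate whatever hrefs were collected, since a table can contain anchors without an href or links unrelated to players. A small sketch (the helper name and the playerid= filter are my assumptions):

```python
def player_links(hrefs):
    """Keep only player-report links, preserving order, dropping duplicates.

    `hrefs` is the list built from get_attribute('href'); entries can be
    None for anchors without an href, so those are skipped as well.
    """
    seen = set()
    out = []
    for href in hrefs:
        if href and "playerid=" in href and href not in seen:
            seen.add(href)
            out.append(href)
    return out
```

With this, the loop becomes `for url in player_links(playerLinks):`, and printing `len(player_links(playerLinks))` gives a sanity check against the player count shown on the page.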
It does not work. I think the problem is playerLinks, the list seems empty...
– Jagr
Nov 16 at 3:33
it takes time to generate playerLinks; try adding print(playerLinks) before the loop
– ewwink
Nov 16 at 3:39
It prints []; an empty list.
– Jagr
Nov 16 at 3:57
Strange, it gives me 1217 players. I edited the code above; see if it is different from yours.
– ewwink
Nov 16 at 4:36
It is working now. I had an error in driver.get; it was not there... Thanks!
– Jagr
Nov 16 at 4:59