Python web scraping using Selenium - iterate through href links

I am trying to write a script that uses Selenium to download a game-log file for each NHL player listed in the following table: https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single

Once on that page, I want to click each player's name in the table. Clicking a player's name (an href link) opens a new window with a few drop-down menus at the top. I want to select "Rate" instead of "Counts" and "Game Log" instead of "Player Summary", and then click "Submit". Finally, I want to click CSV(All) at the bottom to download a CSV file.

Here is my current code:

from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)

driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")
table = driver.find_element_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']")
for row in table.find_elements_by_xpath("//tr[@role='row']")
links = driver.find_element_by_xpath('//a[@href]')
links.click()
select = Select(driver.find_element_by_name('rate'))
select.select_by_value("y")
select1 = Select(driver.find_element_by_name('v'))
select1.select_by_value("g")
select2 = Select(driver.find_element_by_type('submit'))
select2.select_by_value("submit")
WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//div[@class="dt-button button-csv button-htm15"]')))
CSVall = driver.find_element_by_xpath('//div[@class="dt-button button-csv button-htm15"]')
CSVall.click()
driver.close()


I have tried changing different things, but I always get an error. Where is the problem?

Also, I think I should add a wait after driver.get, because the page takes a few seconds to load. I do not know which expected condition should end the wait in this case.

Thanks

      python selenium web-scraping webdriverwait






      asked Nov 15 at 1:54 by Jagr

          2 Answers
          accepted










          Rather than clicking through the selections for every player, you could grab the player IDs from the first page and concatenate them, along with the query-string parameters that represent the Rate and Game Log selections, into a new URL. I'm sure you can tidy up the following.

          from selenium import webdriver
          from selenium.webdriver.support.ui import Select
          from selenium.webdriver.common.by import By
          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.support import expected_conditions as EC

          def getPlayerId(url):
              # Take the value of the playerid parameter from a player link
              playerId = url.split('playerid=')[1]
              playerId = playerId.split('&')[0]
              return playerId

          def makeNewURL(playerId):
              # rate=y selects "Rate" and v=g selects "Game Log"
              return 'https://www.naturalstattrick.com/playerreport.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&stdoi=oi&rate=y&v=g&playerid=' + playerId

          #chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
          driver = webdriver.Chrome()

          driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

          # Collect every href first, so navigating away does not invalidate the link elements
          links = driver.find_elements_by_css_selector('table.indreg.dataTable.no-footer.DTFC_Cloned [href*=playerid]')
          newLinks = []

          for link in links:
              newLinks.append(link.get_attribute('href'))

          for link in newLinks:
              playerId = getPlayerId(link)
              link = makeNewURL(playerId)
              driver.get(link)
              # Wait for the CSV(All) button to be present before clicking it
              WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
              CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
              CSVall.click()
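
One possible tidy-up of the two helper functions above is to read the player ID with the standard library's URL parser instead of splitting strings, and to assemble the report URL from a dictionary of parameters. This is only a sketch: the names `get_player_id` and `make_report_url` are hypothetical, and the parameter values are copied from the URL used in the answer rather than from any documentation of the site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Report page taken from the URL in the answer above.
REPORT_PAGE = "https://www.naturalstattrick.com/playerreport.php"


def get_player_id(url):
    """Return the value of the playerid query parameter, or None if absent."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("playerid")
    return values[0] if values else None


def make_report_url(player_id, from_season="20142015", thru_season="20162017"):
    """Build a report URL with Rate (rate=y) and Game Log (v=g) preselected."""
    params = {
        "fromseason": from_season,
        "thruseason": thru_season,
        "stype": "2",
        "sit": "all",
        "stdoi": "oi",
        "rate": "y",
        "v": "g",
        "playerid": player_id,
    }
    # urlencode keeps the dictionary order, so the result matches the
    # hand-written URL in the answer.
    return REPORT_PAGE + "?" + urlencode(params)
```

In the loop, `driver.get(make_report_url(get_player_id(link)))` would replace the two helper calls. Passing the seasons as arguments also makes it easier to switch to another range without editing the URL string by hand.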





          • The loop never stops. I added driver.close(), but that did not stop it either. It continues until I stop it manually.
            – Jagr
            Nov 16 at 3:28

          • I tried to change the dates in both URLs for 2017 to 2019, such as [naturalstattrick.com/… for [naturalstattrick.com/… but some players are missing. I get around 600 of the 961 players.
            – Jagr
            Nov 16 at 3:50

          • I have updated the CSS selector and it now matches the number of players shown on the page.
            – QHarr
            Nov 16 at 7:29

          • Try the updated script.
            – QHarr
            Nov 16 at 7:46


















          You don't need to click each player link; instead, save the URLs in a list. There are also several errors in your code. You can see working code below.

          from selenium import webdriver
          import csv
          from selenium.webdriver.support.ui import Select
          from datetime import date, timedelta
          from selenium.webdriver.common.by import By
          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.support import expected_conditions as EC

          chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
          driver = webdriver.Chrome(chromedriver)

          driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

          # Save the player URLs as plain strings before leaving the page
          playerLinks = driver.find_elements_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']//a")
          playerLinks = [p.get_attribute('href') for p in playerLinks]

          print(len(playerLinks))

          for url in playerLinks:
              driver.get(url)
              # Select "Rate" and "Game Log", then submit the form
              select = Select(driver.find_element_by_name('rate'))
              select.select_by_value("y")
              select1 = Select(driver.find_element_by_name('v'))
              select1.select_by_value("g")
              driver.find_element_by_css_selector('input[type="submit"]').click()
              # Wait for the CSV(All) button to be present before clicking it
              WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
              CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
              CSVall.click()

          driver.close()
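
If the list of links comes back empty because the table has not been rendered yet, one option is to wait until the player links exist before reading them. The sketch below builds a custom wait condition; `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value once the wait should end. The helper name `player_links_loaded` is hypothetical, and the default selector is borrowed from the other answer, so treat it as an assumption about the page rather than a guarantee.

```python
# Assumed selector, copied from the other answer on this page.
PLAYER_LINK_SELECTOR = "table.indreg.dataTable.no-footer.DTFC_Cloned [href*=playerid]"


def player_links_loaded(min_count=1, selector=PLAYER_LINK_SELECTOR):
    """Return a wait condition that yields the player hrefs once enough exist."""

    def condition(driver):
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR,
        # so this helper does not need to import selenium itself.
        links = driver.find_elements("css selector", selector)
        hrefs = [link.get_attribute("href") for link in links]
        # A falsy result tells WebDriverWait to keep polling.
        return hrefs if len(hrefs) >= min_count else False

    return condition


# Usage with a real driver (requires selenium to be installed):
#   from selenium.webdriver.support.ui import WebDriverWait
#   playerLinks = WebDriverWait(driver, 20).until(player_links_loaded())
```

The 20-second timeout in the usage comment is an arbitrary choice. Note also that the `find_element_by_*` and `find_elements_by_*` helpers used in the answer were removed in Selenium 4.3, so on current versions the equivalent calls are `driver.find_element(By.NAME, 'rate')`, `driver.find_elements(By.XPATH, ...)` and so on.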





          • It does not work. I think the problem is playerLinks; the list seems to be empty...
            – Jagr
            Nov 16 at 3:33

          • It takes time to generate playerLinks. Try adding print(playerLinks) before the loop.
            – ewwink
            Nov 16 at 3:39

          • It prints []; an empty list.
            – Jagr
            Nov 16 at 3:57

          • Strange, it gives me 1217 players. I edited the code above; see if it is different from yours.
            – ewwink
            Nov 16 at 4:36

          • It is working now. I had an error in driver.get; it was not there... Thanks!
            – Jagr
            Nov 16 at 4:59











          Accepted answer: answered Nov 15 at 7:25 by QHarr; edited Nov 16 at 7:29













          • The loop never stops. I add driver.close(), but it did not stop it either. It continues unless I stop it manually.
            – Jagr
            Nov 16 at 3:28










          • I tried to change the date in both url for 2017 to 2019 such as [naturalstattrick.com/… for [naturalstattrick.com/… but some players are missing. I get around 600 players on the 961.
            – Jagr
            Nov 16 at 3:50












          • I have updated the css selector and it now matches with the number of players on the page shown.
            – QHarr
            Nov 16 at 7:29










          • Try the updated script
            – QHarr
            Nov 16 at 7:46


















          • The loop never stops. I add driver.close(), but it did not stop it either. It continues unless I stop it manually.
            – Jagr
            Nov 16 at 3:28










          • I tried to change the date in both url for 2017 to 2019 such as [naturalstattrick.com/… for [naturalstattrick.com/… but some players are missing. I get around 600 players on the 961.
            – Jagr
            Nov 16 at 3:50












          • I have updated the css selector and it now matches with the number of players on the page shown.
            – QHarr
            Nov 16 at 7:29










          • Try the updated script
            – QHarr
            Nov 16 at 7:46
















          You don't need to click each player link. Instead, save the URLs in a list and visit each one in turn. There were also several errors in your code; see the working code below.



          from selenium import webdriver
          import csv
          from selenium.webdriver.support.ui import Select
          from datetime import date, timedelta
          from selenium.webdriver.common.by import By
          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.support import expected_conditions as EC

          chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
          driver = webdriver.Chrome(chromedriver)

          driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

          # Collect every player URL up front, so the loop does not rely on
          # elements from a page the browser has already left.
          playerLinks = driver.find_elements_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']//a")
          playerLinks = [p.get_attribute('href') for p in playerLinks]

          print(len(playerLinks))

          for url in playerLinks:
              driver.get(url)
              # Choose "Rate" and "Game Log", then submit the form.
              select = Select(driver.find_element_by_name('rate'))
              select.select_by_value("y")
              select1 = Select(driver.find_element_by_name('v'))
              select1.select_by_value("g")
              driver.find_element_by_css_selector('input[type="submit"]').click()
              # Wait for the second CSV button, CSV (All), before clicking it.
              WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
              CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
              CSVall.click()

          # Close the browser once, after every player has been processed.
          driver.close()
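
If the player page reads its dropdown values from the query string, the two Select calls and the submit click could be replaced by rewriting each URL before calling driver.get. This is an assumption: the listing URL carries rate=y this way, but nothing here confirms that the player page does the same, so check one URL by hand first. A minimal sketch using only the standard library; the helper name with_params and the example URL are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **params):
    """Return url with the given query parameters set, overwriting existing ones."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as fd= and td=
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical player URL, for illustration only.
example = "https://example.com/player.php?playerid=123&rate=n&v=p"

# rate=y and v=g are the same values the answer selects in the dropdowns.
print(with_params(example, rate="y", v="g"))
```

Inside the loop this would become driver.get(with_params(url, rate="y", v="g")), followed by the same wait and click on the CSV button.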




























          • It does not work. I think the problem is playerLinks, the list seems empty...
            – Jagr
            Nov 16 at 3:33










          • It takes time to generate playerLinks; try adding print(playerLinks) before the loop.
            – ewwink
            Nov 16 at 3:39










          • It prints [], an empty list.
            – Jagr
            Nov 16 at 3:57










          • Strange, it gives me 1217 players. I have edited the code above; see if it differs from yours.
            – ewwink
            Nov 16 at 4:36










          • It is working now. My mistake was with driver.get; the call was missing from my script... Thanks!
            – Jagr
            Nov 16 at 4:59
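
The empty list in the comments above went unnoticed until the loop did nothing. A small check on the collected hrefs before the loop starts can surface that immediately, and can also drop repeated or missing hrefs if the selector matches more links than there are players. A minimal sketch, assuming playerLinks is a list of href strings that may contain None or duplicates; the helper name clean_links and the sample data are hypothetical:

```python
def clean_links(hrefs):
    """Drop missing hrefs and duplicates, keeping first-seen order."""
    # dict.fromkeys keeps insertion order, so the original ordering survives.
    unique = list(dict.fromkeys(h for h in hrefs if h))
    if not unique:
        raise ValueError(
            "no player links found; check that driver.get ran and the table has loaded"
        )
    return unique


# Hypothetical sample standing in for the collected hrefs.
sample = [
    "https://example.com/p?id=1",
    None,
    "https://example.com/p?id=2",
    "https://example.com/p?id=1",
]
print(len(sample), "collected,", len(clean_links(sample)), "usable")
```

In the script this would be called as playerLinks = clean_links(playerLinks) right before the for loop.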























          edited Nov 16 at 4:35

























          answered Nov 15 at 5:10









          ewwink






























