Trying to scrape from multiple links
The goal of this script is to visit a website, collect the links to specific products, and then scrape data from those links.
The script first gathers the links to the four featured products shown on the product homepage, selecting them by the class attribute. The links are saved in a variable 'links', which contains the four URLs of the featured products.
Then I use requests to fetch each of those product URLs and scrape the data with BeautifulSoup.
Here is my code:
import time
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup
url = "https://www.vatainc.com/"
service = service.Service('/Users/Name/Downloads/chromedriver.exe')
service.start()
capabilities = {'chrome.binary': '/Google/Chrome/Application/chrome.exe'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]
html = requests.get(links).text
soup = BeautifulSoup(html, "html.parser")
products = soup.findAll("div")
print(products)
The error I get:
No connection adapters were found for
'['https://www.vatainc.com/0240-bonnie-bone-marrow-biopsy-skills-trainertm.html',
'https://www.vatainc.com/0910-seymour-iitm-wound-care-model-1580.html',
'https://www.vatainc.com/2410-chester-chesttm-with-new-advanced-arm-1197.html',
'https://www.vatainc.com/2365-advanced-four-vein-venipuncture-training-aidtm-dermalike-iitm-latex-free.html']'
edited Nov 21 '18 at 15:37 by Andersson
asked Nov 21 '18 at 15:30 by Jonathan
1 Answer
links is a list of strings (the URLs). You cannot pass a list as the url argument to requests.get(). Iterate over the list and request each URL one at a time to fetch each page:
for link in links:
html = requests.get(link).text
soup = BeautifulSoup(html, "html.parser")
products = soup.findAll("div")
print(products)
answered Nov 21 '18 at 15:34 by Andersson
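For completeness, here is a minimal standalone sketch of the same fix. It uses the four product URLs from the error message above as the list (in the real script they would come from the Selenium step); the requests.Session, the error check, and the per-URL results dictionary are additions for illustration and were not part of the original answer.

import requests
from bs4 import BeautifulSoup

# The four product URLs taken from the error message above; in the real script
# this list would be produced by the Selenium step.
links = [
    "https://www.vatainc.com/0240-bonnie-bone-marrow-biopsy-skills-trainertm.html",
    "https://www.vatainc.com/0910-seymour-iitm-wound-care-model-1580.html",
    "https://www.vatainc.com/2410-chester-chesttm-with-new-advanced-arm-1197.html",
    "https://www.vatainc.com/2365-advanced-four-vein-venipuncture-training-aidtm-dermalike-iitm-latex-free.html",
]

results = {}
session = requests.Session()  # reuse one connection for all product pages

for link in links:
    response = session.get(link)
    response.raise_for_status()  # fail loudly if a page cannot be fetched
    soup = BeautifulSoup(response.text, "html.parser")
    results[link] = soup.find_all("div")  # keep each page's divs keyed by URL

for link, divs in results.items():
    print(link, len(divs))

The original loop prints inside the loop, so nothing is lost there, but collecting the results keyed by URL makes it easier to process each product page afterwards.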