Cannot find data of HTML with R
I want to import a table with R, but I cannot find it.
library(rvest)
XXX <- read_html('http://www.boerse-frankfurt.de/anleihen/kurshistorie/Inh-Schv_v20122019-Anleihe-2019-AT0000A0U9J2/FSE/1.5.2017_19.11.2018#Kurshistorie')
That's my code for the website. I am searching for the large table called "Historische Kurse Scholz Holding". I cannot find the data in XXX, but I can find it when I use the Inspector on the website directly. Why is that? Any suggestions on how to extract the data with R?
Edit: Copy-paste is not an option; I have to do this for 1,000 webpages.
Best regards
Edit:
URL_1 <- data.frame(
  ISIN = c('000A0U9J2', '000A0V7D8', '000A0VL70', '000A0VLS5'),
  LINK = c(
    'http://www.boerse-frankfurt.de/anleihen/kurshistorie/Inh-Schv_v20122019-Anleihe-2019-AT0000A0U9J2/FSE/1.5.2017_19.11.2018#Kurshistorie',
    'http://www.boerse-frankfurt.de/anleihen/kurshistorie/Strabag_SEEO-Schuldverschr_201219-Anleihe-2019-AT0000A0V7D8/FSE/1.5.2017_19.11.2018#Kurshistorie',
    'http://www.boerse-frankfurt.de/anleihen/kurshistorie/BorealisEO-Schuldv_201219-Anleihe-2019-AT0000A0VL70/FSE/1.5.2017_19.11.2018#Kurshistorie',
    'http://www.boerse-frankfurt.de/anleihen/kurshistorie/AndritzEO-Anleihe_201219-Anleihe-2019-AT0000A0VLS5/FSE/1.5.2017_19.11.2018#Kurshistorie'
  ),
  stringsAsFactors = FALSE
)
Would you kindly provide the package you are using (read_html)? – Ben T., Nov 22 '18 at 12:18
Sorry, I am using rvest. – Kramer, Nov 22 '18 at 12:22
This is a very complex site that uses XHR requests to load the HTML <div> with the price table. It is also using something called "ajax tokens" to make this very task harder: those tokens are generated anew on each page load and get passed with the XHR POST request that asynchronously retrieves the table. You should consider using RSelenium or splashr, both of which use full browser environments to render content. It's possible to eventually work up a solution without RSelenium or splashr, but it will be a tedious journey. – hrbrmstr, Nov 22 '18 at 12:52
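One way to see the comment's point for yourself (a minimal sketch, not from the original thread): parse the raw response and note that the price-history table is not among the tables it contains, since no JavaScript runs and the XHR-loaded content never arrives.
library(rvest)

# Fetch the raw HTML only -- the asynchronously loaded price table is absent
static <- read_html('http://www.boerse-frankfurt.de/anleihen/kurshistorie/Inh-Schv_v20122019-Anleihe-2019-AT0000A0U9J2/FSE/1.5.2017_19.11.2018#Kurshistorie')

# Parse whatever tables the static document does contain;
# the "Historische Kurse" table will not be among them
tabs <- html_table(html_nodes(static, "table"), fill = TRUE)
length(tabs)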
1 Answer
Here is a solution involving RSelenium via docker (more information can be found here). First, a few preliminaries:
In the docker terminal:
docker run -d -p 4445:4444 selenium/standalone-firefox:2.53.0
In R:
# Connect to the Selenium server running in the docker container
# (192.168.99.100 is the docker-machine IP in this setup; adjust to yours)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445L)
remDr$open(silent = TRUE)
Now we can navigate to the page and extract the table:
library(rvest)

url1 <- "http://www.boerse-frankfurt.de/anleihen/kurshistorie/Inh-Schv_v20122019-Anleihe-2019-AT0000A0U9J2/FSE/1.5.2017_19.11.2018#Kurshistorie"
remDr$navigate(url1)

# Grab the page source after the browser has rendered it (including XHR content)
pageSource <- read_html(remDr$getPageSource()[[1]])

# Parse all nodes with class "table" into a list of data frames
dt <- html_table(html_nodes(pageSource, ".table"), fill = TRUE)
And here is the result:
# printing the head of the data frame
> head(dt[[7]])
Datum Eröffnung Schluss Tageshoch Tagestief Umsatz
1 19.11.2018 1,5000 1,5000 1,5000 1,5000 1.000 NA
2 16.11.2018 1,5000 1,5000 1,5000 1,5000 NA NA
3 15.11.2018 1,5000 1,5000 1,5000 1,5000 NA NA
4 14.11.2018 1,5000 1,5000 1,5000 1,5000 2.000 NA
5 13.11.2018 1,5000 1,5000 1,5000 1,5000 NA NA
6 12.11.2018 1,2500 2,2500 2,2500 1,2500 1.127 NA
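Note that dt[[7]] hard-codes the position of the price table among all .table nodes, which may differ from page to page. A small sketch (my assumption, based only on the Datum header shown above) to locate it by its header instead:
# Find the parsed table whose first column is named "Datum"
# (assumes that header is stable across pages)
isPriceTable <- vapply(dt, function(x) identical(names(x)[1], "Datum"), logical(1))
priceTable <- dt[[which(isPriceTable)[1]]]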
Addendum
As indicated in the link above, here is how to get started with RSelenium and docker:
- Install docker
- Pull an image, e.g. (in the docker terminal): docker pull selenium/standalone-firefox:2.53.0
- Now one can proceed with the docker run and R steps above
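When the scraping session is over, the container can be stopped from the same docker terminal (standard docker commands, not part of the original answer):
docker ps                    # look up the running container's id
docker stop <container-id>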
I made it, thanks. – Kramer, Nov 23 '18 at 16:47
So, I want to extend the whole thing with a while loop, but mine is not working, and the results that come back are inconsistent and often empty. Is it possible to run this R code a few thousand times in a row? I have edited the question to add a data set with 4 rows for which I am trying to write the loop. – Kramer, Nov 24 '18 at 22:01
@Kramer RSelenium is unfortunately unstable at times and rather slow. To avoid errors you might need to add some Sys.sleep() calls in order to let the page load. Also, you might want to consider wrapping the scraping process in a try() in order to handle errors, e.g. for(k in seq(1000)) res <- try(myScrapper(url[k])) (and maybe also a while loop in order to retry if an attempt fails). – niko, Nov 24 '18 at 22:07
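Putting this comment's advice together with the answer (a sketch under my own assumptions: scrapeOne is a hypothetical wrapper around the navigate-and-extract steps, and 3 retries with a 3-second pause are arbitrary choices):
# Hypothetical wrapper around the answer's navigate-and-extract steps;
# assumes remDr is connected and URL_1 is the data frame from the question
scrapeOne <- function(url) {
  remDr$navigate(url)
  Sys.sleep(3)  # give the XHR-loaded table time to render
  src <- rvest::read_html(remDr$getPageSource()[[1]])
  rvest::html_table(rvest::html_nodes(src, ".table"), fill = TRUE)
}

results <- vector("list", nrow(URL_1))
for (k in seq_len(nrow(URL_1))) {
  for (attempt in 1:3) {  # retry a few times, as suggested above
    res <- try(scrapeOne(URL_1$LINK[k]), silent = TRUE)
    if (!inherits(res, "try-error")) {
      results[[k]] <- res
      break
    }
  }
}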
@Kramer Also, to state the obvious, check that the above code works for all links, i.e. that it indeed gathers the relevant data. – niko, Nov 24 '18 at 22:11
Yes they are :) – Kramer, Nov 24 '18 at 22:21