Import data from dynamically generated webpage
$begingroup$
I'm trying to get data from a web page:
data = Import["https://stats.nba.com/team/1610612738/traditional/", "Data"]
It returns only textual information: {Traditional, Advanced, Four Factors, Misc, Scoring}, {{t.city }} {{ t.name }}, {...}, etc.
The "FullData" element behaves the same way, only adding some empty lists.
Can anyone suggest how to handle this kind of thing in WM? Many thanks.
import web-access
$endgroup$
edited Jan 24 at 16:42 by gwr
asked Jan 24 at 15:07 by Gabriel Parker
1 Answer
$begingroup$
Since the page is generated asynchronously, you can use the same data source that the page itself does. Using the network inspector in my browser, I discovered that the page loads its data from the URL shown below. It's an API that delivers JSON, so we can use that data directly.
The first thing we can do is parse the URL, to get an idea of the various parameters we can send to it:
url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
URLParse[url]
<|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
"Port" -> None,
"Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
"Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
"LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
"MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
"Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
"PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
"Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
"SeasonType" -> "Regular Season", "ShotClockRange" -> "",
"Split" -> "general", "TeamID" -> "1610612738",
"VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>
Now, you didn't say what you were looking for specifically, but it's clear enough that it would be possible to modify this (for instance, changing TeamID to get a different team, or DateFrom and DateTo to get specific dates). It's worth using your browser's network inspector while changing things in the web page to see more details about these fields.
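As a rough sketch of that kind of modification, the helper below swaps a single query parameter in a parsed URL and rebuilds it. The name setQueryParameter, the example.com address, and the parameter values are made up for illustration; only URLParse and URLBuild come from the answer.

```wolfram
(* Hypothetical helper: replace one query parameter and rebuild the URL *)
setQueryParameter[url_String, key_String, value_String] :=
 Module[{parts = URLParse[url]},
  (* "Query" is a list of rules; rewrite only the rule with a matching key *)
  parts["Query"] = Replace[parts["Query"], (key -> _) -> (key -> value), {1}];
  URLBuild[parts]]

(* Placeholder URL and values, not the real endpoint *)
setQueryParameter["https://example.com/stats?TeamID=1&Season=2018-19", "TeamID", "2"]
```

A key that does not occur in the URL is left alone rather than appended, so a typo in the key silently changes nothing.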
Anyway, let's just take the results of the initial URL.
Import[url, "JSON"]
Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.
A way to get around this is to use Python's JSON import instead, so let's do that:
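(* Hand the raw JSON text to Python's json module inside a triple-quoted Python string *)
pythonImportJson[jsonText_] :=
 ExternalEvaluate["Python",
  "import json; json.loads(\"\"\"" <> jsonText <> "\"\"\")"]
data = pythonImportJson[Import[url, "Text"]]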
The bug that keeps our JSON from importing is that Mathematica fails on JSON numbers written with very many digits. If we look at the output from that API, we see many such numbers, for instance -8.300000000000000000000.
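A quick way to check whether a given installation is affected is to compare a short literal with a long one, along the lines of the example given in the comments on this answer. This is only a sketch: the failure was reported for the versions discussed in this thread (around 11.3), and other versions may import both without complaint.

```wolfram
(* Short literal: expected to import as an ordinary number *)
ImportString["-8.3", "JSON"]

(* Same value with many trailing zeros: reported to fail on affected versions *)
ImportString["-8.3000000000000000000000", "JSON"]
```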
A way to get around this is to replace long runs of repeated 0s with an empty string and import the resulting string, or to use the Python method above.
ImportString[
 StringReplace[Import[url, "Text"], Repeated["0", {10, Infinity}] -> ""], "JSON"]
Now we have our data in a nice association format.
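Note that replacing every run of ten or more zeros would also alter a large integer such as 10000000000. A narrower variant, sketched below with a hypothetical helper name, only trims trailing zeros that follow a decimal point. It assumes that no string values in the response contain such digit runs.

```wolfram
(* Hypothetical helper: trim long runs of trailing zeros after a decimal point *)
trimTrailingZeros[text_String] :=
 StringReplace[text,
  (* keep the digits up to the last nonzero one, or ".0" if all are zero *)
  RegularExpression["(\\.\\d*?[1-9]|\\.0)0{10,}(?!\\d)"] -> "$1"]

(* Made-up sample text; the integer has no decimal point, so it is not matched *)
trimTrailingZeros["{\"x\": -8.3000000000000000000000, \"n\": 10000000000}"]
```

The trimmed text can then be passed to ImportString with the "JSON" element in the same way as before.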
Now, it seems to me like the structure of the JSON is essentially table rows and headings.
If we look at any entry of resultSets, we see headers, rowSet, and name. To me this seems like it's basically describing the exact table we see on the webpage.
We can transform this into a dataset by AssociationThreading the headers onto each row in the rowSet.
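To make the threading step concrete, here is a minimal sketch on a made-up table. The column names and values are invented for illustration and are not taken from the API response.

```wolfram
(* Invented headers and rows standing in for one entry of resultSets *)
toyHeaders = {"NAME", "X", "Y"};
toyRows = {{"a", 1, 2}, {"b", 3, 4}};

(* Pair each row with the headers, giving one association per row *)
AssociationThread[toyHeaders, #] & /@ toyRows
(* {<|"NAME" -> "a", "X" -> 1, "Y" -> 2|>, <|"NAME" -> "b", "X" -> 3, "Y" -> 4|>} *)
```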
headers = data[["resultSets", 1, "headers"]]
rows = data[["resultSets", 1, "rowSet"]]
Dataset[AssociationThread[headers, #] & /@ rows]
Comparing that with the first table on the webpage, it seems like we got it right.
Let's pile this into a function and then try it on the "Days Rest" table, which has a few more rows:
createDataset[resultSet_] :=
Module[{headers = resultSet["headers"], rows = resultSet["rowSet"]},
Dataset[AssociationThread[headers, #] & /@ rows]]
createDataset[data[["resultSets", 6]]]
Seems pretty good to me.
Now we can create a workflow where we can easily modify the input parameters:
url = URLBuild[
<|"Scheme" -> "https", "User" -> None,
"Domain" -> "stats.nba.com", "Port" -> None,
"Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
"Query" -> {"DateFrom" -> "", "DateTo" -> "",
"GameSegment" -> "", "LastNGames" -> "0",
"LeagueID" -> "00", "Location" -> "",
"MeasureType" -> "Base", "Month" -> "0",
"OpponentTeamID" -> "0", "Outcome" -> "", "PORound" -> "0",
"PaceAdjust" -> "N", "PerMode" -> "PerGame",
"Period" -> "0", "PlusMinus" -> "N", "Rank" -> "N",
"Season" -> "2018-19", "SeasonSegment" -> "",
"SeasonType" -> "Regular Season", "ShotClockRange" -> "",
"Split" -> "general", "TeamID" -> "1610612738",
"VsConference" -> "", "VsDivision" -> ""},
"Fragment" -> None|>];
data = pythonImportJson[Import[url, "Text"]];
Dataset@AssociationThread[data[["resultSets", All, "name"]],
createDataset /@ data["resultSets"]]
Which gives us a nice dataset of each table on the page.
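The same construction can be tried offline on a toy response. In this sketch the result-set names, headers, and rows are invented, and toTable is a hypothetical stand-in for createDataset that returns plain lists of associations.

```wolfram
(* Invented stand-in for data["resultSets"] *)
toyResultSets = {
   <|"name" -> "TableA", "headers" -> {"K", "V"}, "rowSet" -> {{"a", 1}, {"b", 2}}|>,
   <|"name" -> "TableB", "headers" -> {"K", "V"}, "rowSet" -> {{"c", 3}}|>};

(* Thread the headers onto each row of one result set *)
toTable[rs_] := AssociationThread[rs["headers"], #] & /@ rs["rowSet"];

(* Key every table by its name, as in the answer *)
tables = Dataset@AssociationThread[toyResultSets[[All, "name"]], toTable /@ toyResultSets];

(* Pull one column out of one table by name *)
tables["TableA", All, "V"]
```

Keying by name means a table can be selected without knowing its position in resultSets.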
Now it's easy to change the "TeamID" parameter to compare teams, for instance.
$endgroup$
3
$begingroup$
When 12 comes out, it looks like WebExecute will be able to handle this.
$endgroup$
– M.R.
Jan 24 at 16:29
$begingroup$
Another reason to look forward to 12. I think there's actually a bug in Import[..., "JSON"] (as the output from the endpoint is definitely valid) - perhaps that will be fixed also.
$endgroup$
– Carl Lange
Jan 24 at 16:31
1
$begingroup$
(+1) Is Arnoud Buzing's WebTools useful here? (See (69343) for the original WebUnit.)
$endgroup$
– gwr
Jan 24 at 16:38
$begingroup$
@gwr yes, probably - some of that functionality is actually in ExternalEvaluate in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
$endgroup$
– Carl Lange
Jan 24 at 16:58
$begingroup$
It fails because ImportString["-8.3000000000000000000000", "JSON"]
$endgroup$
– Kuba♦
Jan 24 at 17:18
edited Jan 24 at 17:27
answered Jan 24 at 15:44 by Carl Lange
3
$begingroup$
When 12 comes out, looks like WebExecute will be able the handle this
$endgroup$
– M.R.
Jan 24 at 16:29
$begingroup$
Another reason to look forward to 12. I think there's actually a bug inImport[..., "JSON"]
(as the output from the endpoint is definitely valid) - perhaps that will be fixed also.
$endgroup$
– Carl Lange
Jan 24 at 16:31
1
$begingroup$
(+1) Is Arnoud Buzing'sWebTools
usefull here? (see (69343) for the originalWebUnit
)
$endgroup$
– gwr
Jan 24 at 16:38
$begingroup$
@gwr yes, probably - some of that functionality is actually inExternalEvaluate
in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
$endgroup$
– Carl Lange
Jan 24 at 16:58
$begingroup$
It fails becauseImportString["-8.3000000000000000000000", "JSON"]
$endgroup$
– Kuba♦
Jan 24 at 17:18