Import data from dynamically generated webpage












4












$begingroup$


I'm trying to get data from a web page:



data = Import["https://stats.nba.com/team/1610612738/traditional/", "Data"]


It returns only textual information: {Traditional, Advanced, Four Factors, Misc, Scoring}, {{t.city }} {{ t.name }}, {...}, etc.



The "FullData" element behaves the same way, only adding some empty lists.
Can anyone suggest how to handle this kind of page in Mathematica? Many thanks.





















$endgroup$

















      import web-access














      edited Jan 24 at 16:42 by gwr

      asked Jan 24 at 15:07 by Gabriel Parker






















          1 Answer


















          7












          $begingroup$

          Since the page is generated asynchronously, you can use the same data source that the page itself does. Using the network inspector in my browser, I found that the page loads its data from the URL below. It's an API that delivers JSON, so we can use that data directly.



          The first thing we can do is parse the URL to get an idea of the various parameters we can send to it:



          url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
          URLParse[url]



          <|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
          "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
          "LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
          "Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
          "PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
          "Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>




          Now, you didn't say what you were looking for specifically, but it's clear enough that this can be modified (for instance, changing TeamID to get a different team, or DateFrom and DateTo to get specific dates). It's worth using your browser's network inspector while changing things in the web page to see more details about these fields.
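
As a rough illustration of that idea, one query parameter can be swapped by round-tripping a URL through URLParse and URLBuild. The helper name setQueryParameter, the example.com URL, and the parameter values below are made up for this sketch; they are not part of the NBA API or of Mathematica itself.

```wolfram
(* Hypothetical helper: replace the value of one query parameter in a URL. *)
setQueryParameter[u_String, key_String, value_String] :=
 Module[{parts = URLParse[u]},
  (* "Query" is a list of rules; rewrite only the rule with a matching key *)
  parts["Query"] = parts["Query"] /. (key -> _) -> (key -> value);
  URLBuild[parts]]

(* Illustrative URL and values only *)
setQueryParameter[
 "https://example.com/stats?TeamID=1&Season=2018-19", "TeamID", "2"]
```

The same call should work on the long url defined above, for example with "TeamID" or "Season" as the key, assuming the endpoint accepts the new value.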



          Anyway, let's just take the results of the initial URL.



          Import[url, "JSON"]


          Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.



          A way to get around this is to use Python's JSON import instead, so let's do that:



          pythonImportJson[jsonText_] := 
          ExternalEvaluate["Python",
          "import json; json.loads(\"\"\"" <> jsonText <> "\"\"\")"]
          data = pythonImportJson[Import[url, "Text"]]


          The bug that prevents our JSON from importing correctly is that Mathematica fails to import JSON when numbers are written with very many digits. If we look at the output from that API, we see many such numbers, for instance -8.300000000000000000000.



          A way to get around this is to replace long runs of 0s with an empty string and import the resulting string, or to use the Python method above.



          ImportString[
          StringReplace[Import[url, "Text"], Repeated["0", {10, Infinity}] -> ""], "JSON"]
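
Deleting every run of ten or more zeros could in principle also hit zeros that are not trailing decimals, for instance inside a large integer. A slightly more conservative variant is sketched here, under the assumption that the problem is only the long tail of zeros after a decimal point. The function name trimTrailingZeros and the sample string are made up for illustration.

```wolfram
(* Hypothetical helper: drop a tail of 10 or more zeros, but only when it
   follows a decimal point and at least one decimal digit *)
trimTrailingZeros[json_String] :=
 StringReplace[json,
  RegularExpression["(\\.\\d+?)0{10,}(?!\\d)"] -> "$1"]

(* Made-up sample in the shape of the API output; the integer is left alone *)
sample = "{\"value\": -8.3000000000000000000000, \"games\": 10000000000}";

(* "RawJSON" returns associations rather than lists of rules *)
ImportString[trimTrailingZeros[sample], "RawJSON"]
```

The regular expression still knows nothing about JSON strings, so a quoted value that happens to contain such a digit run would be altered as well.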


          Now we have our data in a nice association format.






          Now, it seems to me that the structure of the JSON is essentially table rows and headings.



          If we look at any element of resultSets, we see the keys headers, rowSet, and name. This seems to describe exactly the tables we see on the webpage.



          We can transform this into a dataset by threading the headers onto each row of the rowSet with AssociationThread.



          headers = data[["resultSets", 1, "headers"]]
          rows = data[["resultSets", 1, "rowSet"]]
          Dataset[AssociationThread[headers, #] & /@ rows]
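
To see what AssociationThread does here without calling the API, the same mapping can be tried on a small hand-made table. The header names and numbers below are invented for the example.

```wolfram
(* Invented headers and rows, shaped like one entry of resultSets *)
toyHeaders = {"GROUP_VALUE", "GP", "W", "L"};
toyRows = {{"Home", 5, 4, 1}, {"Road", 5, 2, 3}};

(* Each row becomes an association keyed by the headers *)
toyRecords = AssociationThread[toyHeaders, #] & /@ toyRows
(* {<|"GROUP_VALUE" -> "Home", "GP" -> 5, "W" -> 4, "L" -> 1|>,
    <|"GROUP_VALUE" -> "Road", "GP" -> 5, "W" -> 2, "L" -> 3|>} *)

(* A list of associations with identical keys displays as a table *)
Dataset[toyRecords]
```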


          Comparing that with the first table on the webpage, it seems like we got it right.



          Let's pile this into a function and then try it on the "Days Rest" table, which has a few more rows:



          createDataset[resultSet_] := 
          Module[{headers = resultSet["headers"], rows = resultSet["rowSet"]},
          Dataset[AssociationThread[headers, #] & /@ rows]]

          createDataset[data[["resultSets", 6]]]





          Seems pretty good to me.



          Now we can create a workflow where we can easily modify the input parameters:



          url = URLBuild[
          <|"Scheme" -> "https", "User" -> None,
          "Domain" -> "stats.nba.com", "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "",
          "GameSegment" -> "", "LastNGames" -> "0",
          "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0",
          "OpponentTeamID" -> "0", "Outcome" -> "", "PORound" -> "0",
          "PaceAdjust" -> "N", "PerMode" -> "PerGame",
          "Period" -> "0", "PlusMinus" -> "N", "Rank" -> "N",
          "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""},
          "Fragment" -> None|>];
          data = pythonImportJson[Import[url, "Text"]];

          Dataset@AssociationThread[data[["resultSets", All, "name"]],
          createDataset /@ data["resultSets"]]


          This gives us a nice dataset of each table on the page.



          Now it's easy to change the "TeamID" parameter to compare teams, for instance.

















          $endgroup$









          • 3




            $begingroup$
            When 12 comes out, it looks like WebExecute will be able to handle this
            $endgroup$
            – M.R.
            Jan 24 at 16:29










          • $begingroup$
            Another reason to look forward to 12. I think there's actually a bug in Import[..., "JSON"] (as the output from the endpoint is definitely valid) - perhaps that will be fixed also.
            $endgroup$
            – Carl Lange
            Jan 24 at 16:31






          • 1




            $begingroup$
            (+1) Is Arnoud Buzing's WebTools useful here? (see (69343) for the original WebUnit)
            $endgroup$
            – gwr
            Jan 24 at 16:38












          • $begingroup$
            @gwr yes, probably - some of that functionality is actually in ExternalEvaluate in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
            $endgroup$
            – Carl Lange
            Jan 24 at 16:58










          • $begingroup$
            It fails because ImportString["-8.3000000000000000000000", "JSON"] fails.
            $endgroup$
            – Kuba
            Jan 24 at 17:18











          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          7












          $begingroup$

          Since the page is generated asynchronously, you can use the same data source that the page itself does. Using the network inspector in my browser, I discovered that the page loaded its data from this url. It's some API that delivers JSON, so we can use that data.



          The first thing we can do is parse the URL, so get an idea of the various parameters we can send to it:



          url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
          URLParse[url]



          <|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
          "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
          "LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
          "Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
          "PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
          "Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>




          Now, you didn't say what you were looking for specifically, but it's clear enough that it would be possible to modify this (for instance, changing TeamID to get a different team, or DateFrom and DateTo to get specific dates. It's worth using your browser's network inspector while changing things in the web page to see more details about these fields.



          Anyway, let's just take the results of the initial URL.



          Import[url, "JSON"]


          Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.



          A way to get around this is to use Python's JSON import instead, so let's do that:



          pythonImportJson[jsonText_] := 
          ExternalEvaluate["Python",
          "import json; json.loads("""" <> jsonText <> """")"]
          pythonImportJson[Import[url, "Text"]]


          The bug that causes our JSON to not import correctly is that Mathematica fails at importing JSON when the numbers are at a high precision. If we look at the output from that API, we see that many of the numbers have very high precision, for instance numbers like -8.300000000000000000000.



          A way to get around this is to replace repeated 0s with an empty string and import the resulting string, or to use the python method above.



          ImportString[
          StringReplace[Import[url], Repeated["0", {10, [Infinity]}] -> ""], "JSON"]


          Now we have our data in a nice association format.



          enter image description here



          Now, it seems to me like the structure of the JSON is essentially table rows and headings.



          If we look at the contents of any data in the resultSets association, we see headers and rowSet, and name. To me this seems like it's basically describing the exact table we see on the webpage.



          enter image description here



          We can transform this into a dataset by AssociationThreading the headers onto each row in the rowSets.



          headers = data[["resultSets", 1, "headers"]]
          rows = data[["resultSets", 1, "rowSet"]]
          Dataset[AssociationThread[headers, #] & /@ rows]


          enter image description here



          Comparing that with the first table on the webpage, it seems like we got it right:



          enter image description here



          Let's pile this into a function and then try it on the "Days Rest" table, which has a few more rows:



          createDataset[resultSet_] := 
          Module[{headers = resultSet["headers"], rows = resultSet["rowSet"]},
          Dataset[AssociationThread[headers, #] & /@ rows]]

          createDataset[data[["resultSets", 6]]]


          enter image description here



          Seems pretty good to me.



          Now we can create a workflow where we can easily modify the input parameters:



          url = URLBuild[URLBuild[
          <|"Scheme" -> "https", "User" -> None,
          "Domain" -> "stats.nba.com", "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "",
          "GameSegment" -> "", "LastNGames" -> "0",
          "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0",
          "OpponentTeamID" -> "0", "Outcome" -> "", "PORound" -> "0",
          "PaceAdjust" -> "N", "PerMode" -> "PerGame",
          "Period" -> "0", "PlusMinus" -> "N", "Rank" -> "N",
          "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""},
          "Fragment" -> None|>];
          data = pythonImportJson[Import[url, "Text"]];

          Dataset@AssociationThread[data[["resultSets", All, "name"]],
          createDataset /@ data["resultSets"]]


          Which gives us a nice dataset of each table on the page.



          enter image description here



          Now it's easy to change the "TeamID" parameter to compare teams for instance.






          share|improve this answer











          $endgroup$









          • 3




            $begingroup$
            When 12 comes out, looks like WebExecute will be able the handle this
            $endgroup$
            – M.R.
            Jan 24 at 16:29










          • $begingroup$
            Another reason to look forward to 12. I think there's actually a bug in Import[..., "JSON"] (as the output from the endpoint is definitely valid) - perhaps that will be fixed also.
            $endgroup$
            – Carl Lange
            Jan 24 at 16:31






          • 1




            $begingroup$
            (+1) Is Arnoud Buzing's WebTools usefull here? (see (69343) for the original WebUnit)
            $endgroup$
            – gwr
            Jan 24 at 16:38












          • $begingroup$
            @gwr yes, probably - some of that functionality is actually in ExternalEvaluate in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
            $endgroup$
            – Carl Lange
            Jan 24 at 16:58










          • $begingroup$
            It fails because ImportString["-8.3000000000000000000000", "JSON"]
            $endgroup$
            – Kuba
            Jan 24 at 17:18
















          7












          $begingroup$

          Since the page is generated asynchronously, you can use the same data source that the page itself does. Using the network inspector in my browser, I discovered that the page loaded its data from this url. It's some API that delivers JSON, so we can use that data.



          The first thing we can do is parse the URL, so get an idea of the various parameters we can send to it:



          url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
          URLParse[url]



          <|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
          "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
          "LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
          "Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
          "PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
          "Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>




          Now, you didn't say what you were looking for specifically, but it's clear enough that it would be possible to modify this (for instance, changing TeamID to get a different team, or DateFrom and DateTo to get specific dates. It's worth using your browser's network inspector while changing things in the web page to see more details about these fields.



          Anyway, let's just take the results of the initial URL.



          Import[url, "JSON"]


          Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.



          A way to get around this is to use Python's JSON import instead, so let's do that:



          pythonImportJson[jsonText_] := 
          ExternalEvaluate["Python",
          "import json; json.loads("""" <> jsonText <> """")"]
          pythonImportJson[Import[url, "Text"]]


          The bug that causes our JSON to not import correctly is that Mathematica fails at importing JSON when the numbers are at a high precision. If we look at the output from that API, we see that many of the numbers have very high precision, for instance numbers like -8.300000000000000000000.



          A way to get around this is to replace repeated 0s with an empty string and import the resulting string, or to use the python method above.



          ImportString[
          StringReplace[Import[url], Repeated["0", {10, [Infinity]}] -> ""], "JSON"]


          Now we have our data in a nice association format.



          enter image description here



          Now, it seems to me like the structure of the JSON is essentially table rows and headings.



          If we look at the contents of any data in the resultSets association, we see headers and rowSet, and name. To me this seems like it's basically describing the exact table we see on the webpage.



          enter image description here



          We can transform this into a dataset by AssociationThreading the headers onto each row in the rowSets.



          headers = data[["resultSets", 1, "headers"]]
          rows = data[["resultSets", 1, "rowSet"]]
          Dataset[AssociationThread[headers, #] & /@ rows]


          enter image description here



          Comparing that with the first table on the webpage, it seems like we got it right:



          enter image description here



          Let's pile this into a function and then try it on the "Days Rest" table, which has a few more rows:



          createDataset[resultSet_] := 
          Module[{headers = resultSet["headers"], rows = resultSet["rowSet"]},
          Dataset[AssociationThread[headers, #] & /@ rows]]

          createDataset[data[["resultSets", 6]]]


          enter image description here



          Seems pretty good to me.



          Now we can create a workflow where we can easily modify the input parameters:



          url = URLBuild[URLBuild[
          <|"Scheme" -> "https", "User" -> None,
          "Domain" -> "stats.nba.com", "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "",
          "GameSegment" -> "", "LastNGames" -> "0",
          "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0",
          "OpponentTeamID" -> "0", "Outcome" -> "", "PORound" -> "0",
          "PaceAdjust" -> "N", "PerMode" -> "PerGame",
          "Period" -> "0", "PlusMinus" -> "N", "Rank" -> "N",
          "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""},
          "Fragment" -> None|>];
          data = pythonImportJson[Import[url, "Text"]];

          Dataset@AssociationThread[data[["resultSets", All, "name"]],
          createDataset /@ data["resultSets"]]


          Which gives us a nice dataset of each table on the page.



          enter image description here



          Now it's easy to change the "TeamID" parameter to compare teams for instance.






          share|improve this answer











          $endgroup$









          • 3




            $begingroup$
            When 12 comes out, looks like WebExecute will be able the handle this
            $endgroup$
            – M.R.
            Jan 24 at 16:29










          • $begingroup$
            Another reason to look forward to 12. I think there's actually a bug in Import[..., "JSON"] (as the output from the endpoint is definitely valid) - perhaps that will be fixed also.
            $endgroup$
            – Carl Lange
            Jan 24 at 16:31






          • 1




            $begingroup$
            (+1) Is Arnoud Buzing's WebTools usefull here? (see (69343) for the original WebUnit)
            $endgroup$
            – gwr
            Jan 24 at 16:38












          • $begingroup$
            @gwr yes, probably - some of that functionality is actually in ExternalEvaluate in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
            $endgroup$
            – Carl Lange
            Jan 24 at 16:58










          • $begingroup$
            It fails because ImportString["-8.3000000000000000000000", "JSON"]
            $endgroup$
            – Kuba
            Jan 24 at 17:18














          7












          7








          7





          $begingroup$

          Since the page is generated asynchronously, you can use the same data source that the page itself does. Using the network inspector in my browser, I discovered that the page loaded its data from this url. It's some API that delivers JSON, so we can use that data.



          The first thing we can do is parse the URL, so get an idea of the various parameters we can send to it:



          url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
          URLParse[url]



          <|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
          "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
          "LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
          "Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
          "PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
          "Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>




          Now, you didn't say what you were looking for specifically, but it's clear enough that it would be possible to modify this (for instance, changing TeamID to get a different team, or DateFrom and DateTo to get specific dates. It's worth using your browser's network inspector while changing things in the web page to see more details about these fields.



          Anyway, let's just take the results of the initial URL.



          Import[url, "JSON"]


          Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.



          A way to get around this is to use Python's JSON import instead, so let's do that:



          pythonImportJson[jsonText_] := 
          ExternalEvaluate["Python",
          "import json; json.loads("""" <> jsonText <> """")"]
          pythonImportJson[Import[url, "Text"]]


          The bug that causes our JSON to not import correctly is that Mathematica fails at importing JSON when the numbers are at a high precision. If we look at the output from that API, we see that many of the numbers have very high precision, for instance numbers like -8.300000000000000000000.



          A way to get around this is to replace repeated 0s with an empty string and import the resulting string, or to use the python method above.



          ImportString[



          Since the page loads its data asynchronously, you can use the same data source that the page itself uses. Using my browser's network inspector, I found that the page fetches its data from an API endpoint that returns JSON, so we can request that endpoint directly.



          The first thing we can do is parse the URL to get an idea of the various parameters we can send to it:



          url = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID=1610612738&VsConference=&VsDivision=";
          URLParse[url]



          <|"Scheme" -> "https", "User" -> None, "Domain" -> "stats.nba.com",
          "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "", "GameSegment" -> "",
          "LastNGames" -> "0", "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0", "OpponentTeamID" -> "0",
          "Outcome" -> "", "PORound" -> "0", "PaceAdjust" -> "N",
          "PerMode" -> "PerGame", "Period" -> "0", "PlusMinus" -> "N",
          "Rank" -> "N", "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""}, "Fragment" -> None|>




          Now, you didn't say what you were looking for specifically, but it's clear that these parameters can be modified, for instance by changing TeamID to get a different team, or DateFrom and DateTo to restrict the date range. It's worth watching your browser's network inspector while changing things on the web page to learn more about these fields.
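As a rough sketch of that idea, the parsed association can be edited and turned back into a URL. The shortened URL and the replacement team ID below are placeholders chosen for illustration, not values taken from the page:

```mathematica
(* Shortened stand-in for the full API URL (assumption: only two parameters kept) *)
exampleUrl = "https://stats.nba.com/stats/teamdashboardbygeneralsplits?Season=2018-19&TeamID=1610612738";

(* Split the URL into its components; "Query" is a list of rules *)
parsed = URLParse[exampleUrl];

(* Convert the query to an association so a single parameter can be overwritten *)
query = Association[parsed["Query"]];
query["TeamID"] = "1610612747";  (* hypothetical ID standing in for another team *)

(* Rebuild the URL with the modified query *)
newUrl = URLBuild[Append[parsed, "Query" -> Normal[query]]]
```

Going through an association keeps the remaining parameters and their order untouched, so only the field you assign to changes.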



          Anyway, let's just take the results of the initial URL.



          Import[url, "JSON"]


          Unfortunately, it seems like Mathematica's JSON parser doesn't like the output of that URL, even though it seems like completely valid JSON to me. Perhaps we've stumbled across a bug.



          A way to get around this is to use Python's JSON import instead, so let's do that:



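          (* Parse JSON text with Python's json module via ExternalEvaluate. *)
          (* The text is spliced into a Python triple-quoted string, so this assumes it contains no backslashes or triple quotes. *)
          pythonImportJson[jsonText_] := 
          ExternalEvaluate["Python",
          "import json; json.loads(\"\"\"" <> jsonText <> "\"\"\")"]
          pythonImportJson[Import[url, "Text"]]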


          The bug that causes our JSON to not import correctly is that Mathematica fails at importing JSON when numbers are written with a very large number of digits. If we look at the output from that API, we see that many of the numbers are written this way, for instance -8.300000000000000000000.



          A way to get around this is to replace long runs of 0s with an empty string and import the resulting string, or to use the Python method above.



          (* Strip runs of ten or more zeros, then parse; "RawJSON" returns associations *)
          data = ImportString[
          StringReplace[Import[url, "Text"], Repeated["0", {10, Infinity}] -> ""], "RawJSON"]
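Removing every long run of zeros could in principle also touch digits that matter. A slightly more conservative variant, sketched here on a made-up sample string (the keys "A" and "B" and the helper name trimZeros are hypothetical), only trims zeros that end a decimal fraction:

```mathematica
(* Hypothetical fragment imitating the API's number formatting *)
sample = "{\"A\": -8.300000000000000000000, \"B\": 10.000000000000000000000}";

(* Keep the decimal point plus the shortest run of digits, then drop a trailing
   run of ten or more zeros, but only when no further digit follows *)
trimZeros[text_String] := StringReplace[text,
   RegularExpression["(\\.\\d+?)0{10,}(?![0-9])"] -> "$1"];

cleaned = trimZeros[sample]
(* {"A": -8.3, "B": 10.0} *)

ImportString[cleaned, "RawJSON"]
```

The lookahead keeps a value such as 1.00000000000005 intact, and at least one digit always remains after the decimal point, so the result stays valid JSON.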


          Now we have our data in a nice association format.






          Now, it seems to me like the structure of the JSON is essentially table rows and headings.



          If we look at any entry in resultSets, we see three keys: name, headers and rowSet. To me this seems to describe exactly the table we see on the webpage.






          We can transform this into a Dataset by using AssociationThread to pair the headers with each row in rowSet.



          headers = data[["resultSets", 1, "headers"]]
          rows = data[["resultSets", 1, "rowSet"]]
          Dataset[AssociationThread[headers, #] & /@ rows]
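To see what AssociationThread does here without calling the API, here is a small sketch on a hand-written result set. The table name, column names and numbers are invented for illustration and only mirror the name/headers/rowSet shape described above:

```mathematica
(* Hypothetical result set with made-up column names and values *)
toySet = <|"name" -> "ExampleTable",
   "headers" -> {"GROUP", "GP", "W", "L"},
   "rowSet" -> {{"Home", 24, 17, 7}, {"Road", 23, 12, 11}}|>;

(* Pair the header list with each row to get one association per row *)
toyRows = AssociationThread[toySet["headers"], #] & /@ toySet["rowSet"]
(* {<|"GROUP" -> "Home", "GP" -> 24, "W" -> 17, "L" -> 7|>,
    <|"GROUP" -> "Road", "GP" -> 23, "W" -> 12, "L" -> 11|>} *)

(* A list of associations with identical keys displays as a table *)
Dataset[toyRows]
```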





          Comparing that with the first table on the webpage, it seems like we got it right.



          Let's pile this into a function and then try it on the "Days Rest" table, which has a few more rows:



          createDataset[resultSet_] := 
          Module[{headers = resultSet["headers"], rows = resultSet["rowSet"]},
          Dataset[AssociationThread[headers, #] & /@ rows]]

          createDataset[data[["resultSets", 6]]]





          Seems pretty good to me.



          Now we can create a workflow where we can easily modify the input parameters:



          url = URLBuild[
          <|"Scheme" -> "https", "User" -> None,
          "Domain" -> "stats.nba.com", "Port" -> None,
          "Path" -> {"", "stats", "teamdashboardbygeneralsplits"},
          "Query" -> {"DateFrom" -> "", "DateTo" -> "",
          "GameSegment" -> "", "LastNGames" -> "0",
          "LeagueID" -> "00", "Location" -> "",
          "MeasureType" -> "Base", "Month" -> "0",
          "OpponentTeamID" -> "0", "Outcome" -> "", "PORound" -> "0",
          "PaceAdjust" -> "N", "PerMode" -> "PerGame",
          "Period" -> "0", "PlusMinus" -> "N", "Rank" -> "N",
          "Season" -> "2018-19", "SeasonSegment" -> "",
          "SeasonType" -> "Regular Season", "ShotClockRange" -> "",
          "Split" -> "general", "TeamID" -> "1610612738",
          "VsConference" -> "", "VsDivision" -> ""},
          "Fragment" -> None|>];
          data = pythonImportJson[Import[url, "Text"]];

          Dataset@AssociationThread[data[["resultSets", All, "name"]],
          createDataset /@ data["resultSets"]]


          Which gives us a nice dataset of each table on the page.






          Now it's easy to change the "TeamID" parameter to compare teams, for instance.















          edited Jan 24 at 17:27

























          answered Jan 24 at 15:44









          Carl Lange








          • When 12 comes out, it looks like WebExecute will be able to handle this.
            – M.R., Jan 24 at 16:29

          • Another reason to look forward to 12. I think there's actually a bug in Import[..., "JSON"] (as the output from the endpoint is definitely valid); perhaps that will be fixed too.
            – Carl Lange, Jan 24 at 16:31

          • (+1) Is Arnoud Buzing's WebTools useful here? (see (69343) for the original WebUnit)
            – gwr, Jan 24 at 16:38

          • @gwr yes, probably. Some of that functionality is actually in ExternalEvaluate in 11.3: ref/externalevaluationsystem/WebDriverChromeHeadless. Room for another answer here :)
            – Carl Lange, Jan 24 at 16:58

          • It fails because ImportString["-8.3000000000000000000000", "JSON"] fails.
            – Kuba, Jan 24 at 17:18






























