Is it possible to get text from a website in the terminal?

I would like to get a timetable from my school's website and use it in a script to set up automatic alerts, but I don't know how.

It seems my school uses FullCalendar to display the timetable, so the times aren't in HTML tags in the .html file.










Tags: bash, scripts






asked Dec 13 '18 at 11:25 by ELDIAZ 35, edited Dec 13 '18 at 12:36 by ELDIAZ 35


  • What text? What website? It is very possible, but we can't help you parse data you don't show. Please edit your question and give us an example website and the text you want to extract from it.

    – terdon
    Dec 13 '18 at 11:27






  • Might be related: Data scraping with wget and regex. I suggest using Python for this purpose.

    – Ravexina
    Dec 13 '18 at 11:30
















2 Answers
Since we don't have the actual website you want to scrape, and scraping is different for every site unless it offers a standardized API, it's not possible to give a 100% working solution. But I'll try to explain a way to get at your information.



FullCalendar (fullcalendar.io) is JavaScript-based: the events are either defined as a JavaScript object or loaded from a JSON source. In the latter case, you can simply download the ready-made JSON file that is referenced somewhere in the JavaScript source code. There are many questions and answers here about parsing JSON.
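If the page loads its events from a JSON feed, Python's standard library is enough to fetch and read it once you have found the feed's URL in the JavaScript source. This is a minimal sketch, assuming the feed returns a JSON array of event objects with title and start fields, which is the shape FullCalendar uses for event feeds. The URL in the last comment is a hypothetical placeholder, and a real feed may also expect start and end query parameters for the date range.

```python
import json
from urllib.request import urlopen


def parse_events(json_text):
    """Return (start, title) pairs from a JSON array of FullCalendar-style events."""
    events = json.loads(json_text)
    return [(event.get("start", ""), event.get("title", "")) for event in events]


def fetch_events(feed_url):
    """Download a JSON event feed and parse it (needs network access)."""
    with urlopen(feed_url) as response:
        return parse_events(response.read().decode("utf-8"))


# Offline example: a hard-coded feed body instead of a download.
sample_feed = '''[
  {"title": "Meeting", "start": "2018-03-12T10:30:00", "end": "2018-03-12T12:30:00"},
  {"title": "Lunch", "start": "2018-03-12T12:00:00"}
]'''

for start, title in parse_events(sample_feed):
    print(start, title)

# With a real feed (hypothetical URL):
# fetch_events("https://example.com/timetable/events.json")
```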



If the events are defined as a JavaScript object, you can parse the .js file, or, if the object is embedded in an HTML <script> tag, search the HTML for the $('#calendar').fullCalendar( call.
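Locating that call in the page source can also be done in Python. The sketch below mirrors the line-based awk approach shown further down: it collects the lines after the one containing .fullCalendar({ up to the first line containing }); and wraps them in braces. It assumes the options object is spread over several lines, as in the FullCalendar demos; a minified page would need a different approach. The function name is illustrative only.

```python
def extract_fullcalendar_options(page_source):
    """Return the lines between '.fullCalendar({' and the next '});', wrapped in braces.

    Like the awk one-liner, the opening and closing lines themselves are skipped.
    Returns None if the page has no '.fullCalendar({' call.
    """
    collected = []
    inside = False
    for line in page_source.splitlines():
        if not inside:
            if ".fullCalendar({" in line:
                inside = True  # start collecting from the next line
            continue
        if "});" in line:
            break  # end of the fullCalendar(...) call
        collected.append(line)
    if not inside:
        return None
    return "{\n" + "\n".join(collected) + "\n}"


page = """<script>
  $(function() {
    $('#calendar').fullCalendar({
      defaultDate: '2018-03-12',
      events: [
        { title: 'Lunch', start: '2018-03-12T12:00:00' }
      ]
    });
  });
</script>"""

print(extract_fullcalendar_options(page))
```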



We can use curl to fetch the page and then extract the information with, for example, awk.





I made a small script that extracts the object from the FullCalendar Basic Views demo. Your script may look similar.



curl -s https://fullcalendar.io/releases/fullcalendar/3.9.0/demos/basic-views.html \
| awk '/[.]fullCalendar[(][{]/{s=1; print "{"; next;};
/[}][)];/{s=0};
s{print};
END{print "}";}'


Explanation:



  • /[.]fullCalendar[(][{]/{s=1; print "{"; next;}; searches for the literal text .fullCalendar({ (the bracket expressions [.], [(] and [{] match those characters literally) and, if found, sets the variable s=1, prints { and skips to the next line.

  • /[}][)];/{s=0}; searches for }); and sets s=0.

  • s{print}; prints the line while s is set and non-zero.

  • END{print "}";} prints the closing } at the end.


Output:



{
header: {
left: 'prev,next today',
center: 'title',
right: 'month,basicWeek,basicDay'
},
defaultDate: '2018-03-12',
navLinks: true, // can click day/week names to navigate views
editable: true,
eventLimit: true, // allow "more" link when too many events
events: [
{
title: 'All Day Event',
start: '2018-03-01'
},
{
title: 'Long Event',
start: '2018-03-07',
end: '2018-03-10'
},
{
id: 999,
title: 'Repeating Event',
start: '2018-03-09T16:00:00'
},
{
id: 999,
title: 'Repeating Event',
start: '2018-03-16T16:00:00'
},
{
title: 'Conference',
start: '2018-03-11',
end: '2018-03-13'
},
{
title: 'Meeting',
start: '2018-03-12T10:30:00',
end: '2018-03-12T12:30:00'
},
{
title: 'Lunch',
start: '2018-03-12T12:00:00'
},
{
title: 'Meeting',
start: '2018-03-12T14:30:00'
},
{
title: 'Happy Hour',
start: '2018-03-12T17:30:00'
},
{
title: 'Dinner',
start: '2018-03-12T20:00:00'
},
{
title: 'Birthday Party',
start: '2018-03-13T07:00:00'
},
{
title: 'Click for Google',
url: 'http://google.com/',
start: '2018-03-28'
}
]
}


You can then convert the JS object to JSON using Python and demjson:



Install demjson:



pip3 install demjson


and then run this:



curl -s https://fullcalendar.io/releases/fullcalendar/3.9.0/demos/basic-views.html \
| awk '/[.]fullCalendar[(][{]/{s=1; print "{"; next;};
/[}][)];/{s=0};
s{print};
END{print "}";}' \
| python3 -c "import demjson, sys, json; print(json.dumps(demjson.decode(sys.stdin.read())))" \
| jq ".events"


From here it should be fairly easy to continue with jq. Of course, instead of bash and jq, you can do the whole thing in Python.
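As a rough idea of the pure-Python route without demjson, the sketch below pulls the events out of the extracted object with regular expressions and keeps the ones that start after a given time, which is what an alert script would need. It rests on strong assumptions: it only understands flat event objects whose values are single-quoted strings, as in the demo output above, with start times in the YYYY-MM-DDTHH:MM:SS form; nested objects, double-quoted values or escaped quotes need a real parser such as demjson. The names parse_events and upcoming are illustrative.

```python
import re
from datetime import datetime

# One flat "{ ... }" object with no nested braces inside.
EVENT_RE = re.compile(r"\{([^{}]*)\}")
# A "key: 'value'" pair whose value is a single-quoted string.
FIELD_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


def parse_events(options_text):
    """Return a list of dicts for the objects after 'events:' in the options text."""
    start = options_text.find("events:")
    if start == -1:
        return []
    events = []
    for match in EVENT_RE.finditer(options_text, start):
        fields = dict(FIELD_RE.findall(match.group(1)))
        if "start" in fields:
            events.append(fields)
    return events


def upcoming(events, now):
    """Return (start datetime, title) for timed events starting after now, soonest first."""
    result = []
    for event in events:
        if "T" not in event["start"]:
            continue  # all-day event such as '2018-03-01': no time to alert on
        when = datetime.strptime(event["start"], "%Y-%m-%dT%H:%M:%S")
        if when > now:
            result.append((when, event.get("title", "")))
    return sorted(result)


options = """{
events: [
{
title: 'All Day Event',
start: '2018-03-01'
},
{
title: 'Meeting',
start: '2018-03-12T10:30:00',
end: '2018-03-12T12:30:00'
},
{
title: 'Lunch',
start: '2018-03-12T12:00:00'
}
]
}"""

for when, title in upcoming(parse_events(options), datetime(2018, 3, 12, 11, 0)):
    print(when.strftime("%H:%M"), title)  # prints "12:00 Lunch"
```

A script run periodically (for example from cron) could then hand each upcoming entry to whatever notification tool your desktop provides.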






answered Dec 13 '18 at 13:05 by RoVo, edited Dec 13 '18 at 14:29

    • I think I only need the events. What is the awk script for just the events?

      – ELDIAZ 35
      Dec 13 '18 at 13:48











    • I added a way to convert the JS object to a JSON string using Python, which you can then parse with jq, the preferred tool for parsing JSON. Don't use awk for that; it will only cause pain.

      – RoVo
      Dec 13 '18 at 14:30





















The websync bash script uses wget to retrieve answers from Ask Ubuntu. It searches the HTML tags to find question and answer upvotes, and it converts HTML entities such as &amp; to & and &lt; to <.
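Searching HTML tags for a value, which websync does line by line, can also be done with Python's standard html.parser module, which does not depend on how the markup is split across lines. A minimal sketch: the class name vote-count is a hypothetical placeholder, not Ask Ubuntu's real markup, and the parser simply collects the text of every element carrying that class.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextParser(HTMLParser):
    """Collect the text of every element whose class attribute includes wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()  # convert_charrefs=True by default: &amp; etc. arrive decoded
        self.wanted_class = wanted_class
        self.depth = 0      # nesting depth inside the current matching element (0 = outside)
        self.current = []   # text pieces of the current matching element
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element of the match
        elif self.wanted_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


parser = ClassTextParser("vote-count")
parser.feed('<div><span class="vote-count big">4</span> <span class="other">x</span>'
            ' <b class="vote-count">Q&amp;A: <i>0</i></b></div>')
parser.close()
print(parser.results)  # ['4', 'Q&A: 0']
```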



Here are a few snippets from the websync code that you may find helpful:





Ampersand=$'\046'   # a literal "&" (octal 046)

LineOut=""
HTMLtoText () {
    LineOut=$1   # Parm 1 = input line
    LineOut=${LineOut//&lt;/<}
    LineOut=${LineOut//&gt;/>}
    LineOut=${LineOut//&quot;/'"'}
    LineOut=${LineOut//&#39;/"'"}      # &#39; is an apostrophe
    LineOut=${LineOut//&ldquo;/'"'}
    LineOut=${LineOut//&rdquo;/'"'}
    # Decode &amp; last, so that "&amp;lt;" ends up as "&lt;" and not "<".
    # bash 5.2+ (patsub_replacement) treats an unquoted & in the replacement
    # as the matched text, so the & comes from a quoted variable.
    LineOut=${LineOut//&amp;/"$Ampersand"}
} # HTMLtoText ()

# (... SNIP LINES ...)

while IFS= read -r Line; do

    # (... SNIP LINES ...)

    # Convert HTML codes to normal characters.
    # Quoted, so the whole line (not just its first word) is passed.
    HTMLtoText "$Line"
    Line="$LineOut"

    # (... SNIP LINES ...)

done < "/tmp/$AnswerID"

# (... SNIP LINES ...)

# Download the page; sometimes a second attempt is required. Not sure why.
if wget -O- "${RecArr[$ColWebAddr]}" > "/tmp/$AnswerID" ||
   wget -O- "${RecArr[$ColWebAddr]}" > "/tmp/$AnswerID"
then
    echo "$BarNo:100" > "$PercentFile"
    echo "$BarNo:#Download completed." > "$PercentFile"
else
    echo "$BarNo:100" > "$PercentFile"
    echo "$BarNo:#Download error." > "$PercentFile"
    echo "ERROR: $AnswerID" >> ~/websync.log
    return 1
fi
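If Python is available, its standard library can stand in for a hand-maintained entity list like HTMLtoText: html.unescape decodes named and numeric entities (including &#39;) in a single pass, so &amp;lt; correctly becomes &lt; rather than <. A minimal sketch; the tag-stripping regular expression is a crude assumption that only suits simple markup, and &ldquo; and &rdquo; come out as curly quotes rather than a plain " character.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude: removes anything that looks like a tag


def html_to_text(line):
    """Strip tags first, then decode entities (&amp;, &lt;, &#39;, &ldquo;, ...)."""
    return html.unescape(TAG_RE.sub("", line))


print(html_to_text("<td>5 &lt; 6 &amp;&amp; it&#39;s &ldquo;fine&rdquo;</td>"))
# prints: 5 < 6 && it's “fine”
```

Stripping tags before unescaping matters: an escaped &lt;b&gt; in the page text survives as the literal text <b> instead of being removed as if it were markup.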





    share|improve this answer






























      0














      The websync bash script uses wget to retrieve answers here in Ask Ubuntu. It searches HTML tags to find Question Upvotes and Answer Upvotes. It converts special HTML symbols such as &amp to & and &lt to <, etc.



      Here are a few snippets from the code you may find helpful:





      LineOut=""
      HTMLtoText () {
      LineOut=$1 # Parm 1= Input line
      LineOut="${LineOut//&amp;/&}"
      LineOut="${LineOut//&lt;/<}"
      LineOut="${LineOut//&gt;/>}"
      LineOut="${LineOut//&quot;/'"'}"
      LineOut="${LineOut//'/"'"}"
      LineOut="${LineOut//&ldquo;/'"'}"
      LineOut="${LineOut//&rdquo;/'"'}"
      } # HTMLtoText ()

      Ampersand=$'46'

      (... SNIP LINES ...)

      while IFS= read -r Line; do

      (... SNIP LINES ...)

      # Convert HTML codes to normal characters
      HTMLtoText $Line
      Line="$LineOut"

      (... SNIP LINES ...)

      done < "/tmp/$AnswerID"

      (... SNIP LINES ...)

      wget -O- "${RecArr[$ColWebAddr]}" > "/tmp/$AnswerID"
      if [[ "$?" -ne 0 ]] # check return code for errors
      then
      # Sometimes a second attempt is required. Not sure why.
      wget -O- "${RecArr[$ColWebAddr]}" > "/tmp/$AnswerID"
      fi
      if [[ "$?" == 0 ]] # check return code for errors
      then
      echo "$BarNo:100" > "$PercentFile"
      echo "$BarNo:#Download completed." > "$PercentFile"
      else
      echo "$BarNo:100" > "$PercentFile"
      echo "$BarNo:#Download error." > "$PercentFile"
      echo "ERROR: $AnswerID" >> ~/websync.log
      return 1
      fi





      share|improve this answer




























        0












        0








        0







        answered Dec 13 '18 at 11:58 by WinEunuuchs2Unix, edited Dec 14 '18 at 0:05