Are spaces around CSS combinators really optional?

I'm a bit confused about using CSS selectors with combinators in BeautifulSoup. Here is a simple example of what I mean:

from bs4 import BeautifulSoup as bs
import requests

response = requests.get('https://stackoverflow.com/questions/tagged/python')
soup = bs(response.text)

print(len(soup.select('#mainbar > div')))

returns 6 children, but

print(len(soup.select('#mainbar>div')))

returns 0 children.

The same happens with '#mainbar ~ div' (1 sibling found) and '#mainbar~div' (nothing found).

According to the documentation those spaces are optional, but BeautifulSoup gives me different output for selectors that I thought were equivalent.

So is this a bs4 bug, or does the behavior depend on the CSS version or something else?

python web-scraping beautifulsoup css-selectors

asked Nov 20 '18 at 21:21 by JaSON

  • Why don't you just not do that? If I inherited code like that it would make me unhappy.
    – pguardiario, Nov 21 '18 at 1:23

2 Answers

This is confirmed as a bug here: https://bugs.launchpad.net/beautifulsoup/+bug/1717851

From a CSS perspective, the selector is fine with or without the spaces.

I will see if I can find further evidence.

The individual reporting the bug states:

The issue, as far as I see, is that since the code is only doing a
shlex.split, it doesn't treat div, >, and span as separate
entities if a space is left out on either side of >.
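
To see the effect the reporter describes, here is a minimal sketch using only the standard library. It assumes, as the bug report says, that the affected bs4 version tokenizes selectors with shlex.split; the variable names are just for illustration.

```python
import shlex

# With spaces, the combinator becomes its own token.
spaced = shlex.split('#mainbar > div')
print(spaced)   # ['#mainbar', '>', 'div']

# Without spaces, shlex only splits on whitespace, so the whole
# selector stays glued together as a single token.
compact = shlex.split('#mainbar>div')
print(compact)  # ['#mainbar>div']
```

A tokenizer that only splits on whitespace never sees the > in the second form, which would be consistent with the selector silently matching nothing.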







answered Nov 20 '18 at 21:36 by QHarr

  • Thanks for the link. However, in the bug description the user gets a ValueError, while I just got an empty list... Maybe this was some kind of quick fix to avoid breaking scripts...
    – JaSON, Nov 20 '18 at 21:43

  • I can't honestly say, though it sounds plausible. I am looking to see if I can find anything more up to date.
    – QHarr, Nov 20 '18 at 21:43

  • There is no additional mention in the development log: code.launchpad.net/beautifulsoup
    – QHarr, Nov 20 '18 at 21:59

  • Thank you for the help. I just wanted to understand whether bs4 is good for scraping or not, and as far as I can see, not so good :)
    – JaSON, Nov 20 '18 at 22:04

  • Bs4 is great for scraping in my limited experience. You just need to remember the spaces, it would seem. The appropriate spaces make for more legible selectors.
    – QHarr, Nov 20 '18 at 22:05

In case you want to patch it, see bs4/element.py line 1440 and replace

tokens = shlex.split(selector)

with

selector = re.sub(r'\s*([+>~])\s*', r' \1 ', selector)
tokens = shlex.split(selector)

Demo:

import re, shlex

def testSelect(selector):
    # pad every combinator with spaces so shlex.split sees it as its own token
    selector = re.sub(r'\s*([+>~])\s*', r' \1 ', selector)
    tokens = shlex.split(selector)
    print(tokens)

testSelect('#mainbar > div ~ p') # default
testSelect('#mainbar>div~p')
testSelect('#mainbar >div+ p')
testSelect('#mainbar.classA')
testSelect('#mainbar p')
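
If you would rather not edit the installed library, the same idea can be applied to the selector string before it reaches soup.select. The sketch below is only an illustration: pad_combinators is a hypothetical helper name, not part of bs4, and it additionally skips quoted strings, [...] attribute blocks and (...) arguments, because a plain substitution would also pad the ~ in [href~="x"] or the + in :nth-child(2n+1). Nested parentheses and brackets inside quotes are not handled.

```python
import re

# Alternatives are tried in order: the first three match spans that must be
# left alone, the last one matches a combinator with optional spaces around it.
_TOKEN = re.compile(
    r"""("[^"]*"|'[^']*')      # 1: quoted string, left untouched
      | (\[[^\]]*\])           # 2: attribute selector, left untouched
      | (\([^)]*\))            # 3: pseudo-class arguments, left untouched
      | \s*([>+~])\s*          # 4: combinator to be padded
    """,
    re.VERBOSE,
)

def pad_combinators(selector):
    """Return selector with exactly one space on each side of >, + and ~."""
    def repl(match):
        if match.group(4):
            return " " + match.group(4) + " "
        return match.group(0)  # protected span: keep as written
    return _TOKEN.sub(repl, selector).strip()

print(pad_combinators('#mainbar>div'))          # #mainbar > div
print(pad_combinators('#mainbar >div+ p'))      # #mainbar > div + p
print(pad_combinators('li:nth-child(2n+1)~p'))  # li:nth-child(2n+1) ~ p
```

Usage would then look like soup.select(pad_combinators('#mainbar>div')), which keeps the workaround in your own code instead of in a patched copy of bs4.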








edited Nov 22 '18 at 6:25
answered Nov 20 '18 at 22:51 by ewwink