Regular expression with iterator



























I am trying to scrape prices from an online store. I iterate over the products on the page and insert the loop index into the regular expression. Despite escaping the curly brackets, the regular expression does not work (findall returns an empty list).



HTML returned by soup.findAll:



[<div class="ps4-price at-min-price-1"> from 29 GBP </div>]
[<div class="ps4-price at-min-price-2"> from 35 GBP </div>]


Python code:



for product in range(21):
    min_prices_text = str(soup.findAll("div", class_="ps4-price at-min-price-{}".format(product)))
    min_price = re.findall('<div class="ps4-price at-min-price-{{}}"> (.+?)<'.format(product), str(min_prices_text))
































  • Try min_prices = soup.find_all("div", class_="ps4-price") and then arr = [], for el in min_prices:, arr.append(re.sub(r'\D+', '', el.string)) => print(list(map(int, arr))). If you need to make sure both classes are listed, try min_prices = soup.find_all("div", class_=re.compile(r"ps4-price at-min-price-\d+"))

    – Wiktor Stribiżew
    Nov 19 '18 at 10:47








  • Maybe don't use regex to parse HTML content.

    – Tim Biegeleisen
    Nov 19 '18 at 10:47











  • BTW, your format string is broken: {{}} produces a pair of literal braces, not a placeholder. You need to use single ones, {}, there.

    – Wiktor Stribiżew
    Nov 19 '18 at 11:05
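
The point about braces can be checked in isolation. The following is a minimal sketch, assuming the HTML has exactly the shape shown in the question; the names html, broken and fixed are illustrative and do not appear in the original code.

```python
import re

product = 1
# One of the <div> elements from the question, as a plain string.
html = '<div class="ps4-price at-min-price-1"> from 29 GBP </div>'

# Doubled braces are an escape: str.format() emits a literal "{}"
# and never substitutes the product number.
broken = '<div class="ps4-price at-min-price-{{}}"> (.+?)<'.format(product)

# Single braces are a replacement field, so the number is inserted.
fixed = '<div class="ps4-price at-min-price-{}"> (.+?)<'.format(product)

print(broken)
print(fixed)
print(re.findall(broken, html))  # the literal "{}" is not in the HTML
print(re.findall(fixed, html))   # the captured text between "> " and "<"
```

With the doubled braces the pattern looks for the characters {} in the class name, which is why findall comes back empty.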
















regex python-3.x
















asked Nov 19 '18 at 10:41









dzakob


























1 Answer














You can access the .string property of each element returned by findAll and apply the regex only to its plain text. For example, since you expect a single integer in each element, you can apply re.sub(r'\D+', '', min_prices_text.string) to those strings.



See example code:



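import re

results = []
for product in range(21):
    # soup is the BeautifulSoup object from the question
    min_prices_text = soup.find("div", class_="ps4-price at-min-price-{}".format(product))
    if min_prices_text:
        # Remove every non-digit character, leaving only the number
        results.append(re.sub(r'\D+', '', min_prices_text.string))

print(results) # => ['29', '35']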


Or use list(map(int, results)) if you want to convert the list of strings to integers.
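
The digit-extraction step can be tried without BeautifulSoup. This is a minimal sketch in which the hypothetical list texts stands in for the .string values of the matched elements; it is not part of the original answer.

```python
import re

# Hypothetical stand-ins for the .string values of the matched <div> elements.
texts = [" from 29 GBP ", " from 35 GBP "]

# \D+ matches runs of non-digit characters; replacing them with ""
# leaves only the digits of each string.
results = [re.sub(r"\D+", "", text) for text in texts]
print(results)

# Convert the digit strings to integers.
prices = list(map(int, results))
print(prices)
```

Note that this approach assumes whole-number prices: a value such as 29.99 would lose its decimal point and come out as 2999.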







































        answered Nov 19 '18 at 11:13









        Wiktor Stribiżew






























