find newline with words starting with underscore with specific pattern












I need to extract the following from C code using a regular expression in Python, but somehow I could not write it properly.



if(condition)
/*~T*/
{
/*~T*/
_getmethis = FALSE;
/*~T*/
}
..........
/*~T*/
_findmethis = FALSE;
......
/*~T*/
_findthat = True;


I need to find all variables that start with an underscore and follow /*~T*/, and write them to a new file. My code does not find them: I tried several regex patterns, and the output file is always empty.



import re
fh = open('filename.c', "r")
output = open("output.txt", "w")
pattern = re.compile(r'(/\*~T\*/)(\s*?\n\s*)(_[aA-zZ]*)')
for line in fh:
    for m in re.finditer(pattern, line):
        output.write(m.group(3))
        output.write("\n")

output.close()






regex python-3.x














asked Nov 21 '18 at 15:44 by fastlearner, edited Nov 21 '18 at 15:57













  • [aA-zZ] does not match only letters; it also matches [, \, ], ^, _, and `. You must have meant [a-zA-Z]. All you need to do is remove for line in fh: and use re.finditer(pattern, fh.read()).

    – Wiktor Stribiżew
    Nov 21 '18 at 16:55
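
To check the point made in the comment, here is a minimal sketch that lists every ASCII character accepted by [aA-zZ] but not by [a-zA-Z]. The variable names are illustrative only. The range A-z covers code points 65 to 122, which includes the six punctuation characters that sit between Z and a.

```python
import re

# [aA-zZ] is parsed as: the literal 'a', the range 'A'..'z', and the literal 'Z'.
loose = re.compile(r'[aA-zZ]')
strict = re.compile(r'[a-zA-Z]')

ascii_chars = [chr(code) for code in range(128)]

# Characters accepted by the loose class but rejected by the strict one.
extras = [c for c in ascii_chars if loose.fullmatch(c) and not strict.fullmatch(c)]
print(extras)
```

Because the underscore is one of those extra characters, the loose class also runs through underscores inside an identifier, which may or may not be what is wanted.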



































3 Answers


















You need to read the file in as a whole with fh.read(), and amend the pattern so that it matches only letters, since [aA-zZ] matches more than just letters.

The pattern I suggest is

(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)

See the regex demo. Note that I deliberately subtracted \n from the first \s* (hence [^\S\n]*) to make matching more efficient.

When reading files, it is more convenient to use with so that you do not have to call .close():

import re
pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')

with open('filename.c', "r") as fh:
    contents = fh.read()
with open("output.txt", "w") as output:
    output.write("\n".join([x.group(3) for x in pattern.finditer(contents)]))
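
As a rough illustration of the first whitespace group, the sketch below compares \s* with [^\S\n]* on a made-up input string. The class [^\S\n] reads as "whitespace, but not a newline", so it takes trailing spaces and tabs and stops at the line break.

```python
import re

# Hypothetical input: marker, trailing space and tab, line break, indented name.
sample = "/*~T*/ \t\n  _getmethis = FALSE;"

# \s* runs across the line break and into the indentation of the next line.
any_ws = re.match(r'/\*~T\*/(\s*)', sample).group(1)

# [^\S\n]* stops at the newline, so it only takes the space and the tab.
horizontal_ws = re.match(r'/\*~T\*/([^\S\n]*)', sample).group(1)

print(repr(any_ws))
print(repr(horizontal_ws))

# The full pattern still reaches the variable name on the next line.
full = re.search(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)', sample)
print(full.group(3))
```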
















answered Nov 21 '18 at 18:25 by Wiktor Stribiżew

























The reason you do not find anything is that your pattern spans multiple lines, but you are searching the file one line at a time.

Consider using this:

t = """
if(condition)
/*~-*/
{
/*~T*/
_getmethis = FALSE;
/*~-*/
}
..........
/*~T*/
_findmethis = FALSE;

/*~T*/
do_not_findme_this = FALSE;
"""

import re
pattern = re.compile(r'/\*~T\*/.*?\n\s*(_[a-zA-Z]*)', re.MULTILINE|re.DOTALL)
for m in re.finditer(pattern, t):  # use the whole file here, not line-wise
    print(m.group(1))

The pattern is compiled with two flags. re.DOTALL makes the dot . match newlines as well (by default it does not), and the non-greedy .*? keeps the gap between /*~T*/ and the captured group as small as possible. re.MULTILINE only changes where ^ and $ match, so it has no effect on this pattern and could be omitted.

Printout:

_getmethis
_findmethis

Documentation:

• re.MULTILINE

• re.DOTALL
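
The difference between scanning line by line and scanning the whole text can be reproduced with a short sketch. The sample string and the variable names are made up for illustration, and io.StringIO stands in for an open file.

```python
import io
import re

source = "/*~T*/\n_getmethis = FALSE;\n/*~T*/\n_findmethis = FALSE;\n"
pattern = re.compile(r'/\*~T\*/[^\S\n]*\n\s*(_[a-zA-Z]*)')

# Line-wise: each chunk ends at the first newline, so the marker and the
# variable name never appear in the same chunk and nothing matches.
per_line = [m.group(1)
            for line in io.StringIO(source)
            for m in pattern.finditer(line)]

# Whole text: the pattern can cross the newline between marker and name.
whole = [m.group(1) for m in pattern.finditer(source)]

print(per_line)
print(whole)
```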















answered Nov 21 '18 at 15:56 by Patrick Artner, edited Nov 21 '18 at 18:03













• I was so silly: I always checked the regex but not the Python code. I will try this.

  – fastlearner
  Nov 21 '18 at 16:00

• But this also finds words where the underscore is in the middle of a variable name.

  – fastlearner
  Nov 21 '18 at 17:46

• @fastlearner Then adjust the pattern so that (_[aA-zZ]*) is only allowed after a newline and spaces. See the edit. If you want to experiment with regex, use regex101.com in Python mode: paste in your text and pattern, and modify the pattern until it fits. Your example text did not contain anything that had to be excluded.

  – Patrick Artner
  Nov 21 '18 at 18:05
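
The issue raised in these comments can be sketched as follows. The input string and both pattern names are assumptions for illustration: the loose pattern accepts the first underscore anywhere after the marker, while the tight one allows only whitespace between the marker's line break and the name.

```python
import re

text = "/*~T*/\ndo_not_findme_this = FALSE;\n/*~T*/\n  _findthat = TRUE;\n"

# Loose: any underscore after the marker qualifies, even mid-identifier.
loose = re.compile(r'/\*~T\*/.*?(_[a-zA-Z]*)', re.DOTALL)

# Tight: only whitespace may sit between the line break and the name.
tight = re.compile(r'/\*~T\*/[^\S\n]*\n\s*(_[a-zA-Z]*)')

loose_hits = [m.group(1) for m in loose.finditer(text)]
tight_hits = [m.group(1) for m in tight.finditer(text)]

print(loose_hits)
print(tight_hits)
```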






























This is my final version, in which I also avoid writing duplicates:

import re
fh = open('filename.c', "r")
filecontent = fh.read()
fh.close()
output = open("output.txt", "w")
createlist = []  # names already written to the output file
pattern = re.compile(r"(/\*~T\*/)(\s*?\n\s*)(_[a-zA-Z]*)")
for m in re.finditer(pattern, filecontent):
    if m.group(3) not in createlist:
        createlist.append(m.group(3))
        output.write(m.group(3))
        output.write('\n')
output.close()
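
As a possible alternative for the duplicate check, dict.fromkeys keeps the first occurrence of each name in order without scanning a list on every match. This is only a sketch: the function name unique_names and the inline sample are assumptions, and the pattern follows the style suggested elsewhere in this thread.

```python
import re

PATTERN = re.compile(r'/\*~T\*/[^\S\n]*\n\s*(_[a-zA-Z]*)')

def unique_names(source):
    """Return marked variable names in first-seen order, without repeats."""
    names = (m.group(1) for m in PATTERN.finditer(source))
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))

# Hypothetical input in which _a is marked twice.
sample = "/*~T*/\n_a = 1;\n/*~T*/\n_b = 2;\n/*~T*/\n_a = 3;\n"
print("\n".join(unique_names(sample)))
```

Writing the result to a file would then be a single output.write call on the joined string.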
















answered Nov 21 '18 at 20:33 by fastlearner





























