Find words starting with an underscore on the line after a specific pattern

I need to find the following in C code using a regular expression in Python, but somehow I could not write it properly.

if(condition)
     /*~T*/
     {
        /*~T*/
        _getmethis = FALSE;
     /*~T*/
     }
..........
/*~T*/
     _findmethis = FALSE;
......
                    /*~T*/
_findthat = True;

I need to find all variables that start with an underscore and follow a /*~T*/ marker, and write them to a new file. My code does not find them: I tried several regex patterns, but the output file is always empty.

import re

fh = open('filename.c', "r")
output = open("output.txt", "w")
pattern = re.compile(r'(/\*~T\*/)(\s*?\n\s*)(_[aA-zZ]*)')
for line in fh:
    for m in re.finditer(pattern, line):
        output.write(m.group(3))
        output.write("\n")

output.close()

edited Nov 21 '18 at 15:57

asked Nov 21 '18 at 15:44

fastlearner

[aA-zZ] does not only match letters, it also matches [, \, ], ^, _, and `. You must have meant [a-zA-Z]. All you need to do is remove for line in fh: and use re.finditer(pattern, fh.read()).

– Wiktor Stribiżew
Nov 21 '18 at 16:55
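
To illustrate the point about the character class, here is a minimal sketch (the variable names are only for illustration). It relies on the fact that the range A-z covers ASCII codes 65 to 122, which includes the six punctuation characters between Z and a.

```python
import re

# [aA-zZ] is parsed as: 'a', the range 'A'..'z', and 'Z'.
# The range A-z spans ASCII 65..122, which includes six non-letters (91..96).
broad = re.compile(r'[aA-zZ]')
strict = re.compile(r'[a-zA-Z]')

non_letters = [chr(code) for code in range(91, 97)]  # [ \ ] ^ _ `

# Every one of these punctuation characters is accepted by the broad class...
print([bool(broad.fullmatch(ch)) for ch in non_letters])
# ...and rejected by the class that lists the two letter ranges explicitly.
print([bool(strict.fullmatch(ch)) for ch in non_letters])
```

One side effect worth knowing: because _ falls inside A-z, the broad class also matches underscores inside a name, whereas [a-zA-Z] stops at the first one.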

regex python-3.x

3 Answers

You need to read the file in as a whole with fh.read() and make sure you amend the pattern to only match letters since [aA-zZ] matches more than just letters.

The pattern I suggest is

(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)

See the regex demo. Note that I deliberately subtracted \n from the first \s* to make matching more efficient.
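
As a rough check of how this pattern behaves, the sketch below runs it on an in-memory string. The sample text is hypothetical (it only mirrors the question's example, plus one marker that has its variable on the same line). The class [^\S\n] means "whitespace, but not a newline", so the second group can only cross a line through the explicit \n.

```python
import re

# Hypothetical sample; the marker and names mirror the question's example.
sample = (
    "if(condition)\n"
    "     /*~T*/\n"
    "     {\n"
    "        /*~T*/   \n"
    "        _getmethis = FALSE;\n"
    "/*~T*/\n"
    "     _findmethis = FALSE;\n"
    "     /*~T*/ _sameline = FALSE;\n"
)

pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')

# Group 3 holds the name; a marker followed by '{' or by a name on the
# same line does not match, because a newline is required in between.
names = [m.group(3) for m in pattern.finditer(sample)]
print(names)  # ['_getmethis', '_findmethis']
```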

When reading files in, it is more convenient to use with so that you do not have to use .close():

import re

pattern = re.compile(r'(/\*~T\*/)([^\S\n]*\n\s*)(_[a-zA-Z]*)')

with open('filename.c', "r") as fh:
    contents = fh.read()
    with open("output.txt", "w") as output:
        output.write("\n".join([x.group(3) for x in pattern.finditer(contents)]))

answered Nov 21 '18 at 18:25

Wiktor Stribiżew

The reason you do not find anything is that your pattern crosses multiple lines but you are only looking at your file one line at a time.
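
A small sketch of that difference, using io.StringIO as a stand-in for the opened file (the source string is a made-up two-line excerpt):

```python
import io
import re

pattern = re.compile(r'(/\*~T\*/)(\s*?\n\s*)(_[a-zA-Z]*)')
source = "/*~T*/\n     _findmethis = FALSE;\n"

# Line by line: each chunk ends at its own newline, so the marker and the
# variable never appear in the same string.
per_line = [m.group(3)
            for line in io.StringIO(source)
            for m in pattern.finditer(line)]

# Whole text: the marker, the newline and the variable are in one string.
whole = [m.group(3) for m in pattern.finditer(source)]

print(per_line)  # []
print(whole)     # ['_findmethis']
```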

Consider using this:

t = """
if(condition)
     /*~-*/
     {
        /*~T*/
        _getmethis = FALSE;
     /*~-*/
     }
..........
/*~T*/
     _findmethis = FALSE;

     /*~T*/
     do_not_findme_this = FALSE;
"""

import re

pattern = re.compile(r'/\*~T\*/.*?\n\s+(_[aA-zZ]*)', re.MULTILINE|re.DOTALL)
for m in re.finditer(pattern, t):  # use the whole file here - not line-wise
    print(m.group(1))

The pattern uses two flags: re.DOTALL makes the dot . match newlines as well (by default it does not), and re.MULTILINE makes ^ and $ match at every line boundary (the pattern has no anchors, so this flag is not strictly needed here). The non-greedy .*? keeps the gap between /*~T*/ and the following group as small as possible.

Printout:

_getmethis
_findmethis

Documentation:

re.MULTILINE

re.DOTALL
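
To make the effect of the two flags concrete, here is a short sketch on a made-up three-line string. The simplified patterns are only for illustration and are not the ones from the answer.

```python
import re

text = "/*~T*/\n_a\n_b"

# Without DOTALL, '.' stops at a newline, so '.*?' cannot bridge two lines.
no_dotall = re.findall(r'/\*~T\*/.*?(_[a-z]+)', text)
# With DOTALL, '.' also matches '\n'.
dotall = re.findall(r'/\*~T\*/.*?(_[a-z]+)', text, re.DOTALL)

# MULTILINE only changes the anchors: '^' then matches at every line start.
no_multiline = re.findall(r'^_[a-z]+', text)
multiline = re.findall(r'^_[a-z]+', text, re.MULTILINE)

print(no_dotall, dotall)        # [] ['_a']
print(no_multiline, multiline)  # [] ['_a', '_b']
```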

edited Nov 21 '18 at 18:03

answered Nov 21 '18 at 15:56

Patrick Artner

Silly of me: I always checked the regex but not the Python code. I will try this.

– fastlearner
Nov 21 '18 at 16:00

But this also finds words where the underscore is in the middle of a variable name.

– fastlearner
Nov 21 '18 at 17:46

@fastlearner Then adjust the pattern? So the (_[aA-zZ]*) is only allowed after a newline and spaces? See edit ... if you want to play with regex, use regex101.com and put it in Python mode: copy your text and pattern into it and modify the pattern until it fits. Your example text did not contain any pattern "to be excluded" ...

– Patrick Artner
Nov 21 '18 at 18:05

This is my final version, where I also try to avoid duplicates:

import re

fh = open('filename.c', "r")
filecontent = fh.read()
fh.close()
output = open("output.txt", "w")
createlist = []  # names already written, used to skip duplicates
pattern = re.compile(r"(/\*~T\*/)(\s*?\n\s*)(_[aA-zZ]*)")
for m in re.finditer(pattern, filecontent):
    if m.group(3) not in createlist:
        createlist.append(m.group(3))
        output.write(m.group(3))
        output.write('\n')
output.close()
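
The duplicate check could also be written without a separate list. The sketch below is one possible alternative, not the poster's method: the helper name unique_names and the sample string are made up, and it borrows the stricter pattern suggested in the other answer. It relies on dict keys keeping insertion order (Python 3.7+).

```python
import re

pattern = re.compile(r'/\*~T\*/[^\S\n]*\n\s*(_[a-zA-Z]*)')

def unique_names(text):
    """Return matched names in first-seen order, without duplicates."""
    # dict.fromkeys keeps the first occurrence of each key and preserves
    # order, which avoids a list lookup for every match.
    return list(dict.fromkeys(m.group(1) for m in pattern.finditer(text)))

sample = "/*~T*/\n_a = 1;\n/*~T*/\n_b = 2;\n/*~T*/\n_a = 3;\n"
print(unique_names(sample))  # ['_a', '_b']
```

The resulting list can then be written in one call, for example with "\n".join(...) inside a with open(...) block.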

answered Nov 21 '18 at 20:33

fastlearner
