Extracting numbers from text files




















I have some text files from which I want to extract specific numbers. In particular, I want to search each file for the first occurrence of string1 and take the number that follows it. That is, I want to take all digits, dots, or minus signs and stop once any other character is reached. Then I want to write those numbers to a separate file.



Preferably I would be able to do this for multiple strings at once (so also look for string2, do the same there, and write the results in some listed format, say {numbers1,numbers2}). But this last part is less important.



How would I accomplish this?





I did not include specific data since I was hoping there was a general solution for the question I asked. Such a tool would be useful on numerous occasions. (I tried to piece together a general solution from the various questions on how to extract a number from a specific string, but failed.)



The data would look something like



bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth


The patterns to look for would then be label1_, label2_ or label3 =. (Of course it should work regardless of the exact form of label1. But since that apparently wasn't completely clear, let me add another example:
height_2.3 blabla_bla_length_3.4 should give 2.3, 3.4 or {2.3,3.4}, depending on whether we ask for height, length or both.)



And the output would be, if given one pattern to look for, say label1_



5234


or when looking for label3 =



-0.34343


In addition, it would be nice if it could search for two things at once and group them. For instance, giving both patterns above would output



{5234,-0.34343}


Finally, it would be nice if, when fed multiple files, it could output one such group per file:



{out1a,out1b}
{out2a,out2b}









      text-processing sed














      asked Feb 15 at 11:22 by Kvothe, edited Feb 18 at 13:01






















          3 Answers














          If you want all the results from a single file grouped together, then it's likely easiest to slurp the whole of each file into memory and process it as one block. You can do that in perl by unsetting the line separator - the conventional way to do that in a perl one-liner is -0777.



          Next you need a regular expression that matches a sequence of decimal digits, decimal separators etc. preceded by label[123]_ or label[123] =



          Putting it together:



          perl -0777nE 'say "{", (join ",", /label[123](?:_| = )\K[0-9.+-]+/g), "}"' file1 file2 [...]


          Note: I have not tried to address maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after, so every occurrence of a label that is followed by a number is reported, not just the first.
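One way to keep only the first match per label, with the results in a fixed label order so that each position in the braces belongs to a known label, is sketched below. It swaps the perl one-liner for a small shell function around awk. The function name first_numbers is made up for illustration, and labels are treated as literal text rather than as regexes.

```shell
# first_numbers FILE LABEL...   (hypothetical helper, not part of the answer)
# Prints {n1,n2,...}: for each LABEL, in the order given, the first run of
# digits, dots and minus signs that directly follows LABEL in FILE.
# The body runs in a subshell, so its variables do not leak to the caller.
first_numbers() (
    file=$1
    shift
    out=""
    for label in "$@"; do
        out="$out$(awk -v label="$label" '
            {
                s = $0
                # walk through every occurrence of the literal label on this line
                while ((i = index(s, label)) > 0) {
                    s = substr(s, i + length(label))
                    # accept it only if a number run starts right after the label
                    if (match(s, /^[-.0-9]+/)) {
                        print substr(s, 1, RLENGTH)
                        exit
                    }
                }
            }' "$file"),"
    done
    # drop the trailing comma and wrap the list in braces
    printf '{%s}\n' "${out%,}"
)
```

Called as `first_numbers file1 'label1_' 'label3 = '`, it prints one `{...}` line; looping over several files gives one line per file. An entry is left empty when a label is never directly followed by a number.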






























          • Thank you! Would you mind helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.

            – Kvothe
            Feb 18 at 11:37











          • Is there a way to get the output in some form where I know which number corresponds to which label? (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result].)

            – Kvothe
            Feb 18 at 11:37




































          sed solution



          With $p holding the label regex, e.g. p='label[13]\(_\| = \)':



          sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' |
          sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' |
          sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'


          The first command removes linebreaks and adds a new one after every match, the second one removes lines without a match and extracts the numbers and the third one makes them comma-separated and encloses them in curly brackets.
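To make the three stages easier to follow, here is a small sketch that runs them one at a time and keeps each intermediate result. The input text and the variable names s1, s2 and s3 are invented for illustration, and GNU sed is assumed (for `\n` in the replacement and for `\|` and `\+`).

```shell
p='\(height\|length\)_'        # label pattern with exactly one group
input='height_2.3 bla
bla_length_3.4 end
tail'

# stage 1: join all lines with spaces, then break the text after every match
s1=$(printf '%s\n' "$input" | sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g')

# stage 2: drop chunks without a match, keep only the number (group 2)
s2=$(printf '%s\n' "$s1" | sed '/'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/')

# stage 3: join the numbers with commas and wrap them in braces
s3=$(printf '%s\n' "$s2" | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/')

printf 'stage 1:\n%s\nstage 2:\n%s\nstage 3:\n%s\n' "$s1" "$s2" "$s3"
```

When only one number is found, stage 3 receives a single line. GNU sed's N then prints that line and quits before the braces are added, which is why a single-label search prints a bare number.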



          $p must hold a valid regex and exactly one group (or you need to adjust the RHS part of the third substitution expression), for example:



          p='label1\(_\)'
          p='label3\( = \)'
          p='label[13]\(_\| = \)'
          p='\(label1_\|label3 = \)'
          p='\(height\|length\)_'


          Multiple different strings in the group are to be separated by \|.



          Examples



          $ <input cat
          bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
          label3 = -0.34343
          and_more_text_and_so_on_and_so_forth
          $ p='label1\(_\)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          5234
          $ p='label3\( = \)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          -0.34343
          $ p='label[13]\(_\| = \)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          {5234,-0.34343}
          $ echo "height_2.3 blabla_bla_length_3.4" >>input
          $ p='\(height\)_'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          2.3
          $ p='\(height\|length\)_'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          {2.3,3.4}































          • Thank you! This is great! I have some questions if you don't mind? Firstly in the real case the labels are really different words so I would need to ask for alternatives in a different way. But somehow both my tries: p='(label1(_)|label3( = ))' and p='(label1|label3)(_| = )' fail (where each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as alternative operator and not a literal character and should thus not be escaped? And finally is it possible to match (or keep) only the first match (of a specific label)?

            – Kvothe
            Feb 18 at 10:46











          • @Kvothe If the patterns are in fact different please edit your question post accordingly.

            – dessert
            Feb 18 at 12:03











          • I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.

            – Kvothe
            Feb 18 at 12:57













          • @Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.

            – dessert
            Feb 18 at 13:30











          • Thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done in p='\(height\|length\)_'. Would you mind explaining why these need to be escaped? I had not expected that, since we don't want them to stand for the literal symbols; we want them to stand for the operators.

            – Kvothe
            Feb 18 at 13:42
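On the escaping question: by default sed uses basic regular expressions (BRE), where a bare (, ), | or + is a literal character, and GNU sed treats only the backslashed forms \(, \), \| and \+ as operators. With -E (extended regular expressions) the roles are reversed. A minimal sketch with made-up input, assuming GNU sed:

```shell
line='height_2.3 blabla_bla_length_3.4'

# BRE (default): grouping, alternation and + need backslashes
bre=$(printf '%s\n' "$line" | sed 's/\(height\|length\)_\([-.0-9]\+\)/<\2>/g')

# ERE (-E): the same operators are written bare
ere=$(printf '%s\n' "$line" | sed -E 's/(height|length)_([-.0-9]+)/<\2>/g')

# BRE without backslashes: ( | ) are literal characters, so nothing matches
none=$(printf '%s\n' "$line" | sed 's/(height|length)_/X/g')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$none"
```

The first two commands give the same result, while the third leaves the line unchanged because it looks for the literal text (height|length)_.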

































          For a single file:



          # first number after each label, one per line
          grep -oP "(?<=label1_)[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
          grep -oP "(?<=label3 = )[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
          # join the lines with commas and wrap them in braces
          paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
          rm ./tmpfile


          For multiple files in a folder.

          cd to the folder and run:



          for file in *; do
              # skip the output file so it is not scanned as input
              if [ "$file" = "newfile" ]; then continue; fi
              grep -oP "(?<=label1_)[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
              grep -oP "(?<=label3 = )[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
              paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
              rm ./tmpfile
          done
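The same loop can also be written without a temporary file by capturing each grep result in a variable. This is only a sketch: the function name pair_per_file is hypothetical, the two labels are the ones from the question, and GNU grep with -P support is assumed, as in the commands above.

```shell
# pair_per_file FILE...   (hypothetical name)
# Prints one {label1,label3} line per file; no temporary file is needed.
pair_per_file() {
    for file in "$@"; do
        # first number that directly follows each label in this file
        a=$(grep -oP '(?<=label1_)[0-9.+-]+' "$file" | head -n 1)
        b=$(grep -oP '(?<=label3 = )[0-9.+-]+' "$file" | head -n 1)
        printf '{%s,%s}\n' "$a" "$b"
    done
}
```

For example, `pair_per_file file1 file2 > newfile` writes one line per input file. A value is left empty if its label does not occur in that file.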































          • Thanks! The second pipe does not seem to do what I wanted. It takes out the minus sign I wanted to keep for example. I edited to a form where I think it answers my question. (Also added the taking of only the first resulting match). sed 's/_/ /g' ./file | grep -oP "(?<=label1 )[^ ]+" | grep -oE -m1 '(-)?[0-9](.)?([0-9]+)?'

            – Kvothe
            Feb 18 at 11:16











          • Finally I am confused by the space in (?<=label1 ). There was no space after label1. There was instead an underscore. What is going on here? Why should there be a space here?

            – Kvothe
            Feb 18 at 11:19











          • @Kvothe Thanks. Improved, edited and added "for" loop

            – Vijay
            Feb 19 at 8:55






















          answered Feb 15 at 14:38 by steeldriver (the perl answer above)













          • Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.

            – Kvothe
            Feb 18 at 11:37











          • Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).

            – Kvothe
            Feb 18 at 11:37



















          • Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.

            – Kvothe
            Feb 18 at 11:37











          • Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).

            – Kvothe
            Feb 18 at 11:37

















          Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.

          – Kvothe
          Feb 18 at 11:37





          Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.

          – Kvothe
          Feb 18 at 11:37













          Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).

          – Kvothe
          Feb 18 at 11:37





          Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).

          – Kvothe
          Feb 18 at 11:37













          2

















          sed solution



          With $p holding the label regex, e.g. p='label[13](_| = )':



          sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | 
          sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' |
          sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'


          The first command removes linebreaks and adds a new one after every match, the second one removes lines without a match and extracts the numbers and the third one makes them comma-separated and encloses them in curly brackets.



          $p must hold a valid regex and exactly one group (or you need to adjust the RHS part of the third substitution expression), for example:



          p='label1(_)'
          p='label3( = )'
          p='label[13](_| = )'
          p='(label1_|label3 = )'
          p='(height|length)_'


          Multiple different strings in the group are to be separated by |.



          Examples



          $ <input cat
          bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
          label3 = -0.34343
          and_more_text_and_so_on_and_so_forth
          $ p='label1(_)'
          $ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
          5234
          $ p='label3( = )'
          $ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
          -0.34343
          $ p='label[13](_| = )'
          $ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
          {5234,-0.34343}
          $ echo "height_2.3 blabla_bla_length_3.4" >>input
          $ p='(height)_'
          $ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
          2.3
          $ p='(height|length)_'
          $ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
          {2.3,3.4}





          sed solution



          With $p holding the label regex, e.g. p='label[13]\(_\| = \)':

          sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' |
          sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' |
          sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'


          The first command joins all lines into one (replacing each line break with a space) and then inserts a line break after every match. The second deletes the lines that contain no match and reduces each remaining line to its number. The third joins the numbers with commas and encloses them in curly brackets.
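
          To inspect what each stage does, the pipeline can be split into three small functions. This is only a sketch: the names stage1, stage2 and stage3 are made up for illustration, and it assumes GNU sed, since \+, \| and \n in the replacement are GNU extensions.

```shell
#!/bin/sh
# Sketch only: stage1..stage3 are illustrative names, not part of the answer.
# Assumes GNU sed (\+, \| and \n in the replacement are GNU extensions).
# $1 is the label regex with exactly one \(...\) group, e.g. 'label[13]\(_\| = \)'.

# Stage 1: join all input lines with spaces, then break the line after every match.
stage1() { sed ':a;N;$!ba;s/\n/ /g;s/'"$1"'[-.0-9]\+/&\n/g'; }

# Stage 2: delete lines without a match; reduce each remaining line to its number.
stage2() { sed '/.*'"$1"'[-.0-9]\+/!d;s/.*'"$1"'\([-.0-9]\+\)/\2/'; }

# Stage 3: join the numbers with commas and wrap the result in braces.
# With a single input line, GNU sed's N prints the line and quits,
# so a lone number comes out without braces.
stage3() { sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'; }
```

          Running stage1 "$p" <input | stage2 "$p" shows one number per line, before stage3 joins them.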



          $p must hold a valid regex containing exactly one group; with a different number of groups, adjust the backreference \2 on the right-hand side of the third substitution expression. For example:

          p='label1\(_\)'
          p='label3\( = \)'
          p='label[13]\(_\| = \)'
          p='\(label1_\|label3 = \)'
          p='\(height\|length\)_'


          Alternative strings inside the group are separated by \|.



          Examples



          $ <input cat
          bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
          label3 = -0.34343
          and_more_text_and_so_on_and_so_forth
          $ p='label1\(_\)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          5234
          $ p='label3\( = \)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          -0.34343
          $ p='label[13]\(_\| = \)'
          $ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          {5234,-0.34343}
          $ echo "height_2.3 blabla_bla_length_3.4" >>input
          $ p='\(height\)_'
          $ <input2 sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          2.3
          $ p='\(height\|length\)_'
          $ <input2 sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
          {2.3,3.4}
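
          On the escaping: sed uses basic regular expressions (BRE) by default, in which grouping and alternation are written \( \) and \| (the latter a GNU extension), while a bare ( ) or | is a literal character. With extended regular expressions (grep -E, sed -E) the same operators are written unescaped. A minimal sketch under that assumption follows; the function name extract_numbers is hypothetical, grep -o is assumed to be available (GNU and BSD grep have it), and the result is always printed comma-separated without braces.

```shell
#!/bin/sh
# Sketch only: extract_numbers is an illustrative name.
# $1 is the label as an extended regex (ERE), e.g. 'label1_' or '(height|length)_'.
# Reads text on stdin; prints the numbers following each match, comma-separated.
# Note: the label must not contain a '/', which would end the sed pattern early.
extract_numbers() {
    grep -oE "($1)[-.0-9]+" |   # every "label + number" match, one per line
        sed -E "s/^($1)//" |    # strip the label, keep the number
        paste -sd, -            # join the lines with commas
}
```

          For example, extract_numbers '(height|length)_' <input2 reads the file on standard input, just like the sed pipeline.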






          edited Feb 18 at 13:27

























          answered Feb 15 at 14:36









          dessert

          • Thank you! This is great! I have some questions if you don't mind? Firstly in the real case the labels are really different words so I would need to ask for alternatives in a different way. But somehow both my tries: p='(label1(_)|label3( = ))' and p='(label1|label3)(_| = )' fail (where each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as alternative operator and not a literal character and should thus not be escaped? And finally is it possible to match (or keep) only the first match (of a specific label)?

            – Kvothe
            Feb 18 at 10:46











          • @Kvothe If the patterns are in fact different please edit your question post accordingly.

            – dessert
            Feb 18 at 12:03











          • I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.

            – Kvothe
            Feb 18 at 12:57













          • @Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.

            – dessert
            Feb 18 at 13:30











          • thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done in p='\(height\|length\)_'. Would you mind explaining why these need to be escaped? I had not expected that since we don't want them to stand for the literal symbol we want them to stand for the operators.

            – Kvothe
            Feb 18 at 13:42



















          For a single file:

          # First number after each label (-P needs GNU grep with PCRE support)
          grep -oP "(?<=label1_)[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
          grep -oP "(?<=label3 = )[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
          # Join the numbers with commas and wrap them in braces
          paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
          rm ./tmpfile


          For multiple files in a folder, cd to the folder and run:

          for file in *; do
              # Skip the output file itself
              if [ "$file" = "newfile" ]; then continue; fi
              grep -oP "(?<=label1_)[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
              grep -oP "(?<=label3 = )[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
              paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
              rm ./tmpfile
          done
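
          A variant that avoids the temporary file and stops at the first match of each label can be sketched with awk. The function names first_number and grouped are hypothetical; each label is treated as an extended regex, and a label that never matches simply contributes an empty field.

```shell
#!/bin/sh
# Sketch only: first_number and grouped are illustrative names.

# first_number LABEL FILE
# Print the number that follows the first match of LABEL (an ERE) in FILE.
# Note: awk -v interprets backslash escapes in LABEL.
first_number() {
    awk -v re="$1" '
        match($0, re "[-.0-9]+") {
            s = substr($0, RSTART, RLENGTH)  # label followed by the number
            sub(re, "", s)                   # drop the label
            print s
            exit                             # ignore all later matches
        }' "$2"
}

# grouped FILE LABEL...
# Print the first number for each label as {n1,n2,...}.
grouped() {
    file=$1
    shift
    out=
    sep=
    for label in "$@"; do
        out="$out$sep$(first_number "$label" "$file")"
        sep=,
    done
    printf '{%s}\n' "$out"
}
```

          For example, grouped ./file 'label1_' 'label3 = ' >> ./newfile appends one grouped line per call.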






          edited Feb 19 at 12:39

























          answered Feb 15 at 14:23









          Vijay

          • Thanks! The second pipe does not seem to do what I wanted. It takes out the minus sign I wanted to keep for example. I edited to a form where I think it answers my question. (Also added the taking of only the first resulting match). sed 's/_/ /g' ./file | grep -oP "(?<=label1 )[^ ]+" | grep -oE -m1 '(-)?[0-9](.)?([0-9]+)?'

            – Kvothe
            Feb 18 at 11:16











          • Finally I am confused by the space in (?<=label1 ). There was no space after label1. There was instead an underscore. What is going on here? Why should there be a space here?

            – Kvothe
            Feb 18 at 11:19











          • @Kvothe Thanks. Improved, edited and added "for" loop

            – Vijay
            Feb 19 at 8:55


















