Extracting numbers from text files
I have some text files from which I want to extract certain numbers. In particular, I want to search each file for the first occurrence of `string1` and take the number that follows it: all digits, dots, or minus signs, stopping as soon as any other character is reached. Then I want to write those numbers to a separate file.
Preferably I would be able to do this for multiple strings at once (so also look for `string2`, do the same there, and write the results in some list format, say `{numbers1,numbers2}`). But this last part is less important.
How would I accomplish this?
I did not include specific data since I was hoping there was a general solution to this question; such a tool would be useful on numerous occasions. (I tried to piece together a general solution from the various questions on how to extract a number following a specific string, but failed.)
The data would look something like
bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth
The patterns to look for would then be `label1_`, `label2_` or `label3 = `. (Of course it should work regardless of the exact form of `label1`. Since that apparently wasn't completely clear, here is another example: `height_2.3 blabla_bla_length_3.4` should give `2.3`, `3.4` or `{2.3,3.4}`, depending on whether we ask for height, length or both.)
Given one pattern to look for, say `label1_`, the output would be
5234
or, when looking for `label3 = `,
-0.34343
In addition, it would be nice if it could search for two things at once and group them. For instance, given both patterns above, it would output
{5234,-0.34343}
Finally, when fed multiple files, it would be nice if it grouped the results per file:
{out1a,out1b}
{out2a,out2b}
text-processing sed
edited Feb 18 at 13:01 by Kvothe
asked Feb 15 at 11:22 by Kvothe
3 Answers
If you want all the results from a single file grouped together, then it's likely easiest to slurp the whole of each file into memory and process it as one block. You can do that in perl by unsetting the input record separator; the conventional way to do that in a perl one-liner is `-0777`.
Next you need a regular expression that matches a sequence of decimal digits, decimal separators etc. preceded by `label[123]_` or `label[123] = `. The `\K` keeps the label out of the returned match, so only the number is collected.
Putting it together:
perl -0777nE 'say "{", (join ",", /label[123](?:_| = )\K[0-9.+-]+/g), "}"' file1 file2 [...]
Note: I have not tried to address maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after
Thank you! Would you mind helping me out with one last thing? I was thinking of the output in terms of an ordered pair, so that I know which number corresponds to which label. Another format in which this is still clear would also be great, of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.
– Kvothe
Feb 18 at 11:37
Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).
– Kvothe
Feb 18 at 11:37
sed solution
With `$p` holding the label regex, e.g. `p='label[13]\(_\| = \)'`:
sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' |
sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' |
sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
The first command joins all lines into one and inserts a newline after every match; the second deletes the lines without a match and extracts the numbers from the rest; the third joins the numbers with commas and encloses them in curly brackets.
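To see what the first stage does on its own, here is a small sketch that runs only that command on a three-line sample. It assumes GNU sed, because `\+` and `\|` in a basic regex and `\n` in the replacement are GNU extensions; the helper name `stage1` and the sample text are made up for illustration.

```shell
# Hypothetical helper that runs only the first stage of the pipeline.
# Assumes GNU sed: \+ and \| in a basic regex and \n in the replacement are GNU extensions.
stage1() {
  p=$1
  # :a;N;$!ba gathers all lines into the pattern space, s/\n/ /g joins them
  # with spaces, and the last substitution adds a newline after each match.
  sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g'
}

# Sample input: three lines, two of which contain a label followed by a number.
printf 'bla label1_5234_x\nlabel3 = -0.34343\nend\n' | stage1 'label[13]\(_\| = \)'
```

With GNU sed this should print the three lines `bla label1_5234`, `_x label3 = -0.34343` and ` end`: every match now ends its own line, which is what lets the second stage work line by line.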
`$p` must hold a valid basic regex with exactly one group (otherwise you need to adjust the `\2` on the right-hand side of the third substitution), for example:
p='label1\(_\)'
p='label3\( = \)'
p='label[13]\(_\| = \)'
p='\(label1_\|label3 = \)'
p='\(height\|length\)_'
Multiple different strings in the group are separated by `\|`.
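The backslashes come from the regex dialect: by default sed uses basic regular expressions (BRE), in which bare `(`, `)`, `|` and `+` are ordinary characters and the escaped forms `\(`, `\)`, `\|`, `\+` are the operators (`\|` and `\+` being GNU extensions). With `sed -E` (extended regular expressions) it is the other way round. A sketch of the same pipeline in ERE syntax, assuming GNU sed; `extract_ere` is a hypothetical name:

```shell
# Hypothetical ERE version of the three-stage pipeline (assumes GNU sed with -E).
# In ERE, ( ) | and + are operators without backslashes, so $p is written
# like a grep -E or Perl pattern, e.g. '(height|length)_'.
extract_ere() {
  p=$1
  sed -E ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]+/&\n/g' |
  sed -E '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/\2/' |
  sed -E ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
}
```

For the sample input, `<input extract_ere '(label1_|label3 = )'` should print `{5234,-0.34343}`. `$p` still has to contain exactly one group: with `(label1|label3)(_| = )` the number would land in group 3, not 2.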
Examples
$ <input cat
bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth
$ p='label1\(_\)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
5234
$ p='label3\( = \)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
-0.34343
$ p='label[13]\(_\| = \)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
{5234,-0.34343}
$ echo "height_2.3 blabla_bla_length_3.4" >>input
$ p='\(height\)_'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
2.3
$ p='\(height\|length\)_'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
{2.3,3.4}
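Two quirks show up in these examples: a single result is printed without braces (`5234`, `2.3`), and with a one-line input the first stage would skip its substitutions altogether. Both appear to come from GNU sed's `N`, which, when there is no next line, prints the pattern space and exits without running the rest of the script. A sketch that sidesteps `N` by joining with `tr` and `paste` instead (a swapped-in technique, not part of the answer above; `extract_braced` is a hypothetical name, and GNU sed is still assumed for `\+`, `\|` and `\n`):

```shell
# Hypothetical variant of the three-stage pipeline that avoids sed's N command.
extract_braced() {
  p=$1
  # Stage 1: tr turns every newline into a space, sed adds a newline after each match.
  tr '\n' ' ' |
  sed 's/'"$p"'[-.0-9]\+/&\n/g' |
  # Stage 2 (unchanged): keep matching lines and reduce them to the number (group 2).
  sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' |
  # Stage 3: paste joins the numbers with commas, sed wraps the result in braces.
  paste -sd, - |
  sed 's/.*/{&}/'
}
```

With this variant `<input extract_braced 'label1\(_\)'` should print `{5234}` rather than a bare `5234`, so single and multiple results share one format, and a one-line file such as `height_2.3 blabla_bla_length_3.4` is split correctly as well.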
Thank you! This is great! I have some questions, if you don't mind. Firstly, in the real case the labels are really different words, so I would need to ask for alternatives in a different way. But somehow both my tries, p='(label1(_)|label3( = ))' and p='(label1|label3)(_| = )', fail (although each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as the alternation operator rather than a literal character, so that it should not be escaped? And finally, is it possible to match (or keep) only the first match of a specific label?
– Kvothe
Feb 18 at 10:46
@Kvothe If the patterns are in fact different please edit your question post accordingly.
– dessert
Feb 18 at 12:03
I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.
– Kvothe
Feb 18 at 12:57
@Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.
– dessert
Feb 18 at 13:30
thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done in p='\(height\|length\)_'. Would you mind explaining why these need to be escaped? I had not expected that, since we don't want them to stand for the literal symbols; we want them to stand for the operators.
– Kvothe
Feb 18 at 13:42
For a single file:
grep -oP "(?<=label1_)[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
grep -oP "(?<=label3 = )[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
rm ./tmpfile
For multiple files in a folder, cd to the folder and run:
for file in *; do
  if [ "$file" = "newfile" ]; then continue; fi
  grep -oP "(?<=label1_)[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
  grep -oP "(?<=label3 = )[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
  paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
  rm ./tmpfile
done
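The loop above starts grep twice per file and goes through a temporary file. A single awk pass can do the same job and stop at the first occurrence of each label. This is a sketch with three assumptions: the labels are literal strings (awk's `index()` does no regex matching), a value consists of digits, dots and minus signs as described in the question, and the `first_numbers` helper with its `--` separator is hypothetical.

```shell
# Hypothetical helper: first_numbers LABEL... -- FILE...
# Prints one {v1,v2,...} line per file: the number right after the first
# occurrence of each LABEL (empty if the label is missing or not followed by a number).
first_numbers() {
  labels=
  while [ "$#" -gt 0 ] && [ "$1" != -- ]; do
    labels="$labels$1
"
    shift
  done
  [ "$#" -gt 0 ] && shift   # drop the -- separator
  # Labels go through the environment so awk takes them verbatim (no escape processing).
  LABELS=$labels awk '
    function flush(   i, out) {
      out = ""
      for (i = 1; i <= n; i++) out = out (i > 1 ? "," : "") val[i]
      print "{" out "}"
      for (i = 1; i <= n; i++) { val[i] = ""; seen[i] = 0 }
    }
    BEGIN {
      n = split(ENVIRON["LABELS"], label, "\n")
      if (n > 0 && label[n] == "") n--   # drop the empty field after the last newline
    }
    FNR == 1 && NR > 1 { flush() }        # a new file starts: report the previous one
    {
      for (i = 1; i <= n; i++) {
        if (seen[i]) continue
        pos = index($0, label[i])
        if (pos == 0) continue
        seen[i] = 1                         # only the first occurrence counts
        rest = substr($0, pos + length(label[i]))
        if (match(rest, /^[-.0-9]+/)) val[i] = substr(rest, 1, RLENGTH)
      }
    }
    END { if (NR > 0) flush() }
  ' "$@"
}
```

For example, `first_numbers 'label1_' 'label3 = ' -- file1 file2 > newfile` would write one `{out1a,out1b}` line per file. Empty files produce no line, because awk never sees a first record for them.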
Thanks! The second pipe does not seem to do what I wanted; for example, it removes the minus sign I wanted to keep. I edited it into a form that I think answers my question (I also added taking only the first match): sed 's/_/ /g' ./file | grep -oP "(?<=label1 )[^ ]+" | grep -oE -m1 '(-)?[0-9](\.)?([0-9]+)?'
– Kvothe
Feb 18 at 11:16
Finally I am confused by the space in (?<=label1 ). There was no space after label1. There was instead an underscore. What is going on here? Why should there be a space here?
– Kvothe
Feb 18 at 11:19
@Kvothe Thanks. Improved, edited and added "for" loop
– Vijay
Feb 19 at 8:55
The perl answer above was posted Feb 15 at 14:38 by steeldriver.
Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.
– Kvothe
Feb 18 at 11:37
Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).
– Kvothe
Feb 18 at 11:37
add a comment |
Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.
– Kvothe
Feb 18 at 11:37
Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).
– Kvothe
Feb 18 at 11:37
Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.
– Kvothe
Feb 18 at 11:37
Thank you! Would you might helping me out with one last thing? I was thinking of the output in terms of an ordered pair so that I know which number corresponds to which label. Another format in which this is still clear would also be great of course. The problem with the solution above is that if I use /(?:label1|label2)(?:_| = ) , it gives results in order of occurrence and I don't know which result corresponds to which label.
– Kvothe
Feb 18 at 11:37
Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).
– Kvothe
Feb 18 at 11:37
Is there a way so that the output could be in some form where I know which number corresponds to which label. (For example either keeping the same order as the input or maybe formatted as label1[result],label2[result]).
– Kvothe
Feb 18 at 11:37
add a comment |
sed
solution
With $p
holding the label regex, e.g. p='label[13](_| = )'
:
sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' |
sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' |
sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
The first command removes linebreaks and adds a new one after every match, the second one removes lines without a match and extracts the numbers and the third one makes them comma-separated and encloses them in curly brackets.
$p
must hold a valid regex and exactly one group (or you need to adjust the RHS part of the third substitution expression), for example:
p='label1(_)'
p='label3( = )'
p='label[13](_| = )'
p='(label1_|label3 = )'
p='(height|length)_'
Multiple different strings in the group are to be separated by |
.
Examples
$ <input cat
bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth
$ p='label1(_)'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
5234
$ p='label3( = )'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
-0.34343
$ p='label[13](_| = )'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
{5234,-0.34343}
$ echo "height_2.3 blabla_bla_length_3.4" >>input
$ p='(height)_'
$ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
2.3
$ p='(height|length)_'
$ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
{2.3,3.4}
Thank you! This is great! I have some questions if you don't mind? Firstly in the real case the labels are really different words so I would need to ask for alternatives in a different way. But somehow both my tries:p='(label1(_)|label3( = ))'
andp='(label1|label3)(_| = )'
fail (where each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as alternative operator and not a literal character and should thus not be escaped? And finally is it possible to match (or keep) only the first match (of a specific label)?
– Kvothe
Feb 18 at 10:46
@Kvothe If the patterns are in fact different please edit your question post accordingly.
– dessert
Feb 18 at 12:03
I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.
– Kvothe
Feb 18 at 12:57
@Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.
– dessert
Feb 18 at 13:30
thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done inp='(height|length)_'
. Would you mind explaining why these need to be escaped? I had not expected that since we don't want them to stand for the literal symbol we want them to stand for the operators.
– Kvothe
Feb 18 at 13:42
|
show 1 more comment
sed
solution
With $p
holding the label regex, e.g. p='label[13](_| = )'
:
sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' |
sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' |
sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
The first command removes linebreaks and adds a new one after every match, the second one removes lines without a match and extracts the numbers and the third one makes them comma-separated and encloses them in curly brackets.
$p
must hold a valid regex and exactly one group (or you need to adjust the RHS part of the third substitution expression), for example:
p='label1(_)'
p='label3( = )'
p='label[13](_| = )'
p='(label1_|label3 = )'
p='(height|length)_'
Multiple different strings in the group are to be separated by |
.
Examples
$ <input cat
bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth
$ p='label1(_)'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
5234
$ p='label3( = )'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
-0.34343
$ p='label[13](_| = )'
$ <input sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
{5234,-0.34343}
$ echo "height_2.3 blabla_bla_length_3.4" >>input
$ p='(height)_'
$ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
2.3
$ p='(height|length)_'
$ <input2 sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' | sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' | sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
{2.3,3.4}
Thank you! This is great! I have some questions if you don't mind? Firstly in the real case the labels are really different words so I would need to ask for alternatives in a different way. But somehow both my tries:p='(label1(_)|label3( = ))'
andp='(label1|label3)(_| = )'
fail (where each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as alternative operator and not a literal character and should thus not be escaped? And finally is it possible to match (or keep) only the first match (of a specific label)?
– Kvothe
Feb 18 at 10:46
@Kvothe If the patterns are in fact different please edit your question post accordingly.
– dessert
Feb 18 at 12:03
I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.
– Kvothe
Feb 18 at 12:57
@Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.
– dessert
Feb 18 at 13:30
thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done inp='(height|length)_'
. Would you mind explaining why these need to be escaped? I had not expected that since we don't want them to stand for the literal symbol we want them to stand for the operators.
– Kvothe
Feb 18 at 13:42
|
show 1 more comment
sed
solution
With $p
holding the label regex, e.g. p='label[13](_| = )'
:
sed ':a;N;$!ba;s/n/ /g;s/'"$p"'[-.0-9]+/&n/g' |
sed '/.*'"$p"'[-.0-9]+/!d;s/.*'"$p"'([-.0-9]+)/2/' |
sed ':a;N;$!ba;s/n/,/g;s/.*/{&}/'
The first command removes linebreaks and adds a new one after every match, the second one removes lines without a match and extracts the numbers and the third one makes them comma-separated and encloses them in curly brackets.
$p
must hold a valid regex and exactly one group (or you need to adjust the RHS part of the third substitution expression), for example:
p='label1(_)'
p='label3( = )'
p='label[13](_| = )'
p='(label1_|label3 = )'
p='(height|length)_'
Multiple different strings in the group are to be separated by |
.
Examples
$ <input cat
bla bla bla label1_5234_blablab_some_other_text_and_numbers_23343_blabla_more_text_and_numbers_maybe_label1_again_but_now_I_no_longer_care_about_what_comes_after blabla_label2_34343_this_is_some_other_number_want_to_be_able_to_extract_if_I_look_for_label2_instead_of_label1
label3 = -0.34343
and_more_text_and_so_on_and_so_forth
$ p='label1\(_\)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
5234
$ p='label3\( = \)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
-0.34343
$ p='label[13]\(_\| = \)'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
{5234,-0.34343}
$ echo "height_2.3 blabla_bla_length_3.4" >>input
$ p='\(height\)_'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
2.3
$ p='\(height\|length\)_'
$ <input sed ':a;N;$!ba;s/\n/ /g;s/'"$p"'[-.0-9]\+/&\n/g' | sed '/.*'"$p"'[-.0-9]\+/!d;s/.*'"$p"'\([-.0-9]\+\)/\2/' | sed ':a;N;$!ba;s/\n/,/g;s/.*/{&}/'
{2.3,3.4}
edited Feb 18 at 13:27
answered Feb 15 at 14:36
dessert (25.5k reputation)
Thank you! This is great! I have some questions, if you don't mind. Firstly, in the real case the labels are really different words, so I would need to ask for alternatives in a different way. But somehow both my tries, p='(label1(_)|label3( = ))' and p='(label1|label3)(_| = )', fail (while each individual pattern does find what I want). What am I doing wrong? Also, why does | have to be escaped? Isn't it acting as the alternation operator rather than as a literal character, so it should not be escaped? And finally, is it possible to match (or keep) only the first match (of a specific label)?
– Kvothe
Feb 18 at 10:46
@Kvothe If the patterns are in fact different, please edit your question post accordingly.
– dessert
Feb 18 at 12:03
I can add an extra example, but of course an example will always just be an example. I tried to describe more generally what should happen in the text. I meant that label1 stands for some text, say height, while label2 stands for some other word, say length. Of course the whole idea is that it should work for any label. I will see if I can clarify that in the question.
– Kvothe
Feb 18 at 12:57
@Kvothe I edited to reflect the changes, but my approach works without changes for these labels as well – I added the regex as a further example and showed the effect in the Examples section.
– dessert
Feb 18 at 13:30
Thanks. The thing I was doing wrong is that I wasn't escaping the initial parentheses and the |, as done in p='\(height\|length\)_'. Would you mind explaining why these need to be escaped? I had not expected that, since we don't want them to stand for the literal symbols; we want them to stand for the operators.
– Kvothe
Feb 18 at 13:42
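On the last question in the first comment (keeping only the first match of a specific label): one option is to look each label up separately. This sketch is not part of the answer above; it assumes GNU grep and sed with -E, each argument is an ERE for one label (e.g. 'height_' or 'label3 = '), and the function name first_numbers is hypothetical. Because grep -o only reports a label that is directly followed by a number, an occurrence such as 'label1_again' is skipped.

```shell
#!/bin/sh
# Sketch: print the number after the FIRST occurrence of each label regex in a
# file; with several labels the results are collected as {n1,n2,...}.
# Assumes GNU grep and sed (-E for extended regexes).
# Usage: first_numbers FILE LABEL_REGEX...
first_numbers() {
  file=$1
  shift
  result=''
  count=0
  for label in "$@"; do
    # grep -o: one "label+number" match per line; head: keep the first one;
    # sed: strip the label (wrapped in a group so an alternation stays contained)
    num=$(grep -oE "(${label})[-.0-9]+" "$file" | head -n 1 | sed -E "s/^(${label})//")
    if [ "$count" -eq 0 ]; then result=$num; else result="$result,$num"; fi
    count=$((count + 1))
  done
  # one label: bare number; several labels: wrapped in braces
  if [ "$count" -gt 1 ]; then
    printf '{%s}\n' "$result"
  else
    printf '%s\n' "$result"
  fi
}

# example call: first_numbers ./file 'label1_' 'label3 = '
```

Looking labels up one at a time also keeps the output order equal to the argument order, regardless of where each label appears in the file.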
For a single file:
grep -oP "(?<=label1_)[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
grep -oP "(?<=label3 = )[0-9.+-]+" ./file | head -n 1 >> ./tmpfile
paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
rm ./tmpfile
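The temporary file can be avoided by grouping the two grep commands and piping their combined output straight into paste. A sketch using the same lookbehind patterns, assuming GNU grep built with PCRE support for -P; the function name first_pair is hypothetical:

```shell
#!/bin/sh
# Sketch: the same lookbehind greps, but without ./tmpfile.
# Assumes GNU grep with -P (PCRE) support. $1: input file
first_pair() {
  {
    grep -oP '(?<=label1_)[0-9.+-]+' "$1" | head -n 1
    grep -oP '(?<=label3 = )[0-9.+-]+' "$1" | head -n 1
  } | paste -sd, - | awk '{ print "{" $0 "}" }'   # "-" makes paste read stdin
}

# example call: first_pair ./file >> ./newfile
```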
For multiple files in a folder, cd to the folder and run:
for file in *; do
  [ -f "$file" ] || continue   # skip directories
  if [ "$file" = "newfile" ]; then continue; fi
  grep -oP "(?<=label1_)[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
  grep -oP "(?<=label3 = )[0-9.+-]+" "$file" | head -n 1 >> ./tmpfile
  paste -sd, ./tmpfile | awk '{ print "{"$0"}" }' >> ./newfile
  rm ./tmpfile
done
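A variant of the loop that prints one line per file, prefixed with the file name, and leaves the choice of output location to the caller so the results file is never read back in. This is a sketch: the function name collect_all and the output path in the example call are illustrative, and GNU grep with -P support is assumed.

```shell
#!/bin/sh
# Sketch: one "name {a,b}" line per regular file given as arguments.
# Assumes GNU grep with -P (PCRE) support.
collect_all() {
  for file in "$@"; do
    [ -f "$file" ] || continue   # skip directories and other non-regular files
    a=$(grep -oP '(?<=label1_)[0-9.+-]+' "$file" | head -n 1)
    b=$(grep -oP '(?<=label3 = )[0-9.+-]+' "$file" | head -n 1)
    printf '%s {%s,%s}\n' "$file" "$a" "$b"
  done
}

# example call, from inside the folder: collect_all ./* > ../newfile
```

Writing to ../newfile instead of ./newfile means the glob never picks up the output file, so the explicit "newfile" check is no longer needed.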
Thanks! The second pipe does not seem to do what I wanted; for example, it takes out the minus sign I wanted to keep. I edited it to a form where I think it answers my question (and added taking only the first resulting match): sed 's/_/ /g' ./file | grep -oP "(?<=label1 )[^ ]+" | grep -oE -m1 '(-)?[0-9](\.)?([0-9]+)?'
– Kvothe
Feb 18 at 11:16
Finally, I am confused by the space in (?<=label1 ). There was no space after label1; there was an underscore instead. What is going on here? Why should there be a space here?
– Kvothe
Feb 18 at 11:19
@Kvothe Thanks. Improved, edited and added "for" loop
– Vijay
Feb 19 at 8:55
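On the space in (?<=label1 ): in the pipeline from the comment above, the first command, sed 's/_/ /g', has already turned every underscore into a space, so by the time grep runs, 'label1_5234' reads 'label1 5234'. A small check of that pipeline on a made-up line, assuming GNU grep with -P; the function name first_after_label1 is hypothetical:

```shell
#!/bin/sh
# Sketch: the pipeline from Kvothe's comment, wrapped in a function.
# Assumes GNU grep with -P (PCRE) support.
first_after_label1() {
  sed 's/_/ /g' |                          # 1) every _ becomes a space ...
    grep -oP '(?<=label1 )[^ ]+' |         # 2) ... so the lookbehind needs a space
    grep -oE -m1 '(-)?[0-9](\.)?([0-9]+)?' # 3) number-like tokens of the first matching line
}

printf '%s\n' 'bla label1_5234_blablab_label1_again' | first_after_label1
# 5234
```

Note that the last regex allows only one digit before the decimal point, so a value like 12.3 would be split into 12 and 3; a character class such as [-.0-9]+, as used elsewhere on this page, does not have that limitation.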
edited Feb 19 at 12:39
answered Feb 15 at 14:23
Vijay (2,104 reputation)