Split file in different records using a loop and give the files new names
I have a large file (800,000 records) and I want to split it into separate files of 20,000 records each. I can do that part, but is it possible to generate the new files automatically?
Example: file1 contains 800,000 records. First I extract 20,000 records into another file, and then I remove the \r characters.
    sed -n '1,20000p;20001q' file1 > file1_1
    sed -e 's/\r//g' file1_1 > file1
Is it possible to do this in a loop, or do I have to write it 40 times?
The number of records varies: today the file contains 800,000 records, but tomorrow it could contain 789,123 or 812,321. Do I have to give an 'end number' to the sed command?
Thank you all for your answers!
sed split
edited Feb 4 at 13:25 by terdon♦
asked Feb 4 at 12:43 by Katleen
Wait, so you only want to keep the first 20000 of each file?
– terdon♦ Feb 4 at 13:42
No, I want all the records, but split into files of 20000 records each. Instead of a file called xaa I want the file named file1; the next file not xab but file2; ...
– Katleen Feb 4 at 13:51
OK, but the command you show simply takes the 1st 20000 lines and deletes everything else: sed -e 's/\r//g' file1_1 > file1 will replace the contents of file1 with the modified contents of file1_1.
– terdon♦ Feb 4 at 13:53
2 Answers
Romeo Ninov already gave you The Right Answer™: use split. But to answer the general case about sed, you could do the same thing with:
    i=1
    filelen=$(wc -l < file1)
    while [[ $i -le $filelen ]]; do
        # strip carriage returns, print lines i..i+19999, then quit
        sed -n "s/\r//;$i,$((i+19999))p;$((i+20000))q" file1 > "file1.$i"
        ((i+=20000))
    done
That saves each set of 20000 lines in a new file (file1.1, file1.20001, and so on). If you really want to do what your question shows and only keep the 1st 20000 lines, it is much simpler:
    sed -i 's/\r//; 20000q' file
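As a rough illustration of how such a loop copes with a record count that is not a multiple of the chunk size, here is a small-scale sketch that also numbers the output files file1, file2, ... as the question asks. The sample data, the `workdir` location, the chunk size of 20 and the variable names are all assumptions made for the demonstration, not part of the original answer.

```shell
#!/bin/sh
# Sketch only: 45 sample records and a chunk size of 20 stand in for
# the real file and the 20000-record chunks.
set -eu

workdir=$(mktemp -d)
seq 1 45 > "$workdir/records.txt"

chunk=20
total=$(wc -l < "$workdir/records.txt")
start=1
n=1

while [ "$start" -le "$total" ]; do
    end=$((start + chunk - 1))
    # Print lines start..end, then quit so sed does not read the rest.
    # If the file ends before "end", sed simply stops at the last line.
    sed -n "${start},${end}p;${end}q" "$workdir/records.txt" > "$workdir/file$n"
    start=$((end + 1))
    n=$((n + 1))
done

wc -l "$workdir"/file?
```

No fixed end number is needed: the loop stops once the start line passes the line count reported by wc, and the last file just ends up shorter than the others.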
edited Feb 5 at 0:13 by Old Pro
answered Feb 4 at 13:41 by terdon♦
Problem solved!
– Katleen Feb 4 at 16:09
@Katleen yay! But if one of the answers solved your issue (preferably Romeo's, which gives the split solution), please take a moment to accept it by clicking on the checkmark on the left. That will mark the question as answered and is the way that thanks are conveyed on the Stack Exchange sites.
– terdon♦ Feb 4 at 16:15
You can split the file with the split command. If you want 20k records per file, the command is:
    split -l 20000 file1
If you want a specific prefix for the resulting files, use:
    split -l 20000 file1 PREFIX
If you want numeric suffixes for the resulting files, use:
    split -d -l 20000 file1 PREFIX
These commands create a set of files of 20k lines each (the last one may be shorter).
To remove ^M you can use a loop like:
    for i in PREFIX??
    do
        # -n is new-file mode in the common Linux dos2unix: read "$i", write "${i}_unix"
        dos2unix -n "$i" "${i}_unix"
    done
This writes a copy of each file with ^M removed, with _unix appended to its name.
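To get names like file1, file2, ... rather than PREFIXaa, PREFIXab, one option is to let split do the work and rename the pieces afterwards. The following is a minimal sketch under assumed names: `workdir`, `input.txt`, the temporary `chunk_` prefix and the chunk size of 20 are placeholders for the demonstration only.

```shell
#!/bin/sh
# Sketch only: 50 sample records split into chunks of 20, then renamed.
set -eu

workdir=$(mktemp -d)
seq 1 50 > "$workdir/input.txt"

# Plain "split -l" produces chunk_aa, chunk_ab, chunk_ac, ...
split -l 20 "$workdir/input.txt" "$workdir/chunk_"

# The glob expands in sorted order, which matches the order split wrote
# the pieces, so a simple counter gives file1, file2, file3, ...
n=1
for part in "$workdir"/chunk_*; do
    mv "$part" "$workdir/file$n"
    n=$((n + 1))
done

wc -l "$workdir/file1" "$workdir/file2" "$workdir/file3"
```

Concatenating the pieces in numeric order should reproduce the original file, which is a quick way to check that nothing was lost.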
edited Feb 4 at 13:49
answered Feb 4 at 13:09 by Romeo Ninov
FWIW with GNU split, the CR removal could be added as a filter, e.g. split -l 20000 --filter='sed "s/\r$//" > "$FILE"' file1 PREFIX
– steeldriver Feb 4 at 13:32
That's an easy answer! And it works! Thank you so much, I thought it would be more difficult. So right now I have 51 new files, but the names run from xaa to xby. Is it also possible to choose the name?
– Katleen Feb 4 at 13:45
@Katleen, use my second example; this will name your files PREFIXaa, PREFIXab and so on. I will add one more option to the answer :)
– Romeo Ninov Feb 4 at 13:49
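The --filter suggestion from the comments can be tried on a tiny CRLF file first. This sketch assumes GNU split (for --filter) and GNU sed (for \r in the pattern); `workdir`, `crlf.txt` and the `part_` prefix are made-up names for the demonstration.

```shell
#!/bin/sh
# Sketch only: requires GNU split and GNU sed.
set -eu

workdir=$(mktemp -d)
# Five sample records with DOS line endings (CR LF).
printf 'one\r\ntwo\r\nthree\r\nfour\r\nfive\r\n' > "$workdir/crlf.txt"

# split pipes each chunk through the filter command and sets $FILE to
# the name the chunk would otherwise have been written to.
split -l 2 --filter='sed "s/\r$//" > "$FILE"' "$workdir/crlf.txt" "$workdir/part_"

# Count the carriage returns left in the pieces.
cat "$workdir"/part_* | tr -cd '\r' | wc -c
```

This strips the carriage returns while splitting, so no second pass over the output files is needed.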