Split file in different records using a loop and give the files new names












I have a large file (800,000 records) and I want to split it into files of 20,000 records each. That part I can do, but my next question is: is it possible to generate the new files automatically?



Example: file1 contains 800,000 records. First I take 20,000 records out of it and move them to another file, and then I remove the \r (carriage return) characters.



sed -n '1,20000p;20001q' file1 > file1_1
sed -e 's/\r//g' file1_1 > file1


Is it possible to do this in a loop, or do I have to write it 40 times?



The number of records varies: today the file contains 800,000 records, but tomorrow it may contain 789,123 or 812,321. Do I have to give an 'end number' to the sed command?



Thank you all for your answers!!







sed split














edited Feb 4 at 13:25 by terdon

asked Feb 4 at 12:43 by Katleen













  • Wait, so you only want to keep the first 20000 of each file?

    – terdon
    Feb 4 at 13:42











  • No, I want all the records, but split into files of 20000 records each. Instead of a file called xaa I want the file named file1; the next file not xab but file2; ...

    – Katleen
    Feb 4 at 13:51








    OK, but the command you show simply takes the first 20000 lines and deletes everything else: sed -e 's/\r//g' file1_1 > file1 replaces the contents of file1 with the modified contents of file1_1.

    – terdon
    Feb 4 at 13:53



















2 Answers
Romeo Ninov already gave you The Right Answer™: use split. But to answer the general case about sed, you could do the same thing with:



i=1
filelen=$(wc -l < file1)
while [[ $i -le $filelen ]]; do
    sed -n "s/\r//;$i,$((i+19999))p;$((i+20000))q" file1 > "file1.$i"
    ((i+=20000))
done


That saves each set of 20000 lines in a new file. If you really want to do what your question shows and only keep the first 20000 lines, it is much simpler:



sed -i 's/\r//; 20000q' file
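The loop above names its output file1.1, file1.20001, and so on. If sequential names are preferred, a variant along these lines could number the chunks file1_1, file1_2, ..., matching the naming used in the question. This is a sketch, not part of the original answer: the function name split_with_sed and its parameters are hypothetical, it assumes bash and GNU sed (the \r escape is a GNU extension), and it strips only a trailing \r. Like the loop above, it rereads the input from the top for every chunk, so split remains the faster tool for large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split INPUT into chunks of CHUNK lines named
# INPUT_1, INPUT_2, ... with a trailing carriage return removed from each line.
# Assumes bash and GNU sed. Note: wc -l counts newline characters, so a final
# line without a trailing newline is not counted and may be dropped.
split_with_sed() {
  local input=$1 chunk=${2:-20000}
  local total start end n
  total=$(wc -l < "$input")
  start=1
  n=1
  while (( start <= total )); do
    end=$(( start + chunk - 1 ))
    # sed sees e.g. "1,20000{s/\r$//;p;};20000q": print only this chunk
    # with the CR stripped, then stop reading at its last line.
    sed -n "${start},${end}{s/\r\$//;p;};${end}q" "$input" > "${input}_${n}"
    start=$(( end + 1 ))
    n=$(( n + 1 ))
  done
}
```

Usage: split_with_sed file1 20000 would create file1_1, file1_2, ... without a hard-coded end number, because the line count is taken from wc -l on each run.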






edited Feb 5 at 0:13 by Old Pro

answered Feb 4 at 13:41 by terdon













  • Problem solved!

    – Katleen
    Feb 4 at 16:09






    @Katleen yay! But if one of the answers solved your issue (preferably Romeo's which gives the split solution), please take a moment to accept it by clicking on the checkmark on the left. That will mark the question as answered and is the way that thanks are conveyed on the Stack Exchange sites.

    – terdon
    Feb 4 at 16:15



















You can split the file with the split command. For 20k records per file, the command is:



split -l 20000 file1


If you want a specific prefix for the resulting files, use a command like:



split -l 20000 file1 PREFIX


If you want numeric suffixes for the resulting files, use a command like:



split -d -l 20000 file1 PREFIX


These commands create a set of files of 20k lines each (the last one may be shorter).
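If you want exactly the names asked for in the comments (file1, file2, ... rather than PREFIX00, PREFIX01, ...), one approach is to let split use numeric suffixes and then rename the pieces. This is a sketch under assumptions: the function name, its parameters and the temporary prefix chunk_tmp_ are made up, and it relies on GNU split for -d.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split INPUT into LINES-line chunks named BASE1, BASE2, ...
# Assumes GNU split (-d) and that no other file in the current directory
# starts with the temporary prefix "chunk_tmp_".
split_numbered_names() {
  local input=$1 lines=$2 base=$3 part num
  # 3-digit numeric suffixes allow up to 1000 chunks: chunk_tmp_000, chunk_tmp_001, ...
  split -d -a 3 -l "$lines" -- "$input" chunk_tmp_
  for part in chunk_tmp_[0-9][0-9][0-9]; do
    [ -e "$part" ] || continue      # glob matched nothing (e.g. empty input)
    num=${part#chunk_tmp_}          # e.g. "007"
    num=$(( 10#$num + 1 ))          # read as base 10 despite leading zeros, count from 1
    mv -- "$part" "$base$num"
  done
}
```

Usage: split_numbered_names file1 20000 part would produce part1, part2, .... Pick a BASE such that no chunk name equals the input name: with input file1 and BASE file, the first chunk would be renamed over file1.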



To remove the ^M (carriage return) characters, you can use a loop like:



for i in PREFIX??
do
    dos2unix -n "$i" "${i}_unix"
done


This writes a copy of each file, with _unix appended to its name, that has the ^M characters removed.
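Where dos2unix is not installed, a comparable loop can be sketched with tr, whose POSIX specification includes the \r escape. The function name strip_cr is hypothetical. Unlike dos2unix, tr -d '\r' deletes every carriage return in the file, not only those at line ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each file PREFIX?? write a copy named PREFIX??_unix
# with all carriage returns (\r, displayed as ^M) deleted.
strip_cr() {
  local f
  for f in "$1"??; do
    [ -e "$f" ] || continue   # glob matched nothing
    tr -d '\r' < "$f" > "${f}_unix"
  done
}
```

Usage: strip_cr PREFIX, run after one of the split commands above.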















edited Feb 4 at 13:49

answered Feb 4 at 13:09 by Romeo Ninov








    FWIW with GNU split, the CR removal could be added as a filter, e.g. split -l 20000 --filter='sed "s/\r$//" > "$FILE"' file1 PREFIX

    – steeldriver
    Feb 4 at 13:32











  • That's an easy answer! And it works! Thank you so much, I thought it would be more difficult. Right now I have 51 new files, but they are named xaa through xby. Is it also possible to choose the names?

    – Katleen
    Feb 4 at 13:45











  • @Katleen, use my second example, this will make your files like PREFIXaa, PREFIXab and so on. Will add one more option to the answer :)

    – Romeo Ninov
    Feb 4 at 13:49













