Split file in different records using a loop and give the files new names

I have a large file (800.000 records) and I want to split it into separate files of 20.000 records each. I can do that part, but my next problem is whether it's possible to generate the new files automatically.

Example: file1 contains 800.000 records. First I take 20000 records out of it and move them to another file, and then I remove the \r characters.

sed -n '1,20000p;20001q'  file1 > file1_1

sed -e 's/\r//g' file1_1 > file1

Is it possible to do something in a loop? Or do I have to write this 40 times?

The number of records is variable: today it contains 800.000 records, but tomorrow it can contain 789.123 or 812.321 records. Do I have to give an 'end number' with the sed command?

Thank you all for your answers!!

edited Feb 4 at 13:25

terdon♦

130k32255433

asked Feb 4 at 12:43

Katleen

233

Wait, so you only want to keep the first 20000 of each file?

– terdon♦
Feb 4 at 13:42

No, I want all the records, but split into files of 20000 records each. Instead of a file called xaa I want the file named file1; the next file not xab but file2; and so on.

– Katleen
Feb 4 at 13:51

OK, but the command you show simply takes the 1st 20000 lines and deletes everything else: sed -e 's/\r//g' file1_1 > file1 will replace the contents of file1 with the modified contents of file1_1.

– terdon♦
Feb 4 at 13:53


sed split


2 Answers

Romeo Ninov already gave you The Right Answer™: use split. But to answer the general case about sed, you could do the same thing with:

i=1
filelen=$(wc -l < file1)
while [[ $i -le $filelen ]]; do
    sed -n "s/\r//;$i,$((i+19999))p;$((i+20000))q" file1 > "file1.$i"
    ((i+=20000))
done

That saves each set of 20000 lines in a new file. If you really want to do what your question shows and only keep the 1st 20000 lines, it is much simpler:

sed -i 's/\r//; 20000q' file
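As a rough sketch of how the loop above could be adapted to produce the file1, file2, ... style of names asked for in the comments, the variation below wraps it in a function and numbers the outputs 1, 2, 3, ... The function name split_with_sed, the part_ prefix and the tiny demo file are made up for illustration, and \r in the sed expression assumes GNU sed, as in the original commands.

```shell
#!/usr/bin/env bash
# Sketch only: split_with_sed, part_ and demo.txt are hypothetical names.
split_with_sed() {
    local input=$1 chunk=$2 prefix=$3
    local total start=1 n=1 end
    total=$(wc -l < "$input")        # assumes the last line ends with a newline
    while [ "$start" -le "$total" ]; do
        end=$((start + chunk - 1))
        # Print lines start..end with a trailing CR stripped, then stop reading.
        sed -n "${start},${end}{s/\r\$//;p;};${end}q" "$input" > "${prefix}${n}"
        start=$((start + chunk))
        n=$((n + 1))
    done
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\r\nb\r\nc\r\nd\r\ne\r\n' > demo.txt   # 5 CRLF-terminated records
split_with_sed demo.txt 2 part_                 # writes part_1, part_2, part_3
```

For the real data the call would look like split_with_sed file1 20000 file1_ (again, names chosen for illustration). Note that this rereads the input once per chunk, which is one reason split remains the better tool for large files.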

edited Feb 5 at 0:13

Old Pro

5872514

answered Feb 4 at 13:41

terdon♦

130k32255433

Problem solved!

– Katleen
Feb 4 at 16:09

@Katleen yay! But if one of the answers solved your issue (preferably Romeo's which gives the split solution), please take a moment to accept it by clicking on the checkmark on the left. That will mark the question as answered and is the way that thanks are conveyed on the Stack Exchange sites.

– terdon♦
Feb 4 at 16:15


You can split the file with the split command. If you want 20k records per file, the command is:

split -l 20000 file1

If you want a specific prefix for the resulting files, use a command like:

split -l 20000 file1 PREFIX

If you want numeric suffixes for the resulting files, use a command like:

split -d -l 20000 file1 PREFIX

Those commands will create a bunch of files of 20k lines each.

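To remove ^M you can use a loop like:

for i in PREFIX??
do
    dos2unix -n "$i" "${i}_unix"
done

This writes a copy of each file with ^M removed, adding _unix to the end of its name. (The -n option selects new-file mode in the commonly packaged dos2unix; some vendor versions take the input and output names without it.)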
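Putting the pieces of this answer together, one possible sketch is shown below: split with numeric suffixes, then strip the carriage returns while renaming the pieces to file1_1, file1_2, and so on. The piece_ prefix, the file1_N naming and the five-line stand-in input are assumptions made for the demonstration; -d assumes a split that supports numeric suffixes (GNU split does), and tr -d '\r' is swapped in for dos2unix so the sketch does not depend on that package. Note that tr removes every CR, not only those at line ends.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\r\n2\r\n3\r\n4\r\n5\r\n' > file1   # stand-in for the real input

split -d -l 2 file1 piece_                   # piece_00, piece_01, piece_02
n=1
for piece in piece_[0-9][0-9]; do
    tr -d '\r' < "$piece" > "file1_$n"       # strip CRs into file1_1, file1_2, ...
    rm -- "$piece"                           # drop the intermediate piece
    n=$((n + 1))
done
```

With the real data, -l 2 would become -l 20000. The glob expands in sorted order, so the numbering follows the order of the records in the input.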

edited Feb 4 at 13:49

answered Feb 4 at 13:09

Romeo Ninov

6,10832028

FWIW with GNU split, the CR removal could be added as a filter, e.g. split -l 20000 --filter='sed "s/\r$//" > "$FILE"' file1 PREFIX

– steeldriver
Feb 4 at 13:32
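The --filter suggestion in the comment above can be tried on a small sample first. This is only a sketch: it assumes GNU split (for --filter and the $FILE variable it exports) and GNU sed (for \r), and in.txt and out_ are placeholder names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'x\r\ny\r\nz\r\n' > in.txt            # 3 CRLF-terminated records

# Each 2-line chunk is piped through sed, which strips the trailing CR
# and writes to the name split would have used (exported as $FILE).
split -d -l 2 --filter='sed "s/\r$//" > "$FILE"' in.txt out_
```

This produces out_00 and out_01 directly, so no second pass over the pieces is needed.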

That's an easy answer! And it works! Thank you so much, I thought it would be more difficult. So right now I have 51 new files, but the names run from xaa to xby. Is it also possible to choose the name?

– Katleen
Feb 4 at 13:45

@Katleen, use my second example; it will name your files PREFIXaa, PREFIXab and so on. I will add one more option to the answer :)

– Romeo Ninov
Feb 4 at 13:49
