Add lines to files to make them equal length
I have a bunch of .csv files with N columns and different numbers of rows (lines). I would like to append as many empty lines ;...; (N-1 semicolons, one per separator) as needed to make them all the same length. I can get the length of the longest file manually, but it would be good to get this done automatically.
For example:
I have:
file1.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
171; pep; 73; 22:26:10; 3; 72
file2.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
121; fng; 96; 09:42:10; 3; 52
141; gep; 53; 21:22:10; 3; 62
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
file3.csv
121; fng; 96; 09:42:10; 3; 52
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
141; gep; 53; 21:22:10; 3; 62
I need:
file1.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
171; pep; 73; 22:26:10; 3; 72
;;;;;
;;;;;
;;;;;
file2.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
121; fng; 96; 09:42:10; 3; 52
141; gep; 53; 21:22:10; 3; 62
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
file3.csv
121; fng; 96; 09:42:10; 3; 52
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
141; gep; 53; 21:22:10; 3; 62
;;;;;
;;;;;
shell-script text-processing awk files csv
edited Dec 4 at 10:35
Jeff Schaller
asked Dec 4 at 9:49
myradio
A simple (but probably not optimal) way to do it would be to use wc to count the lines of each file and find the maximum. You can then run echo ";;;;;" >> file on each file until its line count reaches the maximum.
– Bear'sBeard
Dec 4 at 10:45
Why do you want the files to have the same number of lines? Maybe there is a good method where you can use the files as they are (with their different numbers of lines).
– sudodus
Dec 4 at 11:12
@Bear'sBeard Yep, something like that did it; I was looking for a more compact way.
– myradio
Dec 4 at 11:50
@sudodus Well, there are people before and after me in the pipeline; things must match certain formats...
– myradio
Dec 4 at 11:51
3 Answers
Thanks @Sparhawk for the suggestions in the comments; I have updated the script based on them:
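#!/bin/bash
emptyLine=';;;;;;;'   # must be quoted (a bare ;; is a bash syntax error); use N-1 semicolons for N columns
# Line counts of all files; sed drops the "total" line that wc adds for multiple files.
rr=($(wc -l files*pattern.txt | awk '{print $1}' | sed '$ d'))
# One count per line, so that sort can actually compare them.
max=$(printf '%s\n' "${rr[@]}" | sort -nr | head -n1)
for name in files*pattern.txt; do
    lineNumber=$(wc -l < "$name")
    missing=$((max - lineNumber))
    for ((i = 0; i < missing; i++)); do
        echo "$emptyLine" >> "$name"
    done
done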
Well, it is neither elegant nor efficient. It actually takes a couple of seconds, which seems like an eternity given the small size of the data. Nevertheless, it works. The original version, before the changes suggested in the comments:
#!/bin/bash
emptyLine=';;;;;;;'
rr=($(wc -l files*pattern.txt | awk '{print $1}' | sed '$ d'))
max=$(printf '%s\n' "${rr[@]}" | sort -nr | head -n1)
for name in $(ls files*pattern.txt); do
    lineNumber=$(cat $name | wc -l)
    let missing=max-lineNumber
    for ((i = 0; i < missing; i++)); do
        echo "$emptyLine" >> $name
    done
done
I just put this script in the directory containing the files; it works provided there is a pattern, such as files*pattern.txt, that matches all of them.
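The script above appends one line per echo call. As a sketch, assuming bash and that every file ends with a newline (wc -l counts newline characters), the missing lines for each file can instead be generated in one go with yes and head and appended with a single redirection. pad_to_max is a hypothetical helper name, and the pad string is passed in rather than hard-coded.

```shell
# Hypothetical helper: pad_to_max PAD FILE...
# Appends copies of PAD to every FILE until all of them have as many
# lines as the longest one. Assumes each file ends with a newline.
pad_to_max() {
    local pad=$1
    shift
    local f n max=0
    # First pass: find the largest line count.
    for f in "$@"; do
        n=$(wc -l < "$f")
        if (( n > max )); then
            max=$n
        fi
    done
    # Second pass: write all missing pad lines with a single redirection.
    for f in "$@"; do
        n=$(wc -l < "$f")
        if (( n < max )); then
            yes "$pad" | head -n "$((max - n))" >> "$f"
        fi
    done
}
```

For the question's files this would be called as pad_to_max ';;;;;' file*.csv.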
edited Dec 5 at 9:08
answered Dec 4 at 11:54
myradio
Nice one (+1)! Thanks for posting the solution. One small note: don't parse ls, just use for name in files*pattern.txt; do instead.
– Sparhawk
Dec 4 at 12:28
And while I'm nit-picking, there's a "useless use of cat" there too. Just do wc -l $name
– Sparhawk
Dec 4 at 12:30
@Sparhawk: I think you meant wc -l < $name
– Thor
Dec 4 at 12:37
@Thor No? wc [OPTION]... [FILE]... works too, as per the man page. In fact, this script uses this construction in an earlier line.
– Sparhawk
Dec 4 at 20:54
@Sparhawk: Sure, but if you want the equivalent output of cat file | wc -l, redirection is the way to go.
– Thor
Dec 5 at 8:07
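The difference Thor points out can be seen directly: given a file name, wc prints the name after the count, while with input redirection wc reads standard input and prints only the number. A small demonstration, using a hypothetical sample.csv in a temporary directory (the exact spacing of the output differs between GNU and BSD wc):

```shell
# Create a throwaway three-line file (sample.csv is just an example name).
tmp=$(mktemp -d)
printf '1;a\n2;b\n3;c\n' > "$tmp/sample.csv"
cd "$tmp" || exit 1

with_name=$(wc -l sample.csv)     # count followed by the file name
count_only=$(wc -l < sample.csv)  # wc reads stdin, so there is no name to print

echo "$with_name"    # GNU wc prints: 3 sample.csv
echo "$count_only"   # GNU wc prints: 3
```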
An improvement of @myradio's answer.
The part inside the loop is written in awk, which should be much faster.
max=$(wc -l file*.csv | sed '$ d' | sort -n | tail -n1 | awk '{print $1}')
for f in file*.csv; do
    awk -F';' -v max="$max" '
        END {
            # One separator per column boundary: NF-1 copies of FS,
            # using NF of the last record read.
            s = ""
            for (i = 1; i < NF; i++) s = s FS
            for (i = NR; i < max; i++) print s
        }' "$f" >> "$f"
done
With -F you set the field separator of your files (here -F';').
The loop that builds s repeats FS (the field separator) NF-1 times, so the padding line always has the right number of separators for the file.
You could simply replace that with print ";;;;;" or other static content if you like.
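The padding line can be derived from the data itself. POSIX awk keeps NF from the last record inside an END block, so repeating FS NF-1 times yields one separator per column boundary. A quick check on one line of the question's data:

```shell
# Build a pad line from the last record: NF-1 copies of the field separator.
pad=$(printf '128; pep; 93; 22:22:10; 3; 11\n' |
    awk -F';' 'END { s = ""; for (i = 1; i < NF; i++) s = s FS; print s }')
echo "$pad"   # prints ;;;;;
```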
edited Dec 5 at 9:51
answered Dec 4 at 13:53
RoVo
I like this solution. It's certainly harder to read, but it's good that it has a dynamic FS. Nevertheless, two things: 1. About the efficiency, I don't know what your time results mean, because this depends on the number and size of the files. 2. I wanted to try this to compare the timing with my version, but I ran into a problem: awk complains about gensub being undefined. Is gensub maybe gawk-only?
– myradio
Dec 5 at 9:22
Yeah, that seems to be GNU Awk. I replaced it with the gsub solution from the linked answer.
– RoVo
Dec 5 at 9:52
About the time results, I think I misunderstood the statement "it takes a couple of seconds" from your answer to be the time needed to process the example files from your question ... I removed that.
– RoVo
Dec 5 at 9:54
In order to count the lines in each file only once:
wc -l *csv | sort -nr | sed 1d | {    # sed 1d drops the "total" line, which sorts first
    read -r max file                   # the longest file comes next
    pad=$(sed q "$file" | tr -cd ";")  # extract separators from first record
    while read -r lines file; do
        while [ $((lines += 1)) -le "$max" ]; do
            echo "$pad" >> "$file"
        done
    done
}
Note that any newlines in the filenames will cause problems for both sort and the while read loop, but they can handle filenames containing normal spaces.
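The caveat above comes from parsing wc's text output. As a sketch, assuming bash, the counts can instead be kept in an array indexed in step with the file names, so each file is still counted only once and names containing newlines are passed through intact. pad_counted_once is a hypothetical name; like the answer, it copies the separators from the first record of the longest file.

```shell
# Hypothetical helper: pad_counted_once FILE...
# Counts each file once with wc -l < FILE (no name parsing), then appends
# pad lines until every file matches the longest one.
pad_counted_once() {
    local -a files=("$@") counts=()
    local i n max=0 longest="" pad
    for i in "${!files[@]}"; do
        counts[i]=$(wc -l < "${files[i]}")
        if (( counts[i] > max )); then
            max=${counts[i]}
            longest=${files[i]}
        fi
    done
    [ -n "$longest" ] || return 0   # no files, or all of them empty
    # As in the answer: keep only the ';' characters of the first record.
    pad=$(head -n 1 -- "$longest" | tr -cd ';')
    for i in "${!files[@]}"; do
        n=${counts[i]}
        while (( n < max )); do
            printf '%s\n' "$pad" >> "${files[i]}"
            (( n += 1 ))
        done
    done
}
```

Usage would be pad_counted_once *.csv; the glob hands the names to the function as separate arguments, so no line-based parsing is involved.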
answered Dec 4 at 15:42
JigglyNaga