Why is a directory copied with the cp command smaller than the original?



























I am trying to copy one directory with a large number of files to another destination. I did:



cp -r src_dir another_destination/


Then I wanted to confirm that the size of the destination directory is the same as the original one:



du -s src_dir
3782288 src_dir

du -s another_destination/src_dir
3502320 another_destination/src_dir


Then I thought there might be some symbolic links that are not followed by the cp command, so I added the -a flag:




-a Same as -pPR options. Preserves structure and attributes of files but not directory structure.




cp -a src_dir another_destination/


but du -s gave me the same results. Interestingly, both the source and the destination have the same number of files and directories:



tree src_dir | wc -l
4293

tree another_destination/src_dir | wc -l
4293


What am I doing wrong that I get different sizes with the du command?



UPDATE



When I try to get sizes of individual directories with the du command I get different results:



du -s src_dir/sub_dir1
1112 src_dir/sub_dir1

du -s another_destination/src_dir/sub_dir1
1168 another_destination/src_dir/sub_dir1


When I view files with ls -la, individual file sizes are the same but totals are different:



ls -la src_dir/sub_dir1
total 1168
drwxr-xr-x 5 hirurg103 staff 160 Jan 30 20:58 .
drwxr-xr-x 1109 hirurg103 staff 35488 Jan 30 21:43 ..
-rw-r--r-- 1 hirurg103 staff 431953 Jan 30 20:58 file1.pdf
-rw-r--r-- 1 hirurg103 staff 126667 Jan 30 20:54 file2.png
-rw-r--r-- 1 hirurg103 staff 7386 Jan 30 20:49 file3.png

ls -la another_destination/src_dir/sub_dir1
total 1112
drwxr-xr-x 5 hirurg103 staff 160 Jan 30 20:58 .
drwxr-xr-x 1109 hirurg103 staff 35488 Jan 30 21:43 ..
-rw-r--r-- 1 hirurg103 staff 431953 Jan 30 20:58 file1.pdf
-rw-r--r-- 1 hirurg103 staff 126667 Jan 30 20:54 file2.png
-rw-r--r-- 1 hirurg103 staff 7386 Jan 30 20:49 file3.png









Tags: command-line, mac, cp, du

asked Feb 1 at 18:24 by Hirurg103, edited Feb 2 at 14:47 by Boann


  • Interesting question. Are the source and destination different drives? I wonder if this comes down to the block size of the filesystems. – davidgo, Feb 1 at 18:33

  • Hi @davidgo, the source and destination are different directories on the same drive. I updated the question with the ls -la results. See UPDATE. – Hirurg103, Feb 1 at 18:35

  • What filesystem? It may be that the directories themselves are larger (take more space) than they need to be. Compare this question. New directories created by cp are exactly as large as they need to be. – Kamil Maciorowski, Feb 1 at 20:03

  • Use ls -ls to see how much disk space the files are using. – Barmar, Feb 1 at 23:20

  • A recursive md5sum is your friend when you need to verify that all files were actually copied and their contents are the same. rsync is another tool that can both copy and verify whole structures and files; it also speeds up the process if some of the files are already in place. – Sampo Sarrala, Feb 3 at 15:35
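
A minimal sketch of the rsync-based check mentioned in the last comment (assuming rsync is installed; -r recurses, -c compares file checksums, -n makes it a dry run so nothing is copied, -i itemizes the differences):

rsync -rcni src_dir/ another_destination/src_dir/

Files that are missing from the copy or whose contents differ are listed. Symbolic links and file attributes are ignored by plain -r; replacing -r with -a would include them in the comparison (and also flag time/permission differences).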
















2 Answers


Answer by Matija Nalis (score 21), answered Feb 1 at 18:59:

That is because, by default, du shows not the size of the file(s) but the disk space they are using. You need to use the -b option to get the sum of file sizes instead of the total disk space used. For example:



% printf test123 > a
% ls -l a
-rw-r--r-- 1 mnalis mnalis 7 Feb 1 19:57 a
% du -h a
4,0K a
% du -hb a
7 a


Even though the file is only 7 bytes long, it occupies a whole 4096 bytes of disk space (in my particular example; this will vary depending on the filesystem used, cluster size, etc.).



Also, some filesystems support so-called sparse files, which do not use any disk space for blocks which are all zeros. For example:



% dd if=/dev/zero of=regular.bin bs=4k count=10
10+0 records in
10+0 records out
40960 bytes (41 kB, 40 KiB) copied, 0,000131003 s, 313 MB/s
% cp --sparse=always regular.bin sparse.bin
% ls -l *.bin
-rw-r--r-- 1 mnalis mnalis 40960 Feb 1 20:04 regular.bin
-rw-r--r-- 1 mnalis mnalis 40960 Feb 1 20:04 sparse.bin
% du -h *.bin
40K regular.bin
0 sparse.bin
% du -hb *.bin
40960 regular.bin
40960 sparse.bin


In short, to verify all files were copied, you'd use du -sb instead of du -s.
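
Note that the question is about macOS, whose BSD du may not support GNU's -b option. A minimal sketch of comparing apparent sizes there instead, assuming BSD stat (whose -f %z format prints a file's size in bytes):

# Sum the byte sizes of all regular files under each tree, then compare the two totals.
find src_dir -type f -exec stat -f %z {} + | awk '{s += $1} END {print s}'
find another_destination/src_dir -type f -exec stat -f %z {} + | awk '{s += $1} END {print s}'

If GNU coreutils is installed (for example via Homebrew), its du (usually named gdu) does accept -b.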






  • Not only sparse files: compressed files and inline/resident files also cause the size on disk to be smaller than the file size. – phuclv, Feb 2 at 6:58

  • And weird results on btrfs/zfs. – val, Feb 2 at 11:19

  • @val: BTRFS compression doesn't affect du output: that would make compressed files look sparse to programs that use the usual algorithm of length != used blocks. btrfs.wiki.kernel.org/index.php/… – Peter Cordes, Feb 2 at 19:08

  • @PeterCordes But CoW stuff makes du output pretty senseless. – val, Feb 2 at 20:09

  • What about duplicate files? Can't modern systems save space by recognizing duplicate content? – FreeSoftwareServers, Feb 2 at 21:39





















Answer by jcaron (score 11), answered Feb 1 at 23:29:

It might be due to the size of the directory "files".



In most filesystems, on disk, a directory is much like a regular file (mostly just a list of names and inode numbers), using more blocks as it grows.



If you add many files, the directory itself grows. But if you remove them afterwards, in many filesystems, the directory will not shrink.



So if one of the directories in your original tree had many files at some point, which were later deleted, the copy of that directory will be "smaller", as it only uses as many blocks as it needs for the current number of files.
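
A quick way to see this effect (a sketch; whether the directory shrinks back depends on the filesystem — ext4, for example, keeps the larger size):

mkdir demo && cd demo
touch file{1..10000}   # the directory's own entry list grows to hold 10000 names
ls -ld .               # note the directory's size
rm file*               # delete the entries again
ls -ld .               # on many filesystems the directory keeps its grown size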



In the listings in your update, there are 3 directories you haven't listed. Compare the size of those (or descendants of those) in your ls -al output.



To find where the difference is, you can try a recursive ls -alR on both directories, redirected to a file, and then a diff of the two outputs.
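
A minimal sketch of that comparison (the /tmp file names are placeholders; running ls from inside each directory keeps the per-directory headers comparable):

(cd src_dir && ls -alR) > /tmp/src_listing.txt
(cd another_destination/src_dir && ls -alR) > /tmp/dst_listing.txt
diff /tmp/src_listing.txt /tmp/dst_listing.txt

Lines that differ, including the per-directory "total" lines, point at where the extra blocks are.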






  • Good catch for another possibility! However, in the case of the OP's cp -a src_dir another_destination/ it is unlikely, as another_destination would be newly created and thus optimized, while src_dir (which might have had some bigger directories from past creation/additions) could indeed be bigger than needed. However, the results show that src_dir is actually smaller (1112 < 1168). – Matija Nalis, Feb 2 at 6:44

  • @MatijaNalis Only the first example after "Update" shows that (1112 < 1168)... the example below that has the figures reversed, and the first example also shows the source larger (3782288 vs. 3502320). Possibly a typo by OP? – TripeHound, Feb 2 at 7:36

  • Re "In the listings in your update, there are 3 directories you haven't listed": actually they are files, not directories, see the file names. Re "if one of the directories in your original tree had many files at some point, which were later deleted": I copied the source directory from a remote server with the rsync command and didn't delete anything from it. – Hirurg103, Feb 2 at 8:42

  • @Hirurg103 the . entries show 5 links on the inode. One is the link from the parent directory to this one. Another is "..". There are 3 more links, which should be the ".." links from subdirectories. Unless I'm missing something very weird, there must be 3 subdirectories in those. Are you saying that those listings are the full output? – jcaron, Feb 2 at 12:06









