Why is the size of npy bigger than csv?
I converted a csv file to an npy file. The csv file is 5GB, but the resulting npy file is 13GB.
I thought an npy file would be more efficient than csv.
Am I misunderstanding something? Why is the npy file bigger than the csv file?
I just used this code:
import numpy as np
import pandas as pd

full = pd.read_csv('data/RGB.csv', header=None).values
np.save('data/RGB.npy', full, allow_pickle=False, fix_imports=False)
and the data is structured like this:
R, G, B, is_skin
2, 5, 1, 0
10, 52, 242, 1
52, 240, 42, 0
... (420,711,257 rows in total)
python-3.x csv numpy
Maybe you can share the relevant parts of the code that you used to convert your csv to numpy?
– Fredz0r
Nov 21 '18 at 8:10
To me it looks more like 5GB vs 13GB. Can you also post a typical line from RGB.csv? Depending on what your data looks like, one format can lead to smaller file sizes than another.
– ead
Nov 21 '18 at 8:11
@Fredz0r I updated the code!
– YeongHwa Jin
Nov 21 '18 at 8:15
@ead Yes, 5GB and 13GB. I'm confused :( . The data is just a 2D array of RGB values. I posted it.
– YeongHwa Jin
Nov 21 '18 at 8:29
edited Nov 21 '18 at 9:23
ead
asked Nov 21 '18 at 7:55
YeongHwa Jin
1 Answer
In your case each element is an integer between 0 and 255, inclusive. Saved as ASCII, it needs at most
- 3 chars for the number
- 1 char for the comma
- 1 char for the whitespace
which comes to at most 5 bytes per element on disk (somewhat less on average).
By default, pandas reads this data as an int64 array (see full.dtype), so every element takes 8 bytes. That is what makes the npy file bigger, and most of those bytes are zeros!
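As a rough check of these numbers, here is a small sketch that estimates the npy size from the row count given in the question (420,711,257 rows, 4 columns). It assumes the array is stored as int64 and ignores the small npy header; the helper name `npy_payload_bytes` is made up for this illustration.

```python
import numpy as np


def npy_payload_bytes(rows, cols, dtype):
    """Bytes of raw array data in an npy file (the header is not counted)."""
    return rows * cols * np.dtype(dtype).itemsize


rows, cols = 420_711_257, 4  # figures taken from the question

int64_bytes = npy_payload_bytes(rows, cols, np.int64)
uint8_bytes = npy_payload_bytes(rows, cols, np.uint8)

print(f"int64: {int64_bytes / 1e9:.2f} GB")  # int64: 13.46 GB
print(f"uint8: {uint8_bytes / 1e9:.2f} GB")  # uint8: 1.68 GB
```

The int64 estimate matches the 13GB file from the question, which suggests the dtype alone explains the size.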
An integer between 0 and 255 fits in a single byte, so the npy file could be 8 times smaller without losing any information. Just tell pandas to interpret the data as unsigned 8-bit integers:
import numpy as np
import pandas as pd

# read every column as an unsigned 8-bit integer instead of the default int64
full = pd.read_csv(r'e:data.csv', dtype=np.uint8).values
# or, to get rid of the pandas dependency:
# full = np.genfromtxt(r'e:data.csv', delimiter=',', dtype=np.uint8, skip_header=1)
np.save(r'e:/RGB.npy', full, allow_pickle=False, fix_imports=False)
# the resulting npy file is 8 times smaller
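To see the effect without a 5GB file, the following sketch saves the same made-up values once as int64 and once as uint8 and compares the number of bytes written. The data is random, the helper `npy_size` is hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np


def npy_size(arr):
    """Return the number of bytes np.save writes for `arr`."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getbuffer().nbytes


rng = np.random.default_rng(0)
# 100,000 made-up rows with 4 columns, all values between 0 and 255
sample = rng.integers(0, 256, size=(100_000, 4), dtype=np.int64)

as_int64 = npy_size(sample)
as_uint8 = npy_size(sample.astype(np.uint8))

print("int64:", as_int64, "bytes")
print("uint8:", as_uint8, "bytes")

# the conversion does not change any value
assert np.array_equal(sample, sample.astype(np.uint8))
```

Both sizes include a short npy header, so the ratio is slightly below 8 for small arrays and approaches 8 as the array grows.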
Most of the time the npy format needs less space, but there are situations in which the ASCII format produces smaller files.
For example, if the data consists mostly of single-digit numbers plus a few very large numbers that really do need 8 bytes:
- in ASCII format you pay about 2 bytes per element on average (there is no need to write the whitespace; the comma alone is a good enough delimiter).
- in numpy format you pay 8 bytes per element.
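A small sketch of that situation, using made-up data: 9,990 single-digit values plus 10 values large enough to require int64. The ASCII size is measured for a comma-separated string without whitespace, and the npy size for an in-memory buffer.

```python
import io

import numpy as np

# made-up data: mostly one-digit numbers, plus a few that only fit in int64
values = np.array([7] * 9_990 + [9_000_000_000_000_000_000] * 10, dtype=np.int64)

# ASCII: digits plus one comma between neighbouring elements
ascii_bytes = len(",".join(str(v) for v in values.tolist()).encode("ascii"))

# npy: 8 bytes per element plus a short header
buf = io.BytesIO()
np.save(buf, values, allow_pickle=False)
npy_bytes = buf.getbuffer().nbytes

print(f"ASCII: {ascii_bytes / values.size:.2f} bytes per element")
print(f"npy:   {npy_bytes / values.size:.2f} bytes per element")
```

Here a smaller dtype is not an option, because the few large values force int64 on the whole array.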
Wow, beautiful answer...! (๑°ㅁ°๑)‼ thanks ead!!
– YeongHwa Jin
Nov 21 '18 at 11:08
edited Nov 21 '18 at 9:36
answered Nov 21 '18 at 9:21
ead