C: How to read a portion of a file in chunks
I have to implement, for a course assignment, the Huffman encoding & decoding algorithm, first in the classic serial way; then I have to try to make it parallel using various methods (OpenMP, MPI, pthreads). The scope of the project is not necessarily to make it faster, but to analyze the results and discuss why they turn out the way they do.
The serial version works perfectly. However, for the parallel version, I stumble on a file-reading problem. In the serial version, I have a piece of code that looks like this:
char *buffer = calloc(1, MAX_BUFF_SZ);
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    compress_chunk(buffer, t, output);
    memset(buffer, 0, MAX_BUFF_SZ);  /* zero out so a short final chunk stays NUL-terminated */
}
This reads at most MAX_BUFF_SZ bytes from the input file and then encodes them. I used the memset call for the case when bytes_read < MAX_BUFF_SZ (maybe a cleaner solution exists, though).
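One cleaner variant I can think of is a sketch only, with a hypothetical compress_chunk signature that takes the chunk length explicitly (not what my code currently uses), which makes the zeroing unnecessary:

/* sketch: a hypothetical compress_chunk(buffer, len, t, output) that
   consumes exactly len bytes would make the memset redundant */
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    compress_chunk(buffer, bytes_read, t, output);
}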
However, for the parallel version (using OpenMP, for example), I want each thread to analyze only a portion of the file, but the reading should still be done in chunks. Knowing that each thread has an id thread_id and there are at most total_threads threads, I calculate the start and end positions as follows:
int slice_size = (file_size + total_threads - 1) / total_threads;  /* ceiling division */
int start = slice_size * thread_id;
int end = min((thread_id + 1) * slice_size, file_size);            /* clamp the last slice to EOF */
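For example, with file_size = 100 and total_threads = 4, slice_size is 25 and thread 2 covers bytes [50, 75); with file_size = 90, slice_size is 23 and thread 3 gets the short final slice [69, 90).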
I can move to the start position with a simple fseek(input, start, SEEK_SET) call. However, I am not able to read the content in chunks. I tried the following code (just to make sure the operation works):
int total_bytes = 0;
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    total_bytes += bytes_read;
    if (total_bytes >= end) {
        int diff = total_bytes - end;
        buffer[diff] = '\0';
        break;
    }
    fwrite(buffer, 1, bytes_read, output);
    memset(buffer, 0, MAX_BUFF_SZ);
}
output is a different file for each thread. Even when I try with just 2 threads, some characters are missing from them. I think I am close to the right solution and have something like an off-by-one error.
So the question is: how can I read a slice of a file, but in chunks? Can you please help me identify the bug in the above code and make it work?
Edit:
If MAX_BUFF_SZ were bigger than the size of the input and I had, for example, 4 threads, how should clean code look to ensure that T0 does all the work while T1, T2 and T3 do nothing?
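What I have in mind as a clean guard is something like this (a sketch only, reusing the variables from the test code below, untested against the full Huffman code):

/* sketch: if the whole file fits in one buffer, let thread 0 take it all */
if (fsize <= MAX_BUFF_SZ) {
    if (id != 0)
        return;            /* T1, T2 and T3 do nothing */
    start = 0;
    end = fsize;
} else {
    slice = (fsize + tt - 1) / tt;
    start = slice * id;
    end = min((id + 1) * slice, fsize);
    if (start >= end)
        return;            /* empty slice when fsize < tt */
}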
Some simple code that may be used to test the behavior follows (note that it is not from the Huffman code; it is some auxiliary code to test things):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

#define MAX_BUFF_SZ 32
#define min(a, b)               \
   ({ __typeof__ (a) _a = (a); \
      __typeof__ (b) _b = (b); \
      _a < _b ? _a : _b; })

int get_filesize(char *filename) {
    FILE *f = fopen(filename, "r");
    fseek(f, 0L, SEEK_END);
    int size = ftell(f);
    fclose(f);
    return size;
}

static void compress(char *filename, int id, int tt) {
    int total_bytes = 0;
    int bytes_read;
    char *newname;
    char *buffer;
    FILE *output;
    FILE *input;
    int fsize;
    int slice;
    int start;
    int end;

    newname = (char *) malloc(strlen(filename) + 16);  /* room for "-", the id digits and the NUL */
    sprintf(newname, "%s-%d", filename, id);

    fsize = get_filesize(filename);
    buffer = calloc(1, MAX_BUFF_SZ);
    input = fopen(filename, "r");
    output = fopen(newname, "w");

    slice = (fsize + tt - 1) / tt;
    end = min((id + 1) * slice, fsize);
    start = slice * id;
    fseek(input, start, SEEK_SET);

    while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
        total_bytes += bytes_read;
        printf("%s\n", buffer);
        if (total_bytes >= end) {
            int diff = total_bytes - end;
            buffer[diff] = '\0';
            break;
        }
        fwrite(buffer, 1, bytes_read, output);
        memset(buffer, 0, MAX_BUFF_SZ);
    }
    fclose(output);
    fclose(input);
}

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int tt = omp_get_num_threads();
        int id = omp_get_thread_num();
        compress("test.txt", id, tt);
    }
}
You can compile it with gcc test.c -o test -fopenmp. You can generate a file test.txt with some random characters, more than 32 of them (or change the max buffer size).
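For example, one way to generate such a file on Linux (any text of 32+ characters works just as well):

base64 /dev/urandom | head -c 1024 > test.txt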
Edit 2:
Again, my problem is reading a slice of a file in chunks, not the analysis per se. I know how to do that. It's a university course; I can't just say "IO bound, end of story, analysis complete".
c file openmp chunks

asked Nov 22 '18 at 19:57, edited Nov 23 '18 at 9:07 – Adrian Pop
Threading the read will not make it any faster. Totally useless.
– n.m. Nov 22 '18 at 20:01

The scope of the project is not to make it faster, but to analyze the results and talk about why it is not faster. Also, in the example I'm threading the read only for debug purposes; the real version also does the encoding in parallel, so each thread will encode a piece of the file and then I'll merge them. Please read the entire post :)
– Adrian Pop Nov 22 '18 at 20:03

buffer[diff] = '\0'; - this is wrong. Think of when total_bytes is exactly equal to end. The diff will be zero. So then you want to keep the whole buffer, and you also want to write it to the output file, which you don't at the moment.
– kfx Nov 22 '18 at 20:21

Also, you want to compare total_bytes + start to end, not just total_bytes (assuming you did the initial fseek).
– kfx Nov 22 '18 at 20:22

@kfx Yeah, I forgot to add the fwrite in that if (to write the remaining bytes); I also changed the check to int diff = total_bytes - end; buffer[total_bytes - diff] = '\0'; but there are still some problems.
– Adrian Pop Nov 22 '18 at 20:25
1 Answer
Apparently I just had to take a pen and paper and draw a little scheme. After playing around with the indices, I came up with the following code (encbuff and written_bits are auxiliary variables I use, since I am actually writing bits to a file and use an intermediate buffer to limit the writes):
while ((bytes_read = fread(buffer, 1, MAX_BUFF_SZ, input)) > 0) {
    total_bytes += bytes_read;
    if (start + total_bytes > end) {
        /* keep only the bytes that belong to this thread's slice */
        int diff = start + total_bytes - end;
        buffer[bytes_read - diff] = '\0';
        compress_chunk(buffer, t, output, encbuff, &written_bits);
        break;
    }
    compress_chunk(buffer, t, output, encbuff, &written_bits);
    memset(buffer, 0, MAX_BUFF_SZ);
}
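An alternative sketch (same variables as above; not the version I used for the measurements below) clamps each read so the loop never crosses end and needs no end-of-slice arithmetic inside the loop. The buffer gets one extra byte so even a full chunk can be NUL-terminated:

char *buffer = calloc(1, MAX_BUFF_SZ + 1);   /* +1 leaves room for the NUL */
long remaining = end - start;
fseek(input, start, SEEK_SET);
while (remaining > 0) {
    /* never ask for more bytes than are left in this thread's slice */
    size_t want = remaining < MAX_BUFF_SZ ? (size_t)remaining : MAX_BUFF_SZ;
    size_t got = fread(buffer, 1, want, input);
    if (got == 0)
        break;                               /* EOF or read error */
    buffer[got] = '\0';
    compress_chunk(buffer, t, output, encbuff, &written_bits);
    remaining -= (long)got;
}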
I also finished implementing the OpenMP version. For small files the serial one is faster, but starting from about 25 MB the parallel one starts to beat the serial one by about 35-45%. Thank you all for the advice.
Cheers!
answered Nov 22 '18 at 23:02 – Adrian Pop