OSError: [Errno 22] Invalid argument, when reading files by line using Python
I'm trying to remove some erroneous lines from a very large file (200 GB, on Windows). The code is basically as follows:
with open(filename, encoding='utf-8') as fi:
    with open(outputfile, 'w', encoding='utf-8') as fo:
        while True:
            try:
                line = next(fi)
                fo.write(line)
            except UnicodeDecodeError:
                line = next(fi)
                continue
However, I got OSError: [Errno 22] Invalid argument from the line = next(fi) line inside the try block after processing about 30 GB of data. What could cause this, and how can I handle it?
I also noticed odd memory allocation behavior around this point: usage first grew to the full memory size, then dropped back to its original level. I don't know whether this is relevant or just a coincidence.
The full stack trace:
Traceback (most recent call last):
  File "C:/Users/concat_split_files.py", line 23, in <module>
    line = next(fi)
OSError: [Errno 22] Invalid argument
EDIT: Here's the exact code. The main difference is that, since I had already successfully checked part of the file, I skip the first lines (30 GB).
filename = r"data.tsv"
outputfile2 = r"p2.tsv"
line_no = 306878
with open(filename, encoding='utf-8') as fi:
    for _ in range(line_no):
        try:
            next(fi)
        except UnicodeDecodeError:
            line = next(fi)
            print(line)
            continue
    with open(outputfile2, 'w', encoding='utf-8') as fo2:
        while True:
            try:
                line = next(fi)
                fo2.write(line)
            except UnicodeDecodeError:
                line = next(fi)
                print(line)
                continue
            except StopIteration:
                break
python file io
Please add the full error traceback to your question.
– Klaus D.
Nov 20 '18 at 7:51
@KlausD. Sure, it's just two lines: OSError: [Errno 22] Invalid argument, raised from the line line = next(fi). I'll paste the full stack trace after reproducing it. :)
– dontloo
Nov 20 '18 at 8:02
"the code is basically as follows": Please post your exact code. If you're seeing "weird memory allocation behavior" when the process fails, the exact code will be necessary so others can try to reproduce the problem.
– Andrew Henle
Nov 20 '18 at 10:49
@AndrewHenle Added, thanks!
– dontloo
Nov 20 '18 at 13:31
What if you use line = fi.readline() or iterate directly using for line in fi: ?
– myrmica
Nov 20 '18 at 14:59
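One way to try the direct-iteration suggestion is sketched below. It is only an illustration, not code from the thread: the function name copy_valid_utf8_lines and its parameters are hypothetical. It assumes the file can be opened in binary mode and each line decoded separately, so that a bad line never raises inside the file iterator itself. Splitting on the newline byte is safe for UTF-8 because that byte never occurs inside a multi-byte sequence.

```python
def copy_valid_utf8_lines(src_path, dst_path):
    """Copy lines that decode as UTF-8; return the number of lines skipped.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    skipped = 0
    # Binary mode: iterating splits on b"\n" and never raises UnicodeDecodeError.
    # newline="" stops the writer from translating line endings on Windows.
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw in src:
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # This line contains invalid bytes; drop it and keep going.
                skipped += 1
                continue
            dst.write(text)
    return skipped
```

Because decoding happens per line, one bad line costs exactly one line of output, and the loop ends normally at end of file without needing to catch StopIteration.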
asked Nov 20 '18 at 7:42 by dontloo
edited Nov 22 '18 at 5:13
1 Answer
It turned out the file was incomplete. My guess is that at some point the actual data ends without a line break, so the program kept reading the file in search of the end of the line until memory filled up, and then threw this error.
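If that guess is right, two cheap checks may help before or during a long run. The sketch below is an illustration under that assumption, and both function names (ends_with_newline, iter_lines_capped) are hypothetical. The first looks at the last byte of the file to see whether it ends with a line break. The second reads lines in binary mode with a size cap, so a very long run without a newline is returned in bounded pieces instead of being accumulated in memory.

```python
import os


def ends_with_newline(path):
    """Return True if the file's last byte is a newline (False if it is empty)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def iter_lines_capped(path, max_bytes=1 << 20):
    """Yield (chunk, complete) pairs, reading at most max_bytes per chunk.

    complete is False when the chunk was cut off by the cap or by end of
    file before a newline was seen.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.readline(max_bytes)
            if not chunk:
                break
            yield chunk, chunk.endswith(b"\n")
```

A chunk flagged as incomplete can be logged with its byte offset or skipped, which makes a damaged region visible without exhausting memory. The 1 MiB default cap is an arbitrary choice for the sketch and should be set above the longest legitimate line.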
answered Nov 21 '18 at 9:52 by dontloo
edited Nov 21 '18 at 12:10