OSError: [Errno 22] Invalid argument, when reading files by line using Python

I'm trying to get rid of some bad lines (ones that raise UnicodeDecodeError) in a very large file (200 GB, on Windows). The code is basically as follows:

    with open(filename, encoding='utf-8') as fi:
        with open(outputfile, 'w', encoding='utf-8') as fo:
            while True:
                try:
                    line = next(fi)
                    fo.write(line)
                except UnicodeDecodeError:
                    line = next(fi)
                    continue

However, I got OSError: [Errno 22] Invalid argument from the line = next(fi) line inside the try block, after about 30 GB of data had been processed. What could cause this, and how can I handle it?

I also noticed odd memory behavior around this point: usage first grew until it filled all available memory, then dropped back to its original level. I don't know whether this is relevant or just a coincidence.

The full stack trace:

    Traceback (most recent call last):
      File "C:/Users/concat_split_files.py", line 23, in <module>
        line = next(fi)
    OSError: [Errno 22] Invalid argument

EDIT: Here's the exact code. The main difference is that, since I had already successfully checked part of the file, I skip the first part of it (about 30 GB).

    filename = r"data.tsv"
    outputfile2 = r"p2.tsv"
    line_no = 306878

    with open(filename, encoding='utf-8') as fi:
        for _ in range(line_no):
            try:
                next(fi)
            except UnicodeDecodeError:
                line = next(fi)
                print(line)
                continue

        with open(outputfile2, 'w', encoding='utf-8') as fo2:
            while True:
                try:
                    line = next(fi)
                    fo2.write(line)
                except UnicodeDecodeError:
                    line = next(fi)
                    print(line)
                    continue
                except StopIteration:
                    break

Tags: python file io

asked Nov 20 '18 at 7:42 by dontloo, edited Nov 22 '18 at 5:13

  • Please add the full error traceback to your question.
    – Klaus D., Nov 20 '18 at 7:51

  • @KlausD. Sure, it's just two lines: OSError: [Errno 22] Invalid argument, raised from the line line = next(fi). I'll paste the full stack trace after reproducing it. :)
    – dontloo, Nov 20 '18 at 8:02

  • "the code is basically as follows": Please post your exact code. If you're seeing "weird memory allocation behavior" when the process fails, the exact code will be necessary so others can try to reproduce the problem.
    – Andrew Henle, Nov 20 '18 at 10:49

  • @AndrewHenle Added, thanks!
    – dontloo, Nov 20 '18 at 13:31

  • What if you use line = fi.readline(), or iterate directly using for line in fi:?
    – myrmica, Nov 20 '18 at 14:59
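
One way to try the direct iteration suggested in the last comment is sketched below. It assumes the goal is simply to drop lines that are not valid UTF-8. Both files are opened in binary mode and each line is decoded on its own, so a decode failure affects only that line and there is no second next(fi) call inside the except branch. The name copy_valid_utf8_lines and its return value are illustrative, not taken from the question.

```python
def copy_valid_utf8_lines(src_path, dst_path):
    """Copy only the lines that decode as UTF-8.

    Returns a (kept, skipped) tuple of line counts.
    Hypothetical helper, written for illustration only.
    """
    kept = skipped = 0
    with open(src_path, "rb") as fi, open(dst_path, "wb") as fo:
        # Iterating a binary file yields bytes objects split on b"\n".
        for raw in fi:
            try:
                raw.decode("utf-8")  # validation only; the bytes are written unchanged
            except UnicodeDecodeError:
                skipped += 1
                continue
            fo.write(raw)
            kept += 1
    return kept, skipped
```

It could be called as kept, skipped = copy_valid_utf8_lines("data.tsv", "p2.tsv"). Note that a plain for loop still reads each line whole, so on its own this would not protect against a very long run of bytes with no line break.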














1 Answer

It turned out the file was incomplete. My guess is that at some point the actual data ends without a line break, so the program kept reading in search of the end of the line until memory filled up, and then this error was raised.
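
If that guess is right, one way to keep memory bounded is to cap how much a single readline call may return. The sketch below assumes the file is opened in binary mode and uses an arbitrary 1 MiB cap. The names iter_bounded_lines and max_bytes are hypothetical and not part of the original code.

```python
def iter_bounded_lines(fi, max_bytes=1 << 20):
    """Yield (line, truncated) pairs from a binary file object.

    At most max_bytes bytes of any line are kept. When a line runs past
    the cap, the remainder is read and discarded in pieces, and the pair
    is yielded with truncated=True.
    """
    while True:
        chunk = fi.readline(max_bytes)  # returns at most max_bytes bytes
        if not chunk:
            return  # end of file
        truncated = False
        if len(chunk) == max_bytes and not chunk.endswith(b"\n"):
            # The cap was hit mid-line: drain the rest without storing it.
            while True:
                rest = fi.readline(max_bytes)
                if not rest:
                    break
                truncated = True
                if rest.endswith(b"\n"):
                    break
        yield chunk, truncated
```

A caller could skip or log every pair whose flag is True, and decode the remaining lines individually. Whether 1 MiB is a sensible cap depends on the longest legitimate line in the data.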






answered Nov 21 '18 at 9:52 by dontloo, edited Nov 21 '18 at 12:10