Merging non-zero differing bytes in large files












I am trying to salvage an old, scratched DVD by ripping it to an ISO image. I have two readers and created an ISO from each. Each reader fails to read certain bytes of the DVD, different for each reader, and replaces them with zeros. When I compare the files using cmp -l file1.iso file2.iso, I see that some bytes on the left are 0 while other bytes on the right are 0 (the corresponding bytes in the other file are non-zero). I want to create a third file, say file3.iso, that merges the non-zero differing bytes from the two files. As an example, assume for simplicity that each file has 6 bytes, as follows:



    file1.iso   file2.iso
    ---------   ---------
    0           0
    1           1
    2           0
    3           0
    0           4
    0           5


file3.iso should be as follows:



0
1
2
3
4
5


The files are quite large (around 8 GB), and each has the same number of bytes. I am using Ubuntu 16.04.

Can anyone suggest the easiest way to do this? I can use the output of cmp -l as intermediate data, but I would like to avoid writing code.










  • Are there any non-zero bytes that differ? Say, file1.iso has 0x1 where file2.iso has 0x2.
    – wjandrea
    Nov 28 at 20:37






  • Which tool did you use to read from the DVD (and write to the ISO files)? ddrescue, in the package gddrescue, is a good alternative; see its info page (info ddrescue). There is also dvdisaster. These programs may give you better data than you have now.
    – sudodus
    Nov 28 at 20:51












  • @wjandrea No. Only zero bytes differ.
    – Jus12
    Nov 28 at 21:56










  • @sudodus I used the "Disks" program to rip the DVD. I will look at ddrescue. Thanks for the reference.
    – Jus12
    Nov 28 at 21:57










  • Good luck with ddrescue:-) and please let us know about your progress.
    – sudodus
    Nov 29 at 5:12
















Tags: files, iso, binary, merge, comparison






asked Nov 28 at 20:25 by Jus12 (edited Nov 28 at 20:33)

1 Answer














I wrote a Python script for you.



    #!/usr/bin/env python3
    '''
    Given two input files and one output file, merge the input files on
    matching bytes or bytes that are null in one file but not the other.
    Non-matching non-null bytes will raise a ValueError.
    '''

    import sys


    def get_bytes(file):
        '''Return an iterator that yields each byte in the given file.'''
        def get_byte():
            return file.read(1)
        # iter(callable, sentinel) calls get_byte() until it returns b'' (EOF).
        return iter(get_byte, b'')


    args = sys.argv[1:]

    # The with statement closes all three files, flushing the output.
    with open(args[0], 'rb') as file1, \
            open(args[1], 'rb') as file2, \
            open(args[2], 'wb') as file_out:
        for i, (byte1, byte2) in enumerate(zip(get_bytes(file1), get_bytes(file2))):
            if byte1 == byte2:
                byte_out = byte1
            elif ord(byte1) == 0:
                byte_out = byte2
            elif ord(byte2) == 0:
                byte_out = byte1
            else:
                msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
                raise ValueError(msg.format(i, byte1, byte2))
            file_out.write(byte_out)


Make it executable, then call it like so:

    $ ./test.py file1.iso file2.iso file3.iso

Or, for short:

    $ ./test.py file{1,2,3}.iso
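The script above reads one byte per call, which may be slow on files of around 8 GB. Below is a minimal sketch of a chunked variant, assuming the two inputs have equal length as stated in the question. The names merge_chunk, merge_files and CHUNK_SIZE are illustrative and not part of the original script, and the 1 MiB chunk size is an arbitrary choice.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary, untuned choice


def merge_chunk(chunk1, chunk2, offset=0):
    """Merge two equal-length byte strings, preferring non-zero bytes.

    `offset` is the file position of the first byte, used only in the
    error message.
    """
    if chunk1 == chunk2:
        # Fast path: the chunks are identical, nothing to repair.
        return chunk1
    out = bytearray(chunk1)
    for i, (b1, b2) in enumerate(zip(chunk1, chunk2)):
        if b1 == b2 or b2 == 0:
            continue  # out[i] already holds the byte to keep (b1)
        if b1 == 0:
            out[i] = b2
        else:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
            raise ValueError(msg.format(offset + i, b1, b2))
    return bytes(out)


def merge_files(file1, file2, file_out, chunk_size=CHUNK_SIZE):
    """Merge two binary file objects into file_out, chunk by chunk."""
    offset = 0
    while True:
        chunk1 = file1.read(chunk_size)
        chunk2 = file2.read(chunk_size)
        if len(chunk1) != len(chunk2):
            raise ValueError('Input files differ in length')
        if not chunk1:
            break  # both files are exhausted
        file_out.write(merge_chunk(chunk1, chunk2, offset))
        offset += len(chunk1)
```

To try it on the real images, open the three files in binary mode ('rb' for the inputs, 'wb' for the output) and pass the file objects to merge_files. Identical chunks take the fast path, so only the damaged regions are examined byte by byte.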


P.s. I've recently been studying reading files in different ways, so this is nice serendipity.
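Before merging, it may be worth confirming that the two images really differ only where one of them holds a zero byte, as stated in the comments. The sketch below counts byte positions by category. It works on in-memory bytes objects, so for the full images it would have to be fed chunk by chunk; the function name classify_differences and the category labels are made up for illustration.

```python
from collections import Counter


def classify_differences(data1, data2):
    """Count byte positions by how the two inputs relate to each other."""
    counts = Counter()
    for b1, b2 in zip(data1, data2):
        if b1 == b2:
            counts['same'] += 1
        elif b1 == 0:
            counts['zero_in_first'] += 1   # only the second file has data
        elif b2 == 0:
            counts['zero_in_second'] += 1  # only the first file has data
        else:
            counts['conflict'] += 1        # both non-zero, but different
    return counts


# The six-byte example from the question.
counts = classify_differences(bytes([0, 1, 2, 3, 0, 0]),
                              bytes([0, 1, 0, 0, 4, 5]))
print(dict(counts))
```

A 'conflict' count of zero means the merge will not raise a ValueError for that data.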






answered Nov 28 at 21:42 by wjandrea (edited Nov 29 at 5:29)
