Merging non-zero differing bytes in large files
I am trying to salvage an old scratched DVD by ripping it to ISO. I have two readers and created an ISO from each. Each reader is unable to read certain different bytes of the DVD and replaces them with 0s. When I compare the files using cmp -l file1.iso file2.iso, I see that certain bytes on the left are 0 while certain other bytes on the right are 0 (the corresponding bytes in the other file are non-zero). I want to create a third file, say file3.iso, that merges the non-zero differing bytes from the two files. As an example, assume for simplicity that each file has 6 bytes, as follows:
file1.iso  file2.iso
---------  ---------
0          0
1          1
2          0
3          0
0          4
0          5
file3.iso should be as follows:
0
1
2
3
4
5
The files are quite large (around 8GB). Each file has the same number of bytes. I am using Ubuntu 16.04.
Can anyone suggest the easiest way to do what I want? I can use the output of cmp -l as intermediate data, but I want to avoid writing code.
files iso binary merge comparison
Are there any non-zero bytes that differ? Say, in file1.iso you have a 0x1 while in file2.iso you have a 0x2.
– wjandrea
Nov 28 at 20:37
Which tool did you use to read from the DVD disk (and write to ISO files)? ddrescue, in the program package gddrescue, is a good alternative. See the info page: info ddrescue. There is also dvdisaster. These programs may give you better data than you have now.
– sudodus
Nov 28 at 20:51
@wjandrea No. Only zero bytes differ.
– Jus12
Nov 28 at 21:56
@sudodus I used the "disks" program to rip the DVD. I will look at ddrescue. Thanks for the reference.
– Jus12
Nov 28 at 21:57
Good luck with ddrescue :-) and please let us know about your progress.
– sudodus
Nov 29 at 5:12
edited Nov 28 at 20:33
asked Nov 28 at 20:25
Jus12
1 Answer
I wrote a Python script for you.
#!/usr/bin/env python3
'''
Given two input files and one output file, merge the input files on
matching bytes or bytes that are null in one file but not the other.
Non-matching non-null bytes will raise a ValueError.
'''

import sys


def get_bytes(file):
    '''Return an iterator that yields each byte in the given file.'''
    def get_byte():
        return file.read(1)
    # Calls get_byte() repeatedly until it returns b'' (end of file).
    return iter(get_byte, b'')


args = sys.argv[1:]

# The with block closes all three files (flushing the output) even if a
# ValueError is raised partway through.
with open(args[0], 'rb') as file1, \
        open(args[1], 'rb') as file2, \
        open(args[2], 'wb') as file_out:
    for i, (byte1, byte2) in enumerate(zip(get_bytes(file1), get_bytes(file2))):
        if byte1 == byte2:
            byte_out = byte1
        elif ord(byte1) == 0:
            # Only file1 has a null here, so take the byte from file2.
            byte_out = byte2
        elif ord(byte2) == 0:
            # Only file2 has a null here, so take the byte from file1.
            byte_out = byte1
        else:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
            raise ValueError(msg.format(i, byte1, byte2))
        file_out.write(byte_out)
Make it executable then call it like so:
$ ./test.py file1.iso file2.iso file3.iso
Or for short:
$ ./test.py file{1,2,3}.iso
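Reading one byte at a time can be slow on files of around 8GB. The following is a minimal sketch of the same merge rule applied to larger chunks, so that chunks which are identical in both files are copied without a per-byte loop. The names merge_chunk and merge_files and the 1 MiB chunk size are illustrative assumptions, not part of the script above, and the sketch assumes both inputs are regular files of equal length.

```python
#!/usr/bin/env python3
"""Chunked variant of the zero-byte merge (illustrative sketch)."""
import sys


def merge_chunk(a: bytes, b: bytes, offset: int = 0) -> bytes:
    """Merge two equal-length chunks, preferring the non-zero byte.

    offset is the file position of the chunk's first byte and is only
    used to report where a conflict occurred.
    """
    if len(a) != len(b):
        raise ValueError('Input lengths differ near offset {:#x}'.format(offset))
    if a == b:
        # Fast path: nothing to merge in this chunk.
        return a
    out = bytearray(a)
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 0:
            continue  # keep the byte already taken from a
        if x != 0:
            msg = 'Bytes at {:#x} are both non-zero and do not match: {}, {}'
            raise ValueError(msg.format(offset + i, x, y))
        out[i] = y  # a has a zero here, b does not
    return bytes(out)


def merge_files(path1, path2, path_out, chunk_size=1 << 20):
    """Merge path1 and path2 into path_out, chunk_size bytes at a time."""
    with open(path1, 'rb') as f1, open(path2, 'rb') as f2, \
            open(path_out, 'wb') as f_out:
        offset = 0
        while True:
            a = f1.read(chunk_size)
            b = f2.read(chunk_size)
            if not a and not b:
                break  # both files are exhausted
            f_out.write(merge_chunk(a, b, offset))
            offset += len(a)


# Only runs when called as: script.py file1.iso file2.iso file3.iso
if __name__ == '__main__' and len(sys.argv) == 4:
    merge_files(sys.argv[1], sys.argv[2], sys.argv[3])
```

Chunks that differ still fall back to a byte-by-byte loop, so the speed-up depends on how much of the disc both readers read identically.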
P.S. I've recently been studying different ways of reading files, so this is nice serendipity.
edited Nov 29 at 5:29
answered Nov 28 at 21:42
wjandrea