Reconstructing files uploaded to SQL Server with Python
I am working with a SQL Server database table similar to this:
    USER_ID varchar(50), FILE_NAME ntext, FILE_CONTENT ntext
Sample data:
    USER_ID: 1
    FILE_NAME: (AttachedFiles:1)=file1.pdf
    FILE_CONTENT: (AttachedFiles:1)=H4sIAAAAAAAAAOy8VXQcy7Ku….
By means of regular expressions I have successfully isolated the "content" of the FILE_CONTENT field by removing the "(AttachedFiles:1)=" part, resulting in a string similar to this:
    content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc…"
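For reference, the prefix removal described above could look roughly like the following. This is only a sketch, not the code actually used in the question: `PREFIX_RE` and `strip_prefix` are made-up names, and the `(AttachedFiles:N)=` pattern is inferred from the sample rows.

```python
import re

# Assumed prefix format, inferred from the sample data: "(AttachedFiles:<number>)="
PREFIX_RE = re.compile(r"^\(AttachedFiles:\d+\)=")


def strip_prefix(value: str) -> str:
    """Remove a leading '(AttachedFiles:N)=' marker if one is present."""
    return PREFIX_RE.sub("", value.strip(), count=1)


file_name = strip_prefix("(AttachedFiles:1)=file1.pdf")
print(file_name)  # file1.pdf
```

The same helper can be applied to the FILE_CONTENT value to obtain the base64 string.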
My plan was to reconstruct the file from this string in order to download it from the database. During my investigation, I found this post and replicated its code like this:
    import base64
    import os
    content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
    with open(os.path.expanduser('test.pdf'), 'wb') as f:
        f.write(base64.decodestring(content_str))
...getting a TypeError: expected bytes-like object, not str
Investigating further, I found this other post and proceeded like this:
    content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
    encoded = content_str.encode('ascii')
    with open(os.path.expanduser('test.pdf'), 'wb') as f:
        f.write(base64.decodestring(encoded))
...which successfully creates a PDF file. However, when I try to open it, I get an error saying that the file is corrupt.
I kindly ask for any suggestions on how to proceed. I am even open to rethinking the process I've come up with if necessary. Many thanks in advance!
python sql-server
edited Nov 19 '18 at 19:56
Tomalak
asked Nov 19 '18 at 16:26
DanielaC
1 Answer
The value of FILE_CONTENT is base64-encoded. This means it's a string consisting of 64 possible characters which represent raw bytes. All you need to do is base64-decode the string and write the resulting bytes directly to a file.
    import base64
    import os
    content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc=="
    with open(os.path.expanduser('test.pdf'), 'wb') as fp:
        fp.write(base64.b64decode(content_str))
The base64 sequence "H4sI" at the start of your content string translates to the bytes 0x1f, 0x8b, 0x08. These bytes are not normally at the start of a PDF file, but indicate a gzip-compressed data stream. It's possible that a PDF reader won't understand this.
I don't know for certain if gzip compression is a valid part of the PDF file format, but it's a valid part of web communication, so maybe the file stream was compressed for transfer/download and has not been decompressed before writing it to the database.
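The byte values mentioned above are easy to verify, and the same check can be used to tell a gzip stream from a plain PDF before deciding what to do with the data. The following is a minimal sketch, assuming Python 3; `sniff` and the two constants are illustrative names, not part of any library.

```python
import base64

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream
PDF_MAGIC = b"%PDF-"      # a PDF file normally starts with this marker


def sniff(data: bytes) -> str:
    """Guess the container format from the leading bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


head = base64.b64decode("H4sI")
print(head.hex())   # 1f8b08
print(sniff(head))  # gzip
```

The third byte, 0x08, is the gzip compression-method field and stands for "deflate".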
If your PDF reader does not accept the data as is, decompress it before saving it to file:
    import gzip
    # ...
    with open(os.path.expanduser('test.pdf'), 'wb') as fp:
        fp.write(gzip.decompress(base64.b64decode(content_str)))
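The EOFError reported in the comments on this answer means that the gzip stream stops before its end-of-stream marker. One rough way to check how much of the stream is actually present is to decompress it with a `zlib` decompression object, which returns whatever it could decode instead of raising. The sketch below assumes Python 3; `gunzip_tolerant` is a made-up helper name, and the demo payload is generated locally rather than taken from the data in the question.

```python
import base64
import gzip
import zlib


def gunzip_tolerant(data: bytes):
    """Decompress gzip data and return (output, complete).

    complete is False when the input ends before the gzip end-of-stream
    marker, the situation in which gzip.decompress() raises EOFError.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    output = decompressor.decompress(data)
    return output, decompressor.eof


# Locally generated demo data, NOT the data from the question.
original = b"%PDF-1.4 demo payload " * 200
encoded = base64.b64encode(gzip.compress(original)).decode("ascii")

full, complete = gunzip_tolerant(base64.b64decode(encoded))
print(complete, full == original)

# Simulate a stored value that was cut short. The length is kept at a
# multiple of 4 so the remainder is still valid base64.
cut = encoded[: (len(encoded) // 2) // 4 * 4]
partial, complete = gunzip_tolerant(base64.b64decode(cut))
print(complete, original.startswith(partial))
```

If `complete` comes back as False for the real data, the stored value is most likely incomplete, and no decoder can recover the missing part. Why it would be incomplete cannot be determined from this thread.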
edited Nov 19 '18 at 20:28
answered Nov 19 '18 at 20:16
Tomalak
Thanks Tomalak! I tried your suggestion but I am now getting an "EOFError: Compressed file ended before the end-of-stream marker was reached". When investigating further, I found some threads suggesting that the error is due to file corruption. Any further suggestions would be much appreciated.
– DanielaC
Nov 20 '18 at 15:23
First, try to write the stream to file without passing it through gzip.decompress(). Then try to open the resulting file with your PDF reader, just to check the off-chance that it knows what to do. If it complains, try opening the resulting file in 7zip (which can deal with all kinds of compression formats) to find out if there is anything in it at all. Maybe gzip.decompress() is not the right tool yet, it was an educated guess of mine.
– Tomalak
Nov 20 '18 at 15:29
I created a pdf without gzip.decompress() and failed to open it in the reader. I proceeded to change the extension of the pdf to .zip, .rar, .7z and failed to extract with 7zip. However when decompressing the .gzip the error I get is "Unexpected end of data". Thanks again!
– DanielaC
Nov 20 '18 at 15:58
Can you upload the file you currently have somewhere? I can try and take a look at it, maybe I can figure something out. No promises though.
– Tomalak
Nov 20 '18 at 16:02
Thank you so much Tomalak! On my github now: github.com/dcct84/encodedfiles_test
– DanielaC
Nov 20 '18 at 16:52