Scan many pages straight into a PDF


Is there an easy-to-use program in Ubuntu that can scan many pages straight into a PDF file?

edited Oct 3 '10 at 10:11

Marcel Stimberg

26k63944

asked Oct 3 '10 at 8:50

pupeno

7262711

Just wondering, are there any special qualifications needed for scanners/printers that I would like to use in Ubuntu?

– JFW
Oct 3 '10 at 11:27

@JFW, here's a list of supported devices for SANE, the back-end used by most Ubuntu scanning programs. HP printer/scanner/copiers seem like a good, reliable choice, if you're looking.

– poolie
Apr 17 '11 at 22:49


6 Answers

The idea of having a simple scan utility was behind the development of, well, Simple Scan - the scanning tool installed by default from 10.04 on (Applications ‣ Graphics ‣ Simple Scan).

Simply scan as many pages as you want and choose PDF as file format when saving.

Another slightly less simple program that offers additional features like text recognition is gscan2pdf, also in the repositories.

answered Oct 3 '10 at 10:10

Marcel Stimberg

26k63944

+1 for Simple Scan - it's so easy and simple, but very powerful too - it's particularly suited to the job you mentioned.

– 8128
Oct 3 '10 at 14:56


"Easy to use" is in the eye of the user, but xsane provides this functionality. Choose "Multipage" where it says "Viewer" (or press Ctrl-M), and it shouldn't be too difficult to figure out from there.

answered Oct 3 '10 at 9:40

Karl Bielefeldt

72348

Personally I see xsane as far from easy to use...

– 8128
Oct 3 '10 at 15:05

I've been using xsane all this time. It never occurred to me that there might be a better tool.

– Amanda
Jun 8 '11 at 14:24


I was using xsane until I saw this question, and considered its interface idiosyncratic, to say the least, but effective.

Upon seeing this question I went looking and found gscan2pdf in the Ubuntu Lucid/Maverick repositories. It uses the same scanning engine (libsane), but the UI is far more GNOME-ish. For a good time, try:

sudo apt-get install gscan2pdf
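
Before installing anything, it can help to see which of the front-ends mentioned in this thread are already present. This is only a minimal sketch: missing_tools is a hypothetical helper name, and it assumes the executables are called simple-scan, gscan2pdf and xsane, as they usually are in the Ubuntu packages.

```shell
# missing_tools: print every command from the argument list that is not on PATH.
# (Hypothetical helper; the tool names passed in below are assumptions.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# List the scanning front-ends that still need to be installed.
missing_tools simple-scan gscan2pdf xsane
```

Anything it prints can then be passed to sudo apt-get install, assuming the package name matches the command name.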

answered Oct 3 '10 at 10:11

msw

4,19611826


Change the file name from myfile.jpg to myfile.pdf on the save dialog of Simple Scan.

Tested on Ubuntu 14.04, Simple Scan 3.12.1.

This works even though the file type drop-down does not show "PDF", only "Images". I consider this a UI bug.

This feature is documented on Help > Contents:

From the "Save As" dialog box, choose one of the supported file types, or simply change the extension in the "Name" field.

It says that the following formats are supported:

JPEG

TIFF

Interesting fact: if you change the scan type (the drop-down beside "Scan") to "Text", the default file type becomes PDF.

edited Feb 25 '16 at 22:14

answered Aug 18 '15 at 10:31

Ciro Santilli 新疆改造中心六四事件法轮功

9,26444346


Scan pages from a USB scanner. Use tesseract to OCR each page into a PDF.
Merge multiple pages into one PDF.
Usage: scan2PDF outputfilename number_of_pages

#!/bin/bash
# scan2PDF
# Requires: tesseract 3.03 for OCR to PDF
#           scanimage for scanning, I use 1.0.24
#           pdfunite to merge multiple PDFs into one, I use 0.26.5
#
# Use scanimage -L to get a list of devices,
# e.g. device `genesys:libusb:006:003' is a Canon LiDE 210 flatbed scanner,
# then copy/paste genesys:libusb:006:003 into SCANNER below.
# Play with CONTRAST to get good images.
DPI=300
TESS_LANG=nor                   # Language that Tesseract uses for OCR
SCANNER=genesys:libusb:006:003  # My USB scanner
CONTRAST=35                     # Contrast to remove paper look

FILENAME=$1  # Argument 1, filename
PAGES=$2     # Argument 2, number of pages

re='^[0-9]+$'  # Check that a filename was given and the second argument is a number
if [ -z "$FILENAME" ] || ! [[ $PAGES =~ $re ]]; then
    echo "error: Usage: $0 filename number_of_pages" >&2
    exit 1
fi
PAGES=$((10#$PAGES))  # Force base 10 so that "08" is not read as octal

SCRIPT_NAME=$(basename "$0" .sh)
TMP_DIR=${SCRIPT_NAME}-tmp  # Directory to store temporary files

if [ -d "$TMP_DIR" ]; then  # Refuse to reuse an existing directory
    echo "Error: The directory $TMP_DIR exists." >&2
    exit 2
fi
mkdir "$TMP_DIR" && cd "$TMP_DIR" || exit 3  # Make and go to temp dir

echo "Starting scanimage..."
scanimage -d "$SCANNER" --format=tiff --mode Color --resolution "$DPI" -p \
    --contrast "$CONTRAST" --batch-start=1 --batch-count="$PAGES" --batch-prompt

echo "Starting Tesseract OCR..."
pdfs=()
for ((i = 1; i <= PAGES; i++)); do  # Numeric loop keeps page 10 after page 9
    [ -f "out$i.tif" ] || continue  # Skip pages that were not scanned
    tesseract "out$i.tif" "out$i" -l "$TESS_LANG" pdf
    pdfs+=("out$i.pdf")
done

if [ "${#pdfs[@]}" -eq 1 ]; then
    cp "${pdfs[0]}" "../$FILENAME.pdf"  # Only one page, just copy the PDF back
else
    pdfunite "${pdfs[@]}" "../$FILENAME.pdf"  # More pages, merge them into one PDF
fi
echo "$FILENAME.pdf done"

rm -- *  # Done, clean up
cd ..
rmdir "$TMP_DIR"
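
The script's header says to copy the device name out of scanimage -L by hand. That step could be sketched roughly as below, assuming the listing has the form shown in the script's own comment (device `NAME' is a ...). first_sane_device is a hypothetical helper name, not part of SANE.

```shell
# first_sane_device: read `scanimage -L` output on stdin and print the name
# of the first listed device (the text between the backtick and the quote).
# Assumes lines of the form:  device `NAME' is a DESCRIPTION
first_sane_device() {
    sed -n "s/^device \`\([^']*\)'.*/\1/p" | head -n 1
}

# Typical use (needs a connected scanner, so it is left commented out):
#   SCANNER=$(scanimage -L | first_sane_device)
```

The USB bus and device numbers in names such as genesys:libusb:006:003 can change when the scanner is replugged, so looking the name up at run time may be more robust than hard-coding it.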

edited Feb 12 '16 at 14:08

answered Feb 12 '16 at 13:52

morten

112

it is a very Linuxoidal method

– rth
Oct 5 '18 at 11:21


For those of you wishing to use XSane: it is very powerful, and intuitive once you have read the setup guide linked from Help > XSane Doc in the program and know how much you can do with it. It's also worth checking that your SANE backend is working properly (this guide is not too Arch-specific): https://wiki.archlinux.org/index.php/SANE

If you want to scan documents automatically from a feeder and wonder whether XSane will know when to stop (and not stop too early), simply enter a number at the top left (the number-of-scans icon) that is larger than the number of pages that fit in your feeder. For example, if your feeder can take 10 pages, enter 15 (to account for thickness variation). If you have a duplex scanner, double this number.

When the feeder runs out, you will get a dialog box with a green warning triangle saying "Scanned pages: 0". This just means that the feeder is empty, and you can close the dialog. If you selected "Viewer" or "Save" at the top right of XSane, the files will all be there; remember to save them from the viewer. You can now press Scan again to carry on where you left off, with the numbers incrementing from the same point, or you can start a new project. No blank pages will be added. If you selected "Multipage", the project dialog should show all the completed scans, and you can click to save them as a multipage PDF, TIFF or PostScript file.
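
The rule of thumb for the scan count can be written down as a small calculation. This is only a sketch: the 50% margin is read off the single example above (10 pages, enter 15) and is not an XSane rule, and scan_count is a hypothetical helper name.

```shell
# scan_count CAPACITY [duplex]: number of scans to enter for one feeder load.
# Assumption: pad the capacity by half, rounded up, and double it for duplex.
scan_count() {
    local capacity=$1 count
    count=$(( capacity + (capacity + 1) / 2 ))
    if [ "${2:-}" = duplex ]; then
        count=$(( count * 2 ))
    fi
    echo "$count"
}

scan_count 10         # prints 15
scan_count 10 duplex  # prints 30
```

Any number comfortably above the feeder capacity works just as well, since XSane stops when the feeder is empty.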

HTH,

answered Dec 9 '18 at 7:25

user901387
