Scan many pages straight into a PDF












Is there an easy-to-use program in Ubuntu that can scan many pages straight into a PDF file?







Tags: pdf, scanning

edited Oct 3 '10 at 10:11 by Marcel Stimberg
asked Oct 3 '10 at 8:50 by pupeno













  • Just wondering, are there any special requirements for scanners/printers that I would like to use with Ubuntu?

    – JFW
    Oct 3 '10 at 11:27











  • @JFW, SANE, the back-end used by most Ubuntu scanning programs (XSane is one of its front-ends), publishes a list of supported devices. HP printer/scanner/copiers seem like a good, reliable choice, if you're looking.

    – poolie
    Apr 17 '11 at 22:49
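Before choosing a front-end, it can help to confirm that SANE itself detects the scanner. A minimal sketch, assuming the sane-utils package (which provides scanimage) is installed; the function name have_scanner is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (and print the device list) if SANE sees at least one scanner.
# Assumes scanimage from the sane-utils package is on the PATH.
have_scanner() {
    local listing
    # scanimage -L prints one line per detected device, each starting with "device `"
    listing=$(scanimage -L 2>/dev/null)
    if grep -q '^device' <<< "$listing"; then
        printf '%s\n' "$listing"
        return 0
    fi
    echo "SANE found no scanner; check the connection and the SANE supported-devices list." >&2
    return 1
}
```

Usage: have_scanner || exit 1. The device name it prints (for example genesys:libusb:006:003) is what scanimage -d expects.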



















6 Answers














For those of you wishing to use XSane: it is very powerful, and intuitive once you read the setup guide linked from Help > XSane Doc in the program, which shows how much you can do with it. It's also worth checking that your SANE backend is working properly (this page is not too Arch-specific): https://wiki.archlinux.org/index.php/SANE

If you want to scan documents automatically from a feeder and wonder whether XSane will know when to stop (and not stop too early), simply enter a number next to the number-of-scans icon at the top left that is larger than the number of pages that fit in your feeder. For example, if your feeder takes 10 pages, enter 15 (to account for variations in paper thickness). If you have a duplex scanner, double this number.

When the feeder runs out, you will get a dialog box with a green warning triangle saying "Scanned pages: 0". This just means that the feeder is empty, and you can close the dialog. If you selected "Viewer" or "Save" at the top right of XSane, the files will all be there; remember to save them from the viewer. You can now press Scan again to carry on where you left off, with the numbering continuing from the same point, or you can start a new project. No blank pages will be added. If you selected "Multipage", the project dialog should show all the completed scans, and you can click to save them as a multipage PDF, TIFF or PostScript file.

HTH,

DC
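The same scan-until-the-feeder-is-empty idea is available on the command line: scanimage in batch mode (--batch without --batch-count) keeps scanning until the scanner stops returning pages, which is meant for document feeders. A rough sketch, assuming scanimage and img2pdf are installed; the function name scan_feeder_to_pdf is hypothetical, and the value of the backend-specific --source option that selects the feeder (often something like "ADF") differs between scanners, so check scanimage --help -d DEVICE for yours:

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan every sheet in the document feeder into one PDF.
# Assumptions: scanimage (sane-utils) and img2pdf are installed, and SOURCE is
# whatever value your backend's --source option uses for the feeder.
scan_feeder_to_pdf() {
    local device=$1 source=$2 output=$3
    local dir
    dir=$(mktemp -d) || return 1

    # --batch without --batch-count keeps scanning until no more pages arrive.
    # Zero-padded names (page001.tif, page002.tif, ...) keep the glob below in page order.
    scanimage -d "$device" --source "$source" --format=tiff \
        --batch="$dir/page%03d.tif"

    # Rather than rely on scanimage's exit status, check whether any pages were written
    local pages=("$dir"/page*.tif)
    if [[ ! -e ${pages[0]} ]]; then
        echo "No pages were scanned." >&2
        rm -rf -- "$dir"
        return 1
    fi

    # img2pdf puts each image on its own PDF page
    img2pdf "${pages[@]}" -o "$output"
    local status=$?
    rm -rf -- "$dir"
    return "$status"
}
```

Usage (device and source values are placeholders): scan_feeder_to_pdf genesys:libusb:006:003 ADF letters.pdf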



















The idea of having a simple scan utility was behind the development of, well, Simple Scan, the scanning tool installed by default since Ubuntu 10.04 (Applications ‣ Graphics ‣ Simple Scan).

Simply scan as many pages as you want and choose PDF as the file format when saving.

Another, slightly less simple program that offers additional features such as text recognition (OCR) is gscan2pdf, also available in the repositories.







answered Oct 3 '10 at 10:10 by Marcel Stimberg

  • +1 for Simple Scan - it's so easy and simple, but very powerful too - it's particularly suited to the job you mentioned.

    – 8128
    Oct 3 '10 at 14:56














"Easy to use" is in the eye of the user, but xsane provides this functionality. Choose "Multipage" where it says "Viewer" (or press Ctrl+M), and it shouldn't be too difficult to figure out from there.







answered Oct 3 '10 at 9:40 by Karl Bielefeldt

  • Personally I see xsane as far from easy to use...

    – 8128
    Oct 3 '10 at 15:05

  • I've been using xsane all this time. It never occurred to me that there might be a better tool.

    – Amanda
    Jun 8 '11 at 14:24
































I was using xsane until I saw this question, and considered its interface idiosyncratic, to say the least, but effective.

Upon seeing this question I went looking and found gscan2pdf in the Ubuntu Lucid/Maverick repositories. It uses the same scanning engine (libsane), but the UI is far more GNOME-ish. For a good time, try:

    sudo apt-get install gscan2pdf

answered Oct 3 '10 at 10:11 by msw






































Change the file name from myfile.jpg to myfile.pdf in the save dialog of Simple Scan.

Tested on Ubuntu 14.04, Simple Scan 3.12.1.

This works even though the file type drop-down does not show "PDF", only "Images". I consider this a UI bug.

This feature is documented in Help > Contents:

  From the "Save As" dialog box, choose one of the supported file types, or simply change the extension in the "Name" field.

It says that the following formats are supported:

  • PDF
  • JPEG
  • PNG
  • TIFF

Interesting fact: if you change the scan type (the drop-down beside "Scan") to "Text", the default file type becomes PDF.







edited Feb 25 '16 at 22:14
answered Aug 18 '15 at 10:31 by Ciro Santilli 新疆改造中心 六四事件 法轮功
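If the pages have already been saved as separate image files (for example because the save dialog defaulted to JPEG), they can still be combined into one PDF afterwards. A minimal sketch, assuming img2pdf is installed and the file names contain no newlines; the function name images_to_pdf is hypothetical. It sorts the names with sort -V so that page10 comes after page2:

```bash
#!/usr/bin/env bash
# Hypothetical helper: combine image files into one PDF, one image per page.
# Usage: images_to_pdf OUTPUT.pdf IMAGE...
# Assumes img2pdf is installed and file names contain no newlines.
images_to_pdf() {
    if (( $# < 2 )); then
        echo "usage: images_to_pdf OUTPUT.pdf IMAGE..." >&2
        return 1
    fi
    local output=$1
    shift
    local -a pages
    # Natural ("version") sort, so page2.jpg comes before page10.jpg
    mapfile -t pages < <(printf '%s\n' "$@" | sort -V)
    img2pdf "${pages[@]}" -o "$output"
}
```

Usage: images_to_pdf scan.pdf page*.jpg. A plain shell glob alone would list page10.jpg before page2.jpg, which is why the names are re-sorted.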























Scan pages from a USB scanner, OCR them into PDFs with tesseract, and merge the pages into one PDF.

Usage: scan2PDF outputfilename number_of_pages



    #!/bin/bash
    # scan2PDF: scan pages, OCR each page into a PDF with tesseract,
    # then merge the pages into one PDF.
    # Requires: tesseract 3.03 or newer for OCR to PDF
    #           scanimage for scanning, I use 1.0.24
    #           pdfunite to merge multiple PDFs into one, I use 0.26.5
    #
    # Use scanimage -L to get a list of devices.
    # e.g. device `genesys:libusb:006:003' is a Canon LiDE 210 flatbed scanner
    # then copy/paste genesys:libusb:006:003 into SCANNER below.
    # Play with CONTRAST to get good images (--contrast is a backend-specific option).
    DPI=300
    TESS_LANG=nor                   # Language that Tesseract uses for OCR
    SCANNER=genesys:libusb:006:003  # My USB scanner
    CONTRAST=35                     # Contrast to remove the paper look

    FILENAME=$1  # Argument 1: output file name, without .pdf
    PAGES=$2     # Argument 2: number of pages

    # Require a file name and a positive page count
    re='^[1-9][0-9]*$'
    if [[ -z ${FILENAME} ]] || ! [[ ${PAGES} =~ $re ]]; then
        echo "error: Usage: $0 filename number_of_pages" >&2
        exit 1
    fi

    SCRIPT_NAME=$(basename "$0" .sh)  # Directory to store temporary files
    TMP_DIR=${SCRIPT_NAME}-tmp

    if [ -d "${TMP_DIR}" ]; then  # Don't reuse an existing directory
        echo "Error: The directory ${TMP_DIR} exists." >&2
        exit 2
    fi
    # Make and enter the temp dir; stop if either step fails, so the
    # clean-up at the end can never delete files in the wrong directory
    mkdir "${TMP_DIR}" && cd "${TMP_DIR}" || exit 3

    echo "Starting scanimage..."
    # --batch=out%d.tif names the pages out1.tif, out2.tif, ...
    scanimage -d "${SCANNER}" --format=tiff --mode Color --resolution "${DPI}" -p \
        --contrast "${CONTRAST}" --batch=out%d.tif --batch-start=1 \
        --batch-count="${PAGES}" --batch-prompt

    echo "Starting Tesseract OCR..."
    for file in out*.tif; do  # Every scanned page in the temp dir
        tesseract "$file" "${file%.tif}" -l "${TESS_LANG}" pdf
    done

    # List the per-page PDFs in numeric order; a plain *.pdf glob sorts
    # alphabetically and would put out10.pdf before out2.pdf
    pdfs=()
    for ((i = 1; i <= PAGES; i++)); do
        [ -f "out$i.pdf" ] && pdfs+=("out$i.pdf")
    done

    if [ "${#pdfs[@]}" -eq 0 ]; then
        echo "Error: no pages were scanned." >&2
    elif [ "${#pdfs[@]}" -eq 1 ]; then
        # Only one page, just copy the PDF back
        cp "${pdfs[0]}" "../${FILENAME}.pdf" && echo "${FILENAME}.pdf done"
    else
        # More pages: merge them into one PDF next to the temp dir
        pdfunite "${pdfs[@]}" "../${FILENAME}.pdf" && echo "${FILENAME}.pdf done"
    fi

    rm -f -- *  # Clean up (we are inside TMP_DIR, checked above)
    cd .. && rmdir "${TMP_DIR}"





  • it is a very Linuxoidal method

    – rth
    Oct 5 '18 at 11:21
















                        1














                        Scan pages from USB scanner. Use tesseract to OCR into a PDF.
                        Merge multiple pages into one PDF.
                        Usage: scan2PDF outputfilename number_of_pages



                        #!/bin/bash
                        #scan2PDF
                        #Requires: tesseract 3.03 for OCR to PDF
                        # scanimage for scanning, I use 1.0.24
                        # pdfunite to merge multiple PDF into one, I use 0.26.5
                        #
                        # Use scanimage -L to get a list of devices.
                        # e.g. device `genesys:libusb:006:003' is a Canon LiDE 210 flatbed scanner
                        # then copy/paste genesys:libusb:006:003 into SCANNER below.
                        # play with CONTRAST to get good images
                        DPI=300
                        TESS_LANG=nor #Language that Tesseract uses for OCR
                        SCANNER=genesys:libusb:006:003 #My USB scanner
                        CONTRAST=35 #Contrast to remove paper look

                        FILENAME=$1 #Agrument 1,filename
                        PAGES=$2 #Argument 2, number of pages

                        re='^[0-9]+$' #Check if second argument is a number
                        if ! [[ ${PAGES} =~ $re ]] ; then
                        echo "error: Usage: $0 filename number_of_pages" >&2; exit 1
                        fi

                        SCRIPT_NAME=`basename "$0" .sh` #Directory to store temporary files
                        TMP_DIR=${SCRIPT_NAME}-tmp

                        if [ -d ${TMP_DIR} ] #Check if it exists a directory already
                        then
                        echo Error: The directory ${TMP_DIR} exists.
                        exit 2
                        fi
                        mkdir ${TMP_DIR} #Make and go to temp dir
                        cd ${TMP_DIR}

                        echo Starts Scanimage...
                        scanimage -d ${SCANNER} --format=tiff --mode Color --resolution ${DPI} -p --contrast ${CONTRAST} --batch-start=1 --batch-count=${PAGES} --batch-prompt


                        echo Starts Tesseract OCR

                        for file in *.tif #Goes through every tif file in temp dir
                        do
                        tesseract $file ${file%.tif} -l ${TESS_LANG} pdf

                        done

                        if [ "$PAGES" = "1" ] #How many pages
                        then
                        cp out1.pdf ../${FILENAME}.pdf #Only one page, just copy the PDF back
                        else
                        for file in *.pdf #More pages, merge the pages into one PDF and copy back
                        do
                        pdfuniteargs+=${file}
                        pdfuniteargs+=" "
                        done
                        pdfunite $pdfuniteargs ../${FILENAME}.pdf
                        fi
                        echo ${FILENAME}.pdf done

                        rm * #Done, clean up
                        cd ..
                        rmdir ${TMP_DIR}





                        share|improve this answer


























                        • it is a very Linuxoidal method

                          – rth
                          Oct 5 '18 at 11:21














                        edited Feb 12 '16 at 14:08

























                        answered Feb 12 '16 at 13:52









                        morten













                        For those of you who want to use XSane: it is very powerful, and intuitive once you have read the setup guide linked from Help > XSane Doc in the program, which shows how much you can do with it. It's also worth checking that your SANE backend is working properly; this page is not too Arch-specific: https://wiki.archlinux.org/index.php/SANE
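A quick terminal check of the SANE backend is `scanimage -L`, which lists every scanner SANE can see. The sketch below wraps that check; it assumes `scanimage` comes from Ubuntu's sane-utils package and that `scanimage -L` uses its default line format (quoted in the code comment). The function names `parse_device_names` and `check_sane` are hypothetical helpers, not part of SANE.

```shell
#!/bin/bash
# Sketch: confirm that SANE can see a scanner and print the device names
# that `scanimage -d` expects. Helper names are hypothetical.

# Read `scanimage -L` output on stdin and print only the device names.
# A matching line looks like:
#   device `genesys:libusb:006:003' is a Canon LiDE 210 flatbed scanner
parse_device_names() {
  sed -n "s/^device \`\([^']*\)' is .*/\1/p"
}

# Print the device names and return 0 if at least one scanner is found;
# otherwise print a hint on stderr and return 1.
check_sane() {
  if ! command -v scanimage >/dev/null 2>&1; then
    echo "scanimage not found; on Ubuntu it is in the sane-utils package" >&2
    return 1
  fi
  local names
  names=$(scanimage -L 2>/dev/null | parse_device_names)
  if [ -z "$names" ]; then
    echo "SANE is installed but did not detect a scanner" >&2
    return 1
  fi
  printf '%s\n' "$names"
}
```

Source the file and run `check_sane`; a printed name is what you would pass to `scanimage -d` (for example, the SCANNER variable in the scan2PDF script).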



                        If you want to scan documents automatically from a feeder and wonder whether XSane will know when to stop (and not stop too early), simply enter a number in the number-of-scans field at the top left that is larger than the number of pages your feeder holds. For example, if your feeder takes 10 pages, enter 15 (to allow for variation in paper thickness). If you have a duplex scanner, double this number.
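The rule of thumb above can be expressed as a tiny helper. This is a sketch under an assumption: it generalises the 10 → 15 example into a 50% margin rounded up (the answer gives only the example, not a formula), then doubles the result for duplex scanners. `suggest_scan_count` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: suggested value for XSane's "number of scans" field.
# ASSUMPTION: a 50% margin over the feeder capacity, generalised from the
# 10 -> 15 example; the helper name is hypothetical.

# Usage: suggest_scan_count CAPACITY [yes|no]   (second argument: duplex?)
suggest_scan_count() {
  local capacity=$1 duplex=${2:-no}
  # capacity * 1.5, rounded up, using integer arithmetic only
  local count=$(( (capacity * 3 + 1) / 2 ))
  if [ "$duplex" = "yes" ]; then
    count=$(( count * 2 ))   # both sides of every sheet are scanned
  fi
  echo "$count"
}
```

For example, `suggest_scan_count 10 yes` prints 30, matching the duplex case described above.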



                        When the feeder runs out, you will get a dialog box with a green warning triangle saying "Scanned pages: 0". This just means that the feeder is empty, and you can close the dialog. If you selected "viewer" or "save" at the top right of XSane, the files will all be there; remember to save them from the viewer. You can now press Scan again to carry on where you left off, with the numbering continuing from the same point, or you can start a new project. No blank pages will be added. If you selected "Multipage", the project dialog should show all the completed scans, and you can click to save them as a multipage PDF, TIFF or PostScript file.
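On the command line, scanimage can do the same "scan until the feeder is empty" job without guessing a count: when `--batch-count` is omitted, it keeps scanning until the scanner stops returning pages. A minimal sketch, assuming a backend whose feeder source is called "ADF" (source names differ between backends; `scanimage -d <device> --help` lists the valid ones) and using a hypothetical helper name, `scan_feeder`:

```shell
#!/bin/bash
# Sketch: scan every sheet in the document feeder into page1.tif, page2.tif, ...
# ASSUMPTIONS: the "ADF" source value and the helper name are placeholders;
# check your backend's options with `scanimage -d <device> --help`.

# Usage: scan_feeder DEVICE [SOURCE]
scan_feeder() {
  local device=$1 source=${2:-ADF}
  # No --batch-count: scanimage stops when the scanner no longer reports
  # success, which normally means the feeder is empty.
  scanimage -d "$device" --source "$source" --format=tiff \
    --batch=page%d.tif --batch-start=1
}
```

The resulting TIFF files can then be OCRed with tesseract and merged, as in the scan2PDF script.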



                        HTH,



                        DC






                            answered Dec 9 '18 at 7:25









                            user901387






























                                Thanks for contributing an answer to Ask Ubuntu!

