Autofile
Autofile is a utility that scans your documents and attempts to recognize them using a set of user-generated rules. It can take a directory of PDF, TIFF, and G3 faxes/scans and rename them to:
and this allows you to move the files into patient folders more easily.

If you use a base scanning directory of c:\scans\, Autofile will place the fully recognized files in the c:\scans\recognized\ directory and the partially recognized or unrecognized files in the c:\scans\unrecognized directory. Note that PDFs do not need to have any OCR'd text associated with them.

Requirements

Operating Systems

There are Linux and Windows 32bit versions of Autofile available. See the downloads at the bottom of this page.

Source files

Autofile uses each file's extension to decide how to process it. Currently .pdf, .tif, .tiff and .g3 are recognized.

Installation

Save autofile.exe in the directory where you wish to use it. Then use the [Install] button at the top to complete the installation of the helper applications. If you need to install them manually, see below. You will still need to put Ghostscript in the path as detailed below.

Ghostscript

You will need to have Ghostscript installed. This can be downloaded from here. 64bit Windows versions are also available. On Linux, Ghostscript is usually already installed. Ghostscript needs to be in the path. See instructions here on how to do this.

ImageMagick (optional)

You will also need ImageMagick if you plan on using .tif files; it is available from their download site. Choose the file that says "Win32 dynamic at 16 bits-per-pixel". ImageMagick is not needed if you will only be using .pdf files.

Tesseract (optional)

Tesseract is an open source OCR program. Although its recognition quality is not good enough for it to be used as the main OCR engine, it can be used for a scout scan to help choose the appropriate rules to try. The download site is here, and the Windows binary is this one. You need to unpack it so that tesseract.exe is in the same directory as autofile.exe. You also need the English dictionary.
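As a quick sanity check on the installation steps so far, the following Python sketch tests whether a Ghostscript executable is on the path and whether Tesseract sits next to Autofile. It is not part of Autofile: the function names are hypothetical, and the list of Ghostscript executable names (gs, gswin32c, gswin64c) is an assumption to adjust for your own installation.

```python
import shutil
import sys
from pathlib import Path

# Assumed Ghostscript executable names; adjust to match your installation.
GHOSTSCRIPT_NAMES = ("gs", "gswin32c", "gswin64c")

# Tesseract is expected in the same directory as Autofile.
TESSERACT_NAMES = ("tesseract.exe", "tesseract")


def find_on_path(names):
    """Return the full path of the first executable in names found on the PATH, or None."""
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


def check_setup(autofile_dir):
    """Report which helper applications look reachable (hypothetical helper)."""
    autofile_dir = Path(autofile_dir)
    tesseract_here = any((autofile_dir / name).is_file() for name in TESSERACT_NAMES)
    return {
        "ghostscript_on_path": find_on_path(GHOSTSCRIPT_NAMES) is not None,
        "tesseract_next_to_autofile": tesseract_here,
    }


if __name__ == "__main__":
    # Pass the directory that holds autofile.exe; defaults to the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for check, ok in check_setup(target).items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it with the directory that contains autofile.exe as its only argument. It does not look at ImageMagick, nor at the English dictionary that Tesseract also needs.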
The dictionary is in a directory called "tessdata", and the tessdata directory needs to be in the same directory as tesseract. You may need WinZip to unpack the gz archives.

IP address

Autofile uses a commercial OCR web service. You need to register your internet address with us so that we can allow Autofile to work with our service after the testing period. If you don't know what your internet address is, try this site to discover it. Note that you will require a static IP address for the moment.

Getting started with Autofile

The first thing you need to do is create some rules for each type of document. I suggest that you start with only a couple of document types so that you understand how it works. A Synapse tutorial is here and much the same applies.

Your rules are saved in a file called rules.r. Do not delete it, or you will have to define all your rules again. You should also keep a copy of the files you used to create the rules in case you need to recreate them, or to check that the regions are correct.

Once you have defined some rules, select the directory where your scans are and click on the [RUN] button. Each file will then be processed.

You may have to tweak your rules if Autofile is unable to recognize documents for which you have defined a rule. You may have to enlarge the recognition area, or reduce it if other text is being captured. If the file is from a fax, you may need to increase the vertical area of your box to account for changes in vertical displacement. When the debug check is ticked, you will see some information on what Autofile is seeing, which may give you some clues.

Because of the way the images are converted from PDF and TIFF to PNG, the resulting image sizes differ between the two formats. So, you cannot apply a rule derived from a TIFF image to a PDF, and vice versa.

Tutorial

Here's a Jing video tutorial on how to create a rule.

NB: There is a current bug that can lose some of your rules if you update a rule after you do the recognition.
For the moment, it is best to update rules when you first start Autofile.

Release History