Autofile

Autofile is a utility that scans your documents and attempts to recognize them using a set of user-defined rules.

For example, it can take a directory of PDF, TIFF, and G3 faxes or scans and rename each file to:

surname_fname_dob_testtype_testdate_originalfilename

This makes it easier to move the files into patient folders.
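
The naming pattern above can be sketched in a few lines of Python. This is only an illustration, not Autofile's own code: the function name, the date format, the handling of spaces, and keeping the original extension at the end are all assumptions.

```python
from pathlib import Path


def build_recognized_name(surname, first_name, dob, test_type, test_date, original_path):
    """Join the recognized fields and the original filename with underscores.

    Hypothetical helper: Autofile's real field formatting is not documented here.
    """
    original = Path(original_path)
    fields = [surname, first_name, dob, test_type, test_date, original.stem]
    # Assumption: spaces inside a field are replaced so that underscores
    # remain the only separators between fields.
    cleaned = [str(field).strip().replace(" ", "-") for field in fields]
    # Assumption: the original extension is kept at the end of the new name.
    return "_".join(cleaned) + original.suffix


print(build_recognized_name("Smith", "Jane", "1970-01-31", "ECG", "2008-04-09", "scan0001.pdf"))
# prints: Smith_Jane_1970-01-31_ECG_2008-04-09_scan0001.pdf
```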

If you use a base scanning directory of c:\scans\, Autofile will place the fully recognized files in the c:\scans\recognized\ directory and the partially recognized or unrecognized files in the c:\scans\unrecognized\ directory.
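
The directory layout described above could be modelled roughly as follows. The function names are hypothetical and the move logic is a guess at the behaviour, not Autofile's implementation; only the "recognized" and "unrecognized" subdirectory names come from this page.

```python
import shutil
from pathlib import Path


def destination_dir(base_dir, fully_recognized):
    """Return the subdirectory a processed file would be placed in."""
    subdir = "recognized" if fully_recognized else "unrecognized"
    return Path(base_dir) / subdir


def file_scan(scan_path, base_dir, fully_recognized):
    """Move one scan into the matching subdirectory, creating it if needed."""
    target_dir = destination_dir(base_dir, fully_recognized)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(scan_path).name
    shutil.move(str(scan_path), str(target))
    return target
```

For example, `destination_dir(r"c:\scans", False)` points at the `unrecognized` folder under the base scanning directory.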

Note that PDFs do not need to have any OCR'd text associated with them. 

Requirements

Operating Systems

Linux and 32-bit Windows versions of Autofile are available.  See the downloads at the bottom of this page.

Source files

Autofile uses each file's extension to decide how to process it.  Currently .pdf, .tif, .tiff, and .g3 are recognized.
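
If you want to check a scan directory for files Autofile would skip, a small filter like the one below may help. It is a sketch only: the extension list comes from this page, but the case-insensitive comparison and the function names are assumptions.

```python
from pathlib import Path

# Extensions listed on this page as recognized by Autofile.
SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".g3"}


def is_supported(filename):
    """Return True if the file extension is one Autofile recognizes."""
    # Assumption: the comparison ignores case, so "FAX.G3" counts as ".g3".
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def unsupported_files(directory):
    """List files in a directory whose extension is not in the supported set."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and not is_supported(entry.name)
    )
```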

Installation

Save autofile.exe in the directory where you want to run it.  Then use the [Install] button at the top to complete the installation of the helper applications.  If you need to install them manually, see below.  You will still need to put Ghostscript in the path as detailed below.

Ghostscript

You will need to have Ghostscript installed.  This can be downloaded from here.  64-bit Windows versions are also available.  On Linux, Ghostscript is usually already installed.

Ghostscript needs to be in the path.  See instructions here on how to do this.
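
One way to confirm that Ghostscript is reachable through the path is a quick check like this. The executable names are the usual Ghostscript console names (gs on Linux, gswin32c or gswin64c on Windows); which one Autofile actually calls is not stated on this page, so treat the list as an assumption.

```python
import shutil

# Common Ghostscript executable names; which one Autofile invokes is an assumption.
GHOSTSCRIPT_NAMES = ("gs", "gswin32c", "gswin64c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the full path of the first Ghostscript executable on the path, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_ghostscript()
    if location:
        print("Ghostscript found at", location)
    else:
        print("Ghostscript is not in the path")
```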

ImageMagick (optional)

You will also need ImageMagick if you plan on using .tif files; it is available from the ImageMagick download site.  Choose the file that says "Win32 dynamic at 16 bits-per-pixel".  ImageMagick is not needed if you will only be using .pdf files.

Tesseract (optional)

Tesseract is an open source OCR program.  Although its recognition quality is not good enough for it to be the main OCR engine, it can be used for a scout scan that helps choose the appropriate rules to try.

The download site is here, and the Windows binary is this one.  You need to unpack it so that tesseract.exe is in the same directory as autofile.exe.  You also need the English dictionary.  This is in a directory called "tessdata", and the tessdata directory needs to be in the same directory as tesseract.exe.

You may need WinZip to unpack the gz archives.

IP address

Autofile uses a commercial OCR web service.  You need to register your internet address with us so that we can allow Autofile to work with our service after the testing period.

If you don't know what your internet address is, try this site to discover it.  Note that you will require a static IP address for the moment. 

Getting started with Autofile

The first thing you need to do is create some rules for each type of document.  I suggest that you start with only a couple of document types so that you understand how it works.  A Synapse tutorial is here and much the same applies.

Your rules are saved in a file called rules.r.  Do not delete this file, or you will have to define all your rules again.  You should also keep a copy of the files you used to create the rules in case you need to recreate them, or to check that the regions are correct.

Once you have defined some rules, select the directory where your scans are and click on the [RUN] button.  Each file will then be processed.  You may have to tweak your rules if Autofile is unable to recognize documents for which you have defined a rule.  You may need to enlarge a recognition area if text is being cut off, or reduce it if other text is being captured.  If the file is from a fax, you may need to increase the vertical area of your box to account for changes in vertical displacement.
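
To illustrate the idea of enlarging a box vertically for faxes, here is a small sketch. The (left, top, right, bottom) pixel tuple is a hypothetical representation chosen for the example; it is not the format Autofile stores in rules.r, and in Autofile itself you adjust regions through the rule editor.

```python
def pad_region_vertically(region, padding, image_height):
    """Grow a (left, top, right, bottom) box up and down by `padding` pixels.

    Hypothetical helper: the result is clamped so it stays inside the image.
    """
    left, top, right, bottom = region
    new_top = max(0, top - padding)
    new_bottom = min(image_height, bottom + padding)
    return (left, new_top, right, new_bottom)


# A 40-pixel-tall box grown by 30 pixels in each vertical direction.
print(pad_region_vertically((100, 200, 400, 240), 30, 1000))
# prints: (100, 170, 400, 270)
```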

When the debug checkbox is ticked, you will see some information on what Autofile is seeing, which may give you some clues.

Because of the way the images are converted from PDF and TIFF to PNG, the resulting image sizes differ between the two formats.  So, you cannot apply a rule derived from a TIFF image to a PDF, or vice versa.

Tutorial

Here's a Jing video tutorial on how to create a rule.

NB: There is a current bug that can lose some of your rules if you update a rule after you do the recognition.  For the moment, it is best to update rules when you first start Autofile, before running any recognition.

Release History

  • Build 23: Removes any spaces found in the NHI zone.
  • Build 22: Crops the image horizontally as well as vertically.
  • Build 21: If a rule is selected in the rules table, it will be used as the first rule.  This is where you know which document type you are importing.
  • Build 20: Switches back to Web OCR if it gets non-alphabetic characters in the patient's name.
  • Build 19: If using Tesseract, and OCR fails on the date, it switches back to the Web OCR to try again.
  • Build 18: Option to use Tesseract on high quality scans.  Regions are now cropped before OCR.  Files are copied from the source directory and renamed prior to OCR to fix PaperPort filenames which were too long.
  • Build 17: Adds support for g3 raw fax format
  • Build 16: Now has an [Install] button to download all the necessary files
  • Build 15: brackets the zone for scanning
  • Build 13: Bug fix for date errors - not binding to date rule
  • Build 12: Bug fix for build 11, was not moving the files
  • Build 11: If debug is ticked, then the tesseract ocr'd text is written to the directory ./tessoutput
  • Build 10: More network error checking, and now waits if service is temporarily unreachable
  • Build 8: Does a scout scan using tesseract which improves speed considerably
  • Build 7: Optional given name field, and larger image screen for selecting regions
  • Build 5: Adds TIFF support
  • Build 4: Adds a [Load Coords] button so that definitions can be displayed on a PDF
  • Build 3: Removed residual debugging code still present

Downloads

  • autofile: Autofile build 5 for Linux (928.8 kB, attached 10:21, 9 Apr 2008 by Graham)
  • autofile.exe: Autofile build 23 for Windows (666.35 kB, attached 22:36, 3 May 2008 by Graham)

Comments

which version of ImageMagick ?

8 bit, 16 bit, static, dynamic ?
Posted 23:49, 10 Apr 2008
the first version in the download list ..
Posted 02:04, 11 Apr 2008
Ahh.....
ImageMagick-6.4.0-5-Q16-windows-dll.exe (Win32 dynamic at 16 bits-per-pixel).

I added a direct link.

Posted 02:07, 11 Apr 2008
Removed the direct link ... they keep updating the files and so your direct link already 404'd.
Posted 20:31, 12 Apr 2008