Extracting Digital Results from OCR'd text

In an ideal practice, we would receive all our results digitally over the internet, using an established encoding system such as HL7.  Having results in a digital format makes it much easier to manage patients with long-term problems.

However, for many of us the reality is far from this: results arrive by fax or on paper, and we then have to deal with them by hand.

New versions of Synapse (R212B21) partially address this by using technology to overcome the facsimile barrier.  We have already introduced tools that automatically OCR any document uploaded to Synapse, and we can now extract the results from that text as digital values, which can then be inserted into the patient's record.

If you have used an up-to-date version of the Synapse database update tool, you should see a number of default rules that are used to extract results from OCR'd text.  Navigate to Settings/OCR and click on "Refresh" to see whether they are there.  If nothing appears, you will need to run the update tool.

OCR'd blood results

Blood results can only be extracted if they follow certain formats.  Some common formats are:

Hb 121 g/L

Hemoglobin 123 138 133 g/L (115-155)

ESR (Westergren) 44 mm/h 

In the second example, the hemoglobin line contains serial results, and the last value (133) is the one we want; the figures in parentheses are the reference range.  Sometimes other words follow the test name, as in "(Westergren)".  Our rules have to take all of that into account.

Note that these rules only work when the name of the test is on the same line as the result. If the names of the tests are in a column format where the names are at the top, then it is not currently possible to extract these results.
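The line formats above can be illustrated with a short Python sketch.  This is not how Synapse itself parses the text; the function names (values_on_line, pick) and the regular expressions are assumptions made purely for illustration.  It shows how the numbers on a single line might be collected while ignoring anything in parentheses, such as a reference range or "(Westergren)", and how the first or last value might then be chosen.

```python
import re

# Hypothetical pattern for values such as 121 or 4.5 (not Synapse's own).
# The lookarounds stop digits inside words like "B12" from matching.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")


def values_on_line(line):
    """Return every numeric value on the line, ignoring parenthesised text."""
    # Drop "(115-155)", "(Westergren)" and similar so they are not read as results.
    stripped = re.sub(r"\([^)]*\)", " ", line)
    return [float(v) for v in NUMBER.findall(stripped)]


def pick(line, which="last"):
    """Return the first or last value on the line, or None if there is none."""
    values = values_on_line(line)
    if not values:
        return None
    return values[-1] if which == "last" else values[0]


if __name__ == "__main__":
    print(pick("Hb 121 g/L"))                            # 121.0
    print(pick("Hemoglobin 123 138 133 g/L (115-155)"))  # 133.0
    print(pick("ESR (Westergren) 44 mm/h"))              # 44.0
```

Because the sketch works one line at a time, it has the same limitation as the rules: a report laid out in columns, with the test names in a header row, gives it nothing to tie a value to.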

The OCR Rule

The rule has the following characteristics:

  1. Name - the name of the test.  This text is used to see if there is a match anywhere in the OCR field.
  2. Synonyms - alternate names for the same test.  One laboratory might use "Alkaline Phosphatase" whereas another might use "Alk.Phos.".
  3. Followed - a bit of text that may optionally follow the test name.  E.g. we sometimes see "mm" after the ESR, or "calc" after "LDL cholesterol".
  4. Low - the lowest possible value that this result might have.
  5. High - the highest possible value that this result might have.  This lets implausibly large numbers that are picked up be discarded.
  6. Units - the units of measurement for this test.
  7. LOINC - the LOINC code for the test.  This value is not required, but it helps in graphing trends and fulfilling health maintenance guidelines.
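To make the role of each field more concrete, here is a minimal Python sketch of how a rule with these characteristics could be applied to OCR text.  It is an assumption-laden illustration, not Synapse's actual implementation: the OcrRule class, the extract function, the matching logic and the example Low/High values are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OcrRule:
    """Hypothetical in-memory form of one rule; fields mirror the list above."""
    name: str
    synonyms: list = field(default_factory=list)
    followed: list = field(default_factory=list)
    low: float = 0.0
    high: float = float("inf")
    units: str = ""
    loinc: str = ""


NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")


def alternation(words):
    # Longest first, so a full name is tried before a shorter synonym.
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


def extract(rule, ocr_text, which="last"):
    """Return the first or last plausible value on the first matching line."""
    names = alternation([rule.name, *rule.synonyms])
    name_re = re.compile(rf"(?<!\w)(?:{names})(?!\w)", re.IGNORECASE)
    followed_re = None
    if rule.followed:
        # Optional text after the name, with or without parentheses.
        followed_re = re.compile(
            rf"\s*\(?\s*(?:{alternation(rule.followed)})\s*\)?", re.IGNORECASE
        )
    for line in ocr_text.splitlines():
        found = name_re.search(line)
        if not found:
            continue
        rest = line[found.end():]
        if followed_re:
            skipped = followed_re.match(rest)
            if skipped:
                rest = rest[skipped.end():]
        # Ignore parenthesised text such as a reference range.
        rest = re.sub(r"\([^)]*\)", " ", rest)
        values = [float(v) for v in NUMBER.findall(rest)]
        # Low and High discard numbers that cannot be a real result.
        values = [v for v in values if rule.low <= v <= rule.high]
        if values:
            return values[-1] if which == "last" else values[0]
    return None


if __name__ == "__main__":
    hb = OcrRule("Hemoglobin", synonyms=["Hb"], low=30, high=250, units="g/L")
    esr = OcrRule("ESR", followed=["Westergren", "mm"], low=0, high=150, units="mm/h")
    print(extract(hb, "Hemoglobin 123 138 133 g/L (115-155)"))  # 133.0
    print(extract(esr, "ESR (Westergren) 44 mm/h"))             # 44.0
```

In this sketch the Units and LOINC fields are carried along but not used for matching; they would travel with the extracted value when it is stored.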

Customizing the Rules

You can click on a rule to change it to something that suits the way your blood results scan out, and also add new results.

Running the Extraction

In the patient's results tab, click on a scanned lab result and check whether there is any text in the OCR tab.  If there is, right-click inside the OCR field to start the extraction process.  You will be asked whether you want the first or the last result in a row.

When it is finished, you have the option to modify results that have been identified incorrectly, remove them, or add a new result that was missed.  You can also bring up the original scan to check things.  Once you are happy with the outcome, you click on the "Insert into Synapse" button to insert the results.  The OCR field will be altered to show that the results have been extracted.

http://synapse-images.s3.amazonaws.com/digital-extraction.png

In the image above, we see that some results have a * or a ? beside them.  The tool checks whether you have a normal range for that particular LOINC value, and places a * if a range exists and the result is outside it.  If no normal range exists, it places a ? instead.  A blank indicates that the result is inside the normal range.
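The marking logic described here can be sketched in a few lines of Python.  The function name flag_for, the dictionary of ranges and the way ranges are keyed by LOINC code are assumptions for illustration only; the hemoglobin range is taken from the example line earlier on this page, and the LOINC codes are given as examples.

```python
def flag_for(loinc, value, normal_ranges):
    """Return "*" if outside the normal range, "?" if no range is known, else ""."""
    normal = normal_ranges.get(loinc)
    if normal is None:
        return "?"  # no normal range on file for this LOINC code
    low, high = normal
    return "" if low <= value <= high else "*"


if __name__ == "__main__":
    # Hypothetical table of normal ranges keyed by LOINC code.
    ranges = {"718-7": (115, 155)}  # hemoglobin, g/L
    print(repr(flag_for("718-7", 133, ranges)))   # ''
    print(repr(flag_for("718-7", 101, ranges)))   # '*'
    print(repr(flag_for("4537-7", 44, ranges)))   # '?'
```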

