
Morphological Operations – Text extraction

August 23, 2012

Having previously discussed morphological operations, we now apply them to a practical problem. In this case, we will attempt to extract text, both handwritten and printed, from a scanned document and reduce it to lines of single-pixel width.

We first obtain a scanned image of a document, as shown below.

 

We edit the image using GIMP 2 so that the lines on the page are properly horizontal. A portion of the above document is then selected for processing. We then apply what we have previously learned in signal filtering to remove the horizontal lines from the selected portion, leaving an image in which the text is more apparent, as shown below.
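The post does not say which filter was used, so the following is only a sketch under an assumption: that "signal filtering" here means masking in the Fourier domain. Lines that are perfectly horizontal put their energy in the zero-horizontal-frequency column of the 2D spectrum, so zeroing that column (except near DC) suppresses them. It is written in Python with NumPy rather than the tool used for this activity, and the function name and the `keep_dc` parameter are made up for illustration.

```python
import numpy as np

def remove_horizontal_lines(gray, keep_dc=2):
    """Suppress horizontal lines by masking the fx = 0 column of the spectrum.

    gray    : 2D array holding a grayscale image whose ruled lines are level.
    keep_dc : number of low vertical frequencies on each side of DC to keep,
              so that the mean brightness and slow shading are preserved.
    """
    gray = np.asarray(gray, dtype=float)
    spectrum = np.fft.fft2(gray)

    mask = np.ones(gray.shape)
    # In the unshifted FFT layout, column 0 holds horizontal frequency fx = 0.
    mask[:, 0] = 0.0
    # Always keep the DC term, plus keep_dc frequencies on either side of it.
    mask[:keep_dc + 1, 0] = 1.0
    if keep_dc > 0:
        mask[-keep_dc:, 0] = 1.0

    # The mask is symmetric, so the inverse transform is real up to rounding.
    return np.real(np.fft.ifft2(spectrum * mask))
```

This only works if the lines really are level, which is why the page is straightened first. Any tilt spreads the line energy away from that single column.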

 

We now work on binarizing the text. We convert the grayscale image above into a binary image using im2bw() and then invert it to simplify the use of morphological operations. The threshold value needs to be chosen very carefully at this point. Too high and crucial information is lost. Too low and the data becomes noisy.
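As a rough illustration of this step, here is a minimal NumPy sketch. It assumes im2bw() marks pixels brighter than the threshold as white, so the inverted image marks pixels at or below the threshold as foreground. The function name `binarize_inverted` is hypothetical.

```python
import numpy as np

def binarize_inverted(gray, level):
    """Threshold a grayscale image and invert it in one step.

    Thresholding marks pixels brighter than `level` as white; inverting
    that result marks pixels at or below `level` (the dark ink) as True.
    """
    gray = np.asarray(gray, dtype=float)
    return gray <= level

# Tiny example: values in [0, 1], with dark ink on a bright page.
sample = np.array([[0.9, 0.2, 0.8],
                   [0.1, 0.5, 0.95]])
ink = binarize_inverted(sample, level=0.5)
```

Having the text as foreground (True) means erosion and dilation act on the strokes themselves rather than on the page background.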

Once we have chosen an appropriate threshold value, we have a very messy binary image. There are two things we can do at this point: dilate the image so that gaps that need to be closed are closed, or erode the image to remove unwanted connections.

In this case, we start with the latter. I modified my personal code for erosion so that it repeatedly erodes all points until a single pixel is left. Once only a single pixel remains, the erosion function skips over that point. Pairing this erosion function with a small horizontal structuring element and a small vertical structuring element, we are able to reduce the image to vertical lines and horizontal lines.
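My actual erosion code is not reproduced here, so the following is only a sketch of one way to get the same behaviour, written in Python with NumPy. It assumes a two-pixel line as the structuring element, applied alternately from each side so that a run shrinks toward its centre, and it skips any pixel that has no neighbour along the chosen axis. The names `_neighbour` and `thin_runs` are made up for illustration.

```python
import numpy as np

def _neighbour(img, step, axis):
    """Return an array whose entry at index i is img[i + step] along `axis`.

    Positions that would fall outside the image are treated as background.
    """
    out = np.zeros_like(img)
    src = [slice(None)] * img.ndim
    dst = [slice(None)] * img.ndim
    if step > 0:
        src[axis], dst[axis] = slice(step, None), slice(None, -step)
    else:
        src[axis], dst[axis] = slice(None, step), slice(-step, None)
    out[tuple(dst)] = img[tuple(src)]
    return out

def thin_runs(img, axis, max_iter=1000):
    """Erode repeatedly along `axis` until each run is one pixel long."""
    img = np.asarray(img, dtype=bool)
    for i in range(max_iter):
        nxt = _neighbour(img, 1, axis)
        prv = _neighbour(img, -1, axis)
        # A pixel with no neighbour along the axis is skipped, not removed.
        isolated = img & ~nxt & ~prv
        # Alternate the side that is tested so runs shrink from both ends.
        side = nxt if i % 2 == 0 else prv
        eroded = (img & side) | isolated
        if np.array_equal(eroded, img):
            break
        img = eroded
    return img
```

With `axis=1` each horizontal run collapses to one pixel, so thick vertical strokes become one-pixel-wide vertical lines. `axis=0` does the same for horizontal strokes. For example, a run covering columns 2 to 6 of a row is reduced to the single pixel at column 4.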

I did this because I had the idea that reducing horizontals to a single pixel would be an easy way to get rid of the unwanted thick lines. If I were to accidentally get rid of an important horizontal line in the process, the result of the vertical erosion would compensate and resupply this information. The result of these two erosions combined is shown below.

For some reason, the result was the outline of the text. It may have been a result of how I modified the erosion code, but this result wasn’t entirely unexpected. Still, this wasn’t what we were looking for. We discard this result and simply make use of one erosion direction.

At this point, there is no straightforward way of further processing this image. Of course, the image can still be improved, but there is no longer any one-size-fits-all method. In general, we will have to alternate between dilating and eroding the image with different structuring elements. The reason is that if we were to dilate the image continuously, we would also connect lines that should not be connected. By dilating only enough to connect what needs to be connected, then backpedaling with erosion to remove the residual effects of dilation, we can further improve the image quality.
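A dilation followed by an erosion with the same structuring element is the standard morphological closing. The sketch below shows the idea in Python with NumPy. It is not the code used for the results in this post, the function names are illustrative, and it assumes a symmetric structuring element with odd dimensions.

```python
import numpy as np

def _shifted(img, selem):
    """Yield img shifted by every offset at which selem is set."""
    h, w = selem.shape
    cy, cx = h // 2, w // 2
    # Pad with background so shifts never wrap around the image edges.
    padded = np.pad(img, ((cy, h - 1 - cy), (cx, w - 1 - cx)),
                    constant_values=False)
    for dy in range(h):
        for dx in range(w):
            if selem[dy, dx]:
                yield padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]

def dilate(img, selem):
    """A pixel is set if ANY pixel under the structuring element is set."""
    return np.logical_or.reduce(list(_shifted(img, selem)))

def erode(img, selem):
    """A pixel is kept only if ALL pixels under the element are set."""
    return np.logical_and.reduce(list(_shifted(img, selem)))

def close_gaps(img, selem):
    """Dilate to bridge small gaps, then erode to undo the thickening."""
    img = np.asarray(img, dtype=bool)
    selem = np.asarray(selem, dtype=bool)
    return erode(dilate(img, selem), selem)

# A horizontal stroke with a one-pixel break at column 5.
stroke = np.zeros((7, 15), dtype=bool)
stroke[3, 2:5] = True
stroke[3, 6:9] = True
closed = close_gaps(stroke, np.ones((1, 3), dtype=bool))
```

The size of the structuring element sets the widest gap that gets bridged: a 1x3 element closes a one-pixel break but leaves strokes that are three pixels apart separate. Choosing it is the part that has to be tuned per image.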

Above is the final result. While the handwriting is not clearly legible, the process has achieved the goal of reducing parts of the text to binary lines of single-pixel width. As I have only achieved part of the activity’s goal, I will be giving myself a score of 8/10.
