Title

Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents

Document Type

Article

Publication details

Alaei, A & Nagabhushan, P 2011, 'Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents', Pattern Analysis and Applications, vol. 14, no. 4, pp. 381-394.

Published version available from

http://dx.doi.org/10.1007/s10044-011-0226-x

Peer Reviewed

Peer-Reviewed

Abstract

The most important and difficult task in text document analysis is accurate line segmentation, particularly when the document consists of unconstrained handwritten text. To this end, a painting scheme is proposed in this work. Motivated by the fact that handwritten Persian texts pose the most critical challenges for text-line segmentation, the method was devised by studying cursive Persian scripts extensively; nevertheless, the proposed line segmentation algorithm is in general applicable to handwritten text in any language or script. The text block is vertically decomposed into parallel pipe structures called strips. Each row in each strip is painted with a gray intensity equal to the average gray value of all pixels in that row of the strip. Subsequently, the painted strips are converted into a two-tone painting, which is then smoothed. The white/black spaces in each strip of the smoothed image are analyzed to obtain a short line of separation, termed a Piece-wise Potential Separating Line (PPSL), between two consecutive black spaces. The PPSLs are concatenated to produce the segmentation of text lines. Additional procedures handle certain anomalies that may occur. The scheme is validated by extensive experimentation. We tested the proposed algorithm on 52 pages of Persian text documents containing 823 lines in total and achieved correct line segmentation of 92.35%. The algorithm was also tested on two other datasets of 152 and 200 handwritten text pages in different languages. Comparison with various approaches from recent literature demonstrated the efficiency and script independence of the proposed algorithm.
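
The painting steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `strip_width` and `min_run` parameters, the global-mean threshold, and the use of a minimum run length as a stand-in for the smoothing step are all assumptions made for illustration. Concatenating the per-strip separators into full text lines and handling anomalies are omitted.

```python
def paint_strips(gray, strip_width):
    """For each vertical strip, return the mean gray value of every row."""
    width = len(gray[0])
    painted = []
    for x0 in range(0, width, strip_width):
        strip_rows = [row[x0:x0 + strip_width] for row in gray]
        painted.append([sum(r) / len(r) for r in strip_rows])
    return painted


def black_runs(is_black, min_run):
    """Return (start, end) pairs (end exclusive) of black runs.

    Runs shorter than min_run are dropped; this is a crude stand-in
    for the smoothing step (an assumption, not the paper's method).
    """
    runs, start = [], None
    for y, black in enumerate(is_black + [False]):
        if black and start is None:
            start = y
        elif not black and start is not None:
            if y - start >= min_run:
                runs.append((start, y))
            start = None
    return runs


def piecewise_separators(gray, strip_width=32, min_run=2):
    """Return, per strip, the rows midway between consecutive black runs."""
    painted = paint_strips(gray, strip_width)
    values = [v for strip in painted for v in strip]
    threshold = sum(values) / len(values)  # assumed two-tone threshold
    separators = []
    for strip in painted:
        # Dark ink has low intensity, so "black" means below the threshold.
        runs = black_runs([v < threshold for v in strip], min_run)
        separators.append([(prev_end + next_start) // 2
                           for (_, prev_end), (next_start, _)
                           in zip(runs, runs[1:])])
    return separators


# Synthetic page: white background with two dark bands acting as text lines.
page = [[255] * 8 for _ in range(20)]
for y in list(range(2, 6)) + list(range(12, 16)):
    page[y] = [0] * 8
print(piecewise_separators(page, strip_width=4))  # [[9], [9]]
```

In this toy input each of the two strips yields one separator at row 9, midway between the two dark bands. On real handwriting the strips would generally disagree, which is why the separators are piece-wise and have to be joined afterwards.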
