Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes,
scripts, and appearances.
Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting
scalability and maintainability.
To this end, we propose LineTR, a single model for all dataset collections.
LineTR is a two-stage approach. The first stage predicts text-strike-through lines called
scribbles and a novel text-energy map of the input document image. The second stage is a
seam-generation network which uses these outputs to produce precise polygons around the text-lines.
Text-line segmentation has mainly been approached as a dense-prediction task, which is ineffective because the
inductive prior of a line is not utilized, leading to poor segmentation performance. Our key
insight is to parametrize a text-line, thereby preserving this inductive prior. To avoid resizing the
document, the input image is first broken down into context-adapted patches, and each patch is
processed by the stage-1 network independently. The patch-level outputs are combined using a
dataset-agnostic post-processing pipeline. Notably, we show that carefully choosing the patch size to
capture enough context is crucial for generalization, as document images come in arbitrary
resolutions.
We evaluate LineTR extensively through experiments and qualitative comparisons. Our
method also exhibits strong zero-shot generalization to unseen document collections.
Previous work treats text-line segmentation as a dense-prediction task. This causes adjacent text-lines to merge, resulting in poor segmentation performance.
Our method first breaks the input image into context-adapted patches (1). These image patches are
processed independently by a branched network (stage-1) to output line-parameters and a text-energy map.
Specifically, an image patch is passed through a ViT encoder to obtain image features. A DETR-style
network, called the Line-Parameter Generator (2a) decodes a set of randomly initialized
line-queries conditioned on the image features, and finally predicts the line parameters and
probability scores. The second branch, the Text-Energy Map Generator, is a hybrid CNN-transformer
network which predicts the text-energy map as shown. The patch-level outputs from both branches are
independently post-processed to obtain global outputs (3).
Stage-2 is a seam generation network, which uses the outputs of stage-1 to output precise polygons
enclosing the text lines (4).
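To make the patch-to-global step concrete, the following is a minimal sketch of how overlapping patches could be laid out along one image axis and how a patch-local line could be shifted back into document coordinates. The helper names `patch_origins` and `to_global`, the stride parameter, and the `(x0, y0, slope)` tuple layout are assumptions for illustration; the actual LineTR post-processing pipeline is not described at this level of detail here.

```python
def patch_origins(length, patch, stride):
    """Offsets along one axis so that patches of size `patch` cover [0, length).

    `stride` is a hypothetical parameter; the overlap used by LineTR is not
    stated in the text above.
    """
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        # Add a final patch flush with the image border so nothing is missed.
        origins.append(length - patch)
    return origins


def to_global(line, patch_x, patch_y):
    """Shift a patch-local (x0, y0, slope) line into document coordinates.

    Translating a line moves its anchor point and leaves the slope unchanged.
    """
    x0, y0, slope = line
    return (x0 + patch_x, y0 + patch_y, slope)


# Example: a 1000-pixel axis, 400-pixel patches, 350-pixel stride.
xs = patch_origins(1000, 400, 350)
shifted = to_global((10.0, 20.0, 0.5), patch_x=xs[1], patch_y=600)
```

Once all patch-level lines are expressed in the same coordinate frame, duplicates from overlapping patches can be reconciled by whatever merging rule the pipeline uses.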
We use the point-slope form to parametrize a text-line, as shown.
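As an illustration of the point-slope form y - y0 = m (x - x0), the sketch below converts a line given by an anchor point and a slope into the two points where it crosses the left and right edges of a patch. The function name `line_endpoints`, the use of pixel coordinates, and the choice of the patch edges as endpoints are assumptions; the exact normalization LineTR applies to its line parameters is not specified here.

```python
def line_endpoints(x0, y0, slope, patch_width):
    """Endpoints of the line y - y0 = slope * (x - x0) across a patch.

    Returns the points at x = 0 and x = patch_width - 1, in pixel coordinates
    (an assumed convention for this illustration).
    """
    x_left = 0.0
    x_right = float(patch_width - 1)
    y_left = y0 + slope * (x_left - x0)
    y_right = y0 + slope * (x_right - x0)
    return (x_left, y_left), (x_right, y_right)


# Example: a gently rising line anchored at the centre of a 101-pixel-wide patch.
left, right = line_endpoints(x0=50.0, y0=100.0, slope=0.1, patch_width=101)
```

Because a line is described by three numbers rather than by a per-pixel mask, two adjacent text-lines remain distinct predictions even when their pixels nearly touch.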
Patching avoids resizing the document to a small size. However, choosing fixed-size patches is ineffective and hinders out-of-domain generalization. Document images come in arbitrary resolutions, so a patch of fixed size may not capture enough context.
To this end, we propose an algorithm for context-aware patching. (1) We sample raw patches of varying sizes. (2) We run these raw patches through the Line-Parameter Generator to get noisy predictions. (3) We use these noisy predictions to estimate the average interline gap in the document. (4) We use this interline gap to determine the context-adapted patch size. Patches of this size are finally sampled from the document and fed to LineTR for inference.
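Steps (3) and (4) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each raw patch yields a list of predicted line y-positions in document pixels, uses the median of consecutive gaps as a robust stand-in for the "average" interline gap, and sets the patch size to a fixed multiple of that gap. The names `estimate_interline_gap` and `context_adapted_patch_size`, and the `lines_per_patch`, `min_size` and `max_size` parameters, are hypothetical.

```python
from statistics import median


def estimate_interline_gap(line_ys_per_patch):
    """Estimate the interline gap from noisy per-patch line positions.

    `line_ys_per_patch` is a list of lists; each inner list holds the vertical
    positions (document pixels) of the lines predicted in one raw patch.
    """
    gaps = []
    for ys in line_ys_per_patch:
        ordered = sorted(ys)
        # Gaps between vertically adjacent lines; zero gaps are duplicates.
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]) if b - a > 0)
    if not gaps:
        raise ValueError("need at least one patch with two distinct lines")
    # Median is an assumed choice: it is less sensitive to outliers than a mean.
    return median(gaps)


def context_adapted_patch_size(gap, lines_per_patch=8, min_size=64, max_size=2048):
    """Patch side length covering roughly `lines_per_patch` text-lines."""
    size = int(round(gap * lines_per_patch))
    return max(min_size, min(max_size, size))


# Example: noisy predictions from two raw patches.
gap = estimate_interline_gap([[10, 40, 72], [5, 35]])
patch_size = context_adapted_patch_size(gap)
```

Tying the patch size to the interline gap means every patch sees a similar number of text-lines regardless of the scan resolution, which is the intuition behind context-adapted patching.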
SeamFormer and Palmira fail when the text-lines have a curvature spread across the document width, whereas LineTR detects all the text-lines accurately.
SeamFormer and Palmira fail on images where the density of text is very high, whereas LineTR detects all the text-lines accurately.
Zero-shot outputs of LineTR on the newly introduced datasets.
Even though LineTR was trained only on palm leaf manuscripts, it is able to generalize to documents well outside its domain.
@article{vaibav2024linetr,
author = {Agrawal, Vaibhav and Vadlamudi, Niharika and Waseem, Muhammad and Joseph, Amal and Chitluri, Sreenya and Sarvadevabhatla, Ravi Kiran},
title = {LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts},
journal = {ICPR},
year = {2024},
}
If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.