LineTR: Re-Imagining
Text-Line Segmentation

International Institute of Information Technology, Hyderabad
Center for Visual Information Technology (CVIT)

LineTR works on palm leaf manuscripts in a dataset-agnostic manner.

Abstract

Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, scripts, and appearances. Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting scalability and maintainability.
To this end, we propose LineTR, a single model that works across dataset collections. LineTR is a two-stage approach. The first stage predicts text-strike-through lines, called scribbles, and a novel text-energy map of the input document image. The second stage is a seam-generation network which uses these outputs to produce precise polygons around the text-lines.
Text-line segmentation has mainly been approached as a dense-prediction task. This is ineffective: the inductive prior of a line is not utilized, which leads to poor segmentation performance. Our key insight is therefore to parametrize a text-line, preserving these inductive priors. To avoid resizing the document, the input image is first broken into context-adapted patches, and each patch is processed independently by the stage-1 network. The patch-level outputs are combined using a dataset-agnostic post-processing pipeline. Notably, we show that carefully choosing the patch size to capture enough context is crucial for generalization, since document images come in arbitrary resolutions. LineTR has been evaluated extensively through quantitative experiments and qualitative comparisons. Additionally, our method exhibits strong zero-shot generalization to unseen document collections.

Why do previous methods fail?

Previous work treats text-line segmentation as a dense-prediction task. This causes adjacent text-lines to merge, resulting in poor segmentation performance.


Proposed Approach: LineTR

Our method first breaks the input image into context-adapted patches (1). These image patches are processed independently by a branched network (stage-1) to output line parameters and a text-energy map. Specifically, an image patch is passed through a ViT encoder to obtain image features. A DETR-style network, called the Line-Parameter Generator (2a), decodes a set of randomly initialized line-queries conditioned on the image features, and predicts the line parameters and probability scores. The second branch, the Text-Energy Map Generator, is a hybrid CNN-transformer network which predicts the text-energy map as shown. The patch-level outputs from both branches are independently post-processed to obtain global outputs (3).
Stage-2 is a seam generation network, which uses the outputs of stage-1 to output precise polygons enclosing the text lines (4).
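The control flow of the two stages can be sketched as follows. This is an illustrative sketch only: the real networks are replaced by stubs, seam generation is elided, and all names (`run_stage1`, `segment_page`) are placeholders, not the actual LineTR API.

```python
import numpy as np

def run_stage1(patch):
    # Stub for the branched stage-1 network: returns dummy line
    # parameters ((point, slope, score)) and a zero text-energy map.
    h, w = patch.shape
    return [((w / 2, h / 2), 0.0, 0.9)], np.zeros((h, w))

def segment_page(image, patch_size):
    # Break the page into patches, run stage-1 on each patch
    # independently, and merge patch-local outputs into page space.
    h, w = image.shape
    params, energy = [], np.zeros((h, w))
    for y in range(0, h, patch_size):
        for x in range(0, w, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            lines, e = run_stage1(patch)
            # shift patch-local predictions into page coordinates
            params += [((px + x, py + y), m, s) for (px, py), m, s in lines]
            energy[y:y + e.shape[0], x:x + e.shape[1]] = e
    return params, energy  # inputs to the stage-2 seam network
```

In the full pipeline, the merged line parameters and the stitched energy map would then be consumed by the stage-2 seam-generation network to produce the final polygons.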

LineTR

Re-Imagining Text-Lines!

We use the point-slope form to parametrize a text-line, as shown.
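A minimal sketch of the point-slope parametrization: a text-line is described by a reference point (x0, y0) and an angle theta (slope m = tan(theta)), from which points along the line can be recovered. Function and parameter names here are illustrative, not the paper's API.

```python
import math

def line_points(x0, y0, theta, width, num=50):
    """Sample points along a text-line given point-slope parameters.

    (x0, y0) is a reference point on the line; theta is the line's
    angle, so the slope is m = tan(theta).
    """
    m = math.tan(theta)
    xs = [i * width / (num - 1) for i in range(num)]
    # point-slope form: y - y0 = m * (x - x0)
    return [(x, y0 + m * (x - x0)) for x in xs]
```

For a horizontal line (theta = 0), every sampled point shares the reference y-coordinate, which matches the intuition that the parametrization preserves the inductive prior of a line.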


Choose Your Patches Wisely!

Patching avoids resizing the document to a small size. However, fixed-size patches are ineffective and hinder out-of-domain generalization: document images come in arbitrary resolutions, so a fixed patch may not capture sufficient context.


To this end, we propose an algorithm for context-aware patching. (1) We sample raw patches of varying sizes. (2) We run these raw patches through the Line-Parameter Generator to obtain noisy predictions. (3) These noisy predictions are used to estimate the average interline gap of the document. (4) This interline gap is then used to compute the context-adapted patch size. Patches of this size are finally sampled from the document and fed to LineTR for inference.
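Steps (3) and (4) above can be sketched as follows, assuming the noisy predictions are reduced to vertical line positions within a raw patch. The scale factor and fallback size are illustrative constants, not values from the paper.

```python
def adaptive_patch_size(y_coords, scale=4.0, fallback=512):
    """Estimate a context-adapted patch size from noisy line predictions.

    y_coords: sorted vertical positions of detected text-lines in a raw
    patch. The patch edge is set to a multiple of the median interline
    gap, so a patch always covers several text-lines.
    """
    if len(y_coords) < 2:
        return fallback  # not enough lines to estimate a gap
    gaps = [b - a for a, b in zip(y_coords, y_coords[1:])]
    gaps.sort()
    median_gap = gaps[len(gaps) // 2]  # robust to outlier detections
    return max(1, int(scale * median_gap))
```

Using the median gap rather than the mean keeps the estimate robust to spurious or missed detections in the noisy first pass.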

Context-Adaptive Patching

Qualitative Results

SeamFormer and Palmira fail when the text-lines have curvature spread across the document width, but LineTR detects all the text-lines accurately.

Predictions from SeamFormer for a manuscript with curved text.
SeamFormer.
Predictions from Palmira for a manuscript with curved text.
Palmira.
Predictions from LineTR (ours) for a manuscript with curved text.
LineTR (Ours).

SeamFormer and Palmira fail on images where the density of text is very high, but LineTR succeeds in detecting all the text-lines accurately.

Predictions from SeamFormer for a manuscript with dense text.
SeamFormer.
Predictions from Palmira for a manuscript with dense text.
Palmira.
Predictions from LineTR (ours) for a manuscript with dense text.
LineTR (Ours).

Zero-shot Results

Zero-shot outputs of LineTR on the newly introduced datasets.

Zero-shot result from SM.
Zero-shot result from UB.
Zero-shot result from WM.

LineTR generalizes well!

Even though LineTR was trained only on palm leaf manuscripts, it is able to generalize to documents well outside its domain.

ICDAR2017 dataset prediction 1
ICDAR2017 HTR dataset
ICDAR2017 dataset prediction 2
ICDAR2017 HTR dataset

Quantitative Results

Comparative evaluation of LineTR against baseline models using benchmark datasets.

BibTeX

@article{vaibav2024linetr,
  author    = {Agrawal, Vaibhav and Vadlamudi, Niharika and Waseem, Muhammad and Joseph, Amal and Chitluri, Sreenya and Sarvadevabhatla, Ravi Kiran},
  title     = {LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts},
  journal   = {ICPR},
  year      = {2024},
}

Contact

If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.