LineTR

International Institute of Information Technology, Hyderabad
Center for Visual Information Technology (CVIT)

LineTR works on palm leaf manuscripts in a dataset-agnostic manner.

Abstract

We propose LineTR, a novel two-stage line segmentation approach that processes a wide variety of challenging handwritten documents in a unified, dataset-agnostic manner.

Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, scripts, and appearances. Traditional methods often rely on dataset-specific processing or on training a separate model per dataset, which limits scalability and maintainability. LineTR instead works in two stages with a single model. In the first stage, it processes context-adaptive image patches using a DETR-style network to generate parametric representations of text lines and a hybrid CNN-transformer network to create a text energy map. A robust post-processing procedure converts these outputs into document-level scribbles. In the second stage, the scribbles and the text energy map are used to generate precise polygons enclosing the text lines. Experimental results demonstrate that LineTR achieves superior line segmentation with a single model and performs well in zero-shot inference on new datasets.

Network Architecture

LineTR has two stages. In the first stage, it processes context-adaptive image patches with two networks: a DETR-style network that generates parametric representations of text lines, and a hybrid CNN-transformer network that creates a text energy map. A robust post-processing procedure converts these outputs into document-level scribbles. In the second stage, the scribbles and the text energy map are used to generate precise polygons enclosing the text lines.
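To make the step from patch-level predictions to document-level scribbles more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the LineTR implementation: the fixed-size overlapping tiling stands in for the context-adaptive patches, line predictions are assumed to be point lists normalized to each patch, and the merge rule (grouping fragments by mean vertical position) is a deliberately simple heuristic. All names (`Patch`, `tile_document`, `to_document_coords`, `merge_fragments`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Patch:
    x0: int      # left edge in document pixels
    y0: int      # top edge in document pixels
    width: int
    height: int


def tile_document(doc_w: int, doc_h: int, patch_w: int, patch_h: int,
                  overlap: float = 0.25) -> List[Patch]:
    """Cover the page with overlapping patches.

    Assumption: a fixed patch size; the paper's patches are context-adaptive.
    """
    step_x = max(1, int(patch_w * (1 - overlap)))
    step_y = max(1, int(patch_h * (1 - overlap)))
    patches = []
    y = 0
    while True:
        x = 0
        while True:
            # Clip the patch at the right and bottom page borders.
            patches.append(Patch(x, y, min(patch_w, doc_w - x), min(patch_h, doc_h - y)))
            if x + patch_w >= doc_w:
                break
            x += step_x
        if y + patch_h >= doc_h:
            break
        y += step_y
    return patches


def to_document_coords(patch: Patch, local_points: List[Point]) -> List[Point]:
    """Map points normalized to [0, 1] within a patch to document pixels."""
    return [(patch.x0 + u * patch.width, patch.y0 + v * patch.height)
            for u, v in local_points]


def merge_fragments(fragments: List[List[Point]], y_tol: float) -> List[List[Point]]:
    """Group line fragments whose mean y lies within y_tol of the group's first fragment.

    Hypothetical heuristic: it only suits roughly horizontal, well-separated lines.
    """
    def mean_y(frag: List[Point]) -> float:
        return sum(p[1] for p in frag) / len(frag)

    groups = []  # each entry: [reference mean y, accumulated points]
    for frag in sorted(fragments, key=mean_y):
        if groups and abs(mean_y(frag) - groups[-1][0]) <= y_tol:
            groups[-1][1].extend(frag)
        else:
            groups.append([mean_y(frag), list(frag)])
    # Order each scribble left to right.
    return [sorted(points) for _, points in groups]
```

In this sketch, each patch's predicted line points would be passed through `to_document_coords` and the results collected into one list for `merge_fragments`, giving one scribble per text line. A second stage would then combine each scribble with the text energy map to produce the enclosing polygon, which is not modeled here.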

BibTeX

@article{vaibav2024linetr,
  author    = {TBD},
  title     = {LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts},
  journal   = {ICPR},
  year      = {2024},
}

Contact

If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.