The architecture of BoundaryNet (top) and various sub-components (bottom).
The variable-sized H×W input image is processed by Mask-CNN (MCNN)
which predicts a region mask estimate and an associated region class.
The mask’s boundary is determined using a contourization procedure (light
brown) applied to the mask estimate from MCNN. M boundary points are sampled
on the boundary. A graph is constructed with the points as nodes and
edge connectivity defined by the k-hop neighborhoods of each point. The spatial
coordinates of a boundary point location p = (x, y) and the corresponding
backbone skip attention features f^r from MCNN are used as node features for the
boundary point. The feature-augmented contour graph G = (F, A) is iteratively
processed by Anchor GCN to obtain the final output contour points
defining the region boundary.
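
The caption above walks through the full pipeline; the sketch below illustrates only the contour-graph construction step, assuming a binary region mask from MCNN and a per-pixel skip attention feature map. The helper build_contour_graph, its parameter defaults, and the uniform sampling strategy are illustrative assumptions, not the BoundaryNet implementation.

```python
# Hypothetical sketch of contour-graph construction, assuming:
#   mask : (H, W) binary numpy array predicted by MCNN
#   f_r  : (H, W, C) skip attention feature map from the MCNN backbone
#   M    : number of boundary points to sample, k : hop radius
import numpy as np
import cv2

def build_contour_graph(mask, f_r, M=200, k=2):
    """Return node features F of shape (M, 2 + C) and adjacency A of shape (M, M)."""
    # Contourization: extract the boundary of the largest predicted region.
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    boundary = max(contours, key=cv2.contourArea).squeeze(1)  # (N, 2), (x, y) order

    # Sample M boundary points along the contour.
    idx = np.linspace(0, len(boundary) - 1, M).astype(int)
    pts = boundary[idx]                                       # (M, 2)

    # Node features: spatial coordinates p = (x, y) concatenated with the
    # backbone skip attention features at each boundary point location.
    feats = f_r[pts[:, 1], pts[:, 0]]                         # (M, C)
    F = np.concatenate([pts.astype(np.float32), feats], axis=1)

    # Adjacency: connect each node to its k-hop neighbours along the cyclic
    # contour, yielding the graph G = (F, A) that Anchor GCN processes.
    A = np.zeros((M, M), dtype=np.float32)
    for i in range(M):
        for hop in range(1, k + 1):
            A[i, (i + hop) % M] = 1.0
            A[i, (i - hop) % M] = 1.0
    return F, A
```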