Stone inscription images pose severe challenges for binarization due to poor contrast between etched characters and the stone background, non-uniform surface degradation, distracting artifacts, and highly variable text density and layouts. Under these conditions, existing binarization techniques frequently fail to isolate coherent character regions. Many approaches subdivide the image into patches to better resolve text fragments and improve binarization performance. With this in mind, we present a robust and adaptive patching strategy to binarize challenging Indic inscriptions.
The patches from our approach are used to train an Attention U-Net for binarization. The attention mechanism allows the model to focus on subtle structural cues, while our dynamic sampling and patch selection method ensures that the model learns to overcome surface noise and layout irregularities. We also introduce a carefully annotated, pixel-precise dataset of Indic stone inscriptions at the character-fragment level. We demonstrate that our novel patching mechanism significantly boosts binarization performance across classical and deep learning baselines.
Despite training only on a single-script Indic dataset, our model exhibits strong zero-shot generalization to other Indic and non-Indic scripts, highlighting its robustness and script-agnostic generalization capabilities. By producing clean, structured representations of inscription content, our method lays the foundation for downstream tasks such as script identification, OCR, and historical text analysis.
We propose a novel, spatially adaptive, Character-Context-Aware patching mechanism. The resulting patches are used to train a binarization network. At test time, a self-refining inference pipeline mimics the training-time strategy, thereby enabling robust binarization.
Our novel patching strategy uses the character component size to set the patch size and is therefore inherently adaptive to the content of each image. The character component height is estimated from the ground-truth masks, and pixels are grouped into foreground (text) and background (stone texture) using morphological dilation with an adaptive kernel. Patches are sampled from the foreground and background regions at multiple scales based on the character height, ensuring that characters always appear at a consistent scale.
Our patching strategy teaches the model to “see” inscriptions the way humans do, by focusing on meaningful character regions at the right scale while also learning how the surrounding stone texture looks. It builds a more robust and context-aware foundation for the binarization model that follows.
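The sampling procedure described above can be illustrated with a minimal NumPy-only sketch. It is not the authors' implementation: the function names (`component_heights`, `estimate_char_height`, `dilate`, `sample_patches`), the use of the median component height, the square dilation kernel of roughly half the character height, and the scale multipliers `(2, 4)` are all assumptions chosen for illustration.

```python
from collections import deque

import numpy as np


def component_heights(mask):
    """Heights (in pixels) of the 4-connected foreground components of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    heights = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        top = bottom = r0
        while queue:  # breadth-first flood fill over one component
            r, c = queue.popleft()
            top, bottom = min(top, r), max(bottom, r)
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        heights.append(bottom - top + 1)
    return heights


def estimate_char_height(mask):
    """Assumption: the median component height stands in for the character height."""
    heights = component_heights(mask)
    return int(np.median(heights)) if heights else 0


def dilate(mask, k):
    """Binary dilation with a k x k square kernel (k odd), using shifted ORs."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, k // 2, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dr in range(k):
        for dc in range(k):
            out |= padded[dr:dr + mask.shape[0], dc:dc + mask.shape[1]]
    return out


def sample_patches(image, mask, scales=(2, 4), per_region=2, seed=0):
    """Sample square patches centred on text and background pixels.

    The patch side is a multiple of the estimated character height;
    the multipliers in `scales` are hypothetical.
    """
    rng = np.random.default_rng(seed)
    char_h = estimate_char_height(mask)
    if char_h == 0:
        return 0, []
    kernel = max(1, char_h // 2) | 1  # adaptive, forced to be odd
    text_region = dilate(mask, kernel)
    patches = []
    for s in scales:
        size = min(s * char_h, *mask.shape)
        for region in (text_region, ~text_region):  # foreground, then background
            rows, cols = np.nonzero(region)
            if rows.size == 0:
                continue
            picks = rng.choice(rows.size, size=min(per_region, rows.size), replace=False)
            for i in picks:
                top = int(np.clip(rows[i] - size // 2, 0, mask.shape[0] - size))
                left = int(np.clip(cols[i] - size // 2, 0, mask.shape[1] - size))
                window = (slice(top, top + size), slice(left, left + size))
                patches.append((image[window], mask[window]))
    return char_h, patches
```

Calling `sample_patches(image, mask)` returns the estimated character height and a list of `(image_patch, mask_patch)` pairs. In a training pipeline, each patch would presumably be resized to a fixed network input size so that characters appear at a consistent scale; that step is omitted here.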
During inference, the trained Attention U-Net is applied in a two-stage process to achieve accurate binarization of unseen stone inscription images.
Stage 1 – Initial Prediction
The image is processed at multiple scales (256, 384, 512, 768 px) using a sliding-window approach. For each pixel, the maximum prediction probability across scales is taken to form a coarse binary map. This map serves as a pseudo-ground truth that roughly identifies text regions, guiding the next stage.
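A minimal sketch of this multi-scale fusion is given below. It rests on several assumptions: the scales are treated as sliding-window sizes, windows overlap by half, overlapping predictions within one scale are also merged by maximum, and the coarse map is obtained by thresholding at 0.5. `predict_fn` is a hypothetical stand-in for the trained network that maps a patch to a per-pixel text probability map of the same height and width.

```python
import numpy as np


def window_starts(length, window, stride):
    """Start offsets that cover [0, length), with the last window flush to the edge."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window, stride))
    starts.append(length - window)
    return starts


def sliding_window_predict(image, predict_fn, window, stride=None):
    """Run predict_fn over overlapping windows and keep the per-pixel maximum."""
    h, w = image.shape[:2]
    stride = stride or window // 2  # assumption: 50% overlap
    prob = np.zeros((h, w), dtype=np.float32)
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            patch = image[top:top + window, left:left + window]
            # A real network would likely need the patch resized or padded
            # to its input size; this sketch passes it through unchanged.
            pred = np.asarray(predict_fn(patch), dtype=np.float32)
            region = prob[top:top + window, left:left + window]
            np.maximum(region, pred, out=region)  # in-place update of the view
    return prob


def multiscale_coarse_map(image, predict_fn, windows=(256, 384, 512, 768), threshold=0.5):
    """Fuse predictions across window sizes by maximum, then threshold."""
    fused = np.zeros(image.shape[:2], dtype=np.float32)
    for win in windows:
        fused = np.maximum(fused, sliding_window_predict(image, predict_fn, win))
    return fused >= threshold
```

Taking the maximum favours recall: a pixel is kept as text if any scale is confident about it, which matches the role of the coarse map as a rough pseudo-ground truth that the second stage then cleans up.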
Stage 2 – Context-Aware Refinement
The coarse map is used to apply the Character-Context-Aware Patching strategy for better sampling of text and background regions at optimal scales. These refined patches are reprocessed through the same Attention U-Net to produce the final binarized output. This stage reduces false positives and improves text boundary coherence.
Qualitative comparison between our method and other approaches. From left to right: the input inscription image, the ground truth mask, and the predictions by Otsu, Sauvola, FCN, NAF-DPM and our model. The characters restored by our network are clearly more readable and accurate.
Demonstration of our model’s robust zero-shot generalization. Examples are from challenging, in-the-wild Indic and Byzantine-era Medieval Greek inscriptions. Despite significant variations in script, lighting, and surface degradation, our method consistently produces clean, legible binarizations. Note: The predicted binary maps are overlaid on the inscriptions.
The Shilalekhya Inscription Binarization Dataset will be made publicly available soon.
@inproceedings{jena2025inscription,
author = {Pratyush Jena and Amal Joseph and Arnav Sharma and Ravi Kiran Sarvadevabhatla},
title = {Unveiling Text in Challenging Stone Inscriptions: A Character-Context-Aware Patching Strategy for Binarization},
booktitle = {Indian Conference on Computer Vision, Graphics, and Image Processing (ICVGIP 2025)},
year = {2025},
address = {Mandi, India},
doi = {10.1145/3774521.3774539},
isbn = {979-8-4007-1930-1/25/12},
note = {To appear}
}
If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.