EpiSAM

Abstract

Stone inscriptions are invaluable sources of historical and linguistic knowledge, yet their automated analysis remains a major challenge due to surface irregularities, erosion, and low visual contrast. Conventional document and handwriting analysis techniques perform poorly in these scenarios. In this work, we propose character detection as a core strategy for robust inscription analysis. We introduce EpiSAM, a prompt-guided transformer framework for character segmentation in stone inscriptions. Rather than treating characters in isolation, EpiSAM employs a novel neighbor-aware strategy, explicitly predicting adjacent characters alongside the target. These contextual cues resolve boundary ambiguities, improving mask generation and enabling more accurate character segmentation. Furthermore, we expand an existing stone inscription dataset by adding dense polygonal annotations for characters, thereby enabling comprehensive research on Southeast Asian epigraphy. Experimental results show that EpiSAM achieves consistent improvements over existing baselines, while also exhibiting strong zero-shot generalization in challenging epigraphic scenarios.

Proposed Approach

Example inference results from various models.

Rather than predicting each character in isolation, we exploit the structured nature of inscriptions, where characters appear in locally coherent spatial sequences. We reformulate character segmentation as a localized, neighbor-aware prediction task: estimating a target character jointly with its immediate left and right neighbors.

This formulation introduces strong contextual constraints that reduce boundary ambiguities and improve separation between adjacent characters, especially under heavy noise and texture ambiguity. By leveraging local spatial continuity, the model achieves more reliable instance-level character segmentation without requiring any explicit line-level modeling.
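The neighbor-aware formulation can be illustrated with a minimal sketch of how (left, target, right) groupings might be derived from character bounding boxes. This is not the authors' implementation: the function name `neighbor_triplets`, the box format, and the rule for picking neighbors (the horizontally nearest box that overlaps the target vertically) are all assumptions made for illustration. The rule is chosen because it needs no line-level grouping, in keeping with the description above.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]
Triplet = Tuple[Optional[int], int, Optional[int]]


def _center_x(box: Box) -> float:
    return (box[0] + box[2]) / 2.0


def _vertical_overlap(a: Box, b: Box) -> float:
    # Height of the shared vertical extent; 0.0 if the boxes do not overlap.
    return max(0.0, min(a[3], b[3]) - max(a[1], b[1]))


def neighbor_triplets(boxes: List[Box]) -> List[Triplet]:
    """Return (left_index, target_index, right_index) for every box.

    Hypothetical neighbor rule: among boxes that overlap the target
    vertically, take the horizontally nearest one on each side.
    A missing neighbor is reported as None.
    """
    triplets: List[Triplet] = []
    for i, target in enumerate(boxes):
        left: Optional[int] = None
        right: Optional[int] = None
        for j, other in enumerate(boxes):
            if j == i or _vertical_overlap(target, other) <= 0.0:
                continue
            dx = _center_x(other) - _center_x(target)
            if dx < 0:
                # Keep the closest box on the left (largest negative dx).
                if left is None or dx > _center_x(boxes[left]) - _center_x(target):
                    left = j
            elif dx > 0:
                # Keep the closest box on the right (smallest positive dx).
                if right is None or dx < _center_x(boxes[right]) - _center_x(target):
                    right = j
        triplets.append((left, i, right))
    return triplets


if __name__ == "__main__":
    # Three characters in a row, plus one character far below them.
    example = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 1, 34, 11), (0, 30, 10, 40)]
    for triplet in neighbor_triplets(example):
        print(triplet)
```

Under this rule, characters at the start or end of a row, or isolated characters, simply have `None` for the missing side, so a model trained on such targets would still see a well-defined target in every case.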

Based on this formulation, we propose EpiSAM, a prompt-guided transformer framework that integrates neighbor-aware context for robust character segmentation. Our method leverages the strong visual representation and prompt-based localization capabilities of the Segment Anything Model (SAM) to produce fine-grained masks guided by sparse prompts.
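As a rough illustration of what a sparse prompt looks like in practice, the sketch below builds one foreground point and one box prompt from a rough character location. How EpiSAM actually constructs its prompts is not described here, so the helper `sparse_prompts_from_box` and the choice of the box center as the point prompt are assumptions. The dictionary keys mirror the `point_coords`, `point_labels`, and `box` arguments of `SamPredictor.predict` in the original `segment_anything` package.

```python
from typing import Dict, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def sparse_prompts_from_box(box: Box) -> Dict[str, list]:
    """Build a point prompt and a box prompt for one target character.

    Hypothetical helper: uses the box center as a single foreground point.
    """
    x_min, y_min, x_max, y_max = box
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    center = [(x_min + x_max) / 2.0, (y_min + y_max) / 2.0]
    return {
        "point_coords": [center],               # one (x, y) point
        "point_labels": [1],                    # 1 marks a foreground point
        "box": [x_min, y_min, x_max, y_max],    # XYXY box prompt
    }


if __name__ == "__main__":
    print(sparse_prompts_from_box((10, 20, 30, 60)))
```

To use these with SAM itself, the lists would be converted to NumPy arrays and passed to a predictor on which the inscription image has already been set; that step is omitted here because it requires model weights.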

Qualitative Results

Qualitative comparison of character segmentation against the strongest baseline (YOLOv11x-Seg). Under severe degradation and surface erosion, EpiSAM produces tighter and more precise character masks, resulting in improved instance separation in densely packed inscriptions.

Quantitative Results

EpiSAM demonstrates superior instance-level segmentation performance across metrics.

Zero-Shot Results

Zero-shot character segmentation results of EpiSAM on (from top to bottom) Roman, Brahmi and Thai inscriptions. Even when characters are partially missing or severely eroded, the model produces spatially coherent and well-separated character masks.

Dataset

The extended Shilalekhya Inscription Dataset will be made publicly available soon.

Contact

If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.

Acknowledgement

We sincerely acknowledge The Mythic Society Bengaluru for providing the inscription images used in this work. These resources are part of the Inscriptions 3D Digital Conservation Project, an initiative aimed at preserving and digitizing valuable epigraphic heritage.

For more information about the project, please visit: Akshara Bhandara – Inscriptions 3D Digital Conservation Project.

EpiSAM: Character Segmentation in Challenging Stone Inscriptions

Sample images and their corresponding character masks from our dataset. Note how difficult it is to distinguish the shallow handwritten etching from the background stone texture with the naked eye. For clarity, the character masks are shown as boundary overlays.