This page explains how to use the Image Selection component of the tool. It also describes the settings in the OCR-Layout of the DocVisor tool that make the Image Selection component easier to use.

What is Image to Text Mapping

While learning to map a line image to its corresponding text, an intermediate stage of the model produces an alignment matrix via the attention component. We leverage this attention/alignment matrix to provide two features: Image to Text Mapping and Text to Image Mapping.

Alignment Matrix

The attention produced at each time step can be viewed as a probability distribution that tells the decoder which features of the image to attend to. Using this matrix, we find the location the model is attending to at each time step. To see how this is done, visit this page:
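As a rough illustration of how a location can be read off the alignment matrix, here is a minimal sketch. It assumes the matrix has one row per time step and one column per horizontal feature position of the line image, and that feature columns are evenly spaced across the image width. The function name `attention_peaks` and the array layout are hypothetical and are not taken from DocVisor.

```python
import numpy as np

def attention_peaks(attention, image_width):
    """Return the x pixel the model attends to most at each time step.

    attention: array of shape (T, W), one row per time step and one
    column per horizontal feature position (an assumed layout).
    Every row is assumed to have a non-zero sum.
    """
    attention = np.asarray(attention, dtype=float)
    # Normalise each row so it is a probability distribution.
    attention = attention / attention.sum(axis=1, keepdims=True)
    num_positions = attention.shape[1]
    # Most attended feature column for every time step.
    peak_cols = attention.argmax(axis=1)
    # Map the centre of each feature column back to image pixels.
    scale = image_width / num_positions
    return (peak_cols + 0.5) * scale

# Toy matrix: 3 time steps over 4 feature columns.
attn = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.2, 0.7],
])
print(attention_peaks(attn, image_width=400))  # x positions 50, 150, 350
```

Taking the arg-max is only one option; the attention-weighted mean position would be a smoother alternative.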


Step 1: In the Visualize Settings of the OCR-Layout, select the Image Selection Component.

Step 2: Select any sub-portion of the image.

Step 3: View the corresponding text highlighted for all the models that provide attentions.
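The mapping behind these steps can be sketched as follows. This is only an illustration under stated assumptions: the alignment matrix has one row per predicted character and one column per horizontal feature position, the selection is reduced to a horizontal pixel range, and a character counts as selected when its attention peak falls inside that range. The helper names `chars_in_selection` and `highlight` are hypothetical and do not come from DocVisor.

```python
import numpy as np

def chars_in_selection(attention, text, image_width, x_start, x_end):
    """Indices of characters whose attention peak lies in [x_start, x_end]."""
    attention = np.asarray(attention, dtype=float)
    # Width in pixels covered by one feature column (assumed uniform).
    scale = image_width / attention.shape[1]
    # Pixel position of the most attended column for each character.
    peaks = (attention.argmax(axis=1) + 0.5) * scale
    return [i for i, x in enumerate(peaks[:len(text)])
            if x_start <= x <= x_end]

def highlight(text, indices):
    """Wrap the selected characters in brackets for display."""
    marked = set(indices)
    return "".join(f"[{c}]" if i in marked else c
                   for i, c in enumerate(text))

# Toy example: the word "cat" on a 300 px wide line image.
attn = np.array([
    [0.8, 0.2, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.7, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.2, 0.7],
])
selected = chars_in_selection(attn, "cat", 300, x_start=100, x_end=300)
print(highlight("cat", selected))  # c[a][t]
```

Running the same lookup in the other direction (from a character index to its peak position) gives a Text to Image mapping under the same assumptions.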

The following gif should give you an idea of how the Image Selection feature works.

Gif displaying how to use the Image Selection Feature

Text Font Size

For easier visualization, users might want to align the text with the characters in the image. To do so, follow the steps below:

  1. Open the sidebar (if it is not open)
  2. Expand the Render component under Settings
  3. Go to Text Font Size and choose the font-size value that best suits the image/dataset.

The following gif should help you understand the feature.

Gif displaying how to change the text font-size

Notice that the words of the predicted text are almost in line with the words in the line image, which helps in visualization.