Benchmarked multi-script Thai scene text dataset and its multi-class detection solution
Suwanwiwat, Hemmaphan, Das, Abhijit, Saqib, Muhammad, and Pal, Umapada (2021) Benchmarked multi-script Thai scene text dataset and its multi-class detection solution. Multimedia Tools and Applications, 80. pp. 11843-11863.
Abstract
Detecting text in scene images is a prevalent research topic. Text detection is considered challenging and non-interoperable because a single scene image may contain multiple scripts. Each of these scripts can have different properties; it is therefore crucial to study scene text detection with respect to geographical location, since the scripts in use differ from place to place. As no work on large-scale multi-script Thai scene text detection was found in the literature, this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic. These scripts are commonly seen around Thailand. Compared with the Roman/English script, Thai script contains more consonants and vowels, and it also has numerals. Furthermore, the placement of letters, intonation marks, and vowels differs from that of English or Chinese-like scripts. Hence, detecting and recognising Thai text can be considered challenging. This study proposes a multi-script dataset that includes the aforementioned scripts and numerals, along with a benchmark employing the Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The proposed dataset contains scene images recorded in Thailand and consists of 600 images together with their manual detection annotations. This study also proposes a detection technique that treats multi-script scene text detection as a multi-class detection problem, which was found to work more effectively than legacy approaches. Experimental results from applying the proposed technique to the dataset achieved encouraging precision and recall rates when compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
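The abstract reports precision and recall for a multi-class formulation but does not spell out the evaluation protocol. The following is a minimal sketch of how per-script precision and recall could be computed once each script is a detection class. It is not the authors' method: the greedy matching, the intersection-over-union (IoU) threshold of 0.5, the function names, and the label strings are all assumptions made for illustration.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy per-class matching of detections to annotations.

    predictions:  list of (label, score, box)
    ground_truth: list of (label, box)
    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    tp, fp, n_gt = defaultdict(int), defaultdict(int), defaultdict(int)
    for label, _ in ground_truth:
        n_gt[label] += 1

    matched = set()  # indices of annotations already claimed by a detection
    # Visit detections from most to least confident.
    for label, _score, box in sorted(predictions, key=lambda p: -p[1]):
        best_iou, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(ground_truth):
            # A detection can only match an unclaimed box of the same script.
            if gt_label != label or idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp[label] += 1
        else:
            fp[label] += 1

    results = {}
    for label in set(n_gt) | set(tp) | set(fp):
        detected = tp[label] + fp[label]
        precision = tp[label] / detected if detected else 0.0
        recall = tp[label] / n_gt[label] if n_gt[label] else 0.0
        results[label] = (precision, recall)
    return results


# Hypothetical annotations and detections for one image.
ground_truth = [("thai", (0, 0, 10, 10)), ("roman", (20, 20, 30, 30))]
predictions = [
    ("thai", 0.9, (0, 0, 10, 10)),     # correct Thai detection
    ("thai", 0.8, (50, 50, 60, 60)),   # spurious Thai detection
    ("roman", 0.7, (20, 20, 30, 30)),  # correct Roman detection
    ("arabic", 0.6, (0, 0, 5, 5)),     # no Arabic text annotated
]
results = per_class_precision_recall(predictions, ground_truth)
```

In this formulation a box that localises the text correctly but is assigned the wrong script counts as a false positive for the predicted class, and the missed annotation lowers recall for the true class, so script confusions are penalised in the per-class figures.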
Item ID: | 65702 |
---|---|
Item Type: | Article (Research - C1) |
ISSN: | 1573-7721 |
Copyright Information: | © Springer Science+Business Media, LLC, part of Springer Nature 2021 |
Date Deposited: | 19 Jan 2021 23:07 |
FoR Codes: | 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460304 Computer vision @ 50%; 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460308 Pattern recognition @ 50% |
SEO Codes: | 89 INFORMATION AND COMMUNICATION SERVICES > 8999 Other Information and Communication Services > 899999 Information and Communication Services not elsewhere classified @ 100% |