Skeleton-based Generative Adversarial Networks and Novel Evaluation Metrics for Font Structural Style Transfer

Thanaphon Thanusan
Karn Patanukhom

Abstract

We present a novel approach to font style transfer with Generative Adversarial Networks (GANs) that enhances the scene text editing process, enabling any character to be edited into any other character, including across languages. Our GAN model uses pairs of sample images of the target font style and their corresponding skeleton-based features to learn the key structural details of that style without relying on pre-trained models. Once the generator is trained, it can transform any character from a base font style to the target font style. Our approach offers the flexibility to select a base font similar to the target font to improve results, as well as the ability to control the stroke width of the output text. For few-shot scenarios, we additionally introduce a double-generator scheme that integrates existing methods with our approach. We also introduce two new evaluation metrics: Difference in Histogram of Oriented Gradients and Stroke Width Similarity. Our experimental results show that these metrics measure font style similarity more robustly than conventional metrics. We evaluate our GAN model on style transfer to six target fonts and on real scene text editing tasks, comparing it with existing methods. Our approach provides better structural similarity, readability, and visual appeal than the other methods, especially when generating unseen characters.
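The two metrics named above, Difference in Histogram of Oriented Gradients and Stroke Width Similarity, are not defined on this page, so the sketch below is only a rough illustration of the underlying idea rather than the paper's formulation. It assumes two binarizable grayscale glyph images of identical size and uses scikit-image and SciPy to compare HOG descriptors and skeleton-based stroke-width estimates between a generated glyph and a target-font glyph.

```python
# Illustrative sketch only: the exact metric definitions in the paper may differ.
# Assumes img_a and img_b are 2-D grayscale glyph images of the same size,
# with ink pixels brighter than 0.5 after normalization.
import numpy as np
from skimage.feature import hog
from skimage.morphology import skeletonize
from scipy.ndimage import distance_transform_edt


def hog_difference(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Lower is better: mean absolute difference between HOG descriptors."""
    feat_a = hog(img_a, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    feat_b = hog(img_b, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return float(np.abs(feat_a - feat_b).mean())


def stroke_width_similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Higher is better: ratio of mean stroke widths measured along the skeletons."""
    def mean_width(img: np.ndarray) -> float:
        mask = img > 0.5                      # binarize the glyph
        dist = distance_transform_edt(mask)   # distance of each ink pixel to background
        skel = skeletonize(mask)              # one-pixel-wide centerline of the strokes
        widths = 2.0 * dist[skel]             # approximate stroke diameter at each skeleton pixel
        return float(widths.mean()) if widths.size else 0.0

    w_a, w_b = mean_width(img_a), mean_width(img_b)
    return min(w_a, w_b) / max(w_a, w_b) if max(w_a, w_b) > 0 else 1.0
```

Under this reading, a lower HOG difference and a stroke-width similarity closer to 1 would indicate that a generated glyph is structurally closer to the target font.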

Article Details

How to Cite
[1] T. Thanusan and K. Patanukhom, “Skeleton-based Generative Adversarial Networks and Novel Evaluation Metrics for Font Structural Style Transfer”, ECTI-CIT Transactions, vol. 19, no. 3, pp. 501–515, Aug. 2025.
Section
Research Article
