Ontology Generation and Instance Extraction of Medicine Information from Thai Semi-structured Data


Taneth Ruangrajitpakorn
Thepchai Supnithi
Rachada Kongkachandra

Abstract

Ontologies are widely used knowledge bases for representing domain knowledge. Developing one is difficult because it requires both domain and engineering expertise, yet ontologies are essential for enabling intelligent systems to comprehend real-world knowledge through structured concept networks. In the Thai context, ontology research remains limited by the scarcity of structured resources, standardized schemas, and annotated corpora for automatic knowledge extraction. This study addresses the gap by proposing a pattern-based methodology for ontology generation and instance extraction from Thai semi-structured medicine data, as an alternative to resource-intensive deep-learning methods. The approach identifies patterns of collocated Thai text and builds a collocation tree of word sequences, in which shared sequences represent ontological properties and variable sequences represent instance values. The method was applied to two complementary Thai medicine datasets, I-Med (a hospital dispensing-record database) and Pobpad (a public health-information website), to generate and integrate ontology components. The resulting pattern templates were transformed into ontological properties and converted into RDF/OWL format to produce a standard ontology usable for querying and reasoning. The generated ontology achieved high performance (Precision = 0.97, Recall = 0.90, F1 = 0.91) and received favorable assessments from domain experts. The results indicate that the approach can effectively extract structured knowledge from Thai semi-structured text and produce a reliable ontology suitable for medical knowledge representation, providing a data-driven foundation for future Thai intelligent systems.
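
The collocation-tree idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text has already been tokenized, uses English placeholder tokens instead of Thai, and treats a token as "shared" when at least `min_support` sequences pass through it in a prefix tree. The names `Node`, `build_tree`, `split_shared_variable`, `extract_pairs`, and `min_support` are all hypothetical.

```python
class Node:
    """One node of a prefix tree over token sequences."""

    def __init__(self):
        self.children = {}  # token -> Node
        self.count = 0      # number of sequences passing through this node


def build_tree(sequences):
    """Insert every token sequence into a prefix tree, counting visits."""
    root = Node()
    for tokens in sequences:
        node = root
        node.count += 1
        for tok in tokens:
            node = node.children.setdefault(tok, Node())
            node.count += 1
    return root


def split_shared_variable(tokens, root, min_support=2):
    """Split one sequence into (shared prefix, variable remainder).

    Assumption: a token belongs to the shared part while at least
    `min_support` sequences follow the same path through the tree.
    """
    node = root
    shared = []
    for i, tok in enumerate(tokens):
        child = node.children[tok]
        if child.count < min_support:
            return shared, tokens[i:]
        shared.append(tok)
        node = child
    return shared, []


def extract_pairs(sequences, min_support=2):
    """Return (property, value) string pairs for all sequences."""
    root = build_tree(sequences)
    pairs = []
    for tokens in sequences:
        shared, variable = split_shared_variable(tokens, root, min_support)
        pairs.append((" ".join(shared), " ".join(variable)))
    return pairs


# Placeholder data (hypothetical, not taken from I-Med or Pobpad).
SAMPLE = [
    ["drug", "name", "paracetamol"],
    ["drug", "name", "ibuprofen"],
    ["dosage", "form", "tablet"],
    ["dosage", "form", "syrup"],
]

for prop, value in extract_pairs(SAMPLE):
    print(f"{prop!r} -> {value!r}")
```

In this toy setting the shared prefixes ("drug name", "dosage form") play the role of candidate properties and the remainders play the role of instance values. A simple support threshold like this would misclassify a value that happens to repeat across records, so a real pipeline would need additional checks.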

Article Details

How to Cite

T. Ruangrajitpakorn, T. Supnithi, and R. Kongkachandra, “Ontology Generation and Instance Extraction of Medicine Information from Thai Semi-structured Data,” ECTI-CIT Transactions, vol. 20, no. 1, pp. 77–91, Jan. 2026.

Section: Research Article
