A semantic approach to automated design and construction of star schemas

Main Article Content

Non Sanprasit
Taravichet Titijaroonroj
Kraisak Kesorn
http://orcid.org/0000-0002-5195-8038

Abstract

Designing a star schema is a complex and time-consuming process requiring an expert to perform several tasks such as denormalization, dimension design, and construction of fact tables. This study presents a method to automatically design and generate star schema models, or so-called multidimensional models. We first introduce a method to incorporate a novel knowledge-based framework to enable an automation system to construct dimensional and fact tables as well as measures, which are the key elements of star schema models. The proposed framework provides a capability of column name identification using the arithmetic coding approach and measures identification using a natural language processing framework (NLP), resulting in dimensions and fact tables being constructed automatically without human intervention. Although the current version of our system is limited to reading data from semi-structured datasets such as CSV files and spreadsheets, the experimental results demonstrate that our framework can generate a star schema effectively, and can support online analytical processing (OLAP) operations. The experimental results show that our method is superior to other conventional approaches, achieving 96.67% accuracy for numerical data, higher than any of the prior models used for comparison.

Article Details

How to Cite
Sanprasit, N., Titijaroonroj, T., & Kesorn, K. (2021). A semantic approach to automated design and construction of star schemas. Engineering and Applied Science Research, 48(5), 518–528. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/243602
Section
ORIGINAL RESEARCH

References

Elamin E, Feki J. Toward an ontology based approach for data warehousing state of the art and proposal. The international Arab conference on information technology (ACIT2014); 2014 Dec 9-11; University of Nizwa, Oman. p. 170-9.

Abai NHZ, Yahaya JH, Deraman A. User requirement analysis in data warehouse design: a review. Proc Tech. 2013;11:801-6.

Romero O, Abello A. A framework for multidimensional design of data warehouses from ontologies. Data Knowl Eng 2010;69:1138-57.

Pardillo J, Mazon JN. Using ontologies for the design of data warehouses. Int J Database Manag Syst. 2011;3:73-87.

Hansen JB, Jensen S, Tarp M, Thomsen C. DWStar - automated star schema generation. Aalborg: Aalborg University; 2017.

Bimonte S, Antonelli L, Rizzi S. Requirements-driven data warehouse design based on enhanced pivot tables. Requir Eng. 2021;26(4):1-23.

Moukhi NE, El I, Mouloudi A, Elmounadi A. Merge of X-ETL and XCube towards a standard hybrid method for designing data warehouses. Int J Adv Comput Sci Appl. 2019;10(10):130-9.

Nebot V, Berlanga R, Perez JM, Aramburu MJ, Pedersen TB. Multidimensional integrated ontologies: a framework for designing semantic data warehouses. J Data Semant XIII. 2009;5530:1-36.

Nebot V, Berlanga R. Building data warehouses with semantic web data. Decis Support Syst. 2012;52:853-68.

Liu X, Iftikhar N. Ontology-based big dimension modeling in data warehouse schema design. In: Abramowicz W, editor. International conference on business information systems; 2013 Jun 19-21; Poznan, Poland. Berlin: Springer; 2013. p. 75-87.

Khouri S, Boukhari I, Bellatreche L, Sardet E, Jean S, Baron M. Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool. Comput Ind. 2012;63:799-812.

Witten IH, Neal RM, Cleary JG. Arithmetic coding for data compression. Commun ACM. 1987;30:520-40.

Howard PG, Vitter JS. Analysis of arithmetic coding for data compression. Inf Process Manag. 1992;28:749-63.

Rubin F. Arithmetic stream coding using fixed precision registers. IEEE Trans Inf Theory. 1979;25:672-5.

Bowman AW, Azzalini A. Applied smoothing techniques for data analysis. New York: Oxford University Press; 1997.

Armbrust M, Xin RS, Lian C, Huai Y, Liu D, Bradley JK, et al. Spark SQL: Relational data processing in Spark. Proceedings of the 2015 ACM SIGMOD international conference on management of data; 2015 May 31 – Jun 4; Melbourne, Australia. New York: ACM; 2015. p. 1383-94.

Partalidou E, Xioufis ES, Doropoulos S, Vologiannidis S, Diamantaras K. Design and implementation of an open source Greek POS tagger and entity recognizer using spaCy. WI '19: IEEE/WIC/ACM International conference on web intelligence; 2019 Oct 14-17; Thessaloniki, Greece. New York: Association for Computing Machinery; 2019. p. 337-41.

Tarcar AK, Tiwari A, Dhaimodker V, Rebelo P, Desai R, Rao D. Healthcare NER models using language model pretraining. Health Search and Data Mining Workshop (HSDM 2020) in the 13th ACM International WSDM Conference (WSDM 2020); 2020 Feb; Houston, USA. p. 12-8.

Miller GA. WordNet: a lexical database for English. Commun ACM. 1995;38:39-41.

Mizoguchi R, Sunagawa E, Kozaki K, Kitamura Y. The model of roles within an ontology development tool: Hozo. Appl Ontol 2007;2(2):159-79.

Lumbantoruan R, Sibarani EM, Sitorus MV, Mindari A, Sinaga SP. An approach for automatically generate star schema from natural language. TELKOMNIKA (Telecommun Comput Electron Control). 2014;12:501-10.

Sehgal S, Ranga KK. Translation of entity relational model to dimensional model. Int J Comput Sci Mob Comput. 2016;5:439-47.

Bentayeb F, Maiz N, Mahboubi H, Favre C, Loudcher S, Harbi N, et al. Innovative approaches for efficiently warehousing complex data from the web. Data Min Concepts Methodol Tools Appl. 2013:1422-48.

Song IY, Khare R, Dai B. SAMSTAR: a semi-automated lexical method for generating star schemas from an entity-relationship diagram. DOLAP 2007, ACM 10th International workshop on data warehousing and OLAP; 2007 Nov 9; Lisbon, Portugal. New York: Association for Computing Machinery; 2007. p. 9-16.