Library and Information Sciences

Developing Iran's General Administrative Thesaurus Based on the Semantic Web Standards SKOS/RDF Using Open-Source Software

Document Type: Original Article

Author
Assistant Professor, National Library and Archives of Iran, Tehran, Iran.
Abstract
Objective: This study aims to develop an optimal model for implementing, managing, and publishing Iran's General Administrative Thesaurus based on the semantic web standards SKOS/RDF using open-source software.
Methodology: This research used the action research method within the framework of Lewin's three-phase model (planning, execution, and evaluation). The research population is the dataset of Iran's General Administrative Thesaurus. In the planning phase, the initial state of the thesaurus data was analyzed, and three models for data preparation were identified and evaluated. Based on this analysis, an Excel template compatible with SKOS-Play was developed. In the execution phase, the data were organized in a spreadsheet and converted into RDF. To ensure data validity, the data were checked against the SKOS-Play validation rules and the errors identified were corrected. The thesaurus was then uploaded into VocBench, processed, and exported in Turtle format.
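The spreadsheet-to-RDF conversion and validation steps of the execution phase can be illustrated with a minimal sketch. It does not reproduce SKOS-Play's actual Excel template or its rule set: the column names (`uri`, `prefLabel`, `altLabel`, `broader`), the base URI `http://example.org/thesaurus/`, the sample rows, and the two checks are hypothetical, chosen only to show how each row becomes SKOS triples and how simple consistency errors can be caught before import into VocBench.

```python
import csv
import io

# Hypothetical base URI; the thesaurus's real namespace is not given in the abstract.
BASE = "http://example.org/thesaurus/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical template: one row per concept, Persian labels tagged "fa".
SAMPLE = """uri,prefLabel,altLabel,broader
c1,مدیریت,,
c2,بودجه,اعتبارات,c1
c3,استخدام,,c9
"""

def literal(text, lang):
    """Return a language-tagged literal (valid in N-Triples and Turtle), escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def rows_to_triples(rows, lang="fa"):
    """Map each spreadsheet row to (subject, predicate, object) terms in N-Triples syntax."""
    triples = []
    for row in rows:
        s = f"<{BASE}{row['uri']}>"
        triples.append((s, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"))
        triples.append((s, f"<{SKOS}prefLabel>", literal(row["prefLabel"], lang)))
        if row["altLabel"]:
            triples.append((s, f"<{SKOS}altLabel>", literal(row["altLabel"], lang)))
        if row["broader"]:
            triples.append((s, f"<{SKOS}broader>", f"<{BASE}{row['broader']}>"))
    return triples

def validate(rows):
    """Two illustrative checks (not SKOS-Play's rules): missing prefLabel, dangling broader."""
    ids = {row["uri"] for row in rows}
    errors = []
    for row in rows:
        if not row["prefLabel"].strip():
            errors.append(f"{row['uri']}: missing prefLabel")
        if row["broader"] and row["broader"] not in ids:
            errors.append(f"{row['uri']}: broader {row['broader']} not found")
    return errors

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(validate(rows))  # ['c3: broader c9 not found']
for s, p, o in rows_to_triples(rows):
    print(s, p, o, ".")
```

Each printed line is an N-Triples statement, which is also valid Turtle; a dedicated RDF library would normally handle prefixes and pretty-printing.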
Findings: The findings led to a six-stage model for managing and publishing Iran's General Administrative Thesaurus: (1) data preparation, (2) conversion of the thesaurus dataset to RDF, (3) transfer of the RDF data to VocBench, (4) serialization of the thesaurus dataset in Turtle format, (5) publication of the thesaurus dataset in Skosmos, and (6) provision of access and retrieval services. The thesaurus consists of 564 concepts, five main collections, and 18 sub-thesauri, expressed in 3,136 RDF triples, an average of 5.56 triples per concept. Three primary access methods were implemented: (1) browsing through a web-based system, (2) a RESTful API, and (3) semantic queries in SPARQL. The data are also provided in standard formats, including RDF, Turtle, N-Triples, and N-Quads, for retrieval and integration into other systems.
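The reported density is the total number of triples divided by the number of concepts: 3,136 / 564 ≈ 5.56. The sketch below computes the same ratio on a tiny hypothetical graph and shows how N-Triples and N-Quads differ: an N-Quads line repeats the N-Triples statement and adds a fourth term naming the graph. The namespace `http://example.org/thesaurus/`, the English labels, and the graph name are illustrative assumptions, not data from the thesaurus.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thesaurus/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A toy graph as (subject, predicate, object) tuples of N-Triples terms.
TRIPLES = [
    (f"<{EX}c1>", f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
    (f"<{EX}c1>", f"<{SKOS}prefLabel>", '"management"@en'),
    (f"<{EX}c1>", f"<{SKOS}narrower>", f"<{EX}c2>"),
    (f"<{EX}c2>", f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
    (f"<{EX}c2>", f"<{SKOS}prefLabel>", '"budget"@en'),
    (f"<{EX}c2>", f"<{SKOS}broader>", f"<{EX}c1>"),
    (f"<{EX}c2>", f"<{SKOS}inScheme>", f"<{EX}scheme>"),
]

def triple_density(triples, n_concepts):
    """Average number of triples per concept, as in 3,136 / 564."""
    return len(triples) / n_concepts

def to_ntriples(triples):
    """N-Triples: one 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)

def to_nquads(triples, graph):
    """N-Quads: the same statements plus a fourth term naming the graph."""
    return "".join(f"{s} {p} {o} <{graph}> .\n" for s, p, o in triples)

# Count concepts by their rdf:type skos:Concept declarations.
concepts = {s for s, p, o in TRIPLES if p == f"<{RDF_TYPE}>" and o == f"<{SKOS}Concept>"}
print(triple_density(TRIPLES, len(concepts)))  # 3.5 for this toy graph
print(round(3136 / 564, 2))  # 5.56, the figure reported for the full thesaurus
print(to_nquads(TRIPLES[:1], EX + "graph"), end="")
```

Because the ratio counts every triple, including labels, type declarations, and scheme membership, it measures how richly each concept is described rather than the number of semantic relations alone.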

Conclusion: The model developed in this research is a comprehensive, process-driven approach that can be generalized to other thesauri. The average density of 5.56 RDF triples per concept indicates well-structured conceptual relationships, which supports semantic search and information retrieval on the semantic web. Moreover, the hierarchical structure, the assignment of globally unique identifiers (URIs), and the resolution of technical challenges in Persian language processing significantly improved the accuracy and efficiency of the thesaurus compared with similar projects. The availability of multiple data formats (RDF, Turtle, N-Triples, N-Quads) and access via a REST API and SPARQL also facilitate the integration of the thesaurus data into knowledge management systems. A key application of this study is the integration of the thesaurus into administrative automation systems across the country, enabling interoperability and the gradual standardization of terminology within government organizations. The study demonstrates that adopting semantic web standards and open-source tools provides a sustainable, operational model for managing and publishing national thesauri and a framework for future national projects in this domain.
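Integration through SPARQL follows the W3C SPARQL 1.1 Protocol: a client sends the query in the `query` parameter and requests results as `application/sparql-results+json`. The sketch below builds such a request and parses a response in the SPARQL JSON results format. The endpoint `http://example.org/sparql` is a placeholder rather than the project's real endpoint, and the query (concepts whose Persian preferred label contains a search term) is an illustrative assumption; a Skosmos REST call would follow the same build-then-send pattern, with paths taken from the Skosmos REST API documentation.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint (hypothetical); substitute the thesaurus's actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "fa" && CONTAINS(STR(?label), "%s"))
}"""

def build_request(term):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request asking for JSON results."""
    # Escape backslashes and quotes so the term cannot break out of the SPARQL string literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    params = urllib.parse.urlencode({"query": QUERY % safe})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

def parse_bindings(body):
    """Extract (concept URI, label) pairs from a SPARQL JSON results document."""
    data = json.loads(body)
    return [(b["concept"]["value"], b["label"]["value"]) for b in data["results"]["bindings"]]

req = build_request("بودجه")
print(req.full_url)

# A hand-written response in the SPARQL 1.1 JSON results format, for illustration only.
sample = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [{
        "concept": {"type": "uri", "value": "http://example.org/thesaurus/c2"},
        "label": {"type": "literal", "xml:lang": "fa", "value": "بودجه"},
    }]},
})
print(parse_bindings(sample))
```

Keeping request construction separate from sending makes the query logic testable without network access; `urllib.request.urlopen(req).read()` would perform the actual call against a live endpoint.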
 