Objective: Discovering and representing the subjects contained in information resources is the most important goal of information description, analysis, and retrieval systems, and subject search is the most common type of user search in databases and library catalogs. This study therefore discusses what a "subject" is, how subject matter is analyzed and extracted when determining subject keywords for an information resource, and some related research and operational perspectives.
Methods: This conceptual paper uses the documentary research method to examine the concepts of "subject" and "subject analysis" as presented in a selection of related research works in Information Science, while also considering some corresponding concepts in Computer Science and Natural Language Processing.
Results: Despite various interpretations of what a "subject" is, which some treat as an axiom whose definition is unnecessary, there is explicit or implicit consensus that it means "aboutness". Subject analysis of information resources, as part of the indexing process, is the analysis and identification of the stated topics and concepts and/or the obvious features of the information source; it may be carried out by human indexers, by computer algorithms designed to identify textual terms, or by a combination of the two. Each of these methods has its strengths and weaknesses. The weaknesses of the human subject analysis approach include search and retrieval problems caused by the multiplicity of tools for subject analysis and keyword assignment; the low usage rate of standard descriptive schemas; inconsistencies between index terms assigned by a single indexer at different times, as well as between multiple indexers for the same information resource; human error in general; the mismatch between users' search terms and the assigned keywords; and the lack of timely description of resources due to their growing production. Automated methods from artificial intelligence and natural language processing offer promising prospects for increasing speed and consistency in the various processes of describing and organizing information, including the extraction of subject keywords. However, the efficiency of the output of these methods still needs to be evaluated, alone or in comparison with human-produced keywords, especially from the perspective of users.
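The inconsistency between indexers, and the comparison of machine-extracted keywords with human-assigned ones, can be illustrated with a simple set-overlap score. The sketch below is only an illustration: the paper does not name a specific measure, so the use of Jaccard overlap, the function names `normalize` and `keyword_consistency`, and the sample keyword lists are all assumptions made here.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def keyword_consistency(terms_a, terms_b) -> float:
    """Jaccard overlap between two keyword sets: |A & B| / |A | B|.

    Assumption: exact match after normalization; synonyms and
    spelling variants are NOT reconciled in this sketch.
    """
    a = {normalize(t) for t in terms_a}
    b = {normalize(t) for t in terms_b}
    if not a and not b:
        return 1.0  # two empty sets are treated as fully consistent
    return len(a & b) / len(a | b)


# Hypothetical keywords assigned to the same resource by two indexers
# (or by one human indexer and one automated extractor).
indexer_1 = ["Subject analysis", "Indexing", "Aboutness"]
indexer_2 = ["indexing", "aboutness", "Information retrieval", "Keywords"]

score = keyword_consistency(indexer_1, indexer_2)
print(f"consistency = {score:.2f}")  # prints "consistency = 0.40"
```

A score of 1.0 means the two keyword sets are identical and 0.0 means they share no term. Because matching is exact, the score tends to understate agreement when indexers choose synonymous terms, which is one reason user-centered evaluation is still needed.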
Conclusion: In the competition between machine algorithms and the human mind in analyzing and recognizing the subjects of resources, the human mind excels, whether it is that of an indexer who expresses in his or her own language what the source is about, or that of a designer who builds a machine that mimics the computational steps of the human mind to save time and resources. Rather than rushing to replace conventional processes and procedures, libraries and information centers can use the various methods and tools available from the fields of Natural Language Processing and Machine Learning to design and develop automated systems for indexing and for extracting or assigning subject keywords (and, in the larger perspective, for automated classification). Doing so would direct financial, human, and time resources toward the ultimate goals of maximizing productivity, shortening users' subject access path to resources, and facilitating the description process.