Thesis: Text mining model for retrieval of explicit knowledge at Kenya coastal development project, Mombasa
Authors
Onkundi, Ednah Nyakerario
Abstract
The study investigated the prospects of applying a text mining model to the retrieval of explicit knowledge at the Kenya Coastal Development Project (KCDP). Its main objective was to establish how such a model could be used for explicit knowledge retrieval at KCDP. The study identified text mining techniques that could be used to develop the model, and evaluated the model's ability to retrieve explicit knowledge at KCDP. The study targeted staff of the agencies that constituted the KCDP project: Kenya Marine and Fisheries Research Institute (KMFRI), Kenya Wildlife Service (KWS), State Department of Fisheries (SDF), Coast Development Authority (CDA), Department of Physical Planning, Kenya Forest Service (KFS) and National Environment Management Authority (NEMA). The study used exploratory and experimental research designs to understand the research problem and to address the research objectives and questions. The total population of project staff was one hundred and fifty (150), of whom fifty-two (52) were sampled. Purposive sampling was used to select samples from the representative groups that comprised the target population. Two methods of data collection were used: questionnaires and a focus group discussion. The questionnaire was administered to members of staff in four major departments: top management, research and administration, knowledge management, and ICT. The focus group discussion was held with a special group in the knowledge management section. Content analysis was used to analyze the focus group discussion. Questionnaires were analyzed using the Statistical Package for the Social Sciences (SPSS) version 25. The questionnaires and the focus group were used to establish the current state of knowledge management systems at KCDP and whether text mining could be used to retrieve explicit knowledge there.
Texts were collected from the websites of organizations that took part in the KCDP project using the Python libraries Requests 2.22 and Beautiful Soup 3. The collected text was then summarized using the text summarization algorithms included in the model, namely LuhnSummarizer, LsaSummarizer, LexRankSummarizer and EdmundsonSummarizer. After summarization, topic modelling was performed on the collected text using the Latent Dirichlet Allocation (LDA) algorithm to create topics based on patterns in the text. The model was then evaluated by measuring the four variables identified, using precision and recall to measure accuracy, topic modelling to measure the rate of similarity, and perplexity to evaluate the model, which gave a value of -6.0455 for the text analyzed and modelled. It was concluded that the model could be used to analyze text and create explicit knowledge from both structured and unstructured data formats. Future models should incorporate artificial intelligence into machine learning, so that the semantics of the language (i.e., English grammar) are deciphered and not only its syntax. The system should be able to differentiate between “willing flesh” and “good meat”, and to detect the intrinsic difference between the phrases “weak spirit” and “bad liquor”. This will help the system avoid getting lost in translation through the use of synonyms, and rely increasingly on semantics, as facilitated by artificial intelligence.
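The evaluation measures named above can be illustrated with a minimal sketch. It is not the thesis's own code: the function names and document identifiers are hypothetical. It also rests on an assumption the abstract does not confirm, namely that the reported -6.0455 is a per-word log-likelihood bound in base 2, as some topic-modelling toolkits report, and not a perplexity in the strict sense (which cannot be negative).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query over collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def perplexity_from_bound(per_word_bound, base=2.0):
    """Convert a per-word log-likelihood bound into a perplexity.

    Assumption: the bound is a base-2 logarithm; pass the natural-log
    base instead if the toolkit reports natural logarithms.
    """
    return base ** (-per_word_bound)


# Hypothetical document ids, for illustration only (not data from the study).
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc3", "doc5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")

# The value reported in the abstract, read as a per-word bound (an assumption).
print(f"perplexity={perplexity_from_bound(-6.0455):.2f}")
```

Under that reading, a bound closer to zero corresponds to a lower perplexity, which would indicate that the topic model fits the text better.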