Incorporating Rule-based Pattern Recognition Approach for Document Structure Classification on Cloud-based Document Management System
Many organizations create and receive numerous documents, such as emails, business letters, reports, and transactions, so keeping them organized is a challenge. Documents can be organized in various ways, for example by header, sender, or content. However, in organizations such as schools, the rules for organizing documents may be applied inconsistently because they vary from one person to another, and manual filing is prone to human error. Manual organization is also time-consuming and can make documents difficult to locate later. This study therefore aimed to develop Docudile, an intelligent, self-organizing document management system that classifies each document and automatically places it in the appropriate computer directory using rule-based pattern recognition, so that documents can be located quickly and accurately. A cloud-based document management component with storage that syncs documents from local storage to a cloud server was also developed so that documents remain accessible from remote locations. Term Frequency-Inverse Document Frequency (TF-IDF) was used to retrieve the documents. Results showed that, based on rule-based pattern recognition, the system achieved 98% accuracy in classifying documents and 89% accuracy in retrieving them. Compared with Naïve Bayes and support vector machine (SVM) classifiers, the rule-based pattern recognition approach using cosine feature similarity achieved better accuracy in classifying school-related documents. This study further recommends that supporting documents be classified together with their main document.
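The abstract names TF-IDF for retrieval and cosine feature similarity for matching, but it does not specify the tokenization, weighting variant, or classification rules used in Docudile. The following is a minimal sketch, assuming simple alphanumeric tokenization and a smoothed IDF of log((1 + N) / (1 + df)) + 1, of how TF-IDF vectors and cosine similarity can rank stored documents against a search query. All function names and sample file names are hypothetical, and the rule-based classifier itself is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Weight each term by its relative frequency times its IDF; terms not in the corpus are dropped."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Return (document name, cosine score) pairs, best match first."""
    tokenized = {name: tokenize(text) for name, text in corpus.items()}
    idf = build_idf(list(tokenized.values()))
    doc_vecs = {name: tfidf_vector(toks, idf) for name, toks in tokenized.items()}
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(name, cosine(q_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical school-related documents used only for illustration.
corpus = {
    "memo_faculty.txt": "Memorandum to all faculty regarding the schedule of final examinations",
    "letter_parent.txt": "Letter to parents about the school field trip permission",
    "report_enrollment.txt": "Enrollment report for the first semester",
}

for name, score in rank("schedule of final examinations", corpus):
    print(f"{name}: {score:.3f}")
```

In this toy corpus only the faculty memo shares terms with the query, so it ranks first and the other two documents score zero. A real system would likely add stemming and stop-word removal so that, for example, "examination" and "examinations" match.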