Building the Waray-waray Neural Language Model using Recurrent Neural Network

Authors

  • Fernando E. Quiroz, Jr. School of Technology and Computer Studies, Biliran Province State University, Naval, Biliran 6560 Philippines
  • Chona B. Sabinay School of Technology and Computer Studies, Biliran Province State University, Naval, Biliran 6560 Philippines
  • Jeneffer A. Sabonsolin School of Technology and Computer Studies, Biliran Province State University, Naval, Biliran 6560 Philippines

Keywords:

computational linguistics, language model, natural language processing, Waray-waray language

Abstract

In the Philippines, language modeling is challenging because most of the country's languages are low-resourced. Tagalog and Cebuano are the only Philippine languages available on machine translation platforms such as Google Translate; Winaray, a language spoken in the Eastern Visayas region, is not represented. Hence, this study developed a Winaray language model that can be used in natural language processing tasks. The text corpus used to create the model was scraped from the web (religious and local news websites, and Wikipedia) and contains Winaray sentences. The model was trained using an encoder-decoder recurrent neural network with four sequential layers and 100 hidden neurons. The text prediction accuracy of the model reached 76.17%. The model was also manually evaluated on its generated sentences using the linguistic quality dimensions of grammaticality, non-redundancy, focus, structure, and coherence. This manual evaluation was promising, with linguistic quality reaching 3.66 (acceptable); however, the training data must be enlarged with texts from a wider range of genres.
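The paper does not publish its code, but the core of a recurrent language model like the one described can be illustrated with a single vanilla RNN step. The sketch below is a minimal, hypothetical illustration: the 100 hidden units follow the abstract, while the vocabulary size, weight initialization, and all variable names are assumptions and not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 500    # assumed vocabulary size (not stated in the abstract)
HIDDEN = 100   # hidden neurons, per the abstract

# Parameters of one vanilla RNN cell plus an output projection
# (the study's actual model stacks four sequential layers).
W_xh = rng.normal(0.0, 0.01, (HIDDEN, VOCAB))
W_hh = rng.normal(0.0, 0.01, (HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
W_hy = rng.normal(0.0, 0.01, (VOCAB, HIDDEN))
b_y = np.zeros(VOCAB)

def step(token_id, h):
    """Advance the hidden state by one token; return next-token probabilities."""
    x = np.zeros(VOCAB)
    x[token_id] = 1.0                        # one-hot encoding of the input token
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # recurrent hidden-state update
    logits = W_hy @ h + b_y
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    return probs, h

h = np.zeros(HIDDEN)
probs, h = step(42, h)   # feed an arbitrary token id
print(probs.shape)       # distribution over the vocabulary
```

In text prediction, the highest-probability entry of `probs` is the predicted next token; comparing it against the held-out next token across a test corpus yields an accuracy figure like the 76.17% reported.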

Published

2023-06-23