Language Threshold for Multilingual Sentiment Analysis System
DOI: https://doi.org/10.11113/elektrika.v23n1.446

Abstract
Code-mixed sentences are very common on social media platforms, especially in countries such as Malaysia where more than two languages are spoken. Although multilingual Bidirectional Encoder Representations from Transformers (mBERT) can handle multiple languages, the sentence embeddings it produces for a code-mixed sentence can be very complex. This poses a challenge in natural language processing of informal social media text, especially for mixed language pairs such as Malay-English, for which training datasets are scarce. Thus, this paper proposes a language threshold that translates the affected words or the whole sentence into a single-language sentence and relabels the language of the sentence. The results show an 8% increase in accuracy when translating affected words in a sentence at the 60% language threshold using SEC PCA-200.
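For illustration, the sketch below shows one way such a language threshold could operate; it is a minimal Python sketch under stated assumptions, not the paper's implementation. The helpers detect_word_language and translate_word, the toy Malay lexicon, and the per-word translation strategy are all hypothetical placeholders standing in for a real word-level language identifier and a Malay-English machine-translation step.

# Minimal sketch of a language-threshold step (hypothetical helpers).
MALAY_LEXICON = {"saya", "suka", "makan", "sangat", "tidak"}  # toy lexicon

def detect_word_language(word):
    """Toy word-level language ID: lexicon lookup, defaulting to English."""
    return "ms" if word.lower() in MALAY_LEXICON else "en"

def translate_word(word, target_lang):
    """Placeholder for a real machine-translation call."""
    toy_ms_to_en = {"suka": "like", "makan": "eat", "sangat": "very"}
    if target_lang == "en":
        return toy_ms_to_en.get(word.lower(), word)
    return word  # reverse direction omitted in this sketch

def apply_language_threshold(sentence, threshold=0.6):
    """Translate minority-language words into the dominant language when
    the dominant language's word share meets the threshold, then relabel."""
    words = sentence.split()
    langs = [detect_word_language(w) for w in words]
    en_share = langs.count("en") / len(words)
    dominant = "en" if en_share >= 0.5 else "ms"
    dominant_share = en_share if dominant == "en" else 1.0 - en_share
    if dominant_share >= threshold:
        words = [w if lang == dominant else translate_word(w, dominant)
                 for w, lang in zip(words, langs)]
    return " ".join(words), dominant

# Example: 75% of the words are English, so the Malay word is translated.
print(apply_language_threshold("I suka this food"))  # ('I like this food', 'en')

The resulting single-language sentence, together with its relabeled language tag, can then be embedded as a monolingual input rather than a code-mixed one.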