Language Threshold for Multilingual Sentiment Analysis System

Authors

  • Yu Heng Kit, Universiti Teknologi Malaysia
  • Musa Mohd Mokji

DOI:

https://doi.org/10.11113/elektrika.v23n1.446

Abstract

Code-mixed sentences are very common on social media platforms, especially in countries such as Malaysia where more than two languages are spoken. Although multilingual Bidirectional Encoder Representations from Transformers (mBERT) is capable of understanding multiple languages, the sentence embeddings obtained from mBERT can be very complex for a code-mixed sentence. This is a challenge in natural language processing when handling informal social media text, especially for mixed language pairs such as Malay-English, for which training datasets are scarce. Thus, this paper proposes a language threshold for translating the affected words or the whole sentence into a single-language sentence and relabelling the language of the sentence. The results show an increase of 8% in accuracy when translating affected words in a sentence at the 60% language threshold using SEC PCA-200.
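As a rough illustration of the thresholding idea described above, the sketch below checks whether one language dominates a sentence at or above a given threshold (e.g. 60%) and, if so, translates the minority-language words into it and relabels the sentence as single-language. The helpers detect_word_language and translate_word are hypothetical stubs; the paper's actual detection, translation, and relabelling components are not specified here.

```python
# Minimal sketch of a word-level language threshold, assuming hypothetical
# detect_word_language / translate_word helpers (not from the paper).

def detect_word_language(word: str) -> str:
    """Hypothetical per-word language detector, e.g. returning 'ms' or 'en'."""
    raise NotImplementedError

def translate_word(word: str, target_lang: str) -> str:
    """Hypothetical per-word translator into the target language."""
    raise NotImplementedError

def apply_language_threshold(sentence: str, threshold: float = 0.6):
    """If one language's share of words meets the threshold, translate the
    remaining (minority-language) words into it and relabel the sentence."""
    words = sentence.split()
    langs = [detect_word_language(w) for w in words]
    counts = {lang: langs.count(lang) for lang in set(langs)}
    dominant = max(counts, key=counts.get)
    share = counts[dominant] / len(words)
    if share >= threshold:
        fixed = [w if lang == dominant else translate_word(w, dominant)
                 for w, lang in zip(words, langs)]
        return " ".join(fixed), dominant   # single-language sentence + new label
    return sentence, "mixed"               # below threshold: leave unchanged
```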

Published

2024-04-27

How to Cite

Kit, Y. H., & Mohd Mokji, M. (2024). Language Threshold for Multilingual Sentiment Analysis System. ELEKTRIKA - Journal of Electrical Engineering, 23(1), 96–102. https://doi.org/10.11113/elektrika.v23n1.446

Section

Articles